All things Sysadmin: Just another manic Monday (northernmost.org/blog)

How does MySQL hide the command line password in ps?
Sat, 10 Mar 2012, by Erik Ljungstrom

I saw this question asked today, and thought I’d write a quick post about it.
Giving passwords on the command line isn’t necessarily a fantastic idea – but you can sort of see where they’re coming from. Configuration files and environment variables are better, but only slightly. Security is a nightmare!

But if you do decide to write an application which takes a password (or any other sensitive information) on the command line, you can prevent other users on the system from easily seeing it like this:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    int i;
    pid_t mypid = getpid();

    if (argc == 1)
        return 1;

    printf("argc = %d and arguments are:\n", argc);
    for (i = 0; i < argc; i++)
        printf("%d = %s\n", i, argv[i]);

    printf("Replacing first argument with x:es... Now open another terminal and run: ps p %d\n", (int)mypid);
    fflush(stdout);

    /* Overwrite the argument in place - this is what hides it from ps */
    memset(argv[1], 'x', strlen(argv[1]));

    /* Block on stdin so there's time to inspect the process list */
    getc(stdin);
    return 0;
}

A sample run looks like this:
$ ./pwhide abcd
argc = 2 and arguments are:
0 = ./pwhide
1 = abcd
Replacing first argument with x:es... Now open another terminal and run: ps p 27913

<In another terminal>
$ ps p 27913
PID TTY STAT TIME COMMAND
27913 pts/1 S+ 0:00 ./pwhide xxxx

In the interest of brevity, the above code isn't very portable - but it works on Linux and hopefully the point comes across. In other environments, such as FreeBSD, you have setproctitle() to do the dirty work for you. The key thing here is the overwriting of argv[1].
Because the space for argv[] is allocated when the program starts, you can't easily obfuscate the length of the password. I say easily - because of course there is a way.
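
To see the behaviour that gave this post its title, you can run the same check against the mysql client itself. A quick sketch, assuming a local server and a throwaway account (the credentials below are obviously just placeholders):

$ mysql -u someuser -p'secret' -e 'SELECT SLEEP(30)' &
$ ps -o args= -p $!

The password argument should show up overwritten rather than in clear text, which is exactly the trick demonstrated above.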

Font rendering – no more jealousy
Tue, 28 Feb 2012, by Erik Ljungstrom

I suppose this kind of content is what most people use twitter for these days. But since I’ve remained strong and stayed well away from that, I suppose I will have to be a tad retro and write a short blog post about it.
If you, like me, are an avid Fedora user, I’m sure you’ve thrown glances at a colleague’s or friend’s Ubuntu machine and thought that there was something slightly different about the way it looked (aside from the obvious Gnome vs Unity differences). Shinier somehow… So had I, but I mostly dismissed it as a case of “the grass is always greener…”.

It turns out that the grass actually IS greener. Tonight I stumbled upon this. It’s a patched version of FreeType. For what I assume are political reasons (free as in speech), Fedora ships a FreeType version without subpixel rendering. These patches fix that, among other things.

With a default configuration file of 407 lines, it’s quite extensible and configurable as well. Luckily, I quite like the defaults!

If you’re not entirely happy with the way your fonts look on Fedora, it’s well worth a look.

Transactions and code testing
Thu, 18 Aug 2011, by Erik Ljungstrom

A little while ago I worked with a customer to migrate their DB from MyISAM to InnoDB (something I definitely don’t mind doing!)
I set up a smaller test instance with all tables using the InnoDB engine as part of the testing. I instructed them to thoroughly test their application against this test instance and let me know if they identified any issues.

They reported back that everything seemed fine, and we went off to do the actual migration. Everything went according to plan and things seemed well.
After a while they started seeing some discrepancies in the stock portion of their application. The data didn’t add up with what they expected and stock levels seemed surprisingly high. A crontabbed program was responsible for periodically updating the stock count of products, so this was of course the first place I looked.
I ran it manually and looked at its output; it was very verbose and reported some 2000 products had been updated. But looking at the actual DB, this was far from the case.

Still having the test environment available, I ran it a few times against that and could see the Com_update and Com_insert counters being incremented, so I knew the queries were making it there. But the data remained intact. At this point I had a gut feeling about what was going on, so to confirm it I enabled query logging to see what was actually happening. It didn’t take me long to spot the problem. On the second line of the log, I saw this:

40 Query set autocommit=0
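
(On MySQL 5.1 and later, the general query log can be toggled at runtime, which makes this kind of spot check painless. A rough sketch, with the log file path being just an example:)

$ mysql -e "SET GLOBAL general_log_file='/tmp/mysql-general.log'; SET GLOBAL general_log=ON"
... reproduce the problem ...
$ mysql -e "SET GLOBAL general_log=OFF"
$ less /tmp/mysql-general.log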

The program responsible for updating the stock levels was a python script using MySQLdb. I couldn’t see any traces of autocommit being set explicitly, so I went on assuming that it was off by default (which turned out to be correct). After adding a commit()* call on the connection once the relevant queries had been sent to the server, everything was back to normal as far as stock levels were concerned.
Since the code itself was seeing its own transaction, calls such as cursor.rowcount, which the testers had relied on, were all correct.

But the lesson here: when testing your software from a database point of view, don’t blindly trust what your code tells you it’s done – make sure it has actually done it by verifying the data!
A lot of things can happen to data between your program and the platters. Its transaction can deadlock and be rolled back, it can be reading cached data, it can get lost in a crashing message queue, etc.

* As a rule of thumb, I’m rather against setting a blanket autocommit=1 in code; I’ve seen that come back to haunt developers in the past. I’m a strong advocate of explicit transaction handling.

Find out what is using your swap
Fri, 27 May 2011, by Erik Ljungstrom

Have you ever logged in to a server, run `free`, seen that a bit of swap is used and wondered what’s in there? It’s usually not very indicative of anything, or even overly helpful to know, mostly it’s a curiosity thing.

Either way, starting from kernel 2.6.16, we can find out using smaps, which can be found in the proc filesystem. I’ve written a simple bash script which prints out all running processes and their swap usage.
It’s quick and dirty, but does the job and can easily be modified to work on any info exposed in /proc/$PID/smaps.
If I find the time and inspiration, I might tidy it up and extend it a bit to cover some more alternatives. The output is in kilobytes.

#!/bin/bash
# Get current swap usage for all running processes
# Erik Ljungstrom 27/05/2011
SUM=0
OVERALL=0
for DIR in `find /proc/ -maxdepth 1 -type d | egrep "^/proc/[0-9]"` ; do
    PID=`echo $DIR | cut -d / -f 3`
    PROGNAME=`ps -p $PID -o comm --no-headers`
    # Sum up the Swap: entries for every mapping of this process
    for SWAP in `grep Swap $DIR/smaps 2>/dev/null | awk '{ print $2 }'`
    do
        let SUM=$SUM+$SWAP
    done
    echo "PID=$PID - Swap used: $SUM - ($PROGNAME)"
    let OVERALL=$OVERALL+$SUM
    SUM=0
done
echo "Overall swap used: $OVERALL"

This will need to be run as root for it to gather accurate numbers. It will still work if you don’t, but it will report 0 for any process not owned by your user.
Needless to say, it’s Linux only. The output is ordered alphabetically according to your locale (which admittedly isn’t a great thing since we’re dealing with numbers), but you can easily apply your standard shell magic to the output. For instance, to find the process with the most swap used, just run the script like so:

$ ./getswap.sh | sort -n -k 5
Don’t want to see stuff that’s not using swap at all?
$ ./getswap.sh | egrep -v "Swap used: 0" |sort -n -k 5

… and so on and so forth

Example using Cassandra with Thrift in C++
Sat, 21 May 2011, by Erik Ljungstrom

Due to a very exciting, recently launched project at work, I’ve had to interface with Cassandra through C++ code. As anyone who has done this can testify, the API docs are vague at best, and there are very few examples out there. The constant API changes between 0.x versions don’t help, and neither does the fact that the Cassandra API has its own docs and Thrift has its own, with nothing bridging the two.
So at the moment it is very much a case of dissecting header files and looking at the implementation in the Thrift-generated source files.

The only somewhat useful example of using Cassandra with C++ one can find online is this, but due to the API changes, this is now outdated (it’s still worth a read).

So in the hope that nobody else will have to spend the better part of a day piecing things together to achieve even the most basic thing, here’s an example which works with Cassandra 0.7 and Thrift 0.6.

First of all, create a new keyspace and a column family, using cassandra-cli:

[default@unknown] create keyspace nm_example;
c647b2c0-83e2-11e0-9eb2-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use nm_example;
Authenticated to keyspace: nm_example
[default@nm_example] create column family nm_cfamily with comparator=BytesType and default_validation_class=BytesType;
30466721-83e3-11e0-9eb2-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
[default@nm_example]

Now go to the directory where you have Cassandra installed, enter the interface/ directory and run: thrift --gen cpp cassandra.thrift
This will create the gen-cpp/ directory. From this directory, you need to copy all files bar Cassandra_server.skeleton.cpp to wherever you intend to keep your sources.
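
In shell terms, that amounts to something along these lines (the paths are of course whatever your own layout dictates):

$ cd /path/to/apache-cassandra-0.7/interface
$ thrift --gen cpp cassandra.thrift
$ cp gen-cpp/* ~/src/my_cassandra_client/
$ rm ~/src/my_cassandra_client/Cassandra_server.skeleton.cpp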
Here’s some example code which inserts, retrieves, updates, retrieves and deletes keys:

#include "Cassandra.h"

#include <time.h> /* for time() in getTS() */

#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/transport/TSocket.h>
#include <thrift/transport/TTransportUtils.h>

using namespace std;
using namespace apache::thrift;
using namespace apache::thrift::protocol;
using namespace apache::thrift::transport;
using namespace org::apache::cassandra;
using namespace boost;

static string host("127.0.0.1");
static int port= 9160;

int64_t getTS(){
/* If you're doing things quickly, you may want to make use of tv_usec
* or something here instead
*/
time_t ltime;
ltime=time(NULL);
return (int64_t)ltime;

}

int main(){
shared_ptr<TTransport> socket(new TSocket(host, port));
shared_ptr<TTransport> transport(new TFramedTransport(socket));
shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
CassandraClient client(protocol);

const string& key="your_key";

ColumnPath cpath;
ColumnParent cp;

ColumnOrSuperColumn csc;
Column c;

c.name.assign("column_name");
c.value.assign("Data for our key to go into column_name");
c.timestamp = getTS();
c.ttl = 300;

cp.column_family.assign("nm_cfamily");
cp.super_column.assign("");

cpath.column_family.assign("nm_cfamily");
/* This is required - thrift 'feature' */
cpath.__isset.column = true;
cpath.column="column_name";

try {
transport->open();
cout << "Set keyspace to 'dpdns'.." << endl;
client.set_keyspace("nm_example");

cout << "Insert key '" << key << "' in column '" << c.name << "' in column family '" << cp.column_family << "' with timestamp " << c.timestamp << "..." << endl;
client.insert(key, cp, c, org::apache::cassandra::ConsistencyLevel::ONE);

cout << "Retrieve key '" << key << "' from column '" << cpath.column << "' in column family '" << cpath.column_family << "' again..." << endl;
client.get(csc, key, cpath, org::apache::cassandra::ConsistencyLevel::ONE);
cout << "Value read is '" << csc.column.value << "'..." << endl;

c.timestamp++;
c.value.assign("Updated data going into column_name");
cout << "Update key '" << key << "' in column with timestamp " << c.timestamp << "..." << endl;
client.insert(key, cp, c, org::apache::cassandra::ConsistencyLevel::ONE);

cout << "Retrieve updated key '" << key << "' from column '" << cpath.column << "' in column family '" << cpath.column_family << "' again..." << endl;
client.get(csc, key, cpath, org::apache::cassandra::ConsistencyLevel::ONE);
cout << "Updated value is: '" << csc.column.value << "'" << endl;

cout << "Remove the key '" << key << "' we just retrieved. Value '" << csc.column.value << "' timestamp " << csc.column.timestamp << " ..." << endl;
client.remove(key, cpath, csc.column.timestamp, org::apache::cassandra::ConsistencyLevel::ONE);

transport->close();
}
catch (NotFoundException &nf){
cerr << "NotFoundException ERROR: "<< nf.what() << endl;
}
catch (InvalidRequestException &re) {
cerr << "InvalidRequest ERROR: " << re.why << endl;
}
catch (TException &tx) {
cerr << "TException ERROR: " << tx.what() << endl;
}

return 0;
}

Assuming we've called the file cassandra_example.cpp and you have the files mentioned above in the same directory, you can compile it like this:

$ g++ -lthrift -Wall cassandra_example.cpp cassandra_constants.cpp Cassandra.cpp cassandra_types.cpp -o cassandra_example
$ ./cassandra_example
Set keyspace to 'nm_example'..
Insert key 'your_key' in column 'column_name' in column family 'nm_cfamily' with timestamp 1306008338...
Retrieve key 'your_key' from column 'column_name' in column family 'nm_cfamily' again...
Value read is 'Data for our key to go into column_name'...
Update key 'your_key' in column with timestamp 1306008339...
Retrieve updated key 'your_key' from column 'column_name' in column family 'nm_cfamily' again...
Updated value is: 'Updated data going into column_name'
Remove the key 'your_key' we just retrieved. Value 'Updated data going into column_name' timestamp 1306008339 ...

As my WP template isn't very suitable for code, I've put up the cpp file here for download.
Another thing worth mentioning is Padraig O'Sullivan's libcassandra, which may or may not be worth a look depending on what you want to do and which versions of Thrift and Cassandra you're tied to.

Site slow after scaling out? Yeah, possibly!
Tue, 29 Mar 2011, by Erik Ljungstrom

Every now and then, we have customers who outgrow their single server setup. The next natural step is of course splitting the web layer from the DB layer. So they get another server, and move the database to that.

So far so good! A week or so later, we often get the call “Our page load time is higher now than before the upgrade! We’ve got twice as much hardware, and it’s slower! You have broken it!”
It’s easy to see where they’re coming from. It makes sense, right?

That is, until you factor in the newly introduced network topology! Today it’s not unusual (that’s not to say it’s acceptable or optimal) for your average wordpress/drupal/joomla/otherspawnofsatan site to run 40-50 queries per page load. Quite often even more!

Based on a tcpdump session of a reasonably average query (if there is such a thing), connecting to a server, authenticating, sending a query and receiving a 5 row result set of 1434 bytes yields 25 packets being sent between my laptop and a remote DB server on the same wired, non-congested network. A normal, average latency of TCP/IP over Ethernet is ~0.2 ms for the size of packets we’re talking here.
So, doing the maths, you’re looking at 25 * 0.2 ms * 50 = 250 ms in network latency alone per page load, just for your SQL queries. This is obviously a lot more than you would see over a local UNIX socket.
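
If you want to put a rough number on this for your own setup, timing a batch of trivial queries over the network versus over the local socket gives you a feel for the per-query overhead. A crude sketch (the hostname and socket path are just examples):

$ time for i in `seq 1 50`; do mysql -h db01.internal -e 'SELECT 1' >/dev/null; done
$ time for i in `seq 1 50`; do mysql -S /var/lib/mysql/mysql.sock -e 'SELECT 1' >/dev/null; done

Note that each iteration also includes a full connect/authenticate cycle, much like a web app without persistent connections, so it errs on the pessimistic side.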

This is inevitable, laws of physics. It is nothing you, your sysadmin and/or your hosting company can do anything about. There may however be something your developer can do about the amount of queries!
You also shouldn’t confuse response-times with availability. Your response times may be slower, but you can (hopefully) serve a lot more users with this setup!

Sure, there are technologies out there which have considerably less latency than ethernet, but they come with quite the price-tag, and there are more often than not quite a few avenues to go down before it makes sense to start looking at that kind of thing.

You could also potentially look at running the full stack on both machines, using master/master replication for your DBs, load balancing your front-ends and having them both read locally but only write to one node at a time! That kind of DB scenario is fairly easy to set up using mmm for MySQL. But in my experience, this often ends up costing more and introducing more complexity than it solves.
I’m an avid advocate for keeping server roles separate as much as possible!

A look at MySQL 5.5 semi synchronous replication
Sat, 09 Oct 2010, by Erik Ljungstrom

Now that MySQL 5.5 is in RC, I decided to have a look at the semi synchronous replication. It’s easy to get going, and from my very initial tests it appears to work a treat.

This mode of replication is called semi synchronous because it only guarantees that at least one of the slaves has written the transaction to disk in its relay log, not that it has actually committed it to its data files. It guarantees that the data exists by some means somewhere, but not that it’s retrievable through a MySQL client.

Semi sync is available as a plugin, and if you compile from source, you’ll need to build with --with-plugins=semisync.
So far, the semisync plugin can only be built as a dynamic module, so you’ll need to install it once you’ve got your instance up and running. To do this, you do as with any other plugin:
install plugin rpl_semi_sync_master soname 'semisync_master.so';
install plugin rpl_semi_sync_slave soname 'semisync_slave.so';

If you get a 1126 error and a message saying “Can’t open shared library..”, you most likely need to set the plugin_dir variable in my.cnf and give MySQL a restart.
If you’re using a master/slave pair, you obviously won’t need to load both modules as above. You load the slave one on your slave, and the master one on your master. Once you’ve done this, you’ll have entries for these modules in the mysql.plugin table.
When you have confirmed that you do, you can safely add the pertinent variables to your my.cnf, the values I used (in addition to the normal replication settings) for my master/master sandboxes were:
plugin_dir=/opt/mysql-5.5.6-rc/lib/mysql/plugin/
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=10000
rpl_semi_sync_slave_enabled=1
rpl_semi_sync_master_trace_level=64
rpl_semi_sync_slave_trace_level=64
rpl_semi_sync_master_wait_no_slave=1

Note that you probably won’t want to use these values for _trace_level in production due to the verbosity in the log! I just enabled these while testing.
Also note that the timeout is in milliseconds.
You can also set these on the fly with SET GLOBAL (thanks Oracle!), just make sure the slave is stopped before doing this, as it needs to be enabled during the handshake with the master for the semisync to kick in.

The timeout is the amount of time the master will lock and wait for a slave to acknowledge the write before giving up on the whole idea of semi synchronous operation and continue as normal.
If you want to monitor this, you can use the status variable Rpl_semi_sync_master_status which is set to Off when this happens.
If this condition should be avoided altogether, you would need to set a large enough value for the timeout and a low enough monitoring threshold as there doesn’t seem to be a way to force MySQL to wait forever for a slave to appear.
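
Keeping an eye on this, and on how often semisync has actually been used, is just a matter of reading the status variables. For example:

$ mysql -e "SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync%'"

Among other things, this lists Rpl_semi_sync_master_status along with the yes_tx/no_tx counters, so it’s easy to wire into whatever monitoring you already have.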

If you’re running an automated failover setup, you’ll want to set the timeout higher than your heartbeat, thus ensuring no committed data is lost. You might also want to set the timeout considerably lower initially on the passive master, so that you don’t end up waiting on the master you know is unhealthy and have just failed over from.

Before implementing this in production, I would strongly recommend running a few performance tests against your setup, as this will slow things down considerably for some workloads. Each transaction has to be written to the binlog, read over the wire and written to the relay log, and then lastly flushed to disk before each DML statement returns. You will almost certainly benefit from batching queries into larger transactions rather than using the default autocommit mode, since autocommit puts every single statement through those steps.
Update: Even though the manual clearly states that the event has to be flushed to disk, this doesn’t actually appear to be the case (see comments). The above still stands, but the impact may not be as great as first thought.

When I find the time, I will run some benchmarks on this.

Lastly, please note that this is written while MySQL 5.5 is still in release candidate stage, so while unlikely, things are subject to change. So please be mindful of this in future comments.

GlusterFS init script and Puppet
Mon, 09 Aug 2010, by Erik Ljungstrom

The other day I had quite the head scratcher. I was setting up a new environment for a customer which included the usual suspects in a LAMP stack spread across a few virtual machines in an ESXi cluster.
As the project is quite volatile in terms of requirements, number of servers, server roles, location etc., I decided to start off using Puppet to make my life easier further down the road.

I got most of it set up, and got started on writing the glusterfs Puppet module. Fairly straightforward: a few directories, configuration files and a mount point. Then I came to the service declaration, and of course we want this to be running at all times, so I went on and wrote:

service { "glusterfsd":
    ensure     => running,
    enable     => true,
    hasrestart => true,
    hasstatus  => true,
}

expecting glusterfsd to be running shortly after I purposefully stopped it. But it didn’t. So I dove into puppet (Yay Ruby!) and deduced that the way it determines whether something is running or not is the return code of:
/sbin/service servicename status

So a quick look in the init script which ships with glusterfs-server shows that it calls the stock init function “status” on glusterfsd, which is perfectly fine, but then it doesn’t exit with the return code from this function, it simply runs out of scope and exits with the default value of 0.

So to get around this, I made a quick change to the init script: I captured the return code from the “status” function (from /etc/rc.d/init.d/functions on RHEL5) and exited with $?, and Puppet had glusterfsd running within minutes.
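
The gist of the change is simply to carry that return code through to the script’s exit. A minimal sketch of the idea (the real init script obviously handles start/stop/restart as well):

#!/bin/bash
# Propagate the status() return code instead of falling off the end
# of the script, which would always exit 0.
. /etc/rc.d/init.d/functions
RETVAL=0
case "$1" in
  status)
      status glusterfsd
      RETVAL=$?
      ;;
  *)
      echo "Usage: $0 status"
      RETVAL=2
      ;;
esac
exit $RETVAL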

I couldn’t find anything when searching for this, so I thought I’d make a note of it here.

Legitimate emails being dropped by Spamassassin in RHEL5
Wed, 26 May 2010, by Erik Ljungstrom

Over the past few months, an increasing number of customers have complained that their otherwise OK spam filters have started dropping an inordinate amount of legitimate emails.
The first reaction is of course to increase the score required for a mail to be filtered, but that just lets more spam through. I looked in the quarantine on one of these servers and ran a few of the legitimate ones through spamassassin in debug mode. I noticed one particular rule which was prevalent in the vast majority of the emails. Here’s an example:

...
[2162] dbg: learn: initializing learner
[2162] dbg: check: is spam? score=4.004 required=6
[2162] dbg: check: tests=FH_DATE_PAST_20XX,HTML_MESSAGE,SPF_HELO_PASS
...

4 is obviously quite a high score for an email whose only flaw is being in HTML. But FH_DATE_PAST_20XX caught my eye in all of the outputs. So, on to the rule files:

$ grep FH_DATE_PAST_20XX /usr/share/spamassassin/72_active.cf
##{ FH_DATE_PAST_20XX
header FH_DATE_PAST_20XX Date =~ /20[1-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX The date is grossly in the future.
##} FH_DATE_PAST_20XX

Aha. This is a problem. With 50_scores.cf containing this:

$ grep FH_DATE_PAST /usr/share/spamassassin/50_scores.cf
score FH_DATE_PAST_20XX 2.075 3.384 3.554 3.188 # n=2

there’s no wonder emails are getting dropped! Once the calendar ticked over into 2010, every legitimate email matched the /20[1-9][0-9]/ pattern and picked up more than 3 points. I guess this is the kind of problem one can expect when running a distribution with packages six years old and neglecting to update the rules frequently (or at least every once in a while)!
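
Until the rule set is brought up to date (sa-update exists for exactly this), the quickest stop-gap is to neutralise the rule locally. Something along these lines should do it, though how you restart SA depends on how it’s glued into your mail flow:

# echo "score FH_DATE_PAST_20XX 0" >> /etc/mail/spamassassin/local.cf
# service spamassassin restart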

Luckily, this rule is gone altogether from RHEL6’s version of spamassassin.

Control groups in RHEL6
Thu, 13 May 2010, by Erik Ljungstrom

One new feature that I’m very enthusiastic about in RHEL6 is Control Groups (cgroups for short). It allows you to create groups and allocate resources to them. You can then bunch your applications into groups to your heart’s content.

It’s relatively simple to set up, and configuration can be done in two different ways: you can use the supplied cgset command, or, if you’re accustomed to doing it the usual way when dealing with kernel settings, you can simply echo values into the pseudo-files under the control group.
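
For example, with the hierarchy mounted the way it is in the session below, creating a group, capping it at 512M and dropping your shell into it could look roughly like this (the mount point and group name simply match that session; cgset comes from libcgroup):

# mkdir /cgroup/gen/group1
# echo $((512*1024*1024)) > /cgroup/gen/group1/memory.limit_in_bytes
# echo $$ > /cgroup/gen/group1/tasks
# cgset -r memory.limit_in_bytes=536870912 group1    # the cgset equivalent of the echo above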

Here’s a control group in action:

[root@rhel6beta cgtest]# grep $$ /cgroup/gen/group1/tasks
1138
[root@rhel6beta cgtest]# cat /cgroup/gen/group1/memory.limit_in_bytes
536870912
[root@rhel6beta cgtest]# gcc alloc.c -o alloc && ./alloc
Allocating 642355200 bytes of RAM,,,
Killed
[root@rhel6beta cgtest]# echo `echo 1024*1024*1024| bc` > /cgroup/gen/group1/memory.limit_in_bytes
[root@rhel6beta cgtest]# ./alloc
Allocating 642355200 bytes of RAM,,,
Successfully allocated 642355200 bytes of RAM, captn' Erik...
[root@rhel6beta cgtest]#

The first line shows that the shell which launched the app is under the control of the cgroup group1, so subsequently all its child processes are subject to the same restrictions.

As you can also see, the initial memory limit in the group is 512M. alloc is a simple C app I wrote which calloc()s 612M of RAM (for demonstration purposes, I’ve disabled swap on the system altogether). On the first run, the kernel kills the process in the same way it would if the whole system had run out of memory. The kernel message also indicates that the control group ran out of memory, and not the system as a whole:

...
May 13 17:56:20 rhel6beta kernel: Memory cgroup out of memory: kill process 1710 (alloc) score 9861 or a child
May 13 17:56:20 rhel6beta kernel: Killed process 1710 (alloc)

Unfortunately it doesn’t indicate which cgroup the process belonged to. Maybe it should?

cgroups don’t just give you the ability to limit the amount of RAM; there are a lot of tunables. You can even set swappiness on a per-group basis! You can limit the devices applications are allowed to access, you can freeze processes, and you can tag outgoing network packets with a class ID in case you want to do shaping or profiling on your network. Perfect if you want to prioritise SSH traffic over anything else, so you can work comfortably even when your uplink is saturated. Furthermore, you can easily get an overview of memory usage, CPU accounting etc. for applications in any given group.

All this means you can clearly separate resources and, to quite a large extent, ensure that some applications won’t starve the whole system, or each other, of resources. Very handy: no more waiting half an hour for the swap to fill up and the OOM killer to kick in (and often pick the wrong PID) when a customer’s application has run astray.

A much welcomed addition to RHEL!
