All things Sysadmin
Just another manic Monday

Legitimate emails being dropped by Spamassassin in RHEL5

May 26th, 2010 by Erik Ljungstrom

Over the past few months, an increasing number of customers have complained that their otherwise OK spam filters have started dropping an inordinate amount of legitimate emails.
The first reaction is of course to increase the score required to be filtered, but that just opens up for more spam. I looked in the quarantine on one of these servers, and ran a few of the legitimate ones through spamassassin in debug mode. I noticed one particular rule which was prevalent in the vast majority of the emails. Here’s an example:

...
[2162] dbg: learn: initializing learner
[2162] dbg: check: is spam? score=4.004 required=6
[2162] dbg: check: tests=FH_DATE_PAST_20XX,HTML_MESSAGE,SPF_HELO_PASS
...

4 is obviously quite a high score for an email whose only flaw is being in HTML. But FH_DATE_PAST_20XX caught my eye in all of the outputs. So to the rule files:

$ grep FH_DATE_PAST_20XX /usr/share/spamassassin/72_active.cf
##{ FH_DATE_PAST_20XX
header FH_DATE_PAST_20XX Date =~ /20[1-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX The date is grossly in the future.
##} FH_DATE_PAST_20XX

Aha. This is a problem. With 50_scores.cf containing this:

$ grep FH_DATE_PAST /usr/share/spamassassin/50_scores.cf
score FH_DATE_PAST_20XX 2.075 3.384 3.554 3.188 # n=2

there’s no wonder emails are getting dropped! I guess this is a problem one can expect when running a distribution with packages 6 years old and neglect to frequently (or at least every once in a while) update the rules!

Luckily, this rule is gone altogether from RHEL6′s version of spamassassin.

Posted in Sundry sysadmin | No Comments »

Control groups in RHEL6

May 13th, 2010 by Erik Ljungstrom

One new feature that I’m very enthusiastic about in RHEL6 is Control Groups (cgroup for short). It allows you to create groups and allocate resources to these. You can then bunch your applications into groups at your heart’s content.

It’s relatively simple to set up, and configuration can be done in two different ways. You can use the supplied cgset command, or if you’re accustomed to doing it the usual way when dealing with kernel settings, you can simply echo values into the pseudo-files under the control group.

Here’s a controlgroup in action:

[root@rhel6beta cgtest]# grep $$ /cgroup/gen/group1/tasks
1138
[root@rhel6beta cgtest]# cat /cgroup/gen/group1/memory.limit_in_bytes
536870912
[root@rhel6beta cgtest]# gcc alloc.c -o alloc && ./alloc
Allocating 642355200 bytes of RAM,,,
Killed
[root@rhel6beta cgtest]# echo `echo 1024*1024*1024| bc` > /cgroup/gen/group1/memory.limit_in_bytes
[root@rhel6beta cgtest]# ./alloc
Allocating 642355200 bytes of RAM,,,
Successfully allocated 642355200 bytes of RAM, captn' Erik...
[root@rhel6beta cgtest]#

The first line shows that the shell which launches the app is under the control of the cgroup group1, so subsequently all it’s child processes are subject to the same restrictions.

As you can also see, the initial memory limit in the group is 512M. Alloc is a simple C app I wrote which calloc()s 612M of RAM (for demonstrative purposes, I’ve disabled swap on the system altogether). At the first run, the kernel kills the process in the same way it would if the whole system had run out of memory. The kernel message also indicates that the control group ran out of memory, and not the system as a whole:

...
May 13 17:56:20 rhel6beta kernel: Memory cgroup out of memory: kill process 1710 (alloc) score 9861 or a child
May 13 17:56:20 rhel6beta kernel: Killed process 1710 (alloc)

Unfortunately it doesn’t indicate which cgroup the process belonged to. Maybe it should?

cgroups doesn’t just give you the ability to limit the amount of RAM, it has a lot of tuneables. You can even set swappiness on a per-group basis! You can limit the devices applications are allowed to access, you can freeze processes as well as tag outgoing network packets with a class ID, in case you want to do shaping or profiling on your network! Perfect if you want to prioritise SSH traffic over anything else, so you can comfortably worked even when your uplink is saturated. Furthermore, you can easily get an overview of memory usage, CPU accounting etc. of applications in any given group.

All this means you can clearly separate resources and to quite a large extent ensure that some applications won’t starve the whole system, or each other from resources. Very handy, no more waiting for half an hour for the swap to fill up and OOM to kick (and often chose the wrong PID) in when customer’s applications have run astray.

A much welcomed addition to RHEL!

Posted in Sundry sysadmin | No Comments »

Boot loader not working in rhel6 beta under xen

April 22nd, 2010 by Erik Ljungstrom

Just a heads up I thought I’d share in the hope that it’ll save someone some time, when installing RHEL6 beta under Xen, be aware that pygrub currently can’t handle /boot being on ext4 (which is the default). So in order to run rhel6 under xen, ensure that you modify the partition layout during the installation process.

This turned out to be a real head scratcher for me, and initially I thought the problem was something else as Xen wasn’t being very helpful with error messages.

Hopefully there’ll be an update for this soon!

Posted in Uncategorized | No Comments »

Building Hiphop PHP gotcha

February 21st, 2010 by Erik Ljungstrom

Tonight I’ve delved into the world of Facebook’s HipHop for PHP. Let me early on point out that I’m not doing so because I believe that I will need it any time soon, but I am convinced  that I without a shadow of a doubt  will be approached by customers who think they do, and I rather not have opinions or advise against things I haven’t tried myself or at least have a very good understanding of.

Unfortunately I set about this task on an RHEL 5.4 box, and it hasn’t been a walk in the park. Quite a few dependencies were out of date or didn’t exist in the repositories, libicu, boost, onig, tbb etc.

Though, CMake did a good job of telling me what was wrong, so it wasn’t a huge deal, I just compiled the missing pieces from source and put them in $CMAKE_PREFIX_PATH. One thing CMake didn’t pick up on however, was that the flex version shipped with current RHEL is rather outdated. Once I thought I had everything configured, I set about the compilation, and my joy was swiftly abrupted by this:

[  3%] [FLEX][XHPScanner] Building scanner with flex /usr/bin/flex version 2.5.4
/usr/bin/flex: unknown flag '-'.  For usage, try /usr/bin/flex --help

Not entirely sure what it was actually doing here, I took the shortcut of replacing /usr/bin/flex with a shell script which just exited after putting $@ in a file in /tmp/ and re-ran `make`. Looking in the resulting file, this is the argument flex is given:

-C --header-file=scanner.lex.hpp -o/home/erik/dev/hiphop-php/src/third_party/xhp/xhp/scanner.lex.cpp /home/erik/dev/hiphop-php/src/third_party/xhp/xhp/scanner.l

To me that looks quite valid, and there’s certainly no single – in that command line.

Long story short, flex introduced –header-file in a relatively “recent” version (2.5.33 it seems, but I may be wrong on that one, doesn’t matter). Unlike most other programs (using getopt), it won’t tell you “Invalid option ‘–header-file’”. So after compiling a newer version of flex, I was sailing again.

Posted in Sundry sysadmin, Webservers | 1 Comment »

Development, just as important as dual NICs

February 13th, 2010 by Erik Ljungstrom

There is a popular saying which I find you can apply  to most things in life; “You get what you pay for”. Sadly, this does not seem to apply for software development in any way. You who know me know that I work for a reasonably sized hosting company in the upper market segment. We have thousands of servers and hundreds of customers, so after a while you get a quite decent overview of how things work and a vast arsenal of “stories from the trenches”.

So here’s a small tip; ensure that your developers know what they are doing! It will save you a lot of hassle and money in the long run.

Without having made a science out of it, I can confidently say that at the very least 95% of the downtime I see on a daily basis is due to faulty code in the applications running on our servers.

So after you’ve demanded dual power feeds to your rack, bonded NICs and a gazillion physical paths to your dual controller SAN, it would make sense to apply the same attitude towards your developers. After all, they are carbon based humans and are far more likely to break than your silicon NIC. Now unfortunately it is not as simple as “if I pay someone a lot of money and let them do their thing, I will get good solid code out of it”, so a great deal of due diligence is required in this part of your environment as well. I have seen more plain stupid things coming from 50k pa. people than I care to mention, and I have seen plain brilliant things coming out of college kids’ basements.

This is important not only from an availability point of view, it’s also about running cost. The amount of hardware in our data centers which is completely redundant, and would easily be made obsolete with a bit of code and database tweaking is frightening. So you think you may have cut a great deal when someone said they could build your e-commerce system in 3 months for 10k less than other people have quoted you. But in actual fact, all you have done is got someone to effectively re-brand a bloated, way too generic, stock framework/product which the developer has very little insight into and control over. Yes, it works if you “click here, there and then that button”, the right thing does appear on the screen. But only after executing hundreds of SQL queries, looking for your session in three different places, done four HTTP redirects, read five config files and included 45 other source files. Needless to say, those one-off 10k you think you have saved, will be swallowed in recurring hardware cost in no time. You have probably also severely limited your ability to scale things up in the future.

So in summary, don’t cheap out on your development but at the same time don’t think that throwing money at people will make them write good code. Ask someone else to look things over every now and then, even if it will cost you a little bit. Use the budget you were planning on spending on the SEO consultant. Let it take time.

Posted in misc | 1 Comment »

« Previous Entries