Development, just as important as dual NICs

There is a popular saying which I find you can apply to most things in life: “You get what you pay for”. Sadly, this does not seem to apply to software development in any way. Those of you who know me know that I work for a reasonably sized hosting company in the upper market segment. We have thousands of servers and hundreds of customers, so after a while you get a fairly decent overview of how things work and a vast arsenal of “stories from the trenches”.

So here’s a small tip: ensure that your developers know what they are doing! It will save you a lot of hassle and money in the long run.

Without having made a science out of it, I can confidently say that at the very least 95% of the downtime I see on a daily basis is due to faulty code in the applications running on our servers.

So after you’ve demanded dual power feeds to your rack, bonded NICs and a gazillion physical paths to your dual-controller SAN, it would make sense to apply the same attitude towards your developers. After all, they are carbon-based humans and are far more likely to break than your silicon NIC. Unfortunately, it is not as simple as “if I pay someone a lot of money and let them do their thing, I will get good, solid code out of it”, so a great deal of due diligence is required in this part of your environment as well. I have seen more plain stupid things coming from people on 50k p.a. than I care to mention, and I have seen plain brilliant things coming out of college kids’ basements.

This is important not only from an availability point of view but also in terms of running costs. The amount of hardware in our data centers that is completely superfluous, and could easily be made obsolete with a bit of code and database tweaking, is frightening. So you may think you struck a great deal when someone said they could build your e-commerce system in 3 months for 10k less than others quoted you. But in actual fact, all you have done is get someone to effectively re-brand a bloated, far too generic stock framework/product that the developer has very little insight into and control over. Yes, it works: if you “click here, there and then that button”, the right thing does appear on the screen. But only after executing hundreds of SQL queries, looking for your session in three different places, doing four HTTP redirects, reading five config files and including 45 other source files. Needless to say, the one-off 10k you think you have saved will be swallowed by recurring hardware costs in no time. You have probably also severely limited your ability to scale up in the future.

So, in summary: don’t cheap out on your development, but at the same time don’t think that throwing money at people will make them write good code. Ask someone else to look things over every now and then, even if it will cost you a little bit. Use the budget you were planning on spending on the SEO consultant. Let it take time.

Posted in misc

GlusterFS tcp_nodelay patch update

As mentioned in my previous post, I wrote a patch for GlusterFS to increase its performance when operating on many small files. Someone told me the other day that this functionality has been pushed to the git repository. It would have been good to hear about this sooner…

So to all of you who emailed me positive feedback and asked me to make it a tuneable in the translator config (thanks!) – please check out the above link to the git repository.

On another note, it seems they’re breaking away from tying the protocol version to the release version, which is good progress in my opinion!

Posted in misc

Improving GlusterFS performance

I’ve had a closer look at GlusterFS in the last few days, following the release of version 2.0.1. We often get customers approaching us with web apps dealing with user-generated content that needs to be uploaded. If you have two or more servers in a load-balanced environment, you usually have a few options: an NFS/CIFS share on one of them (single point of failure – failover NFS is, well…), a SAN (expensive), MogileFS (good, but alas not application agnostic), periodically syncing files between the nodes with rsync or tar | nc (messy, slow and not application agnostic), or storing files in a database (not ideal for a number of reasons). There are a few other approaches and combinations of the above, but none of them is perfect. GlusterFS solves this. It’s fast, instant and redundant!

I’ve got four machines set up, two of them acting as redundant servers. Since the servers effectively act as a RAID 1, each write goes over the wire twice, but that’s more or less inevitable. All four are connected to a private, isolated gigabit network. When dealing with larger files (à la `cp yourfavouritedistro.iso /mnt/gluster`), the throughput is really good, at around 20-25 MB/s leaving the client. CPU usage on the client doing the copy was in the region of 20-25% on a dual core. Very good so far!

Then I tried a workload with many frequent filesystem operations: untarring the Linux 2.6.9 kernel from and onto the mount. Not so brilliant! It took 23-24 minutes from start to finish. The 2.6.9 kernel contains 17477 files and the average size is just a few kilobytes, which means a lot of small bursts of network traffic!

After seeing this, I dove into the source code to have a look. When I reached the socket code, I realised that performance for small files would probably improve a lot if Nagle’s algorithm were disabled on the socket, since Nagle holds back small writes while earlier data is still unacknowledged. No sooner said than done: I added a few setsockopt()s and went to test. The kernel tree now extracted in 1m 20s!
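The patch itself isn’t reproduced in this post, but disabling Nagle’s algorithm generally comes down to one setsockopt() call with the TCP_NODELAY option. Below is a minimal sketch for a POSIX system (Linux assumed). The helper names set_tcp_nodelay() and get_tcp_nodelay() are hypothetical and are not functions from the GlusterFS source.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* setsockopt(), getsockopt(), socklen_t */

/* Hypothetical helper: turn Nagle's algorithm off (enable != 0) or back on
 * (enable == 0) for a TCP socket. With TCP_NODELAY set, small writes are
 * sent straight away instead of waiting for earlier data to be ACKed.
 * Returns 0 on success, -1 on failure (errno set by setsockopt()). */
int set_tcp_nodelay(int fd, int enable)
{
    int flag = enable ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: report whether TCP_NODELAY is currently set.
 * Returns 1 if set, 0 if not, -1 on failure. */
int get_tcp_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

In a patch like this, a call such as `set_tcp_nodelay(sock, 1);` would presumably go right after each socket is created or accepted, on both client and server. Because the flag is per socket, exposing it as an option in the vol-file is a natural next step.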

Of course, there’s always a drawback. In this case, larger files take longer to transfer because raw throughput drops (a kernel buffer is a lot faster than a cat5!). Copying a 620 MB ISO from local disk onto the mount takes 1.20 s with the vanilla version of GlusterFS, and 3m 34s with Nagle’s algorithm disabled.

I’m not seeing any performance hit on sustained transfers of larger files, but for now my guess is that I’m hitting another bottleneck before that becomes a problem, as in theory it should have a slight negative impact in this case.

If you want to have a look at it, you can find the patch here. Just download it to the source directory, run `patch -p1 < glusterfs-2.0.1-patch-erik.diff` and then build as normal.

Until I’ve done some more testing on it and received some feedback, I won’t bother making it a tuneable in the vol-file, in case it turns out to be wasted effort!

Posted in misc

Don’t fix, work around – MySQL

I attended the MySQL EMEA conference last Thursday, where I enjoyed a talk by Ivan Zoratti titled “Scaling Up, Scaling Out, Virtualization – What should you do with MySQL?”

They have changed their minds quite a bit. According to them, virtualisation in production is no longer a solid no-no (a point a lot of people would argue with). Solaris containers, anyone?

As most of us know by now, MySQL struggles to utilise multiple cores efficiently. This has been the case for quite some time now, and people like Google and Percona have grown tired of waiting for MySQL to fix it and have released patches of their own.

Sun has decided not to go down the route of reviewing and accepting those patches, and is now suggesting – are you sitting down? – running multiple instances on the same hardware.

I’m not against this from a technical point of view, as it currently does improve performance for some workloads on multi-core, multi-disk systems (with an unpatched version), but the fact that they openly and officially suggest working around their own problem rather than fixing its source is disturbing.

Granted, I suppose it makes sense to suggest larger boxes if you’ve been bought by a big-iron manufacturer. Also, to be fair, Ivan at least didn’t say scaling out was a negative thing, and he noted that it’s still a good option.

If anyone asks me, though, I think I’ll keep scaling out and use the more sensible, patched version of MySQL.

Posted in MySQL

Flush bash_history after each command

If you, like me, often work in a lot of terminals on a lot of servers, or even a lot of terminals on the same one, you may recognise the frustration of a lost bash history.
I don’t always log out of my sessions gracefully, so every so often my ~/.bash_history isn’t written and all my flashy commands are lost (the history buffer is only committed when you log out, so not everything you see in `history` has actually been written to disk). I quite often find myself rewriting the same one-liners or long option lists just because I closed my konsole or SecureCRT window without first logging out of all the sessions properly.

So I put some effort into finding a solution to this, and whilst reading through the bash manpage, I saw PROMPT_COMMAND. *pling*
export PROMPT_COMMAND='history -a'

To quote the manpage: “If set, the value is executed as a command prior to issuing each primary prompt.”
So every time a command finishes, bash appends the new history entry to ~/.bash_history before displaying the primary prompt ($PS1) again.

So now that the line is in /etc/bashrc, I no longer find myself reinventing wheels or losing valuable seconds re-typing stuff just because I was lazy with my terminals.

This is one of those things that I should have done ages ago, but never took the time to.

Posted in Sundry sysadmin