Improving GlusterFS performance

I’ve had a closer look at GlusterFS in the last few days, following the release of version 2.0.1. We often get customers approaching us with web apps that deal with user-generated content which needs to be uploaded. If you have two or more servers in a load-balanced environment, you usually have a few options: an NFS/CIFS share on one of them (a single point of failure; failover NFS is, well…), a SAN (expensive), MogileFS (good, but alas not application agnostic), periodically copying files between the nodes with rsync or tar | nc (messy, not application agnostic and slow), or storing files in a database (not ideal for a number of reasons). There are a few other approaches and combinations of the above, but none of them is perfect. GlusterFS solves this. It’s fast, instant and redundant!

I’ve got four machines set up, two acting as redundant servers. Since they’re effectively acting as a RAID 1, each write is done twice over the wire, but that’s kind of inevitable. They’re all connected in a private isolated gigabit network. When dealing with larger files (a la `cp yourfavouritedistro.iso /mnt/gluster`) the throughput is really good at around 20-25 MB/s leaving the client. CPU usage on the client doing the copy was in the realms of 20-25% on a dual core. Very good so far! 

Then I tried a workload with many frequent filesystem operations: untarring the 2.6.9 Linux kernel from and onto the mount. Not so brilliant! It took 23-24 minutes from start to finish. The 2.6.9 kernel contains 17477 files, and the average size is just a few kilobytes. This obviously means a lot of small bursts of network traffic!
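To put that figure in perspective, here is a rough back-of-the-envelope calculation of the average cost per file, using only the numbers above (17477 files, 23 to 24 minutes). The function name is made up for illustration, and the result lumps reads, writes and metadata operations together, so treat it as an order of magnitude and nothing more.

```python
def per_file_ms(files: int, minutes: float) -> float:
    """Average wall-clock time spent per file, in milliseconds."""
    return minutes * 60 * 1000 / files


FILES = 17477  # file count quoted above for the 2.6.9 kernel tree

if __name__ == "__main__":
    for minutes in (23, 24):
        ms = per_file_ms(FILES, minutes)
        print(f"{minutes} min: {ms:.1f} ms per file, {1000 / ms:.1f} files/s")
```

That works out to roughly 80 ms per file, or about 12 files per second, which suggests that per-operation latency, not bandwidth, is what hurts here.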

After seeing this, I dove into the source code to have a look. When I reached the socket code, I realised that performance for smaller files would probably improve a lot if Nagle’s algorithm was disabled on the socket. No sooner said than done: I added a few setsockopt() calls and went to test. The kernel tree now extracted in 1m 20s!
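For anyone who hasn't come across it, disabling Nagle’s algorithm boils down to setting the TCP_NODELAY option on a TCP socket. The snippet below is a minimal sketch of that idea in Python, not the actual patch (which is a change to the GlusterFS C source); the helper names are hypothetical and only there for illustration.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 tells the kernel to send small segments right away
    # instead of holding them back to coalesce with later writes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set on the given socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_low_latency_socket()
    print("Nagle disabled:", nagle_disabled(s))
    s.close()
```

The option is set per socket, so it has to be applied to each connection separately, and a freshly created TCP socket has Nagle’s algorithm enabled by default.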

Of course, there’s always a drawback. In this case it is that larger files take longer to transfer as the raw throughput decreases (a kernel buffer is a lot faster than a Cat 5 cable!). Copying a 620 MB ISO from local disk onto the mount takes 1.20 s with the vanilla version of GlusterFS, and 3m 34s with Nagle’s algorithm disabled.

I’m not seeing any performance hit on sustained transfer of larger files, but at the moment I’m guessing I’m hitting another bottleneck before that becomes a problem, as it “in theory” should have a slight negative impact in this case.

If you want to have a look at it, you can find the patch here. Just download it to the source directory, run `patch -p1 < glusterfs-2.0.1-patch-erik.diff` and then proceed to build as normal.

Until I’ve done some more testing on it and received some feedback, I won’t bother making it a tuneable in the vol-file just in case it’d be wasted effort!

About Erik Ljungstrom

I'm Erik Ljungstrom and I work in a datacenter as a technical team leader. In this blog I will mostly jot down noteworthy things I encounter in my work. For more information, please see http://northernmost.org

5 Responses to Improving GlusterFS performance

  1. Omar says:

    It would be nice if gluster would make this a tuneable option to better support a mix of large and small files.

    I’ll give your patch a try and see.

  2. Hi Omar,

    Looking at the recent git commits for glusterfs, it appears as if they’ve implemented this patch. Would’ve been nice if someone had told me, but hey. If you want this as a tuneable, use the git version.
    Judging from your comment though, you’re looking for something a bit more sophisticated than on/off, which isn’t really feasible.

  3. Pingback: GlusterFS tcp_nodelay patch update | All things Sysadmin

  4. Zeck says:

    Such a tunable option would be great for mailservers. The mailbox storage contains mainly small files, but many of them. In my experience, the stock GFS does not perform well enough when you try to open large mailboxes through a webmail (Squirrelmail) client, and when the incoming traffic is high the SMTP server starts to starve.

  5. Ben Golub says:

    Gluster is now on version 3.1.2

    Gluster recently published an extensive doc on performance, including results of tests for AWS vs. bare metal, NFS vs. Native FUSE, etc.
    Includes specific tips for optimizing performance under different workloads.

    http://www.gluster.com/products/performance-in-a-gluster-system-white-paper/
