Btrfs: filesystem to end all filesystems

There is some good stuff on the horizon! It’s called btrfs (“butter-fs”). It was originally announced/“released” over a year ago by our friends at Oracle and has, in my opinion, not quite received the attention it deserves. I’m keeping a close eye on its very intensive development, as the feature list is very interesting from several aspects. It’s got some of the big names behind it and will undoubtedly be accepted into the vanilla kernel and widely deployed once stable.

btrfs, like ZFS, implements a copy-on-write (COW) model, so yes, it will be able to do snapshots! Writable ones at that. In fact, it can even take snapshots of snapshots! Quasi-MVC filesystem! COW unfortunately makes a filesystem more prone to fragmentation, but luckily btrfs comes with online defragmentation and filesystem checking. Read and write speed will obviously suffer while these are running, but there are always ways around that in most performance-sensitive setups! If not, there should be! Sadly, COW isn’t that good a choice for database workloads. But fret not, COW can be disabled with a mount option (-o nodatacow). This doesn’t mean you lose the snapshot ability: btrfs ignores the option for any data extent that is referenced by more than one snapshot. So, as far as I understand, COW kicks in from the moment you take a snapshot and stays that way until you’re done with it.
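To make the snapshot and nodatacow behaviour a bit more concrete, here’s a toy Python model of how I understand it. This is only a sketch under my own assumptions, not how btrfs is actually implemented: the names Extent and Volume, the per-extent reference counter and the return strings are all made up for illustration.

```python
class Extent:
    """A chunk of file data plus a count of how many volumes reference it."""

    def __init__(self, data):
        self.data = data
        self.refs = 1


class Volume:
    """Toy volume mapping file names to extents (hypothetical, not btrfs code)."""

    def __init__(self, nodatacow=False):
        self.nodatacow = nodatacow  # stands in for mounting with -o nodatacow
        self.files = {}

    def write(self, name, data):
        old = self.files.get(name)
        if old is None:
            self.files[name] = Extent(data)
            return "new"
        if self.nodatacow and old.refs == 1:
            # Extent is not shared with any snapshot: overwrite it in place.
            old.data = data
            return "in-place"
        # Copy-on-write: leave the old extent alone and point at a fresh one.
        old.refs -= 1
        self.files[name] = Extent(data)
        return "cow"

    def read(self, name):
        return self.files[name].data

    def snapshot(self):
        """Writable snapshot: share every extent, copy no data."""
        snap = Volume(nodatacow=self.nodatacow)
        for name, extent in self.files.items():
            extent.refs += 1
            snap.files[name] = extent
        return snap


vol = Volume(nodatacow=True)
vol.write("db", "v1")          # first write allocates an extent
print(vol.write("db", "v2"))   # not shared, nodatacow honoured
snap = vol.snapshot()
print(vol.write("db", "v3"))   # extent is shared with snap, so COW is forced
print(snap.read("db"))         # the snapshot still sees the old data
```

In this model the first write after a snapshot is always copy-on-write, because the extent is shared. Once the volume has its own private extent again, nodatacow goes back to overwriting in place. Taking a snapshot of snap works exactly the same way, which is all “snapshots of snapshots” amounts to here.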

Early benchmarks show that btrfs is extremely fast at writing and somewhat slower at reading. It will be interesting to see how these numbers change as development proceeds, and whether added features will have any negative impact on performance. As a side note, I was quite surprised to see the poor numbers for ext3 in these benchmarks!

So if you’re a DBA and your data fits in memory, this filesystem will be right up your alley. With a reasonable number of tables and some proper values for innodb_open_files and table_cache, I wouldn’t expect any remarkable difference in day-to-day database operation, since the real bottleneck usually is the hardware. This is generally speaking, of course. I’m sure there are workloads out there which will benefit a lot more than “the norm”. Likewise, people with awkward read-heavy setups with a lot of data in a lot of files will probably be better off not using btrfs. If you, like myself, often use blinks of an eye as a unit, you know what I’m talking about.

Yet another interesting built-in feature is the multiple device support. I will not call it a substitute for proper hardware-based RAID, but it could well be one for LVM (bearing the snapshots in mind as well)!

Another thing worth keeping an eye on is a related project, CRFS, which may turn out to be a worthy NFS replacement. While it’s planned to get failover capabilities, I would much rather have seen a client-agnostic MogileFS-style implementation.

Sadly, neither is production ready yet. Not by far. But it’s something to look forward to. I’ll give it a version or two before I put it under the microscope and chuck some real-world load onto it. Can’t wait!

Sep 4th, 2008