InnoDB Plugin Compression With Benchmarks
Following the announcement of InnoDB Plugin 1.01 I was keen to set aside some time to give it a try (yes, I've been busy and this post has been sitting in edit for a while). I first tried the shortcut and downloaded the precompiled plugin, but got linker errors and complaints about a wrong API version (it turns out it's built against 5.1.23 and I was trying it on 5.1.22). I also had an issue with the RHEL4-specific RPMs and the glibc-specific plugin. Seeing where this was going, I figured it was quicker to recompile the lot from source, which turned out to be a painless process.
As I was quite intrigued by the compression facility, shortly after the compilation I tried to create a compressed table.
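The statement itself hasn't survived in this copy, but it would have been a plain CREATE TABLE with the compression attributes tacked on; a minimal sketch, with made-up table and column names:

CREATE TABLE t_1k (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  data VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED   -- compressed row format, new in the plugin
  KEY_BLOCK_SIZE=1;       -- compressed page size in KB

With the default settings this doesn't take: the compressed format requires per-table tablespaces and the Barracuda file format, so the plugin either rejects the statement or, depending on innodb_strict_mode, creates the table uncompressed with a warning.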
Primary mistake: not reading the docs properly. After sticking innodb_file_per_table=1 and innodb_file_format=Barracuda into my.cnf and restarting MySQL, the creation worked fine. I created four tables, 3 with compression and 1 without:
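Presumably the same kind of statements as above, now accepted. The 1K and 4K page sizes come up further down; the 8K for the third compressed table is a guess on my part:

CREATE TABLE t_1k (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                   data VARCHAR(255) NOT NULL)
  ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;

CREATE TABLE t_4k (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                   data VARCHAR(255) NOT NULL)
  ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

CREATE TABLE t_8k (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                   data VARCHAR(255) NOT NULL)
  ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE t_plain (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                      data VARCHAR(255) NOT NULL)
  ENGINE=InnoDB;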
I then proceeded to insert 50000 rows and inspect the on-disk size:
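The size listing didn't survive either, but the population and the check can be sketched like this (the filler data and the paths are assumptions):

-- seed one row per table, then keep doubling until past 50000 rows
INSERT INTO t_1k (data) VALUES (REPEAT('benchmark filler ', 12));
INSERT INTO t_1k (data) SELECT data FROM t_1k;   -- run until big enough
-- ... same for t_4k, t_8k and t_plain ...

-- then compare the per-table tablespace files from a shell:
--   ls -l /var/lib/mysql/test/t_*.ibd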
So far so good! Roughly 30% of the uncompressed size with a 1K page size (referred to as KEY_BLOCK_SIZE at table creation, because MySQL doesn't allow storage engines to add their own syntax; another FIXME for our friends at MySQL? At least storage engines should get that option, in my opinion).
But what is the performance like? Both the insert and select benchmarks were run against a ramdisk. I wanted to know the impact on the CPUs and the difference between the compression rates, not the performance of the disks:
[insert benchmark timings for the four tables: the figures have not been preserved in this copy of the post]
Pretty much what I expected for the inserts.
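For anyone wanting to run a comparable insert test, per-statement timings can be taken straight from the mysql client with the session profiler; a sketch using the hypothetical tables from above (my own setup, not the original benchmark tooling):

-- start from empty compressed tables
TRUNCATE TABLE t_1k;
TRUNCATE TABLE t_4k;
TRUNCATE TABLE t_8k;

SET profiling = 1;   -- per-statement profiling, available since 5.0.37
INSERT INTO t_1k (data) SELECT data FROM t_plain;
INSERT INTO t_4k (data) SELECT data FROM t_plain;
INSERT INTO t_8k (data) SELECT data FROM t_plain;
SHOW PROFILES;       -- one row per statement with its duration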
Selects are another story altogether though:
[select benchmark timings: likewise not preserved]
Rather random from the looks of things. I'm not entirely sure whether that's down to poor benchmarking tools or whether I've overlooked something about how select queries are performed. Yes, the query cache was disabled and the server was rebooted between the runs. I was so perplexed that I ran the tests on two other machines, both with and without a ramdisk for storage, but the variation is there regardless. Does anyone have some deeper insight into how the bits and bytes of this work?
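For completeness, this is roughly how a cache-proof select timing could be set up (again with the made-up tables from earlier; the actual benchmark queries aren't shown here):

-- make sure the query cache can't interfere
SET GLOBAL query_cache_size = 0;

SET profiling = 1;
-- SQL_NO_CACHE is belt and braces; the InnoDB buffer pool still
-- caches uncompressed pages, hence the reboots between runs
SELECT SQL_NO_CACHE COUNT(*), MAX(data) FROM t_1k;
SELECT SQL_NO_CACHE COUNT(*), MAX(data) FROM t_4k;
SELECT SQL_NO_CACHE COUNT(*), MAX(data) FROM t_8k;
SELECT SQL_NO_CACHE COUNT(*), MAX(data) FROM t_plain;
SHOW PROFILES;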
My only half-feasible theory at this point is that reading the data takes 2-2.5x the time it takes to decompress it. So in a scenario with 1K and 4K pages: while MySQL reads the additional 3K of the larger page, it's already well on the way to decompressing the 1K page, since it's obviously quicker to read 1K than 4K. At the end of the day they sum up to pretty much the same amount of time, just spent in different places (CPU or storage).
So as an example, say a SELECT takes 1 second: for the 1K page, 0.3s would be spent reading it and 0.7s decompressing it; for the 4K page, 0.7s reading and 0.3s decompressing.
I suppose this could be tested with a really slow single-core CPU and fast storage, or vice versa.
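Another angle that wouldn't need exotic hardware: the plugin exposes cumulative compression statistics in INFORMATION_SCHEMA.INNODB_CMP, including the total time spent compressing and decompressing pages of each size. Comparing the UNCOMPRESS_TIME delta of a run against the total query time should show how much of it is CPU-side decompression rather than I/O (a suggestion on my part, not something from the tests above):

-- cumulative counters per compressed page size; times are in seconds
SELECT page_size,
       compress_ops,
       compress_time,     -- total time spent compressing pages
       uncompress_ops,
       uncompress_time    -- total time spent decompressing pages
FROM INFORMATION_SCHEMA.INNODB_CMP;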