InnoDB Plugin Compression With Benchmarks

Following the announcement of InnoDB Plugin 1.01 I was keen to get some time over to give it a try (yes, I’ve been busy and this post has been in edit for some time now). I first tried the shortcut and downloaded the precompiled plugin, but got linker errors and messages of the wrong API (turns out it’s built against 5.1.23 and I was trying on 5.1.22). I also had an issue with the RHEL4 specific RPMs and the glibc specific plugin. Seeing where this was going, I figured it was quicker to recompile the lot from source. This was a painless process.

As I am quite intrigued by the compression facility, I shortly after the compilation proceeded by trying to create a table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> CREATE TABLE compressed_4 (id int(11) primary key, txt char(8)) 
Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
Query OK, 0 rows affected, 5 warnings (0.00 sec)

mysql> show warnings;
+---------+------+----------------------------------------------------------------+
| Level   | Code | Message                                                        |
+---------+------+----------------------------------------------------------------+
| Warning | 1478 | InnoDB: KEY_BLOCK_SIZE requires innodb_file_per_table.         |
| Warning | 1478 | InnoDB: KEY_BLOCK_SIZE requires innodb_file_format > Antelope. |
| Warning | 1478 | InnoDB: ignoring KEY_BLOCK_SIZE=4.                             |
| Warning | 1478 | InnoDB: ROW_FORMAT=COMPRESSED requires innodb_file_per_table.  |
| Warning | 1478 | InnoDB: assuming ROW_FORMAT=COMPACT.                           |
+---------+------+----------------------------------------------------------------+
5 rows in set (0.00 sec)

Primary mistake: Not reading the docs properly. So after sticking innodb_file_per_table=1 and innodb_file_format=“Barracuda” into my.cnf and restart MySQL, the creation worked fine. I created three tables, 3 with compression and 1 without:

1
2
3
4
CREATE TABLE compressed_4 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
CREATE TABLE compressed_8 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
CREATE TABLE compressed_1 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
CREATE TABLE noncompressed_4 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB;

I then proceeded to insert 50000 rows and inspect the on-disk size:

1
2
3
4
5
# du /var/lib/mysql_i/testing/*.ibd
3076    /var/lib/mysql_i/testing/compressed_1.ibd
4100    /var/lib/mysql_i/testing/compressed_4.ibd
6156    /var/lib/mysql_i/testing/compressed_8.ibd
10256   /var/lib/mysql_i/testing/noncompressed_4.ibd

So far so good! 30% with 1K page size (referred to as KEY_BLOCK_SIZE at table creation because MySQL doesn’t allow storage engines to add their own syntax (another FIXME for our friends at MySQL? At least one should get the option in my opinion)).

But what is the performance like? Both the insert and select benchmarks are done with to a ramdisk. I want to know the impact on the CPUs and the difference between the different compression rates, not the performance of the disks:

1
2
3
4
1K insert time: 7.4322438240051 seconds.
4K insert time: 5.1487679481506 seconds.
8K insert time: 4.8629088401794 seconds.
NC insert time: 3.5483801364899 seconds. (No Compression)

Pretty much what I expected for the inserts.

Selects are another story altogether though:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
1K select time: 1.8464620113373 seconds.
4K select time: 1.7520101070404 seconds.
8K select time: 1.6863808631897 seconds.
NC select time: 1.631724023819 seconds. (No Compression)

1K select time: 1.5653350353241 seconds.
4K select time: 1.8094820976257 seconds.
8K select time: 1.7305459976196 seconds.
NC select time: 1.8121519088745 seconds. (No Compression)

1K select time: 1.7385520935059 seconds.
4K select time: 1.8407368659973 seconds.
8K select time: 1.8273019790649 seconds.
NC select time: 1.7326860427856 seconds. (No Compression)

Rather random from the looks of things. Not entirely sure if that’s down to poor benchmarking tools or if I’ve overseen something in regards to how select queries are performed. Yes, the query cache was disabled and the server was rebooted between the runs. I was so perplexed that I ran the tests on two other machines, and both with and without using a ramdisk for storage, but the variation is there regardless. Does anyone have some deeper insight into how the bits and bytes of this works?

My only half-feasible theory at this point is that the reading of the data takes 2-2.5x the time it takes to decompress it. So in a scenario with reading 1K and 4K pages; while MySQL reads the additional 3K, it’s already well on the way of decompressing the 1K since it’s obviously quicker to read 1K than 4, and in the end of the day, they sum up to pretty much the same amount of time - just in different ways (CPU or storage).

So as an example - say a SELECT takes 1 second: For the 1K page, 0.3s would be spent reading it and 0.7s would be spent decompressing it. For the 4K page, 0.7s would be spent reading it and 0.3s would be spent decompressing it.

I suppose this could be tested with a really slow single core CPU and fast storage, or vice versa.

May 29th, 2008