Paramy - Import Dumps in a Flash

Follwing an interesting and lengthy discussion one evening on whether multi threaded data importing should be part of the client or the server, Domas went ahead and scratched his itch to write something and came up with paramy.

Basically it’s a multithreaded client. Most servers these days have, or certainly should have, multiple disks and multiple CPU cores and reasonably fast storage. So using a single threaded client to insert those hundreds of thousands or millions of records doesn’t make that much sense today. There’s quite a lot of time to save. I’ve ran some tests on MySQL 5.1.24 and compared the results with those from the stock mysql client:

1 million rows, 15k SAS drives in RAID 10, 4 cores @ 2.33GHz, 6G of ram, 4 threads for paramy. Table is int, char(8), char(8), InnoDB plugin 1.0.1 (given enough time, I’ll compare these to MySQL 5.0 and 4.0 as well at a later stage):

Extended insert format:

1
2
3
4
paramy: 58786.74 inserts/s - 0m4.977s
stock: 25898.35 inserts/s - 0m9.588s
-------------------------------------
51% gain

Single row format:

1
2
3
4
paramy: 24329.54 inserts/s - 0m25.221s
stock: 16642.02 inserts/s - 0m52.753s
-------------------------------------
47% gain

So, effectively cutting the insert time in half. It can’t handle auto incremented fields very well yet, but domas' working code (not publicly available yet as far as I know) handles these with 1.3 core speed.

Be careful not to over-do it though, as more threads doesn’t necessarily equal more throughput. Take CPU cores and disks into account when setting the amount. I ran paramy with a few different number of threads. As stated above I ran these test on a 4 core system:

1
2
3
4
5
n threads, single row format:
2 threads: 0m39.700s
4 threads: 0m25.221s
6 threads: 0m31.889s
8 threads: 0m32.761s

I tested paramy on an RHEL4 system, and experienced an issue to begin with. After reporting this, it was concluded that it was related to the LOCK TABLES, so after a swish grep -v "LOCK TABLES" paramy.sql > paramy_nolock.sql, it was good to go.

All in all - this tool is bound to save me hours and hours of time in the future! Good work indeed! 

May 30th, 2008