<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>All things Sysadmin &#187; MySQL</title>
	<atom:link href="http://northernmost.org/blog/category/all-things-mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://northernmost.org/blog</link>
	<description>Just another manic Monday</description>
	<lastBuildDate>Thu, 18 Aug 2011 01:56:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Transactions and code testing</title>
		<link>http://northernmost.org/blog/transactions-and-code-testing/</link>
		<comments>http://northernmost.org/blog/transactions-and-code-testing/#comments</comments>
		<pubDate>Thu, 18 Aug 2011 01:56:49 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=402</guid>
		<description><![CDATA[A little while ago I worked with a customer to migrate their DB from using MyISAM to InnoDB (something I definitely don&#8217;t mind doing!) I set up a smaller test instance with all tables using the InnoDB engine as part &#8230; <a href="http://northernmost.org/blog/transactions-and-code-testing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A little while ago I worked with a customer to migrate their DB from using MyISAM to InnoDB (something I definitely don&#8217;t mind doing!)<br />
I set up a smaller test instance with all tables using the InnoDB engine as part of the testing. I instructed them to thoroughly test their application against this test instance and let me know if they identified any issues. </p>
<p>They reported back that everything seemed fine, so we went ahead with the actual migration. Everything went according to plan and all seemed well.<br />
After a while they started seeing discrepancies in the stock portion of their application. The data didn&#8217;t match what they expected, and stock levels seemed surprisingly high. A program run from cron was responsible for periodically updating the stock count of products, so this was of course the first place I looked.<br />
I ran it manually and looked at its output; it was very verbose and reported that some 2000 products had been updated. But looking at the actual DB, that was far from the case.</p>
<p>As the test environment was still available, I ran it a few times against that and could see the Com_update and Com_insert status counters being incremented, so I knew the queries were reaching the server. But the data remained unchanged. At this point I had a gut feeling about what was happening, so to confirm it I enabled the general query log. It didn&#8217;t take me long to spot the problem. On the second line of the log, I saw this:</p>
<p><code>		   40 Query	set autocommit=0<br />
</code></p>
<p>The program responsible for updating the stock levels was a Python script using <a href="http://mysql-python.sourceforge.net/">MySQLdb</a>. I couldn&#8217;t see any trace of autocommit being set explicitly, so I assumed that it was off by default (which turned out to be <a href="http://www.python.org/dev/peps/pep-0249/">correct</a>). After adding conn.commit()* once the relevant queries had been sent to the server (in the DB-API, commit() is a method of the connection, not the cursor), stock levels were back to normal.<br />
Since the script could see the changes made within its own uncommitted transaction, values such as cursor.rowcount, which the testers had relied on, all looked correct.</p>
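<p>A minimal sketch of the effect, for illustration only: it uses the sqlite3 module from the Python standard library as a stand-in for MySQLdb, since both follow PEP 249 and sqlite3 implicitly opens a transaction before the UPDATE, much like MySQLdb with autocommit off. The table and column names are hypothetical. The writing connection sees its own uncommitted UPDATE and a correct rowcount, while a second connection sees the old value until commit() is called on the connection.</p>

```python
import os
import sqlite3
import tempfile


def stock_level(conn, product_id):
    """Read one product's stock level through the given connection."""
    cur = conn.cursor()
    cur.execute("SELECT qty FROM stock WHERE product_id = ?", (product_id,))
    rows = cur.fetchall()
    cur.close()  # finish the read so the writer can commit later
    return rows[0][0]


def demo():
    # File-backed database, so two independent connections share the same data.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)
    try:
        writer.execute("CREATE TABLE stock (product_id INTEGER PRIMARY KEY, qty INTEGER)")
        writer.execute("INSERT INTO stock VALUES (1, 10)")
        writer.commit()

        # sqlite3 implicitly opens a transaction before this UPDATE.
        cur = writer.cursor()
        cur.execute("UPDATE stock SET qty = qty - 3 WHERE product_id = 1")
        rowcount = cur.rowcount                      # 1: looks like it worked
        seen_by_writer = stock_level(writer, 1)      # 7: the writer sees its own change
        seen_by_other = stock_level(reader, 1)       # 10: nobody else does yet

        writer.commit()                              # commit() is a connection method
        seen_after_commit = stock_level(reader, 1)   # 7: now visible everywhere
        return rowcount, seen_by_writer, seen_by_other, seen_after_commit
    finally:
        writer.close()
        reader.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # (1, 7, 10, 7)
```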
<p>The lesson here: when testing your software from a database point of view, don&#8217;t blindly trust what your code tells you it has done; verify the data to make sure it actually happened!<br />
A lot can happen to data between your program and the platters: its transaction can deadlock and be rolled back, your program can be reading cached data, a write can get lost in a crashing message queue, and so on.</p>
<p><em>* As a rule of thumb, I&#8217;m against setting a blanket autocommit=1 in code; I&#8217;ve seen that come back to haunt developers in the past. I&#8217;m a strong advocate of explicit transaction handling.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/transactions-and-code-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Site slow after scaling out? Yeah, possibly!</title>
		<link>http://northernmost.org/blog/site-slow-after-scaling-out-yeah-possibly/</link>
		<comments>http://northernmost.org/blog/site-slow-after-scaling-out-yeah-possibly/#comments</comments>
		<pubDate>Tue, 29 Mar 2011 22:25:40 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Sundry sysadmin]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=333</guid>
		<description><![CDATA[So, doing the maths, you're seeing 25*0.2*50= 250ms in just network latency per page load for your SQL queries. This is obviously a lot more than you see over a local UNIX socket.  <a href="http://northernmost.org/blog/site-slow-after-scaling-out-yeah-possibly/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Every now and then, we have customers who outgrow their single server setup. The next natural step is of course splitting the web layer from the DB layer. So they get another server, and move the database to that.</p>
<p>So far so good! A week or so later, we often get the call <em>&#8220;Our page load time is higher now than before the upgrade! We&#8217;ve got twice as much hardware, and it&#8217;s slower! You have broken it!&#8221;</em><br />
It&#8217;s easy to see where they&#8217;re coming from. It makes sense, right? </p>
<p>That is, until you factor in the newly introduced network topology! Today it&#8217;s not unusual (that&#8217;s not to say it&#8217;s acceptable or optimal) for your average WordPress/Drupal/Joomla/otherspawnofsatan site to run 40-50 queries per page load, and quite often even more!</p>
<p>Based on a tcpdump session of a reasonably average query (if there is such a thing), connecting to a server, authenticating, sending a query and receiving a 5-row result set of 1434 bytes results in 25 packets being exchanged between my laptop and a remote DB server on the same wired, uncongested network. A normal, average latency for TCP/IP over Ethernet is ~0.2 ms for packets of the size we&#8217;re talking about here.<br />
So, doing the maths: 25 packets &#215; 0.2 ms &#215; 50 queries = 250 ms of pure network latency per page load for your SQL queries. This is obviously a lot more than you see over a local UNIX socket.</p>
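<p>The same back-of-the-envelope estimate as a small Python function, in case you want to plug in your own numbers. It simply multiplies packets per query, per-packet latency and queries per page load, exactly as above; the 25 packets and 0.2 ms figures are the ones measured here and will differ on other networks.</p>

```python
def network_latency_ms(packets_per_query, latency_per_packet_ms, queries_per_page):
    """Rough network latency added per page load, treating every packet as a serial hop."""
    return packets_per_query * latency_per_packet_ms * queries_per_page


# Figures from the tcpdump session above: 25 packets per query, ~0.2 ms per packet.
for queries in (40, 50):
    print(f"{queries} queries/page: ~{network_latency_ms(25, 0.2, queries):.0f} ms")
```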
<p>This is inevitable; it&#8217;s the laws of physics. There is nothing you, your sysadmin or your hosting company can do about it. There may, however, be something your developer can do about the number of queries!<br />
You also shouldn&#8217;t confuse response time with capacity. Your response times may be slower, but you can (hopefully) serve a lot more users with this setup!</p>
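<p>As an illustration of what the developer side can look like, here is a minimal Python sketch of the classic fix of fetching many rows in one query instead of one query per row. It uses an in-memory sqlite3 database and a hypothetical posts table purely so it runs anywhere; against a remote MySQL server, each counted query would cost at least one network round trip.</p>

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts statements sent, i.e. round trips to a remote server."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def titles_one_by_one(db, ids):
    # One query per row: 50 posts means 50 round trips.
    return [db.execute("SELECT title FROM posts WHERE id = ?", (i,)).fetchone()[0] for i in ids]


def titles_batched(db, ids):
    # A single IN (...) query fetches all rows in one round trip.
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(f"SELECT id, title FROM posts WHERE id IN ({placeholders})", tuple(ids)).fetchall()
    by_id = dict(rows)
    return [by_id[i] for i in ids]  # keep the caller's order


def make_db(n_posts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", [(i, f"post {i}") for i in range(1, n_posts + 1)])
    return conn


if __name__ == "__main__":
    ids = list(range(1, 51))
    slow, fast = CountingDB(make_db(50)), CountingDB(make_db(50))
    assert titles_one_by_one(slow, ids) == titles_batched(fast, ids)
    print(slow.queries, fast.queries)  # 50 1
```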
<p>Sure, there are <a href="http://www.dolphinics.com/">technologies</a> out there with considerably lower latency than Ethernet, but they come with quite a price tag, and more often than not there are quite a few other avenues to explore before it makes sense to start looking at that kind of thing.</p>
<p>You could also look at running the full stack on both machines, using master/master replication for your DBs and load balancing your front-ends, with both reading locally but only writing to one node at a time! That kind of DB setup is fairly easy to put in place using <a href="http://mysql-mmm.org/">mmm</a> for MySQL. But in my experience, this often ends up costing more and introducing more complexity than it solves.<br />
I&#8217;m an avid advocate for keeping server roles separate as much as possible! </p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/site-slow-after-scaling-out-yeah-possibly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A look at MySQL 5.5 semi synchronous replication</title>
		<link>http://northernmost.org/blog/a-look-at-mysql-5-5-semi-synchronous-replication/</link>
		<comments>http://northernmost.org/blog/a-look-at-mysql-5-5-semi-synchronous-replication/#comments</comments>
		<pubDate>Sat, 09 Oct 2010 20:19:30 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=288</guid>
		<description><![CDATA[This mode of replication is called semisynchronous due to the fact that it only guarantees that at least one of the slaves has written the transaction to disk in its relay log, not actually committed it to its data files. It guarantees that the data exists by some means somewhere, but not that it's retrievable through a MySQL client.  <a href="http://northernmost.org/blog/a-look-at-mysql-5-5-semi-synchronous-replication/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Now that MySQL 5.5 is in RC, I decided to have a look at the semi synchronous replication. It&#8217;s easy to get going, and from my very initial tests appear to be working a treat. </p>
<p>This mode of replication is called <strong>semi</strong> synchronous due to the fact that it only guarantees that at least one of the slaves have written the transaction to disk in its relay log, not actually committed it to its data files. It guarantees that the data exists by some means somewhere, but <strong>not</strong> that it&#8217;s retrievable through a MySQL client. </p>
<p>Semi-sync is available as a plugin, and if you compile from source, you&#8217;ll need to build with <code>--with-plugins=semisync&#8230;</code><br />
So far, the semisync plugin can only be built as a dynamic module, so you&#8217;ll need to install it once your instance is up and running, just as you would any other plugin:<br />
<code>install plugin rpl_semi_sync_master soname 'semisync_master.so';<br />
install plugin rpl_semi_sync_slave soname 'semisync_slave.so';</code></p>
<p>If you get error 1126 with a message saying &#8220;Can&#8217;t open shared library&#8230;&#8221;, you most likely need to set the <em>plugin_dir</em> variable in my.cnf and restart MySQL.<br />
If you&#8217;re using a master/slave pair, you obviously won&#8217;t need to load both modules as above: load the slave plugin on your slave and the master plugin on your master. Once you&#8217;ve done this, you&#8217;ll have entries for these modules in the mysql.plugin table.<br />
Once you have confirmed that they are there, you can safely add the pertinent variables to your my.cnf. The values I used (in addition to the normal replication settings) for my master/master sandboxes were:<br />
<code>plugin_dir=/opt/mysql-5.5.6-rc/lib/mysql/plugin/<br />
rpl_semi_sync_master_enabled=1<br />
rpl_semi_sync_master_timeout=10000<br />
rpl_semi_sync_slave_enabled=1<br />
rpl_semi_sync_master_trace_level=64<br />
rpl_semi_sync_slave_trace_level=64<br />
rpl_semi_sync_master_wait_no_slave=1</code></p>
<p>Note that you probably won&#8217;t want to use these values for _trace_level in production due to the verbosity in the log! I just enabled these while testing.<br />
Also note that the timeout is in milliseconds.<br />
You can also set these on the fly with SET GLOBAL (thanks, Oracle!). Just make sure replication on the slave is stopped before doing this, as semi-sync needs to be enabled during the slave&#8217;s handshake with the master for it to kick in.</p>
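<p>For reference, a rough Python sketch of doing this at runtime through any DB-API cursor (such as one from MySQLdb). The statements follow the MySQL 5.5 manual: on the slave, the I/O thread is stopped and restarted so that it reconnects and registers with the master as a semi-synchronous slave. The function names and the RecordingCursor test double are hypothetical helpers for illustration, and the 10000 ms default timeout simply mirrors the config above.</p>

```python
def enable_semisync_master(cursor, timeout_ms=10000):
    """Turn on semi-sync on a master that already has rpl_semi_sync_master installed."""
    statements = [
        "SET GLOBAL rpl_semi_sync_master_enabled = 1",
        f"SET GLOBAL rpl_semi_sync_master_timeout = {int(timeout_ms)}",
    ]
    for sql in statements:
        cursor.execute(sql)
    return statements


def enable_semisync_slave(cursor):
    """Turn on semi-sync on a slave; the I/O thread must reconnect for it to take effect."""
    statements = [
        "STOP SLAVE IO_THREAD",
        "SET GLOBAL rpl_semi_sync_slave_enabled = 1",
        "START SLAVE IO_THREAD",  # the new handshake registers this slave as semi-sync
    ]
    for sql in statements:
        cursor.execute(sql)
    return statements


class RecordingCursor:
    """Test double that records SQL instead of talking to a server."""

    def __init__(self):
        self.sent = []

    def execute(self, sql):
        self.sent.append(sql)
```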
<p>The timeout is the amount of time the master will block and wait for a slave to acknowledge the write before giving up on semi-synchronous operation altogether and carrying on with normal, asynchronous replication.<br />
If you want to monitor this, you can use the status variable Rpl_semi_sync_master_status, which is set to OFF when this happens.<br />
If you need to avoid this condition altogether, you would have to set a large enough value for the timeout and a low enough monitoring threshold, as there doesn&#8217;t seem to be a way to force MySQL to wait forever for a slave to appear.</p>
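<p>A minimal monitoring sketch in Python, assuming a DB-API cursor connected to the master (for example from MySQLdb). SHOW GLOBAL STATUS returns (Variable_name, Value) rows; the function name, the StaticCursor test double and the idea of feeding the result into your alerting are illustrative only.</p>

```python
def semisync_active(cursor):
    """Return True while the master is still replicating semi-synchronously.

    Rpl_semi_sync_master_status flips to OFF once the master has timed out
    waiting for a slave acknowledgement and fallen back to async replication.
    """
    cursor.execute("SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status'")
    row = cursor.fetchone()
    if row is None:
        return False  # variable missing: the plugin is not loaded on this server
    value = row[1].decode() if isinstance(row[1], bytes) else str(row[1])
    return value.upper() == "ON"


class StaticCursor:
    """Test double returning a fixed row, standing in for a real MySQL cursor."""

    def __init__(self, row):
        self.row = row

    def execute(self, sql):
        self.last_sql = sql

    def fetchone(self):
        return self.row
```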
<p>If you&#8217;re running an automated failover setup, you&#8217;ll want to set the timeout higher than your heartbeat interval to ensure no committed data is lost. You might then also want to set the timeout considerably lower on the passive master initially, so that once it takes over it doesn&#8217;t end up waiting on the master we know is unhealthy and have just failed over from.</p>
<p>Before implementing this in production, I would strongly recommend running a few performance tests against your setup, as this will slow things down considerably for some workloads. Each transaction has to be written to the binlog, sent over the wire, written to the relay log and lastly flushed to disk before the commit (in autocommit mode, each DML statement) returns. You will almost certainly benefit from batching queries into larger transactions rather than using the default autocommit mode, as autocommit makes these steps happen for every single statement.<br />
<strong>Update:</strong> Even though the manual clearly states that the event has to be flushed to disk, this doesn&#8217;t actually appear to be the case (see comments). The above still stands, but the impact may not be as great as first thought.</p>
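<p>A minimal Python sketch of batching, for illustration: instead of committing every row (which is effectively what autocommit does), rows are written in chunks and committed once per chunk, so the binlog, network and relay log round trip happens once per batch rather than once per row. It uses an in-memory sqlite3 database with a hypothetical events table so it runs without a MySQL server; with MySQLdb the structure is the same, although MySQLdb uses %s placeholders instead of ?.</p>

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows, committing once per batch instead of once per row.

    Returns the number of commits issued; under semi-sync replication each
    commit waits for a slave acknowledgement, so fewer commits means fewer waits.
    """
    commits = 0
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        cur.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows[start:start + batch_size])
        conn.commit()
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, f"event {i}") for i in range(1000)]
    print(insert_in_batches(conn, rows, batch_size=100))  # 10 commits instead of 1000
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1000
```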
<p>When I find the time, I will run some benchmarks on this.  </p>
<p>Lastly, please note that this is written while MySQL 5.5 is still in release candidate stage, so while unlikely, things are subject to change. So please be mindful of this in future comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/a-look-at-mysql-5-5-semi-synchronous-replication/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t fix, work around &#8211; MySQL</title>
		<link>http://northernmost.org/blog/dont-fix-work-around-mysql/</link>
		<comments>http://northernmost.org/blog/dont-fix-work-around-mysql/#comments</comments>
		<pubDate>Sun, 26 Oct 2008 20:06:15 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[sun]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=167</guid>
		<description><![CDATA[Sun decided to not go down the route of reviewing and accepting the patches, but are now suggesting - are you sitting down? - running multiple instances on the same hardware.  <a href="http://northernmost.org/blog/dont-fix-work-around-mysql/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I attended the MySQL EMEA conference last Thursday, where I enjoyed a talk from <a href="http://izoratti.blogspot.com">Ivan Zoratti</a> titled &#8220;Scaling Up, Scaling Out, Virtualization &#8211; What should you do with MySQL?&#8221;</p>
<p>They have changed their minds quite a bit. According to them, virtualisation in production is no longer a solid no-no (something a lot of people would argue with). <a href="http://www.sun.com/bigadmin/content/zones/">Solaris containers</a>, anyone?</p>
<p>As most of us know by now, MySQL struggles to utilise multiple cores efficiently. This has been the case for quite some time now, and people like <a href="http://code.google.com/p/google-mysql-tools/wiki/SmpPerformance">Google</a> and <a href="http://www.mysqlperformanceblog.com/2008/09/05/new-patches-new-builds/#comment-362746">Percona</a> have grown tired of waiting for MySQL to fix it.</p>
<p>Sun has decided not to go down the route of reviewing and accepting the patches, and is instead suggesting (are you sitting down?) running multiple instances on the same hardware.<br />
I&#8217;m not against this from a technical point of view, as it does currently improve performance for some workloads on multi-core, multi-disk systems (with an unpatched version), but the fact that they have gone as far as openly and officially suggesting workarounds to their own problem, rather than fixing its source, is disturbing.</p>
<p>Granted, I suppose it makes sense to suggest larger boxes if you&#8217;ve been bought by a big-iron manufacturer. Also, I should be fair and note that Ivan at least didn&#8217;t say scaling out was a negative thing and that it&#8217;s still a good option.</p>
<p>If anyone asks me, though, I think I&#8217;ll keep scaling out and use <a href="http://ourdelta.org">the more sensible version of MySQL</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/dont-fix-work-around-mysql/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Resyncing slaves with slaves</title>
		<link>http://northernmost.org/blog/resyncing-slaves-with-slaves/</link>
		<comments>http://northernmost.org/blog/resyncing-slaves-with-slaves/#comments</comments>
		<pubDate>Sat, 05 Jul 2008 02:44:17 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[resync slave]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=31</guid>
		<description><![CDATA[When dealing with replicated setups with two or more slaves sharing a master, it appears as if a lot of people overlook the obvious. You don't need to take your master down to resync a slave. I was hoping I wouldn't need to post about this, but I see people taking down their masters when they have perfectly healthy slaves way too often to let it slip. 
 <a href="http://northernmost.org/blog/resyncing-slaves-with-slaves/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When dealing with replicated setups with two or more slaves sharing a master, it appears as if a <strong>lot </strong>of people overlook the obvious. You don&#8217;t need to take your master down to resync a slave. I was hoping I wouldn&#8217;t need to post about this, but I see people taking down their masters when they have perfectly healthy slaves way too often to let it slip. </p>
<p>You&#8217;ve got everything you need on the other slave(s). Provided that it&#8217;s in good health, a slave has all the data as well as the master&#8217;s binlog file name and position. Stop replication on the healthy slave with STOP SLAVE so that its data and coordinates stay put while you copy, then run SHOW SLAVE STATUS\G on it and take note of Relay_Master_Log_File and Exec_Master_Log_Pos. These correspond to what you&#8217;d get from SHOW MASTER STATUS\G on the master, minus the replication lag, which is irrelevant in this case. Then sync the data from the healthy slave to the broken one and use the above values in the CHANGE MASTER TO statement (obviously setting MASTER_HOST to the real master, not the other slave). Once the copy is done, START SLAVE on the healthy slave lets it catch up again.</p>
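<p>A small Python sketch of that last step, assuming you have fetched SHOW SLAVE STATUS from the healthy slave as a dict keyed by column name (for example with the DictCursor class in MySQLdb.cursors). The healthy slave&#8217;s own Master_Host already points at the real master, which avoids the mistake mentioned above. The host name, log file and position in the example are made up, and replication credentials are deliberately left out.</p>

```python
def change_master_sql(slave_status):
    """Build CHANGE MASTER TO for a resynced slave from a healthy sibling's SHOW SLAVE STATUS.

    Relay_Master_Log_File / Exec_Master_Log_Pos are the master coordinates up to which
    the healthy slave has executed events, i.e. where the copied data leaves off.
    """
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{slave_status['Master_Host']}', "
        f"MASTER_PORT={int(slave_status['Master_Port'])}, "
        f"MASTER_LOG_FILE='{slave_status['Relay_Master_Log_File']}', "
        f"MASTER_LOG_POS={int(slave_status['Exec_Master_Log_Pos'])}"
    )


# Hypothetical values, as they might appear in SHOW SLAVE STATUS\G on the healthy slave.
status = {
    "Master_Host": "db-master.example.com",
    "Master_Port": 3306,
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": 98123,
}
print(change_master_sql(status))
```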
<p>Happy higher availability!</p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/resyncing-slaves-with-slaves/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MySQL load balancing with mylbhelper</title>
		<link>http://northernmost.org/blog/mysql-load-balancing-with-mylbhelper/</link>
		<comments>http://northernmost.org/blog/mysql-load-balancing-with-mylbhelper/#comments</comments>
		<pubDate>Sun, 29 Jun 2008 02:59:10 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Sundry sysadmin]]></category>
		<category><![CDATA[hardware load balancing]]></category>
		<category><![CDATA[mylbhelper]]></category>
		<category><![CDATA[mysql load balancing]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=30</guid>
		<description><![CDATA[A customer of ours was in this particular situation. They had a very decent hardware load balancer for their webservers with capacity to spare. So they ended up load balancing the mysql instances through the same device and using a piece of software I've written called <a href="http://mylbhelper.northernmost.org">mylbhelper</a>. 
 <a href="http://northernmost.org/blog/mysql-load-balancing-with-mylbhelper/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When your app outgrows its single DB, the next logical step is attaching a slave to your current DB server to spread the read load. To do this efficiently, you will need a load balancer. If your app and its growth are fairly typical, you will already have at least two front-end servers at this point, and chances are that these are load balanced as well. So when it&#8217;s time to load balance your DB servers, you&#8217;ll already have the means to do this at hand.</p>
<p>In the highly likely, yet unfortunate, event that you have a load balancer without MySQL capabilities, you can always set up a generic TCP cluster (all traffic on port 3306 goes to this set of servers). The downside of this is that there are no L7 checks, which are something you <strong>really</strong> want when load balancing your backend. The best you can do is L4 (is there anything accepting connections on this port?). Needless to say, there are a lot of problems that can impact your application without making MySQL stop listening on its port: table corruption, accidentally dropped tables, permission issues, max_connections being reached, privilege problems, etc.</p>
<p>A customer of ours was in this particular situation. They had a very decent hardware load balancer for their webservers with capacity to spare. So they ended up load balancing the mysql instances through the same device and using a piece of software I&#8217;ve written called <a href="http://mylbhelper.northernmost.org">mylbhelper</a>. </p>
<p>I won&#8217;t go into exactly what mylbhelper does, as the project page explains that well enough (I hope). But in a nutshell, it runs as a daemon and periodically executes a custom query on the local DB server. If the query fails in any way, shape or form, it executes a custom script. The script that comes with mylbhelper blocks L4 access (i.e. firewalls port 3306) so that the load balancer stops sending traffic to the server. Of course, you can write your own scripts. Once mylbhelper has successfully executed the predefined query again (twice, to avoid flapping), another script runs; the shipped one simply removes the firewall rule that was put in place.</p>
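<p>The up/down logic described above can be sketched in a few lines of Python. This is not mylbhelper&#8217;s code (that is written in C); it is an illustration of the same idea, where the hypothetical on_down and on_up callbacks stand in for the block and unblock scripts, and requiring two consecutive passes is the anti-flapping measure.</p>

```python
class HealthGate:
    """Hysteresis for an L7 health check: go down on any failure, come back after N passes."""

    def __init__(self, on_down, on_up, successes_needed=2):
        self.on_down = on_down          # e.g. a script that firewalls port 3306
        self.on_up = on_up              # e.g. a script that removes that rule again
        self.successes_needed = successes_needed
        self.healthy = True
        self.streak = 0

    def record(self, check_passed):
        """Feed in the result of one run of the health-check query."""
        if not check_passed:
            self.streak = 0
            if self.healthy:
                self.healthy = False
                self.on_down()
        elif not self.healthy:
            self.streak += 1
            if self.streak >= self.successes_needed:
                self.healthy = True
                self.streak = 0
                self.on_up()


if __name__ == "__main__":
    events = []
    gate = HealthGate(lambda: events.append("down"), lambda: events.append("up"))
    for result in [True, False, True, False, True, True, True]:
        gate.record(result)
    print(events)  # ['down', 'up']
```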
<p>Oh, and it&#8217;s written in C, so you&#8217;ll need libmysql to compile and run it. It runs on any POSIX-compliant system and is released under the BSD license.</p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/mysql-load-balancing-with-mylbhelper/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tool tip: mysqlsniffer</title>
		<link>http://northernmost.org/blog/tool-tip-mysqlsniff/</link>
		<comments>http://northernmost.org/blog/tool-tip-mysqlsniff/#comments</comments>
		<pubDate>Wed, 18 Jun 2008 23:59:52 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[mysqlsniff]]></category>
		<category><![CDATA[querysniffer]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=29</guid>
		<description><![CDATA[mysqlsniff is a tool that I find very useful and utilise a lot, but it doesn't seem to be as widely known as it deserves to be. It's pretty much general_log light without a restart! <a href="http://northernmost.org/blog/tool-tip-mysqlsniff/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://iank.org/querysniffer/">mysqlsniff</a> is a tool that I find very useful and use a lot, but it doesn&#8217;t seem to be as widely known as it deserves to be.<br />
I often see people ask &#8220;how can I see what queries are being run on my server?&#8221;, to which the answer is almost always &#8220;enable general logging or run show processlist&#8221;. That&#8217;s all well and good in some cases, but enabling the general log requires restarting the server (unless you&#8217;re on 5.1), and show processlist only gives you a point-in-time snapshot. Both can help, but neither is ideal in every situation! Sure, show processlist is good for spotting that long-running query, so it&#8217;s obviously not altogether useless, but as a tool for getting an overview of query frequency etc. it&#8217;s rather limited.</p>
<p>With querysniffer you get a real-time overview of all the queries being run. It&#8217;s a simple Perl script and easy enough to get going with. On Red Hat/CentOS, you&#8217;d go about it like this:<br />
<code><br />
wget http://iank.org/querysniffer/mysqlsniff-0.10.pl.txt -O mysqlsniff.pl<br />
yum install libpcap-devel<br />
cpan -i Net::PcapUtils<br />
cpan -i NetPacket::Ethernet<br />
perl mysqlsniff.pl eth0</code></p>
<p>You should now see every query sent to the server over eth0 in your terminal.</p>
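<p>For the curious, the core of what a tool like this does is decoding the MySQL client/server protocol: every packet starts with a 3-byte little-endian payload length and a 1-byte sequence number, and a query arrives as a COM_QUERY command (byte 0x03) followed by the SQL text. Below is a minimal Python sketch of that decoding step only, not mysqlsniff itself; it assumes you already have the TCP payload of client-to-server traffic on port 3306, and it does not reassemble packets split across segments. The make_packet helper is a hypothetical test aid.</p>

```python
COM_QUERY = 0x03  # command byte for a plain text query


def extract_queries(payload):
    """Return the SQL text of every complete COM_QUERY packet in a TCP payload."""
    queries = []
    offset = 0
    while len(payload) - offset >= 4:
        length = int.from_bytes(payload[offset:offset + 3], "little")  # 3-byte payload length
        body = payload[offset + 4:offset + 4 + length]                  # skip the sequence id byte
        if length > len(body):
            break  # packet continues in a later segment; reassembly is out of scope here
        if body and body[0] == COM_QUERY:
            queries.append(body[1:].decode("utf-8", errors="replace"))
        offset += 4 + length
    return queries


def make_packet(seq, body):
    """Build a packet with the same framing, for testing."""
    return len(body).to_bytes(3, "little") + bytes([seq]) + body


if __name__ == "__main__":
    # The middle packet is a COM_PING (0x0e), which is ignored.
    data = make_packet(0, b"\x03SELECT 1") + make_packet(0, b"\x0e") + make_packet(0, b"\x03SHOW PROCESSLIST")
    print(extract_queries(data))  # ['SELECT 1', 'SHOW PROCESSLIST']
```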
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/tool-tip-mysqlsniff/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wet dream finally coming through?</title>
		<link>http://northernmost.org/blog/wet-dream-finally-coming-through/</link>
		<comments>http://northernmost.org/blog/wet-dream-finally-coming-through/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 22:17:34 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[maatkit]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[self healing replication]]></category>
		<category><![CDATA[summer of code]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=28</guid>
		<description><![CDATA[My major beef with it is that it's not self-healing. Sure, you can monitor, script and re-jig things to a certain extent. But this is why I was thrilled when I read the MySQL Forge suggestions for Google Summer of Code. One of the suggestions is to enable self-healing replication using components, or at least concepts, from maatkit and Google's MMRM.   <a href="http://northernmost.org/blog/wet-dream-finally-coming-through/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m one of the people who don&#8217;t believe replication is the worst thing since bad hair day was invented. Flawed? Absolutely! But used in the right situation and catered for in setup and application, it can take you quite far!</p>
<p>At the peak of the &#8220;scalability hype&#8221;, where everyone pretends to be a mini-LiveJournal or Flickr, the vast number of more modestly sized companies out there is often forgotten. These are the kind of companies that easily get away with a decent master and a slave or two hanging off it, and most likely will for a very long time to come.</p>
<p>So, why is it flawed? My major beef with it is that it&#8217;s not self-healing. Sure, you can monitor, script and re-jig things to a certain extent. This is why I was thrilled when I read the <a href="http://forge.mysql.com/wiki/SummerOfCode2008Ideas#MySQL_Server" target="_blank">MySQL Forge suggestions</a> for <a href="http://code.google.com/soc/2008/" target="_blank">Google Summer of Code</a>. One of the suggestions is to enable self-healing replication using components, or at least concepts, from <a href="http://www.maatkit.org/" target="_blank">maatkit</a> and <a href="http://code.google.com/p/mysql-master-master/" target="_blank">Google&#8217;s MMRM</a>.</p>
<p>Interesting! While I can&#8217;t see it becoming completely foolproof, I&#8217;m sure it would help in the majority of scenarios I&#8217;ve seen where replication has broken.</p>
<p>As the linked forum post says, it is a bit of a shame that the tools to make this possible have been conceived by people outside of MySQL, when this really should have been part of the server&#8217;s implementation a long time ago!</p>
<p>Let&#8217;s just hope someone talented with some spare time steps up to the challenge!</p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/wet-dream-finally-coming-through/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>paramy &#8211; import dumps in a flash</title>
		<link>http://northernmost.org/blog/paramy-import-dumps-in-a-flash/</link>
		<comments>http://northernmost.org/blog/paramy-import-dumps-in-a-flash/#comments</comments>
		<pubDate>Fri, 30 May 2008 21:35:35 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[data import]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[paramy]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=27</guid>
		<description><![CDATA[Basically it's a multithreaded client. Most servers these days have, or certainly should have, multiple disks and multiple CPU cores and reasonably fast storage. So using a single threaded client to insert those hundreds of thousands or millions of records doesn't make that much sense today. There's quite a lot of time to save. I've run some tests on MySQL 5.1.24 and compared the results with those from the stock mysql client.

 <a href="http://northernmost.org/blog/paramy-import-dumps-in-a-flash/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Following an interesting and lengthy discussion one evening on whether multi-threaded data importing should be part of the client or the server, Domas went ahead, scratched his itch and came up with <a title="this" href="http://dammit.lt/2008/05/26/insert-speed-paramy-auto-inc/" target="_blank">paramy</a>.</p>
<p>Basically it&#8217;s a multithreaded client. Most servers these days have (or certainly should have) multiple CPU cores and reasonably fast storage spread over multiple disks. So using a single-threaded client to insert hundreds of thousands or millions of records doesn&#8217;t make much sense today, and there&#8217;s quite a lot of time to save. I&#8217;ve run some tests on MySQL 5.1.24 and compared the results with those from the stock mysql client:</p>
<p>1 million rows, 15k RPM SAS drives in RAID 10, 4 cores @ 2.33 GHz, 6 GB of RAM, 4 threads for paramy. The table is (int, char(8), char(8)) on InnoDB plugin 1.0.1 (given enough time, I&#8217;ll compare these to MySQL 5.0 and 4.0 as well at a later stage):</p>
<p>Extended insert format:</p>
<table>
<tr><th>Client</th><th>Rate</th><th>Time</th></tr>
<tr><td>paramy</td><td>58786.74 inserts/s</td><td>0m4.977s</td></tr>
<tr><td>stock</td><td>25898.35 inserts/s</td><td>0m9.588s</td></tr>
</table>
<p>51% gain</p>
<p>Single row format:</p>
<table>
<tr><th>Client</th><th>Rate</th><th>Time</th></tr>
<tr><td>paramy</td><td>24329.54 inserts/s</td><td>0m25.221s</td></tr>
<tr><td>stock</td><td>16642.02 inserts/s</td><td>0m52.753s</td></tr>
</table>
<p>47% gain</p>
<p>So paramy effectively cuts the insert time in half. It can&#8217;t handle auto-incremented fields very well yet, but Domas&#8217; working code (not publicly available yet, as far as I know) handles these at 1.3-core speed.</p>
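<p>To make the percentages above easier to read, here is the arithmetic on the published wall-clock times in Python. The quoted figures appear to correspond to paramy&#8217;s time as a fraction of the stock client&#8217;s time (51.9% and 47.8%, truncated), which is where &#8220;cutting the insert time in half&#8221; comes from; expressed as a speed-up, that is roughly 1.9x and 2.1x. No new measurements are involved, only the times from the runs above.</p>

```python
def compare(paramy_s, stock_s):
    """Return (paramy time as % of stock time, speed-up factor), rounded."""
    return round(100 * paramy_s / stock_s, 1), round(stock_s / paramy_s, 2)


# Wall-clock times in seconds for 1 million rows, from the runs above.
runs = {
    "extended insert": (4.977, 9.588),
    "single row": (25.221, 52.753),
}

for fmt, (paramy_s, stock_s) in runs.items():
    pct, speedup = compare(paramy_s, stock_s)
    print(f"{fmt}: paramy took {pct}% of the stock client's time, {speedup}x faster")
```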
<p>Be careful not to over-do it though, as more threads doesn&#8217;t necessarily equal more throughput. Take CPU cores and disks into account when setting the amount. I ran paramy with a few different number of threads. As stated above I ran these test on a 4 core system:</p>
<p>n threads, single row format:<br />
2 threads: 0m39.700s<br />
4 threads: 0m25.221s<br />
6 threads: 0m31.889s<br />
8 threads: 0m32.761s<br />
 </p>
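<p>Choosing a thread count from measurements like these is simple arithmetic. The Python snippet below uses only the numbers above to find the fastest setting and the speedup over 2 threads. On this 4-core box the best result came from matching the thread count to the core count, though that won&#8217;t necessarily hold for other hardware or workloads.</p>

```python
# Single-row import times (seconds) on the 4-core box, keyed by thread count.
TIMES = {2: 39.700, 4: 25.221, 6: 31.889, 8: 32.761}

best = min(TIMES, key=TIMES.get)  # thread count with the lowest wall-clock time
baseline = TIMES[2]
for n, secs in sorted(TIMES.items()):
    print(f"{n} threads: {secs:.3f}s, {baseline / secs:.2f}x vs 2 threads")
print(f"fastest: {best} threads")
```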
<p>I tested paramy on an RHEL4 system and ran into an issue at first. After I reported it, the conclusion was that it was related to the LOCK TABLES statements in the dump. After a quick <code>grep -v "LOCK TABLES" paramy.sql &gt; paramy_nolock.sql</code>, it was good to go.</p>
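<p>The same filter can be sketched in Python, for example to fold into a larger preprocessing script. The helper names are hypothetical. Like the grep above, it drops every line containing the string LOCK TABLES, and that includes the matching UNLOCK TABLES lines. It assumes those statements sit on lines of their own, as they do in mysqldump output.</p>

```python
def strip_lock_tables(lines):
    """Yield dump lines that don't mention LOCK TABLES, like grep -v "LOCK TABLES".

    The substring match also drops UNLOCK TABLES lines, since
    "UNLOCK TABLES" contains "LOCK TABLES".
    """
    for line in lines:
        if "LOCK TABLES" not in line:
            yield line


def filter_dump(src_path, dst_path):
    """Copy src_path to dst_path without the LOCK TABLES / UNLOCK TABLES lines."""
    # newline="" keeps the dump's line endings untouched; surrogateescape
    # round-trips any bytes that aren't valid UTF-8.
    with open(src_path, encoding="utf-8", errors="surrogateescape", newline="") as fin:
        with open(dst_path, "w", encoding="utf-8", errors="surrogateescape", newline="") as fout:
            fout.writelines(strip_lock_tables(fin))
```

<p>Usage: <code>filter_dump("paramy.sql", "paramy_nolock.sql")</code>.</p>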
<p>All in all &#8211; this tool is bound to save me hours and hours of time in the future! Good work indeed! </p>
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/paramy-import-dumps-in-a-flash/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>InnoDB plugin compression with benchmarks</title>
		<link>http://northernmost.org/blog/innodb-plugin-compression-with-benchmarks/</link>
		<comments>http://northernmost.org/blog/innodb-plugin-compression-with-benchmarks/#comments</comments>
		<pubDate>Thu, 29 May 2008 19:57:51 +0000</pubDate>
		<dc:creator>Erik Ljungstrom</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://northernmost.org/blog/?p=26</guid>
		<description><![CDATA[InnoDB Plugin compression levels and their performance examined. <a href="http://northernmost.org/blog/innodb-plugin-compression-with-benchmarks/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Following the <a href="http://www.innodb.com/wp/2008/05/08/innodb-plugin-101-released/">announcement of InnoDB Plugin 1.0.1</a>, I was keen to find some time to give it a try (yes, I&#8217;ve been busy, and this post has been in draft for some time now). I first took the shortcut and downloaded the precompiled plugin, but got linker errors and messages about a mismatched API. It turns out the plugin is built against 5.1.23, and I was trying it on 5.1.22. I also had an issue with the RHEL4-specific RPMs and the glibc-specific plugin. Seeing where this was going, I figured it would be quicker to recompile the lot from source, which was a painless process.</p>
<p>As I am quite intrigued by the compression feature, I tried to create a compressed table shortly after the compilation:</p>
<p><code>mysql&gt; CREATE TABLE compressed_4 (id int(11) primary key, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;<br />
Query OK, 0 rows affected, 5 warnings (0.00 sec)</code></p>
<p><code>mysql&gt; show warnings;<br />
+---------+------+----------------------------------------------------------------+<br />
| Level   | Code | Message                                                        |<br />
+---------+------+----------------------------------------------------------------+<br />
| Warning | 1478 | InnoDB: KEY_BLOCK_SIZE requires innodb_file_per_table.         |<br />
| Warning | 1478 | InnoDB: KEY_BLOCK_SIZE requires innodb_file_format &gt; Antelope. |<br />
| Warning | 1478 | InnoDB: ignoring KEY_BLOCK_SIZE=4.                             |<br />
| Warning | 1478 | InnoDB: ROW_FORMAT=COMPRESSED requires innodb_file_per_table.  |<br />
| Warning | 1478 | InnoDB: assuming ROW_FORMAT=COMPACT.                           |<br />
+---------+------+----------------------------------------------------------------+<br />
5 rows in set (0.00 sec)</code><br />
Primary mistake: not reading the docs properly. After adding <code>innodb_file_per_table=1</code> and <code>innodb_file_format=Barracuda</code> to my.cnf and restarting MySQL, the creation worked fine. I created four tables, three with compression and one without:</p>
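<p>The two settings the warnings complain about can be checked before creating any tables. Below is a minimal Python sketch; the function name is hypothetical. It reads my.cnf-style text with the standard <code>configparser</code> module and reports which prerequisite is missing. It only looks at the [mysqld] section and doesn&#8217;t handle option-file directives such as !include.</p>

```python
import configparser


def compression_problems(cnf_text):
    """List the my.cnf settings that would stop ROW_FORMAT=COMPRESSED taking effect.

    Only checks the two options named in the warnings above; an empty list
    means both are set as required.
    """
    parser = configparser.ConfigParser(allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    opts = {}
    for key, value in parser.items("mysqld"):
        # MySQL treats - and _ in option names alike; a bare boolean option means ON.
        opts[key.replace("-", "_")] = "1" if value is None else value.strip().strip('"')
    problems = []
    if opts.get("innodb_file_per_table", "0").upper() not in ("1", "ON", "TRUE"):
        problems.append("innodb_file_per_table is not enabled")
    # Barracuda was the only file format newer than Antelope in this plugin.
    if opts.get("innodb_file_format", "Antelope").lower() != "barracuda":
        problems.append("innodb_file_format is not Barracuda")
    return problems
```

<p>A config file only shows what the server will use after a restart. On a running server, <code>SHOW VARIABLES LIKE 'innodb_file%';</code> is the authoritative check.</p>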
<p><code>CREATE TABLE compressed_4 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;<br />
CREATE TABLE compressed_8 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;<br />
CREATE TABLE compressed_1 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;<br />
CREATE TABLE noncompressed_4 (id int(11) PRIMARY KEY, txt char(8)) Engine=InnoDB;</code></p>
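<p>When benchmarking several page sizes, it can help to generate the DDL rather than type it out. This hypothetical Python helper reproduces the four statements above and rejects sizes InnoDB won&#8217;t accept. The valid KEY_BLOCK_SIZE values are 1, 2, 4, 8 and 16 (KB).</p>

```python
# Compressed page sizes (KB) that InnoDB accepts for KEY_BLOCK_SIZE.
VALID_KEY_BLOCK_SIZES = (1, 2, 4, 8, 16)
COLUMNS = "(id int(11) PRIMARY KEY, txt char(8))"


def create_table_sql(name, key_block_size=None):
    """Build the CREATE TABLE statement used in this test; None means uncompressed."""
    sql = f"CREATE TABLE {name} {COLUMNS} Engine=InnoDB"
    if key_block_size is not None:
        if key_block_size not in VALID_KEY_BLOCK_SIZES:
            raise ValueError(f"KEY_BLOCK_SIZE must be one of {VALID_KEY_BLOCK_SIZES}")
        sql += f" ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size}"
    return sql + ";"


for size in (4, 8, 1):
    print(create_table_sql(f"compressed_{size}", size))
print(create_table_sql("noncompressed_4"))
```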
<p>I then inserted 50,000 rows into each table and inspected the on-disk sizes (in KB, as reported by du):</p>
<p><code># du /var/lib/mysql_i/testing/*.ibd<br />
3076    /var/lib/mysql_i/testing/compressed_1.ibd<br />
4100    /var/lib/mysql_i/testing/compressed_4.ibd<br />
6156    /var/lib/mysql_i/testing/compressed_8.ibd<br />
10256   /var/lib/mysql_i/testing/noncompressed_4.ibd</code></p>
<p>So far so good! The table with a 1K page size takes up 30% of the space of the uncompressed one. The page size is set with KEY_BLOCK_SIZE at table creation, because MySQL doesn&#8217;t allow storage engines to add their own syntax. Another FIXME for our friends at MySQL? In my opinion, engines should at least get the option.</p>
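<p>Here is the same comparison for every table, as a short Python snippet over the du output above. The 4K pages come out at roughly 40% of the uncompressed file and the 8K pages at roughly 60%. The .ibd files grow in allocated chunks, so treat these as approximate ratios.</p>

```python
# du output from above: size in KB, then path.
DU_OUTPUT = """\
3076    /var/lib/mysql_i/testing/compressed_1.ibd
4100    /var/lib/mysql_i/testing/compressed_4.ibd
6156    /var/lib/mysql_i/testing/compressed_8.ibd
10256   /var/lib/mysql_i/testing/noncompressed_4.ibd
"""


def parse_du(text):
    """Map file name (without directory) to size in KB."""
    sizes = {}
    for line in text.splitlines():
        kb, path = line.split(None, 1)
        sizes[path.rsplit("/", 1)[-1]] = int(kb)
    return sizes


sizes = parse_du(DU_OUTPUT)
baseline = sizes["noncompressed_4.ibd"]
for name, kb in sorted(sizes.items()):
    print(f"{name}: {kb} KB, {kb / baseline:.0%} of uncompressed")
```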
<p>But what is the performance like? Both the insert and select benchmarks were run against a ramdisk. I want to know the impact on the CPUs and the difference between the compression levels, not the performance of the disks:</p>
<blockquote><p>1K insert time: 7.4322438240051 seconds.<br />
4K insert time: 5.1487679481506 seconds.<br />
8K insert time: 4.8629088401794 seconds.<br />
NC insert time: 3.5483801364899 seconds. (No Compression)</p></blockquote>
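<p>Relative to the uncompressed table, the insert cost grows as the page size shrinks. It is roughly 2.1x for 1K pages versus about 1.4x for 4K and 8K. Here is a quick Python calculation over the times above:</p>

```python
# Insert times (seconds) from the ramdisk runs above.
INSERT_TIMES = {
    "1K": 7.4322438240051,
    "4K": 5.1487679481506,
    "8K": 4.8629088401794,
    "NC": 3.5483801364899,  # no compression
}

baseline = INSERT_TIMES["NC"]
for label, secs in INSERT_TIMES.items():
    print(f"{label}: {secs:.2f}s, {secs / baseline:.2f}x the uncompressed insert time")
```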
<p>Pretty much what I expected for the inserts. </p>
<p>Selects are another story altogether, though. Here are three separate runs:</p>
<blockquote><p>1K select time: 1.8464620113373 seconds.<br />
4K select time: 1.7520101070404 seconds.<br />
8K select time: 1.6863808631897 seconds.<br />
NC select time: 1.631724023819 seconds. (No Compression)</p></blockquote>
<blockquote><p>1K select time: 1.5653350353241 seconds.<br />
4K select time: 1.8094820976257 seconds.<br />
8K select time: 1.7305459976196 seconds.<br />
NC select time: 1.8121519088745 seconds. (No Compression)</p></blockquote>
<blockquote><p>1K select time: 1.7385520935059 seconds.<br />
4K select time: 1.8407368659973 seconds.<br />
8K select time: 1.8273019790649 seconds.<br />
NC select time: 1.7326860427856 seconds. (No Compression)</p></blockquote>
<p>Rather random, from the looks of things. I&#8217;m not entirely sure whether that&#8217;s down to poor benchmarking tools or whether I&#8217;ve overlooked something about how select queries are performed. Yes, the query cache was disabled and the server was rebooted between the runs. I was so perplexed that I ran the tests on two other machines, both with and without a ramdisk for storage, but the variation is there regardless. Does anyone have some deeper insight into how the bits and bytes of this work?</p>
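<p>One way to put a number on &#8220;rather random&#8221; is to compare the run-to-run spread within each configuration with the differences between configurations. Below is a Python sketch over the three runs above. The 1K table alone varies by about 0.28s between runs. The averages of all four configurations lie within about 0.08s of each other. So three runs can&#8217;t tell the compression levels apart.</p>

```python
import statistics

# Select times (seconds) from the three runs above, per page size.
SELECT_RUNS = {
    "1K": [1.8464620113373, 1.5653350353241, 1.7385520935059],
    "4K": [1.7520101070404, 1.8094820976257, 1.8407368659973],
    "8K": [1.6863808631897, 1.7305459976196, 1.8273019790649],
    "NC": [1.631724023819, 1.8121519088745, 1.7326860427856],
}

means = {k: statistics.mean(v) for k, v in SELECT_RUNS.items()}
spreads = {k: max(v) - min(v) for k, v in SELECT_RUNS.items()}

for k in SELECT_RUNS:
    print(f"{k}: mean {means[k]:.3f}s, run-to-run spread {spreads[k]:.3f}s")

# If the noise within one configuration exceeds the gap between configurations,
# these three runs cannot rank the compression levels.
gap = max(means.values()) - min(means.values())
print(f"largest gap between means: {gap:.3f}s, largest spread: {max(spreads.values()):.3f}s")
```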
<p>My only half-feasible theory at this point is that reading the data takes 2-2.5x as long as decompressing it. Compare 1K and 4K pages: while MySQL reads the additional 3K of the 4K page, it&#8217;s already well on its way to decompressing the 1K page, since reading 1K is obviously quicker than reading 4K. At the end of the day, both add up to pretty much the same amount of time, just spent in different ways (CPU or storage).</p>
<p>So as an example &#8211; say a SELECT takes 1 second:<br />
For the 1K page, 0.3s would be spent reading it and 0.7s would be spent decompressing it.<br />
For the 4K page, 0.7s would be spent reading it and 0.3s would be spent decompressing it.</p>
<p>I suppose this could be tested with a really slow single-core CPU and fast storage, or vice versa.</p>
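<p>If the theory holds, the two page sizes should diverge once CPU and storage speeds are changed independently. Below is a toy Python model built on the shares from the 1-second example above. It assumes each part scales linearly with CPU or storage speed, which is a simplification that ignores the overlap between reading and decompressing. In the model, halving CPU speed makes the 1K table the slower one, while halving storage speed does the same to the 4K table.</p>

```python
# Hypothetical model of the theory above: a SELECT's time is the read share
# divided by storage speed plus the decompression share divided by CPU speed.
# The shares come from the 1-second example; linear scaling is an assumption.
SHARES = {
    "1K": {"read": 0.3, "decompress": 0.7},
    "4K": {"read": 0.7, "decompress": 0.3},
}


def select_time(page, storage_speed=1.0, cpu_speed=1.0):
    """Estimated seconds for one SELECT; speeds are relative to the test box."""
    s = SHARES[page]
    return s["read"] / storage_speed + s["decompress"] / cpu_speed


for label, storage, cpu in [("baseline", 1.0, 1.0),
                            ("half-speed CPU", 1.0, 0.5),
                            ("half-speed storage", 0.5, 1.0)]:
    t1k = select_time("1K", storage, cpu)
    t4k = select_time("4K", storage, cpu)
    print(f"{label}: 1K {t1k:.2f}s, 4K {t4k:.2f}s")
```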
]]></content:encoded>
			<wfw:commentRss>http://northernmost.org/blog/innodb-plugin-compression-with-benchmarks/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

