Site Slow After Scaling Out? Yeah, Possibly!

Every now and then, we have customers who outgrow their single-server setup. The next natural step is, of course, splitting the web layer from the database layer. So they get another server and move the database onto it.

So far so good! A week or so later, we often get the call: “Our page load time is higher now than before the upgrade! We’ve got twice as much hardware, and it’s slower! You have broken it!” It’s easy to see where they’re coming from. It makes sense, right?

That is, until you factor in the newly introduced network topology! Today it’s not unusual (that’s not to say it’s acceptable or optimal) for your average WordPress/Drupal/Joomla/otherspawnofsatan site to run 40-50 queries per page load. Quite often even more!

Based on a tcpdump session of a reasonably average query (if there is such a thing), connecting to a server, authenticating, sending a query and receiving a 5-row result set of 1434 bytes produced 25 packets between my laptop and a remote DB server on the same wired, non-congested network. A typical latency for TCP/IP over Ethernet is ~0.2 ms for packets of this size. So, doing the maths: 25 packets × 0.2 ms × 50 queries = 250 ms of pure network latency per page load, just for your SQL queries. That is obviously a lot more than you see over a local UNIX socket.
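
If you want to see this for yourself, here’s a quick Python sketch that times a batch of trivial queries over TCP versus a local UNIX socket. The hostname, credentials and socket path are placeholders, not from any real setup:

```python
# A minimal sketch comparing per-query round-trip time over TCP
# versus a local UNIX socket. Host, credentials and socket path
# are placeholders -- adjust for your own environment.
import time
import MySQLdb

def time_queries(conn, runs=50):
    """Run a trivial query `runs` times and return total wall time in seconds."""
    cur = conn.cursor()
    start = time.time()
    for _ in range(runs):
        cur.execute("SELECT 1")
        cur.fetchall()
    return time.time() - start

# Remote DB server, reached over TCP (placeholder host/credentials).
tcp_conn = MySQLdb.connect(host="db.example.com", user="app",
                           passwd="secret", db="myapp")

# Same kind of server, but local via the UNIX socket.
local_conn = MySQLdb.connect(unix_socket="/var/run/mysqld/mysqld.sock",
                             user="app", passwd="secret", db="myapp")

print("50 queries over TCP:         %.1f ms" % (time_queries(tcp_conn) * 1000))
print("50 queries over UNIX socket: %.1f ms" % (time_queries(local_conn) * 1000))
```

The difference you see is almost entirely the network round trips, not the queries themselves.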

This is inevitable; laws of physics. There is nothing you, your sysadmin and/or your hosting company can do about it. There may, however, be something your developer can do about the number of queries! You also shouldn’t confuse response times with capacity: your response times may be slower, but you can (hopefully) serve a lot more users with this setup!
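
As an illustration of what a developer can do, here’s a sketch of the classic fix: collapsing an N+1 query loop into a single round trip. The table and column names are made up for the example:

```python
# A hypothetical example of reducing the query count: fetch a batch
# of rows in one round trip instead of one query per row.
# Table/column names and connection details are illustrative only.
import MySQLdb

conn = MySQLdb.connect(host="db.example.com", user="app",
                       passwd="secret", db="myapp")
cur = conn.cursor()

post_ids = [1, 2, 3, 4, 5]

# Naive pattern: one query (and one network round trip) per post.
for pid in post_ids:
    cur.execute("SELECT title FROM posts WHERE id = %s", (pid,))
    cur.fetchone()

# Batched pattern: the same data in a single round trip.
placeholders = ", ".join(["%s"] * len(post_ids))
cur.execute("SELECT id, title FROM posts WHERE id IN (%s)" % placeholders,
            tuple(post_ids))
rows = cur.fetchall()
```

Fifty round trips becoming one adds up fast when each one costs you ~5 ms on the wire.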

Sure, there are technologies out there with considerably less latency than Ethernet, but they come with quite the price tag, and more often than not there are quite a few avenues to go down before it makes sense to start looking at that kind of thing.

You could also look into running the full stack on both machines, using master/master replication for your DBs: load balance your front-ends, have them both read locally, but only write to one node at a time! That kind of DB setup is fairly easy to build using MMM for MySQL. But in my experience, this often ends up more costly and potentially introduces more complexity than it solves. I’m an avid advocate for keeping server roles separate as much as possible!
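
For the curious, here’s a minimal sketch of the replication-related my.cnf settings for such a two-master setup. This shows only the replication bits, not a full config, and the server IDs and offsets are illustrative:

```ini
# node1's my.cnf (node2 mirrors this with server-id = 2 and
# auto_increment_offset = 2). Replication-related settings only;
# values are illustrative.
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
log-slave-updates        = 1

# Stagger auto-increment values so the two masters never
# generate the same primary key.
auto_increment_increment = 2
auto_increment_offset    = 1
```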

Mar 29th, 2011