MySQL Query Profiling

When you begin designing, implementing and testing an application, you obviously have a very clear picture of which queries are executed where and roughly how often. Following every rule in the book, you design your schema and spec your servers accordingly. However, as the application grows, becomes more dynamic, gains more users and new functionality, gets tweaked and so on, your initial clear overview might be quite a bit off from the truth. It is therefore a good idea to run a query profiler on your DBs every now and then.

It is really quite simple. The first step is to enable the general log, which you do by putting this in my.cnf:

general_log=1
log_output=TABLE

You can choose to log to both a table and a file, or just a file for that matter. To log to both, set log_output to TABLE,FILE (there's a my.cnf sketch after the table definition below). Statements will then be logged to the mysql.general_log table as well as whatever file you specify with the log= statement. If you omit the log_output variable in MySQL 5.0, the default is to log to a file; this has changed in versions > 5.1.6, where a table is the default. There is obviously a slight overhead in this, as each statement along with some other information is written out on every query. The structure of the general_log table looks like this:

CREATE TABLE `general_log` (
`event_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`user_host` mediumtext,
`thread_id` int(11) DEFAULT NULL,
`server_id` int(11) DEFAULT NULL,
`command_type` varchar(64) DEFAULT NULL,
`argument` mediumtext
) ENGINE=CSV DEFAULT CHARSET=utf8 COMMENT='General log'
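
If you do want a copy on disk as well as in the table, a my.cnf along these lines should do it (the file path here is just an example):

general_log=1
log_output=TABLE,FILE
log=/var/log/mysql/general.log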

To get MySQL logging without having to restart the server, you simply issue these commands:

SET GLOBAL log_output='TABLE';
SET GLOBAL general_log='ON';

which has the same effect (note that this is only possible in 5.1). Then let it run for an adequate amount of time; an hour or two during or around peak hours is usually sufficient to get a statistically significant data set. When you feel that you've got enough queries and that they portray normal usage, you need to dump the general_log table, like this:

mkdir /dump
chown mysql.mysql /dump
mysqldump --skip-lock-tables -K --tab=/dump/ --fields-terminated-by=',' --fields-enclosed-by='"' --database mysql --table general_log > general_log.sql

This will give you two files in the /dump directory: general_log.sql and general_log.txt. The latter is a CSV-formatted file, which we will feed to a program called MyProfi, which is just a zipped PHP script. You run it on general_log.txt like this, for instance:

/usr/bin/php parser.php -csv -top 20 -type "select, update, delete, insert" /dump/general_log.txt

This will give you the top 20 most common queries executed on your DBs. You can of course change the argument to -type to just say "select" to get the 20 most common select queries, and so on. Very useful!
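
Since the statements also sit in the mysql.general_log table, you can get a rough equivalent straight from the client. A crude sketch (it groups on the literal statement text, so unlike MyProfi it won't collapse queries that differ only in their literals):

mysql -u root -p -e "
    SELECT argument AS query, COUNT(*) AS hits
    FROM mysql.general_log
    WHERE command_type = 'Query'
    GROUP BY argument
    ORDER BY hits DESC
    LIMIT 20;"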

Mar 5th, 2008

Mod_log_forensic Howto

mod_log_forensic is an often forgotten yet very handy tool for debugging web servers. It gives each request a unique ID which you can then track through your log: it first writes the request prefixed with the unique ID, then it writes the same ID again once the request is completed. Very useful for spotting scripts which never finish, be it due to client or server issues. You see the entire request including browser information, cookies etc. Like this:

+sS6NLH8AAAEAAHoSUdUAAAAB|GET /favicon.ico HTTP/1.1|User-Agent:Opera/9.26 (Windows NT 5.1; U; en)|Host:northernmost.org|Accept:text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1|Accept-Language:en-GB,en;q=0.9|Accept-Charset:iso-8859-1, utf-8, utf-16, *;q=0.1|Accept-Encoding:deflate, gzip, x-gzip, identity, *;q=0|Referer:http%3a//northernmost.org/test.html|Connection:Keep-Alive, TE|TE:deflate, gzip, chunked, identity, trailers
-sS6NLH8AAAEAAHoSUdUAAAAB

When investigating the logs, you usually want to look for any + without a corresponding -, which means that the request was never finalised. You can then look through the script/file in question and see what may have gone wrong.
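
check_forensic (covered further down) does this properly for you, but for a quick look, a rough awk sketch along these lines prints the requests that were opened but never closed (adjust the log path to wherever your ForensicLog points):

# keep every opened (+) record, drop it again when its closing (-) record appears,
# then print whatever is still open at the end of the file
awk -F'|' '/^\+/ { open[substr($1, 2)] = $0 }
           /^-/  { delete open[substr($1, 2)] }
           END   { for (id in open) print open[id] }' /var/log/httpd/forensic_log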

mod_log_forensic is included in most distribution packages of Apache and ships with the source tarball, but if you compile Apache 2.x.x from source, you need to add --enable-log-forensic and --enable-unique-id to the configure line.

In httpd.conf, you will need the following lines (preferably in their respective contexts):

LoadModule log_forensic_module modules/mod_log_forensic.so
LoadModule unique_id_module modules/mod_unique_id.so
ForensicLog logs/forensic_log

If you want, you can have the request ID added to your normal access_log as well by simply editing the LogFormat to look something along the lines of:

LogFormat "%{forensic-id}n %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Note that this will most likely break any log-based webstats package you might be using, such as awstats, webalizer etc.

There is a tool supplied with the source distribution of Apache called check_forensic, which takes the forensic log as input and outputs any incomplete requests. If I, for instance, were to remove the -sS6NLH8AAAEAAHoSUdUAAAAB line from /var/log/httpd/forensic_log, I'd get this from a run:

# sh check_forensic /var/log/httpd/forensic_log
+sS6NLH8AAAEAAHoSUdUAAAAB|GET /favicon.ico HTTP/1.1|User-Agent:Opera/9.26 (Windows NT 5.1; U; en)|Host:northernmost.org|Accept:text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1|Accept-Language:en-GB,en;q=0.9|Accept-Charset:iso-8859-1, utf-8, utf-16, *;q=0.1|Accept-Encoding:deflate, gzip, x-gzip, identity, *;q=0|Referer:http%3a//northernmost.org/test.html|Connection:Keep-Alive, TE|TE:deflate, gzip, chunked, identity, trailers

You would, however, have rather serious issues if you had failed requests for favicon.ico ;) Another overlooked but useful module you might want/need to use in combination with mod_log_forensic is mod_whatkilledus.

Feb 24th, 2008

Tail With Action

Just thought I'd share a little Python script I wrote the other day which I have already found helpful on several occasions. It's basically the same as tail -f file | grep pattern, but if it finds a match, it allows you to execute an arbitrary command.

For instance, yesterday a customer had problems with Apache infrequently segfaulting. The problem was temporarily fixed by simply restarting Apache, so I used this script to automatically restart Apache each time children started segfaulting, while we looked for a more long-term solution.

Here is the script:

#!/usr/bin/python

# This script mimics the behaviour of tail -f, but has the capability to execute
# a command whenever a line matching a pattern is found.
# As you probably see, there's no support for regex or anything of that
# flashy nature.
#
# Erik Ljungstrom 15/10/07

import time
import sys
import os
import socket

file = open(sys.argv[2], "r")
node = socket.gethostbyaddr(socket.gethostname())[0]
# start reading from the current end of the file, like tail -f
setdef = os.path.getsize(sys.argv[2])
file.seek(setdef)
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        pos = line.find(sys.argv[1])
        if pos >= 0:
            os.system("logger -p local6.crit -t postfix amavisd has bailed out on %s!" % (node))
            sys.exit(1)

You start it like ./pytail.py pattern file. As my comment says, it's very basic, but useful! Of course it would be nice if it forked into the background, could do regexp matching etc., but I simply haven't had use for that yet, and screen is a good enough tool. The above 'version' was used under pretty similar circumstances: a customer's amavisd kept bailing out due to BDB 3 limitations (running out of locker entries), so I ran this script on all their mail servers while we upgraded it:

./pytail.py child_init_hook /var/log/messages

The command being executed is the bit between the quotation marks in os.system(). In this case, I had previously set up syslog to log to a MySQL instance on one of my own servers, which is then polled from our monitoring panel (2 x 50″ LCD screens in my corner of the office). So whenever this happened, we knew which server had the issue and could mitigate the problem.

Feb 23rd, 2008

PHP 5.1 and 5.2 and MySQL 5.1.20+

Today I fell victim to a gotcha when custom compiling PHP 5.1.4 for a customer which runs MySQL 5.1.22. The customer in question requires the mysqli extension, which appears to be broken with these versions (fixed in 5.2.4 though). So if you configure PHP with --with-mysqli=/usr/bin/mysql_config you will soon find yourself staring at:

/usr/local/src/php-5.1.4/ext/mysqli/mysqli_api.c: In function 'zif_mysqli_stmt_bind_param':
/usr/local/src/php-5.1.4/ext/mysqli/mysqli_api.c:145: error: 'gptr' undeclared (first use in this function)
/usr/local/src/php-5.1.4/ext/mysqli/mysqli_api.c:145: error: (Each undeclared identifier is reported only once
/usr/local/src/php-5.1.4/ext/mysqli/mysqli_api.c:145: error: for each function it appears in.)
/usr/local/src/php-5.1.4/ext/mysqli/mysqli_api.c: In function 'zif_mysqli_stmt_execute':
/usr/local/src/php-5.1.4/ext/mysqli/mysqli_api.c:601: error: 'gptr' undeclared (first use in this function) 

To fix this, either upgrade to 5.2.4, or if you're left with no choice, as I was, edit php-5.1.4/ext/mysqli/mysqli_api.c and replace all occurrences of gptr with char*.
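
If you'd rather not do the edit by hand, here's a quick sketch of the same replacement with sed (GNU sed assumed; keep a backup of the original file):

cd /usr/local/src/php-5.1.4/ext/mysqli
# keep a copy of the original, then replace every whole-word gptr with char*
cp mysqli_api.c mysqli_api.c.orig
sed -i 's/\bgptr\b/char*/g' mysqli_api.c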

That ought to get you through the rest of the compilation provided that you’ve got the rest of your setup correct.

Feb 22nd, 2008

Pam_shield Succeeding blockhosts.py

For a while now I've been suggesting blockhosts to customers who are unable to use firewalls for one reason or another (I'm not always entirely sure what those reasons could be!) but still want to stop brute-force attacks on various services.

While I don't mind 'hackish' things, I never really got properly to grips with it, even after writing a patch for it. Today I gave pam_shield a go, and I was surprised at how simple it was.

Just download the tarball, extract it, install gdbm-devel and pam-devel (on RHEL/CentOS machines) and compile it. Then edit /etc/security/shield.conf to your liking. The defaults are alright, but you might want to put another network in the allow-list. After that, stick auth optional pam_shield.so at the top of /etc/pam.d/sshd and you're good to go.
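
For the avoidance of doubt, that last step boils down to something like this (GNU sed assumed; back the file up first):

# keep a copy of the PAM config, then prepend the pam_shield line to the sshd stack
cp /etc/pam.d/sshd /etc/pam.d/sshd.bak
sed -i '1i auth    optional    pam_shield.so' /etc/pam.d/sshd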

While you test this, I suggest you either use a callback script or set the retention ridiculously low, just in case you accidentally lock yourself out. By default, pam_shield null-routes offending IPs. Should you wish to unblock an IP blocked by mistake, first apply an appropriate lart to the user in question, then run:

route del offending.ip.here gw 127.0.0.1 lo

to drop the null route from the routing table. That said, I have yet to look at a few other solutions out there, such as Fail2Ban, but I also have yet to read or hear anything about them which would sway me away from pam_shield.

Feb 20th, 2008

Apache Gotcha With ServerLimit

A little gotcha that happened to me a while ago: a rather sizeable customer had just launched a new campaign and had problems with HTTP content-matching alerts being thrown rather frequently.

This particular solution has five load-balanced web servers and two database back-ends, designed to cater for a reasonably high amount of traffic and shorter bursts of traffic surges.

The servers did not seem to be under any particular load, and I could connect to localhost just fine (telnet is a magnificent troubleshooting tool, is it not?). I looked at the Cacti graphs for the cluster and noticed in the Apache scoreboard that there were more or less exactly 256 threads running on all five servers. ps ax | grep httpd | wc -l confirmed this. So I went and edited httpd.conf and raised ServerLimit and MaxClients to 750 (from 256).
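
For reference, the change boils down to these two directives in httpd.conf:

ServerLimit  750
MaxClients   750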

Then here's my mistake: being the nice guy that I am, I did /usr/local/apache/bin/apachectl graceful. But I got this in error_log:

WARNING: MaxClients of 750 exceeds ServerLimit value of 256 servers, lowering
MaxClients to 256. To increase, please see the ServerLimit directive.

Hm, why would MaxClients be respected, but not ServerLimit? The short answer is: when doing a graceful restart, the parent Apache uber-process is not killed and restarted, and this value cannot be changed without a full-blown restart. Pretty obvious once you know about it! I sifted through apachectl, and all it does is:

case $ARGV in
start|stop|restart|graceful|graceful-stop)
    $HTTPD -k $ARGV
    ERROR=$?
    ;;

…so obviously an apachectl restart would not be sufficient either. Long story short: when changing the ServerLimit directive, a graceful-stop followed by a start is necessary! At least this is true for this custom-compiled Apache 2.2.3.
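
So the sequence that actually picks up a new ServerLimit looks something like this (paths as for this particular install; the little wait loop is only there to make sure the old parent has exited before starting again):

/usr/local/apache/bin/apachectl graceful-stop
# wait for the old parent and its children to let go of the port
while pgrep -x httpd > /dev/null; do sleep 1; done
/usr/local/apache/bin/apachectl start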

Feb 20th, 2008

Highly Unscientific Comparison of Webservers

Webserver Benchmark Results (see attachment)

A while ago a customer asked us to perform some benchmarks which would be applicable to their live application. The web servers in question are Apache, lighttpd and nginx.

The tests were made using ApacheBench on two dedicated machines with no other tasks, over a dedicated 100 Mbit network. Some figures really surprise me / strike me as odd, such as the varied PHP performance of lighttpd using FastCGI!
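
For the record, the runs were plain ApacheBench invocations of roughly this shape (the URL, request count and concurrency here are made up for illustration; they are not the customer's actual parameters):

ab -n 10000 -c 50 http://target.example.com/index.php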

I suppose it would only have been fair to test Apache with FastCGI as well, but that was out of scope for their request. Should you be interested in the full AB output, you can download a tarball containing all the data here. See the attachment for diagrams and figures.

Feb 20th, 2008

MySQL Backup Using Lagged Replication

It's a rather common thing to use a slave DB for backup purposes in a MySQL shop. This ensures that the application is always available, since no table locks will be acquired on the master for taking the backup. Locking is usually not a big problem on less busy sites and most times goes by unnoticed, but on slightly more heavily trafficked sites, a table lock on a sizeable table while it's being dumped can affect more than a handful of users. So while having a slave or two solves that problem, it can also bring another advantage:

Your DB backup never has to be more than x minutes old! By using the superb mysql-slave-delay tool from Maatkit, you can delay the slave by a certain number of minutes. That way you can always keep your slave, say, 30 minutes behind. So if/when disaster strikes and someone/something runs DROP TABLE or DELETE FROM on the wrong table (and you notice it within 30 minutes), you can just stop the slave thread altogether on the slave machine (an example follows below), and you have a snapshot of your database as it was 30 minutes ago!

Simply download, extract and run:

mysql-slave-delay -daemonize -d 30m u=root,p=notbloodylikely,h=127.0.0.1

on the slave machine. There are a few different options, so have a look at mysql-slave-delay -help. But that’s the most basic command line.
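
If disaster does strike, freezing the delayed slave is simply a matter of stopping the replication SQL thread on it before it catches up with the offending statement, for instance (credentials are placeholders):

mysql -u root -p -e "STOP SLAVE SQL_THREAD;"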

The downside of this is that you lose the last 30 minutes of data in a failover situation where the master machine goes down for some reason and the crisis has nothing to do with the data set being damaged. To remedy this, you could either run two MySQL processes on the same machine, provided the resources allow it, or, if your budget allows, use a third machine.

If this is a bit overkill for you and you just want an efficient backup system, have a look at mylvmbackup. This of course requires that the data directories are housed on an LVM volume. More about that in another post, maybe.

Feb 20th, 2008

MySQL Upgrade to 5.1.22 and Stored Procedures

I upgraded a database for one of our customers a while back, from 5.1.16-beta to 5.1.22. The developers and I obviously gave the changelogs a thorough read to ensure that, as far as we could tell, nothing would affect us. I know the server setup, integration and architecture like the back of my hand; however, I'm not 100% clued up on the code etc., as that obviously falls to the developers. Presumably the area of stored procedures fell between the chairs, because we did actually overlook one change that affected us. Only ever so slightly, but even so…

Following the upgrade, I tested the application, checksummed the tables (using the excellent tool in Maatkit, formerly known as mysql-toolkit) and it all seemed to be another glorious feat! Until I checked the logs and found these messages:

080130  5:00:19 [Warning] Stored routine 'db'.'stored_procedure': invalid value in column mysql.proc.collation_connection.
080130  5:00:19 [Warning] Stored routine 'db'.'stored_procedure': invalid value in column mysql.proc.db_collation.
080130  5:00:41 [Warning] Stored routine 'db'.'stored_procedure': invalid value in column mysql.proc.collation_connection.
080130  5:00:41 [Warning] Stored routine 'db'.'stored_procedure': invalid value in column mysql.proc.db_collation.

Luckily, they were only warnings and had no effect on the actual application, other than some slight performance degradation due to the increased disk I/O caused by the log being written to every time a stored procedure was executed! Googling for these messages, it (at that time) appeared as if we were the only people in the whole wide world with this problem, as the only other hit was a German person asking about it on his blog. Needless to say, he had no replies either. So I went back through the changelogs, came to http://dev.mysql.com/doc/refman/5.1/en/news-5-1-21.html and read:

The fix for the problems is to store character set information from the object creation context
so that this information is available when the object needs to be used later. The context includes
the client character set, the connection character set and collation, and the collation of the
database with which the object is associated.

As a result of the patch, several tables have new columns:
In the mysql database, the proc and event tables now have these columns:
character_set_client, collation_connection, db_collation, body_utf8.

A-ha! This was clearly the problem. Dumping all the procedures and re-creating them with the character set set to 'latin1' and the collations to 'latin1_swedish_ci' solved the issue.
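
A rough sketch of that fix, assuming a database called db (mysqldump's --routines output includes the DROP ... IF EXISTS statements, so reloading the edited dump should re-create the procedures cleanly):

# dump only the stored routines, no table definitions or data
mysqldump --routines --no-create-info --no-data db > procs.sql
# edit procs.sql so the routines use latin1 / latin1_swedish_ci, then reload:
mysql db < procs.sql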

Feb 20th, 2008