2011-12-03 20:53:11
MySQL Slow Query Log Filter
Filtering all queries of one user out of a MySQL Slow Query Log is a common task. Maatkit is a toolkit for analyzing log files and more. Here is an example run to filter all queries of user123 of a 707 MB big slow-query-log:
$ time mk-query-digest --filter \ '($event->{user} || "") =~ m/user123/' \ --print --no-report log-slow-queries.log >1 log-slow-queries.log: 15% 02:49 remain log-slow-queries.log: 29% 02:26 remain log-slow-queries.log: 44% 01:51 remain log-slow-queries.log: 59% 01:20 remain log-slow-queries.log: 76% 00:45 remain log-slow-queries.log: 90% 00:18 remain real 3m20.097s user 0m16.616s sys 3m2.592s
Well, 707 MB in more than 3 minutes? Simple filtering can also be done with awk, try it:
$ time awk \ '/^# Time: /{t=1;s=$0;p=0}'\ '!/^# /{t=0}/^# User@Host: /{p=0}'\ '/^# User@Host: user123/{p=1};'\ '{if(p){if(t){print s;t=0;}print$0}}' \ <log-slow-queries.log >2 real 0m58.844s user 0m2.501s sys 0m3.268s
Nice speed, but awk is a symlink to mawk, try gawk:
$ time gawk \ '/^# Time: /{t=1;s=$0;p=0}'\ '!/^# /{t=0}/^# User@Host: /{p=0}'\ '/^# User@Host: user123/{p=1};'\ '{if(p){if(t){print s;t=0;}print$0}}' \ <log-slow-queries.log >2 real 3m30.287s user 0m29.934s sys 2m58.800s
Ooops, but didn't we already know, how slow GNU-software can be? Try a different server:
$ time gawk \ '/^# Time: /{t=1;s=$0;p=0}'\ '!/^# /{t=0}/^# User@Host: /{p=0}'\ '/^# User@Host: user123/{p=1};'\ '{if(p){if(t){print s;t=0;}print$0}}' \ <log-slow-queries.log >2 real 2m4.899s user 2m2.240s sys 0m1.890s
This server runs more fast, but it lacks mawk, is virtual and it runs like the first server only 32-bit-linux. Try gawk and mawk on 64-bit-linux (same physical host):
$ time gawk \ '/^# Time: /{t=1;s=$0;p=0}'\ '!/^# /{t=0}/^# User@Host: /{p=0}'\ '/^# User@Host: user123/{p=1};'\ '{if(p){if(t){print s;t=0;}print$0}}' \ <log-slow-queries.log >2 real 1m49.348s user 1m40.242s sys 0m1.528s $ time mawk \ '/^# Time: /{t=1;s=$0;p=0}'\ '!/^# /{t=0}/^# User@Host: /{p=0}'\ '/^# User@Host: user123/{p=1};'\ '{if(p){if(t){print s;t=0;}print$0}}' \ <log-slow-queries.log >3 real 0m13.115s user 0m1.000s sys 0m2.392s
Conclusions:
- simple filter can be done faster with awk than with maatkit
- mawk does this filtering faster than gawk
- mawk and gawk run faster on 64-bit-linux