I have noticed the same phenomenon on my systems. Queries that normally take a millisecond will suddenly take 1-2 seconds. All of my cases are simple, single-table INSERT/UPDATE/REPLACE statements, never SELECTs. No load, locking, or thread buildup is evident.
I had suspected that it was due to clearing out dirty pages, flushing changes to disk, or some hidden mutex, but I have yet to narrow it down.
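One way to catch the spikes as they happen, rather than waiting for them to show up in a log, is to time every write from the client side. The helper below is a hypothetical sketch, not part of any MySQL tool. `run_write` is assumed to be any zero-argument callable that executes and commits one INSERT/UPDATE/REPLACE through your client library, and the 0.5-second threshold is an arbitrary choice. The clock is injectable so the function can be checked without a database.

```python
import statistics
import time
from typing import Callable, List, Tuple


def probe_latency(
    run_write: Callable[[], None],
    iterations: int,
    threshold_s: float = 0.5,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, List[Tuple[int, float]]]:
    """Call run_write `iterations` times and time each call.

    Returns (median duration, outliers), where outliers lists
    (call index, duration in seconds) for every call that took
    at least threshold_s seconds.
    """
    durations: List[float] = []
    for _ in range(iterations):
        start = clock()
        run_write()  # hypothetical: executes and commits one write
        durations.append(clock() - start)
    outliers = [(i, d) for i, d in enumerate(durations) if d >= threshold_s]
    return statistics.median(durations), outliers
```

With a PEP 249 (DB-API) driver, `run_write` could be a small function that calls `cursor.execute(...)` and then `conn.commit()`. Logging the wall-clock time of each outlier as well would make it easier to line the spikes up with other events on the server.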
Also Ruled Out
- Server load: no correlation with high load
- Storage engine: happens with InnoDB, MyISAM, and MEMORY
- MySQL query cache: happens whether it’s on or off
- Log rotations: no correlation with these events
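Ruling out a cause such as log rotation comes down to checking whether slow statements cluster around those events. A minimal sketch, assuming you already have Unix timestamps for the slow statements and for the other events (how you collect them is up to you); `fraction_near_events` and the 5-second window are illustrative choices, not anything MySQL provides:

```python
from bisect import bisect_left
from typing import Sequence


def fraction_near_events(
    slow_times: Sequence[float],
    event_times: Sequence[float],
    window_s: float = 5.0,
) -> float:
    """Fraction of slow statements within window_s seconds of some event.

    All times are Unix timestamps in seconds.
    """
    if not slow_times:
        return 0.0
    events = sorted(event_times)
    near = 0
    for t in slow_times:
        # First event at or after (t - window_s); if it is also no later than
        # (t + window_s), this slow statement has an event nearby.
        i = bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            near += 1
    return near / len(slow_times)
```

If the result is close to what chance alone would give (roughly the share of the timeline covered by the windows around the events), that supports "no correlation".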
The only other observation I have at this point comes from running the same database on multiple machines. My application is read-heavy, so I use replication, and most of the load is on the slaves. Even though there is minimal load on the master, the phenomenon occurs more often there. Although I see no locking issues, maybe InnoDB/MySQL is having trouble with (thread) concurrency? Recall that updates on a slave are applied by a single replication thread.
MySQL Version 5.1.48
Update
I think I have a lead on the problem in my case. I noticed this phenomenon on some of my servers more often than on others. Looking at what was different between those servers, and tweaking settings, led me to the InnoDB system variable innodb_flush_log_at_trx_commit.
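Spotting what differs between servers is easier with a mechanical diff of their settings. A sketch, assuming each dict holds the Variable_name/Value pairs returned by `SHOW GLOBAL VARIABLES` on one server; `diff_variables` is a hypothetical helper:

```python
from typing import Dict, Tuple

MISSING = "<missing>"


def diff_variables(
    server_a: Dict[str, str], server_b: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {variable: (value on A, value on B)} for every variable that differs.

    A variable that exists on only one server is reported with MISSING
    as its value on the other.
    """
    diffs: Dict[str, Tuple[str, str]] = {}
    for name in sorted(set(server_a) | set(server_b)):
        a = server_a.get(name, MISSING)
        b = server_b.get(name, MISSING)
        if a != b:
            diffs[name] = (a, b)
    return diffs
```

Expect some noise (hostnames, server IDs, paths) that legitimately differs between machines and can be filtered out before looking for the interesting differences.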
I found the documentation a bit awkward to read, but innodb_flush_log_at_trx_commit can take the values 1, 2, or 0:
- For 1, the log buffer is written to the log file at every commit, and the log file is flushed to disk at every commit.
- For 2, the log buffer is written to the log file at every commit, and the log file is flushed to disk approximately every 1-2 seconds.
- For 0, the log buffer is written to the log file every second, and the log file is flushed to disk every second.
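As a rough illustration of why 1 is expected to be the slowest setting, the hypothetical function below counts redo-log writes and fsyncs per second under the three settings. It is a simplified model, not InnoDB's actual code: it ignores group commit and treats the background write/flush as happening exactly once per second.

```python
def log_io_per_second(setting: int, commits_per_second: int) -> dict:
    """Approximate redo-log writes and fsyncs per second for a given
    innodb_flush_log_at_trx_commit value (simplified model).

    Ignores group commit, and counts the background write/flush as one
    per second whenever there were commits to flush.
    """
    background = min(1, commits_per_second)
    if setting == 1:
        # Every commit writes the log buffer and flushes the log file.
        return {"writes": commits_per_second, "fsyncs": commits_per_second}
    if setting == 2:
        # Every commit writes to the log file (OS cache); fsync about once per second.
        return {"writes": commits_per_second, "fsyncs": background}
    if setting == 0:
        # Nothing happens at commit; write and fsync about once per second.
        return {"writes": background, "fsyncs": background}
    raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1, or 2")
```

Under this model, 0 and 2 both perform about one fsync per second, and only 1 scales fsyncs with the commit rate.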
In the order 1, 2, 0, as reported and documented, you’re supposed to get increasing performance in exchange for an increasing risk of losing recently committed transactions in a crash.
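The risk side of the trade-off depends on what crashes. Per the MySQL documentation, with 2 the log has already been handed to the operating system at commit, so only an OS crash or power loss can lose the last second or so of transactions, while with 0 a crash of mysqld alone can. A sketch of that rule (`may_lose_commits` is a hypothetical name, and the answer for 1 assumes the disk actually honors fsync):

```python
def may_lose_commits(setting: int, crash: str) -> bool:
    """Can a crash lose transactions that were already reported as committed?

    crash is "mysqld" (the server process dies, the OS keeps running) or
    "os" (operating system crash or power loss). At most about one
    second of commits is at risk in the True cases.
    """
    if crash not in ("mysqld", "os"):
        raise ValueError('crash must be "mysqld" or "os"')
    if setting == 1:
        # Log flushed to disk at each commit; assumes the disk honors fsync.
        return False
    if setting == 2:
        # Log already handed to the OS at commit, so it survives a mysqld crash.
        return crash == "os"
    if setting == 0:
        # Log may still be in mysqld's own buffer at commit.
        return True
    raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1, or 2")
```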
Having said that, I found that the servers with innodb_flush_log_at_trx_commit=0 were performing worse (i.e. having 10-100 times more “long updates”) than the servers with innodb_flush_log_at_trx_commit=2. Moreover, things improved immediately on the bad instances when I switched it to 2 (note that it is a dynamic variable, so you can change it on the fly).
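To put a number on “long updates” per server, one approach is to count slow write statements in each server’s slow query log. A sketch, assuming the slow log is enabled with long_query_time low enough to catch the spikes and is written to a file in the MySQL 5.1 layout (`# Query_time: ...` header lines, optional `use db;` and `SET timestamp=...;` lines, then the statement). `count_long_writes` is hypothetical and only looks at the first word of each statement:

```python
import re
from typing import Iterable, Optional

QUERY_TIME = re.compile(r"^# Query_time: ([0-9.]+)")
WRITE_VERBS = ("INSERT", "UPDATE", "REPLACE")


def count_long_writes(lines: Iterable[str], threshold_s: float = 1.0) -> int:
    """Count INSERT/UPDATE/REPLACE slow-log entries with Query_time >= threshold_s."""
    count = 0
    pending: Optional[float] = None  # Query_time of an entry awaiting its statement
    for raw in lines:
        line = raw.strip()
        match = QUERY_TIME.match(line)
        if match:
            pending = float(match.group(1))
            continue
        if pending is None or not line or line.startswith("#"):
            continue
        lowered = line.lower()
        if lowered.startswith("set timestamp=") or lowered.startswith("use "):
            continue  # bookkeeping lines that precede the statement
        verb = line.split(None, 1)[0].upper()
        if verb in WRITE_VERBS and pending >= threshold_s:
            count += 1
        pending = None  # this entry's statement has been seen
    return count
```

Running it over each server’s slow log (e.g. `with open(path) as f: count_long_writes(f)`) gives a per-server count that can be compared across machines, or before and after changing the setting.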
So, my question is: what is yours set to? Note that I’m not blaming this parameter, but rather highlighting that what it controls (how the log is flushed to disk) seems related to this issue.