For anyone that uses xtrabackup



prosperent brian
06-25-2012, 12:54 PM
I've been beating my head against the wall for 3 days now trying to figure out the exact cause of the crash this last Saturday morning. I knew why it happened, but I wanted to know the why of why.

It all comes down to two, well, two and a half obscure bugs.

The outwardly apparent issue was that the binary log drive became full on our big master database server. This caused MySQL to hang and stop writing to the database. The problem is that this should never have happened. So why did that 2TB drive fill up?
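A full drive like this is the kind of thing a small cron check can catch before MySQL hangs. Here is a minimal sketch, not what I actually run: the function names and the 90% threshold are made up for illustration, and you would point it at whatever mount holds your binary logs.

```python
import shutil


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.90):
    """True when the filesystem holding path is at or above the threshold.

    The 0.90 default is an arbitrary example value, not a recommendation.
    """
    return disk_usage_fraction(path) >= threshold
```

Something like `is_nearly_full("/path/to/binlogs")` could then trigger an email or page well before the drive hits 100%.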

I noticed that xtrabackup hadn't been deleting old backup files. The way I have things set up, xtrabackup keeps the last X days' worth of backups, with the retention window actually specified in minutes. I knew that the drive filled up because old backups hadn't been deleted as they should have been, but the reason behind that was a mystery.
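For anyone wondering what a "keep the last X minutes" rule looks like, here is a rough sketch. It assumes each backup lives in its own directory under one root and that the directory's modification time is a good enough stand-in for the backup's age. The names `expired_backups` and `prune_backups` are hypothetical, not part of xtrabackup.

```python
import os
import shutil
import time


def expired_backups(backup_root, keep_minutes, now=None):
    """List backup directories under backup_root older than keep_minutes."""
    now = time.time() if now is None else now
    cutoff = now - keep_minutes * 60
    expired = []
    for entry in os.scandir(backup_root):
        # Only directories count as backups in this sketch.
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            expired.append(entry.path)
    return sorted(expired)


def prune_backups(backup_root, keep_minutes, now=None):
    """Delete expired backup directories and return the paths removed."""
    removed = expired_backups(backup_root, keep_minutes, now)
    for path in removed:
        shutil.rmtree(path)
    return removed
```

Running `expired_backups` on its own first is a cheap dry run before letting `prune_backups` delete anything.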

I decided to ignore the backup issue and work on getting the slave server back online. You see, when the master drive filled up, it sent a half-written binary log file over to the slave, which caused replication to break and become impossible to restart. This bug has existed since MySQL 4.X, and should have been fixed by now.

My next step was to restore the slave server from a master backup. A simple process that should take an hour or so, mostly because of the time it takes to actually copy the backup from server to server. This is where I hit the next issue. None of the backups would work. I kept getting a strange error:


InnoDB: Error: page 7 log sequence number 15031414640307
InnoDB: is in the future! Current system log sequence number 14982886719500.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files.

From the error I knew the issue was the log files, but why? I spent the rest of Saturday, Sunday, and most of today trying to get to the bottom of that one.
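To put a number on "in the future": the message compares the log sequence number stamped on a data page with the newest one the redo log files know about. A quick sketch that pulls both out of the message and subtracts them (the function name `lsn_gap` is just something I made up for this):

```python
import re

# The first two lines of the error, copied from the message above.
ERROR_TEXT = """\
InnoDB: Error: page 7 log sequence number 15031414640307
InnoDB: is in the future! Current system log sequence number 14982886719500.
"""


def lsn_gap(error_text):
    """Return page LSN minus system LSN, or None if the text does not match."""
    page = re.search(r"page (\d+) log sequence number (\d+)", error_text)
    system = re.search(r"Current system log sequence number (\d+)", error_text)
    if page is None or system is None:
        return None
    # group(2) is the page's LSN; group(1) of the first match is the page number.
    return int(page.group(2)) - int(system.group(1))
```

A positive result means the data pages are newer than the log files that came with them, which is exactly what you get when the last step of the backup never applied the latest transactions.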

The solution? It turns out that xtrabackup doesn't work properly if you have a non-default value set for your innodb_log_block_size variable. Mine was 4096 instead of the default 512. This causes xtrabackup to error out and die at the last step, where it applies the latest transactions to the InnoDB log files. This is also why my old backups weren't being deleted: the backup process died before it deleted the old files. If I hadn't needed to restore a backup, I never would have caught the issue (well, at least until I needed to restore data lol).
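The real lesson is that the failure was silent. As far as I know, innobackupex prints "completed OK!" as its final line when a run succeeds, so a wrapper script can check the captured output for it. A minimal sketch, assuming that marker still holds for your version and that you save the tool's output somewhere. The function names here are hypothetical:

```python
def backup_completed(log_text):
    """True if the last non-empty line of the log ends with 'completed OK!'."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1].endswith("completed OK!")


def require_completed(log_text):
    """Raise if the backup log does not end with the success marker."""
    if not backup_completed(log_text):
        raise RuntimeError("backup log does not end with 'completed OK!'")
```

Calling `require_completed` right after each backup and alerting on the exception would have flagged this on day one instead of at restore time.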

So, I set the innodb_log_block_size back to default, and bam, I have a working backup that can now be restored on the slave.
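If you want to check your own config for this before trusting your backups, here is a rough sketch that scans the text of a my.cnf for the setting. It is deliberately simple: it ignores section headers and include directives, and assumes the value is written as a plain integer. The function names are mine. The default of 512 is the one mentioned above.

```python
DEFAULT_LOG_BLOCK_SIZE = 512


def log_block_size(cnf_text):
    """Return innodb_log_block_size from my.cnf text, or the default if unset."""
    value = DEFAULT_LOG_BLOCK_SIZE
    for raw in cnf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        # MySQL accepts dashes or underscores in option names.
        if key.strip().replace("-", "_") == "innodb_log_block_size":
            value = int(val.strip())
    return value


def is_default_log_block_size(cnf_text):
    """True when the config leaves innodb_log_block_size at the default."""
    return log_block_size(cnf_text) == DEFAULT_LOG_BLOCK_SIZE
```

Reading the file with `open("/etc/my.cnf").read()` (or wherever yours lives) and passing the text in is enough for a quick sanity check.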

Hopefully this post saves someone else a few days of pain in the future :)