
Backing up MariaDB clusters with xtrabackup

Since this was written I've migrated to mariabackup, a fork of xtrabackup. It works in the same way, with the same principles and terminology; it's just the official solution from MariaDB rather than Percona's.

With MariaDB clustering also comes a new method of backing up your system: no more SQL dumps, but rather incremental state transfers of the entire datadir.

The clearest advantage is being able to quickly and easily create incremental snapshots of changed data every hour without affecting a busy database. SQL dumps are often problematic on busy databases because they lock data.

See my earlier guide for more info on MariaDB replication and terminology.

In this guide I assume there's a node dedicated to backups, for example an arbitrator or a node with no cluster role at all.

I'll be explaining the manual steps but obviously you'd want to automate all of this.

{{:teknik:guider:database:galera backups.png?direct&200 |Galera backups representation}}

Some things to keep track of:

  • Consider the DB node that runs the backup to be the backup client.
  • Consider the receiving DB node to be the backup server, where the backups are stored. In my case the backup server is also the loadbalancer and arbitrator.
  • The backup job is run from one master node, node1 in my case. I call this the client.
  • The backup server in this guide is node3.
  • On node3, backups are stored in ''/var/backups''.

Installing packages

$ sudo yum install percona-xtrabackup

Setup SSH

A user must be able to log in without a password from the master node, where the backup job runs, to the backup server.

The user must also have write access to the destination backup directory.
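
For example, a minimal sketch of that setup, assuming the backup job runs on node1 and the remote user is ''backup'' (as in the commands later in this guide):

$ ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519   # generate a key pair without a passphrase
$ ssh-copy-id backup@node3                           # install the public key on the backup server
$ ssh backup@node3 'touch /var/backups/.write_test && rm /var/backups/.write_test'   # verify passwordless login and write access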

Xtrabackup configuration

I've taken to creating a directory on the master node for all the client configuration.

$ sudo mkdir /etc/xtrabackup_client

There you can keep a MySQL defaults file that innobackupex uses later. The innobackupex program can also write LSN data to this directory to keep track of its backups.
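
For example, such a defaults file could look like this. This is just a sketch; the user and password are placeholders and the exact option groups depend on your setup.

# /etc/xtrabackup_client/mysql_backup.cnf
[client]
user     = backup
password = changeme

[mysqld]
datadir  = /var/lib/mysql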

Taking full base backup

Optimally, the full base backup only needs to be taken once.

$ innobackupex --defaults-file="/etc/xtrabackup_client/mysql_backup.cnf" \
--socket="/var/run/mysqld/mysqld.sock" \
--extra-lsndir="/etc/xtrabackup_client" --stream=xbstream /tmp | \
ssh "backup@node3" "cat - | xbstream -x -C /var/backups/xtrabackup/"

Taking incremental backups

These can be done as often as necessary and don't lock the databases. You're essentially doing an IST (incremental state transfer).

$ innobackupex --defaults-file="/etc/xtrabackup_client/mysql_backup.cnf" \
--socket="/var/run/mysqld/mysqld.sock" \
--extra-lsndir="/etc/xtrabackup_client" --stream=xbstream \
--incremental --incremental-lsn="$to_lsn" /tmp | \
ssh "backup@node3" "cat - | xbstream -x -C /var/backups/xtrabackup_incremental/incremental-$(date +'%Y%m%d-%H%M%S.%s')"

Note a few things here:

  • ''--incremental-lsn'' - The ''to_lsn'' value from ''/etc/xtrabackup_client/xtrabackup_checkpoints'', which is created and updated on each innobackupex run (one way to extract it is shown below).
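
Assuming the usual ''key = value'' format of the checkpoints file, a script could populate ''$to_lsn'' like this:

$ to_lsn=$(awk '/^to_lsn/ {print $3}' /etc/xtrabackup_client/xtrabackup_checkpoints)   # picks the value after "to_lsn ="
$ echo "$to_lsn"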

Automation

I'm not going to cover the full scripts I've written to automate this process, but I will document what they're named so I can reference them in the operations manual.

  • xtrabackup_client.sh (client) - This runs on a DB node (backup client) and runs innobackupex to stream the backup to a backup server.
  • xtrabackup_prepare.sh (server) - Counts all the LSN numbers in all ''xtrabackup_checkpoints'' files to ensure that they're an unbroken sequence.
  • xtrabackup_merge.sh (server) - Runs innobackupex to merge one or more incremental backups into the base backup (see the sketch after this list).
  • xtrabackup_restore.sh (server) - Runs the last two steps above in sequence, then restores the base backup to a datadir and runs MySQL with that datadir in a Docker container to run some integrity tests on the restored data.
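
As a rough idea of what the merge step does, the standard innobackupex procedure for merging incrementals into the base looks like this. A sketch run on the backup server; the incremental directory name is just an example.

$ innobackupex --apply-log --redo-only /var/backups/xtrabackup/
$ innobackupex --apply-log --redo-only /var/backups/xtrabackup/ \
  --incremental-dir=/var/backups/xtrabackup_incremental/incremental-20170502-003014.1493677814
$ # repeat the step above for each incremental in order, then run the final prepare:
$ innobackupex --apply-log /var/backups/xtrabackup/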

Operations manual

This part documents routines and situations that are relevant with this solution.

Failed incremental backup

Here's just one example of a failed incremental backup; failed backups can manifest in different ways.

The backup client (the DB node that runs innobackupex) runs out of RAM during the backup, which causes the backup to fail.

Looking at the backup server, the incremental directory is missing a lot of data, including the ''xtrabackup_checkpoints'' file.

In this case I compare the ''xtrabackup_checkpoints'' file from the last good incremental backup with the global file on the backup client. My scripts keep the global ''xtrabackup_checkpoints'' file in ''/etc/xtrabackup_client/'' on the DB client.
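
Run from the backup client, the comparison could look like this (''<last-good-incremental>'' is a placeholder for the most recent intact incremental directory):

$ ssh backup@node3 'cat /var/backups/xtrabackup_incremental/<last-good-incremental>/xtrabackup_checkpoints' | \
  diff - /etc/xtrabackup_client/xtrabackup_checkpoints && echo identical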

If the two are identical then I just delete the failed incremental backup and re-run the backup client script.

If the file on the DB client has a more recent LSN than the last incremental, it's a bad situation and you might have to scrap all the incrementals and re-take the base backup.

Start over with new base backup

With the scripts I've made to automate the whole process, one only needs to delete the file ''/etc/xtrabackup_client/xtrabackup_checkpoints'' on the backup client and, of course, empty the destination directories for the base backup and the incrementals on the backup server.
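
In other words, something like this, using the paths from this guide:

$ sudo rm /etc/xtrabackup_client/xtrabackup_checkpoints   # on the backup client
$ ssh backup@node3 'rm -rf /var/backups/xtrabackup/* /var/backups/xtrabackup_incremental/*'   # empty the destination dirs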

Without automation, using the commands described above, this is equivalent to starting over with the base backup command pointed at a new destination directory, then continuing to create new incremental backups with the incremental command.

Desyncs

At the end of each incremental backup run, the node you backed up will resync itself into the cluster. If the backup finished at 00:30, you might see something like the following in the logs.

Nothing to worry about; so far it's just a peculiar set of lines that is logged at each backup.

maj 02 00:30:14 dbnode04 mysqld[22278]: 2017-05-02  0:30:14 139864527861504 [Note] WSREP: Member 0.0 (dbnode04) desyncs itself from group
maj 02 00:30:14 dbnode04 mysqld[22278]: 2017-05-02  0:30:14 139864527861504 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 15010689)
maj 02 00:30:14 dbnode04 mysqld[22278]: 2017-05-02  0:30:14 139865678915328 [Note] WSREP: Provider paused at c926c607-f9ce-11e6-b203-ce70d0d27d00:15010689 (130415)
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139865678915328 [Note] WSREP: resuming provider at 130415
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139865678915328 [Note] WSREP: Provider resumed.
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139864527861504 [Note] WSREP: Member 0.0 (dbnode04) resyncs itself to group
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139864527861504 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 15010698)
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139864527861504 [Note] WSREP: Member 0.0 (dbnode04) synced with group.
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139864527861504 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 15010698)
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139865785468672 [Note] WSREP: Synchronized with group, ready for connections
maj 02 00:30:15 dbnode04 mysqld[22278]: 2017-05-02  0:30:15 139865785468672 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Last update: September 19, 2021