If you need to reset one or more slaves regularly, say every morning before business hours begin, importing a SQL dump might take too much time, especially if the slaves are relatively low-end machines without a sophisticated I/O subsystem.
In this recipe, we will present a way to set up a MySQL slave with minimal I/O load on the slave's hard drive and network adapter. The example assumes a Linux-based slave machine; however, you should be able to apply this to Windows as well, although you will need to download some free tools that most Linux distributions ship out of the box.
The general idea is to have a more powerful machine, which can be the master if resources allow, to prepare a complete set of data files for the slaves and later stream them directly to the slave's disk from a web server.
This recipe is based on the fact that you can quite easily copy MySQL's data files (including InnoDB table space files) from one machine to another, as long as you copy them all. So, we first create a ready-to-go set of slave data files on a relatively powerful machine and transfer them to multiple slaves with weaker hardware. Usually, those files will be bigger than the plain SQL dump file that is commonly used for slave setups, but no parsing and processing is required on the target system. This makes the whole thing mostly network and linear disk I/O bound.
The idea behind this concept is to relieve the individual slaves from importing SQL files themselves. As their hardware is rather slow and MySQL only supports single-threaded slave SQL execution, this can be very time-consuming. Instead, we temporarily use the master's better resources as a single power-slave and let it handle the import. We then provide any number of identical slaves with its data files, reducing the burden on each slave to simply unpacking some files.
While this does not really save anything in terms of bytes that need to be written to each slave's disk, the access pattern is much more sensible. The following table compares the disk transfers for a regular SQL import from local disk and the proposed alternative for a 60MB gzipped SQL file, which will lead to approximately 2GB of InnoDB table space files:
Importing a SQL file from the local hard disk means there are continual seeks between the current position in the SQL text file and the server's data files. Moreover, as the database schema may define lots of indexes, there is even more random disk write activity when executing simple INSERT statements.
In contrast, unpacking ready-made InnoDB table spaces (or MyISAM table files, for that matter) is basically just linear writing.
The SQL dump needs to be executed at least once. So, we set up a temporary MySQL daemon with a stripped down configuration that is close to the actual slaves—meaning all the parameters that affect the storage files must match the slaves to create compatible data files.
Every time you want to prepare such a new slave installation image, the temporary daemon should be started with an empty data directory. While not strictly necessary, we prefer to delete the table space and transaction log files every time because it allows for better compression rates later.
The data files should be created close to the size that will be needed, maybe a little larger to prevent them from having to grow. Nevertheless, specify the last data file as auto-extending; otherwise, importing the SQL data may fill up the table space prematurely, a failure that is difficult to handle in an automated process.
Also, you should allow InnoDB to add larger chunks to the last data file if needed (default: 8 MB). Extending the files is associated with some overhead, but using bigger chunks reduces the impact on the I/O subsystem. You should be fine with 50 MB or 100 MB. The bigger this is, the less often InnoDB will have to extend the file. See the manual section on InnoDB configuration for more info.
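For instance, the storage-related part of the temporary daemon's configuration might look like the following fragment. The paths and sizes are examples only; the important point is that every storage-related parameter mirrors the slaves' settings:

```ini
[mysqld]
# Example values; all storage-related parameters must match the
# slaves so that the resulting files are compatible with them.
datadir                     = /tmp/slave_image_data
# Pre-sized close to the expected ~2 GB, with the last file auto-extending
innodb_data_file_path       = ibdata1:1900M;ibdata2:100M:autoextend
# Grow the last file in 100 MB chunks instead of the 8 MB default
innodb_autoextend_increment = 100
```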
Once you have the temporary daemon running, use the mysqldump tool with the --master-data and --single-transaction options to create a dump of the database(s) you need to replicate. In order to save time and disk space, you may find it useful to pipe the output directly through the mysql command-line client and feed it into the target temporary server.
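Assuming the master is reachable as master-host and the temporary daemon listens on a dedicated socket (host, socket path, user, and database names are all placeholders), the pipeline could look like this:

```shell
# Dump from the master and feed the temporary daemon in one go,
# without writing an intermediate SQL file to disk.
# Host names, socket path, users, and schema names are placeholders.
mysqldump --host=master-host --user=dumpuser --password \
          --master-data --single-transaction --databases shop crm \
| mysql --socket=/tmp/mysql_tmp.sock --user=root
```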
Shutting down and compressing
You can now shut down the temporary server. Compress the data directory. Depending on how you want to configure permissions, you may include or exclude the mysql schema. We usually have the temporary server set up with as low permissions as possible and do not move the mysql schema along.
For compression, you should not use the ZIP format. It contains a catalog of all included files at its very end, so piping it through a decompression program on the fly will not work. Instead, we use a gzipped tarball. This allows us to download the data stream and pipe it through gunzip before directing it to disk.
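The packing step might look like this sketch; both paths are examples, and the mkdir is only there so the sketch runs standalone:

```shell
DATADIR=/tmp/slave_image_data       # data directory of the temporary daemon
ARCHIVE=/tmp/slave_image.tar.gz     # will later be served by the web server
mkdir -p "$DATADIR"                 # only so this sketch runs standalone
# Pack everything except the mysql schema into a gzipped tarball;
# in our setup the slaves keep their own privilege tables.
tar -czf "$ARCHIVE" --exclude=mysql -C "$DATADIR" .
```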
Transferring to the slave and uncompressing
On the slave, we suggest curl as a download tool. It is important that the tool you choose be able to write the downloaded file directly to standard output. With curl that is quite simple, as it is its default behavior. It also handles files larger than 2 GB, which some versions of wget have problems with. The command line should look similar to this:
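Server name, archive name, and target directory below are assumptions for this sketch:

```shell
# Stream the archive from the web server straight into the data
# directory; nothing is stored temporarily on disk. The trailing -
# makes tar read the archive from standard input.
curl http://the.web.server/slave_image.tar.gz | tar -xzC /var/lib/mysql -f -
```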
curl will download the file and pipe it to tar to decompress it into the target data directory.
Note
Do not miss the final - at the end of the command!
You will find that on a local area network, downloading and unpacking will be considerably faster than having MySQL first create the empty data files and then import the SQL, for the reasons stated above.
Adjusting slave configuration
When the data files have reached their destination on the slave, you may need to adjust the slave settings. This depends especially on whether you copied fixed-size data files (in which case you can prepare the config file in advance) or used the autoextend option on the last table space file. In the latter case, you could write a little script that takes a template my.cnf file with your basic settings and replaces some placeholders for the data file-related settings via sed. One of those is the size of the last InnoDB data file from the archive, which will become a fixed-size file on the slave; another auto-extending file will then be added at the first slave start.
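A minimal sketch of such a script, assuming a template with an invented @DATA_FILE_PATH@ placeholder (the placeholder name, paths, and sizes are all examples; the template is created inline here only to keep the sketch self-contained):

```shell
#!/bin/bash
# Build the slave's my.cnf from a template, substituting the size the
# last data file from the archive actually reached.
TEMPLATE=/tmp/my.cnf.template
TARGET=/tmp/my.cnf

# Create a minimal template for demonstration purposes.
cat > "$TEMPLATE" <<'EOF'
[mysqld]
innodb_data_file_path = @DATA_FILE_PATH@
EOF

LAST_SIZE=2048M   # size the formerly auto-extending file ended up with
# The old last file becomes fixed size; a fresh auto-extending file is
# appended so the slave can still grow its table space.
sed "s|@DATA_FILE_PATH@|ibdata1:1900M;ibdata2:${LAST_SIZE};ibdata3:100M:autoextend|" \
    "$TEMPLATE" > "$TARGET"
```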
One last thing that needs to be done is to read the master's current binlog file name and position from the master.info file. This is required because once the slave server has been started, you will need to provide correct credentials for the replication user. You must also explicitly tell the slave which master host to connect to. Unfortunately, when issuing a CHANGE MASTER TO command on the slave that includes a master host name, all information about previous master binlogs and the corresponding offset is discarded (see the MySQL online manual, Chapter 12.6.2.1, CHANGE MASTER TO Syntax, at http://dev.mysql.com/doc/refman/5.1/en/change-master-to.html). Therefore, you will need to tell the slave again where to begin replication.
One possible solution is to read the contents of the master.info file that was brought along with the data files into a bash script array and inject the values into the statement:
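A sketch along these lines follows. The file location, host name, and credentials are assumptions, the sample master.info is created inline only to make the sketch self-contained, and the statement is printed here instead of being piped to mysql:

```shell
#!/bin/bash
# Build a CHANGE MASTER TO statement from the master.info file that was
# brought along with the data files.
MASTER_INFO=/tmp/master.info

# For demonstration, create a file in the MySQL 5.1 master.info layout:
# line 1: line count, line 2: binlog file name, line 3: offset, then
# host, user, password, port, and further settings.
cat > "$MASTER_INFO" <<'EOF'
15
master-bin.000023
107
the.master.host
replication
secret
3306
EOF

mapfile -t INFO < "$MASTER_INFO"   # one line per array element

cat <<EOF
CHANGE MASTER TO
  master_host='the.master.host',
  master_user='replication',
  master_password='secret',
  master_log_file='${INFO[1]}',
  master_log_pos=${INFO[2]};
EOF
```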
The format of the master.info file is described in the MySQL manual.
As soon as you issue a START SLAVE statement, the slave will connect to the master and begin to catch up with whatever has happened since the time the dump was taken.