Backing up and restoring HBase
Amazon Elastic MapReduce provides multiple ways to back up HBase data to Amazon S3 and restore it. It also allows us to perform incremental backups; HBase continues to execute write commands during the backup process, so the cluster remains available while the backup runs.
Note
There is a risk of inconsistency in the backed-up data. If consistency is of prime importance, then writes need to be stopped during the initial portion of the backup process, synchronized across nodes. This can be achieved by passing the --consistent parameter when requesting a backup.
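As a rough sketch only, an on-demand consistent backup request might look like the following; the aws emr create-hbase-backup subcommand, the bucket name, and the cluster ID are assumptions for illustration and are not part of this recipe:

    # Request an on-demand HBase backup; --consistent briefly pauses writes
    # so the backup is synchronized across nodes (placeholder values).
    aws emr create-hbase-backup \
        --cluster-id j-3AEXXXXXX16F2 \
        --dir s3://mybucket/backups/hbase \
        --consistent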
When you back up HBase data, you should specify a different backup directory for each cluster. An easy way to do this is to use the cluster identifier as part of the path specified for the backup directory, for example, s3://mybucket/backups/j-3AEXXXXXX16F2. This ensures that any future incremental backups reference the correct HBase cluster.
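As an illustration of this naming convention, a scheduled incremental backup into a cluster-specific directory could be requested as shown below; the aws emr schedule-hbase-backup subcommand and its --type, --interval, and --unit options are assumptions here, so verify them against the EMR CLI reference for the client version you use:

    # Schedule incremental backups every 24 hours into a directory that
    # embeds the cluster identifier (placeholder values throughout).
    aws emr schedule-hbase-backup \
        --cluster-id j-3AEXXXXXX16F2 \
        --type incremental \
        --dir s3://mybucket/backups/j-3AEXXXXXX16F2 \
        --interval 24 \
        --unit hours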
How to do it…
When you are ready to delete old backup files that are no longer...