Purpose of backup and recovery
As a DBA, you are the person responsible for recovering the data and guarding the business continuity of your organization. Consequently, you have the key responsibility for developing, deploying, and managing an efficient backup and recovery strategy for your institution or clients that will allow them to easily recover from any possible disastrous situation. Remember, data is one of the most important assets a company can have. Most organizations would not survive after the loss of this important asset.
Testing backups
It's incredible how many corporations around the world do not have a proper disaster recovery plan (DRP) in place, and what is worse, many DBAs never even test their backups. Most of the time when auditing Oracle environments for clients, I ask the following question to the DBA team:
Are you 100 percent sure that you can trust your backups? For this question I generally receive answers like:
I'm not 100 percent sure since we do not recover from backups too often
We do not test our backups, and so I cannot guarantee the recoverability of them
Another good question is the following:
Do you know how long a full recovery of your database will take? Common responses to this question are:
Probably anything between 6 and 12 hours
I don't know, because I've never done a full recovery of my database
As you can see, a simple implementation of a procedure to proactively test the backups randomly will allow you to:
Test your backups and ensure that they are valid and recoverable: I have been called several times to help clients because their current backups are not viable. Once I was called to help a client and discovered that their backup-to-disk starts every night at 10 P.M. and ends at 2 A.M. Afterwards, the backup files are copied to a tape by a system administrator every morning at 4 A.M. The problem here was that when this process was implemented, the database size was only 500 GB, but after few months, the size of the database had grown to over 1 TB. Consequently, the backup that was initially finishing before 2 A.M. was now finishing at 5 A.M., but the copy to a tape was still being triggered at 4 A.M. by the system administrator. As a result, all backups to a tape were unusable.
Know your recovery process in detail: If you test your backups, you will have the knowledge to answer questions regarding how long a full recovery will take. Answering that your full recovery will take around three and a half hours, but you prefer to say five hours just in case of any unexpected problem that you will come across, you will look more professional. This will let me know that you really know what you are talking about.
Document and improve your recovery process: The complete process needs to be documented. If the process is documented and you also allow your team to practice on a rotation basis, this will ensure that they are familiar with the course of action and will have all the knowledge necessary to know what to do in case of a disaster. You will now be able to rest in your home at night without being disturbed, because now you are not the only person in the team with the experience required to perform this important task.
Good for you if you have a solid backup and recovery plan in place. But have you tested that plan? Have you verified your ability to recover?