Protecting data
As being the main person responsible for the recovery and availability of the data, you need to have a full understanding of how to protect your data against all possible situations you could come across in your daily job. The most common situations you could see are:
Media failure
Hardware failure
Human error
Application error
Let's take a closer look at each of these situations.
Media failure
Media failure occurs when a system is unable to write or read from a physical storage device such a disk or a tape due to a defect on the recording surface. This kind of failure can be easily overcome by ensuring that your data is saved on more than one disk (mirrored) using a solution such as RAID (Redundant Array of Independent Disks) or ASM (Automatic Storage Management). In the case of tapes, ensure that your backups are saved in more than one tape and as mentioned earlier, testing the recoverability from them.
Hardware failure
Hardware failure is when a failure occurs on a physical component of your hardware such as when your server motherboard, CPU, or any other component stops working. To overcome this kind of situation, you will need to have a high availability solution in place as part of your disaster and recovery strategy. This could include solutions such as Oracle RAC, a standby database, or even replacement hardware on the premises. If you are using Oracle Standard Edition or Standard Edition One and need to implement a proper standby database solution, I will recommend you to take a closer look at the Dbvisit Standby solution for Oracle databases that is currently available in the market to allow you to fulfill this need (http://dbvisit.com).
Human error
Human error, also known as user error, is when a user interacting directly or through an application causes damage to the data stored in the database or to the database itself. The most frequent examples of human error involve changing or deleting data and even files by mistake. It is likely that this kind of error is the greatest single cause of database downtime in a company.
No one is immune to user error. Even an experienced DBA or system administrator can delete a redo log file that has the extension .log
as a mistake when taking it as a simple log file to be deleted to release space. Fortunately, user error can most of the time be solved by using physical backups, logical backups, and even Oracle Flashback technology.
Application error
An application error happens when a software malfunction causes data corruption in the logical or physical levels. A bug in the code can easily damage data or even corrupt a data block. This kind of problem can be solved using Oracle block media recovery, and is why it is so important to have a proper test done prior to promoting an application change to any production environment.
Tip
Always do a backup before and after a change is implemented in a production environment. A before backup will allow you to roll back to the previous state in case something goes wrong. An after backup will protect you to avoid to redo the change in case of a failure, due that it was not included in the previous backup available.