What to Replicate?
Another key decision in any GoldenGate implementation is what data to replicate. There is little point in replicating data that does not need to be replicated, as doing so adds unnecessary overhead. Furthermore, if you decide that you need to replicate everything, GoldenGate may not necessarily provide the best solution. Other products such as Oracle 11g Active Data Guard may be more appropriate. The following sections discuss not only what to replicate but also how to replicate it, along with important functional and design considerations.
Object mapping and data selection
The power of GoldenGate comes into its own when you select what data you wish to replicate, by using its built-in tools and functions. You may even wish to transform the data before it hits the target. There are numerous options at your disposal, but choosing the right combination is paramount.
The configuration of GoldenGate includes mapping of source objects to target objects. Given the sheer number of parameters and functions available, it is easy to overcomplicate your GoldenGate Extract or Replicat process configuration with redundant operations. Try to keep your configuration as simple as possible, choosing the right parameter, option, or function for the job. Although it is possible to string these together to achieve a powerful solution, doing so may cause significant additional processing, and performance will suffer as a result.
GoldenGate provides the ability to select or filter out data based on a variety of levels and conditions. Typical data mapping and selection parameters are as follows:
TABLE/MAP
Specifies the source and target objects to replicate. TABLE is used in Extract and MAP in Replicat parameter files.
WHERE
Similar to the SQL WHERE clause, the WHERE option included with a TABLE or MAP parameter enables basic data filtering.
FILTER
Provides complex data filtering. The FILTER option can be used with a TABLE or MAP parameter.
COLS/COLSEXCEPT
The COLS and COLSEXCEPT options allow columns to be included or excluded when used with a TABLE or MAP parameter.
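As a rough illustration of how these options can be combined, the following parameter-file fragments sketch row and column selection on the capture side and a computed filter on the delivery side. The schema, table, and column names (src, tgt, orders, customers, amount, quantity, card_number) are hypothetical and chosen only for the example; treat this as a sketch rather than a recommended configuration.

```
-- Extract parameter file (fragment): capture-side selection.
-- All schema, table, and column names are hypothetical.

-- WHERE: basic row filter, similar to a SQL WHERE clause
TABLE src.orders, WHERE (amount > 1000);

-- COLSEXCEPT: capture every column except the one listed
TABLE src.customers, COLSEXCEPT (card_number);

-- Replicat parameter file (fragment): delivery-side mapping.
-- FILTER: more complex test, here using the @COMPUTE column function
MAP src.orders, TARGET tgt.orders, FILTER (@COMPUTE (amount * quantity) > 10000);
```

In keeping with the advice above, each statement uses a single option. Filtering as early as possible, in the Extract, also reduces the volume of data written to the trail and sent across the network.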
Before GoldenGate can extract data from the database's online redo logs, the relevant data needs to be written to those logs. A number of prerequisites exist to ensure the changed data can be replicated:
Enable supplemental logging.
Setting this at database level, in conjunction with forced logging, overrides any NOLOGGING operations and ensures all changed data is written to the redo logs.
Forces the logging of the full before and after image for updates.
Ensure each source table has a primary key.
GoldenGate requires a primary key to uniquely identify a row.
If the primary key does not exist on the source table, GoldenGate will create its own unique identifier by concatenating all the table columns together. This can be grossly inefficient given the volume of data that needs to be extracted from the redo logs. Ideally, only the primary key plus the changed data (before and after images in the case of an update statement) are required.
It is also advisable to have a primary key defined on your target table(s) to ensure fast lookup when the Replicat recreates and applies the DML statements against the target database. This is particularly important for UPDATE and DELETE operations.
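The supplemental logging prerequisite described above is typically met in two places: minimal supplemental logging at database level (in Oracle SQL, ALTER DATABASE ADD SUPPLEMENTAL LOG DATA) and key-column logging per table from GGSCI. The following GGSCI commands are a minimal sketch of the per-table step; the ggs_admin user and the src.orders table are hypothetical, and the password is left as a placeholder.

```
-- GGSCI commands; user and table names are hypothetical
DBLOGIN USERID ggs_admin, PASSWORD <password>

-- Enable supplemental logging of key columns for one source table
ADD TRANDATA src.orders

-- Confirm that supplemental logging is enabled for the table
INFO TRANDATA src.orders
```

If the table has no primary or unique key, expect all of its columns to be logged, which is the inefficiency described above.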
Initial Load
Initial Load is the process of instantiating the objects on the source database, synchronizing the target database objects with the source and providing the starting point for data replication. The process enables "change synchronization" which keeps track of ongoing transactional changes while the load is being applied. This allows users to continue to change data on the source during the Initial Load process.
The Initial Load can be performed using any of the following methods:
A database load utility such as Import/Export or Data Pump.
An Extract process to write data to files in ASCII format. Replicat then applies the files to the target tables.
An Extract process to write data to files in ASCII format. SQL*Loader (direct load) can be used to load the data into the target tables.
An Extract process that communicates directly with the Replicat process across TCP/IP without using a Collector process or files.
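The last method in the list above, a direct load with no Collector or intermediate files, could be configured along the following lines. This is a sketch under stated assumptions: the group names (eload01, rload01), host name (targethost), Manager port (7809), database user, and table names are all hypothetical, and the source and target tables are assumed to have identical definitions.

```
-- Initial-load Extract parameter file (hypothetical group eload01)
EXTRACT eload01
USERID ggs_admin, PASSWORD <password>
-- Hypothetical target host and Manager port
RMTHOST targethost, MGRPORT 7809
-- RMTTASK asks the target Manager to start the Replicat task directly
RMTTASK REPLICAT, GROUP rload01
TABLE src.orders;

-- Initial-load Replicat parameter file (hypothetical group rload01)
REPLICAT rload01
USERID ggs_admin, PASSWORD <password>
-- Assumes source and target table definitions are identical
ASSUMETARGETDEFS
MAP src.orders, TARGET tgt.orders;
```

Groups of this kind are normally registered in GGSCI as one-time tasks, using ADD EXTRACT with the SOURCEISTABLE option and ADD REPLICAT with the SPECIALRUN option, so that the Extract reads directly from the source tables rather than from the redo logs.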
CSN co-ordination
An Oracle database uses the System Change Number (SCN) to keep track of transactions. For every commit, a new SCN is assigned. The data changes, including the primary key and SCN, are written to the database's online redo logs. Oracle requires these logs for crash recovery, which allows the committed transactions to be recovered (uncommitted transactions are rolled back). GoldenGate leverages this mechanism by reading the online redo logs, extracting the data, and storing the SCN as a series of bytes. The Replicat process replays the data in SCN order when applying data changes on the target database. The GoldenGate manuals refer to the SCN as a CSN (Commit Sequence Number).
Trail file format
GoldenGate's Trail files are in Canonical Format. Backed by checkpoint files for persistence, they store the changed data in a hierarchical form including metadata definitions. The GoldenGate software includes a comprehensive utility named Logdump that has a number of commands to search and view the internal file format.