Chapter 1. Product Overview
The purpose of ETL (Extract, Load, Transform) tools is to help with the consolidation of data that is dispersed throughout the information system. Data is stored in disparate applications, databases, files, operating systems, and in incompatible formats. The consequences of such a dispersal of the information can be dire, for example, different business units operating on different data will show conflicting results and information cannot be shared across different entities of the same business.
Imagine the marketing department reporting on the success of their latest campaign while the finance department complains about its lack of efficiency. Both have numbers to back up their assertions, but the numbers do not match!
What could be worse than a shipping department that struggles to understand customer orders, or a support department that cannot confirm whether a customer is current with his/her payment and should indeed receive support? The examples are endless.
The only way to have a centralized view of the information is to consolidate the data—whether it is in a data warehouse, a series of data marts, or by normalizing the data across applications with master data management (MDM) solutions. ETL tools usually come into play when a large volume of data has to be exchanged (as opposed to Service-Oriented Architecture infrastructures for instance, which would be more transaction based).
In the early days of ETL, databases had very weak transformation functions. Apart from using an insert or a select statement, SQL was a relatively limited language. To perform heavy duty, complex transformations, vendors put together transformation platforms—the ETL tools.
Over time, the SQL language has evolved to include more and more transformation capabilities. You can now go as far as handling hierarchies, manipulating XML formats, using analytical functions, and so on. It is not by chance that 50 percent of the ETL implementations in existence today are done in plain SQL scripts—SQL makes it possible.
This is where the ODI ELT architecture (Extract-Load-Transform—the inversion in the acronym is not a mistake) comes into play. The concept with ELT is that instead of extracting the data from a source, transforming it with a dedicated platform, and then loading into the target database, you will extract from the source, load into the target, then transform into the target database, leveraging SQL for the transformations.
To some extent, ETL and ELT are marketing acronyms. When you look at ODI for instance, it can perform transformations on the source side as well as on the target side. You can also dedicate some database or schema for the staging and transformation of your data, and can have something more similar to an ETL architecture. Similarly, some ETL tools all have the ability to generate SQL code and to push some transformations at the database level.
The key differences then for a true ELT architecture are as follows:
The ability to dynamically manage a staging area (location, content, automatic management of table alterations)
The ability to generate code on source and target systems alike, in the same transformation
The ability to generate native SQL for any database on the market—most ETL tools will generate code for their own engines, and then translate that code for the databases—hence limiting their generation capacities to their ability to convert proprietary concepts
The ability to generate DML and DDL, and to orchestrate sequences of operations on the heterogeneous systems
In a way, the purpose of an ELT tool is to provide the comfort of a graphical interface with all the functionality of traditional ETL tools, to keep the efficiency of SQL coding with set-based processing of data in the database, and limiting the overhead of moving data from place to place.
In this chapter we will focus on the architecture of Oracle Data Integrator 11g, as well as the key concepts of the product. The topics we will cover are as follows:
The elements of the architecture, namely, the repository, the Studio, the Agents, the Console, and integration into Oracle Enterprise Manager
An introduction to key concepts, namely, Execution Contexts, Knowledge Modules, Models, Interfaces, Packages, Scenarios, and Load Plans