Introduction
Databases are broadly used by organizations to store and administer transactional data such as customer service history, bank transactions, purchases and sales, and so on. They also constitute the storage method for data warehouses, the repositories used in Business Intelligence solutions.
In this chapter, you will learn to deal with databases in Kettle. The first recipe tells you how to connect to a database, which is a prerequisite for all the other recipes. The rest of the chapter teaches you how to perform different operations and can be read in any order according to your needs.
Note
The focus of this chapter is on relational databases (RDBMS). Thus the term database is used as a synonym for relational databases through the recipes.
Sample databases
Through the chapter you will use a couple of sample databases. Those databases can be created and loaded by running the scripts available at the book's website. The scripts are ready to run under MySQL.
Note
If you work with a different DBMS you may have to modify the scripts slightly.
For more information about the structure of the sample databases and the meaning of the tables and fields, please refer to Appendix, Data Structures. Feel free to adapt the recipes to different databases. You could try some well known databases; for example Foodmart (available as part of the Mondrian distribution at http://sourceforge.net/projects/mondrian/) or the MySQL sample databases (available at http://dev.mysql.com/doc/index-other.html).
Pentaho BI platform databases
As part of the sample databases used in this chapter you will use the Pentaho BI platform Demo databases. The Pentaho BI Platform Demo is a pre-configured installation that lets you explore the capabilities of the Pentaho platform. It relies on the following databases:
Database name |
Description |
---|---|
hibernate |
Administrative information including user authentication and authorization data. |
quartz |
Repository for Quartz, the scheduler used by Pentaho. |
sampledata |
Data for Steel Wheels, a fictional company that sells all kind of scale replicas of vehicles. |
By default, all those databases are stored in Hypersonic (HSQLDB). The script for creating the databases in HSQLDB can be found at http://sourceforge.net/projects/pentaho/files. Under Business Intelligence Server | 1.7.1-stable look for pentaho_sample_data-1.7.1.zip.
It can also be found at svn://ci.pentaho.com/view/Platform/job/bi-platform-sample-data/.
These databases can be stored in other DBMS as well. Scripts for creating and loading these databases in other popular DBMS as for example MySQL or Oracle can be found in Prashant Raju's blog, at http://www.prashantraju.com/projects/pentaho/.
Beside the scripts, you will find instructions for creating and loading the databases.
Tip
Prashant Raju, an expert Pentaho developer, provides several excellent tutorials related to the Pentaho platform. If you are interested in knowing more about Pentaho, it's worth taking a look at his blog.