
Pentaho Data Integration Quick Start Guide: Create ETL processes using Pentaho


What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want


Getting Started with PDI

Pentaho Data Integration (PDI) is a popular business intelligence tool, used for exploring, transforming, validating, and migrating data, along with other useful operations. PDI allows you to perform all of the preceding tasks thanks to its friendly user interface, modern architecture, and rich functionality. This book will introduce you to the tool, giving you a quick understanding of the daily tasks that you can perform with it.

We will cover the following topics in this chapter:

  • Introducing PDI
  • Installing PDI
  • Configuring the graphical designer tool
  • Creating a simple transformation
  • Understanding the Kettle home directory

Introducing PDI

PDI, also known as Kettle, is a very powerful tool. It can be used for performing typical Extract, Transform, and Load (ETL) processes. PDI gets data from different sources and manipulates it in many ways (deduplicating, filtering, cleaning, and formatting, among others), saving the data in different formats and destinations. The following diagram illustrates a very simple example of an ETL process designed with PDI:

ETL process

Aside from the preceding processes, PDI serves to migrate data between applications, access and manipulate real-time data, access data in the cloud, orchestrate administrative tasks, and more.

Installing PDI

The following are the instructions to install the PDI Community Edition (CE), irrespective of the operating system that you may be using:

  • Make sure that you have JRE 8.0 installed. If you don't, download it from http://www.java.com and install it before proceeding. Make sure that the JAVA_HOME environment variable is set.
  • Open the PDI on SourceForge.net download page.
  • Download the available ZIP file, which will serve you for all platforms.
  • Unzip the downloaded file in a folder of your choice (for example, c:/software/pdi or /home/pdi_user/pdi).
  • Browse your disk and look for the PDI folder that was just created. You will see a folder named data-integration, with several subfolders (lib, plugins, samples, and more) and a bunch of scripts (spoon.bat, pan.bat, and others), which we will soon learn how to use.
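The manual checks above can also be scripted. The following Python sketch is a hypothetical helper (not part of PDI) that reports whether JAVA_HOME is set and whether the unzipped folder contains the data-integration subfolders and launcher scripts mentioned in this section. The list of expected entries is an assumption based on the text, and the sketch does not check the Java version itself.

```python
import os
from pathlib import Path

# Hypothetical helper, not part of PDI: it only automates the manual checks.
EXPECTED_SUBFOLDERS = ("lib", "plugins", "samples")
# Script names are compared case-insensitively (Spoon.bat vs spoon.bat).
EXPECTED_SCRIPTS = ("spoon.bat", "spoon.sh", "pan.bat")


def check_pdi_install(pdi_dir, env=None):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")

    root = Path(pdi_dir) / "data-integration"
    if not root.is_dir():
        problems.append(f"missing folder: {root}")
        return problems

    for name in EXPECTED_SUBFOLDERS:
        if not (root / name).is_dir():
            problems.append(f"missing subfolder: {name}")

    # Collect the lowercase names of the files directly inside data-integration.
    files = {p.name.lower() for p in root.iterdir() if p.is_file()}
    for name in EXPECTED_SCRIPTS:
        if name not in files:
            problems.append(f"missing script: {name}")
    return problems
```

For example, check_pdi_install("/home/pdi_user/pdi") returns an empty list when JAVA_HOME is set and the expected folders and scripts are in place.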

Configuring the graphical designer tool

Spoon is PDI's desktop designer tool. With Spoon, you can design, preview, and test all of your work (that is, transformations and jobs).

Before starting to work with PDI, it's advisable to take a look at the Spoon interface and do some minimal configuration. The instructions are as follows:

  • Start Spoon: If your system is Windows, run Spoon.bat from within the PDI installation directory (the data-integration folder). On other platforms, such as Unix, Linux, and macOS, open a Terminal window, change to that directory, and type ./spoon.sh.
  • The main window will show up, with a Welcome! window already open, as shown in the following screenshot:
Welcome page
The Welcome! page includes some links to web resources, forums, and more, as well as some shortcuts for working with PDI. You can reach that window at any time by navigating to the Help | Welcome Screen option.
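If you prefer to start Spoon from a script, the platform choice described above can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, the installation folder is the example path from the previous section, and on Unix-like systems it assumes spoon.sh kept its execute permission when the ZIP file was unzipped.

```python
import platform
from pathlib import Path


def spoon_launch(pdi_dir, system=None):
    """Return (command, working_directory) for starting Spoon on this OS."""
    system = system or platform.system()  # e.g. "Windows", "Linux", "Darwin"
    root = Path(pdi_dir) / "data-integration"
    if system == "Windows":
        # Batch files are run through the Windows command interpreter.
        return ["cmd", "/c", "Spoon.bat"], root
    # Linux, macOS, and other Unix-like systems use the shell script,
    # run from inside the data-integration folder.
    return ["./spoon.sh"], root
```

To actually start Spoon, you could pass the result to the standard library: cmd, cwd = spoon_launch("/home/pdi_user/pdi") followed by subprocess.Popen(cmd, cwd=cwd). Running the script from its own folder mirrors the manual instructions above.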

In order to customize Spoon, do the following:

  • Click on Options... in the Tools menu. A window appears, where you can change various general characteristics, as follows:
Options

  • Many of the options in this tab will not make sense to you yet. Instead of changing anything here, select the Look & Feel tab:
Look & Feel options
  • Feel free to change any of the options in this tab (for example, the font color or size). Click on the OK button.
  • Restart Spoon to apply the changes.

Creating a simple transformation

Transformations and jobs are the main PDI artifacts. Transformations are data-flow oriented entities, while jobs are task-oriented. In this book, we will start by learning all about transformations, focusing on jobs later. To get a quick idea of what, exactly, a transformation is, we will start by creating a simple one. This will also allow you to see what it's like to work with Spoon.

Our first transformation will find out the current version of PDI (Kettle) and print it to the log. Proceed as follows:

  • On the Welcome page, click on the New transformation link, located under the WORK link group. Alternatively, press Ctrl + N.
  • A new tab will appear, with the title Transformation 1. It's in this tab that you will create your work.
  • To the left of the screen, under the Design tab, you'll see a tree of folders. Expand the Input folder by double-clicking on it.
Note that if you work in macOS, a single click is enough.
  • Then, left-click on the Get System Info icon, and, without releasing the button, drag and drop the selected icon to the work area (that is, the blank area that occupies almost all of the screen). You should see something like this:
Dragging and dropping a step
  • Double-click on the Get System Info icon. A configuration window will show up. Fill in the first row in the grid, as shown in the following screenshot. Note that you don't have to type the Kettle version. Instead, you can choose it from a list of available options:
Configuring the Get System Info step
  • In the Design tab, double-click on the Utility folder, click on the Write to log icon, and drag and drop it to the work area.
  • Put the mouse cursor over the Get System Info icon and wait until a tiny toolbar shows up, as shown in the following screenshot:
Mouseover assistance toolbar
  • Click on the output connector (the icon highlighted in the preceding image) and drag it towards the Write to log icon. A greyed hop is displayed.
  • When the mouse cursor is over the Write to log step, release the button. A link (a hop, from now on) is created, from the first step to the second one. The screen should look as follows:
Connecting steps with a hop
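To make the idea of a data-flow-oriented transformation more concrete, the following Python sketch models the two steps and the hop between them. It is a conceptual model only, not how PDI is implemented: real PDI steps are Java components that run concurrently and exchange rows through buffers. Here each step is a generator, and the hop is simply one step's output passed as the next step's input. The field name kettle_version is hypothetical.

```python
import logging


# Conceptual model only: each "step" is a generator of rows, and the "hop"
# is the first step's output being fed into the second step.

def get_system_info(kettle_version):
    """Like Get System Info with no incoming hop: produce a single row."""
    # "kettle_version" is a hypothetical field name chosen for this sketch.
    yield {"kettle_version": kettle_version}


def write_to_log(rows, log):
    """Like Write to log: log every incoming row and pass it on unchanged."""
    for row in rows:
        log.info("kettle_version = %s", row["kettle_version"])
        yield row


def run_transformation(kettle_version, log):
    hop = get_system_info(kettle_version)  # rows leaving the first step...
    return list(write_to_log(hop, log))    # ...enter the second one
```

The point of the model is the direction of the flow: rows move along the hop from the first step to the second, which is exactly what the arrow drawn in Spoon represents.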

Let's add a colored note to our work, as follows:

  • Right-click anywhere in the work area to bring up a contextual menu.
  • In the menu, select the New Note... option. A note editor will appear.
  • Type a description, such as My first transformation. Select the Font style tab and choose a nice font and some colors for your note, and then click on OK. The following should be the final result:
My first transformation
  • Save the transformation by pressing Ctrl + S. PDI will ask for a destination folder. Select the folder of your choice, and give the transformation a name. PDI will save the transformation as a file with the .ktr extension (for example, sample_transformation.ktr).
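A .ktr file is an XML document, so it can be inspected outside Spoon. The following Python sketch lists the steps and enabled hops of a saved transformation. It assumes a layout in which steps are step elements with name and type children, and hops sit under order/hop with from, to, and enabled children; the sample below is hand-written and heavily trimmed, and the step type identifiers in it are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed example of the assumed .ktr layout; a file
# saved by Spoon contains many more elements (step settings, GUI positions).
SAMPLE_KTR = """<transformation>
  <info><name>sample_transformation</name></info>
  <order>
    <hop><from>Get System Info</from><to>Write to log</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Get System Info</name><type>SystemInfo</type></step>
  <step><name>Write to log</name><type>WriteToLog</type></step>
</transformation>"""


def summarize_ktr(root):
    """Return ([(step name, step type)], [(from step, to step)]) for a .ktr root."""
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    # Only hops marked as enabled ("Y") are reported.
    hops = [(h.findtext("from"), h.findtext("to"))
            for h in root.findall("order/hop")
            if h.findtext("enabled", default="Y") == "Y"]
    return steps, hops
```

For a file saved from Spoon, you could call summarize_ktr(ET.parse("sample_transformation.ktr").getroot()).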

Finally, let's run the transformation to see what happens:

  • Click on the Run icon, located in the transformation toolbar:
Run icon in the transformation toolbar
  • A window named Run Options will appear. Click on Run.
  • At the bottom of the screen, you should see a log with the results of the execution:
Execution Results
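Spoon is not the only way to run a saved transformation: the pan.bat and pan.sh scripts found in the data-integration folder run a .ktr file from the command line. The following Python sketch only builds such a command line; it assumes Pan's -file option (the transformation to run) and -level option (logging detail, for example Basic), and the function name and paths are hypothetical.

```python
import platform
from pathlib import Path


def pan_command(pdi_dir, ktr_file, level="Basic", system=None):
    """Build a Pan command line for running a saved .ktr file outside Spoon."""
    system = system or platform.system()
    root = Path(pdi_dir) / "data-integration"
    if system == "Windows":
        cmd = ["cmd", "/c", str(root / "Pan.bat")]
    else:
        cmd = [str(root / "pan.sh")]
    # -file points to the transformation; -level sets the logging detail.
    return cmd + [f"-file={ktr_file}", f"-level={level}"]
```

The resulting list can be passed to subprocess.run; the execution log then goes to the terminal instead of the Spoon results panel.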

Understanding the Kettle home directory

When you run Spoon for the first time, a folder named .kettle is created in your home directory by default. This folder is referred to as the Kettle home directory.
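The following Python sketch computes where the Kettle home directory is expected to be. Besides the default location described above, it assumes that PDI honors a KETTLE_HOME environment variable: if that variable is set, the .kettle folder is created inside it instead of inside your home directory. Treat this as an assumption to verify against your PDI version; the function names are hypothetical.

```python
import os
from pathlib import Path


def kettle_home(env=None):
    """Return the assumed location of the .kettle folder."""
    env = os.environ if env is None else env
    # Assumption: if KETTLE_HOME is set, .kettle lives inside it;
    # otherwise it lives in the user's home directory.
    base = env.get("KETTLE_HOME") or Path.home()
    return Path(base) / ".kettle"


def kettle_properties(env=None):
    """Return the assumed path of the kettle.properties file."""
    return kettle_home(env) / "kettle.properties"
```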

The .kettle folder contains several configuration files, mainly created and updated by the different PDI tools. Among these files is the kettle.properties file.

The purpose of the kettle.properties file (created along with the .kettle folder the first time you run Spoon) is to hold variable definitions with a broad scope: they are visible to the whole Java Virtual Machine. Therefore, it's the perfect place to define general settings; some examples are as follows:

  • Database connection settings: host, database name, and so on
  • SMTP settings: SMTP server, port, and so on
  • Common input and output folders
  • Directory to send log files to
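kettle.properties follows the Java .properties format, which in its simplest form is one key=value pair per line, with lines starting with # or ! treated as comments. The following Python sketch reads that simple form; it is deliberately simplified and ignores the ':' and whitespace separators, line continuations, and escape sequences that the full format allows. One consequence of those escape rules is worth knowing: a backslash is an escape character in .properties files, so forward slashes are the safer choice for Windows paths.

```python
def parse_properties(text):
    """Parse simple key=value lines; a deliberately simplified .properties reader."""
    props = {}
    for raw in text.splitlines():
        line = raw.lstrip()
        # Blank lines and lines starting with '#' or '!' are comments.
        if not line or line[0] in "#!":
            continue
        # Simplification: only '=' is treated as a separator.
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.lstrip()
    return props
```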

Before continuing, let's add some variables to the file. Suppose that you have two folders, named C:/PDI/INPUT and C:/PDI/OUTPUT, which you will use for storing files. The objective will be to add two variables, named INPUT_FOLDER and OUTPUT_FOLDER, containing those values:

  1. Locate the Kettle home directory, that is, the .kettle folder inside your user home folder. If you work in Windows, the user home folder could be C:\Documents and Settings\<your_name> or C:\Users\<your_name>, depending on which Windows version you have. If you work in Linux (or similar), it will most likely be /home/<your_name>/; on macOS, it is usually /Users/<your_name>/.
  2. Edit the kettle.properties file. You will see that it only contains commented sample lines.
  3. You can safely remove the contents of the file and define your own variables by typing the following lines:
       INPUT_FOLDER=C:/PDI/INPUT
       OUTPUT_FOLDER=C:/PDI/OUTPUT

Save the file and restart Spoon, so that it can recognize the variables defined in the file. We will learn how to use these variables in Chapter 2, Getting Familiar with Spoon.
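As a preview of how such variables are referenced, PDI lets you write a variable as ${VARIABLE_NAME} (it also accepts %%VARIABLE_NAME%%) in fields that support variables, and replaces the reference with the variable's value. The following Python sketch imitates only the ${...} form with a regular expression; it is an illustration, not PDI's own substitution code, and the file name sales.csv is hypothetical.

```python
import re

# Matches the ${NAME} form only; the %%NAME%% form is ignored in this sketch.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def resolve_variables(text, variables):
    """Replace ${NAME} with its value, leaving unknown variables unchanged."""
    return _VARIABLE.sub(lambda m: variables.get(m.group(1), m.group(0)), text)
```

With the two variables defined above, resolve_variables("${INPUT_FOLDER}/sales.csv", variables) would yield C:/PDI/INPUT/sales.csv.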

Summary

In this chapter, you were introduced to Pentaho Data Integration. Specifically, you learned what PDI is, and you installed the tool. You were introduced to Spoon, PDI's graphical designer tool, and you created your first transformation. You were also introduced to the Kettle home directory and the kettle.properties file, which will be used throughout the rest of the book.

In Chapter 2, Getting Familiar with Spoon, you will learn much more about the process of creating, testing, and running transformations in Spoon.


Key benefits

  • Take away the pain of starting with a complex and powerful system
  • Simplify your data transformation and integration work
  • Explore, transform, and validate your data with Pentaho Data Integration

Description

Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use Pentaho Data Integration can be difficult or confusing. This book is the ideal solution: it reduces your learning curve with PDI and provides the guidance needed to make you productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.

Who is this book for?

This book is for software developers, business intelligence analysts, and others involved or interested in developing ETL solutions, or more generally, doing any kind of data manipulation.

What you will learn

  • Design, preview and run transformations in Spoon
  • Run transformations using the Pan utility
  • Understand how to obtain data from different types of files
  • Connect to a database and explore it using the database explorer
  • Understand how to transform data in a variety of ways
  • Understand how to insert data into database tables
  • Design and run jobs for sequencing tasks and sending emails
  • Combine the execution of jobs and transformations

Product Details

Publication date: Aug 30, 2018
Length: 178 pages
Edition: 1st
Language: English
ISBN-13: 9781789342796
Vendor: Pentaho



Frequently bought together


  • Learning Pentaho Data Integration 8 CE: ₹4096.99
  • Pentaho 8 Reporting for Java Developers: ₹4096.99
  • Pentaho Data Integration Quick Start Guide: ₹2457.99
Total: ₹10,651.97

Table of Contents

7 Chapters
Getting Started with PDI
Getting Familiar with Spoon
Extracting Data
Transforming Data
Loading Data
Orchestrating Your Work
Other Books You May Enjoy

Customer reviews

Rating distribution: 3 out of 5 stars (1 rating)
  • 5 star: 0%
  • 4 star: 0%
  • 3 star: 100%
  • 2 star: 0%
  • 1 star: 0%

MetalPesto, Jan 19, 2020: 3 out of 5 stars (verified Amazon review)
The book offers a practice-oriented introduction to using PDI. No more and no less. You can find all of it in the official documentation and through other sources, but if you have no hands-on experience with PDI and want to save some time, this book is a good choice. However, you will look in vain for a comprehensive presentation of all the steps.

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (print + eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment can be made using a credit card, debit card, or PayPal).

Where can I get support for an eBook?
  • If you experience a problem with using or installing Adobe Reader, contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower in price than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.