Pentaho Data Integration Quick Start Guide: Create ETL processes using Pentaho

By María Carina Roldán
Book Aug 2018 178 pages 1st Edition

Chapter 1. Getting Started with PDI

Pentaho Data Integration (PDI) is a popular business intelligence tool, used for exploring, transforming, validating, and migrating data, along with other useful operations. PDI allows you to perform all of the preceding tasks thanks to its friendly user interface, modern architecture, and rich functionality. This book will introduce you to the tool, giving you a quick understanding of the daily tasks that you can perform with it.

We will cover the following topics in this chapter:

  • Introducing PDI
  • Installing PDI
  • Configuring the graphical designer tool
  • Creating a simple transformation
  • Understanding the Kettle home directory

Introducing PDI


PDI, also known as Kettle, is a very powerful tool. It can be used for performing typical Extract, Transform, and Load (ETL) processes. PDI gets data from different sources and manipulates it in many ways (deduplicating, filtering, cleaning, and formatting, among others), saving the data in different formats and destinations. The following diagram illustrates a very simple example of an ETL process designed with PDI:

ETL process

Beyond these ETL processes, PDI can also be used to migrate data between applications, access and manipulate real-time data, access data in the cloud, orchestrate administrative tasks, and more.
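To make the extract, transform, and load stages concrete, here is a minimal plain-Python sketch of the kind of work the diagram describes. It is not PDI code and not how PDI works internally; it only illustrates the operations named above (cleaning, filtering, deduplicating, formatting) on a hypothetical CSV source, and loads the result into a different format (JSON).

```python
import csv
import io
import json

# Hypothetical CSV source; in PDI, rows like these would come from an input step.
SOURCE = """name,country,amount
 Alice ,AR,10.5
Bob,UY,
 Alice ,AR,10.5
Carol,ar,7
"""

def extract(text):
    """Extract: parse the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean, filter, deduplicate, and format the rows."""
    seen = set()
    result = []
    for row in rows:
        # Clean: trim whitespace and normalize the country code.
        name = row["name"].strip()
        country = row["country"].strip().upper()
        amount = row["amount"].strip()
        # Filter: drop rows without an amount.
        if not amount:
            continue
        # Deduplicate: skip rows that were already seen.
        key = (name, country, amount)
        if key in seen:
            continue
        seen.add(key)
        # Format: store the amount as a number rounded to two decimals.
        result.append({"name": name, "country": country, "amount": round(float(amount), 2)})
    return result

def load(rows):
    """Load: serialize the rows into a different format (JSON)."""
    return json.dumps(rows)

if __name__ == "__main__":
    print(load(transform(extract(SOURCE))))
```

In PDI, each of these functions would roughly correspond to one or more steps drawn in the graphical designer instead of code.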

Installing PDI


The following are the instructions to install the PDI Community Edition (CE), irrespective of the operating system that you may be using:

  • Make sure that you have JRE 8.0 installed.

Note

If you don't have JRE 8.0 installed, download it from http://www.java.com and install it before proceeding. Make sure that the JAVA_HOME system variable is set.

  • Go to the PDI download page on SourceForge.net.
  • Download the available ZIP file, which works on all platforms.
  • Unzip the downloaded file in a folder of your choice (for example, c:/software/pdi or /home/pdi_user/pdi).
  • Browse your disk and look for the PDI folder that was just created. You will see a folder named data-integration, with several subfolders (lib, plugins, samples, and more) and a bunch of scripts (spoon.bat, pan.bat, and others), which we will soon learn how to use.
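Before launching anything, the prerequisites above can be checked with a short script. This is a sketch, not part of PDI: it reads JAVA_HOME, parses the output of `java -version` to find the major version (Java 8 reports itself as "1.8.x"), and lists which of the expected entries are missing from the unzipped data-integration folder. The folder path in the example is the hypothetical location used in the steps above; adjust it to yours.

```python
import os
import re
import subprocess
from pathlib import Path

def java_major_version(version_output):
    """Extract the Java major version from `java -version` output.

    Java 8 and earlier report "1.x" (for example "1.8.0_181" means Java 8);
    Java 9 and later start with the major version (for example "11.0.2").
    """
    match = re.search(r'"(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"unrecognized Java version output: {version_output!r}")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major

def missing_pdi_entries(pdi_dir):
    """List the expected entries that are absent from a data-integration folder."""
    expected = ["lib", "plugins", "samples", "spoon.bat", "spoon.sh", "pan.bat"]
    root = Path(pdi_dir)
    return [name for name in expected if not (root / name).exists()]

if __name__ == "__main__":
    print("JAVA_HOME:", os.environ.get("JAVA_HOME") or "NOT SET")
    try:
        # `java -version` prints its banner to standard error, not standard output.
        banner = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
        print("Java major version:", java_major_version(banner))
    except (FileNotFoundError, ValueError) as exc:
        print("Could not determine the Java version:", exc)
    # Hypothetical location; point this at your own unzipped data-integration folder.
    print("Missing entries:", missing_pdi_entries("/home/pdi_user/pdi/data-integration") or "none")
```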

Configuring the graphical designer tool


Spoon is PDI's desktop designer tool. With Spoon, you can design, preview, and test all of your work (that is, transformations and jobs).

Before starting to work with PDI, it's advisable to take a look at the Spoon interface and do some minimal configuration. The instructions are as follows:

  • Start Spoon: If your system is Windows, run spoon.bat from within the PDI installation directory. On other platforms, such as Linux or macOS, open a Terminal window, change to the PDI installation directory, and run ./spoon.sh.
  • The main window will show up, with a Welcome! window already open, as shown in the following screenshot:

Welcome page

Note

The Welcome! page includes some links to web resources, forums, and more, as well as some shortcuts for working with PDI. You can reach that window at any time by navigating to the Help | Welcome Screen option.
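Since the launcher script depends on the operating system, a small helper can pick the right one. This is a hedged sketch rather than an official PDI tool: it chooses spoon.bat on Windows and spoon.sh elsewhere, and starts Spoon with the installation folder as the working directory. It assumes spoon.sh is executable (run `chmod +x spoon.sh` if it is not).

```python
import platform
import subprocess
from pathlib import Path

def spoon_command(pdi_dir, system=None):
    """Return the command that launches Spoon for the given platform.

    Windows uses spoon.bat; other platforms (Linux, macOS, and so on) use spoon.sh.
    """
    system = system or platform.system()
    root = Path(pdi_dir)
    if system == "Windows":
        # .bat files are run through the Windows command interpreter.
        return ["cmd", "/c", str(root / "spoon.bat")]
    # Assumes spoon.sh has its executable bit set.
    return [str(root / "spoon.sh")]

def start_spoon(pdi_dir):
    """Start Spoon without waiting for it, using the PDI folder as working directory."""
    return subprocess.Popen(spoon_command(pdi_dir), cwd=pdi_dir)
```

For example, `start_spoon("/home/pdi_user/pdi/data-integration")` would open the designer from the hypothetical folder used earlier.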

In order to customize Spoon, do the following:

  • Click on Options... in the Tools menu. A window appears, where you can change various general characteristics, as follows:

Options

 

  • Many of the options in this tab will not make sense to you yet, so leave them as they are and select the Look & Feel tab:

Look & Feel options

  • Feel free to change any of the options in this tab (for example, the font color or size). Click on the OK button.
  • Restart Spoon to apply the changes.

Creating a simple transformation


Transformations and jobs are the main PDI artifacts. Transformations are data-flow oriented entities, while jobs are task-oriented. In this book, we will start by learning all about transformations, focusing on jobs later. To get a quick idea of what, exactly, a transformation is, we will start by creating a simple one. This will also allow you to see what it's like to work with Spoon.
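As a rough analogy only (this is not how PDI's engine is implemented), the difference can be pictured in Python: a transformation streams rows through a chain of steps, each consuming what the previous one produces, while a job, in its simplest form, runs tasks one after another. All step and task names below are hypothetical, and the stop-on-failure rule is a simplification chosen for the sketch.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict

def run_transformation(source: Iterable[Row],
                       steps: List[Callable[[Iterator[Row]], Iterator[Row]]]) -> List[Row]:
    """Data-flow oriented: each step consumes the rows produced by the previous one."""
    stream: Iterator[Row] = iter(source)
    for step in steps:
        stream = step(stream)
    return list(stream)

def run_job(tasks: List[Callable[[], bool]]) -> List[str]:
    """Task oriented: run tasks in order, stopping at the first one that fails."""
    executed = []
    for task in tasks:
        executed.append(task.__name__)
        if not task():
            break
    return executed

# Hypothetical transformation steps.
def uppercase_names(rows):
    for row in rows:
        yield {**row, "name": row["name"].upper()}

def only_positive(rows):
    return (row for row in rows if row["amount"] > 0)

# Hypothetical job tasks.
def check_input_folder():
    return True

def send_report():
    return False  # simulate a failing task

def archive_files():
    return True
```

Running `run_job([check_input_folder, send_report, archive_files])` executes the first two tasks and never reaches the third, while `run_transformation` keeps passing rows from step to step.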

Our first transformation will find out the current version of PDI (Kettle), and will print the value to the log. Proceed as follows:

  • On the Welcome page, click on the New transformation link, located under the WORK link group. Alternatively, press Ctrl + N.
  • A new tab will appear, with the title Transformation 1. It's in this tab that you will create your work.
  • To the left of the screen, under the Design tab, you'll see a tree of folders. Expand the Input folder by double-clicking on it.

Note

Note that if you work in macOS, a single click is enough.

  • Then, left-click on the Get System Info icon, and, without releasing the button, drag and drop the selected icon to the work area (that is, the blank area that occupies almost all of the screen). You should see something like this:

Dragging and dropping a step

  • Double-click on the Get System Info icon. A configuration window will show up. Fill in the first row in the grid, as shown in the following screenshot. Note that you don't have to type the Kettle version. Instead, you can choose it from a list of available options:

Configuring the Get System Info step

  • In the Design tab, double-click on the Utility folder, click on the Write to log icon, and drag and drop it to the work area.
  • Put the mouse cursor over the Get System Info icon and wait until a tiny toolbar shows up, as shown in the following screenshot:

Mouseover assistance toolbar

  • Click on the output connector (the icon highlighted in the preceding image) and drag it towards the Write to log icon. A greyed hop is displayed.
  • When the mouse cursor is over the Write to log step, release the button. A link, called a hop, is created from the first step to the second one. The screen should look as follows:

Connecting steps with a hop

Let's add a colored note to our work, as follows:

  • Right-click anywhere in the work area to bring up a contextual menu.
  • In the menu, select the New Note... option. A note editor will appear.
  • Type a description, such as My first transformation. Select the Font style tab and choose a nice font and some colors for your note, and then click on OK. The following should be the final result:

My first transformation

  • Save the transformation by pressing Ctrl + S. PDI will ask for a destination folder. Select the folder of your choice and give the transformation a name. PDI will save the transformation as a file with the .ktr extension (for example, sample_transformation.ktr).
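A .ktr file is XML, so it can be inspected with standard tools. The following sketch uses Python's xml.etree.ElementTree to list the steps and hops of a transformation. It assumes a simplified layout in which each step is a `<step>` element with `<name>` and `<type>` children and each hop is a `<hop>` element with `<from>` and `<to>` children under `<order>`; the sample XML is a hand-written, heavily trimmed illustration, and the type codes SystemInfo and WriteToLog are assumptions, not copied from a real saved file.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed sample; a .ktr saved by Spoon contains many more elements.
SAMPLE_KTR = """<transformation>
  <info><name>sample_transformation</name></info>
  <order>
    <hop><from>Get System Info</from><to>Write to log</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Get System Info</name><type>SystemInfo</type></step>
  <step><name>Write to log</name><type>WriteToLog</type></step>
</transformation>"""

def summarize(root):
    """Return the transformation name, its (step, type) pairs, and its (from, to) hops."""
    name = root.findtext("info/name")
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops

def summarize_ktr_file(path):
    """Parse a .ktr file from disk and summarize it."""
    return summarize(ET.parse(path).getroot())

if __name__ == "__main__":
    print(summarize(ET.fromstring(SAMPLE_KTR)))
```

To inspect your own work, call `summarize_ktr_file("sample_transformation.ktr")` with the path where you saved the transformation.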

Finally, let's run the transformation to see what happens:

  • Click on the Run icon, located in the transformation toolbar:

Run icon in the transformation toolbar

  • A window named Run Options will appear. Click on Run.
  • At the bottom of the screen, you should see a log with the results of the execution:

Execution Results
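In data-flow terms, the transformation you just ran is two steps joined by one hop. The following Python simulation (not PDI code) mimics it: the first step emits a single row with a version field, and the hop feeds that row to a second step that writes it to the log. The field name kettle_version and the value X.Y.Z are placeholders; PDI reports the actual version of your installation.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s - %(message)s")

def get_system_info(_rows):
    """Source step: ignores its input and emits one row (placeholder version value)."""
    yield {"kettle_version": "X.Y.Z"}

def write_to_log(rows):
    """Writes each incoming row to the log and passes it downstream unchanged."""
    logger = logging.getLogger("Write to log")
    for row in rows:
        logger.info("kettle_version = %s", row["kettle_version"])
        yield row

STEPS = {"Get System Info": get_system_info, "Write to log": write_to_log}
# A hop links the output of one step to the input of the next.
HOPS = [("Get System Info", "Write to log")]

def run(steps, hops, first_step):
    """Follow the hops from first_step, feeding each step's rows to the next."""
    next_step = dict(hops)  # enough for a linear chain like this one
    name, stream = first_step, iter(())
    while name is not None:
        stream = steps[name](stream)
        name = next_step.get(name)
    return list(stream)

if __name__ == "__main__":
    run(STEPS, HOPS, "Get System Info")
```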

Understanding the Kettle home directory


When you run Spoon for the first time, a folder named .kettle is created in your home directory by default. This folder is referred to as the Kettle home directory.

The folder contains several configuration files, mainly created and updated by the different PDI tools. Among these files is kettle.properties.

The kettle.properties file, created along with the .kettle folder the first time you run Spoon, holds variable definitions with a broad scope: the whole Java Virtual Machine (JVM). This makes it the perfect place to define general settings, such as the following:

  • Database connection settings: host, database name, and so on
  • SMTP settings: SMTP server, port, and so on
  • Common input and output folders
  • Directory to send log files to
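To find kettle.properties without browsing your disk, the expected location of the Kettle home directory can be computed. This sketch follows the default described above (a .kettle folder in your home directory). The KETTLE_HOME override it also handles is an assumption about how PDI is commonly configured, not something covered in this chapter, so verify it against your PDI version.

```python
import os
from pathlib import Path

def kettle_home(env=None):
    """Return the expected path of the Kettle home directory (.kettle).

    Default: <home directory>/.kettle. Assumption: if KETTLE_HOME is set, PDI
    looks for .kettle inside that directory instead of the user's home.
    """
    env = os.environ if env is None else env
    base = env.get("KETTLE_HOME")
    return (Path(base) if base else Path.home()) / ".kettle"

if __name__ == "__main__":
    properties = kettle_home() / "kettle.properties"
    print(properties, "exists" if properties.exists() else "not found (run Spoon once first)")
```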

Before continuing, let's add some variables to the file. Suppose that you have two folders, named C:/PDI/INPUT and C:/PDI/OUTPUT, which you will use for storing files. The objective will be to add two variables, named INPUT_FOLDER and OUTPUT_FOLDER, containing those values:

  1. Locate the Kettle home directory, that is, the .kettle folder inside your home directory. On Windows, your home directory could be C:\Documents and Settings\<your_name> or C:\Users\<your_name>, depending on which Windows version you have. On Linux (or similar), it will most likely be /home/<your_name>/, and on macOS, /Users/<your_name>/.
  2. Edit the kettle.properties file. You will see that it only contains commented sample lines.
  3. You can safely remove the contents of the file and define your own variables by typing the following lines:
       INPUT_FOLDER=C:/PDI/INPUT
       OUTPUT_FOLDER=C:/PDI/OUTPUT

Save the file and restart Spoon, so that it can recognize the variables defined in the file. We will learn how to use these variables in Chapter 2, Getting Familiar with Spoon.
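If you prefer to script steps 2 and 3, the following sketch adds or updates variables in kettle.properties while keeping any other lines (including comments) intact. It handles only simple key=value lines; the full Java .properties format is richer (escapes, `:` separators, line continuations). Because a backslash is an escape character in that format, forward slashes, as in C:/PDI/INPUT, are the safer choice for paths. Back up the file before editing it, and restart Spoon afterwards.

```python
from pathlib import Path

def read_properties(path):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def set_properties(path, new_values):
    """Add or update variables in a properties file, keeping other lines as they are."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines() if file.exists() else []
    pending = dict(new_values)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        key = stripped.split("=", 1)[0].strip()
        if key in pending:
            # Replace an existing definition in place.
            lines[i] = f"{key}={pending.pop(key)}"
    # Append the variables that were not already defined.
    lines.extend(f"{k}={v}" for k, v in pending.items())
    file.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For the two folders in this example: `set_properties(Path.home() / ".kettle" / "kettle.properties", {"INPUT_FOLDER": "C:/PDI/INPUT", "OUTPUT_FOLDER": "C:/PDI/OUTPUT"})`.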

 

 

Summary


In this chapter, you were introduced to Pentaho Data Integration. Specifically, you learned what PDI is, and you installed the tool. You were introduced to Spoon, PDI's graphical designer tool, and you created your first transformation. You were also introduced to the Kettle home directory and the kettle.properties file, which will be used throughout the rest of the book.

In Chapter 2, Getting Familiar with Spoon, you will learn much more about the process of creating, testing, and running transformations in Spoon.


Key benefits

  • Take away the pain of starting with a complex and powerful system
  • Simplify your data transformation and integration work
  • Explore, transform, and validate your data with Pentaho Data Integration

Description

Pentaho Data Integration (PDI) is an intuitive, graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use PDI can be difficult or confusing. This book reduces your learning curve with PDI. It provides the guidance needed to make you productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers. By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.

What you will learn

  • Design, preview, and run transformations in Spoon
  • Run transformations using the Pan utility
  • Understand how to obtain data from different types of files
  • Connect to a database and explore it using the database explorer
  • Understand how to transform data in a variety of ways
  • Understand how to insert data into database tables
  • Design and run jobs for sequencing tasks and sending emails
  • Combine the execution of jobs and transformations

Product Details

Publication date : Aug 30, 2018
Length 178 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789343328
Vendor : Pentaho

Table of Contents

Title Page
Copyright and Credits
Dedication
Packt Upsell
Foreword
Contributors
Preface
1. Getting Started with PDI
2. Getting Familiar with Spoon
3. Extracting Data
4. Transforming Data
5. Loading Data
6. Orchestrating Your Work
Other Books You May Enjoy
Index
