Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learning Pentaho Data Integration 8 CE

You're reading from   Learning Pentaho Data Integration 8 CE An end-to-end guide to exploring, transforming, and integrating your data across multiple sources

Arrow left icon
Product type Paperback
Published in Dec 2017
Publisher Packt
ISBN-13 9781788292436
Length 500 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
María Carina Roldán María Carina Roldán
Author Profile Icon María Carina Roldán
María Carina Roldán
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Getting Started with Pentaho Data Integration FREE CHAPTER 2. Getting Started with Transformations 3. Creating Basic Task Flows 4. Reading and Writing Files 5. Manipulating PDI Data and Metadata 6. Controlling the Flow of Data 7. Cleansing, Validating, and Fixing Data 8. Manipulating Data by Coding 9. Transforming the Dataset 10. Performing Basic Operations with Databases 11. Loading Data Marts with PDI 12. Creating Portable and Reusable Transformations 13. Implementing Metadata Injection 14. Creating Advanced Jobs 15. Launching Transformations and Jobs from the Command Line 16. Best Practices for Designing and Deploying a PDI Project

Introducing transformations

Till now, you've just opened and customized the look and feel of Spoon. It's time to do some interesting tasks beyond looking around. As mentioned before, in PDI we basically work with two kinds of artifacts: transformations and jobs. In this section, we will introduce transformations. First of all, we will introduce some basic definitions. Then, we will design, preview, and run our first Transformation.

The basics about transformations

A Transformation is an entity made of steps linked by hops. These steps and hops build paths through which data flows: the data enters or is created in a step, the step applies some kind of Transformation to it, and finally, the data leaves that step. Therefore, it's said that a Transformation is data flow oriented. Graphically, steps are represented with small boxes, while hops are represented by directional arrows, as depicted in the following sample:

Steps and hops

A Transformation itself is neither a program nor an executable file. It is just plain XML. The Transformation contains metadata, which tells the Kettle engine what to do.

A step is a minimal unit inside a Transformation. A big set of steps is available, either out of the box or the Marketplace, as explained before. These steps are grouped in categories, as, for example, input, output, or transform. Each step is conceived to accomplish a specific function, going from a simple task as reading a parameter to normalizing a dataset.

A hop is a graphical representation of data flowing between two steps: an origin and a destination. The data that flows through that hop constitutes the output data of the origin step and the input data of the destination step.

That's enough theory for now. Let's see it in practice.

Creating a Hello World! Transformation

In this section, we will design, preview, and run a simple Hello World! Transformation; simple, but good enough for our first practical example.

Designing a Transformation

Here are the steps to start working on our very first Transformation. All you need for starting is to have PDI installed:

  1. Open Spoon.From the main menu and navigate to File | New | Transformation.
  2. On the left of the screen, under the Design tab, you'll see a tree of Steps. Expand the Input branch by double-clicking on it.
Note that if you work in Mac OS, a single click is enough.
  1. Then, left-click on the Data Grid icon and without releasing the button, drag and drop the selected icon to the main canvas. The screen will look like the following screenshot:
Dragging and dropping a step
The dotted grid appeared as a consequence of the changes we made in the options window. Also, note that we changed the preferred language back to English.
  1. Double-click on the Data Grid step you just put on the canvas, and fill the Meta tab as follows:
Configuring a metadata tab
  1. Now select the Data tab and fill the grid with some names, as in the following screenshot. Then click on OK to close the window:
Filling a Data tab
  1. From the Steps tree, double-click on the Scripting branch, click on the User Defined Java Expression icon, and drag and drop it to the main canvas.
  2. Put the mouse cursor over the Data Grid step and wait until a tiny toolbar shows up succeeding the Data Grid icon, as shown next:
Mouseover assistance toolbar
  1. Click on the output connector (the icon highlighted in the preceding image) and drag it towards the User Defined Java Expression (UDJE) step. A greyed hop is displayed.
  1. When the mouse cursor is over the UDJE step, release the button. A link—a hop from now on is created from the Data Grid step to the UDJE step. The screen should look like this:
Connecting steps with a hop
  1. Double-click the UDJE icon and fill the grid as shown. Then close the window:
Configuring a UDJE step

Done! We have a draft for our first Transformation. A Data Grid with the names of a list of people, and a script step that builds the hello_message.

Before continuing, let's just add some color note to our work. This is totally optional, but as your work gets more complicated, it's highly recommended that you comment your transformations:

  1. Right-click anywhere on the canvas to bring a contextual menu.
  2. In the menu, select the New note option. A note editor appears.
  3. Type some description, such as Hello, World!. Select the Font style tab and choose some nice font and colors for your note, and then click on OK. This should be the final result:
Hello World Transformation

The final step is to save the work:

  1. From the main menu, navigate to Edit | Settings.... A window appears to specify Transformation properties. Fill the Transformation name textbox with a simple name, such as hello world. Fill the Description textbox with a short description, such as My first transformation. Finally, provide a more clear explanation in the Extended description textbox, and then click on OK.
  1. From the main menu, navigate to File | Save and save the Transformation in a folder of your choice with the name hello_world.

Next step is to preview the data produced and run the Transformation.

Previewing and running a Transformation

Now we will preview and run the Transformation created earlier. Note the difference between both:

  • The Preview functionality allows you to see a sample of the data produced for selected steps
  • The Run option effectively runs the whole Transformation

In our Transformation, we will preview the output of the User Defined Java Expression step:

  1. Select the User Defined Java Expression step by left-clicking on it.
  2. Click on the Preview icon in the bar menu preceding in the main canvas: 
Preview icon in the Transformation toolbar
  1. The Transformation debug dialog window will appear. Click on the Quick Launch button.
  1. A window will appear to preview the data generated by the Transformation, as shown in the following screenshot:
Previewing the Hello World Transformation
  1. Close the preview window.
You can preview the output of any step in the Transformation at any time of your designing process. You can also preview the data even if you haven't yet saved the work.

Once we have the Transformation ready, we can run it:

  1. Click on the Run icon:
Run icon in the Transformation toolbar
  1. A window named Run Options appears. Click on Run.
You need to save the Transformation before you run it. If you have modified the Transformation without saving it, you will be prompted to do so.
  1. At the bottom of the screen, you should see a log with the result of the execution:
Sample execution results window

Whether you preview or run a Transformation, you'll get an Execution Results window showing what happened. You will learn more about this in Chapter 2, Getting Started with Transformations.

You have been reading a chapter from
Learning Pentaho Data Integration 8 CE - Third Edition
Published in: Dec 2017
Publisher: Packt
ISBN-13: 9781788292436
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image