Pentaho Data Integration Beginner's Guide - Second Edition

Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide

Product type: Paperback
Published in: Oct 2013
Publisher: Packt
ISBN-13: 9781782165040
Length: 502 pages
Edition: 2nd Edition
Author: María Carina Roldán
Table of Contents

Preface
1. Getting Started with Pentaho Data Integration
2. Getting Started with Transformations
3. Manipulating Real-world Data
4. Filtering, Searching, and Performing Other Useful Operations with Data
5. Controlling the Flow of Data
6. Transforming Your Data by Coding
7. Transforming the Rowset
8. Working with Databases
9. Performing Advanced Operations with Databases
10. Creating Basic Task Flows
11. Creating Advanced Transformations and Jobs
12. Developing and Implementing a Simple Datamart
A. Working with Repositories
B. Pan and Kitchen – Launching Transformations and Jobs from the Command Line
C. Quick Reference – Steps and Job Entries
D. Spoon Shortcuts
E. Introducing PDI 5 Features
F. Best Practices
G. Pop Quiz Answers
Index

Time for action – creating a hello world transformation

How about starting by saying hello to the world? It's not really new, but good enough for our first practical example; here are the steps to follow:

  1. Create a folder named pdi_labs under a folder of your choice.
  2. Open Spoon.
  3. From the main menu, navigate to File | New | Transformation.
  4. On the left of the screen, under the Design tab, you’ll see a tree of Steps. Expand the Input branch by double-clicking on it.

    Note

    Note that if you work in Mac OS a single click is enough.

  5. Then, left-click on the Generate Rows icon and without releasing the button, drag-and-drop the selected icon to the main canvas. The screen will look like the following screenshot:
    [Screenshot: the Generate Rows step on the canvas]

    Note

    Note that we changed the preferred language back to English.

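  6. Double-click on the Generate Rows step you just put on the canvas, and fill in the textboxes, including Step name and Limit, and the grid as follows:
    [Screenshot: the Generate Rows configuration window; per the summary later in this section, the step is set to generate 10 rows containing the message Hello World!]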
  7. From the Steps tree, double-click on the Flow branch.
  8. Click on the Dummy (do nothing) icon and drag-and-drop it to the main canvas.
  9. Put the mouse cursor over the Generate Rows step and wait until a tiny toolbar shows up below the entry icon, as shown in the following screenshot:
    [Screenshot: the mouseover toolbar below the Generate Rows step]
  10. Click on the output connector (the last icon in the toolbar), and drag towards the Dummy (do nothing) step. A grayed hop is displayed.
  11. When the mouse cursor is over the Dummy (do nothing) step, release the button. A link—a hop from now on—is created from the Generate Rows step to the Dummy (do nothing) step. The screen should look like the following screenshot:
    [Screenshot: the hop linking Generate Rows to Dummy (do nothing)]
  12. Right-click anywhere on the canvas to bring up a contextual menu.
  13. In the menu, select the New note option. A note editor appears.
  14. Type a description such as Hello, World! Then select the Font style tab, choose a font and colors for your note, and click on OK.
  15. From the main menu, navigate to Edit | Settings.... A window appears where you can specify the transformation properties. Fill the Transformation name textbox with a simple name, such as hello world. Fill the Description textbox with a short description such as My first transformation. Finally, provide a clearer explanation in the Extended description textbox, and then click on OK.
  16. From the main menu, navigate to File | Save.
  17. Save the transformation in the folder pdi_labs with the name hello_world.
  18. Select the Dummy (do nothing) step by left-clicking on it.
  19. Click on the Preview icon in the bar menu above the main canvas. The screen should look like the following screenshot:
    [Screenshot: Spoon after clicking the Preview icon]
  20. The Transformation debug dialog window appears. Click on the Quick Launch button.
  21. A window appears to preview the data generated by the transformation as shown in the following screenshot:
    [Screenshot: the preview window with the generated rows]
  22. Close the preview window and click on the Run icon. The screen should look like the following screenshot:
    [Screenshot: Spoon after clicking the Run icon]
  23. A window named Execute a transformation appears. Click on Launch.
  24. The execution results are shown at the bottom of the screen. The Logging tab should look as follows:
    [Screenshot: the Logging tab of the execution results]

What just happened?

You have just created your first transformation.

First, you created a new transformation, dragged and dropped two steps into the work area, Generate Rows and Dummy (do nothing), and connected them.

With the Generate Rows step you created 10 rows of data with the message Hello World! The Dummy (do nothing) step simply served as a destination for those rows.

After creating the transformation, you did a preview. The preview allowed you to see the content of the created data, that is, the 10 rows with the message Hello World!

Finally, you ran the transformation. You could then see the Execution Results window at the bottom of the screen, where the Logging tab shows the complete detail of what happened. There are other tabs in this window, which you will learn about later in the book.

Directing Kettle engine with transformations

A transformation is an entity made of steps linked by hops. These steps and hops build paths through which data flows: the data enters or is created in a step, the step applies some kind of transformation to it, and finally the data leaves that step. Therefore, a transformation is said to be data-flow oriented.
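As a loose analogy only (this is not how the Kettle engine is implemented), the data-flow idea can be sketched in Python with generators. The function names `generate_rows` and `dummy` are hypothetical stand-ins for the two steps of the hello world example, and the field name `message` is an assumption, since the text does not name the field.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def generate_rows(limit: int, fields: Row) -> Iterator[Row]:
    """Stand-in for a Generate Rows step: emits `limit` identical rows."""
    for _ in range(limit):
        yield dict(fields)  # copy so that each row is an independent object


def dummy(rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for a Dummy (do nothing) step: passes rows through unchanged."""
    for row in rows:
        yield row


def run_hello_world() -> List[Row]:
    # The "hop" here is simply the output of one step fed as input to the next.
    # The field name "message" is an assumption made for this sketch.
    source = generate_rows(limit=10, fields={"message": "Hello World!"})
    return list(dummy(source))


if __name__ == "__main__":
    for row in run_hello_world():
        print(row)
```

Data is created in the first function, travels along the call chain, and leaves through the second one, which mirrors the path that steps and hops build in a transformation.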


A transformation itself is neither a program nor an executable file. It is just plain XML. The transformation contains metadata which tells the Kettle engine what to do.
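Because a saved transformation is plain XML, it can be inspected with any XML parser. The following is a minimal sketch, assuming a hand-written and heavily simplified fragment in the style of a transformation file; the element names and the step type values used here are assumptions for illustration, and a file saved by Spoon contains far more metadata. The helper `describe` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hand-written, simplified fragment (an assumption for this sketch).
# A real file saved by Spoon contains many more elements.
SAMPLE_KTR = """\
<transformation>
  <info>
    <name>hello world</name>
    <description>My first transformation</description>
  </info>
  <order>
    <hop>
      <from>Generate Rows</from>
      <to>Dummy (do nothing)</to>
      <enabled>Y</enabled>
    </hop>
  </order>
  <step>
    <name>Generate Rows</name>
    <type>RowGenerator</type>
  </step>
  <step>
    <name>Dummy (do nothing)</name>
    <type>Dummy</type>
  </step>
</transformation>
"""


def describe(ktr_xml: str) -> Tuple[str, List[str], List[Tuple[str, str]]]:
    """Return the transformation name, its step names, and its hops."""
    root = ET.fromstring(ktr_xml)
    name = root.findtext("info/name", default="")
    steps = [s.findtext("name", default="") for s in root.findall("step")]
    hops = [
        (h.findtext("from", default=""), h.findtext("to", default=""))
        for h in root.findall("order/hop")
    ]
    return name, steps, hops


if __name__ == "__main__":
    print(describe(SAMPLE_KTR))
```

Nothing in the fragment is executable; it only describes which steps exist and how they are linked, which is the metadata the engine reads.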

A step is the minimal unit inside a transformation. A large set of steps is available, grouped in categories such as the Input and Flow categories that you saw in the example.

Each step is conceived to accomplish a specific function, going from reading a parameter to normalizing a dataset.

Each step has a configuration window. These windows vary according to the functionality of the steps and the category to which they belong. What all steps have in common are the name and description:

Step property | Description
Name          | A representative name inside the transformation.
Description   | A brief explanation that allows you to clarify the purpose of the step. It’s not mandatory but it is useful.

A hop is a graphical representation of data flowing between two steps: an origin and a destination. The data that flows through that hop constitutes the output data of the origin step and the input data of the destination step.
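The relationship between steps, their common properties, and hops can be modeled as a small data structure. This is an illustrative sketch rather than PDI's own object model: the class names `Step`, `Hop`, and `Transformation` and the helper `build_hello_world` are hypothetical, and the rule that step names must be unique within a transformation is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    name: str              # representative name inside the transformation
    description: str = ""  # optional, but useful


@dataclass(frozen=True)
class Hop:
    origin: str       # step whose output data flows through the hop
    destination: str  # step that receives that data as its input


@dataclass
class Transformation:
    name: str
    steps: Dict[str, Step] = field(default_factory=dict)
    hops: List[Hop] = field(default_factory=list)

    def add_step(self, step: Step) -> None:
        # Assumption of this sketch: step names are unique.
        if step.name in self.steps:
            raise ValueError(f"duplicate step name: {step.name}")
        self.steps[step.name] = step

    def add_hop(self, origin: str, destination: str) -> None:
        # A hop only makes sense between two steps that already exist.
        for end in (origin, destination):
            if end not in self.steps:
                raise ValueError(f"unknown step: {end}")
        self.hops.append(Hop(origin, destination))


def build_hello_world() -> Transformation:
    t = Transformation(name="hello world")
    t.add_step(Step("Generate Rows", "Creates the sample rows"))
    t.add_step(Step("Dummy (do nothing)"))
    t.add_hop("Generate Rows", "Dummy (do nothing)")
    return t


if __name__ == "__main__":
    print(build_hello_world())
```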

Exploring the Spoon interface

As you just saw, Spoon is the tool with which you create, preview, and run transformations. The following screenshot shows you the basic work areas: Main menu, Design view, Transformation toolbar, and Canvas (work area):

[Screenshot: the Spoon work areas: Main menu, Design view, Transformation toolbar, and Canvas (work area)]

Note

The words canvas and work area will be used interchangeably throughout the book.

There is also an area named View that shows the structure of the transformation currently being edited. You can see that area by clicking on the View tab at the upper-left corner of the screen:

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com . If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

[Screenshot: the View tab showing the structure of the transformation]

Designing a transformation

In the previous section, you designed a very simple transformation, with just two steps and one explanatory note. You learned to link steps by using the mouseover assistance toolbar. There are alternative ways to do the same thing, and you can use whichever you feel most comfortable with. Appendix D, Spoon Shortcuts, explains all of the different options. It also lists shortcuts for zooming in and out and for aligning steps, among others. These shortcuts are very useful as your transformations become more complex.

Note

Appendix F, Best Practices, explains the benefit of using shortcuts as well as other best practices that are invaluable when you work with Spoon, especially when you have to design and develop big ETL projects.

Running and previewing the transformation

The Preview functionality allows you to see a sample of the data produced for selected steps. In the previous example, you previewed the output of the Dummy (do nothing) step.

The Run icon effectively runs the whole transformation.

Whether you preview or run a transformation, you’ll get an Execution Results window showing what happened. You will learn more about this in the next chapter.

Pop quiz – PDI basics

Q1. There are several graphical tools in PDI, but Spoon is the most used.

  1. True.
  2. False.

Q2. You can choose to save transformations either in files or in a database.

  1. True.
  2. False.

Q3. To run a transformation, an executable file has to be generated from Spoon.

  1. True.
  2. False.

Q4. The grid size option in the Look & Feel window allows you to resize the work area.

  1. True.
  2. False.

Q5. To create a transformation you have to provide external data (for example, a text file, a spreadsheet, a database, and so on).

  1. True.
  2. False.