You have just created your first transformation.
First, you created a new transformation, dragged-and-dropped into the work area two steps: Generate Rows and Dummy (do nothing), and connected them.
With the Generate Rows step you created 10 rows of data with the message Hello World!
The Dummy (do nothing) step simply served as a destination of those rows.
After creating the transformation, you did a preview. The preview allowed you to see the content of the created data, this is, the 10 rows with the message Hello World!
Finally, you run the transformation. Then you could see at the bottom of the screen the Execution Results window, where a Logging tab shows the complete detail of what happened. There are other tabs in this window which you will learn later in the book.
Directing Kettle engine with transformations
A transformation is an entity made of steps linked by hops. These steps and hops build paths through which data flows—the data enters or is created in a step, the step applies some kind of transformation to it, and finally the data leaves that step. Therefore, it’s said that a transformation is data flow oriented.
A transformation itself is neither a program nor an executable file. It is just plain XML. The transformation contains metadata which tells the Kettle engine what to do.
A step is the minimal unit inside a transformation. A big set of steps is available. These steps are grouped in categories such as the Input and Flow categories that you saw in the example.
Each step is conceived to accomplish a specific function, going from reading a parameter to normalizing a dataset.
Each step has a configuration window. These windows vary according to the functionality of the steps and the category to which they belong. What all steps have in common are the name and description:
A hop
is a graphical representation of data flowing between two steps: an origin and a destination. The data that flows through that hop constitute the output data of the origin step and the input data of the destination step.
Exploring the Spoon interface
As you just saw, Spoon is the tool with which you create, preview, and run transformations. The following screenshot shows you the basic work areas: Main menu, Design view, Transformation toolbar, and Canvas (work area):
Note
The words canvas and work area will be used interchangeably throughout the book.
There is also an area named View that shows the structure of the transformation currently being edited. You can see that area by clicking on the View tab at the upper-left corner of the screen:
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com . If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Designing a transformation
In the earlier section, you designed a very simple transformation, with just two steps and one explanatory note. You learned to link steps by using the mouseover assistance toolbar. There are alternative ways to do the same thing. You can use the one that you feel more comfortable with. Appendix D,
Spoon Shortcuts explains all of the different options to you. It also explains a lot of shortcuts to zoom in and out, align the steps, among others. These shortcuts are very useful as your transformations become more complex.
Note
Appendix F, Best Practices, explains the benefit of using shortcuts as well as other best practices that are invaluable when you work with Spoon, especially when you have to design and develop big ETL projects.
Running and previewing the transformation
The Preview functionality allows you to see a sample of the data produced for selected steps. In the previous example, you previewed the output of the Dummy (do nothing) step.
The Run icon effectively runs the whole transformation.
Whether you preview or run a transformation, you’ll get an Execution Results window showing what happened. You will learn more about this in the next chapter.
Q1. There are several graphical tools in PDI, but Spoon is the most used.
Q2. You can choose to save transformations either in files or in a database.
Q3. To run a transformation, an executable file has to be generated from Spoon.
Q4. The grid size option in the Look & Feel window allows you to resize the work area.
Q5. To create a transformation you have to provide external data (that is, text file, spreadsheet, database, and so on).