Learning Pentaho Data Integration 8 CE: An end-to-end guide to exploring, transforming, and integrating your data across multiple sources, Third Edition

Getting Started with Transformations

In the previous chapter, you used the graphical designer Spoon to create your first Transformation, Hello World. Now you're ready to begin transforming data, and at the same time get familiar with the Spoon environment.

In this chapter, you will:

  • Learn the simplest ways of transforming data
  • Get familiar with the process of designing, debugging, and testing a Transformation
  • Explore the available features for running transformations from Spoon
  • Learn basic PDI terminology related to data and metadata
  • Get an introduction to handling runtime errors

Designing and previewing transformations

In the previous chapter, you created a simple Transformation, previewed the data, and also ran the Transformation. That gave you your first contact with the PDI graphical designer. In this section, you will become more familiar with the editing features, experiment with the Preview option in detail, and deal with errors that may appear as you develop and test a Transformation.

Getting familiar with editing features

Editing transformations with Spoon can be very time-consuming if you're not familiar with the editing facilities that the software offers. In this section, you will learn a bit more about three editing features that you already encountered in the previous...

Understanding PDI data and metadata

By now, you have already created three transformations and must have an idea of what a dataset is, the kind of data types that PDI supports, and how data is modified as it goes through the path of steps and hops. This section will provide you with a deeper understanding of these concepts:

  • We will give formal definitions for PDI basic terminology related to data and metadata
  • We will also give you a practical list of steps that will expand your toolbox for transforming data

Understanding the PDI rowset

Transformations deal with datasets or rowsets, that is, rows of data with predefined metadata. The metadata tells us about the structure of data, that is, the list of fields as well...
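
As a rough illustration of the rowset idea, the following Python sketch models rows of data that share one predefined list of fields. It is only an analogy and not PDI code: the class names `FieldMeta` and `RowSet`, the `type_name` attribute, and the type labels used in the example are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(frozen=True)
class FieldMeta:
    """Describes one field: its name and a data type label (hypothetical)."""
    name: str
    type_name: str  # illustrative label such as "String" or "Integer"


@dataclass
class RowSet:
    """Rows of data that all share one predefined list of fields."""
    metadata: List[FieldMeta]
    rows: List[Tuple[Any, ...]] = field(default_factory=list)

    def field_names(self) -> List[str]:
        # The structure of the data is read from the metadata, not the rows.
        return [f.name for f in self.metadata]

    def add_row(self, *values: Any) -> None:
        # Every row must match the structure declared in the metadata.
        if len(values) != len(self.metadata):
            raise ValueError(
                f"expected {len(self.metadata)} values, got {len(values)}"
            )
        self.rows.append(values)


# Usage: declare the metadata once, then add rows that conform to it.
people = RowSet(metadata=[FieldMeta("name", "String"), FieldMeta("age", "Integer")])
people.add_row("Ana", 30)
people.add_row("Luis", 41)
print(people.field_names())
print(people.rows)
```

The point of the sketch is the separation between the description of the data (the field list) and the data itself (the rows); a row that does not fit the declared structure is rejected.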

Handling errors

So far, each time you got an error, you had the opportunity to discover what kind of error it was and fix it. This is quite different from real scenarios, mainly for two reasons:

  • Real data has errors—a fact that cannot be avoided. If you fail to heed it, the transformations that run with test or sample data will probably crash when running with real data.
  • In most cases, your final work is run by an automated process and not by a user from Spoon. Therefore, if a Transformation crashes, there will be nobody who notices and reacts to that situation.

In this section, you will learn the simplest way to trap errors that may occur, avoiding unexpected crashes. This is the first step in the creation of transformations ready to be run in a production environment.
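
The general idea of trapping errors instead of crashing can be sketched in plain Python. This is an analogy under stated assumptions, not the mechanism PDI uses: the function names `run_step` and `parse_amount` and the `amount` field are hypothetical. Rows that fail are diverted to a separate list together with an error description, while the remaining rows continue through the flow.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def run_step(
    rows: Iterable[Row], transform: Callable[[Row], Row]
) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Apply transform to each row; divert failing rows instead of crashing."""
    good: List[Row] = []
    errors: List[Tuple[Row, str]] = []
    for row in rows:
        try:
            good.append(transform(row))
        except (ValueError, KeyError) as exc:
            # Keep the offending row and the reason so it can be inspected later.
            errors.append((row, repr(exc)))
    return good, errors


def parse_amount(row: Row) -> Row:
    """Hypothetical transformation: convert the 'amount' field to a number."""
    return {**row, "amount": float(row["amount"])}


# Usage: one clean row, one malformed value, one row missing the field.
sample = [{"amount": "10.5"}, {"amount": "abc"}, {}]
good_rows, error_rows = run_step(sample, parse_amount)
print(good_rows)
print(error_rows)
```

With this shape, an unattended run finishes even when some rows are bad, and the diverted rows can be logged or written elsewhere for later review.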

Implementing...

Summary

In this chapter, you created several transformations. As you did so, you became more familiar with the design process, including dealing with errors, previewing, and running transformations. You had the opportunity to learn to use several PDI steps, and you also learned how to handle errors that may appear. At the same time, you were introduced to the basic terminology related to data, metadata, and transformations.

Now that you know the basics of data manipulation, it's time to change the focus. In the next chapter, we will introduce a very different, yet core, subject: task flows.


Key benefits

  • Manipulate your data by exploring, transforming, validating, and integrating it using Pentaho Data Integration 8 CE
  • A comprehensive guide exploring the features of Pentaho Data Integration 8 CE
  • Connect to any database engine, explore the databases, and perform all kinds of operations on relational databases

Description

Pentaho Data Integration (PDI) is an intuitive and graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. This book shows and explains the new interactive features of Spoon, the revamped look and feel, and the newest features of the tool, including Transformation and Job Executors and the invaluable Metadata Injection capability. We begin with the installation of the PDI software and then move on to cover all the key PDI concepts. Each chapter introduces new features, enabling you to gradually gain practice with the tool. First, you will learn to do all kinds of data manipulation and work with simple plain files. Then, the book teaches you how to work with relational databases inside PDI. Moreover, you will be given a primer on data warehouse concepts and you will learn how to load data into a data warehouse. Over the course of this book, you will become familiar with its intuitive, graphical, drag-and-drop design environment. By the end of this book, you will have learned everything you need to know in order to meet your data manipulation requirements. In addition, you will be given best practices and advice for designing and deploying your projects.

Who is this book for?

This book is a must-have for software developers, business intelligence analysts, IT students, or anyone involved or interested in developing ETL solutions. If you plan on using Pentaho Data Integration for doing any data manipulation task, this book will help you as well. This book is also a good starting point for data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.

What you will learn

  • Explore the features and capabilities of Pentaho Data Integration 8 Community Edition
  • Install and get started with PDI
  • Learn the ins and outs of Spoon, the graphical designer tool
  • Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, and XML files
  • Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on relational databases
  • Populate a data mart with Pentaho Data Integration
  • Use Pentaho Data Integration to organize files and folders, run daily processes, deal with errors, and more

Product Details

Publication date: Dec 05, 2017
Length: 500 pages
Edition: 3rd
Language: English
ISBN-13: 9781788290074
Vendor: Pentaho

Table of Contents

16 Chapters
Getting Started with Pentaho Data Integration
Getting Started with Transformations
Creating Basic Task Flows
Reading and Writing Files
Manipulating PDI Data and Metadata
Controlling the Flow of Data
Cleansing, Validating, and Fixing Data
Manipulating Data by Coding
Transforming the Dataset
Performing Basic Operations with Databases
Loading Data Marts with PDI
Creating Portable and Reusable Transformations
Implementing Metadata Injection
Creating Advanced Jobs
Launching Transformations and Jobs from the Command Line
Best Practices for Designing and Deploying a PDI Project

Customer reviews

Rating distribution
5 out of 5 (5 ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Mr. T. Mangiacapre, Dec 22, 2017
Rating: 5 out of 5
The book does a great job covering all of the core capabilities and features of the latest PDI (8) release. It flows very well and is written in practical and easily digestible terms. Not only does it help you understand the platform and how to use and configure the core integration steps, but it also teaches you useful techniques and methods for addressing common use cases. The exercises are quite useful but I'd recommend that you go beyond them and explore other steps not covered in the book (there are over 300 of them). Well worth the investment of time and money ... I learned a lot! (Technical Field Enablement Mgr at Pentaho, Hitachi Vantara)
Amazon verified review
Thomas Martens, Apr 11, 2020
Rating: 5 out of 5
The book works through various Pentaho topics in a very example-oriented way. It is ideal for self-study. In addition, the book is written in very simple English.
Amazon verified review
Juan José Ortilles Ruiz, Jan 27, 2018
Rating: 5 out of 5
Maria is a great expert in ETL processes and this book is a great tool to introduce and explore the power of Pentaho Data Integration. A lot of examples and use cases are found in the book and they are really well explained. Really useful.
Amazon verified review
Harijs, Mar 03, 2020
Rating: 5 out of 5
Good, excellent book! From elementary things to more complicated ways of using PDI. Recommended for all who want to be a good specialist in this area.
Amazon verified review
Fábio de Salles, Mar 05, 2018
Rating: 5 out of 5
I am a heavy Pentaho practitioner and because of that I have been following Mrs. Roldan's work since the beginning.

Maria Roldan is a longtime user of Pentaho Data Integration to solve real-life Business Intelligence data problems. Her first book, Pentaho Data Integration 4 Cookbook, taught lots of newcomers how to do important data handling with PDI, from the simplest things like reading a table from a database to processing complex multi-line text files with multilayered structures like XML or JSON. Then it was Pentaho 4.

Again comes Mrs. Roldan to share with us her knowledge about data integration chores with Pentaho Data Integration's newest 8th version, the first Hitachi-Pentaho issue of PDI. And boy, she rocks!

Above all, this is a very polished book: you can note that a lot of thinking has gone into the examples and the explanations. If the first book was somewhat dry and a bit redundant and boring, this one is elegantly laid out and very well written. If for nothing else, the book is a pleasure to read due to its high-quality editing.

Technical books are not bought for their literary value though, but for the knowledge they carry and impart. "Learning PDI 8 CE" has all you need to get up and running with PDI 8 for a lot of your data processing needs. Files, databases, formats, error handling, jobs, calculations, complex data handling: the examples are so numerous that a lazy person would just sum it up as "everything". And if all the how-tos were not enough, there is also some very important and useful advice on how to best use PDI, how to set up a project, and even how to build reusable transformations and write better jobs. There is even help on how to interactively run a Job or Transformation, making it very easy to debug the processes!

No matter how true to its mission the book is, it does not have everything, of course. Missing, for instance, are examples of how to harness PDI's powerful log system in regular ETL processes, nor are there any examples of how to integrate mathematical models (built with R, Weka, or RapidMiner, for instance) into regular data processing and data integration tasks, or even with some other Pentaho tools. In fact, this topic is specifically addressed in another book (PDI Cookbook 2nd Edition).

Purchasing the book will give you access to a pack of files and examples used within the book, making it very easy to understand the examples and giving you a starting point to, through modification, take your DI skills to new heights with Pentaho Data Integration.

All in all, this is a PDI must-have book. Even if you have purchased any of her previous books, or if you are already a seasoned PDI professional, this one will surely have something for you.
Amazon verified review