Learning Pentaho Data Integration 8 CE: An end-to-end guide to exploring, transforming, and integrating your data across multiple sources, Third Edition

Carina Roldán
5 (5 Ratings)
Paperback Dec 2017 500 pages 3rd Edition
eBook
Can$38.99 Can$55.99
Paperback
Can$69.99
Subscription
Free Trial

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.

Learning Pentaho Data Integration 8 CE

Getting Started with Transformations

In the previous chapter, you used the graphical designer Spoon to create your first Transformation, Hello World. Now you're ready to begin transforming data, and at the same time get familiar with the Spoon environment.

In this chapter, you will:

  • Learn the simplest ways of transforming data
  • Get familiar with the process of designing, debugging, and testing a Transformation
  • Explore the available features for running transformations from Spoon
  • Learn basic PDI terminology related to data and metadata
  • Get an introduction to handling runtime errors

Designing and previewing transformations

In the previous chapter, you created a simple Transformation, previewed the data, and also ran the Transformation. That gave you your first contact with the PDI graphical designer. In this section, you will become more familiar with the editing features, experiment with the Preview option in detail, and deal with errors that may appear as you develop and test a Transformation.

Getting familiar with editing features

Editing transformations with Spoon can be very time-consuming if you're not familiar with the editing facilities that the software offers. In this section, you will learn a bit more about three editing features that you already faced in the previous...

Understanding PDI data and metadata

By now, you have already created three transformations and must have an idea of what a dataset is, the kind of data types that PDI supports, and how data is modified as it goes through the path of steps and hops. This section will provide you with a deeper understanding of these concepts:

  • We will give formal definitions for PDI basic terminology related to data and metadata
  • We will also give you a practical list of steps that will expand your toolbox for transforming data

Understanding the PDI rowset

Transformations deal with datasets or rowsets, that is, rows of data with predefined metadata. The metadata tells us about the structure of data, that is, the list of fields as well...
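To make the idea of rows sharing predefined metadata concrete, here is a minimal Python sketch. It is not PDI code and does not reflect how PDI stores rowsets internally; the names `FieldMeta` and `Rowset`, the type labels, and the sample fields are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class FieldMeta:
    """Describes one field: its name and its data type (hypothetical model)."""
    name: str
    type_name: str  # illustrative labels such as "String" or "Integer"


@dataclass
class Rowset:
    """Rows of data that all share the same predefined metadata."""
    metadata: Tuple[FieldMeta, ...]
    rows: List[Tuple[Any, ...]]

    def add_row(self, row: Tuple[Any, ...]) -> None:
        # Every row must carry exactly one value per field in the metadata.
        if len(row) != len(self.metadata):
            raise ValueError("row does not match the rowset metadata")
        self.rows.append(row)

    def field_names(self) -> List[str]:
        # The metadata, not the rows, tells us which fields exist.
        return [field.name for field in self.metadata]


# Sample data invented for this sketch.
projects = Rowset(
    metadata=(
        FieldMeta("project_name", "String"),
        FieldMeta("duration_days", "Integer"),
    ),
    rows=[],
)
projects.add_row(("Project A", 195))
projects.add_row(("Project B", 300))
print(projects.field_names())
```

The point of the sketch is the separation: the structure is described once in the metadata, and every row is expected to conform to it.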

Handling errors

So far, each time you got an error, you had the opportunity to discover what kind of error it was and fix it. This is quite different from real scenarios, mainly for two reasons:

  • Real data has errors—a fact that cannot be avoided. If you fail to heed it, the transformations that run with test or sample data will probably crash when running with real data.
  • In most cases, your final work is run by an automated process and not by a user from Spoon. Therefore, if a Transformation crashes, there will be nobody who notices and reacts to that situation.

In this section, you will learn the simplest way to trap errors that may occur, avoiding unexpected crashes. This is the first step in the creation of transformations ready to be run in a production environment.
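The general idea of trapping errors instead of letting a process crash can be sketched outside PDI. The following minimal Python example is not PDI code; the function `convert_rows`, the field names, and the sample data are hypothetical. Rows that fail a conversion are diverted to a separate error list together with a description, while the valid rows continue down the main stream.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def convert_rows(rows: List[Row]) -> Tuple[List[dict], List[dict]]:
    """Convert 'amount' to an integer, diverting bad rows instead of crashing."""
    good: List[dict] = []
    errors: List[dict] = []
    for row in rows:
        try:
            converted = {"name": row["name"], "amount": int(row["amount"])}
        except (KeyError, ValueError) as exc:
            # Keep the original row and record why it was rejected.
            errors.append({**row, "error_description": str(exc)})
            continue
        good.append(converted)
    return good, errors


# Sample data invented for this sketch; the second row cannot be converted.
sample = [
    {"name": "a", "amount": "10"},
    {"name": "b", "amount": "ten"},
    {"name": "c", "amount": "30"},
]
good_rows, error_rows = convert_rows(sample)
print(len(good_rows), "rows converted;", len(error_rows), "rows diverted")
```

Because the bad row is captured rather than raised, an unattended run can finish and the rejected rows can be inspected afterwards.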

Implementing...

Summary

In this chapter, you created several transformations. As you did so, you became more familiar with the design process, including dealing with errors, previewing, and running transformations. You had the opportunity to learn how to use several PDI steps, and you also learned how to handle errors that may appear. At the same time, you were introduced to the basic terminology related to data, metadata, and transformations.

Now that you know the basics of data manipulation, it's time to change the focus. In the next chapter, we will introduce a very different, yet core, subject: task flows.

Key benefits

  • Manipulate your data by exploring, transforming, validating, and integrating it using Pentaho Data Integration 8 CE
  • A comprehensive guide exploring the features of Pentaho Data Integration 8 CE
  • Connect to any database engine, explore the databases, and perform all kinds of operations on relational databases

Description

Pentaho Data Integration (PDI) is an intuitive and graphical environment packed with drag-and-drop design and powerful Extract-Transform-Load (ETL) capabilities. This book shows and explains the new interactive features of Spoon, the revamped look and feel, and the newest features of the tool, including Transformation and Job Executors and the invaluable Metadata Injection capability. We begin with the installation of PDI software and then move on to cover all the key PDI concepts. Each chapter introduces new features, enabling you to gradually gain practice with the tool. First, you will learn to do all kinds of data manipulation and work with simple plain files. Then, the book teaches you how to work with relational databases inside PDI. Moreover, you will be given a primer on data warehouse concepts and you will learn how to load data into a data warehouse. During the course of this book, you will become familiar with its intuitive, graphical, drag-and-drop design environment. By the end of this book, you will have learned everything you need to know in order to meet your data manipulation requirements. In addition, you will be given best practices and advice for designing and deploying your projects.

Who is this book for?

This book is a must-have for software developers, business intelligence analysts, IT students, or anyone involved or interested in developing ETL solutions. If you plan on using Pentaho Data Integration for doing any data manipulation task, this book will help you as well. This book is also a good starting point for data warehouse designers, architects, or anyone who is responsible for data warehouse projects and needs to load data into them.

What you will learn

  • Explore the features and capabilities of Pentaho Data Integration 8 Community Edition
  • Install and get started with PDI
  • Learn the ins and outs of Spoon, the graphical designer tool
  • Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, and XML files
  • Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on relational databases
  • Populate a data mart with Pentaho Data Integration
  • Use Pentaho Data Integration to organize files and folders, run daily processes, deal with errors, and more

Product Details

Publication date: Dec 05, 2017
Length: 500 pages
Edition: 3rd
Language: English
ISBN-13: 9781788292436
Vendor: Pentaho

Packt Subscriptions

See our plans and pricing

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just Can$6 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just Can$6 each
  • Exclusive print discounts

Frequently bought together


Learning Pentaho Data Integration 8 CE: Can$69.99
Learning Pentaho CTools: Can$69.99
Pentaho 8 Reporting for Java Developers: Can$69.99
Total: Can$209.97

Table of Contents

16 Chapters
Getting Started with Pentaho Data Integration
Getting Started with Transformations
Creating Basic Task Flows
Reading and Writing Files
Manipulating PDI Data and Metadata
Controlling the Flow of Data
Cleansing, Validating, and Fixing Data
Manipulating Data by Coding
Transforming the Dataset
Performing Basic Operations with Databases
Loading Data Marts with PDI
Creating Portable and Reusable Transformations
Implementing Metadata Injection
Creating Advanced Jobs
Launching Transformations and Jobs from the Command Line
Best Practices for Designing and Deploying a PDI Project

Customer reviews

Rating distribution
5 (5 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Mr. T. Mangiacapre Dec 22, 2017
Rating: 5
The book does a great job covering all of the core capabilities and features of the latest PDI (8) release. It flows very well and is written in practical and easily digestible terms. Not only does it help you understand the platform and how to use and configure the core integration steps, but it also teaches you useful techniques and methods for addressing common use cases. The exercises are quite useful but I'd recommend that you go beyond them and explore other steps not covered in the book (there are over 300 of them). Well worth the investment of time and money ... I learned a lot! (Technical Field Enablement Mgr at Pentaho, Hitachi Vantara)
Amazon Verified review
Thomas Martens Apr 11, 2020
Rating: 5
The book takes a very example-oriented approach through various Pentaho topics. It is ideal for self-study. In addition, the book is written in very simple English.
Amazon Verified review
Juan José Ortilles Ruiz Jan 27, 2018
Rating: 5
Maria is a great expert in ETL processes and this book is a great tool to introduce and explore the power of Pentaho Data Integration. A lot of examples and use cases are found in the book and are really well explained. Really useful.
Amazon Verified review
Harijs Mar 03, 2020
Rating: 5
Good, excellent book! From elementary things till more complicated possibilities how to use PDI. Recommend for all that want to be a good specialist in this area.
Amazon Verified review
Fábio de Salles Mar 05, 2018
Rating: 5
I am a heavy Pentaho practitioner and because of that I have been following Mrs. Roldan's work since the beginning.

Maria Roldan is a long-time user of Pentaho Data Integration to solve real-life Business Intelligence data problems. Her first book, Pentaho Data Integration 4 Cookbook, taught lots of newcomers how to do important data handling with PDI, from the simplest things like reading a table from a database to processing complex multi-line text files with multilayered structures like XML or JSON. Then it was Pentaho 4.

Again comes Mrs. Roldan to share with us her knowledge about Data Integration chores with Pentaho Data Integration's newest 8th version, the first Hitachi-Pentaho issue of PDI. And boy, she rocks!

Above all this is a very polished book: you can note a lot of thinking has gone into the examples and the explanations. If the first book was somewhat dry and a bit redundant and boring, this one is elegantly laid out and very well written. If for nothing else, the book is a pleasure to read due to its high-quality editing.

Technical books are not bought for their literary value though, but for the knowledge they carry and impart. "Learning PDI 8 CE" has all you need to get up and running with PDI 8 for a lot of your data processing needs. Files, databases, formats, error handling, jobs, calculations, complex data handling: the examples are so numerous a lazy person would just sum it up as "everything". And if all the how-tos were not enough, there is also some very important and useful advice on how to best use PDI, how to set up a project, and even how to build reusable transformations and better write your jobs. There is even help on how to interactively run a Job or Transformation, making it very easy to debug the processes!

No matter how true to its mission the book is, it does not have everything, of course. Missing, for instance, are examples of how to harness PDI's powerful log system in regular ETL processes, nor are there any examples of how to integrate mathematical models (built with R, Weka, or RapidMiner, for instance) into regular data processing/data integration tasks or even some other Pentaho tools. In fact, this issue is specifically covered in another book (PDI Cookbook 2nd Edition).

Purchasing the book will give you access to a pack of files and examples used within the book, making it very easy to understand the examples and giving you a starting point to, through modification, have your DI skills reach new heights with Pentaho Data Integration.

All in all this is a PDI must-have book. Even if you have purchased any of her previous books, or if you are already a seasoned PDI professional, this one will surely have something for you.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle, which is a month starting from the day of subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need to have a paid-for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.