Pentaho Data Integration Beginner's Guide - Second Edition

Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide

Product type: Paperback
Published: Oct 2013
Publisher: Packt
ISBN-13: 9781782165040
Length: 502 pages
Edition: 2nd Edition
Author: María Carina Roldán
Table of Contents

Preface
1. Getting Started with Pentaho Data Integration
2. Getting Started with Transformations
3. Manipulating Real-world Data
4. Filtering, Searching, and Performing Other Useful Operations with Data
5. Controlling the Flow of Data
6. Transforming Your Data by Coding
7. Transforming the Rowset
8. Working with Databases
9. Performing Advanced Operations with Databases
10. Creating Basic Task Flows
11. Creating Advanced Transformations and Jobs
12. Developing and Implementing a Simple Datamart
A. Working with Repositories
B. Pan and Kitchen – Launching Transformations and Jobs from the Command Line
C. Quick Reference – Steps and Job Entries
D. Spoon Shortcuts
E. Introducing PDI 5 Features
F. Best Practices
G. Pop Quiz Answers
Index

What this book covers

Chapter 1, Getting Started with Pentaho Data Integration, serves as the most basic introduction to PDI, presenting the tool. This chapter includes instructions for installing PDI and gives you the opportunity to play with the graphical designer (Spoon). The chapter also includes instructions for installing a MySQL server.

Chapter 2, Getting Started with Transformations, explains the fundamentals of working with transformations, including learning the simplest ways of transforming data and getting familiar with the process of designing, debugging, and testing a transformation.

Chapter 3, Manipulating Real-world Data, explains how to apply the concepts learned in the previous chapter to real-world data that comes from different sources. It also explains how to save the results to different destinations: plain files, Excel files, and more. As real data is very prone to errors, this chapter also explains the basics of handling errors and validating data.

Chapter 4, Filtering, Searching, and Performing Other Useful Operations with Data, expands the set of operations learned in previous chapters by teaching the reader a great variety of essential features such as filtering, sorting, or looking for data.

Chapter 5, Controlling the Flow of Data, explains different options that PDI offers to combine or split flows of data.

Chapter 6, Transforming Your Data by Coding, explains how JavaScript and Java coding can help in the treatment of data. It shows why you may need to code inside PDI, and explains in detail how to do it.

Chapter 7, Transforming the Rowset, explains the ability of PDI to deal with some sophisticated problems—for example, normalizing data from pivoted tables—in a simple fashion.

Chapter 8, Working with Databases, explains how to use PDI to work with databases. The list of topics covered includes connecting to a database, previewing and getting data, and inserting, updating, and deleting data. As database knowledge is not presumed, the chapter also covers fundamental concepts of databases and the SQL language.

Chapter 9, Performing Advanced Operations with Databases, explains how to perform advanced operations with databases, including those especially designed to load data warehouses. A primer on data warehouse concepts is also given in case you are not familiar with the subject.

Chapter 10, Creating Basic Task Flows, serves as an introduction to processes in PDI. Through the creation of simple jobs, you will learn what jobs are and what they are used for.

Chapter 11, Creating Advanced Transformations and Jobs, deals with advanced concepts that will allow you to build complex PDI projects. The list of covered topics includes nesting jobs, iterating on jobs and transformations, and creating subtransformations.

Chapter 12, Developing and Implementing a Simple Datamart, presents a simple datamart project and guides you through building the datamart using the concepts learned throughout the book.

Appendix A, Working with Repositories, is a step-by-step guide to creating a PDI database repository, followed by instructions on how to work with it.

Appendix B, Pan and Kitchen – Launching Transformations and Jobs from the Command Line, is a quick reference for running transformations and jobs from the command line.

Appendix C, Quick Reference – Steps and Job Entries, serves as a quick reference to steps and job entries used throughout the book.

Appendix D, Spoon Shortcuts, is an extensive list of Spoon shortcuts useful for saving time when designing and running PDI jobs and transformations.

Appendix E, Introducing PDI 5 Features, quickly introduces you to the architectural and functional features included in Kettle 5—the version that was under development when this book was written.

Appendix F, Best Practices, gives a list of best PDI practices and recommendations.

Appendix G, Pop Quiz Answers, contains answers to pop quiz questions.
