You're reading from Pentaho Data Integration Beginner's Guide - Second Edition Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide with this book and ebook

Product type Paperback

Published in Oct 2013

Publisher Packt

ISBN-13 9781782165040

Length 502 pages

Edition 2nd Edition

Languages

Java

Tools

Pentaho

Concepts

Data Visualization

Author (1):

María Carina Roldán


Table of Contents (21) Chapters

Preface

1. Getting Started with Pentaho Data Integration

2. Getting Started with Transformations

3. Manipulating Real-world Data

4. Filtering, Searching, and Performing Other Useful Operations with Data

5. Controlling the Flow of Data

6. Transforming Your Data by Coding

7. Transforming the Rowset

8. Working with Databases

9. Performing Advanced Operations with Databases

10. Creating Basic Task Flows

11. Creating Advanced Transformations and Jobs

12. Developing and Implementing a Simple Datamart

A. Working with Repositories

B. Pan and Kitchen – Launching Transformations and Jobs from the Command Line

C. Quick Reference – Steps and Job Entries

D. Spoon Shortcuts

E. Introducing PDI 5 Features

F. Best Practices

Summary

G. Pop Quiz Answers

Index

Chapter 5. Controlling the Flow of Data

In the previous chapters, you learned to transform your data in many ways. Now suppose you collect the results of a survey. You receive several files with the data, and those files have different formats. You have to merge those files somehow and generate a unified view of the information. Not only that, you also want to remove the rows of data whose content is irrelevant. Finally, based on the rows that interest you, you want to create another file with some statistics. This kind of requirement is very common, but meeting it requires more background in PDI.

In this chapter, you will learn how to implement this kind of task with Kettle. In particular, we will cover the following topics:

Copying and distributing rows
Splitting the stream based on conditions
Merging streams

You will also apply these concepts in the treatment of invalid data.
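
Before opening Spoon, it may help to see the three topics listed above as operations on plain lists of rows. The following is only a conceptual sketch in Python, not PDI code: in PDI these operations are configured graphically with steps and hops. The survey rows, the field names (`id`, `answer`), and all function names are hypothetical and chosen purely for illustration.

```python
from itertools import cycle

# Hypothetical survey rows; the field names are illustrative only.
rows = [
    {"id": 1, "answer": "yes"},
    {"id": 2, "answer": ""},
    {"id": 3, "answer": "no"},
    {"id": 4, "answer": "yes"},
]


def copy_rows(rows, n_targets):
    """Copy: every target stream receives every row."""
    return [list(rows) for _ in range(n_targets)]


def distribute_rows(rows, n_targets):
    """Distribute: rows are dealt out to the targets in turn (round-robin)."""
    targets = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


def split_rows(rows, condition):
    """Split: each row goes to one of two streams, depending on a condition."""
    matched, unmatched = [], []
    for row in rows:
        (matched if condition(row) else unmatched).append(row)
    return matched, unmatched


def merge_streams(*streams):
    """Merge: append several streams that share the same fields into one."""
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged


copied = copy_rows(rows, 2)
distributed = distribute_rows(rows, 2)
# Treat rows with an empty answer as invalid data (an assumption of this sketch).
valid, invalid = split_rows(rows, lambda r: r["answer"] != "")
merged = merge_streams(valid, invalid)
```

In this sketch, copying gives each of the two targets all four rows, while distributing gives each target two of them. Splitting on the empty `answer` field separates the invalid row from the rest, which is the same idea the chapter applies to the treatment of invalid data.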
