Chapter 5. Controlling the Flow of Data
In the previous chapters, you learned to transform your data in many ways. Now suppose you collect results from a survey. You receive several files with the data, and those files have different formats. You have to merge them and generate a unified view of the information. You also want to remove the rows whose content is irrelevant. Finally, based on the rows that interest you, you want to create another file with some statistics. Requirements like these are very common, but meeting them calls for more background in PDI.
In this chapter, you will learn how to implement this kind of task with Kettle. In particular, we will cover the following topics:
- Copying and distributing rows
- Splitting the stream based on conditions
- Merging streams
You will also apply these concepts to the treatment of invalid data.