Duplicating and merging dataflows
Our final section in this chapter looks at how to duplicate and merge dataflows. Duplicating dataflows is particularly useful because it lets us apply different processing to the same data without reading a file or querying a database twice. Merging dataflows lets us take data from different sources and consolidate it into a single dataflow.
Duplicating data
Open the job DuplicatingData from the Resources directory.
It starts with a simple database query. The resulting dataflow is duplicated using a tReplicate component, and the same data is then passed to two processing streams. In this case the processing is very simple: each stream applies a filter, selecting rows from region1 or region3 respectively. As noted previously, the processing on each dataflow could be completely different; for example, one flow might be extracted to a CSV file while the other is transformed and loaded into a different database.
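The replicate-then-filter pattern can be sketched outside Talend as well. The following Python snippet is a conceptual illustration only, not Talend's generated code; the row data and region values are made-up assumptions that mirror the job described above:

```python
# Conceptual sketch of tReplicate feeding two filter streams.
# The rows and region names below are illustrative, not from the job.
rows = [
    {"name": "Alice", "region": "region1"},
    {"name": "Bob", "region": "region2"},
    {"name": "Carol", "region": "region3"},
]

# "Replicate": each downstream flow receives its own copy of the full
# row set, so the source is read only once.
flow_a = list(rows)
flow_b = list(rows)

# Each flow is then filtered independently, like two tFilterRow steps.
region1_rows = [r for r in flow_a if r["region"] == "region1"]
region3_rows = [r for r in flow_b if r["region"] == "region3"]

print(region1_rows)  # only the region1 rows
print(region3_rows)  # only the region3 rows
```

The key point the sketch captures is that the copies are independent: filtering one flow has no effect on the other, which is why each branch of the Talend job can process the data in a completely different way.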
Tip
The tReplicate...