Comparing two streams and generating differences
Suppose that you have two streams with the same structure and want to find out the differences in the data. Kettle has a step meant specifically for that purpose: the Merge Rows (diff) step. In this recipe, you will see how it works.
Suppose that you have a file with information about the fastest roller coasters in the world. Now you get an updated file and want to find out the differences between the two: there may be new roller coasters in the list, and some roller coasters may no longer be among the fastest. In addition, you were told that the old file contained some errors in the location, country, and year information, so you also want to know whether any of those values have changed.
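The comparison described above can be pictured as a merge of two key-sorted streams. The following is a minimal Python sketch of that idea, not Kettle's own implementation. It assumes both streams are already sorted on the key (here `Roller_Coaster`), walks them in lockstep, and labels each row with one of the four categories the Merge Rows (diff) step reports: identical, changed, new, or deleted. The function name `merge_rows_diff`, its parameters, and the choice of which row is emitted for each flag are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def merge_rows_diff(reference: List[Row], compare: List[Row],
                    key: str, values: List[str]) -> Iterator[Tuple[str, Row]]:
    """Hypothetical sketch: flag rows of two key-sorted streams.

    reference is the old data, compare is the new data. Both lists must be
    sorted on `key`; `values` are the fields checked for changes.
    """
    i = j = 0
    while i < len(reference) or j < len(compare):
        if j >= len(compare) or (
            i < len(reference) and reference[i][key] < compare[j][key]
        ):
            # Key exists only in the old stream: the row was dropped.
            yield "deleted", reference[i]
            i += 1
        elif i >= len(reference) or compare[j][key] < reference[i][key]:
            # Key exists only in the new stream: the row was added.
            yield "new", compare[j]
            j += 1
        else:
            # Same key in both streams: compare only the chosen value fields.
            old, new = reference[i], compare[j]
            changed = any(old[field] != new[field] for field in values)
            yield ("changed" if changed else "identical"), new
            i += 1
            j += 1
```

With hypothetical rows where one coaster keeps its data, one disappears, one gets a corrected year, and one is added, the generator yields `identical`, `deleted`, `changed`, and `new` in key order. Because both inputs are sorted, a single forward pass is enough, which is also why the step expects sorted input.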
Getting ready
For this recipe, you will need two files with information about roller coasters. You can download them from the book's site.
Both files have the same structure and look like the following:
Roller_Coaster|Speed|park|location|country|Year...
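Since the Merge Rows (diff) step expects both input streams to be sorted on the key fields, it can help to see what reading and sorting one of these pipe-delimited files involves. The sketch below is a hypothetical Python illustration, not part of the recipe: it assumes the first line is a header row like the one above and uses `Roller_Coaster` as the key. The function name `read_coasters` is illustrative.

```python
import csv
from typing import Dict, List, TextIO


def read_coasters(stream: TextIO, key: str = "Roller_Coaster") -> List[Dict[str, str]]:
    """Hypothetical helper: read a '|'-delimited file with a header row
    and return its rows sorted on the key field."""
    reader = csv.DictReader(stream, delimiter="|")
    # Sorting mirrors the requirement that both streams be ordered by key
    # before they are compared.
    return sorted(reader, key=lambda row: row[key])
```

To read an actual file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in; the path and encoding are assumptions about how the downloaded files are stored.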