Creating a process flow
Suppose that you have a dataset with a list of entities such as people, addresses, products, or file names, just to give some examples. You need to take that data and perform some further processing, such as cleaning the data, discarding the useless rows, or calculating some extra fields. Finally, you have to insert the data into a database and build an Excel sheet containing statistics about the data just processed. All of this can be seen as a simple task flow, or process flow. With Kettle, you can easily implement a process flow like this.
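To make the idea of a process flow concrete outside Kettle, here is a minimal Python sketch. It assumes rows are plain dictionaries, and the step functions (`clean`, `discard_useless`, `add_initial`) and the list standing in for the database are hypothetical illustrations, not Kettle features. Each step consumes the rows produced by the previous one, much as rows travel from step to step in a Kettle transformation.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def clean(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleaning: trim surrounding whitespace from every field."""
    for row in rows:
        yield {key: value.strip() for key, value in row.items()}


def discard_useless(rows: Iterable[Row]) -> Iterator[Row]:
    """Discarding: drop rows whose 'name' field is empty."""
    for row in rows:
        if row["name"]:
            yield row


def add_initial(rows: Iterable[Row]) -> Iterator[Row]:
    """Calculating an extra field: the upper-cased first letter of the name."""
    for row in rows:
        yield {**row, "initial": row["name"][0].upper()}


def run_flow(rows: Iterable[Row], steps) -> List[Row]:
    """Pass the rows through each step in order and collect the result."""
    for step in steps:
        rows = step(rows)
    return list(rows)


def statistics(rows: Iterable[Row]) -> Counter:
    """A simple statistic: how many names start with each letter."""
    return Counter(row["initial"] for row in rows)


if __name__ == "__main__":
    raw = [{"name": " paul "}, {"name": ""}, {"name": "Zoe"}, {"name": "zara"}]
    processed = run_flow(raw, [clean, discard_useless, add_initial])
    database: List[Row] = []  # hypothetical stand-in for the real database table
    database.extend(processed)
    print(statistics(processed))
```

Because the steps are generators, rows stream through the whole chain one at a time instead of each step waiting for the previous one to finish.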
Suppose that you have a file with a list of names and dates of birth, for example:
name,birthdate
Paul,31/12/1969
Santiago,15/02/2004
Santiago,15/02/2004
Lourdes,05/08/1994
Isabella Anna,08/10/1978
Zoe, 15/01/1975
The file may have some duplicates (identical consecutive rows) and some birthdates may be absent. You want to keep only the unique rows and discard the entries of people whose date of birth is missing. Finally, you want...
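The two cleaning rules described above can be sketched in Python with the standard `csv` and `itertools` modules. This is an illustration only, not how Kettle implements them. The sample embeds the rows listed earlier plus one hypothetical row, `Mateo,` (not part of the original file), with an empty birthdate so the filter has something to drop. Like the rule in the text, `drop_consecutive_duplicates` removes only identical rows that are adjacent.

```python
import csv
import io
from itertools import groupby

# Rows from the sample file, plus one hypothetical row ("Mateo,")
# with a missing birthdate, added only to exercise the filter.
SAMPLE = """name,birthdate
Paul,31/12/1969
Santiago,15/02/2004
Santiago,15/02/2004
Lourdes,05/08/1994
Isabella Anna,08/10/1978
Zoe, 15/01/1975
Mateo,
"""


def read_rows(text):
    """Parse the CSV text into dicts, trimming stray whitespace such as ' 15/01/1975'."""
    for row in csv.DictReader(io.StringIO(text)):
        # A short row yields None for missing fields; treat that as empty.
        yield {key: (value or "").strip() for key, value in row.items()}


def drop_consecutive_duplicates(rows):
    """Keep only the first row of each run of identical consecutive rows."""
    for _, group in groupby(rows, key=lambda row: tuple(row.items())):
        yield next(group)


def drop_missing_birthdate(rows):
    """Discard rows whose birthdate field is empty."""
    for row in rows:
        if row["birthdate"]:
            yield row


def clean(text):
    return list(drop_missing_birthdate(drop_consecutive_duplicates(read_rows(text))))


if __name__ == "__main__":
    for row in clean(SAMPLE):
        print(row["name"], row["birthdate"])
```

Since only adjacent rows are compared, two identical rows separated by a different row would both survive. Sorting the data first is the usual way to catch those as well.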