Treating invalid data by splitting and merging streams
When you are transforming data, it is not uncommon to detect inaccuracies or errors. Sometimes the issues you find are not severe enough to justify discarding the rows. Maybe you can guess what data was supposed to be there instead of the current values, or you may have default values to use in place of the invalid ones. Let's see some examples:
- You have a field defined as a string, and this field represents the date of birth of a person. As values, you have, besides valid dates, other strings, for example N/A, -, ???, and so on. Any attempt to run a calculation with these values would lead to an error.
- You have two dates representing the start date and end date of the execution of a task. Suppose that you have 2018-01-05 and 2017-10-31 as the start date and end date respectively. They are well-formatted dates, but if you try to calculate the time that it took to execute the task, you will get a negative value, which is clearly...
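The two situations above can be sketched outside of any particular tool. The following Python snippet is only an illustration, not the way the transformation itself is built: the field names (task, start_date, end_date), the yyyy-MM-dd date format, and the helper functions are assumptions made for the example. It checks each row for an unparseable date or a negative duration and splits the rows into a valid stream and an invalid stream.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format, matching the sample dates in the text


def parse_date(text):
    """Return a date for a well-formed string, or None for values such as N/A or ???."""
    try:
        return datetime.strptime(text, DATE_FORMAT).date()
    except (TypeError, ValueError):
        return None


def check_task(row):
    """Return None if the row is usable, otherwise a short reason (hypothetical helper)."""
    start = parse_date(row["start_date"])
    end = parse_date(row["end_date"])
    if start is None or end is None:
        return "unparseable date"
    if (end - start).days < 0:
        return "end date before start date"
    return None


def split_stream(rows):
    """Split rows into (valid, invalid); invalid rows carry the reason they failed."""
    valid, invalid = [], []
    for row in rows:
        reason = check_task(row)
        if reason is None:
            valid.append(row)
        else:
            invalid.append({**row, "reason": reason})
    return valid, invalid


# Hypothetical sample rows built from the values mentioned in the text
rows = [
    {"task": "A", "start_date": "2018-01-05", "end_date": "2018-02-01"},
    {"task": "B", "start_date": "2018-01-05", "end_date": "2017-10-31"},
    {"task": "C", "start_date": "N/A", "end_date": "???"},
]
valid, invalid = split_stream(rows)
```

Keeping the reason next to each invalid row makes it possible to treat each kind of problem differently before the two streams are merged again.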