Validating data at runtime
While processing data, there will eventually come a time when it is critical to validate the data in the stream, to ensure it is of high enough quality for the process to continue. Kettle comes with several built-in steps that provide validation capabilities, including a generic Data Validator step, which allows data to be checked against a custom set of rules. In this recipe, we will build some custom rules to validate author data from the books database.
Getting ready
You must have a database that matches the books data structure, as described in Appendix A, Data Structures. The code to build this database is available from Packt's website.
How to do it...
Perform the following steps:
Create a new transformation.
Add a Table input step from the Input category.
Open the step and configure it to connect to the books database. For the query, click on Get SQL select statement... and select the authors table.
Add all the columns from the authors table...
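After you accept the generated statement and include all the columns, the query in the Table input step should look roughly like the following sketch. The column names shown (id_author, lastname, firstname, nationality, birthyear) are assumed from the authors table layout in Appendix A, Data Structures; adjust them to match your actual schema.

    -- Sample SELECT as produced by Get SQL select statement...
    -- (column names assumed; verify against your books database)
    SELECT
        id_author
      , lastname
      , firstname
      , nationality
      , birthyear
    FROM authors

Pulling the columns explicitly, rather than using SELECT *, keeps the stream layout predictable, which matters later when the Data Validator rules refer to fields by name.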