Programming custom functionality
Kettle's built-in steps provide a lot of functionality, but when that is not enough, you can use the User Defined Java Class step (UDJC for short) to program custom functionality in Java. This lets you accomplish complex tasks, access Java libraries, and even access the Kettle API. The code you type into this step is compiled once and then executed at runtime for each passing row.
Let's create a simple example that uses the UDJC step. Assume that you have a text file containing sentences; you want to count the words in each row and split the flow of data into two streams depending on the number of words per sentence.
To make the exercise more interesting, we added some extra considerations, as follows:
Several characters act as separators, not only blank spaces
Sometimes, several separators appear in a row
Some sentences have a special character at the end, and some don...
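Before moving into the UDJC step itself, the counting logic can be sketched in plain Java, outside Kettle. This is only an illustration under stated assumptions: the separator set, the threshold of 3 words, and the names WordCountSketch, SEPARATORS, THRESHOLD, and countWords are hypothetical and are not part of the exercise or of the Kettle API. The sketch treats a run of separators as a single break and ignores a separator at the end of the sentence when one is present.

```java
import java.util.ArrayList;
import java.util.List;

class WordCountSketch {
    // Assumed separator set (not taken from the exercise): whitespace plus
    // common punctuation. The trailing '+' makes a run of separators count
    // as a single break between words.
    static final String SEPARATORS = "[\\s,.;:!?]+";

    // Hypothetical threshold used to route a sentence to one of two streams.
    static final int THRESHOLD = 3;

    static int countWords(String sentence) {
        if (sentence == null) {
            return 0;
        }
        // Drop leading separators so split() does not yield an empty first token.
        String cleaned = sentence.replaceFirst("^" + SEPARATORS, "");
        if (cleaned.isEmpty()) {
            return 0;
        }
        // split() discards trailing empty strings, so a final separator
        // such as "." or "!" does not add a word.
        return cleaned.split(SEPARATORS).length;
    }

    public static void main(String[] args) {
        // Two lists stand in for the two output streams of the transformation.
        List<String> shortSentences = new ArrayList<>();
        List<String> longSentences = new ArrayList<>();
        String[] rows = {"Hello,  world!", "One; two: three, four", "..."};
        for (String row : rows) {
            if (countWords(row) < THRESHOLD) {
                shortSentences.add(row);
            } else {
                longSentences.add(row);
            }
        }
        System.out.println("short: " + shortSentences);
        System.out.println("long: " + longSentences);
    }
}
```

Inside a UDJC step, the same counting logic would run once per incoming row, and the routing would be done by the transformation rather than by in-memory lists.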