It's ideal to have input files where the information is well-formed, that is, the number of columns and the type of its data is precise, all rows follow the same pattern, and so on. However, it is very common to find input files where the information has little or no structure or the structure doesn't follow the matrix (n rows by m columns) you expect. This is one of the situations where JavaScript can help.
Suppose that you have a file with a description of houses, which looks like the following:
...
Property Code: MCX-011
Status: Active
5 bedrooms
5 baths
Style: Contemporary
Basement
Laundry room
Fireplace
2 car garage
Central air conditioning
More Features: Attic, Clothes dryer, Clothes washer, Dishwasher
Property Code: MCX-012
4 bedrooms
3 baths
Fireplace
Attached parking
More Features: Alarm System, Eat-in Kitchen, Powder Room
Property Code: MCX-013
3 bedrooms
...
You want to compare the properties among them but it would be easier if the file had a precise structure. The JavaScript step can help you with this.
The first attempt to give structure to the data will be to add to every row the code of the house to which that row belongs. The purpose is to have the following:
- Create a new Transformation.
- Get the sample file from the book site and read it with a Text file input step. Uncheck the Header checkbox and create a single field named text.
- Run a preview. You should see the content of the file under a single column named text.
- After the input step, add a JavaScript step and double-click on it to edit it.
- In the editing area, type the following JavaScript code to create a field with the code of the property:
var prop_code;
posCod = indexOf(text,'Property Code:');
if (posCod>=0)
prop_code = trim(substr(text,posCod+15));
- Click on Get variables to add the prop_code variable to the grid under the code. The variable will contain for every row, the code for the house to which it belongs.
- Click on OK and with the JavaScript step selected, run a preview. You should see the data transformed as expected.
The code you wrote may seem a little strange at the beginning, but it is not really so complex. The general idea is to simulate a loop over the dataset rows.
The code creates a variable named prod_code, which will be used to create a new field to identify the houses. When the JavaScript code detects a property header row as for example:
Property Code: MCX-002
It sets the prop_code variable to the code it finds in that line, in this case, MCX-002.
Here comes the trick: until a new header row appears, the prop_code variable keeps that value. Thus, all the rows following a row like the one shown previously will have the same value for the prop_code variable.
This is an example where you can keep values from the previous rows in the dataset to be used in the current row.