Building a production data pipeline
The data pipeline you build will do the following:
- Read files from the data lake.
- Insert the files into staging.
- Validate the staging data.
- Move staging to the warehouse.
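NiFi assembles these stages visually, but the flow of data through them can be sketched in plain Python. The directory name, the `"name"` field used for validation, and the list-backed staging/warehouse stores below are all hypothetical stand-ins, not part of the actual pipeline:

```python
import json
import pathlib

def run_pipeline(data_lake_dir, staging, warehouse):
    """Sketch of the four pipeline stages; stores and paths are hypothetical."""
    # 1. Read files from the data lake.
    records = []
    for path in pathlib.Path(data_lake_dir).glob("*.json"):
        records.append(json.loads(path.read_text()))
    # 2. Insert the records into staging.
    staging.extend(records)
    # 3. Validate the staging data (here: require a non-empty "name" field).
    valid = [r for r in staging if r.get("name")]
    # 4. Move the validated records from staging to the warehouse.
    warehouse.extend(valid)
    staging.clear()
    return len(valid)
```

In the real pipeline each stage is a processor group and the stores are a data lake folder, a staging database, and a warehouse database; the sketch only shows the order of operations.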
The final data pipeline will look like the following screenshot:
We will build the data pipeline one processor group at a time. The first processor group will read the data lake.

Reading the data lake
In the first section of this book, you read files with NiFi, and you will do the same here. This processor group will consist of three processors – GetFile, EvaluateJsonPath, and UpdateCounter – and an output port. Drag the processors and the port to the canvas. In the following sections, you will configure them.
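Before configuring the processors, it may help to see what the chain does to each file. The following is a rough Python analogue, not NiFi code: GetFile supplies the file's content, EvaluateJsonPath extracts a value from the JSON, and UpdateCounter increments a named counter. The key `"name"` and the counter name `"filesread"` are assumptions for illustration:

```python
import json
from collections import Counter

# Stands in for NiFi's counter state (hypothetical counter name below).
counters = Counter()

def process_flowfile(raw_text, json_key="name"):
    """Hypothetical analogue of GetFile -> EvaluateJsonPath -> UpdateCounter."""
    # GetFile: in NiFi the processor reads the file from disk;
    # here we receive its text directly.
    record = json.loads(raw_text)
    # EvaluateJsonPath: pull one value out of the JSON content
    # (a single top-level key stands in for a full JSONPath expression).
    value = record.get(json_key)
    # UpdateCounter: increment a counter for each flowfile processed.
    counters["filesread"] += 1
    return value
```

In NiFi these three steps are separate processors connected on the canvas, and the extracted value travels with the flowfile as an attribute rather than a return value.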
GetFile
The GetFile processor reads files from a folder – in this case, our data lake. If you were reading a data lake in Hadoop, you would...