Logging based on data
As mentioned in the Monitoring our data ingest file size recipe, logging our ingestion is good practice. There are several ways to explore our ingestion logs to make the process more reliable and to increase our confidence in it. In this recipe, we will start to move into data operations (DataOps), where the goal is to track the behavior of data from its source until it reaches its final destination.
This recipe will explore other metrics we can track to create a reliable data pipeline.
Getting ready
For this exercise, let’s imagine we have two simple data ingests, one from a database and another from an API. Since this is a straightforward pipeline, let’s visualize it with the following diagram:
Figure 8.16 – Data ingestion phases
With this in mind, let’s explore the points in the pipeline that we can log to make monitoring efficient.
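Before going further, here is a minimal sketch of what logging each of the two ingests could look like, using only Python's standard logging module. It does not reproduce the phases in the figure. The helper log_ingest, the logger name, the two fetch functions, and the logged fields (record count and elapsed seconds) are all hypothetical names chosen for illustration, and the fetch functions return hardcoded rows instead of calling a real database or API.

```python
import logging
import time

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("ingest")


def log_ingest(source_name, fetch):
    """Run one ingest step and log its start, record count, and duration."""
    logger.info("source=%s status=started", source_name)
    start = time.perf_counter()
    try:
        records = fetch()
    except Exception:
        # Log the traceback, then re-raise so the pipeline still fails loudly.
        logger.exception("source=%s status=failed", source_name)
        raise
    elapsed = time.perf_counter() - start
    logger.info(
        "source=%s status=finished records=%d seconds=%.3f",
        source_name, len(records), elapsed,
    )
    return records


def fetch_from_database():
    # Assumption: stand-in for a real database query.
    return [{"id": 1}, {"id": 2}]


def fetch_from_api():
    # Assumption: stand-in for a real API call.
    return [{"id": 3}]


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    log_ingest("database", fetch_from_database)
    log_ingest("api", fetch_from_api)
```

Writing the messages as key=value pairs is one possible design choice: it keeps the log lines easy to filter by source or status later, when we want to compare record counts or durations between runs.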
How to do it…
Let’s define...