Summary
This chapter covered data ingestion and processing. We started by exploring the different patterns for batch data ingestion: ETL and ELT.
We then examined the components of the ELTL pattern, which is used to ingest and process batch data in a data lakehouse. We discussed how data is pushed or pulled into a raw data store, and the pivotal role that the raw data store layer plays in data ingestion and processing.
Next, we delved into distributed computing and how it is used for processing batch data at scale.
After covering batch data ingestion and processing, we turned to patterns for ingesting and processing stream data. We discussed how to ingest stream data by publishing it to a topic and subscribing to that topic for processing. We also learned how to micro-batch the streams and perform actions on a micro-batch or on a specific event of interest.
Finally, we brought together all the concepts we'd discussed and wove...