Summary
In this chapter, we started with the basic building blocks of a data lake. We learned that a data lake has three tiers, namely an ingestion tier to ingest the data, a storage tier to store the data, and an insight tier to take business actions. A data lake needs solid operations facilities to secure the data, as well as to guarantee its timely availability.
A data lake is meant to hold the data of the entire enterprise, so solid data security is essential. We learned about Apache Ranger and how it provides fine-grained security in Hadoop by controlling access to the various tools in the Hadoop ecosystem through a role-based access model.
We learned about Apache Flume, which lets you build a data ingestion system using the concepts of source, channel, and sink.
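As a reminder of how those three concepts fit together, here is a minimal toy sketch in Python. It is not Flume's API or configuration format; the function names `source` and `sink`, the dictionary-shaped event, and the in-memory queue standing in for a channel are all assumptions made for illustration only.

```python
from queue import Queue


def source(lines, channel):
    """Source: wrap each raw line in an event and put it on the channel."""
    for line in lines:
        channel.put({"body": line.rstrip("\n")})


def sink(channel, store):
    """Sink: drain buffered events from the channel and deliver them."""
    while not channel.empty():
        event = channel.get()
        store.append(event["body"])


# The channel is the buffer that decouples the source from the sink.
channel = Queue(maxsize=100)
delivered = []  # hypothetical destination; in a data lake this would be storage such as HDFS

source(["first log line\n", "second log line\n"], channel)
sink(channel, delivered)
print(delivered)
```

The point of the sketch is the decoupling: the source only knows about the channel, and the sink only knows about the channel, so either side can be swapped without touching the other.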
We also covered a new tool called Apache Zeppelin, which eases data access in a data lake through simple web-based notebooks that allow you to run HDFS commands and Hive queries.
We built a data lake...