Technology stack for Data Lake
We have covered a lot of ground so far. Figure 1 presented a conceptual, high-level architecture of a data lake. Now that we understand some of the available technologies, we can fill in that conceptual architecture with the concrete technologies we will use to implement our data lake, as shown in Table 1. Note that some of these technologies have already been covered in previous chapters of this book.
Data Lake Tier Name | Technology Used |
Ingestion Tier | Apache Flume, HDFS Copy, Apache Sqoop |
Storage Tier | HDFS |
Insights Tier | Apache Zeppelin, Hive QL |
Operations Tier | Apache Ranger, HDFS Permissions |
Table 1: Our technology choices for building the data lake
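The tier-to-technology mapping in Table 1 can also be written down as a small data structure, which is handy when a script needs to check which tiers depend on a given tool. The following is a minimal sketch, not part of the implementation described in this book: the names `DATA_LAKE_STACK` and `tiers_using` are hypothetical and introduced here only for illustration.

```python
from __future__ import annotations

# Hypothetical in-code representation of Table 1 (illustration only).
DATA_LAKE_STACK = {
    "Ingestion Tier": ["Apache Flume", "HDFS Copy", "Apache Sqoop"],
    "Storage Tier": ["HDFS"],
    "Insights Tier": ["Apache Zeppelin", "Hive QL"],
    "Operations Tier": ["Apache Ranger", "HDFS Permissions"],
}


def tiers_using(technology: str) -> list[str]:
    """Return the tiers whose technology names contain the given text.

    The match is a case-insensitive substring test, so "HDFS" also
    matches "HDFS Copy" and "HDFS Permissions".
    """
    needle = technology.lower()
    return [
        tier
        for tier, techs in DATA_LAKE_STACK.items()
        if any(needle in tech.lower() for tech in techs)
    ]


if __name__ == "__main__":
    # Print the stack one tier per line, in the order used in Table 1.
    for tier, techs in DATA_LAKE_STACK.items():
        print(f"{tier}: {', '.join(techs)}")
    print(tiers_using("HDFS"))
```

Looking up `"HDFS"` in this sketch returns the Ingestion, Storage, and Operations tiers, which reflects how HDFS appears in three of the four tiers of our stack.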