Storage is one of the most critical parts of a Data Lake. Apache Hadoop (HDFS) is the core of data storage for our Data Lake. The following figure sums up this aspect quite clearly, showing batch and stream data storage components in our Data Lake, along with other technologies within Hadoop Ecosystem with regards to various aspects dealing with the storage:
Figure 02: Apache Hadoop (HDFS) as data storage
We will now understand some important concepts in data storage area. We will also concentrate explicitly on batch and speed data and how these gets stored in the Data Lake and also see some specific details in regards to these data types.