Context in data lake - data acquisition
The process of inducting data from various source systems is called data acquisition. In our data lake, we have a layer defined (in fact, the first one) which has only this responsibility to take care of.
One of the main technologies that we see doing the main job of inducting data into our data lake is using Apache Sqoop. The following sections of this chapter aim at covering Sqoop in detail so that you get a clear picture of this technology as well as get to know the data acquisition layer in detail.
Data acquisition layer
In Chapter 2, Comprehensive Concepts of a Data Lake you got a glimpse of the data acquisition layer. This layer’s responsibility is to gather information from various source systems and induct it into the data lake. This figure will refresh your memory and give you a good pictorial view of this layer:
Figure 01: Data lake - data acquisition layer
The acquisition layer should be able to handle the following:
- Bulk data: Bulk data in the...