Chapter 3. Data Integration, Quality, and Enrichment
In the preceding chapter, we looked in detail at how to bring huge volumes of data from various External Data Sources into the Data Lake's Intake Tier. We learned various Hadoop-oriented data transfer mechanisms to either pull data from the sources or push it in near real time, and to perform historical or incremental loads. We also saw the key functionalities implemented as part of the Data Intake Tier and got architectural guidance on the relevant Big Data tools and technologies.
Now that the data has been acquired into the Data Lake, this chapter explores the next logical steps performed on it. In a nutshell, we will take a closer look at the Management Tier and understand how to efficiently manage vast amounts of data and deliver it to multiple applications and systems with a high degree of performance and scalability.
In this chapter, we will gain a deeper understanding of the following...