Chapter 3: Data Cleansing and Integration
In the previous chapter, you were introduced to the first step of the data analytics process: ingesting raw, transactional data from various source systems into a cloud-based data lake. Once the raw data is available, we need to process, clean, and transform it into a format that helps with extracting meaningful, actionable business insights. This process of cleaning, processing, and transforming raw data is known as data cleansing and integration, and it is the subject of this chapter.
Raw data sourced from operational systems is not conducive to data analytics in its original format. In this chapter, you will learn about various data integration techniques, which help consolidate raw, transactional data from disparate source systems and join the resulting datasets to enrich them, presenting the end user with a single, consolidated version of the truth. Then, you will learn how to clean and transform the form and...