Chapter 8: Ingesting and Transforming Data
Welcome to the next major section of the book. In this section, we will focus on designing and developing data processing systems.
In the last chapter, we learned about implementing the serving layer and saw how to share data between services such as Synapse SQL and Spark using metastores. In this chapter, we will focus on data transformation: the process of converting your data from its raw format into a more useful format that downstream tools and projects can consume. Once you complete this chapter, you will be able to read data in different file formats and encodings, perform data cleansing, and run transformations using services such as Spark, SQL, and Azure Data Factory (ADF).
We will cover the following topics in this chapter:
- Transforming data by using Apache Spark
- Transforming data by using Transact-SQL (T-SQL)
- Transforming data by using ADF
- Transforming data by using Azure Synapse pipelines ...