Chapter 5: Data Transformation and Processing with Synapse Notebooks
In this chapter, we will cover data processing and transformation with Synapse notebooks. We will look at how to use pandas DataFrames within Synapse notebooks to explore data stored as Parquet files in Azure Data Lake Storage (ADLS) Gen2, and then write the results back to ADLS Gen2 as a Parquet file.
We will be covering the following recipes:
- Landing data in ADLS Gen2
- Exploring data from ADLS Gen2 as a pandas DataFrame in a Synapse notebook
- Processing data from a PySpark notebook within Synapse
- Performing read-write operations to a Parquet file using Spark in Synapse
- Analytics with Spark
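
As a preview of the pandas round trip described in the chapter introduction, here is a minimal sketch of reading a Parquet file from ADLS Gen2 into a pandas DataFrame and writing it back. The helper names (adls_gen2_url, round_trip_parquet) and the account, container, and path values are hypothetical placeholders, not names used by the recipes. The sketch also assumes that pandas and a Parquet engine are installed and that the notebook session can already authenticate to the storage account; otherwise credentials would need to be passed through storage_options.

```python
def adls_gen2_url(account: str, container: str, path: str) -> str:
    """Build an abfss:// URL for a file in an ADLS Gen2 container."""
    # ADLS Gen2 URLs take the form:
    #   abfss://<container>@<account>.dfs.core.windows.net/<path>
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def round_trip_parquet(source_url: str, target_url: str, storage_options=None):
    """Read a Parquet file into pandas, preview it, and write it back."""
    # Imported here so the URL helper above can be used without pandas installed.
    import pandas as pd

    # Load the Parquet file into an in-memory pandas DataFrame.
    df = pd.read_parquet(source_url, storage_options=storage_options)

    # Quick exploration: show the first few rows.
    print(df.head())

    # Write the DataFrame back to ADLS Gen2 as a Parquet file.
    df.to_parquet(target_url, index=False, storage_options=storage_options)
    return df


# Hypothetical account, container, and file names, for illustration only.
source = adls_gen2_url("myaccount", "data", "raw/trips.parquet")
target = adls_gen2_url("myaccount", "data", "curated/trips.parquet")
# round_trip_parquet(source, target)  # run inside a Synapse notebook session
```

The call to round_trip_parquet is left commented out because it needs a real storage account. pandas suits data that fits in the memory of a single node; the Spark-based recipes in this chapter are the better fit for larger datasets.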