Performing Data Exploratory Analysis
Synapse Studio makes data exploration much easier by providing one-click options to examine data stored in various formats. This section covers some of the options available for data exploration using Spark, SQL, and ADF/Synapse pipelines.
Note
This section primarily focuses on the Perform data exploratory analysis concept of the DP-203: Data Engineering on Microsoft Azure exam.
Data Exploration Using Spark
Data exploration is a crucial step in the data analysis process, allowing you to uncover patterns and correlations within the data. Apache Spark, an open source distributed processing engine available in Azure Synapse Analytics through Spark pools, offers a powerful and scalable way to explore large datasets efficiently.
To explore a data file with Spark from Synapse Studio, perform the following steps:
- From within Synapse Studio, right-click the data file and select the Load to DataFrame option, as shown in Figure 4.56:
![Figure 4.56 – The image depicts a user interface for managing data files within a Synapse workspace. It shows a file directory and a context menu with the option “Load to DataFrame” highlighted, indicating the action to load data from a selected file into a DataFrame for analysis or manipulation.](https://static.packt-cdn.com/products/9781805124689/graphics/image/B21126_04_56.jpg)
Figure 4.56 –...