Summary
This was another short but interesting chapter. We started by implementing the star schema, then learned how to deliver data in Parquet format, and finally looked at how data can be shared between SQL and Spark, and between Spark and Hive, in Azure. With all this knowledge, you should now be able to design and implement a basic serving layer for a data lake architecture using Azure services. To learn more, please follow the links provided at the end of the important topics.
With this, we have reached the end of the first major section of the DP-203 syllabus, Designing and Implementing Data Storage, which accounts for about 40–45% of the certification examination. We are getting closer to the halfway mark. Good going!
In the next section, we will learn about designing and developing data processing systems, and more specifically, about ingesting and transforming data in data lakes.