Understanding Data Analytics
A new discipline called analytics engineering has emerged. An analytics engineer focuses primarily on taking data once it has been delivered and crafting it into consumable data products. They are expected to document, clean, and shape the data into whatever form its users need, whether those users are data scientists or business executives. The process of curating and shaping this data can be understood, in the abstract, as data modeling.
In this chapter, we will go over several approaches to data modeling and documentation. Along the way, we will start looking into PySpark APIs and work with tools for code-based documentation.
By the end of the chapter, you will have built the fundamental skills to start any data analytics project.
In this chapter, we’re going to cover the following main topics:
- Graphviz and diagrams
- Covering critical PySpark APIs for data cleaning and preparation
- Data modeling for SQL and NoSQL...