Summary
Wow, we have covered a lot of concepts in this chapter! First, we reviewed the basics of data modeling in SQL and NoSQL. We then looked at the fundamental Spark APIs for data cleansing, along with various data documentation tools. Lastly, we looked at dimensional modeling for data warehousing. With these techniques and tools, we have laid the building blocks for the data analytics engineer role. Once you understand the needs of your users, you will be able to model and cleanse your data accordingly.
In the next chapter, we will explore Spark further, as well as some cloud computing techniques.