Summary
It is critical to understand your data before using it. This chapter highlighted a variety of methods for exploring and analyzing your data within the Databricks ecosystem.
We began by revisiting DLT, this time focusing on how a feature called expectations can be used to monitor and improve data quality. We also introduced Databricks Lakehouse Monitoring as another tool for observing data quality. Among its many capabilities, Lakehouse Monitoring detects shifts in data distribution and alerts users to anomalies, thus preserving data integrity throughout the data's life cycle.

We then used Databricks Assistant to explore data with ad hoc queries written in English, and we showed why AutoML is an extremely useful tool for data exploration: it automatically creates comprehensive data exploration notebooks. Together, these tools create a strong foundation for understanding and exploring your data. Finally, the chapter delved into Databricks VS and how using it to find similar documents can improve...
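As a reminder of how expectations behave, the following is a minimal pure-Python sketch of the expect-or-drop semantics discussed in the chapter. This is not the DLT API itself; the `expect_or_drop` helper and the sample order records are hypothetical, intended only to illustrate that failing rows are dropped while a violation count is surfaced for monitoring.

```python
def expect_or_drop(rows, name, predicate):
    """Mimic DLT-style expect-or-drop: keep rows that satisfy the
    quality predicate, drop the rest, and report a violation count."""
    kept = [r for r in rows if predicate(r)]
    violations = len(rows) - len(kept)
    return kept, {name: violations}

# Hypothetical sample records for illustration.
orders = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": -3.0},    # fails the quality rule
    {"id": None, "amount": 10.0}, # fails the quality rule
]

valid, metrics = expect_or_drop(
    orders,
    "valid_id_and_positive_amount",
    lambda r: r["id"] is not None and r["amount"] > 0,
)
print(valid)    # only the first order survives
print(metrics)  # {'valid_id_and_positive_amount': 2}
```

In actual DLT pipelines, the equivalent rule would be declared declaratively on a table definition (for example via the `@dlt.expect_or_drop` decorator), and the violation metrics would appear in the pipeline's event log rather than in a returned dictionary.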