Summary
In this chapter, you learned why data visualization matters: it conveys the meaning of complex datasets simply and makes patterns in the data easy for business users to see. Various strategies for visualizing data with Spark were introduced. You also learned how to visualize PySpark data natively in Databricks notebooks, and you looked at techniques for using plain Python visualization libraries to visualize data held in Spark DataFrames. A few prominent open source visualization libraries, namely Matplotlib, Seaborn, Plotly, and Altair, were introduced, along with practical examples and code samples. Finally, you learned about the pitfalls of using plain Python visualization libraries with PySpark, the need to convert PySpark DataFrames into pandas DataFrames before plotting them with these libraries, and some strategies to overcome these issues.
The next chapter covers connecting various BI and SQL analysis tools to Spark, which will help you perform ad hoc data analysis...