Chapter 11: Data Visualization with PySpark
So far, from Chapter 1, Distributed Computing Primer, through Chapter 9, Machine Learning Life Cycle Management, you have learned how to ingest, integrate, and cleanse data, and how to prepare it for analytics. You have also learned how to apply clean data to practical business problems using data science and machine learning. This chapter introduces the basics of deriving meaning from data using data visualizations.
In this chapter, we're going to cover the following main topics:
- Importance of data visualization
- Techniques for visualizing data using PySpark
- Considerations for PySpark to pandas conversion
Data visualization is the process of graphically representing data using visual elements such as charts, graphs, and maps. It helps you recognize patterns within data at a glance. In the big data world, with massive amounts of data, it is even...