Summary
In this chapter, we deepened our knowledge of how to manipulate data using the PySpark API and its core features. We also went through the steps required to install and use Python libraries at different instance levels, and saw how to use them to visualize our data with the display function.
Finally, we went through the basics of the Koalas API, which makes it easier to migrate from working with pandas to working with big data in Azure Databricks.
In the next chapter, we will learn how to use Azure Databricks to run machine learning experiments, train models, and make inferences on new data using libraries such as XGBoost, scikit-learn, and Spark's MLlib.