Exploring descriptive statistics
Descriptive statistics are the most fundamental measures you can calculate on your data. In this recipe, we will learn how easy it is to get familiar with our dataset in PySpark.
Getting ready
To execute this recipe, you need to have a working Spark environment. Also, we will be working with the no_outliers DataFrame we created in the Handling outliers recipe, so we assume you have followed the steps to handle duplicates, missing observations, and outliers.
No other prerequisites are required.
How to do it...
Calculating the descriptive...