Producing descriptive statistics
To get a good picture of the distribution of a random variable, we typically look at its mean and standard deviation, minimum and maximum values, median, mode, first and third quartiles, skewness, and kurtosis.
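As a rough illustration of the statistics listed above, here is a minimal sketch that computes each of them for a single pandas Series. The sample values are made up purely for illustration. Note that pandas' skew() and kurt() return bias-corrected sample estimates, and kurt() reports excess kurtosis, so a normal distribution scores about 0.

```python
import pandas as pd

# Made-up sample values, used only to illustrate the calculations
prices = pd.Series([120, 150, 150, 180, 200, 230, 410], name='price')

summary = {
    'mean': prices.mean(),
    'std': prices.std(),             # sample standard deviation (ddof=1)
    'min': prices.min(),
    'max': prices.max(),
    'median': prices.median(),
    'mode': prices.mode().tolist(),  # mode() may return several tied values
    'q1': prices.quantile(0.25),     # first quartile (linear interpolation)
    'q3': prices.quantile(0.75),     # third quartile
    'skewness': prices.skew(),       # bias-corrected sample skewness
    'kurtosis': prices.kurt(),       # excess kurtosis (normal -> 0)
}

for name, value in summary.items():
    print(f'{name:>9}: {value}')
```

The single large value (410) pulls the mean above the median and gives a positive skewness, which is exactly the kind of asymmetry these statistics are meant to expose.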
Sometimes it is also worth running a statistical test to check whether our data follows a specific distribution. This, however, is beyond the scope of this book.
Getting ready
To execute this recipe, all you need is pandas. No other prerequisites are required.
How to do it…
Here is a piece of code that can quickly give you a basic understanding of your data. We assume that our data was read from a CSV file and stored in the csv_read variable (the data_describe.py file):
# calculate the descriptives: count, mean, std,
# min, 25%, 50%, 75%, max
# for a subset of columns
csv_desc = csv_read[
    [
        'beds','baths','sq__ft','price','s_price',
        'n_price','s_sq__ft','n_sq__ft','b_price',
        'p_price','d_Condo','d_Multi-Family',
        ...
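The comment in the listing names exactly the statistics that pandas' DataFrame.describe() reports, so the following sketch assumes (the listing is cut off, so this is not confirmed) that the selected columns are passed to describe(). The toy CSV content, the columns list, and reading from io.StringIO instead of a real file are all illustrative assumptions; with real data you would pass the file path to pd.read_csv. Because describe() omits skewness, kurtosis, and the mode, the sketch appends them.

```python
import io

import pandas as pd

# Made-up toy data standing in for the recipe's CSV file; the column
# names are a small subset of those selected in the listing above
csv_text = """beds,baths,sq__ft,price
2,1,800,150000
3,2,1200,210000
3,1,1100,195000
4,2,1600,280000
5,3,2400,450000
"""
csv_read = pd.read_csv(io.StringIO(csv_text))

columns = ['beds', 'baths', 'sq__ft', 'price']

# describe() reports count, mean, std, min, 25%, 50%, 75% and max for
# each numeric column; transpose() gives one row per variable
csv_desc = csv_read[columns].describe().transpose()

# describe() omits skewness, kurtosis and the mode, so append them;
# the Series below are indexed by column name and align with csv_desc
csv_desc['skew'] = csv_read[columns].skew()
csv_desc['kurtosis'] = csv_read[columns].kurt()
# mode() returns every tied value in sorted order; keep the smallest
csv_desc['mode'] = csv_read[columns].mode().iloc[0]

print(csv_desc)
```

Transposing is a presentation choice: with many columns, as in the recipe, one row per variable is easier to read than one very wide table.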