Summarizing Numerical Variables
Now, let's look at a numerical column and get a good understanding of its content. To do so, we will use statistical measures that summarize a variable; these measures are collectively referred to as descriptive statistics. In this chapter, we will introduce you to the most popular ones.
With the pandas package, many of these measures have been implemented as methods. For instance, if we want to know the highest value contained in the 'Quantity' column, we can use the .max() method:
df['Quantity'].max()
You should get the following output:
80995
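If you want to experiment with .max() without loading the full dataset, here is a minimal sketch on a small, made-up DataFrame. The 'StockCode' column and all values are hypothetical and are not taken from the retail dataset. The sketch also uses .idxmax() to find the row that holds the maximum, which is one way to inspect an extreme value more closely. The companion .min() method works the same way for the lowest value.

```python
import pandas as pd

# Hypothetical toy data standing in for the retail dataset; all values are made up
df = pd.DataFrame({
    "StockCode": ["A1", "B2", "C3", "D4", "E5"],
    "Quantity": [6, 12, 1, 500, 2],
})

# Highest and lowest values in the column, as with df['Quantity'].max()
max_qty = df["Quantity"].max()
min_qty = df["Quantity"].min()

# idxmax() returns the index label of the first row holding the maximum;
# .loc then retrieves that whole row so the suspicious value can be reviewed in context
max_row = df.loc[df["Quantity"].idxmax()]

print(max_qty)
print(min_qty)
print(max_row)
```

Looking at the full row, rather than the bare number, makes it easier to judge whether an extreme value is plausible.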
We can see that the maximum quantity of an item sold in this dataset is 80995, which seems extremely high for a retail business. In a real project, this kind of unexpected value would need to be discussed with the data owner or key stakeholders to confirm whether it is genuine or incorrect. Now, let's have a look at the lowest value for ...