Identifying the percentiles of a dataset
The percentile is an interesting statistic because it can be used to measure the spread of a dataset and, at the same time, identify the center of a dataset. Percentiles divide a sorted dataset into 100 equal portions, which lets us determine the value below (or above) which a given share of the observations falls. It takes 99 percentiles to split a dataset into 100 equal portions. The 50th percentile is the same value as the median.
To analyze the percentile of a dataset, we will use the percentile method from the numpy library in Python.
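As a minimal sketch of the idea before turning to the real data, the snippet below applies np.percentile to a small made-up array. The values are hypothetical and are not taken from the COVID-19 dataset. It checks that the 50th percentile matches the median and that about 60% of the values fall at or below the 60th percentile. It relies on numpy's default linear interpolation between neighboring values.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
values = np.array([12, 7, 3, 25, 18, 9, 30, 15, 21, 5])

# The 50th percentile and the median are computed independently
p50 = np.percentile(values, 50)
median = np.median(values)

# The 60th percentile: the value at or below which roughly 60% of the data falls
p60 = np.percentile(values, 60)

# Share of observations at or below the 60th percentile
share_below = (values <= p60).mean()

print("50th percentile:", p50)
print("median:", median)
print("60th percentile:", p60)
print("share at or below the 60th percentile:", share_below)
```

With only ten values, the 60th percentile falls between two observations, so numpy interpolates rather than returning a value that appears in the array.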
Getting ready
We will work with the COVID-19 cases again for this recipe.
How to do it…
We will compute the 60th percentile using the numpy library:
- Import the numpy and pandas libraries:
    import numpy as np
    import pandas as pd
- Load the .csv into a dataframe using read_csv. Then subset the dataframe to include only relevant columns:
    covid_data = pd.read_csv("covid-data...