Tidying when multiple variables are stored as column values
Tidy datasets must have a single column for each variable. Occasionally, multiple variable names are placed in a single column, with their corresponding values placed in another. The general format for this kind of messy data is as follows:
In this example, the first three rows and the last three rows represent two distinct observations, each of which should occupy a single row. The data needs to be pivoted such that it ends up like this:
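As a rough illustration of this reshaping, the sketch below builds a small, entirely hypothetical DataFrame (the column names Name, Info, and Value and all of the values are assumptions made for this example, not the recipe's data) in which variable names sit in one column and their values in another, and then pivots it so that each observation occupies one row:

```python
import pandas as pd

# Hypothetical messy data: variable names live in 'Info', their values in 'Value'.
# The first three rows describe observation 'A', the last three observation 'B'.
messy = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Info': ['x', 'y', 'z', 'x', 'y', 'z'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Pivot so that each label in 'Info' becomes its own column.
tidy = (
    messy.pivot(index='Name', columns='Info', values='Value')
         .rename_axis(columns=None)  # drop the leftover 'Info' label on the column axis
         .reset_index()              # turn 'Name' back into an ordinary column
)
print(tidy)
```

The result has one row per Name and one column for each of x, y, and z. The pivot method is only one possible way to do this; it requires each Name and Info pair to be unique.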
Getting ready
In this recipe, we identify the column containing the improperly structured variables and pivot it to create tidy data.
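One informal way to spot such a column, sketched here on hypothetical data (the names Name, Info, and Value are assumptions for illustration), is to count its distinct values: a column holding variable names tends to repeat the same few labels once per observation. It can also help to confirm that no identifier is paired with the same label twice, since pivot raises an error on duplicate pairs.

```python
import pandas as pd

# Hypothetical long-format data; column names are illustrative only.
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Info': ['x', 'y', 'z', 'x', 'y', 'z'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# A column of variable names repeats a small set of labels across observations.
label_counts = df['Info'].value_counts()
print(label_counts)

# Check that every (Name, Info) pair appears only once before pivoting.
has_duplicates = df.duplicated(subset=['Name', 'Info']).any()
print(has_duplicates)
```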
How to do it...
- Read in the restaurant inspections dataset, and convert the Date column data type to datetime64:
>>> import pandas as pd
>>> inspections = pd.read_csv('data/restaurant_inspections.csv', parse_dates=['Date'])
>>> inspections.head()
- This dataset has two variables, Name and Date, that are each correctly contained in a single column. The Info column itself has five different...