Tidying when multiple variables are stored as a single column
Tidy datasets must have a single column for each variable. Occasionally, multiple variable names are placed in one column, with their corresponding values placed in another.
In this recipe, we identify the column containing the improperly structured variables and pivot it to create tidy data.
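As a minimal sketch of the idea, the following uses a small made-up DataFrame rather than the recipe's dataset. The column names `name`, `measure`, and `value` are assumptions chosen for illustration: `measure` stands in for the column that holds variable names, and `value` for the column that holds their values.

```python
import pandas as pd

# Hypothetical toy data: 'measure' stores variable names, 'value' stores their values
messy = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B'],
    'measure': ['score', 'grade', 'score', 'grade'],
    'value': ['12', 'A', '30', 'C'],
})

# Pivot so that each distinct entry in 'measure' becomes its own column
tidy = (
    messy
    .pivot(index='name', columns='measure', values='value')
    .rename_axis(columns=None)  # drop the leftover 'measure' label on the column axis
    .reset_index()              # turn 'name' back into an ordinary column
)
```

Note that `pivot` raises an error when the same `index`/`columns` pair occurs more than once, so the columns passed as `index` must uniquely identify each observation.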
How to do it…
- Read in the restaurant inspections dataset, and convert the Date column data type to datetime64:
  >>> import pandas as pd
  >>> inspections = pd.read_csv('data/restaurant_inspections.csv',
  ...                           parse_dates=['Date'])
  >>> inspections
                                Name  ...
  0                E & E Grill House  ...
  1                E & E Grill House  ...
  2                E & E Grill House  ...
  3                E & E Grill House  ...
  4                E & E Grill House  ...
  ..                             ...  ...
  495  PIER SIXTY ONE-THE LIGHTHOUSE  ...
  496...