Tidying when two or more values are stored in the same cell
Tabular data is, by nature, two-dimensional, so only a limited amount of information can be presented in a single cell. As a workaround, you will occasionally see datasets with more than one value stored in the same cell. Tidy data allows exactly one value per cell. To rectify these situations, you will typically need to parse the string data into multiple columns with the methods of the str Series accessor.
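As a minimal sketch of the idea, the snippet below splits a column that packs two values into each cell into two separate columns. The DataFrame, its column names (name, coords, x, y), and the comma-separated format are all invented for illustration; they are not taken from the dataset used in this recipe.

```python
import pandas as pd

# Hypothetical toy data: each cell of "coords" packs two values into one string.
df = pd.DataFrame({
    "name": ["A", "B"],
    "coords": ["10.5, 20.25", "30.0, 40.75"],
})

# str.split with expand=True returns a DataFrame with one column per piece.
parts = df["coords"].str.split(", ", expand=True)
parts.columns = ["x", "y"]

# Convert the string pieces to numbers and attach them to the remaining column.
tidy = pd.concat([df[["name"]], parts.astype(float)], axis=1)
print(tidy)
```

Passing expand=True is what turns the split pieces into columns; without it, str.split returns a single Series of lists, which still leaves several values in one cell.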
Getting ready...
In this recipe, we examine a dataset that has a column containing several different variables in each cell. We use the str accessor to parse these strings into separate columns to tidy the data.
How to do it...
- Read in the Texas cities dataset, and identify the variables:
>>> cities = pd.read_csv('data/texas_cities.csv')
>>> cities
- The City column looks good and contains exactly one value. The Geolocation column, on the other hand, contains four variables: latitude...