Tidying when two or more values are stored in the same cell
Tabular data is two-dimensional by nature, so a single cell can present only a limited amount of information. As a workaround, you will occasionally see datasets that store more than one value in the same cell. Tidy data allows exactly one value per cell. To rectify these situations, you will typically need to parse the string data into multiple columns with the methods of the .str attribute.
In this recipe, we examine a dataset in which one column holds several different variables in each cell. We use the .str attribute to parse these strings into separate columns and tidy the data.
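As a rough preview of the idea, the sketch below parses a multi-value string column into separate columns with the .str accessor. It is not the recipe's own solution: the DataFrame is a small inline stand-in built from the two complete rows shown in this recipe, the regular expression assumes every cell follows the "number° letter, number° letter" layout, and the new column names (Latitude, LatDir, Longitude, LonDir) are illustrative choices.

```python
import pandas as pd

# Inline stand-in for the dataset (assumption: only the two complete rows
# shown in the recipe; the real file is read with pd.read_csv).
cities = pd.DataFrame({
    'City': ['Houston', 'Dallas'],
    'Geolocation': ['29.7604° N, 95.3698° W', '32.7767° N, 96.7970° W'],
})

# Each named group becomes one output column; the names are illustrative.
pattern = (r'(?P<Latitude>[\d.]+)° (?P<LatDir>[NS]), '
           r'(?P<Longitude>[\d.]+)° (?P<LonDir>[EW])')

# Series.str.extract returns a DataFrame with one column per capture group.
parts = cities['Geolocation'].str.extract(pattern)

# The extracted pieces are still strings; convert the numeric ones to floats.
parts = parts.astype({'Latitude': float, 'Longitude': float})

# Attach the parsed columns to the city names so each cell holds one value.
tidy = pd.concat([cities[['City']], parts], axis=1)
print(tidy)
```

Named capture groups keep the column labels next to the pattern that produces them. Splitting on a delimiter with .str.split(expand=True) is a reasonable alternative when the cell layout is simpler.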
How to do it...
- Read in the Texas cities dataset:
>>> cities = pd.read_csv('data/texas_cities.csv')
>>> cities
      City             Geolocation
0  Houston  29.7604° N, 95.3698° W
1   Dallas  32.7767° N, 96.7970° W
2   Austin...