Tidying when variables are stored in column names and values
One particularly difficult form of messy data to diagnose appears whenever variables are stored both horizontally across the column names and vertically down the column values. This type of dataset is usually found not in a database but in a summarized report that someone else has already generated.
How to do it…
In this recipe, data is reshaped into tidy data with the .melt and .pivot_table methods.
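As a rough preview of how the two methods can be combined, the sketch below rebuilds the sensors table from this recipe inline (so it does not depend on data/sensors.csv), melts the year columns into rows, and then pivots the Property values into columns. The names `melted` and `tidy` and the `Year` and `Value` column labels are assumptions chosen for illustration; they are not taken from the recipe's own steps.

```python
import pandas as pd

# Same values as the sensors table in this recipe, built inline so the
# sketch runs without the CSV file.
sensors = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Property': ['Pressure', 'Temperature', 'Flow'] * 2,
    '2012': [928, 1026, 819, 817, 1008, 887],
    '2013': [873, 1038, 806, 877, 1041, 899],
    '2014': [814, 1009, 861, 914, 1009, 837],
    '2015': [973, 1036, 882, 806, 1002, 824],
    '2016': [870, 1042, 856, 942, 1013, 873],
})

# Step 1: move the year column names into a single column.
# 'Year' and 'Value' are illustrative labels (an assumption).
melted = sensors.melt(
    id_vars=['Group', 'Property'],
    var_name='Year',
    value_name='Value',
)

# Step 2: spread the Property values out into their own columns.
# pivot_table aggregates with the mean by default; each
# (Group, Year, Property) combination occurs once here, so the mean
# simply returns the original value (as a float).
tidy = (
    melted
    .pivot_table(index=['Group', 'Year'], columns='Property', values='Value')
    .reset_index()
    .rename_axis(None, axis='columns')  # drop the leftover 'Property' axis label
)
```

After these two steps, each row of `tidy` holds one Group and Year combination, with Flow, Pressure, and Temperature as separate columns.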
- Read in the sensors dataset:
>>> sensors = pd.read_csv('data/sensors.csv')
>>> sensors
  Group     Property  2012  2013  2014  2015  2016
0     A     Pressure   928   873   814   973   870
1     A  Temperature  1026  1038  1009  1036  1042
2     A         Flow   819   806   861   882   856
3     B     Pressure   817   877   914   806   942
4     B  Temperature  1008  1041  1009  1002  1013
5     B         Flow   887   899   837   824   873
- The only variable...