Data dictionaries
A crucial part of data analysis involves creating and maintaining a data dictionary. A data dictionary is a table of metadata and notes on each column of data. One of the primary purposes of a data dictionary is to explain the meaning of the column names. The college dataset uses a lot of abbreviations that are likely to be unfamiliar to an analyst who is inspecting it for the first time.
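To make the idea concrete, here is a minimal sketch of how a data dictionary can be used and maintained with pandas. The column names (STU_CNT, AVG_GPA, REGION) and their descriptions are hypothetical stand-ins invented for this illustration; they are not taken from the college dataset. The sketch assumes the dictionary has the two-column layout of column_name and description.

```python
import io

import pandas as pd

# Hypothetical stand-in for a data dictionary file. These column names and
# descriptions are invented for illustration, not the college dataset's.
dictionary_csv = io.StringIO(
    "column_name,description\n"
    "STU_CNT,Number of enrolled students\n"
    "AVG_GPA,Mean grade point average\n"
)
data_dict = pd.read_csv(dictionary_csv)

# Index the descriptions by column name so an abbreviation can be looked up.
descriptions = data_dict.set_index("column_name")["description"]
print(descriptions["STU_CNT"])

# Hypothetical dataset with one column the dictionary does not describe.
df = pd.DataFrame({"STU_CNT": [1200], "AVG_GPA": [3.1], "REGION": ["West"]})

# Columns without an entry show where the dictionary needs updating.
undocumented = df.columns.difference(descriptions.index).tolist()
print(undocumented)
```

Keeping the descriptions in a Series indexed by column name makes a lookup a single expression, and comparing that index against the dataset's columns is one simple way to notice when the dictionary has fallen behind the data.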
A data dictionary for the college dataset is provided in the college_data_dictionary.csv file:
>>> pd.read_csv("data/college_data_dictionary.csv")
    column_name  description
0        INSTNM  Institut...
1          CITY  City Loc...
2        STABBR  State Ab...
3          HBCU  Historic...
4       MENONLY  0/1 Men ...
..          ...          ...
22      PCTPELL  Percent ...
23     PCTFLOAN  Percent ...
24      UG25ABV  Percent ...
25  MD_EARN_...  Median E...
26  GRAD_DEB...  Median d...
As you can see, it is immensely helpful in deciphering...