Finding and correcting data entries
Even in the age of computers, human error still comes into play. Mistaken keystrokes inevitably find their way into the datasets we are tasked to work with, in everything from medical records to a car's service history.
There are a few ways to check for anomalies. One is to group items together and see which ones stand out from the rest of their group. Looking back at our college football dataset, we want to confirm that every school's conference is correct.
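To illustrate the group-and-inspect idea, here is a minimal sketch on a small, made-up DataFrame. The rows and the misspelled "Big Tenn" entry are hypothetical and are not taken from the college football dataset. The sketch groups the rows by conference, counts each group's members, and flags any group with a single member as a likely typo.

```python
import pandas as pd

# Hypothetical toy data; "Big Tenn" is a deliberate typo for illustration
df = pd.DataFrame({
    "School": ["Ohio State", "Michigan", "Penn State", "Alabama", "Georgia", "LSU", "Iowa"],
    "Conference": ["Big Ten", "Big Ten", "Big Ten", "SEC", "SEC", "SEC", "Big Tenn"],
})

# Group rows by conference and count how many schools fall in each group
group_sizes = df.groupby("Conference").size()
print(group_sizes)

# A group with only one member stands out as a possible data-entry error
suspects = group_sizes[group_sizes == 1].index.tolist()
print(suspects)  # ['Big Tenn']

# Show the rows that belong to the suspicious groups
print(df[df["Conference"].isin(suspects)])
```

A group of one is not proof of an error, since some categories are genuinely rare. Treat the flagged rows as candidates to review by hand.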
We can select the Conference column, which pandas returns as a Series object. A Series has many methods, but the one we are interested in is pandas' Series.value_counts() method, which counts how many times each unique value appears.
Let's use it to check for lone conferences, that is, conference names that appear only once:
df_ncaa_error.Conference.value_counts()
This will show the following: