Cleaning data with pandas
One of the most important parts of working with data is making sure it is in the format you need. Along with having enough data, this may be the most vital factor in training an accurate model. In this section, we will walk through importing a CSV file and then analyzing and cleaning it so that it is ready to use.
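As a rough preview of that import, inspect, and clean workflow, here is a minimal sketch on a tiny inline CSV. The column names (`major`, `category`, `median_pay`) and every value are hypothetical placeholders, not the real dataset, and the cleaning rules (drop exact duplicates, drop rows with no pay figure) are assumptions chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk. The column names and
# values are placeholders for illustration, not real data.
raw_csv = io.StringIO(
    "major,category,median_pay\n"
    "Major A,Category X,50000\n"
    "Major B,Category Y,\n"
    "Major A,Category X,50000\n"
    "Major C,Category X,42000\n"
)

# Import: read_csv accepts a file path or any file-like object.
df = pd.read_csv(raw_csv)

# Analyze: look at the first rows, the inferred column types,
# and how many values are missing in each column.
print(df.head())
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)

# Clean (assumed rules): drop exact duplicate rows, then drop rows
# that have no pay figure, then renumber the index.
clean = (
    df.drop_duplicates()
    .dropna(subset=["median_pay"])
    .reset_index(drop=True)
)

# The missing value forced the column to float on import; with the
# gaps removed it can be stored as integers again.
clean["median_pay"] = clean["median_pay"].astype(int)
print(clean)
```

With a real file, the `io.StringIO` object would be replaced by the path to the CSV, and the cleaning rules would depend on what the inspection step reveals.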
The example we will look at is a dataset of US university majors and how they relate to pay. Having a general sense of the domain you are investigating is critical, and this is an area you may already have a grasp of. The dataset is provided by the excellent FiveThirtyEight site, and more information can be found here: https://github.com/fivethirtyeight/data/tree/master/college-majors.
Our goal is to use this data to figure out whether we should have chosen a different major. We might even find out that...