Chapter 8: Dealing with Common Data Problems
Being able to quickly assess the shortcomings of a dataset and correct them can be the difference between finishing your work on time and falling behind. In this chapter, we're going to give you the tools to identify and fix some of these problems, which you'll find in much of the data used in industry.
We'll first look at cases where there is too much data, such as when features are extremely highly correlated with one another and, in turn, complicate a model. You'll see how to find these correlations and then remove the redundant features.
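As a preview, here is a minimal sketch of the idea. It assumes the data sits in a pandas DataFrame, and the function name `drop_highly_correlated`, the 0.9 cutoff, and the sample columns are hypothetical choices for illustration, not necessarily the approach used later in this chapter. It computes the absolute Pearson correlation between numeric features, keeps only the upper triangle of that matrix so each pair is checked once, and drops the later feature of any pair above the threshold.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Hypothetical helper: drop one feature from each highly correlated pair."""
    # Absolute Pearson correlation between every pair of numeric columns
    corr = df.select_dtypes(include="number").corr().abs()
    # Mask everything except the upper triangle (diagonal excluded),
    # so each pair of features is compared exactly once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it correlates too strongly with any earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Illustrative data: height_in is essentially height_cm in different units
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [55, 70, 60, 85, 72],
})
result = drop_highly_correlated(df)
```

In this example, `height_in` is removed because it is nearly a rescaled copy of `height_cm`, while `weight_kg` (correlation of roughly 0.67 with height) is kept. Which feature of a pair to keep is a judgment call; this sketch simply keeps the one that appears first.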
After that, we'll look at ways to get rid of blank, empty, or Not a Number (NaN) values that muddy the waters. These values take up space in a dataset without adding any value.
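To make this concrete, here is a minimal sketch, again assuming a pandas DataFrame; the column names and values are invented for illustration. Blank strings are not NaN as far as pandas is concerned, so the sketch first converts empty or whitespace-only strings to NaN, counts the missing values per column, and then drops every row that still has a gap.

```python
import numpy as np
import pandas as pd

# Illustrative data with a None, a NaN, and an empty string
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", None],
    "age": [34, np.nan, 29, 41],
    "city": ["Lima", "Oslo", "", "Rome"],
})

# Treat empty or whitespace-only strings as missing so dropna() catches them
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Number of missing values in each column
missing_per_column = cleaned.isna().sum()

# Keep only rows with no missing values at all
complete_rows = cleaned.dropna()
```

Dropping whole rows is the bluntest option; `dropna()` also accepts `axis="columns"` to drop columns instead, and `how="all"` to remove only rows or columns that are entirely empty.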
We'll also look at what to do when you have categorical values. There are times when you'll need to maintain the relationship between categories, and times...