Dealing with missing data
Missing or incomplete data is a problem every data analyst faces sooner or later. Data can be missing for many reasons: someone didn’t enter a value, a survey respondent skipped a question, or a measurement couldn’t be taken. Whatever the cause, holes in a dataset are common, and they need to be addressed.
From a data analytics point of view, the biggest problem is that most analyses won’t run with null values in the data. You get an error message, and you can’t run the code until you have done something about the gaps. From a statistical point of view, it is a little more complicated. The obvious fix is to remove the incomplete records, but removing data reduces the statistical power of the analysis, and it can even drop the number of observations below what a specific analysis requires. Perhaps the biggest problem is that sometimes what is missing...
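To make the two points above concrete, here is a minimal sketch in plain Python. The survey records, the field names `age` and `income`, and the helper functions `try_mean` and `drop_incomplete` are all made up for illustration, and `None` stands in for a null value. Other tools treat gaps differently (some skip them or propagate a not-a-number value instead of raising an error), so read this as one possible behavior, not a general rule.

```python
from statistics import mean

# Hypothetical survey records; None marks a value that was never recorded.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
    {"age": 51, "income": 75000},
]


def try_mean(values):
    """Return the mean, or None if a missing value makes the calculation fail."""
    try:
        return mean(values)
    except TypeError:
        # statistics.mean cannot do arithmetic on None, so it raises TypeError.
        return None


def drop_incomplete(records):
    """Keep only the records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


# The analysis does not run while the gap is still in the data.
naive_result = try_mean([r["age"] for r in rows])

# Dropping incomplete records lets it run, at the cost of observations.
complete = drop_incomplete(rows)
clean_result = try_mean([r["age"] for r in complete])

print("observations before:", len(rows))
print("observations after: ", len(complete))
print("mean age on complete records:", clean_result)
```

In this toy table only two values are missing, yet removing every incomplete record discards two of the five observations, which is the loss of statistical power described above.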