Identifying missing values in data
Our first step is to identify missing values, which gives us a better understanding of how to work with real-world data. Data can have missing values for a variety of reasons; with survey data, for example, some observations may not have been recorded. It is important to analyze our data and get a sense of what the missing values are so that we can decide how to handle them in our machine learning work. To start, let's dive into a dataset that we will use for the duration of this chapter: the Pima Indian Diabetes Prediction dataset.
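Before turning to that dataset, here is a minimal sketch of what getting a sense of the missing values might look like in pandas. The survey DataFrame, its column names, and its values are made up purely for illustration, and it assumes missing answers are stored as NaN, which may not hold for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses (illustrative values only);
# np.nan marks an answer that was not recorded.
survey = pd.DataFrame({
    "age": [34, np.nan, 51, 29],
    "income": [52000, 61000, np.nan, np.nan],
    "satisfied": ["yes", "no", "yes", "no"],
})

# Number of missing entries in each column.
missing_counts = survey.isnull().sum()

# Fraction of rows that are missing in each column.
missing_share = survey.isnull().mean()

print(missing_counts)
print(missing_share)
```

Looking at both the counts and the shares is one reasonable starting point: the counts show how many observations are affected, while the shares make columns of different sizes easier to compare.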
The Pima Indian Diabetes Prediction dataset
This dataset is available on the UCI Machine Learning Repository at:
https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes.
From the main website, we can learn a few things about this publicly available dataset. We have nine columns and 768 instances (rows). The dataset is primarily used for predicting the onset of diabetes within five years...