Handling NA values
Sometimes, it is acceptable to have NA values in the dataset. However, for many types of analysis, NA values need to be either removed or replaced. In the case of road length, a better estimate of total road length could be generated if the NA values were replaced with best guesses. In the following subsections, I will walk through these three approaches to handling NA values:
- Deletion
- Insertion
- Imputation
Deleting missing values
The simplest way to handle NA values is to delete any entry that contains an NA value, or more than a certain number of NA values. When removing entries with NA values, there is a trade-off between the correctness of the data and the completeness of the data. Data entries that contain NA values may also contain several useful non-NA values, and removing too many data entries could reduce the dataset to a point where it is no longer useful.
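The trade-off can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows, region names, year columns, and the helpers count_na and drop_rows are all hypothetical, and None stands in for NA. It is not the dataset or the tooling used in this chapter.

```python
# Hypothetical data: one entry per region, road length by year.
# None stands in for an NA value.
YEARS = ["2011", "2012", "2013"]
rows = [
    {"region": "A", "2011": 120.0, "2012": 125.0, "2013": None},
    {"region": "B", "2011": None, "2012": None, "2013": None},
    {"region": "C", "2011": 80.0, "2012": None, "2013": None},
    {"region": "D", "2011": 60.0, "2012": 61.0, "2013": 63.0},
]


def count_na(row):
    """Count how many of the year columns are missing in one entry."""
    return sum(1 for year in YEARS if row[year] is None)


def drop_rows(entries, max_na=0):
    """Keep only the entries with at most max_na missing year values."""
    return [row for row in entries if count_na(row) <= max_na]


# Strict deletion: drop every entry that contains any NA value.
strict = drop_rows(rows)

# Lenient deletion: drop only the entries where every year is missing.
lenient = drop_rows(rows, max_na=len(YEARS) - 1)

print([row["region"] for row in strict])
print([row["region"] for row in lenient])
```

With these made-up rows, strict deletion keeps a single entry, while the lenient threshold keeps every entry that has at least one usable year.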
For this dataset, it is not that important to have all of the years present; even one year is enough to give us a rough...