Replacing and filling
Replacing values is straightforward: a value does not fit the data, and you need to swap it for another one. The dataset we’re using in this chapter offers a good example. Its documentation states that the author converted unknown values to “?”, meaning that you will not find any standard NA values in this dataset. Therefore, it is our job as data scientists to wrangle this and replace all the ? values with NA.
Note
It’s worth making a note of this, as a lesson learned from this exercise: always look at the data documentation, if and when it is available. Many explanations about the way the data was collected and the meaning of each variable are contained in these documents.
Replacing the values is possible using slicing notation or the gsub() function. In the dataset, there are three variables with ? values: workclass, occupation, and native_country.
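To make the two options concrete, here is a minimal sketch of both on a toy data frame. The data frame df and its values are made up for illustration and only borrow the three column names mentioned above; the chapter's real dataset may store its ? values slightly differently (for example, with surrounding whitespace), so treat this as a pattern to adapt rather than the exact code used in the chapter.

```r
# Toy data frame standing in for the chapter's dataset (hypothetical values).
df <- data.frame(
  workclass      = c("Private", "?", "State-gov"),
  occupation     = c("?", "Sales", "Tech-support"),
  native_country = c("United-States", "Cuba", "?"),
  stringsAsFactors = FALSE
)

# Option 1: slicing notation.
# A logical index selects the cells equal to "?" and assigns NA to them.
sliced <- df
sliced$workclass[sliced$workclass == "?"] <- NA
sliced$occupation[sliced$occupation == "?"] <- NA
sliced$native_country[sliced$native_country == "?"] <- NA

# Option 2: gsub().
# "?" is a regex metacharacter, so it is escaped as "\\?".
# With an NA replacement, every element that matches the pattern becomes NA.
cols <- c("workclass", "occupation", "native_country")
subbed <- df
subbed[cols] <- lapply(subbed[cols], function(x) gsub("\\?", NA, x))

# Count the missing values per column to confirm the replacement.
colSums(is.na(subbed))
```

Note that the gsub() version marks any value containing a ? as NA, not only values that are exactly ?, so the slicing version is the stricter of the two.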
We will replace...