Dropping data
In the previous recipes, we introduced how to revise and filter datasets. These steps almost conclude the data preprocessing and preparation phase. However, the dataset may still contain bad data or unwanted records, which we should discard so that they do not produce misleading results. Here, we introduce some practical methods to remove this unnecessary data.
Getting ready
Refer to the Converting data types recipe and convert each attribute of imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to drop an attribute from the current dataset:
First, you can drop the last_name column by excluding it from the subset with a negative column index:
> employees <- employees[,-5]
Or, you can assign NULL to the attribute you wish to drop:
> employees$hire_date <- NULL
To drop rows, you can specify...
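The column-dropping steps above can be tried on a small stand-in data frame. The following is only a minimal sketch: the sample values, the column order (chosen so that last_name is the fifth column), and the names by_position and by_name are assumptions made for illustration, not part of the recipe's dataset. It also shows dropping a column by name, which may be safer than a positional index when you are unsure of the column order.

```r
# Hypothetical stand-in for the employees data. The values and the column
# order are assumptions; last_name is placed fifth to match the -5 index.
employees <- data.frame(
  emp_no     = c(1L, 2L, 3L),
  birth_date = as.Date(c("1970-01-01", "1971-02-02", "1972-03-03")),
  gender     = c("M", "F", "M"),
  first_name = c("Ann", "Bob", "Cid"),
  last_name  = c("Xu", "Yu", "Zu"),
  hire_date  = as.Date(c("1990-01-01", "1991-02-02", "1992-03-03")),
  stringsAsFactors = FALSE
)

# Drop by position: a negative index excludes the fifth column.
by_position <- employees[, -5]

# Drop by name: keep every column whose name is not last_name.
by_name <- employees[, names(employees) != "last_name"]

# Drop in place: assigning NULL removes the column from the data frame.
employees$hire_date <- NULL

# Inspect the remaining columns of each result.
print(names(by_position))
print(names(by_name))
print(names(employees))
```

Checking names(employees) before using a positional index such as -5 helps confirm that the index points at the column you intend to remove.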