Data cleaning is the process of converting raw data into a consistent, simpler format that is easier to analyze. R includes a comprehensive set of tools designed specifically for cleaning data effectively. Here we will focus on cleaning our dataset and will carry out the following steps:
- Load the libraries needed to clean and tidy the dataset:
> library(dplyr)
> library(tidyr)
- Examine the summary of our dataset, as shown in the following code. This helps us identify which attributes are important:
> summary(Autompg)
      mpg          cylinders      displacement     horsepower       weight      acceleration
 Min.   : 9.00   Min.   :3.000   Min.   : 68.0   150    : 22   Min.   :1613   Min.   : 8.00
 1st Qu.:17.50   1st Qu.:4.000   1st Qu.:104.2   90     : 20   1st Qu.:2224...
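
In the summary output, the horsepower column lists value counts (150 : 22, 90 : 20) instead of a minimum and quartiles, which suggests R has read it as a factor or character column rather than a numeric one. A common cause is a non-numeric placeholder for missing values in the raw file. The following is a minimal sketch of how such a column could be inspected and converted. It uses a small hypothetical data frame named `toy` rather than the real Autompg data, and the `"?"` placeholder is an assumption made for illustration:

```r
# Hypothetical stand-in for a few rows of the dataset; not the real Autompg data.
toy <- data.frame(
  mpg        = c(18, 15, 25),
  horsepower = factor(c("130", "165", "?"))  # "?" is an assumed missing-value marker
)

# Inspect the column types: horsepower shows up as a factor, not a number.
str(toy)

# Convert through character first. Calling as.numeric() directly on a factor
# would return the internal level codes, not the printed values.
# The "?" entry cannot be parsed and becomes NA (the coercion warning is suppressed).
toy$horsepower <- suppressWarnings(as.numeric(as.character(toy$horsepower)))

# The summary now reports quartiles and the number of NA values.
summary(toy$horsepower)
```

If the same pattern holds for the real dataset, applying the conversion to `Autompg$horsepower` should make `summary(Autompg)` report that column numerically, with any unparseable entries counted as NA.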