Data cleaning, or tidying up the data, is the process of transforming raw data into a consistent form that is simple to analyze. The R programming language provides a comprehensive set of tools designed for cleaning data effectively. Here we clean the dataset by following these steps:
- Include the libraries that are required to clean and tidy up the dataset:
> library(dplyr)
> library(tidyr)
- Analyze the summary of our dataset, which will help us to focus on the attributes we need to work on:
> summary(AirQualityUCI)
      Date                        Time                        CO(GT)          PT08.S1(CO)   NMHC(GT)
 Min. :2004-03-10 00:00:00   Min. :1899-12-31 00:00:00   Min. :-200.00   Min. :-200    Min. ...
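The minimum of -200 reported for the sensor columns is worth a closer look. The UCI description of this dataset tags missing readings with the value -200, so these are placeholders rather than real measurements. The following is a minimal sketch of one way to turn that placeholder into NA with dplyr. It assumes the placeholder is exactly -200 in every numeric column, and it uses a small made-up data frame, `air_sample`, as a stand-in for `AirQualityUCI`; both `air_sample` and `air_clean` are hypothetical names, not part of the original analysis.

```r
library(dplyr)

# Hypothetical stand-in for two AirQualityUCI columns; the values are invented.
air_sample <- data.frame(
  `CO(GT)`      = c(2.6, -200, 2.2),
  `PT08.S1(CO)` = c(1360, 1292, -200),
  check.names   = FALSE  # keep the original column names with parentheses
)

# Assumption: -200 marks a missing reading in every numeric column.
# na_if() replaces each value equal to -200 with NA.
air_clean <- air_sample %>%
  mutate(across(where(is.numeric), ~ na_if(.x, -200)))

# The summary now reports NA counts instead of a misleading minimum of -200.
summary(air_clean)
```

Applying the same `mutate()` call to the full dataset would keep the placeholder from distorting minimums and means in later steps, provided the assumption about -200 holds for every column touched.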