Exploring and understanding data
After collecting data and loading it into R data structures, the next step in the machine learning process is to examine the data in detail. During this step you begin to explore the data's features and examples and to recognize the peculiarities that make your data unique. The better you understand your data, the better you will be able to match a machine learning model to your learning problem.
The best way to understand the process of data exploration is by example. In this section, we will explore the usedcars.csv dataset, which contains actual data about used cars recently advertised for sale on a popular U.S. website.
Tip
The usedcars.csv dataset is available for download on Packt's website. If you are following along with the examples, be sure that this file has been downloaded and saved to your R working directory.
Since the dataset is stored in CSV form, we can use the read.csv() function to load the data into an R data frame:
usedcars...
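For readers who do not yet have the file at hand, the following is a minimal, self-contained sketch of the same loading pattern. It does not use the real dataset: the toy rows, the column names (year, price, color), and the variable names toy_cars and path are all hypothetical, chosen only for illustration. The sketch writes a small CSV to a temporary file and reads it back with read.csv(), passing stringsAsFactors = FALSE so that text columns stay as character vectors rather than being converted to factors.

```r
# Hypothetical toy rows for illustration only; the real usedcars.csv
# file may have different columns and values.
toy <- data.frame(
  year  = c(2011, 2010, 2012),
  price = c(21992, 13995, 17809),
  color = c("Yellow", "Gray", "Silver")
)

# Write the toy data to a temporary CSV file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Load the CSV into a data frame, keeping text columns as character vectors.
toy_cars <- read.csv(path, stringsAsFactors = FALSE)

# Inspect the structure of the resulting data frame.
str(toy_cars)
```

With the actual file saved in the working directory, the same call would take the file name in place of the temporary path. If R reports that the file cannot be found, getwd() shows which directory R is currently searching.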