Dataset
Machine learning works with a dataset that we split into a training set and a testing set. We use the training data to build our model, and then evaluate that model against the remaining testing data.
The first task is finding a dataset with several variables and, ideally, several hundred observations. I am using the housing data from the UCI Machine Learning Repository (http://uci.edu). Let's load the dataset using the following commands:
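The split described above can be sketched in R as follows. This is a minimal illustration that assumes a random 70/30 split. The data frame `df` and its columns `x` and `y` are synthetic and hypothetical, and `lm()` is used only as a stand-in model:

```r
# Minimal sketch of a random train/test split (assumed 70/30 ratio).
# `df` is a hypothetical, synthetic data frame used only for illustration.
set.seed(42)                          # make the random split reproducible
n  <- 100
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n, sd = 0.5)

# Randomly choose 70% of the row indices for training
train_idx <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]              # training set
test  <- df[-train_idx, ]             # remaining rows form the test set

# Fit on the training data only, then check predictions on the held-out rows
model <- lm(y ~ x, data = train)
pred  <- predict(model, newdata = test)
rmse  <- sqrt(mean((test$y - pred)^2))  # root mean squared error on test data
print(rmse)
```

Because the test rows are never seen during fitting, the test error gives a fairer estimate of how the model behaves on new data than the training error does.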
> housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
> colnames(housing) <- c("CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PRATIO","B","LSTAT","MDEV")
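If the UCI URL is unreachable, the same Boston housing data also ships with R's MASS package as the `Boston` data frame. The following is a sketch that assumes MASS is installed (it is a recommended package included with most R distributions). Its 14 columns appear in the same order as in housing.data, so the column names used above can be reused. `dim()` then reports the number of rows and columns:

```r
# Offline alternative: load the Boston housing data from the MASS package
# (assumes MASS is installed; it is bundled with most R distributions).
library(MASS)
housing <- Boston
# Rename columns to match the names used with the UCI file
colnames(housing) <- c("CRIM","ZN","INDUS","CHAS","NOX","RM","AGE",
                       "DIS","RAD","TAX","PRATIO","B","LSTAT","MDEV")
# Number of observations (rows) and variables (columns)
print(dim(housing))
```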
There are 506 observations of 14 variables. To get a better idea of the data, we can look at a summary, as follows:
> summary(housing)
CRIM ZN ...