Predicting house prices using linear regression
Now that we have the basics covered, let us apply these concepts to a real dataset. We will consider the Boston housing price dataset (http://lib.stat.cmu.edu/datasets/boston), collected by Harrison and Rubinfeld in 1978. The dataset contains 506 cases, each described by 14 attributes:
- CRIM – per capita crime rate by town
- ZN – proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS – proportion of non-retail business acres per town
- CHAS – Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX – nitric oxides concentration (parts per 10 million)
- RM – average number of rooms per dwelling
- AGE – proportion of owner-occupied units built prior to 1940
- DIS – weighted distances to five Boston employment centers
- RAD – index of accessibility to radial highways
- TAX – full-value property-tax rate...