We want to build a model of house prices. For our linear regression model, we will use this open source dataset of house prices (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data). Specifically, the dataset contains the sale prices of houses sold in Ames, Iowa, along with their associated features.
As with any machine learning project, we start by asking the most basic of questions: what do we want to predict? In this case, I've already indicated that we're going to predict house prices, so all the other data will be used as signals for that prediction. In statistical parlance, we call the house price the dependent variable and the other fields the independent variables.
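To make the distinction concrete, here is a minimal sketch of separating the dependent variable from the independent variables. It assumes the price column is named `SalePrice` and that an `Id` column carries no signal. The sample rows are made up for illustration, and `split_variables` is a hypothetical helper, not part of any library. The real dataset also contains non-numeric fields, which this sketch does not handle.

```python
import csv
import io

# Tiny made-up sample in the shape of a house-price CSV; the values are
# illustrative only and are not real rows from the dataset.
SAMPLE_CSV = """Id,LotArea,YearBuilt,SalePrice
1,8000,2000,200000
2,9500,1975,180000
3,11000,2005,225000
"""

TARGET = "SalePrice"  # assumed name of the dependent-variable column


def split_variables(csv_text, target=TARGET, ignore=("Id",)):
    """Return (feature_names, X, y), where y holds the dependent variable."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Every column that is neither the target nor ignored is an independent variable.
    feature_names = [name for name in rows[0] if name != target and name not in ignore]
    X = [[float(row[name]) for name in feature_names] for row in rows]
    y = [float(row[target]) for row in rows]
    return feature_names, X, y


feature_names, X, y = split_variables(SAMPLE_CSV)
print("independent variables:", feature_names)
print("dependent variable:", y)
```

Each row of `X` lines up with the same position in `y`, which is the shape a linear regression expects: one vector of signals per house, and one price to predict.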
In the following sections, we will build a graph of dependent logical conditions, then with that as a plan...