Using summary statistics to generate housing price labels
In this section, we generate house price labels from the summary statistics of a small set of labeled housing price data. This is useful in real-world projects where there is too little labeled data for a regression task. In such scenarios, we generate labels by defining rules based on those summary statistics.
The approach computes the mean of each feature within the labeled training dataset to summarize the data’s underlying trends. It then uses a distance metric to find the closest match for each unlabeled data point and assigns that point the label of its closest labeled counterpart.
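To make the idea concrete, here is a minimal sketch of one way to read the description above: compute the per-label mean of each feature, then give every unlabeled row the label whose mean vector is nearest in Euclidean distance. The function name label_by_nearest_mean, the column names area and bedrooms, and the toy values are all hypothetical and chosen only for illustration; they are not taken from housing.csv, and the chapter's own rules may differ.

```python
import numpy as np
import pandas as pd


def label_by_nearest_mean(labeled, unlabeled, feature_cols, label_col):
    """Assign each unlabeled row the label whose feature means are closest.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    # Summary statistics: mean of every feature for each distinct label.
    means = labeled.groupby(label_col)[feature_cols].mean()

    # Euclidean distance from every unlabeled row to every label's mean
    # vector. The result has shape (n_unlabeled, n_labels).
    points = unlabeled[feature_cols].to_numpy(dtype=float)
    centers = means.to_numpy(dtype=float)
    diffs = points[:, None, :] - centers[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))

    # Pick the label with the smallest distance for each row.
    nearest = distances.argmin(axis=1)
    result = unlabeled.copy()
    result[label_col] = means.index.to_numpy()[nearest]
    return result


# Toy data (assumed values, not from housing.csv).
labeled = pd.DataFrame({
    "area": [1000, 1200, 2500, 2700],
    "bedrooms": [2, 2, 4, 5],
    "price": [200000, 200000, 500000, 500000],
})
unlabeled = pd.DataFrame({"area": [1100, 2600], "bedrooms": [2, 4]})

predicted = label_by_nearest_mean(
    labeled, unlabeled, ["area", "bedrooms"], "price"
)
print(predicted)
```

Because Euclidean distance is dominated by features with large numeric ranges (area in this toy example), it is usually worth standardizing the features before computing distances on real data.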
Let’s load the data from the housing.csv file using pandas:
import pandas as pd

# Load the labeled data
df_labeled = pd.read_csv('housing.csv')
Here’s the output: