Training a model in fastai with a non-curated tabular dataset
In Chapter 2, Exploring and Cleaning Up Data with fastai, you reviewed the curated datasets provided by fastai. In the previous recipe, you trained a deep learning model on one of these curated datasets. What if you want to train a fastai model on a tabular dataset that is not one of them?
In this recipe, we will go through the process of ingesting a non-curated dataset – the Kaggle house prices dataset (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data) – and training a deep learning model on it. This dataset presents two additional challenges compared to a curated fastai dataset: ingesting it requires extra steps, and its structure requires special handling to deal with missing values.
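To make the missing-value challenge concrete, here is a minimal sketch that uses pandas directly. It is an illustration under stated assumptions, not necessarily the approach this recipe takes: the column names (LotFrontage, Alley, SalePrice) match the Kaggle dataset, but the rows are invented, and the choice of a placeholder category and a median fill is just one common convention.

```python
import io

import pandas as pd

# Toy stand-in for the Kaggle training file; these rows are invented
# for illustration and are not taken from the real dataset.
csv_text = """Id,LotFrontage,Alley,SalePrice
1,65.0,,200000
2,,Pave,180000
3,68.0,,220000
4,60.0,Grvl,140000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column to see which columns need handling.
missing_counts = df.isnull().sum()
print(missing_counts)

# Categorical gap: replace with an explicit placeholder category
# (assumption: "missing" is an arbitrary label chosen for this sketch).
df["Alley"] = df["Alley"].fillna("missing")

# Continuous gap: replace with the column median.
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].median())
```

With the real dataset, the in-memory string would be replaced by the path to the CSV file downloaded from Kaggle, and the same inspection step would show which of its columns contain gaps.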
The goal of this recipe is to use this dataset to train a deep learning model that then predicts whether a house has a sale...