Training a model with a standalone dataset
In the previous recipes in this chapter, we looked at training fastai models on a curated tabular dataset and on a dataset loaded directly from Kaggle. In this recipe, we will examine how to train a model with a dataset that comes from a standalone file. The dataset we will use is made up of property listings in Kuala Lumpur, Malaysia, and is available from the Kaggle site at https://www.kaggle.com/dragonduck/property-listings-in-kuala-lumpur.
This dataset is unlike the tabular datasets we have seen so far. Those datasets were well-behaved and required only a small amount of cleanup. The Kuala Lumpur property dataset, by contrast, is a real-world dataset: in addition to missing values, it contains many errors and irregularities. It is also large enough (over 50k records) to give deep learning a decent chance of being useful.
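To get a feel for what missing values and irregularities look like in a standalone file, the following minimal sketch loads a few CSV rows with pandas and counts the missing values per column. The sample rows, the column names, and the summarize_missing helper are all assumptions made up for illustration; they are not taken from the actual Kuala Lumpur dataset, which you would load from the downloaded file instead.

```python
import io

import pandas as pd

# Hypothetical rows standing in for a standalone CSV file. The column names
# and values are invented for illustration, not copied from the real dataset.
SAMPLE_CSV = """Location,Price,Rooms,Size
"KLCC, Kuala Lumpur","RM 1,250,000",2+1,"Built-up : 1,335 sq. ft."
"Mont Kiara, Kuala Lumpur",,3,"Land area : 22x80 sq. ft."
"Cheras, Kuala Lumpur",RM 480000,,
"""


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column (hypothetical helper)."""
    return df.isna().sum()


# With a real file, pass its path to read_csv instead of the in-memory buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
missing = summarize_missing(df)

print(df.shape)   # number of rows and columns
print(missing)    # missing-value count per column
print(df.dtypes)  # text-valued prices and room counts load as strings, not numbers
```

Note that in this sketch the Price and Rooms values are read as strings rather than numbers, because entries such as "RM 1,250,000" and "2+1" are not plain numeric values. Irregularities of this kind are why a real-world dataset needs more cleanup before training than the curated datasets did.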
Getting ready
Ensure you have followed the steps...