In this recipe, we will load a dataset and then conduct exploratory data analysis (EDA), such as visualizing the distributions of its variables.
We will carry out typical preprocessing tasks, such as encoding categorical variables and normalizing and rescaling the features for neural network training. We will then create a simple neural network model in Keras, train the model using a generator, and plot the training and validation performance. The dataset we will look at is still quite simple: the Adult dataset from the UCI Machine Learning Repository. With this dataset (also known as the Census Income dataset), the goal is to predict from census data whether someone earns more than US$50,000 per year.
Since a few of its variables are categorical, we'll also have to encode them before training.
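As a rough preview of what encoding and rescaling mean in practice, here is a minimal sketch using pandas. The column names are taken from the Adult dataset, but the rows are made up for illustration, and the use of `pd.get_dummies` with manual standardization is an assumption made for this sketch, not necessarily the approach the recipe follows.

```python
import pandas as pd

# Toy rows with Adult-style column names; the values are invented for illustration.
toy = pd.DataFrame({
    "age": [25, 38, 52, 45],
    "hours-per-week": [40, 50, 60, 30],
    "workclass": ["Private", "Self-emp", "Private", "State-gov"],
})

numeric_cols = ["age", "hours-per-week"]
categorical_cols = ["workclass"]

# One-hot encode the categorical columns: one 0/1 indicator column per category.
encoded = pd.get_dummies(toy, columns=categorical_cols, dtype=float)

# Standardize the numeric columns to zero mean and unit variance.
means = encoded[numeric_cols].mean()
stds = encoded[numeric_cols].std(ddof=0)
encoded[numeric_cols] = (encoded[numeric_cols] - means) / stds

print(encoded.columns.tolist())
```

In a real workflow, the means and standard deviations would normally be computed on the training split only and then reused for the validation and test data.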
Since this is still an introductory recipe, we'll go through this problem in a lot of detail for illustration. We'll have the following parts:
- Data loading and preprocessing...