Data Preparation for Deep Learning Projects
The first step in every machine learning (ML) project consists of data collection and data preparation. As a subset of ML, deep learning (DL) involves the same data processing steps. We will start this chapter by setting up a standard DL Python notebook environment using Anaconda. Then, we will provide concrete examples of collecting data in various formats (JSON, CSV, HTML, and XML). In many cases, the collected data must be cleaned and preprocessed because it contains unnecessary information or invalidly formatted values.
The chapter will introduce popular techniques for cleaning and preprocessing data: filling in missing values, dropping unnecessary entries, and normalizing values. Next, you will learn common feature extraction techniques: the bag-of-words model, term frequency-inverse document frequency (TF-IDF), one-hot encoding, and dimensionality reduction. Additionally, we will present matplotlib and seaborn, which are two of the most popular Python data visualization libraries...
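To make the three cleaning steps named above more concrete (filling in missing values, dropping unnecessary entries, and normalizing values), here is a minimal sketch in plain Python. The toy records, the field names, and the helper functions fill_missing, drop_field, and min_max_normalize are all hypothetical and exist only for illustration. Mean imputation and min-max scaling are just one possible choice for each step, and "dropping unnecessary entries" is interpreted here as removing an unneeded field.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"id": 1, "age": 20.0, "note": "a"},
    {"id": 2, "age": None, "note": "b"},
    {"id": 3, "age": 40.0, "note": "c"},
]


def fill_missing(rows, key):
    """Replace None in `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def drop_field(rows, key):
    """Remove a field that is not needed downstream."""
    return [{k: v for k, v in r.items() if k != key} for r in rows]


def min_max_normalize(rows, key):
    """Rescale `key` linearly so its values fall in the range [0, 1]."""
    values = [r[key] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    # Guard against division by zero when every value is identical.
    return [{**r, key: (r[key] - lo) / span if span else 0.0} for r in rows]


filled = fill_missing(records, "age")
trimmed = drop_field(filled, "note")
cleaned = min_max_normalize(trimmed, "age")

for row in cleaned:
    print(row)
```

Each helper returns a new list rather than modifying its input, so the steps can be reordered or tested in isolation. In practice, a library such as pandas would usually handle these operations on larger datasets.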