Data collection, data cleaning, and data preprocessing
In this section, we will introduce the tasks involved in preparing data. We will describe how to collect data from multiple sources and convert it into a generic form that data scientists can use regardless of the underlying task. This process can be broken down into three parts: data collection, data cleaning, and data preprocessing. Task-specific transformation, by contrast, is considered feature extraction, which will be discussed in the following section.
Collecting data
First, we will introduce different data collection methods for composing initial datasets. Which technique is appropriate depends on how the raw data is formatted. Most datasets are available online either as HTML files or as JSON objects. Some data is stored in Comma-Separated Values (CSV) format, which can easily be loaded through the pandas library, a popular data analysis and manipulation tool....
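As a minimal sketch of loading differently formatted sources into one generic form, the following example reads a small CSV snippet and a small JSON snippet into pandas DataFrames and combines them. The column names (id, title, score) and the sample records are hypothetical and exist only for illustration; in practice, the in-memory strings would be replaced by a file path or a downloaded payload.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "id,title,score\n1,first post,0.5\n2,second post,0.9\n"

# pd.read_csv accepts a path or any file-like object, so StringIO
# lets the example run without touching the file system.
csv_df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON payload, such as what a web API might return.
json_text = '[{"id": 3, "title": "third post", "score": 0.7}]'

# Parse the JSON into a list of dicts, then build a DataFrame from it.
json_df = pd.DataFrame(json.loads(json_text))

# Combine both sources into a single table with a shared schema.
combined = pd.concat([csv_df, json_df], ignore_index=True)
print(combined)
```

Once both sources share the same columns, the later cleaning and preprocessing steps can operate on a single DataFrame regardless of where each record originated.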