Preprocessing data
Real-world data usually arrives in raw form, while machine learning algorithms expect their input in a specific format before training can start. To prepare data for these algorithms, we have to preprocess it and convert it into that format. Let's see how to do it.
Create a new Python file and import the following packages:
import numpy as np
from sklearn import preprocessing
Let's define some sample data:
input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])
We will be talking about several different preprocessing techniques:
Binarization
Mean removal
Scaling
Normalization
Let's take a look at each technique, starting with the first.
Binarization
This process is used when we want to convert our numerical values into boolean values. Let's use an...
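As a minimal sketch of binarization on the sample data defined earlier, the snippet below uses scikit-learn's Binarizer class. The threshold of 2.1 and the variable name data_binarized are illustrative assumptions: values strictly greater than the threshold become 1, and all other values become 0.

```python
import numpy as np
from sklearn import preprocessing

# Same sample data as defined earlier in this section
input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

# Assumed threshold, chosen only for illustration
threshold = 2.1

# Values strictly greater than the threshold map to 1.0, the rest to 0.0
data_binarized = preprocessing.Binarizer(threshold=threshold).transform(input_data)
print("Binarized data:")
print(data_binarized)

# The same result with a plain NumPy comparison, for cross-checking
data_binarized_np = (input_data > threshold).astype(float)
```

Note that 2.1 in the third row maps to 0 because the comparison is strict. A different threshold would produce a different split, so the value should be chosen to suit the data at hand.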