Preprocessing data
Raw data is the fuel of machine learning algorithms. But just as a car runs on gasoline rather than crude oil, machine learning algorithms expect data to be formatted in a certain way before training can begin. To prepare data for ingestion by these algorithms, we must preprocess it and convert it into the right format. Let's look at some of the ways this can be accomplished.
To run the examples in this section, we need to import a few Python packages:
import numpy as np
from sklearn import preprocessing
Also, let's define some sample data:
input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])
These are the preprocessing techniques we will be analyzing:
- Binarization
- Mean removal
- Scaling
- Normalization
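As a quick preview, the four techniques might be applied to the sample data roughly as sketched below. This is a minimal sketch rather than a definitive recipe: the binarization threshold of 2.1 and the scaling range of (0, 1) are illustrative assumptions, and the variable names are hypothetical.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[5.1, -2.9, 3.3],
                       [-1.2, 7.8, -6.1],
                       [3.9, 0.4, 2.1],
                       [7.3, -9.9, -4.5]])

# Binarization: values strictly greater than the threshold become 1, the rest 0
# (the threshold 2.1 is an assumed, illustrative value)
data_binarized = preprocessing.Binarizer(threshold=2.1).fit_transform(input_data)

# Mean removal: center each column on 0 and scale it to unit standard deviation
data_standardized = preprocessing.scale(input_data)

# Scaling: map each column linearly onto an assumed range of [0, 1]
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_minmax = scaler.fit_transform(input_data)

# Normalization: rescale each row so that its L1 or L2 norm equals 1
data_l1 = preprocessing.normalize(input_data, norm='l1')
data_l2 = preprocessing.normalize(input_data, norm='l2')

print("Binarized:\n", data_binarized)
print("Column means after mean removal:", data_standardized.mean(axis=0))
print("Min-max scaled:\n", data_minmax)
print("L1 normalized:\n", data_l1)
print("L2 normalized:\n", data_l2)
```

Note that mean removal and scaling operate on columns (features), while normalization operates on rows (samples).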