Dask's preprocessing tools, provided by the Dask-ML library, offer scikit-learn-style functionality such as scalers, encoders, and train/test splitting. These tools work directly with Dask DataFrames and Arrays and can fit and transform data in parallel. In this section, we will discuss feature scaling and feature encoding.
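To make "fit in parallel" concrete, here is a minimal sketch. It is a simplified illustration, not Dask-ML's actual implementation, and it uses plain Python lists as stand-ins for Dask partitions. Each partition computes a partial statistic (its own minimum and maximum) in a map step. A reduce step then combines these into the global statistics that a min-max scaler needs. The helper names `partition_min_max`, `fit_min_max`, and `transform_min_max` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_min_max(partition):
    """Map step (hypothetical helper): partial statistics for one partition."""
    return min(partition), max(partition)


def fit_min_max(partitions):
    """Fit by mapping over partitions concurrently, then reducing the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partition_min_max, partitions))
    # Reduce step: the global min is the min of the partition mins (same for max)
    data_min = min(lo for lo, _ in partials)
    data_max = max(hi for _, hi in partials)
    return data_min, data_max


def transform_min_max(partition, data_min, data_max):
    """Transform one partition to [0, 1] using the global statistics."""
    span = (data_max - data_min) or 1.0  # avoid dividing by zero on constant data
    return [(x - data_min) / span for x in partition]


# Three "partitions" of one numeric column (made-up values)
partitions = [[0.5, 0.8, 0.9], [0.36, 1.0], [0.6, 0.7]]
data_min, data_max = fit_min_max(partitions)
print(data_min, data_max)  # 0.36 1.0
# Each partition can be transformed independently once the fit is done
scaled = [transform_min_max(p, data_min, data_max) for p in partitions]
```

The key design point is that both steps decompose by partition. The fit needs only a small summary from each chunk, and the transform needs only the fitted statistics plus the chunk itself. That is what lets a library spread the work across workers. Threads are used here only to show the map step. Pure-Python arithmetic like this gains little real speedup from threads.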
Feature scaling in Dask
As we discussed in Chapter 7, Cleaning Messy Data, feature scaling, also known as feature normalization, brings features onto a comparable scale. It addresses problems caused by columns that have different ranges and units. Dask also offers scaling methods that execute in parallel, and it provides most of the scalers that scikit-learn offers:
Scaler | Description |
MinMaxScaler | Transforms features by scaling each feature to a given range |
RobustScaler | Scales features using statistics that are robust to outliers |
StandardScaler | Standardizes features by removing the mean and scaling them to unit variance |
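The arithmetic behind the three scalers in the table can be sketched in plain Python. This is a minimal, single-machine illustration of the standard formulas, not Dask-ML's code; in Dask-ML the scalers themselves live in `dask_ml.preprocessing`. The sketch assumes the interquartile range (25th to 75th percentile) for the robust scaler and the population standard deviation for the standard scaler. The library's own implementations may differ in details such as quantile interpolation. The function names are hypothetical.

```python
import statistics


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Min-max scaling: map [min, max] linearly onto feature_range."""
    lo, hi = feature_range
    data_min, data_max = min(values), max(values)
    span = (data_max - data_min) or 1.0  # guard against constant columns
    return [lo + (x - data_min) / span * (hi - lo) for x in values]


def robust_scale(values):
    """Robust scaling: subtract the median, divide by the IQR (Q3 - Q1)."""
    median = statistics.median(values)
    # "inclusive" gives linear interpolation between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0
    return [(x - median) / iqr for x in values]


def standard_scale(values):
    """Standardization: subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [(x - mean) / std for x in values]


column = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up column with one outlier
print(min_max_scale(column))
print(robust_scale(column))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
print(standard_scale(column))
```

The outlier at 100 squeezes the first four values into a narrow band near 0 under min-max scaling. The robust scaler, in contrast, keeps them evenly spread between -1.0 and 0.5, because the median and IQR are barely affected by a single extreme value. This is the "robust to outliers" property described in the table.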
Let's scale the last_evaluation...