Replacing categories with ordinal numbers
Ordinal encoding consists of replacing the categories with integers from 1 to k (or 0 to k-1, depending on the implementation), where k is the number of distinct categories of the variable. The integers are assigned arbitrarily, so their order carries no meaning. For this reason, ordinal encoding is better suited to non-linear machine learning models, which can navigate through arbitrarily assigned numbers to find patterns that relate to the target.
In this recipe, we will perform ordinal encoding using pandas, scikit-learn, and feature-engine.
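Before turning to the real dataset, the idea can be illustrated with a minimal sketch on toy data. The colour variable, its values, and the colour_enc column name are made up for illustration and are not part of the Credit Approval dataset; the mapping is learned from the train set only and then applied to both sets:

```python
import pandas as pd

# Toy data: the "colour" variable and its values are hypothetical.
train = pd.DataFrame({"colour": ["blue", "red", "green", "red", "blue"]})
test = pd.DataFrame({"colour": ["green", "blue", "purple"]})

# Learn the mapping from the train set only. Each distinct category gets
# an integer from 0 to k-1, in order of first appearance (an arbitrary order).
mapping = {
    category: code
    for code, category in enumerate(train["colour"].unique())
}

# Apply the same mapping to both sets. A category that was not seen in the
# train set (here "purple") has no entry in the mapping and becomes NaN.
train["colour_enc"] = train["colour"].map(mapping)
test["colour_enc"] = test["colour"].map(mapping)

print(mapping)
print(train)
print(test)
```

Note that categories absent from the train set end up as missing values with this approach, so they would need to be handled separately before training a model.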
How to do it...
First, let’s import the libraries and prepare the dataset:
- Import pandas and the data split function:
  import pandas as pd
  from sklearn.model_selection import train_test_split
- Let’s load the Credit Approval dataset and divide it into train and test sets:
  data = pd.read_csv("credit_approval_uci.csv")
  X_train, X_test, y_train, y_test = train_test_split(
      data.drop(labels=["target...