"Every decoding is another encoding."
Non-numeric data is another issue that most algorithm implementations cannot deal with directly. Besides the core scikit-learn library, the scikit-learn-contrib organization hosts a list of satellite projects. These projects add more tools to our data arsenal, and here is how they describe themselves:
"scikit-learn-contrib is a GitHub organization for gathering high-quality, scikit-learn-compatible projects. It also provides a template for establishing new scikit-learn compatible projects."
We are going to use one of these projects here: category_encoders. It lets us encode non-numerical data into different numerical forms. First, we will install the library using pip, as follows:
pip install category_encoders
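To make "different forms" concrete, here is a minimal plain-Python sketch of two common strategies, ordinal encoding and one-hot encoding. The helpers ordinal_encode and one_hot_encode are hypothetical illustrations, not category_encoders functions. The library ships its own encoder classes (for example, OrdinalEncoder and OneHotEncoder), whose numbering and column naming may differ from this sketch.

```python
# Hypothetical helpers (NOT part of category_encoders) that illustrate
# two encoding strategies for non-numerical data.

def ordinal_encode(values):
    """Map each category to an integer, in order of first appearance (from 0)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Give each category its own 0/1 column, in order of first appearance."""
    categories = list(dict.fromkeys(values))  # unique values, order preserved
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


colors = ["red", "green", "red", "blue"]

codes, mapping = ordinal_encode(colors)
print(codes)    # [0, 1, 0, 2]
print(mapping)  # {'red': 0, 'green': 1, 'blue': 2}

rows, columns = one_hot_encode(colors)
print(columns)  # ['red', 'green', 'blue']
print(rows)     # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Ordinal encoding keeps a single column but implies an order among the categories, while one-hot encoding avoids that implied order at the cost of one column per category.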
Before jumping into the different encoding strategies, let's first create a fictional...