Controlling AutoML pipelines
In this section, we’ll explore a few ways we can improve our Featurizer and Regression calls in our pipeline by telling AutoML more about our data and what we want it to try. We’ll start with Featurizer.
Customizing the Featurizer
Right now, we’re getting the default behavior, where Featurizer will use all of our columns except for the name and label columns. In this setup, the Featurizer has to guess what each column means.
However, Featurizer also lets us tell it more about our data through parameters that explicitly state which columns are numeric, text, or categorical. In fact, for most DataFrame objects we can work these values out by looking at the column types.
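To make the idea of reading column kinds from column types concrete, here is a minimal sketch. It is not the chapter's own code. It assumes each column exposes a name and a .NET data type, and it uses a hypothetical list of (name, type) pairs in place of a real DataFrame so that it runs without extra packages. The column names and the choice to treat bool columns as categorical are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DataFrame schema: (column name, .NET type) pairs.
// A real DataFrame would supply these values from its own columns.
var schema = new List<(string Name, Type DataType)>
{
    ("Title", typeof(string)),
    ("Rating", typeof(float)),
    ("Votes", typeof(int)),
    ("IsSequel", typeof(bool)),
};

var numeric = new List<string>();
var text = new List<string>();
var categorical = new List<string>();

foreach (var (name, dataType) in schema)
{
    if (dataType == typeof(string))
        text.Add(name);             // strings are treated as free text
    else if (dataType == typeof(bool))
        categorical.Add(name);      // assumption: booleans act as categories
    else if (IsNumeric(dataType))
        numeric.Add(name);          // integer and floating-point types
    // Columns of any other type are left unclassified in this sketch.
}

Console.WriteLine($"Numeric: {string.Join(", ", numeric)}");
Console.WriteLine($"Text: {string.Join(", ", text)}");
Console.WriteLine($"Categorical: {string.Join(", ", categorical)}");

// Returns true for the built-in integer and floating-point types.
static bool IsNumeric(Type t) => Type.GetTypeCode(t) switch
{
    TypeCode.Byte or TypeCode.SByte or
    TypeCode.Int16 or TypeCode.UInt16 or
    TypeCode.Int32 or TypeCode.UInt32 or
    TypeCode.Int64 or TypeCode.UInt64 or
    TypeCode.Single or TypeCode.Double or TypeCode.Decimal => true,
    _ => false,
};
```

Type alone cannot tell a free-text string column from a low-cardinality categorical one, so a rule like this is only a starting point. The resulting lists may still need manual adjustment before they are handed to Featurizer.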
Let’s start by declaring a small ColumnData class to contain our column names:
public class ColumnData { public List<string> Text {get; set;} = new(); public List...