Wrapping up all the processing
Now that we have completed the recipes on processing different kinds of tabular data, in this recipe we will wrap everything together in a class that handles all the fit/transform operations, taking a pandas DataFrame as input along with explicit specifications of which columns to process and how.
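The key building block of such a class is a transformer that knows which DataFrame columns it is responsible for. As a minimal sketch (not the recipe's own class), the hypothetical ColumnSelector below follows scikit-learn's fit/transform contract by subclassing BaseEstimator and TransformerMixin. It only keeps the named columns, so it can sit at the head of a per-column processing pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keeps only the named columns of a DataFrame."""

    def __init__(self, columns):
        # Store constructor arguments unchanged, as BaseEstimator expects.
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; just check that the requested columns exist.
        missing = set(self.columns) - set(X.columns)
        if missing:
            raise KeyError(f"columns not found: {sorted(missing)}")
        return self

    def transform(self, X):
        # Return a NumPy array so downstream scikit-learn steps get plain arrays.
        return X[self.columns].to_numpy()


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, np.nan, 6.0], 'c': ['x', 'y', 'z']})
selected = ColumnSelector(['a', 'b']).fit_transform(df)
print(selected.shape)  # (3, 2)
```

fit_transform comes for free from TransformerMixin, which calls fit followed by transform. Keeping the column list as a constructor argument also lets the selector be cloned and inspected like any other scikit-learn estimator.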
Getting ready
Since we will combine multiple transformations, we will take advantage of the FeatureUnion class from scikit-learn, which applies several transformers to the same input and concatenates their outputs side by side:
import numpy as np
import pandas as pd
from sklearn.pipeline import FeatureUnion
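To illustrate what FeatureUnion does, here is a small sketch under our own assumptions: each branch is a pipeline that first selects some columns (using FunctionTransformer with a lambda, a choice made here purely for brevity) and then processes them. FeatureUnion fits every branch on the same DataFrame and stacks the results column-wise. The numeric data are the same values used in the test dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

df = pd.DataFrame([[1, 2, 3, np.nan], [1, 3, np.nan, 4], [1, 2, 2, 2]],
                  columns=['a', 'b', 'c', 'd'])

# Each branch selects its own columns, then applies its own processing.
scaled_a_b = make_pipeline(
    FunctionTransformer(lambda X: X[['a', 'b']]),
    StandardScaler())
imputed_c_d = make_pipeline(
    FunctionTransformer(lambda X: X[['c', 'd']]),
    SimpleImputer(strategy='mean'))

# FeatureUnion fits every branch on the same input and stacks the
# branch outputs horizontally, in the order the branches are listed.
union = FeatureUnion([('scaled', scaled_a_b), ('imputed', imputed_c_d)])
result = union.fit_transform(df)
print(result.shape)  # (3, 4)
```

The two scaled columns come first, followed by the two imputed ones, in which the missing values have been replaced by the column means (2.5 for c and 3.0 for d).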
As a testing dataset, we will simply combine all the test data we used previously:
example = pd.concat([
    pd.DataFrame([[1, 2, 3, np.nan], [1, 3, np.nan, 4], [1, 2, 2, 2]],
                 columns=['a', 'b', 'c', 'd']),
    pd.DataFrame({'date_1': ['04/12/2018', '05/12/2019', '07/12/2020'],
'date_2'...