In this chapter, we will use the pandas, NumPy, and scikit-learn Python libraries. All three are included in the Anaconda Python distribution, which you can install by following the steps in the Technical requirements section of Chapter 1, Foreseeing Variable Problems When Building ML Models. The recipes in this chapter use the Boston House Prices dataset from scikit-learn. Following machine learning best practices, we will begin each recipe by separating the data into train and test sets.
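The train and test separation that opens each recipe can be sketched as follows. This is a minimal illustration, not the exact code of the recipes. The Boston loader (`load_boston`) was removed from scikit-learn in version 1.2, so the sketch substitutes synthetic data from `make_regression` with the same shape (506 rows, 13 predictors). The column names, the 70/30 split, and the `random_state` value are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston House Prices data (an assumption:
# same number of rows and predictors, but the values are artificial).
X_arr, y_arr = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0
)

# Hypothetical column names, used only so the data behaves like a DataFrame.
X = pd.DataFrame(X_arr, columns=[f"var_{i}" for i in range(13)])
y = pd.Series(y_arr, name="target")

# Hold out 30% of the rows as the test set; fixing random_state
# makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
```

Splitting before any scaling matters because the scaling parameters (for example, the mean and standard deviation) should be learned from the train set only and then applied to both sets.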
For visualizations of how the scaling techniques described in this chapter affect variable distributions, see the accompanying Jupyter Notebooks in the dedicated GitHub repository (https://github.com/PacktPublishing/Python-Feature-Engineering-Cookbook).