Summary
In this chapter, you learned how to split and scale data for downstream modeling tasks. You can now split data manually when that is appropriate, and you are also familiar with the sklearn methods that simplify the splitting task. You also saw how different scaling methods work and learned why min/max scaling might be used for some models and standardization for others. You've seen how to build simple linear regression models, a topic to which we will return in the next chapter. Along the way, you learned why it is important to split data and hold some of it back from the modeling step: the held-out portion lets you measure how the model performs on data it has not seen. You now have the basic toolkit for preparing data for modeling, which is where the next chapter begins.
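As a compact recap, the split-then-scale workflow can be sketched in plain Python. This is only an illustrative sketch, not the sklearn API used in the chapter: the helper names (`train_test_split_manual`, `min_max_scale`, `standardize`) and the toy data are hypothetical, chosen for this example. The point it tries to show is that the scaling parameters are computed from the training split alone and then reused on the held-out split.

```python
import random
import statistics


def train_test_split_manual(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and hold back a fraction as the test set.

    Hypothetical helper for illustration; not part of sklearn.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    span = hi - lo
    return [(v - lo) / span for v in values]


def standardize(values, mean, std):
    """Center on the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


# Toy data, assumed for this example only.
data = [float(v) for v in range(1, 21)]
train, test = train_test_split_manual(data, test_fraction=0.25, seed=42)

# Min/max scaling: parameters come from the training split only.
lo, hi = min(train), max(train)
train_mm = min_max_scale(train, lo, hi)
test_mm = min_max_scale(test, lo, hi)  # may fall outside [0, 1]

# Standardization: again fitted on the training split only,
# using the population standard deviation.
mean = statistics.fmean(train)
std = statistics.pstdev(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)
```

Because the test values are scaled with the training minimum and maximum, some of them can land slightly below 0 or above 1. That is expected: the held-out data is meant to play the role of new data that the preprocessing has never seen.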