Summary
In this chapter, we covered methods for preparing a dataset for model building. Many of these methods have to be applied outside of DataRobot, although DataRobot is beginning to support many data preparation tasks. As we discussed, many of these tasks cannot currently be automated, and they require domain understanding to make appropriate decisions.
Specifically, we learned how to connect to various data sources and how to aggregate data from them. We looked at examples of how to address missing data and of other data manipulations that should be performed before modeling. We also covered several methods for creating new features, which can be very important for improving a model's performance.
From this point on, we will work almost entirely inside the DataRobot environment to analyze the data and build models. In the next chapter, we will use DataRobot to analyze the datasets.
...