The core concepts in machine learning
Machine learning covers many important concepts; this section goes over some of the more common ones. It is a multi-step process that starts with data acquisition and data mining and eventually leads to building predictive models.
The key steps of the model-building process are:
- Data pre-processing: preparing the data and selecting features (for example, centering and scaling, handling class imbalances, and assessing variable importance)
- Train/test splits and cross-validation:
  - Creating the training set (say, 80 percent of the data)
  - Creating the test set (about 20 percent of the data)
  - Performing cross-validation
- Creating the model and getting predictions:
  - Which algorithms should you try?
  - What accuracy measures are you trying to optimize?
  - What tuning parameters should you use?
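The data-handling steps above can be sketched in a few lines of code. The following is a minimal illustration in plain Python, not a definitive implementation. The function names (`train_test_split`, `k_fold_indices`, `center_and_scale`) and the toy data are assumptions made for this example and do not come from the text. In practice, a machine learning library would usually provide these steps.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, validation_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, validation_idx
        start += size


def center_and_scale(train_values, other_values):
    """Standardize both lists using the training mean and standard deviation."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero variance

    def scale(values):
        return [(v - mean) / std for v in values]

    return scale(train_values), scale(other_values)


# Toy data: 100 synthetic feature values, made up for illustration.
data = [float(i) for i in range(100)]

# 80/20 split, then scaling parameters learned from the training set only.
train, test = train_test_split(data, test_fraction=0.2, seed=42)
train_scaled, test_scaled = center_and_scale(train, test)

# Five cross-validation folds over the 80 training rows.
folds = list(k_fold_indices(len(train), k=5))

print(len(train), len(test), len(folds))  # prints: 80 20 5
```

Note that the centering and scaling parameters are computed from the training set and then reused on the test set. This keeps information about the test data from leaking into the pre-processing step.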
Data management steps in machine learning
Pre-processing, or more generally processing the data, is an integral part of most machine learning exercises. A dataset that you start out with is seldom going...