An overview of supervised learning
Supervised learning entails learning a mapping from a set of input variables (typically a vector) to an output variable (also called the supervisory signal) and applying this mapping to predict the outputs for unseen data. Supervised methods attempt to discover the relationship between the input variables and the target variable. The discovered relationship is represented in a structure referred to as a model. Models usually describe and explain phenomena hidden in the dataset, and they can be used to predict the value of the target attribute from the values of the input attributes.
More formally, supervised learning is the machine learning task of inferring a function from labeled training data. The training data consists of a set of training examples, each of which is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can then be applied to inputs that were not part of the training data.
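As a minimal sketch of "inferring a function from training pairs", the example below fits a straight line to a handful of (input, output) examples by ordinary least squares. The choice of a linear model, the function names fit_line and make_predictor, and the toy data are all assumptions made for illustration; the text does not prescribe a particular algorithm.

```python
# Minimal sketch: infer a function from (input, output) training pairs.
# Assumption: the relationship is modeled as a line y = slope * x + intercept.

def fit_line(examples):
    """Return (slope, intercept) minimizing squared error over the examples."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_predictor(examples):
    """Produce the inferred function from the training examples."""
    slope, intercept = fit_line(examples)
    return lambda x: slope * x + intercept


# Hypothetical training set: each example pairs an input with a desired output.
training_examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

predict = make_predictor(training_examples)
print(predict(5.0))  # an input not present in the training set; prints 11.0
```

Here the returned predict function plays the role of the inferred function: it was built only from the training pairs, yet it can be evaluated on inputs that never appeared among them.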
To solve a supervised learning problem, the following steps must be performed:
- Determine the type of training examples.
- Gather a training set.
- Determine the input variables of the learned function.
- Determine the structure of the learned function and corresponding learning algorithm.
- Complete the design by running the learning algorithm on the gathered training set.
- Evaluate the accuracy of the learned function.
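The steps above can be sketched end to end on a toy problem. The example below is only an illustration under stated assumptions: the dataset is invented, the model structure is a nearest-centroid classifier (one possible choice among many), and the helper names train_nearest_centroid, predict, and accuracy are hypothetical.

```python
# Illustrative walk-through of the six steps, standard library only.

# Steps 1-2: decide that an example is (feature vector, class label)
# and gather a small, made-up training set.
dataset = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.2, 0.9), "low"),
    ((1.1, 1.1), "low"), ((0.9, 0.8), "low"),
    ((3.0, 3.2), "high"), ((2.8, 3.1), "high"), ((3.2, 2.9), "high"),
    ((3.1, 3.0), "high"), ((2.9, 2.7), "high"),
]
# Hold out one example per class so accuracy is measured on unseen data.
test_set = [dataset[4], dataset[9]]
train_set = dataset[:4] + dataset[5:9]

# Step 3: input variables. Here both raw features are used unchanged.


# Step 4: structure of the learned function. Assumed here: one centroid
# per class, with prediction by nearest centroid.
def train_nearest_centroid(examples):
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, examples):
    correct = sum(predict(centroids, f) == label for f, label in examples)
    return correct / len(examples)


# Step 5: run the learning algorithm on the training set.
centroids = train_nearest_centroid(train_set)

# Step 6: evaluate the accuracy of the learned function on held-out examples.
print(accuracy(centroids, test_set))
```

On this well-separated toy data both held-out examples are classified correctly. Real datasets are rarely this clean, which is why the evaluation step uses examples the learner has not seen.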
Supervised methods can be applied in a variety of domains, such as marketing, finance, and manufacturing.
Some of the issues to consider in supervised learning are as follows:
- Bias-variance trade-off
- Function complexity and amount of training data
- Dimensionality of the input space
- Noise in the output values
- Heterogeneity of the data
- Redundancy in the data
- Presence of interactions and non-linearity
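Several of these issues (the bias-variance trade-off, function complexity, and noise in the output values) can be seen in a small experiment. The sketch below uses k-nearest-neighbor regression purely as an assumed example of a learner whose flexibility is controlled by a single parameter k; the data, the noise values, and the function names are invented for illustration.

```python
# Sketch: how model flexibility interacts with noisy outputs.
# Assumption: k-nearest-neighbor regression on one input variable.

def knn_predict(train, x, k):
    """Average the outputs of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, points, k):
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in points]
    return sum(errors) / len(errors)


# Made-up training data: roughly y = x, with hand-picked noise in the outputs.
train = [(0.0, 0.3), (1.0, 0.8), (2.0, 2.4), (3.0, 2.7), (4.0, 4.3), (5.0, 4.6)]
# Held-out points taken from the assumed noise-free trend y = x.
held_out = [(0.4, 0.4), (1.6, 1.6), (2.4, 2.4), (3.6, 3.6), (4.4, 4.4)]

for k in (1, 3, len(train)):
    train_error = mean_squared_error(train, train, k)
    held_out_error = mean_squared_error(train, held_out, k)
    print(k, train_error, held_out_error)
```

With k = 1 the learner reproduces every training output exactly, noise included, so its training error is zero even though it has memorized the noise (low bias, high variance). With k equal to the size of the training set it predicts the same average everywhere, ignoring the input entirely (high bias, low variance). Intermediate values of k trade one kind of error against the other.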