Performance considerations
As with most discriminative models, the performance of the support vector machine obviously depends on the optimizer selected to maximize the margin during training. Let's look at the time complexity for different configurations and applications of the SVM (a short timing sketch follows the list):
- A linear model (SVM without a kernel) has an asymptotic time complexity of O(N) for training N labeled observations
- Nonlinear models relying on kernel methods (formulated as a quadratic programming problem) have an asymptotic time complexity of O(N³)
- An algorithm that uses sequential minimal optimization techniques, such as index caching or elimination of null values (as in LIBSVM), has an asymptotic time complexity of O(N²), with a worst-case scenario (quadratic optimization) of O(N³)
- Sparse problems for very large training sets (N > 10,000) also have an asymptotic time complexity of O(N²)
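To make these growth rates concrete, here is a minimal timing sketch in Python using scikit-learn, whose LinearSVC wraps the LIBLINEAR solver and whose SVC wraps LIBSVM's sequential minimal optimization. The library choice, synthetic dataset, sample sizes, and default hyperparameters are illustrative assumptions, not part of the original text.

```python
# Rough timing sketch (not a rigorous benchmark): compares how training time
# grows with N for a linear SVM versus a kernelized SVM backed by LIBSVM's
# SMO solver. Dataset sizes and parameters are assumptions for illustration.
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

for n in (1_000, 2_000, 4_000, 8_000):
    X, y = make_classification(n_samples=n, n_features=20, random_state=42)

    start = time.perf_counter()
    LinearSVC(dual=False).fit(X, y)   # linear model: roughly O(N) training
    linear_t = time.perf_counter() - start

    start = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)       # SMO-based kernel SVM: ~O(N²) to O(N³)
    kernel_t = time.perf_counter() - start

    print(f"N={n:>5}  linear={linear_t:6.2f}s  rbf={kernel_t:6.2f}s")
```

As N doubles, the linear model's training time should grow roughly proportionally, while the RBF model's time grows much faster, in line with the O(N²) to O(N³) bounds listed above; the absolute numbers depend on hardware and data.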
The time and space complexity of the kernelized support vector machine...