Controlled support vector machines
With real datasets, an SVM can extract a very large number of support vectors to increase accuracy, and that can slow down the whole process. To allow finding a trade-off between precision and the number of support vectors, scikit-learn provides an implementation called NuSVC, where the parameter nu (bounded between 0, excluded, and 1) controls both quantities at the same time: it is a lower bound on the fraction of support vectors (so greater values will increase their number) and an upper bound on the fraction of training errors (so lower values reduce the tolerated errors). Let's consider an example with a linear kernel and a simple dataset. In the following figure, there's a scatter plot of our set:
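The figure and the code that builds the set aren't reproduced here, so as a stand-in, the following is a minimal sketch that generates a comparable two-dimensional, two-class dataset with scikit-learn's make_classification and plots it (the sample count, noise level, and random seed are assumptions, not necessarily the original values):

import matplotlib.pyplot as plt

from sklearn.datasets import make_classification

# Assumed stand-in dataset: 500 two-dimensional points belonging to two
# partially overlapping classes (all parameters are illustrative)
X, Y = make_classification(n_samples=500, n_features=2,
                           n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.05,
                           random_state=1000)

# Scatter plot of the set, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap='coolwarm', s=15)
plt.xlabel('x0')
plt.ylabel('x1')
plt.show()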
Let's start checking the number of support vectors for a standard SVM:
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')
>>> svc.fit(X, Y)
>>> svc.support_vectors_.shape
(242L, 2L)
So the model has found 242 support vectors. Let's now try to optimize this number using cross-validation. The default value of nu is 0.5,...
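The section is truncated here, so the following is a minimal sketch of the kind of search it describes: it scans a grid of nu values with NuSVC, recording the cross-validated accuracy and the number of support vectors so the trade-off can be read off directly (the value grid and fold count are assumptions):

import numpy as np

from sklearn.model_selection import cross_val_score
from sklearn.svm import NuSVC

# Scan a grid of nu values (chosen arbitrarily for illustration)
for nu in np.arange(0.05, 1.0, 0.05):
    nusvc = NuSVC(kernel='linear', nu=nu)
    try:
        score = cross_val_score(nusvc, X, Y, cv=10).mean()
        nusvc.fit(X, Y)
        print('nu=%.2f: CV accuracy=%.3f, support vectors=%d'
              % (nu, score, nusvc.support_vectors_.shape[0]))
    except ValueError:
        # Some nu values are infeasible for a given dataset
        continue

A reasonable strategy with such a scan is to pick the smallest nu whose cross-validated score is close to the best one, since it yields a model with fewer support vectors, and therefore faster predictions, at nearly the same precision.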