Grid searching
sklearn also has another useful tool up its sleeve: grid searching. A grid search will, by brute force, try many different model parameters and report the best one according to a metric of our choosing. For example, we can optimize KNN for accuracy in the following manner:
from sklearn.model_selection import GridSearchCV  # import our grid search module
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()  # instantiate a blank slate KNN, no neighbors specified
k_range = range(1, 30, 2)  # odd values of K from 1 to 29
param_grid = dict(n_neighbors=k_range)  # param_grid = {"n_neighbors": [1, 3, 5, ...]}
grid = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)
In the grid.fit() line of code, each candidate parameter setting (in this case, the 15 possible values of K) is cross-validated five times. This means that by the end of this code, we will have fit 15 * 5 = 75 different KNN models! You can see how, when applying this technique to...
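Once the fit completes, the grid object exposes the winning parameter setting and its cross-validated score. A minimal runnable sketch of the idea, using scikit-learn's bundled iris dataset as stand-in data for X and y (the dataset choice is an assumption for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# stand-in data; in the text, X and y come from your own dataset
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier()
k_range = list(range(1, 30, 2))  # 15 candidate values of K
param_grid = dict(n_neighbors=k_range)

grid = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)  # fits 15 * 5 = 75 models under the hood

# the best cross-validated accuracy and the K that achieved it
print(grid.best_score_)
print(grid.best_params_)
```

grid.cv_results_ additionally holds the per-candidate mean and per-fold scores, which is handy for plotting accuracy against K before committing to the winner.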