Grid searching
sklearn also has another useful tool up its sleeve called grid searching. A grid search tries many different model parameters by brute force and gives us the best combination based on a metric of our choosing. For example, we can choose to optimize KNN for accuracy in the following manner:
from sklearn.model_selection import GridSearchCV  # import our grid search module
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_jobs=-1)  # instantiate a blank slate KNN; the grid search sets n_neighbors
k_range = list(range(1, 31, 2))  # odd values of K from 1 to 29
print(k_range)
param_grid = dict(n_neighbors=k_range)  # param_grid = {"n_neighbors": [1, 3, 5, ...]}
print(param_grid)
grid = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)
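For readers who want to run the search end to end, here is a minimal self-contained sketch. It assumes the iris dataset as a stand-in for X and y (the text does not say which data is used here), and it counts how many parameter candidates and cross-validation fits the search performs.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for the X and y used in the text
X, y = load_iris(return_X_y=True)

k_range = list(range(1, 31, 2))        # odd values of K from 1 to 29
param_grid = {"n_neighbors": k_range}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

# One candidate per value of K, each fitted once per cross-validation fold
n_candidates = len(grid.cv_results_["params"])
n_fits = n_candidates * grid.n_splits_
print(n_candidates, n_fits)

# Best value of K and its mean cross-validated accuracy
print(grid.best_params_, grid.best_score_)
```

The fit count excludes the final refit on the full dataset, which GridSearchCV performs by default once the best parameters are known.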
In the grid.fit() line of code, the grid search cross-validates every combination of parameter values. In this case, we have 15 different possibilities for K, and we are cross-validating each one five times. This means that by...