Chapter 2 – Classifying with scikit-learn Estimators
Scalability with the nearest neighbor
https://github.com/jnothman/scikit-learn/tree/pr2532
A naïve implementation of the nearest neighbor algorithm is quite slow because it checks all pairs of points to find those that are close together. Better implementations exist, and some are already included in scikit-learn. For instance, a kd-tree can be built over the training data to speed up the neighbor search.
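As a rough illustration of the difference, the sketch below runs the same neighbor query with the brute-force search and with a kd-tree, using the `algorithm` parameter of scikit-learn's `NearestNeighbors`. The random dataset, its size, and the seed are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic data for illustration only: 2,000 points in 3 dimensions.
rng = np.random.RandomState(14)
X = rng.rand(2000, 3)
queries = rng.rand(5, 3)

# Brute force compares each query against every stored point.
brute = NearestNeighbors(n_neighbors=3, algorithm="brute").fit(X)
# The kd-tree partitions the space so most points can be skipped.
tree = NearestNeighbors(n_neighbors=3, algorithm="kd_tree").fit(X)

brute_dist, brute_idx = brute.kneighbors(queries)
tree_dist, tree_idx = tree.kneighbors(queries)

# Both searches are exact, so the neighbor distances should agree.
print(np.allclose(brute_dist, tree_dist))
```

The same `algorithm` parameter is accepted by `KNeighborsClassifier`, so a classifier can be switched between the two searches without other changes. The speed advantage of the tree tends to shrink as the number of features grows.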
Another way to speed up this search is to use Locality-Sensitive Hashing (LSH). This is a proposed improvement for scikit-learn, and it hasn't made it into the package at the time of writing. The above link gives a development branch of scikit-learn that will allow you to test out LSH on a dataset. Read through the documentation attached to this branch for details on doing this.
To install it, clone the repository and follow the instructions to install the Bleeding Edge code available...
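To get a feel for the idea behind LSH before trying the branch, here is a minimal sketch written with plain NumPy. It is not the API of the development branch mentioned above; the random-hyperplane hashing scheme, the helper names `hash_keys` and `approximate_nearest`, and all sizes are assumptions made for illustration. Points are hashed into buckets by the side of each random hyperplane they fall on, and a query is compared only against points in its own bucket.

```python
from collections import defaultdict

import numpy as np

# Synthetic data for illustration only.
rng = np.random.RandomState(14)
X = rng.randn(1000, 10)

# Each random hyperplane contributes one bit to a point's hash key.
n_bits = 8
planes = rng.randn(X.shape[1], n_bits)


def hash_keys(points):
    """Return one integer hash key per row of `points` (hypothetical helper)."""
    bits = (points @ planes > 0).astype(np.int64)
    return bits @ (1 << np.arange(n_bits))


# Index step: group point indices by hash key.
buckets = defaultdict(list)
for index, key in enumerate(hash_keys(X)):
    buckets[int(key)].append(index)


def approximate_nearest(query):
    """Search only the query's bucket; the result may miss the true neighbor."""
    key = int(hash_keys(query[np.newaxis, :])[0])
    candidates = np.array(buckets.get(key, []), dtype=int)
    if candidates.size == 0:
        return None
    distances = np.linalg.norm(X[candidates] - query, axis=1)
    return int(candidates[np.argmin(distances)])


# A stored point always lands in its own bucket, so it finds itself.
nearest = approximate_nearest(X[0])
```

Because only one bucket is searched, the answer is approximate: a true nearest neighbor that falls on the other side of a hyperplane is missed. Practical LSH implementations usually build several hash tables to reduce that risk.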