Integrating with NumPy and scikit-learn
Elasticsearch can be integrated with many Python machine learning libraries. One of the most widely used libraries for working with datasets is NumPy; the NumPy array is the basic data structure that many Python machine learning libraries build on. In this recipe, you will see how to use Elasticsearch as a data source for the scikit-learn library (https://scikit-learn.org/).
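As a rough illustration of the idea, the following sketch converts search hits into a two-dimensional NumPy array with one row per document and one column per feature, which is the input shape that scikit-learn estimators expect. It is not the code from this recipe: the field names, the hits_to_array helper, and the sample hits are all hypothetical, and the hits are written by hand so that the example runs without a cluster. The only thing assumed about Elasticsearch is that each hit carries its document under the _source key.

```python
import numpy as np

# Hypothetical field names; the mapping of the real iris index may differ.
FEATURE_FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def hits_to_array(hits, fields=FEATURE_FIELDS):
    """Convert Elasticsearch search hits into a 2-D NumPy array.

    Each hit is expected to hold its document under the "_source" key.
    The result has one row per hit and one column per requested field.
    """
    rows = [[float(hit["_source"][field]) for field in fields] for hit in hits]
    # The reshape keeps the (n_samples, n_features) shape even with no hits.
    return np.array(rows, dtype=np.float64).reshape(len(rows), len(fields))


# Hand-written stand-in for response["hits"]["hits"]; not real query output.
sample_hits = [
    {"_id": "1", "_source": {"sepal_length": 5.1, "sepal_width": 3.5,
                             "petal_length": 1.4, "petal_width": 0.2}},
    {"_id": "2", "_source": {"sepal_length": 7.0, "sepal_width": 3.2,
                             "petal_length": 4.7, "petal_width": 1.4}},
    {"_id": "3", "_source": {"sepal_length": 6.3, "sepal_width": 3.3,
                             "petal_length": 6.0, "petal_width": 2.5}},
]

features = hits_to_array(sample_hits)
print(features.shape)  # (3, 4)
```

An array built this way can be passed to a scikit-learn estimator's fit method. With a live cluster, the hand-written list would be replaced by the hits returned from a search or scan over the index.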
Getting ready
You will need an up and running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.
The code for this recipe can be found in the ch15/code directory. The file we'll be using in the following section is called kmeans_example.py.
We will be using the iris dataset (https://en.wikipedia.org/wiki/Iris_flower_data_set), which we also used in Chapter 13, Java Integration. To prepare the iris dataset, you need to populate it by executing the PopulatingIndex class...