Fetching data from other systems: river
In the first chapter, we saw how to create and update indices using the REST API. Apart from searching itself, loading data into ElasticSearch is the main task to solve when building a search application. It would be good to have some infrastructure or plugins that handle integrating the search engine with various data sources. ElasticSearch is a relatively new project, but it already addresses this goal with a feature called a river.
What we need and what a river is
There are two approaches to putting data into a search system: we can pull the data from the source system, or the source system can push the data into our system. In the first case, we need some kind of service in our ElasticSearch cluster that monitors an external data source for changes or checks it periodically. A river is such a service. ElasticSearch takes care of this service and makes sure that only a...
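To make the difference between the two approaches concrete, here is a minimal sketch in Python. It does not use the river API or any ElasticSearch client: InMemoryIndex, ExternalSource, and PullingService are hypothetical names invented for this illustration, and the index is just a dictionary held in memory. In the push case the source calls the index on every write; in the pull case a separate service remembers its position in the source's change log and fetches whatever is new each time it polls.

```python
from typing import Callable, Dict, List, Tuple

Document = Dict[str, str]


class InMemoryIndex:
    """Hypothetical stand-in for a search index (not an ElasticSearch client)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def index(self, doc_id: str, doc: Document) -> None:
        # Adding a document with an existing ID overwrites the old version.
        self.docs[doc_id] = doc


class ExternalSource:
    """Hypothetical source system that keeps an ordered log of its changes."""

    def __init__(self) -> None:
        self.changes: List[Tuple[str, Document]] = []
        self.listeners: List[Callable[[str, Document], None]] = []

    def write(self, doc_id: str, doc: Document) -> None:
        self.changes.append((doc_id, doc))
        # Push approach: the source notifies every registered listener itself.
        for listener in self.listeners:
            listener(doc_id, doc)

    def changes_since(self, position: int) -> List[Tuple[str, Document]]:
        return self.changes[position:]


class PullingService:
    """Pull approach: a service on the search side checks the source for changes."""

    def __init__(self, source: ExternalSource, index: InMemoryIndex) -> None:
        self.source = source
        self.index = index
        self.position = 0  # how many changes have already been indexed

    def poll_once(self) -> int:
        new_changes = self.source.changes_since(self.position)
        for doc_id, doc in new_changes:
            self.index.index(doc_id, doc)
        self.position += len(new_changes)
        return len(new_changes)


if __name__ == "__main__":
    source = ExternalSource()
    pushed, pulled = InMemoryIndex(), InMemoryIndex()
    source.listeners.append(pushed.index)      # push: source calls the index
    service = PullingService(source, pulled)   # pull: service asks the source

    source.write("1", {"title": "first document"})
    print(len(pushed.docs), len(pulled.docs))  # pushed is updated, pulled is not yet
    service.poll_once()
    print(len(pushed.docs), len(pulled.docs))  # both indices now hold the document
```

In a real deployment, poll_once would be called on a schedule, and the service would have to store its position somewhere durable so that it can resume after a restart. Both details are left out of the sketch.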