Indexing data with Apache Pig
Apache Pig (https://pig.apache.org/) is a tool frequently used to store and manipulate data in datastores. It is very handy when you need to import CSV files into Elasticsearch quickly.
Getting ready
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
You need a working Pig installation. Depending on your operating system, follow the instructions at http://pig.apache.org/docs/r0.16.0/start.html.
If you are using Mac OS X with Homebrew, you can install it with:
brew install pig
How to do it...
We want to read a CSV file and write its data to Elasticsearch. We will perform the following steps:
We will download a CSV dataset from the GeoNames site: all the GeoNames locations of Great Britain. We can quickly download and unzip it via:
wget http://download.geonames.org/export/dump/GB.zip
unzip GB.zip
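Before writing the Pig script, it can help to check what the archive actually contains. The following is a minimal sketch, assuming the archive extracts a file named GB.txt and that this file is tab-separated, as in the GeoNames dump layout, where fields 1, 2, 5, and 6 hold the id, name, latitude, and longitude. The helper names count_fields and show_columns are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Assumption: GB.zip unpacks to GB.txt, a tab-separated file (GeoNames dump layout).

# Print the number of tab-separated fields in the first record of a file.
count_fields() {
    head -n 1 "$1" | awk -F '\t' '{ print NF }'
}

# Print fields 1, 2, 5 and 6 (id, name, latitude, longitude)
# of the first N records of a file (N defaults to 5).
show_columns() {
    head -n "${2:-5}" "$1" | cut -f 1,2,5,6
}

# Only run against the real file if it has already been downloaded and unzipped.
if [ -f GB.txt ]; then
    count_fields GB.txt
    show_columns GB.txt 5
fi
```

If the field count or the selected columns do not look as expected, the delimiter or column positions used when loading the file in Pig will likely need adjusting.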
We can write es.pig that contains...