Indexing data with Apache Pig
Apache Pig (https://pig.apache.org/) is a tool that's frequently used to store and manipulate data in data stores. It can be very handy if you need to import Comma-Separated Values (CSV) data into Elasticsearch very quickly.
Getting ready
You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.
You need a working Pig installation. Depending on your operating system, you should follow the instructions at http://pig.apache.org/docs/r0.17.0/start.html.
If you are using macOS with Homebrew, you can install it with brew install pig; on Linux/Windows, you can install it with the following commands:
wget -c https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
tar xfvz pig-0.17.0.tar.gz
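After extracting the archive, the pig launcher still has to be reachable from your shell. The following is a minimal sketch, assuming the archive was extracted into the current directory and that a Java runtime is already installed; the use of the PIG_HOME variable here is an illustrative choice, not something this recipe prescribes:

```shell
# Assumption: pig-0.17.0.tar.gz was extracted into the current directory.
# PIG_HOME is an illustrative variable name chosen for this sketch.
export PIG_HOME="$PWD/pig-0.17.0"

# Put the Pig launcher script (bin/pig) on the PATH for this shell session.
export PATH="$PIG_HOME/bin:$PATH"

# Print the Pig version if the launcher can be found; otherwise report it.
if command -v pig >/dev/null 2>&1; then
  pig -version
else
  echo "pig not found on PATH"
fi
```

If the version is printed, the installation is ready for the steps that follow. To make the change permanent, you could add the two export lines to your shell profile.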
How to do it...
We want to read a CSV file and write the data into Elasticsearch. We will perform the following steps to do so:
- We will download...