Reading data using Spark SQL
Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. The Elasticsearch Spark integration lets us read data from Elasticsearch using SQL queries.
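As a rough illustration of the idea, the following PySpark sketch loads an index as a DataFrame, registers it as a temporary view, and queries it with SQL. It assumes a SparkSession that was started with the elasticsearch-hadoop (elasticsearch-spark) connector on its classpath and an Elasticsearch node on localhost:9200. The helper names es_reader_options and query_index, and the view name docs, are made up for this example and are not part of the recipe.

```python
def es_reader_options(nodes="localhost", port=9200):
    """Connector settings understood by elasticsearch-hadoop.

    The defaults assume a local single-node installation.
    """
    return {"es.nodes": nodes, "es.port": str(port)}


def query_index(spark, index, sql, view="docs"):
    """Load an index as a DataFrame, expose it as a SQL view, and query it.

    `spark` is an existing SparkSession; `index` and `view` are
    caller-supplied names (hypothetical in this sketch).
    """
    df = (
        spark.read.format("org.elasticsearch.spark.sql")  # connector data source
        .options(**es_reader_options())
        .load(index)
    )
    # Make the DataFrame addressable by name from SQL.
    df.createOrReplaceTempView(view)
    return spark.sql(sql)
```

With a real session, a call might look like query_index(spark, "my-index", "SELECT * FROM docs LIMIT 10").show(), where my-index stands in for whatever index you populated earlier.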
Spark SQL works with structured data; in other words, all entries are expected to have the same structure (the same number of fields, with the same names and types). Unstructured data (documents with different structures) is not supported and will cause problems.
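To make the "same structure" requirement concrete, here is a minimal plain-Python sketch that checks whether a batch of documents shares the same field names and value types. It only inspects top-level fields, and the function names field_signature and has_uniform_structure are invented for this illustration; it is not how Spark SQL or the connector validates data.

```python
def field_signature(doc):
    """Return a document's structure as a set of (field name, type name) pairs."""
    return frozenset((name, type(value).__name__) for name, value in doc.items())


def has_uniform_structure(docs):
    """True when every document has the same top-level fields with the same types."""
    signatures = {field_signature(doc) for doc in docs}
    # Zero or one distinct signature means there is no structural mismatch.
    return len(signatures) <= 1
```

For example, two documents that both carry an integer id and a string name pass the check, while a batch in which one document stores id as a string, or omits name entirely, does not.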
Getting ready
You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.
You also need a working installation of Apache Spark and the data that we indexed in the Indexing data using Apache Spark recipe of this chapter.
How to do it...
To read data from Elasticsearch using Apache Spark SQL and DataFrames, we will perform the following steps:
...