Elasticsearch 5.x Cookbook

From Elasticsearch 5.x Cookbook: Distributed Search and Analytics

Author: Alberto Paro
Product type: Paperback
Published: Feb 2017
ISBN-13: 9781786465580
Length: 696 pages
Edition: 3rd Edition

Table of Contents

Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Dedication
Preface
1. Getting Started
2. Downloading and Setup
3. Managing Mappings
4. Basic Operations
5. Search
6. Text and Numeric Queries
7. Relationships and Geo Queries
8. Aggregations
9. Scripting
10. Managing Clusters and Nodes
11. Backup and Restore
12. User Interfaces
13. Ingest
14. Java Integration
15. Scala Integration
16. Python Integration
17. Plugin Development
18. Big Data Integration

Reading data with Apache Spark


In Spark you can read data from many sources, but NoSQL datastores such as HBase, Accumulo, and Cassandra generally offer only a limited query subset, so you often need to scan all the data to extract the records you actually need. With Elasticsearch, you can retrieve just the subset of documents that match your Elasticsearch query.

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need a working installation of Apache Spark and the data indexed in the previous example.

How to do it...

To read data from Elasticsearch via Apache Spark, we will perform the following steps:

  1. We need to start the Spark Shell:

            ./bin/spark-shell
    
  2. We import the required classes:

            import org.elasticsearch.spark._         
    
  3. Now we can create an RDD by reading data from Elasticsearch:

            // Each element of the RDD is a (document ID, field map) pair
            val rdd = sc.esRDD("spark/persons")
    
  4. We can watch...
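
As a minimal sketch of the point made in the introduction, the read can also be restricted to the documents that match a query rather than the whole index. The snippet below assumes the same Spark Shell session as the steps above (so `sc` is available and the elasticsearch-hadoop connector is on the classpath). The field name `name` and the value `bob` are hypothetical, since the fields of `spark/persons` are not shown in this excerpt.

```scala
import org.elasticsearch.spark._

// Hypothetical URI query: the field "name" and value "bob" are assumptions,
// replace them with a field that exists in your spark/persons documents.
val query = "?q=name:bob"

// Passing a query as the second argument asks Elasticsearch to return
// only the matching documents instead of the whole index.
val matching = sc.esRDD("spark/persons", query)

// Each element is a (document ID, field map) pair; print a few of them.
matching.take(5).foreach { case (id, doc) => println(s"$id -> $doc") }
```

Because the filtering happens in Elasticsearch, Spark only receives the matching documents and does not have to scan and discard the rest.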
