
Scaling Apache Solr: Optimize your searches using high-performance enterprise search repositories with Apache Solr

eBook
₹799.99 ₹2859.99
Paperback
₹3574.99
Subscription
Free Trial
Renews at ₹800p/m

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Chapter 2. Getting Started with Apache Solr

In the previous chapter, we went through the challenges in enterprise search; then, we covered Apache Solr as an enterprise search solution, followed by its architecture and features; finally, we looked at practical use cases for Solr in different industries. In this chapter, we will focus on getting started with Apache Solr and cover the following points:

  • Setting up the Apache Solr server
  • Configuring the Solr instance for applications
  • Consuming Apache Solr in end user applications

Setting up Apache Solr

Apache Solr is a J2EE-based web application that runs on Apache Lucene, Tika, and other open source libraries. This section focuses on setting up an Apache Solr instance and running it. Users who already have a Solr instance running can skip ahead to the next section.

Apache Solr ships with a demo Jetty server, so you can simply run it from the command line. This helps users get a Solr instance running quickly. However, you can choose to customize it and deploy it in your own environment. Apache Solr does not ship with an installer; it has to run inside a J2EE container.

Prerequisites

Before installing Apache Solr, make sure that you have the latest JDK on your machine. You can test this by running the java -version command on the command line, as shown in the following screenshot:

[Screenshot: output of the java -version command]
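The same check can be scripted. The following is a minimal sketch in Python, not something from the book: the function names are hypothetical, and the sample banner is illustrative text rather than output captured from a real machine. It assumes the usual banner layout, in which the version appears in double quotes after the word "version".

```python
import re
import subprocess


def parse_java_version(output):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def major_version(version):
    """Map "1.7.0_55" to 7 (legacy numbering) and "11.0.2" to 11."""
    parts = version.split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(re.match(r"\d+", parts[0]).group())


def installed_java_version():
    """Run `java -version`; the JVM prints this banner to stderr."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no java executable on the PATH
    return parse_java_version(result.stderr)


# Illustrative banner text, not captured from a real machine.
SAMPLE_OUTPUT = 'java version "1.7.0_55"\nJava(TM) SE Runtime Environment'
print(parse_java_version(SAMPLE_OUTPUT))
```

Calling installed_java_version() on the target machine returns None when no java executable is found, which is a quick signal that the JDK still needs to be installed.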

Once you have the correct Java version, you need a servlet container such as Tomcat, Jetty, Resin, GlassFish, or WebLogic installed on your machine...

Understanding the Solr structure

Now that your Solr server is set up and running, the next step is to configure your instance for your application. Most of the configuration for Apache Solr is done through XML configuration files, which go in the conf/ directory of your Solr core home. Let's go through the configuration for the Solr instance.

When Solr is deployed on a container, the solr.war file contains the actual web application that can be accessed through the URL. Each container, when started for the first time, unpacks this .war file into an internal directory (this is different for each container).

Before starting the instance, the J2EE container picks up the Solr home directory specified through Java parameters and looks for solr.xml there. The solr.xml file defines the context that contains information about the cores.
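To make the role of solr.xml concrete, here is a small Python sketch that reads a sample file and lists the cores it declares. The sample assumes the legacy solr.xml layout used by Solr 4.x (a cores element containing core entries); the "products" core and the list_cores helper are hypothetical names introduced only for illustration. In Solr 4.x the home directory is typically passed to the JVM as the solr.solr.home system property.

```python
import xml.etree.ElementTree as ET

# Sample in the legacy (Solr 4.x) solr.xml layout; the core names are made up.
SAMPLE_SOLR_XML = """\
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="collection1" />
    <core name="products" instanceDir="products" />
  </cores>
</solr>
"""


def list_cores(solr_xml):
    """Return a mapping of core name to its instance directory."""
    root = ET.fromstring(solr_xml)
    return {core.get("name"): core.get("instanceDir") for core in root.iter("core")}


for name, instance_dir in list_cores(SAMPLE_SOLR_XML).items():
    print(f"core {name} lives in <solr home>/{instance_dir}")
```

Each instanceDir is resolved relative to the Solr home, and that directory is where the core's conf/ folder with its XML configuration files is expected.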

Note

A collection in Solr is a combination of one or more indexes spanning one or more cores of Apache Solr. A Solr core or core in Apache Solr is a running instance...

Configuring the Apache Solr for enterprise

Apache Solr allows extensive configuration to meet the needs of the consumer. Configuring the instance revolves around the following:

  • Defining a Solr schema
  • Configuring Solr parameters

Let's look at all these steps to understand the configuration of Apache Solr.

Defining a Solr schema

In an enterprise, data is generated by all the software systems that participate in day-to-day operations. This data comes in different formats, and bringing it in for Big Data processing requires a storage system that is flexible enough to accommodate varying data models. Traditional relational databases require users to define a strict data structure and offer an SQL-based querying mechanism.

Rather than confining users to predefined data structures, NoSQL databases offer an open database in which they can store any kind of data and retrieve it by running queries that are not based on the SQL syntax. By its design, the NoSQL database is best suited...

Understanding SolrJ

Apache Solr is a web application; it can be used directly by its customers to search. The search interface can be modified and enhanced to work as an end user search tool for an enterprise. Solr clients can directly access the Solr URL over HTTP to search and read data in formats such as JSON and XML. Moreover, Apache Solr also allows administration through these HTTP-based services. Queries are executed by generating a URL that Solr will understand.
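As a rough illustration of generating such a URL, the Python sketch below builds a request for Solr's select handler using the standard q, rows, and wt parameters. The host, port, core name, and the build_select_url helper are assumptions made for the example, not values taken from the book.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10, fmt="json"):
    """Build a URL for the select handler of the given core."""
    params = {"q": query, "rows": rows, "wt": fmt}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Assumed local instance and core name; adjust both for a real deployment.
url = build_select_url("http://localhost:8983/solr", "collection1", "title:solr")
print(url)
```

The query string is percent-encoded by urlencode, so characters such as the colon in a field query are escaped before the request is sent.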

SolrJ is a tool that can be used by your Java-based application to connect to Apache Solr for indexing. SolrJ provides Java wrappers and adaptors to communicate with Solr and translate its results to Java objects. Using SolrJ is much more convenient than using raw HTTP and JSON. Internally, SolrJ uses Apache HttpClient to send HTTP requests. It provides a user-friendly interface, hiding connection details from the consumer application. Using SolrJ, you can index your documents and perform your queries...
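The idea of translating results into objects can be sketched without SolrJ itself. The Python snippet below is only an analogue of that mapping step, not the SolrJ API: it parses a response in the shape Solr returns for wt=json and turns each document into a typed object. The Book class, its id and title fields, and the sample response are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Book:
    id: str
    title: str


def parse_books(raw_json):
    """Return (numFound, list of Book) from a Solr JSON select response."""
    response = json.loads(raw_json)["response"]
    books = [Book(id=doc["id"], title=doc.get("title", "")) for doc in response["docs"]]
    return response["numFound"], books


# Hand-written sample in the shape of a wt=json response; not real server output.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [{"id": "book-1", "title": "Scaling Apache Solr"}]
  }
}
"""

total, books = parse_books(SAMPLE_RESPONSE)
print(total, books)
```

A client library such as SolrJ takes care of this plumbing, along with the HTTP connection, so application code works with objects instead of raw response text.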

Summary

In this chapter, we set up an Apache Solr instance and configured it for the J2EE container. We went through the various ways to configure the Apache Solr instance in depth, and ended by consuming the capabilities of Solr from applications. In the next chapter, we will see how to work with various applications and their datasets, and analyze their data using the capabilities of Apache Solr.


Description

This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies. If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.

What you will learn

  • Gain a complete understanding of Apache Solr and its ecosystem
  • Develop scalable, high-performance search applications using Apache Solr
  • Customize Apache Solr-based search for different requirements
  • Discover different techniques to build high-speed enterprise searches
  • Design enterprise-ready search engines and implement a scalable enterprise search functionality
  • Integrate an Apache Solr-based search with different subsystems and legacy systems
  • Scale Apache Solr through sharding, replication, and fault tolerance
  • Learn about performance tuning for your Solr-based application while scaling your data
  • Make your enterprise search cloud-ready to be able to work with multiple clients

Product Details

Publication date : Jul 25, 2014
Length: 298 pages
Edition : 1st
Language : English
ISBN-13 : 9781783981755
Vendor : Apache

Packt Subscriptions

See our plans and pricing

₹800 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

₹4500 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just ₹400 each
  • Exclusive print discounts

₹5000 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just ₹400 each
  • Exclusive print discounts

Frequently bought together

  • Scaling Apache Solr: ₹3574.99
  • Apache Solr High Performance: ₹2904.99

Total: ₹6,479.98

Table of Contents

12 Chapters
1. Understanding Apache Solr
2. Getting Started with Apache Solr
3. Analyzing Data with Apache Solr
4. Designing Enterprise Search
5. Integrating Apache Solr
6. Distributed Search Using Apache Solr
7. Scaling Solr through Sharding, Fault Tolerance, and Integration
8. Scaling Solr through High Performance
9. Solr and Cloud Computing
10. Scaling Solr Capabilities with Big Data
A. Sample Configuration for Apache Solr
Index

Customer reviews

Rating distribution
4.7 out of 5 (3 Ratings)
5 star 66.7%
4 star 33.3%
3 star 0%
2 star 0%
1 star 0%

Sai, Nov 18, 2014
Rating: 5 out of 5

With 10 chapters fitted into about 300 pages, not only is it well written it is most importantly well structured.

I've always loved books that go a step further in demoing/displaying how the technology could be scaled for an enterprise problem and this book does that exactly. And in doing so, the author is mindful that he could have a naive audience tasting the waters of Big Data. It starts with a building a use case for Solr, then dives into the Solr architecture rounding it up with use cases for Solr application.

The book covers the following topics:

  • Installation and configuration
  • Data analysis with Apache Solr
  • Enterprise search design patterns
  • Solr Integration with Java, CMS frameworks, PHP, JS, Drupal.
  • Distributed search, leveraging the IAAS and PAAS Cloud services (Amazon Cloud, Windows Azure, etc)
  • Scaling Solr using sharding, fault tolerance, mongoDB, storm
  • Optimization/Fine tuning Solr Applications
  • Integration with Hadoop, Katta, Cassandra and R is covered in the final chapter of the book.

What I liked most about the book were the case studies presented with some of the chapters that demo the real world enterprise problems and the ways how Solr was used to provide an effective solution around it.

All I can say is Great Job, Hrishikesh!

Amazon Verified review

A. Zubarev, Oct 04, 2014
Rating: 5 out of 5

Reading Scaling Apache Solr by Hrishikesh Karambelkar turned out to be a great surprise. Having my expectation set initially low (without an apparent reason) it suddenly unfolded into something huge I could regret having otherwise passed by. It is actually an extremely thoughtful and full of practical examples ... I do not know how to call it, whether a cookbook or handbook, but definitely a wonderful masterpiece!

The book may be a good source of wisdom or advice for new projects and even serve as a guide to resolving issues or improving poorly performing search applications.

Hrishikesh covers a wide (be warned, it is wide, and in a good sense of the word) variety of topics in his work:

  • Data processing with Apache Solr (techniques)
  • Enterprise search design principals
  • Integration examples (Java, Drupal and more)
  • Distributed search, leveraging the Cloud
  • Making Solr scalable
  • Monitoring of Solr including optimization
  • Integration with Big Data pillars as Hadoop, Zookeeper, Katta
  • NoSQL: MongoDB and Cassandra
  • Data analysis with R

As a bonus, the author covers fixes to the most common pitfalls or errors.

Blew my expectations! THIS IS the book you want to keep on your bookshelf and electronic media.

After reading this book you can be assured to sail with fear the high waters of the Big Data ocean!

5 starts + out of five!

Amazon Verified review

Tomasz Sobczak, Oct 13, 2014
Rating: 4 out of 5

We live in a world flooded by data and information and all realize that if we can’t find what we’re looking for (e.g. a specific document), there’s no benefit from all these data stores. When your data sets become enormous or your systems need to process thousands of messages a second, you need to an environment that is efficient, tunable and ready for scaling. We all need well-designed search technology.

A few days ago, a book called "Scaling Apache Solr" landed on my desk. The author, Hrishikesh Vijay Karambelkar, has written an extremely useful guide to one of the most popular open-source search platforms, Apache Solr. Solr is a full-text, standalone, Java search engine based on Lucene, another successful Apache project. For people working with Solr, like myself, this book should be on their Christmas shopping list! It’s one of the best on this subject.

Karambelkar is an enterprise architect with a long history in both commercial products and open source technology. As he says, he currently spends most of his time solving problems for the software industry and developing the next generation of products.

The book is divided into 10 chapters. Basically, the first three are an introduction to Apache Solr and cover its architecture, features, configuration and setting up. Chapter One contains many practical cases of Apache Solr, to help beginners understand the topic.

Chapter Four is very interesting and describes a common pattern for enterprise search solutions. These patterns focus on data processing/integration and how to meet the requirements of users (interface, relevancy, general experience).

The rest of the book mainly refers to the central topic, that is distributing search queries and how to scale/optimize a system. The book discusses all Apache Solr concepts like replication, fault tolerance, sharding and illustrates them with helpful examples. The book precisely explains SolrCloud - a bundle of built-in distributed capabilities available from version 4.0.

Chapter 8, dedicated to optimization, drew my attention. It is full of useful tips concerning JVM parameters and manipulating data structures or caching layers as well.

"Scaling Apache Solr" covers both basic and advanced subjects. The information is well organised, clear and concise. Lots of examples and cases in this book can be absorbed by beginners. I was nicely surprised by the chapter describing integration possibilities. There’s some great information about using Solr with Cassandra, MapReduce paradigm or R (programming language for computational statistics) although I would have preferred this subject to be covered in more detail. The book has two more advantages: first, it discusses designing an enterprise search system in general terms and second, it can be treated as an introduction to large volume data processing.

I believe I need to emphasize that many sections related to defining a schema, importing data, running SolrCloud or searching in near real time (NRT) are not just a raw documentation, they also have the author's well-judged advice and comments.

Unfortunately, I felt some of the more advanced topics were not described in enough detail. For example, index merging, documents relevance or using dynamic fields in data structure. Moreover, reading the book, I had a feeling that some parts do not fit the title, such as the section about clustering with Carrot2 or integration with PHP web portal.

In summary, I can say that I have read this book with pleasure and satisfaction, which in fact is rare regarding technology publications! For me, as a person who has been working with Solr since version 1.3, it was a great way to review and sort out some of its aspects. On the other hand, I'm pretty sure, that people starting their experience with Apache Solr will take a lot from this book. Although, it is mainly focused on advanced problems, it starts with the basics.

Despite some little imperfections I can truly recommend this book, especially because it describes the concrete technology in an easy-to-read way and also refers to some general architectural patterns.

Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal)

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.