Scaling Apache Solr

Scaling Apache Solr: Optimize your searches using high-performance enterprise search repositories with Apache Solr

Vijay Karambelkar
4.7 (3 Ratings)
Paperback Jul 2014 298 pages 1st Edition

Chapter 2. Getting Started with Apache Solr

In the previous chapter, we went through the challenges in enterprise search; then we covered Apache Solr as an enterprise search solution, followed by its architecture and features; finally, we looked at practical use cases for Solr in different industries. In this chapter, we will focus on getting started with Apache Solr and cover the following points:

  • Setting up the Apache Solr server
  • Configuring the Solr instance for applications
  • Consuming Apache Solr in end user applications

Setting up Apache Solr

Apache Solr is a J2EE-based web application that runs on Apache Lucene, Tika, and other open source libraries. This section focuses on setting up an Apache Solr instance and running it. Readers who already have a Solr instance running can skip this section and move on to the next one.

Apache Solr ships with a demo Jetty server, so you can simply start it from the command line. This helps users get a Solr instance running quickly. However, you can choose to customize it and deploy it in your own environment. Apache Solr does not ship with an installer; it has to run inside a J2EE servlet container.

Prerequisites

Before installing Apache Solr, you need to make sure that you have the latest JDK on your machine. You can check this by running the java -version command on the command line, as shown in the following screenshot:

[Screenshot: output of the java -version command]
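A minimal Python sketch of the same check, assuming Python 3.7+ and a `java` executable on the `PATH`. The helper names are hypothetical. Note that `java -version` prints to standard error, and that the version string uses the legacy `1.x` scheme up to Java 8 (for example `1.7.0_55`) and a plain major number from Java 9 onward; the minimum version you need depends on your Solr release.

```python
import re
import subprocess
from typing import Optional


def parse_java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    parts = re.split(r"[._-]", match.group(1))
    # Legacy scheme "1.7.0_55" -> 7; modern scheme "11.0.2" or "17" -> 11 or 17.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


def installed_java_major_version() -> Optional[int]:
    """Run `java -version` and parse its output; None if java is not on PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The JVM writes its version banner to stderr, not stdout.
    return parse_java_major_version(result.stderr)


if __name__ == "__main__":
    print(installed_java_major_version())
```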

Once you have the correct Java version, you need a servlet container such as Tomcat, Jetty, Resin, GlassFish, or WebLogic installed on your machine...

Understanding the Solr structure

Now that your Solr server is set up and running, the next step is to configure your instance for your application. Most of the configuration for Apache Solr is done through XML configuration files, which go in the conf/ directory of your Solr core home. Let's go through the configuration of the Solr instance.
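As a quick sanity check, the following Python sketch looks for the two configuration files that Solr 4.x-era cores typically keep under `conf/`: `solrconfig.xml` and `schema.xml`. The function name and the core path are hypothetical, and some setups may use a managed schema instead of a hand-edited `schema.xml`.

```python
from pathlib import Path
from typing import Dict

# Configuration files a classic (Solr 4.x-era) core keeps in its conf/ directory.
EXPECTED_CONF_FILES = ("solrconfig.xml", "schema.xml")


def check_core_config(core_home: str) -> Dict[str, bool]:
    """Report which expected configuration files exist under <core_home>/conf."""
    conf_dir = Path(core_home) / "conf"
    return {name: (conf_dir / name).is_file() for name in EXPECTED_CONF_FILES}


if __name__ == "__main__":
    # Hypothetical core home; point this at your own core directory.
    print(check_core_config("solr/collection1"))
```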

When Solr is deployed on a container, the solr.war file contains the actual web application that is accessed through the URL. When started for the first time, each container unpacks this .war file into an internal directory (the location differs from container to container).
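Because a `.war` file is an ordinary ZIP archive, the unpacking step can be imitated with Python's standard `zipfile` module. This is only a sketch of what a container does on first start; the function name and destination directory are hypothetical, and each real container chooses its own location.

```python
import zipfile
from typing import List


def unpack_war(war_path: str, dest_dir: str) -> List[str]:
    """Extract a .war archive into dest_dir and return the sorted entry names."""
    with zipfile.ZipFile(war_path) as war:
        # extractall() creates any missing directories and mirrors the archive layout.
        war.extractall(dest_dir)
        return sorted(war.namelist())
```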

Before starting the instance, the J2EE container picks up the Solr home directory specified through Java parameters and looks for solr.xml in it. The solr.xml file defines the context, which contains information about the cores.
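The lookup described above can be sketched in Python. The sketch assumes the Solr home is passed as the `-Dsolr.solr.home=...` system property used by Solr 4.x, and that `solr.xml` uses the legacy `<cores>`/`<core>` layout; newer releases discover cores differently. The function names and the fallback value are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def resolve_solr_home(jvm_args: List[str], default: str = "solr") -> str:
    """Return the value of -Dsolr.solr.home=..., or the given default when absent."""
    prefix = "-Dsolr.solr.home="
    for arg in jvm_args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return default


def list_cores(solr_xml: str) -> Dict[str, str]:
    """Map each core name to its instanceDir in a legacy-style solr.xml."""
    root = ET.fromstring(solr_xml)
    return {core.get("name"): core.get("instanceDir") for core in root.iter("core")}
```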

Note

A collection in Solr is a combination of one or more indexes spanning one or more cores of Apache Solr. A Solr core is a running instance...

Configuring Apache Solr for the enterprise

Apache Solr allows extensive configuration to meet the needs of the consumer. Configuring the instance revolves around the following:

  • Defining a Solr schema
  • Configuring Solr parameters

Let's look at each of these steps to understand how Apache Solr is configured.

Defining a Solr schema

In an enterprise, data is generated by all the software systems that participate in day-to-day operations. This data comes in different formats, and bringing it in for Big Data processing requires a storage system flexible enough to accommodate varying data models. Traditional relational databases require users to define a strict data structure and use an SQL-based querying mechanism.

Rather than requiring users to define data structures up front, NoSQL databases provide an open model in which users can store any kind of data and retrieve it with queries that are not based on SQL syntax. By its design, a NoSQL database is best suited...
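To make the schema idea concrete, the following Python sketch generates the `<fields>` portion of a Solr `schema.xml` from a list of field definitions, including a `dynamicField` that accepts any field ending in a given suffix, which is one way a Solr schema stays flexible. It is an illustrative sketch, not a complete schema: the `<types>` section with the field type definitions (here assumed to be `string` and `text_general`, as in the Solr 4.x example schema) and the `<uniqueKey>` element are omitted, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def build_schema_fields(
    fields: List[Tuple[str, str, bool]], dynamic_suffixes: Dict[str, str]
) -> str:
    """Return a <schema> element holding <field> and <dynamicField> definitions.

    fields: (name, field_type, required) tuples.
    dynamic_suffixes: suffix -> field_type, e.g. {"_s": "string"} yields name="*_s".
    """
    schema = ET.Element("schema", name="example", version="1.5")
    fields_el = ET.SubElement(schema, "fields")
    for name, field_type, required in fields:
        ET.SubElement(
            fields_el, "field",
            name=name, type=field_type, indexed="true", stored="true",
            required=str(required).lower(),
        )
    for suffix, field_type in dynamic_suffixes.items():
        # Any incoming field whose name ends with the suffix is accepted.
        ET.SubElement(
            fields_el, "dynamicField",
            name=f"*{suffix}", type=field_type, indexed="true", stored="true",
        )
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(build_schema_fields([("id", "string", True), ("title", "text_general", False)], {"_s": "string"}))
```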

Understanding SolrJ

Apache Solr is a web application, and its users can query it directly. The search interface can be modified and enhanced to serve as an end user search tool within an enterprise. Solr clients can access Solr URLs directly over HTTP to search, and can read the results in various formats such as JSON and XML. Apache Solr also allows administration through these HTTP-based services. Queries are executed by constructing a URL that Solr understands.
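The following Python sketch builds such URLs with the standard library. It assumes the defaults of the Solr 4.x example setup (Jetty on port 8983 and a core named `collection1`), uses the common `q`, `start`, `rows`, `fl`, and `wt` request parameters, and builds a CoreAdmin `STATUS` URL for the administration side. The function names are hypothetical, and `fetch_json` needs a running Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed default for the Solr 4.x example Jetty setup; adjust for your deployment.
SOLR_ROOT = "http://localhost:8983/solr"


def build_select_url(core_url: str, query: str, start: int = 0, rows: int = 10,
                     fields: str = "*", response_format: str = "json") -> str:
    """Build a /select URL; urlencode escapes characters such as ':' and spaces."""
    params = {"q": query, "start": start, "rows": rows, "fl": fields, "wt": response_format}
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def build_core_status_url(solr_root: str) -> str:
    """Build a CoreAdmin STATUS request, one of Solr's HTTP administration services."""
    return f"{solr_root.rstrip('/')}/admin/cores?{urlencode({'action': 'STATUS', 'wt': 'json'})}"


def fetch_json(url: str) -> dict:
    """Issue the GET request and decode the JSON body (requires a live Solr server)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_select_url(f"{SOLR_ROOT}/collection1", "title:solr", rows=5))
```

With a server running, `fetch_json(url)["response"]["numFound"]` would read the hit count from a JSON response.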

SolrJ is a Java client library that your Java-based applications can use to connect to Apache Solr for indexing and searching. SolrJ provides Java wrappers and adapters to communicate with Solr and translates its results into Java objects. Using SolrJ is much more convenient than using raw HTTP and JSON. Internally, SolrJ uses Apache HttpClient to send HTTP requests. It provides a user-friendly interface that hides connection details from the consumer application. Using SolrJ, you can index your documents and perform your queries...
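SolrJ itself is a Java library, so it is not reproduced here. As a rough Python analogue of what such a client wrapper does, the following sketch hides the HTTP details behind a small class, serializes typed objects into a JSON update request, and maps the `docs` array of a JSON search response back into typed objects. The class, the `Book` type and its fields are hypothetical, the `/update?commit=true` endpoint accepting an `application/json` body is an assumption about your core's configuration, and nothing is sent over the network in this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import List
from urllib.request import Request


@dataclass
class Book:
    """Hypothetical document type; field names must match your schema."""
    id: str
    title: str


class TinySolrClient:
    """Illustrative wrapper that hides URLs and JSON handling from the caller."""

    def __init__(self, core_url: str) -> None:
        self.core_url = core_url.rstrip("/")

    def build_add_request(self, books: List[Book]) -> Request:
        """Prepare (but do not send) a POST that indexes and commits the books."""
        body = json.dumps([asdict(book) for book in books]).encode("utf-8")
        return Request(
            f"{self.core_url}/update?commit=true",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    @staticmethod
    def parse_books(response_text: str) -> List[Book]:
        """Map the docs of a JSON search response onto Book objects."""
        docs = json.loads(response_text)["response"]["docs"]
        return [Book(id=doc["id"], title=doc["title"]) for doc in docs]
```

To actually send a prepared request, it can be passed to `urllib.request.urlopen`.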

Summary

In this chapter, we set up the Apache Solr instance and configured it for the J2EE container. We went through various ways of configuring the Apache Solr instance in depth, and finally looked at consuming the capabilities of Solr. In the next chapter, we will see how to work with various applications and their datasets, and analyze their data using the capabilities of Apache Solr.


Description

This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies. If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.

What you will learn

  • Gain a complete understanding of Apache Solr and its ecosystem
  • Develop scalable, high-performance search applications using Apache Solr
  • Customize Apache Solr-based search for different requirements
  • Discover different techniques to build high-speed enterprise searches
  • Design enterprise-ready search engines and implement a scalable enterprise search functionality
  • Integrate an Apache Solr-based search with different subsystems and legacy systems
  • Scale Apache Solr through sharding, replication, and fault tolerance
  • Learn about performance tuning for your Solr-based application while scaling your data
  • Make your enterprise search cloud-ready to be able to work with multiple clients

Product Details

Publication date : Jul 25, 2014
Length: 298 pages
Edition : 1st
Language : English
ISBN-13 : 9781783981748
Vendor : Apache


Table of Contents

12 Chapters
1. Understanding Apache Solr
2. Getting Started with Apache Solr
3. Analyzing Data with Apache Solr
4. Designing Enterprise Search
5. Integrating Apache Solr
6. Distributed Search Using Apache Solr
7. Scaling Solr through Sharding, Fault Tolerance, and Integration
8. Scaling Solr through High Performance
9. Solr and Cloud Computing
10. Scaling Solr Capabilities with Big Data
A. Sample Configuration for Apache Solr
Index

Customer reviews

Rating distribution
4.7 (3 Ratings)
5 star 66.7%
4 star 33.3%
3 star 0%
2 star 0%
1 star 0%
Sai Nov 18, 2014
5 stars
With 10 chapters fitted into about 300 pages, not only is it well written it is most importantly well structured. I've always loved books that go a step further in demoing/displaying how the technology could be scaled for an enterprise problem and this book does that exactly. And in doing so, the author is mindful that he could have a naive audience tasting the waters of Big Data. It starts with a building a use case for Solr, then dives into the Solr architecture rounding it up with use cases for Solr application.

The book covers the following topics:
  • Installation and configuration
  • Data analysis with Apache Solr
  • Enterprise search design patterns
  • Solr Integration with Java, CMS frameworks, PHP, JS, Drupal.
  • Distributed search, leveraging the IAAS and PAAS Cloud services (Amazon Cloud, Windows Azure, etc)
  • Scaling Solr using sharding, fault tolerance, mongoDB, storm
  • Optimization/Fine tuning Solr Applications

Integration with Hadoop, Katta, Cassandra and R is covered in the final chapter of the book.

What I liked most about the book were the case studies presented with some of the chapters that demo the real world enterprise problems and the ways how Solr was used to provide an effective solution around it.

All I can say is Great Job, Hrishikesh!
Amazon Verified review
A. Zubarev Oct 04, 2014
5 stars
Reading Scaling Apache Solr by Hrishikesh Karambelkar turned out to be a great surprise. Having my expectation set initially low (without an apparent reason) it suddenly unfolded into something huge I could regret having otherwise passed by. It is actually an extremely thoughtful and full of practical examples ... I do not know how to call it, whether a cookbook or handbook, but definitely a wonderful masterpiece! The book may be a good source of wisdom or advice for new projects and even serve as a guide to resolving issues or improving poorly performing search applications.

Hrishikesh covers a wide (be warned, it is wide, and in a good sense of the word) variety of topics in his work:
  • Data processing with Apache Solr (techniques)
  • Enterprise search design principals
  • Integration examples (Java, Drupal and more)
  • Distributed search, leveraging the Cloud
  • Making Solr scalable
  • Monitoring of Solr including optimization
  • Integration with Big Data pillars as Hadoop, Zookeeper, Katta
  • NoSQL: MongoDB and Cassandra
  • Data analysis with R

As a bonus, the author covers fixes to the most common pitfalls or errors.

Blew my expectations! THIS IS the book you want to keep on your bookshelf and electronic media.

After reading this book you can be assured to sail with fear the high waters of the Big Data ocean!

5 starts + out of five!
Amazon Verified review
Tomasz Sobczak Oct 13, 2014
4 stars
We live in a world flooded by data and information and all realize that if we can’t find what we’re looking for (e.g. a specific document), there’s no benefit from all these data stores. When your data sets become enormous or your systems need to process thousands of messages a second, you need to an environment that is efficient, tunable and ready for scaling. We all need well-designed search technology.

A few days ago, a book called "Scaling Apache Solr" landed on my desk. The author, Hrishikesh Vijay Karambelkar, has written an extremely useful guide to one of the most popular open-source search platforms, Apache Solr. Solr is a full-text, standalone, Java search engine based on Lucene, another successful Apache project. For people working with Solr, like myself, this book should be on their Christmas shopping list! It’s one of the best on this subject.

Karambelkar is an enterprise architect with a long history in both commercial products and open source technology. As he says, he currently spends most of his time solving problems for the software industry and developing the next generation of products.

The book is divided into 10 chapters. Basically, the first three are an introduction to Apache Solr and cover its architecture, features, configuration and setting up. Chapter One contains many practical cases of Apache Solr, to help beginners understand the topic.

Chapter Four is very interesting and describes a common pattern for enterprise search solutions. These patterns focus on data processing/integration and how to meet the requirements of users (interface, relevancy, general experience).

The rest of the book mainly refers to the central topic, that is distributing search queries and how to scale/optimize a system. The book discusses all Apache Solr concepts like replication, fault tolerance, sharding and illustrates them with helpful examples. The book precisely explains SolrCloud - a bundle of built-in distributed capabilities available from version 4.0.

Chapter 8, dedicated to optimization, drew my attention. It is full of useful tips concerning JVM parameters and manipulating data structures or caching layers as well.

"Scaling Apache Solr" covers both basic and advanced subjects. The information is well organised, clear and concise. Lots of examples and cases in this book can be absorbed by beginners. I was nicely surprised by the chapter describing integration possibilities. There’s some great information about using Solr with Cassandra, MapReduce paradigm or R (programming language for computational statistics) although I would have preferred this subject to be covered in more detail. The book has two more advantages: first, it discusses designing an enterprise search system in general terms and second, it can be treated as an introduction to large volume data processing.

I believe I need to emphasize that many sections related to defining a schema, importing data, running SolrCloud or searching in near real time (NRT) are not just a raw documentation, they also have the author's well-judged advice and comments.

Unfortunately, I felt some of the more advanced topics were not described in enough detail. For example, index merging, documents relevance or using dynamic fields in data structure. Moreover, reading the book, I had a feeling that some parts do not fit the title, such as the section about clustering with Carrot2 or integration with PHP web portal.

In summary, I can say that I have read this book with pleasure and satisfaction, which in fact is rare regarding technology publications! For me, as a person who has been working with Solr since version 1.3, it was a great way to review and sort out some of its aspects. On the other hand, I'm pretty sure, that people starting their experience with Apache Solr will take a lot from this book. Although, it is mainly focused on advanced problems, it starts with the basics.

Despite some little imperfections I can truly recommend this book, especially because it describes the concrete technology in an easy-to-read way and also refers to some general architectural patterns.
Amazon Verified review