Scaling Apache Solr

Optimize your searches using high-performance enterprise search repositories with Apache Solr

Product type: Paperback
Published in: Jul 2014
ISBN-13: 9781783981748
Length: 298 pages
Edition: 1st Edition
Author: Hrishikesh Vijay Karambelkar
Table of Contents

Preface
  1. Understanding Apache Solr
  2. Getting Started with Apache Solr
  3. Analyzing Data with Apache Solr
  4. Designing Enterprise Search
  5. Integrating Apache Solr
  6. Distributed Search Using Apache Solr
  7. Scaling Solr through Sharding, Fault Tolerance, and Integration
  8. Scaling Solr through High Performance
  9. Solr and Cloud Computing
  10. Scaling Solr Capabilities with Big Data
A. Sample Configuration for Apache Solr
Index

Challenges in enterprise search

A good enterprise search solution is an important part of information availability in any organization. Without one, people cannot find the right information when they need it, which can lead to poor decision making, duplicated effort, and lost productivity. A search engine typically comprises the following components:

  1. Crawlers or data collectors focus mainly on gathering the information on which a search should run.
  2. Once the data is collected, it needs to be parsed and indexed, so parsing and indexing form another important component of any enterprise search.
  3. The search component is responsible for runtime search on a user-chosen dataset.
  4. Additionally, many search engine vendors provide a plethora of components around search engines, such as administration and monitoring, log management, and customizations.
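
The first three components above can be sketched in a few lines of plain Python. This is only an illustrative toy, not how Apache Solr works internally: the `collect`, `parse`, `build_index`, and `search` names and the sample documents are assumptions made for this example. It shows a collector feeding a parser and indexer that build an inverted index, which the search component then queries at runtime.

```python
import re
from collections import defaultdict


def collect():
    """Stand-in for a crawler: returns (doc_id, raw_text) pairs (sample data)."""
    return [
        ("doc1", "Apache Solr is an enterprise search platform."),
        ("doc2", "Crawlers gather data; the indexer parses and indexes it."),
        ("doc3", "Search runs at query time against the index."),
    ]


def parse(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index mapping each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents:
        for token in parse(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = parse(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index(collect())
print(sorted(search(index, "search")))     # ['doc1', 'doc3']
print(sorted(search(index, "the index")))  # ['doc3']
```

The inverted index is what makes the runtime search cheap: a query only touches the entries for its own tokens instead of scanning every document.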

Public web search engines are now mature. More than 90 percent of online activities begin with search engines (http://searchengineland.com/top-internet-activities-search-email-once-again-88964) and more than 100 billion global searches are made every month (http://searchenginewatch.com/article/2051895/Global-Search-Market-Tops-Over-100-Billion-Searches-a-Month). While web-based search focuses on finding content on the Web, enterprise search focuses on helping employees find relevant information stored in any form on their corporate network. Web searches have access to HTML pages, which carry a lot of useful metadata for producing the best results; corporate information lacks such metadata, so an enterprise search has less to work with when relating documents to one another. This makes building an enterprise search engine a big challenge.

Many enterprise web portals provide search over their own data; however, they do not really solve the problem of unified data access, because most enterprise data lies outside the purview of these portals and remains largely invisible to their search solutions. This data resides in various sources such as external data sources, other departmental data, individual desktops, secured data, proprietary format data, and media files. Let's look at the challenges faced in the industry for enterprise search, as shown in the following figure:

Figure: Challenges in enterprise search

Let's go through each challenge in the following list and try to understand what it means:

  • Diverse repositories: The repositories for processing the information vary from a simple web server to a complex content management system. The enterprise search engine must be capable of dealing with diverse repositories.
  • Security: Security, along with fine-grained access control, has been one of the primary concerns when dealing with enterprise search. Corporations expect data privacy from enterprise search solutions. This means two users running the same search may get two different sets of results based on document-level access.
  • Variety of information: The information in any enterprise is diverse and varies along different dimensions, such as document type (including PDF, doc, proprietary formats, and so on) and locale (such as English, French, and Hindi). An enterprise search is required to index this information and provide a search on top of it. This is one of the challenging areas of enterprise search.
  • Scalability: The information in any enterprise is always growing and enterprise search has to support its growth without impacting its search speed. This means the enterprise search has to be scalable to address the growth of an enterprise.
  • Relevance: Relevance is all about how closely the search results match the user's expectations. Public web searches can infer relevance from mechanisms such as links across web pages, whereas in enterprise search, what makes an entity relevant is completely different. Relevance in enterprise search involves understanding the current business functions and how they contribute to the relevance ranking calculations. For example, a research paper publication would carry more prestige in an academic institution's search engine than in an on-the-job recruitment search engine.
  • Federation: Any large organization would have a plethora of applications. Some of them carry technical limitations, such as proprietary formats and inability to share the data for indexing. Many times, enterprise applications such as content management systems provide inbuilt search capabilities on their own data. Enterprise search has to consume these services and it should provide a unified search mechanism for all applications in an enterprise. A federated search plays an important role while searching through various resources.

    Tip

    A federated search enables users to run their search queries on various applications simultaneously in a delegated manner. Participating applications in a federated search perform the search operation using their own mechanism. The results are then combined and ranked together and presented as a single search result (unified search solution) to the user.
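
The Security and Federation points above can be sketched together in plain Python. Everything here is hypothetical and chosen only for illustration: the `Source` class, the group-based access check, the sample documents, and the scoring by fraction of matched query words. In particular, the sketch assumes that scores from different sources are directly comparable, which real federated systems usually have to work for. Each participating source runs its own search and applies its own document-level access check, and the results are then combined and ranked into a single list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    title: str
    score: float  # assumed comparable across sources (a simplification)


class Source:
    """Hypothetical participating application with its own search mechanism."""

    def __init__(self, name, documents):
        self.name = name
        # Each document is a (title, groups_allowed_to_read) pair.
        self.documents = documents

    def search(self, query, user_groups):
        words = query.lower().split()
        hits = []
        for title, allowed_groups in self.documents:
            if not (allowed_groups & user_groups):
                continue  # document-level access check: user may not see this
            title_words = title.lower().split()
            matched = sum(1 for word in words if word in title_words)
            if matched:
                hits.append(Hit(self.name, title, matched / len(words)))
        return hits


def federated_search(sources, query, user_groups):
    """Delegate the query to every source, then merge and rank the hits."""
    merged = []
    for source in sources:
        merged.extend(source.search(query, user_groups))
    # Highest score first; source and title break ties deterministically.
    return sorted(merged, key=lambda hit: (-hit.score, hit.source, hit.title))


wiki = Source("wiki", [
    ("Release checklist", {"engineering", "hr"}),
    ("Salary bands", {"hr"}),
])
code = Source("code", [
    ("Release build script", {"engineering"}),
])

engineer_hits = federated_search([wiki, code], "release", {"engineering"})
hr_hits = federated_search([wiki, code], "release", {"hr"})
print([hit.title for hit in engineer_hits])
print([hit.title for hit in hr_hits])
```

The same query returns two results for the engineering user and one for the HR user, because the access check is applied inside each source before the results are merged.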

Let's take a look at a fictitious enterprise search implementation for a software product development company called ITWorks Corporation. The following screenshot depicts how a possible search interface would look:

Screenshot: possible search interface for ITWorks Corporation

A search should support basic keyword searching as well as advanced searching across various data sources, either directly or through a federated search. In this case, the search covers source code, development documentation, and resource capabilities, all at once. Given such diverse content, a search should provide a unified browsing experience in which the results show up together, hiding the underlying sources. To enable rich browsing, it may provide refinements based on certain facets, as shown in the screenshot. It may also provide features such as sorting, spell checking, pagination, and search result highlighting. These features enhance the user experience while searching for information.
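
To make the browsing features concrete, here is a minimal sketch in plain Python of facet counts, facet refinement, sorting, pagination, and highlighting over an in-memory result list. It does not use Solr's API; the `RESULTS` data, the field names, and the helper functions are all assumptions made up for this illustration.

```python
import re
from collections import Counter

# Hypothetical search results spanning several underlying sources.
RESULTS = [
    {"title": "Solr build script", "source": "source code", "year": 2013},
    {"title": "Solr setup guide", "source": "documentation", "year": 2014},
    {"title": "Solr tuning notes", "source": "documentation", "year": 2012},
    {"title": "Solr consultant profile", "source": "resources", "year": 2014},
]


def facet_counts(results, field):
    """Count how many results fall under each value of the given field."""
    return Counter(result[field] for result in results)


def refine(results, field, value):
    """Keep only the results whose field equals the selected facet value."""
    return [result for result in results if result[field] == value]


def paginate(results, page, page_size):
    """Return the results for a 1-based page number."""
    start = (page - 1) * page_size
    return results[start:start + page_size]


def highlight(text, term):
    """Wrap case-insensitive matches of term in <em> tags."""
    return re.sub(
        re.escape(term),
        lambda match: f"<em>{match.group(0)}</em>",
        text,
        flags=re.IGNORECASE,
    )


facets = facet_counts(RESULTS, "source")
docs = sorted(
    refine(RESULTS, "source", "documentation"),
    key=lambda result: result["year"],
    reverse=True,  # newest first
)
first_page = paginate(docs, page=1, page_size=1)
print(dict(facets))
print([highlight(result["title"], "solr") for result in first_page])
```

Facet counts are computed over the full result set so the user can see how many hits each refinement would leave, while sorting and pagination apply only to the refined list.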
