Scaling Big Data with Hadoop and Solr, Second Edition: Understand, design, build, and optimize your big data search engine with Hadoop and Apache Solr

Vijay Karambelkar

$9.99 ~~$39.99~~

3 (4 Ratings)

eBook Apr 2015 166 pages 1st Edition

What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

Scaling Big Data with Hadoop and Solr, Second Edition

Chapter 2. Understanding Apache Solr

In the previous chapter, we discussed how big data has evolved to meet the needs of organizations that must deal with enormous data sizes. Working with data of different shapes brings further challenges. For example, data such as application server log files (which are semi-structured) or Microsoft Word documents is difficult to store in traditional relational storage. The challenge of handling such data is not limited to storage: there is also the big question of how to access the required information. Enterprise search engines are designed to address this problem.

Today, finding the required information within a specified timeframe has become more crucial than ever. Enterprises without information retrieval capabilities suffer from problems such as lost employee productivity, poor decisions based on faulty or incomplete information, and duplicated effort. Given these scenarios, it is...

Configuring Solr

Apache Solr allows extensive configuration to meet the needs of the consumer. Configuring the instance revolves around the following:

Defining a schema
Configuring Solr parameters

First, let's understand the Apache Solr structure, and then look at each of these steps to understand how Apache Solr is configured.

Understanding the Solr structure

The Apache Solr home folder mainly contains the configuration and index-related data. The following are the major folders in the Solr collection:

Directory	Purpose
`conf/`	This folder contains all the configuration files of Apache Solr and is mandatory. Among them, `solrconfig.xml` and `schema.xml` are the important configuration files.
`data/`	This folder stores the index data generated by Solr. This is the default location for Solr to store this information. This location can be overridden by modifying `conf/solrconfig.xml`.
`lib/`	This folder is optional. If it exists, Solr will load any JAR files found in this folder...
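The folder layout in the table can be checked with a short script. The following is a minimal Python sketch, not part of Solr itself: the function names `inspect_solr_home` and `build_demo_home` and the `demo_core` folder are hypothetical, and the demo works on a throwaway folder rather than a real Solr installation. It only reports which of the documented folders and configuration files are present.

```python
from pathlib import Path
import tempfile

# File names taken from the table above; the text calls these the
# important configuration files inside the mandatory conf/ folder.
IMPORTANT_CONF_FILES = ("solrconfig.xml", "schema.xml")


def inspect_solr_home(home: Path) -> dict:
    """Report which of the documented folders and config files exist."""
    conf = home / "conf"
    return {
        "conf": conf.is_dir(),            # mandatory according to the table
        "data": (home / "data").is_dir(),  # default index location
        "lib": (home / "lib").is_dir(),    # optional
        "missing_conf_files": [
            name for name in IMPORTANT_CONF_FILES if not (conf / name).is_file()
        ],
    }


def build_demo_home(root: Path) -> Path:
    """Create a throwaway layout for demonstration (hypothetical names)."""
    home = root / "demo_core"
    (home / "conf").mkdir(parents=True)
    (home / "data").mkdir()
    # Only one of the two config files is created, so the report
    # below has something to flag.
    (home / "conf" / "solrconfig.xml").write_text("<config/>")
    return home


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(inspect_solr_home(build_demo_home(Path(tmp))))
```

Pointing `inspect_solr_home` at a real Solr home folder instead of the demo folder gives a quick sanity check before starting the instance.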

Querying for information in Solr

We have already seen how Apache Solr uses different request handlers to give consumers extensive ways of getting search results. Each request handler uses its own query parser, which extracts the parameters and their values from the query string and forms Lucene query objects. The standard query parser allows greater precision over search data; DisMaxQueryParser and Extended DisMaxQueryParser provide a Google-like search syntax. The query syntax changes depending on which request handler is called. Let's look at some of the important terms:

Term	Meaning
`q=<string>`	The query string `<string>` can support wildcards (`*:*`); for example, `title:Scaling*`
`fl=id,book-name`	The field list that a search response will return
`sort=author asc`	Results/facets to be sorted by authors in an ascending order
`price:[* TO 100]&rows=10&start=5`	Looks for price between 0 and 100; limits the result to...
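The terms in the table can be combined into a single request URL. The following is a minimal Python sketch that uses only the standard library to assemble and URL-encode such a query string. The endpoint in `BASE_URL` (host, port, the core name `books`, and the `/select` path) and the helper name `build_query_url` are assumptions made for illustration; adjust them to your own instance and request handler.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint: host, port, core name and handler path
# are assumptions, not values taken from the text.
BASE_URL = "http://localhost:8983/solr/books/select"


def build_query_url(q, fields=None, sort=None, rows=10, start=0):
    """Assemble a search URL from the parameters described in the table."""
    params = [("q", q)]
    if fields:
        # fl: comma-separated list of fields to return
        params.append(("fl", ",".join(fields)))
    if sort:
        # sort: "<field> asc" or "<field> desc"
        params.append(("sort", sort))
    # rows and start control paging through the result set
    params.append(("rows", str(rows)))
    params.append(("start", str(start)))
    # urlencode escapes characters such as ':', '*', '[' and spaces
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_query_url(
        "title:Scaling*", fields=["id", "book-name"], sort="author asc"
    )
    print(url)
    # Decode the query string again to confirm the values survive encoding
    print(parse_qs(urlsplit(url).query))

    print(build_query_url("price:[* TO 100]", rows=10, start=5))
```

Building the URL from a list of pairs rather than by string concatenation keeps the range and wildcard characters correctly escaped.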

Description

This book is aimed at developers, designers, and architects who would like to build big data enterprise search solutions for their customers or organizations. No prior knowledge of Apache Hadoop and Apache Solr/Lucene technologies is required.

What you will learn

Understand Apache Hadoop, its ecosystem, and Apache Solr
Explore industry-based architectures by designing a big data enterprise search, along with their applicability and benefits
Integrate Apache Solr with big data technologies such as Cassandra to enable better scalability and high availability for big data
Optimize the performance of your big data search platform with scaling data
Write MapReduce tasks to index your data
Configure your Hadoop instance to handle real-world big data problems
Work with Hadoop and Solr using real-world examples to benefit from their practical usage
Use Apache Solr as a NoSQL database

Winston May 28, 2015

Great book. Big data is all the rage these days. As technologists, we are faced with ever-increasing ways to make sense of our data and organize it in a way that makes the best business and personal use of it. The author does a good job of explaining the uses of Hadoop and Solr. I just wish there was more to read, but what was offered has me yearning for more, hopefully in the next edition.

Amazon Verified review

David Jun 10, 2015

Good book, but it requires that you clearly understand the targeted audience. The book is clear and is a must-have for administrators of Hadoop and Solr. It explains how to configure and scale such infrastructure correctly. It also addresses the most common issues and how to deal with them. As such, it will probably save Hadoop/Solr administrators a lot of time and effort. It also requires the reader to already have a good knowledge of Hadoop, which makes sense for a book called scaling ;). If you are new to Hadoop, you should probably start learning on the Internet or go to a book that introduces all the concepts because, with the exception of Chapter 1, which refreshes your memory, you will have to know the elements mentioned by the author.

PJG May 21, 2015

This book is a good introduction to Solr and how it can be used to tackle distributed search scenarios. The first chapter is an introduction to the Hadoop stack, and it gives a good description and overview of HDFS and fundamental MapReduce concepts. Chapter two gives an overview of the architecture of Apache Solr, and describes how you can install and configure it. The third chapter describes the problems which Solr can solve on its own and identifies the benefits of distributed search. It introduces different data processing workflows, and describes the advantages and disadvantages of each workflow. This chapter highlights one of the downsides of the book, namely that it reads like a very theoretical guide, rather than providing hands-on and practical advice. The fourth chapter describes how to integrate Hadoop, Solr, and HBase by using Lily. The chapter ends by describing how to divide the Solr index into multiple shards by using SolrCloud and ZooKeeper. Finally, the last chapter focuses upon optimising the performance of Apache Solr, and this is where the advice is very practical and applicable. Overall, the book contains good material, but ideally there would be more on applying the theory covered in practice. At only 166 pages, it feels rather light on content, which is a shame as it's a good quality book overall.

J. Depeau Nov 08, 2016

I bought this book as I needed to learn more about Solr for work, and this looked like a really comprehensive and pretty technical guide. While there is definitely a lot of information to be found in this book, the hard part is actually weeding through everything to get it. This book desperately needs an editor! It's extremely hard to read - the writing is poor and unclear, and it's just generally littered with errors and mistakes. It's a shame, as I believe the author knows the topic and has a lot of knowledge to pass on. But for the money I spent on this book I expect something which is clear, easy to read and understand, and which has been professionally edited.

Scaling Big Data with Hadoop and Solr, Second Edition

Chapter 2. Understanding Apache Solr

Setting up Apache Solr

Prerequisites for setting up Apache Solr

The Apache Solr architecture

Configuring Solr

Understanding the Solr structure

Loading data in Apache Solr

Extracting request handler – Solr Cell

Querying for information in Solr
