Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Solr Cookbook - Third Edition

You're reading from   Solr Cookbook - Third Edition Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Arrow left icon
Product type Paperback
Published in Jan 2015
Publisher
ISBN-13 9781783553150
Length 356 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Rafal Kuc Rafal Kuc
Author Profile Icon Rafal Kuc
Rafal Kuc
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Apache Solr Configuration FREE CHAPTER 2. Indexing Your Data 3. Analyzing Your Text Data 4. Querying Solr 5. Faceting 6. Improving Solr Performance 7. In the Cloud 8. Using Additional Functionalities 9. Dealing with Problems 10. Real-life Situations Index

Indexing PDF files


The library on the corner, we used to go to, wants to expand its collection and become available for the wider public through the World Wide Web. It asked its book suppliers to provide sample chapters of all the books in PDF format so that they can share it with online users. With all the samples provided by the supplier comes a problem—how to extract data for the search box from more than 900,000 PDF files. Solr can do it with the use of Apache Tika (http://tika.apache.org/). This recipe will show you how to handle such a task.

How to do it...

To index PDF files, we will need to set up Solr to use extracting request handlers. To do this, we will take the following steps:

  1. First, let's edit our Solr instance, solrconfig.xml, and add the following configuration:

    <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
     <lst name="defaults">
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str...
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image