Administrating Solr

Master the use of Drupal and associated scripts to administrate, monitor, and optimize Solr

Product type Paperback
Published in Oct 2013
Publisher Packt
ISBN-13 9781783283255
Length 120 pages
Edition 1st Edition
Author (1):
Surendra Mohan

Geospatial search


Geomatics (also known as geospatial technology or geomatics engineering) is the discipline of gathering, storing, processing, and delivering geographic, or spatially referenced, information. This information is expressed in terms of longitude (vertical lines) and latitude (horizontal lines) and can be used in many ways. For instance, you may wish to store the locations of a company that has several offices, or to sort search results by their distance from a point. In short, geospatial search is about working with coordinates on the globe.

In this section, we will see how to:

  • Store geographical points in the index

  • Sort results by a distance from a point

Storing geographical points in the index

You might come across situations in which you need to store multiple locations of a company in the index. We could, of course, add multiple dynamic fields and remember the field names in our application, but that is inconvenient. Solr can handle this situation for us, and the next example shows how to store pairs of values (in our case, location coordinates, that is, geographical points).

Let us define three fields in the field definition section of our schema.xml file to store the company's data:

<field name="id" type="string" indexed="true" stored="true" required="true" /> 
<field name="name" type="text" indexed="true" stored="true" /> 
<field name="location" type="point" indexed="true" stored="true" multiValued="true" /> 

In addition to the preceding fields, we shall also have one dynamic field defined in our schema.xml file as shown:

<dynamicField name="*_d" type="double" indexed="true" stored="true"/>

Our point type should look like this:

<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

Now, let us look at our example data, which is stored in the geodata.xml file:

<add> 
<doc> 
<field name="id">1</field> 
<field name="name">company</field> 
<field name="location">10,10</field> 
<field name="location">30,30</field> 
</doc> 
</add>

Let us now index our data. To do so, run the following command from the exampledocs directory (where our geodata.xml file resides):

java -jar post.jar geodata.xml

After indexing our data, it is time to run the following query to retrieve it:

http://localhost:8080/solr/select?q=location:10,10

If you get the following response, then bingo! You have done it.

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
<lst name="responseHeader"> 
<int name="status">0</int> 
<int name="QTime">3</int> 
<lst name="params">
<str name="q">location:10,10</str> 
</lst> 
</lst> 
<result name="response" numFound="1" start="0"> 
<doc> 
<str name="id">1</str> 
<arr name="location"> 
<str>10,10</str> 
<str>30,30</str> 
</arr> 
<arr name="location_0_d"> 
<double>10.0</double> 
<double>30.0</double> 
</arr> 
<arr name="location_1_d"> 
<double>10.0</double> 
<double>30.0</double> 
</arr> 
<str name="name">company</str> 
</doc> 
</result> 
</response>

We have four fields, one of them being a dynamic field which we have defined in our schema.xml file. The first field is the one responsible for holding the unique identifier. The second one is responsible for holding the name of the company. The third one, named location, is responsible for holding the geographical points and of course can have multiple values. The dynamic field will be used as a helper for the point type.

Then, we have the point type definition, which is based on the solr.PointType class and is defined by the following two attributes:

  • dimension: The number of dimensions that the field will store. In our case, as we have stored a pair of values, we set this attribute to 2.

  • subFieldSuffix: The suffix of the helper fields that store the actual values of the field. This is where our dynamic field comes into play. With this attribute, we instruct Solr that our helper field will be the dynamic field ending with the suffix _d.

How does this type of field actually work? When defining a two-dimensional field, as we did, three fields are actually created in the index. The first field has the same name as the field we added in the schema.xml file, so in our case it is location. This field is responsible for holding the stored value of the field, and it is only created when we set the stored attribute of the field to true.

The next two fields are based on the dynamic field. Their names follow the pattern field_0_d and field_1_d, which in our case gives location_0_d and location_1_d. Each name is made up of the field name, the _ character, the index of the value, and finally the suffix defined by the subFieldSuffix attribute of the type (_d here, which supplies the second _ character).

Now, let us understand how the data is indexed. If you look at our example data file, you will see that the values in each pair are separated by a comma character. That is how you add the data to the index.

Querying works the same way: unlike a standard single-valued field, the query value for a point field is given as a pair, with the two values separated by a comma character.
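As a minimal sketch of building such a query from an application, the following Python snippet assembles the URL used earlier. The helper name point_query_url is hypothetical, and the base URL is simply the one from this example. Note that urlencode percent-encodes the colon and the comma; the server decodes them back to location:10,10.

```python
from urllib.parse import urlencode


def point_query_url(base_url, field, coordinates):
    """Build a select URL that queries a point field for one coordinate pair.

    point_query_url is a hypothetical helper for illustration only.
    """
    # The values of a point are joined with a comma, exactly as in the index data.
    value = ",".join(str(c) for c in coordinates)
    return f"{base_url}?{urlencode({'q': f'{field}:{value}'})}"


url = point_query_url("http://localhost:8080/solr/select", "location", (10, 10))
print(url)
```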

Looking at the response, you can see that besides the location field, there are two dynamic fields (location_0_d and location_1_d) created.
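To make the naming scheme concrete, here is a small Python sketch that mimics how the comma-separated pairs end up in the per-dimension helper fields. It is only an illustration of the behavior described above, not Solr's implementation, and the function name decompose_points is an assumption made for this example.

```python
from collections import defaultdict


def decompose_points(field_name, values, dimension=2, suffix="_d"):
    """Split comma-separated points into per-dimension helper fields.

    Illustration only: this mirrors the naming pattern described in the
    text (field name, "_", value index, suffix), not Solr's own code.
    """
    sub_fields = defaultdict(list)
    for value in values:
        parts = [float(p) for p in value.split(",")]
        if len(parts) != dimension:
            raise ValueError(f"expected {dimension} coordinates, got {value!r}")
        for index, coordinate in enumerate(parts):
            sub_fields[f"{field_name}_{index}{suffix}"].append(coordinate)
    return dict(sub_fields)


result = decompose_points("location", ["10,10", "30,30"])
print(result)
# {'location_0_d': [10.0, 30.0], 'location_1_d': [10.0, 30.0]}
```

The printed arrays match the location_0_d and location_1_d arrays in the response shown earlier.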

Sort results by a distance from a point

Continuing with the scenario from the Storing geographical points in the index section of this chapter, imagine that you need to sort your search results by their distance from the user's location. This section shows you how to do it.

Let us assume that we have the following index structure, which we have added to the field definition section of schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" /> 
<field name="name" type="string" indexed="true" stored="true" /> 
<field name="x" type="float" indexed="true" stored="true" /> 
<field name="y" type="float" indexed="true" stored="true" />

Here in this example, we have assumed that the user location will be provided from the application making the query.

Our example data looks like this:

<add> 
<doc> 
<field name="id">1</field> 
<field name="name">Company 1</field> 
<field name="x">56.4</field> 
<field name="y">40.2</field> 
</doc> 
<doc> 
<field name="id">2</field> 
<field name="name">Company 2</field> 
<field name="x">50.1</field> 
<field name="y">48.9</field> 
</doc> 
<doc> 
<field name="id">3</field> 
<field name="name">Company 3</field> 
<field name="x">23.18</field> 
<field name="y">39.1</field> 
</doc> 
</add>

Suppose that the user of this search application is located at the point (0, 0), that is, at latitude 0 and longitude 0. Our query to find the companies and sort them in ascending order of their distance from that point would be:

http://localhost:8080/solr/select?q=company&sort=dist(2,x,y,0,0)+asc

Our result would look something like this:

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
<lst name="responseHeader">
<int name="status">0</int> 
<int name="QTime">2</int> 
<lst name="params"> 
<str name="q">company</str> 
<str name="sort">dist(2,x,y,0,0) asc</str> 
</lst> 
</lst> 
<result name="response" numFound="3" start="0"> 
<doc> 
<str name="id">3</str> 
<str name="name">Company 3</str> 
<float name="x">23.18</float> 
<float name="y">39.1</float> 
</doc> 
<doc> 
<str name="id">1</str> 
<str name="name">Company 1</str> 
<float name="x">56.4</float> 
<float name="y">40.2</float> 
</doc> 
<doc> 
<str name="id">2</str> 
<str name="name">Company 2</str> 
<float name="x">50.1</float> 
<float name="y">48.9</float> 
</doc> 
</result> 
</response>

As you can see in the index structure and the data, every company is described by four fields: the unique identifier (id), company name (name), the latitude of the company's location (x), and the longitude of the company's location (y).

To achieve the expected results, we run a standard query with a non-standard sort. The sort parameter consists of a function name, dist, which calculates the distance between points. In our example, the function (dist(2,x,y,0,0)) takes five parameters, which are:

  • The first parameter selects the algorithm used to calculate the distance. In our case, the value 2 tells Solr to calculate the Euclidean distance.

  • The second parameter, x, is the field that contains the latitude.

  • The third parameter, y, is the field that contains the longitude.

  • The fourth parameter is the latitude of the point from which the distance will be calculated (0 in our query).

  • The fifth parameter is the longitude of the point from which the distance will be calculated (0 in our query).
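The ordering in the response can be checked by hand. The following Python sketch recomputes the distances for the three example companies and sorts them; it is an illustration of the Euclidean case under the parameter order described above, not Solr's implementation, and the local dist function is a stand-in written for this example.

```python
import math


def dist(power, *coords):
    """Distance between two points passed as x1, y1, ..., x2, y2, ...

    Stand-in for illustration: only the Euclidean case (power 2) discussed
    in the text is handled here.
    """
    if power != 2:
        raise ValueError("this sketch only covers the Euclidean case (power 2)")
    half = len(coords) // 2
    first, second = coords[:half], coords[half:]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))


# The example data from this section.
companies = [
    {"id": "1", "name": "Company 1", "x": 56.4, "y": 40.2},
    {"id": "2", "name": "Company 2", "x": 50.1, "y": 48.9},
    {"id": "3", "name": "Company 3", "x": 23.18, "y": 39.1},
]

# Equivalent of sort=dist(2,x,y,0,0) asc
ranked = sorted(companies, key=lambda c: dist(2, c["x"], c["y"], 0, 0))
for company in ranked:
    print(company["id"], round(dist(2, company["x"], company["y"], 0, 0), 2))
```

The distances come out at roughly 45.45, 69.26, and 70.01, so the documents are returned in the order 3, 1, 2, as in the response above.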

If you would like to explore the other functions available in Solr, see the Solr Wiki page at http://wiki.apache.org/solr/FunctionQuery
