Storing geographical points in the index
You may come across situations where you need to store multiple locations of a company in the index. We could add multiple dynamic fields and remember the field names in our application, but that is not convenient. Solr can handle such a situation, and the next example shows how to store pairs of values (in our case, location coordinates, that is, a geographical point).
Let us define three fields in the field definition section of our schema.xml file to store the company's data:
In addition to the preceding fields, we shall also have one dynamic field defined in our schema.xml file as shown:
Our point type should look like this:
Now, let us look at our example data, which is stored in the geodata.xml file:
Let us now index our data. To do so, run the following command from the exampledocs directory (where our geodata.xml file resides):
After we index our data, it is time to run the following query to get the data:
http://localhost:8080/solr/select?q=location:10,10
If you get the following response, then it's bingo! You have done it.
We have four fields, one of them being a dynamic field, defined in our schema.xml file. The first field holds the unique identifier. The second holds the name of the company. The third, named location, holds the geographical points and can have multiple values. The dynamic field is used as a helper for the point type.
Then, we have the point type definition, which is based on the solr.PointType class and is defined by the following two attributes:
dimension: The number of dimensions that the field will store. In our case, as we store a pair of values, we set this attribute to 2.
subFieldSuffix: The suffix of the helper fields that store the actual values of the field. This is where our dynamic field comes into play. Using this attribute, we instruct Solr that our helper field will be the dynamic field ending with the suffix _d.
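To make the two attributes concrete, here is a minimal sketch that reads them from a schema fragment. The fragment itself is an assumption reconstructed from the description above: the type name point, the double type of the dynamic field, and the indexed and stored flags are illustrative guesses, not the original listing.

```python
import xml.etree.ElementTree as ET

# Assumed schema.xml fragment. Only solr.PointType, dimension=2 and the
# _d suffix come from the text; every other name or value is a guess.
SCHEMA_FRAGMENT = """
<schema>
  <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
  <dynamicField name="*_d" type="double" indexed="true" stored="true"/>
</schema>
"""

def point_type_settings(schema_xml):
    """Return (dimension, subFieldSuffix) of the first solr.PointType type."""
    root = ET.fromstring(schema_xml)
    field_type = root.find("fieldType[@class='solr.PointType']")
    return int(field_type.get("dimension")), field_type.get("subFieldSuffix")

def has_matching_dynamic_field(schema_xml, suffix):
    """Check that a dynamic field named '*' + suffix is declared."""
    root = ET.fromstring(schema_xml)
    return any(d.get("name") == "*" + suffix for d in root.findall("dynamicField"))

dimension, suffix = point_type_settings(SCHEMA_FRAGMENT)
print(dimension, suffix, has_matching_dynamic_field(SCHEMA_FRAGMENT, suffix))
```

The second helper reflects the dependency described above: the suffix named by subFieldSuffix is only useful if a dynamic field with that suffix exists.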
How does this type of field actually work? When defining a two-dimensional field, like we did, there are actually three fields created in the index. The first field is named like the field we added in the schema.xml file, so in our case it is location. This field is responsible for holding the stored value of the field. Additionally, this field will only be created when we set the field attribute stored to true.
The next two fields are based on the dynamic field. Their names follow the pattern field_0_d and field_1_d. Each name is built from the field name, a _ character, the index of the value, and finally the suffix defined by the subFieldSuffix attribute of the type (here _d, which supplies the second _ character).
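The naming rule can be sketched in a few lines. The function name sub_field_names is hypothetical and only mirrors the pattern described above; it is not Solr's own code.

```python
def sub_field_names(field_name, dimension, suffix):
    """Build helper field names: field name, '_', value index, then the suffix."""
    return [f"{field_name}_{i}{suffix}" for i in range(dimension)]

# For our two-dimensional location field with subFieldSuffix="_d":
print(sub_field_names("location", 2, "_d"))
# prints ['location_0_d', 'location_1_d']
```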
Now, let us understand how the data is indexed. If you look at our example data file, you will see that the values in each pair are separated by a comma. That is all it takes to add such data to the index.
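As an illustration of the comma-separated format, the following sketch builds a Solr XML update message for one company with several locations. The identifier, company name, and coordinates are made-up sample values, and build_add_document is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET

def build_add_document(doc_id, name, points):
    """Return an <add><doc>...</doc></add> message with one location field per point."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for field_name, value in [("id", doc_id), ("name", name)]:
        field = ET.SubElement(doc, "field", name=field_name)
        field.text = str(value)
    for x, y in points:
        # Each pair is written as a single value, with a comma between the parts.
        field = ET.SubElement(doc, "field", name="location")
        field.text = f"{x},{y}"
    return ET.tostring(add, encoding="unicode")

# Hypothetical sample data: one company with two locations.
print(build_add_document("1", "Example company", [(10, 10), (20, 20)]))
```

Because location is multi-valued, each point becomes its own field element rather than one long delimited string.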
Querying uses the same representation as indexing. Unlike standard single-valued fields, the values of the pair are passed in the query separated by a comma.
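A small sketch of how an application might assemble such a query URL is shown here. The base URL matches the example query used in this section; the helper name point_query_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the example query in this section.
SOLR_SELECT = "http://localhost:8080/solr/select"

def point_query_url(field, values):
    """Build a select URL that queries a point field with comma-separated values."""
    point = ",".join(str(v) for v in values)
    # urlencode percent-encodes the colon and the comma; Solr decodes them back.
    return SOLR_SELECT + "?" + urlencode({"q": f"{field}:{point}"})

url = point_query_url("location", (10, 10))
print(url)
# Decoding the query string shows the q parameter Solr receives.
print(parse_qs(urlsplit(url).query)["q"])
```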
Looking at the response, you can see that besides the location field, there are two dynamic fields (location_0_d and location_1_d) created.
Sort results by a distance from a point
Building on the scenario described in the Storing geographical points in the index section of this chapter, imagine that you have to sort your search results based on the distance from a user's location. This section will show you how to do it.
Let us assume that we have the following index structure, which we have added to the field definition section of schema.xml.
In this example, we assume that the user's location will be provided by the application making the query.
Our example data looks like this:
Suppose that the user of this search application is located at the point with latitude 0 and longitude 0. Our query to find the companies and sort them in ascending order of distance from that point would be:
http://localhost:8080/solr/select?q=company&sort=dist(2,x,y,0,0)+asc
Our result would look something like this:
As you can see in the index structure and the data, every company is described by four fields: the unique identifier (id), the company name (name), the latitude of the company's location (x), and the longitude of the company's location (y).
To achieve the expected results, we run a standard query with a non-standard sort. The sort parameter uses a function, dist, which calculates the distance between points. In our example, the function call dist(2,x,y,0,0) takes five parameters, which are:
The first parameter selects the algorithm used to calculate the distance. In our case, the value 2 tells Solr to calculate the Euclidean distance.
The second parameter, x, is the field that contains the latitude.
The third parameter, y, is the field that contains the longitude.
The fourth parameter is the latitude of the point from which the distance will be calculated (0 in our query).
The fifth parameter is the longitude of the point from which the distance will be calculated (0 in our query).
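To see what this sort does, the following sketch emulates it outside Solr: it computes the plain Euclidean distance between each company's (x, y) pair and the reference point, then orders the companies by that value. The company records are hypothetical sample data, and the function names are illustrative only.

```python
import math

def euclidean_dist(x, y, ref_x, ref_y):
    """Straight-line distance in the x/y plane, as with dist(2, x, y, ref_x, ref_y)."""
    return math.hypot(x - ref_x, y - ref_y)

def sort_by_distance(companies, ref_x, ref_y):
    """Return the companies ordered by ascending distance from the reference point."""
    return sorted(
        companies,
        key=lambda c: euclidean_dist(c["x"], c["y"], ref_x, ref_y),
    )

# Hypothetical sample data, not the data file used in this section.
companies = [
    {"id": 1, "name": "Company one", "x": 3.0, "y": 4.0},
    {"id": 2, "name": "Company two", "x": 6.0, "y": 8.0},
    {"id": 3, "name": "Company three", "x": 1.0, "y": 1.0},
]

for company in sort_by_distance(companies, 0.0, 0.0):
    print(company["id"], company["name"])
```

Note that this treats latitude and longitude as flat plane coordinates, so it only approximates real distances on the globe.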
If you would like to explore more of the functions available in Solr, see the Solr Wiki page at http://wiki.apache.org/solr/FunctionQuery