We start with our index having two fields, id and _version_. The id field is used as the unique identifier; we informed Solr about this by adding the uniqueKey section in schema.xml. We will need it for functionalities such as document updates, deletes by identifier, and so forth. The _version_ field is used internally by Solr and is required by some Solr functionalities (such as optimistic locking); this is why we include it. The rest of the fields will be added automatically.
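A minimal sketch of the corresponding part of schema.xml is shown below, assuming the Solr 4.x schema syntax. The schema name, version, and field attributes are illustrative assumptions rather than the recipe's exact file.

```xml
<schema name="example" version="1.5">
  <fields>
    <!-- unique identifier of each document, referenced by uniqueKey below -->
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <!-- used internally by Solr, for example for optimistic locking -->
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <!-- tells Solr which field uniquely identifies a document -->
  <uniqueKey>id</uniqueKey>
  <types>
    <!-- field type definitions go here -->
  </types>
</schema>
```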
We also need to define the field types that we will use. Apart from the string type used by the id field and the long type used by the _version_ field, schema.xml contains the types our documents will use. We will also refer to these types in our custom update request processor chain in the solrconfig.xml file.
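The field types could be declared along the following lines. This is a sketch assuming Solr 4.x Trie-based types; only string, long, tlongs, tdates, and text are named in this recipe, so the booleans and tdoubles names, the class attributes, and the text analyzer are assumptions.

```xml
<types>
  <!-- used by the id field -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <!-- used by the _version_ field -->
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <!-- types referenced by the typeMapping sections in solrconfig.xml -->
  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
  <fieldType name="tlongs" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
  <fieldType name="tdoubles" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
  <fieldType name="tdates" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" multiValued="true"/>
  <!-- default type for values that no type mapping matches -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
```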
The next thing is very important: the managed schema factory that we defined in solrconfig.xml, which is of the ManagedIndexSchemaFactory type (its class property is set to this value). By adding this section, we tell Solr that we want it to manage our schema.xml file. This means that Solr will load the schema.xml file during startup, rename it to schema.xml.bak, and then create a file called managed-schema (the value of the managedSchemaResourceName property). From this point on, we shouldn't modify our index structure manually; we should either let Solr do it during indexing or add and alter fields using the schema API (we will talk about this in the Altering the index structure on a live collection recipe in Chapter 8, Using Additional Functionalities). Since I assume that we will use the schema API, I've set the mutable property to true. If we want to disallow using the schema API, we should set the mutable property to false. Keep in mind, however, that automatic field discovery also modifies the schema and therefore requires it to be mutable; with mutable set to false, documents containing unknown fields will be rejected.
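The schema factory definition in solrconfig.xml might look like the following sketch; the comments restate the behavior described above.

```xml
<schemaFactory class="ManagedIndexSchemaFactory">
  <!-- true allows schema changes (schema API and field discovery); false makes the schema read-only -->
  <bool name="mutable">true</bool>
  <!-- name of the file Solr creates from schema.xml and manages from then on -->
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```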
Note
Note that you need to have a single schemaFactory defined, and it needs to be of the ManagedIndexSchemaFactory type. If it is not, field discovery will not work, and indexing documents that contain fields not defined in the schema will result in an error.
We also need to include an update request processor chain. Since we want all indexing requests to use our custom chain, we add the update.chain property and set it to add-unknown-fields in the defaults section of our update request handler configuration.
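A sketch of the update request handler with the default chain set is shown below; the /update handler name and the solr.UpdateRequestHandler class are the usual Solr defaults and are assumed here.

```xml
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <!-- every update request sent to this handler goes through the add-unknown-fields chain -->
    <str name="update.chain">add-unknown-fields</str>
  </lst>
</requestHandler>
```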
The second most important thing in this recipe is our update request processor chain called add-unknown-fields (the same name we used in the update request handler configuration). It defines several update processors that together let Solr discover fields and their types. The solr.RemoveBlankFieldUpdateProcessorFactory processor factory removes empty fields from the documents we send for indexing. The solr.ParseBooleanFieldUpdateProcessorFactory processor factory is responsible for parsing Boolean fields, solr.ParseLongFieldUpdateProcessorFactory parses fields whose values are long numbers, solr.ParseDoubleFieldUpdateProcessorFactory parses fields whose values are doubles, and solr.ParseDateFieldUpdateProcessorFactory parses date-based fields. For the last one, we specify the date formats we want Solr to recognize (we will discuss this in more detail in the Using parsing update processors to parse data recipe in Chapter 2, Indexing Your Data).
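The first part of the chain could look like the following sketch. The list of date patterns is an assumption for illustration; use the formats your data actually contains.

```xml
<updateRequestProcessorChain name="add-unknown-fields">
  <!-- drop fields with empty values before any parsing -->
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <!-- try to parse string values as Boolean, long, double, and date, in this order -->
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <!-- assumed example patterns that Solr should recognize as dates -->
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <!-- AddSchemaFields, Log, and Run processors follow here -->
</updateRequestProcessorChain>
```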
Next, we include the solr.AddSchemaFieldsUpdateProcessorFactory processor factory, which adds the actual fields to our managed schema. We set the default field type to text using the defaultFieldType property; this type is used when no type mapping matches the field's value. After the default field type definition, we see four lists called typeMapping. These sections define the field type mappings Solr will use. Each list contains at least one valueClass property and one fieldType property. The valueClass property names the Java class of the parsed value, and the fieldType property names the field type Solr will use for new fields holding values of that class.
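A sketch of the AddSchemaFieldsUpdateProcessorFactory definition with four type mappings follows. The date and long/integer mappings match the description in this recipe; the Boolean and generic java.lang.Number mappings, and the booleans and tdoubles type names, are assumptions. The long/integer mapping is listed before the generic Number one, assuming mappings are tried in the order they appear, so that whole numbers end up as tlongs rather than tdoubles.

```xml
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <!-- used when no typeMapping below matches the value -->
  <str name="defaultFieldType">text</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdates</str>
  </lst>
  <!-- one mapping may list several value classes -->
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlongs</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdoubles</str>
  </lst>
</processor>
```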
In our case, if Solr finds a date value in a field (<str name="valueClass">java.util.Date</str>), it will create a new field using the tdates field type (<str name="fieldType">tdates</str>). If Solr finds a long or an integer value, it creates a new field using the tlongs field type. Of course, a field won't be created if it already exists in our managed schema. The name of the field created in our managed schema will be the same as the name of the field in the indexed document.
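To illustrate, suppose we index a hypothetical document with the fields id, published (value 2014-06-01), views (value "123", sent as a string), and title (value Solr cookbook). Assuming the chain sketched in this recipe, Solr would add roughly the following definitions to managed-schema; id is not added because it already exists. The exact attributes Solr writes may differ between versions.

```xml
<!-- hypothetical result of field discovery for the example document -->
<!-- "2014-06-01" is parsed as a java.util.Date, so the tdates type is used -->
<field name="published" type="tdates"/>
<!-- "123" is parsed as a java.lang.Long, so the tlongs type is used -->
<field name="views" type="tlongs"/>
<!-- "Solr cookbook" matches no parser, so the default text type is used -->
<field name="title" type="text"/>
```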
Finally, the solr.LogUpdateProcessorFactory processor factory tells Solr to write information about the update to the log, and the solr.RunUpdateProcessorFactory processor factory tells Solr to run the update itself.
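The tail of the chain is sketched below, with the earlier processors elided in a comment. RunUpdateProcessorFactory is placed last because it is the processor that actually performs the update; changes made by processors placed after it would not be reflected in the indexed document.

```xml
<updateRequestProcessorChain name="add-unknown-fields">
  <!-- blank-field removal, parsing, and AddSchemaFields processors come first -->
  <!-- writes information about the update request to the log -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- performs the update itself; keep it at the end of the chain -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```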
As we can see, our data includes fields that we didn't specify in the schema.xml file, and the document was indexed properly, which allows us to assume that the functionality works. If you want to check what our index structure looks like after indexing, use the schema API; you can do this yourself after reading the Retrieving information about the index structure recipe in Chapter 8, Using Additional Functionalities.
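One thing to remember is that the types Solr can detect automatically, such as Boolean, integer, float, long, double, and date, are determined by the parsing processors and type mappings in the chain; a value that none of them recognizes ends up with the default field type.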