Handling unclean data
What do we mean by unclean data? In the last section, we discussed a customer searching for pink sweater, where pink is the color and sweater is the type of clothing. However, the search engine cannot interpret the input in this fashion. Therefore, in our e-commerce schema design earlier, we created a query that searched across all fields available in the index. We then created a separate copyField to handle search across fields, such as clothes_color, that are not being searched in the default query.
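As a rough illustration of what such a request looks like, the following Python sketch builds the parameters for a query that hands the raw input to Solr's eDisMax parser. The catch-all field name text and the boost value of 2 are assumptions made for this example and are not taken from the schema discussed earlier; only brand comes from our design.

```python
from urllib.parse import urlencode


def build_search_params(user_input: str) -> dict:
    """Build request parameters that search the raw input across several fields."""
    return {
        # The raw text as typed; nothing says which word is a color or a brand.
        "q": user_input,
        # Solr's extended DisMax query parser.
        "defType": "edismax",
        # Hypothetical field list: "text" stands in for a catch-all field,
        # and the boost of 2 on brand is an illustrative value.
        "qf": "text brand^2",
    }


params = build_search_params("pink sweater")
query_string = urlencode(params)
print(query_string)
```

Note that both pink and sweater are matched against every field listed in qf. Nothing in the request tells the engine that pink is meant as a color.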
Now, will our query give good results? What if there is a brand named pink? What would the results look like? First of all, we cannot be sure whether pink is intended to be the color or the brand. Suppose pink is intended to be the color. Because we are also searching across brands, the query will match documents that have pink as the brand name as well, so the results will be a mix of clothes_color and brand matches. In our query, we are boosting brand, so what happens is...