Summary
In this chapter, we covered a lot of ground in information retrieval (IR) and NLP, including the basics of IR and how to apply machine learning to text. Along the way, we first implemented a naive search engine and then used a learning-to-rank approach on top of Apache Lucene to build an industrial-strength IR model.
In the next chapter, we will look at gradient boosting machines and at XGBoost, a library that implements this algorithm. XGBoost delivers state-of-the-art performance on many data science problems, including classification, regression, and ranking.