Summary
This chapter focused on identifying hateful and offensive language in tweets. Given the challenging nature of this task, we aimed to build a strong model from a technical perspective. In the process, we had the opportunity to work with more advanced neural architectures and to strengthen our knowledge of new ML concepts.
Throughout the chapter, we had the chance to observe the benefits of transfer learning, which allows the construction of sophisticated applications with minimal effort. The BERT language model is a typical example, permitting pre-trained models to be fine-tuned on our custom datasets. The chapter also covered a more advanced technique for text classification from the family of boosting algorithms, namely XGBoost, whose popularity was driven by its superior performance in various competitions.
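To make the boosting idea concrete, here is a minimal sketch of a boosted text classifier. It uses scikit-learn's GradientBoostingClassifier as a stand-in (XGBoost's XGBClassifier exposes a very similar fit/predict interface), and the toy tweets and labels are invented for illustration, not taken from the chapter's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline

# Toy tweets with hypothetical offensive/non-offensive labels (1 = offensive)
texts = ["you are awful", "have a great day", "I hate this",
         "what a lovely idea", "this is terrible", "so happy for you",
         "worst thing ever", "really nice work"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# TF-IDF features fed into a gradient-boosted ensemble of shallow trees;
# swapping in xgboost.XGBClassifier would leave the rest of the code unchanged
model = make_pipeline(
    TfidfVectorizer(),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
)
model.fit(texts, labels)

# Classify an unseen phrase
print(model.predict(["have a terrible day"]))
```

In a real pipeline the hand-crafted TF-IDF features would be replaced by a much larger labeled corpus, and the ensemble's hyperparameters (number of trees, depth, learning rate) would be tuned on a validation set.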
The role of the validation set in fine-tuning the model's hyperparameters and the strategies for dealing with imbalanced data were also discussed.
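Both ideas can be illustrated together in a short sketch: holding out a validation set for tuning, and counter-weighting a minority class. The data below is synthetic and the 90/10 imbalance is assumed purely for demonstration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced dataset: 180 negative vs. 20 positive samples
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.array([0] * 180 + [1] * 20)

# Hold out a stratified validation set for hyperparameter tuning
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# "balanced" up-weights the minority class in inverse proportion
# to its frequency, so misclassifying it costs more during training
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_train)
print(dict(zip([0, 1], weights)))

clf = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```

The validation score, rather than the training score, is what would guide choices such as the regularization strength or, in the boosting case, the number of trees.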