NLP Competition – Google Quest Q&A Labeling
In this chapter, we will discuss Natural Language Processing (NLP) applications – specifically, text classification. To demonstrate our approach, we will be using the data from the Google Quest Q&A Labeling contest: https://www.kaggle.com/competitions/google-quest-challenge.
What was this competition about? The following is the official description:
Computers are really good at answering questions with single, verifiable answers. But humans are often still better at answering questions about opinions, recommendations, or personal experiences.
Humans are better at addressing subjective questions that require a deeper, multidimensional understanding of context – something computers aren’t trained to do well…yet. Questions can take many forms – some have multi-sentence elaborations, others may be simple curiosity or a fully developed problem. They can have multiple intents or seek advice and opinions. Some may be helpful and others interesting. Some are simple right or wrong.
Unfortunately, it’s hard to build better subjective question-answering algorithms because of a lack of data and predictive models. That’s why the CrowdSource team at Google Research, a group dedicated to advancing NLP and other types of ML science via crowdsourcing, has collected data on a number of these quality scoring aspects.
In this competition, you’re challenged to use this new dataset to build predictive algorithms for different subjective aspects of question-answering. The question-answer pairs were gathered from nearly 70 different websites, in a “common-sense” fashion. Our raters received minimal guidance and training, and relied largely on their subjective interpretation of the prompts. As such, each prompt was crafted in the most intuitive fashion so that raters could simply use their common sense to complete the task. By lessening our dependency on complicated and opaque rating guidelines, we hope to increase the re-use value of this dataset. What you see is what you get!
Demonstrating these subjective labels can be predicted reliably can shine a new light on this research area. Results from this competition will inform the way future intelligent Q&A systems will get built, hopefully contributing to them becoming more human-like.
What can we gather from this introduction? First, the algorithms we build here are supposed to mimic the feedback given by human evaluators; since this feedback constitutes our ground truth, we can expect some noise in the labels. Second, there are multiple aspects of each question–answer pair to be predicted, and each of those scores is averaged across evaluators – which means our problem is naturally framed as multivariate (multi-output) regression.
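To make that framing concrete, the following is a minimal sketch of a multivariate regression baseline, assuming the standard competition files (train.csv and sample_submission.csv) are available. The TF-IDF features and the ridge regressor are illustrative choices for this sketch, not the modeling approach used by the top solutions we examine later:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

# On Kaggle, these files typically live under
# ../input/google-quest-challenge/; adjust the paths as needed.
train = pd.read_csv("train.csv")
sample_sub = pd.read_csv("sample_submission.csv")

# The target columns are everything in the sample submission
# except the qa_id identifier.
target_cols = [c for c in sample_sub.columns if c != "qa_id"]

# Concatenate the text fields into a single input string per row.
text = (
    train["question_title"] + " "
    + train["question_body"] + " "
    + train["answer"]
)

# TF-IDF features plus one ridge regressor per target column:
# a deliberately simple multi-output regression baseline.
features = TfidfVectorizer(max_features=20000).fit_transform(text)
model = MultiOutputRegressor(Ridge(alpha=1.0))
model.fit(features, train[target_cols])

preds = model.predict(features)  # one score per (row, target) pair

# The competition metric is the mean column-wise Spearman rank
# correlation. Scoring on the training data, as done here, is only
# a sanity check; in practice you would score a held-out split.
score = np.mean([
    spearmanr(train[c], preds[:, i]).correlation
    for i, c in enumerate(target_cols)
])
print(score)
```

Because the metric is a rank correlation computed per column and then averaged, even a crude baseline like this yields a directly comparable score, which makes it a useful starting point before moving to heavier models.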
We’ve structured this chapter similarly to the previous one about computer vision problems:
- We discuss how to start building a baseline solution
- We then examine the top-performing solutions
The code for this chapter can be found at https://packt.link/kwbchp4.