Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
The Kaggle Workbook

You're reading from   The Kaggle Workbook Self-learning exercises and valuable insights for Kaggle data science competitions

Arrow left icon
Product type Paperback
Published in Feb 2023
Publisher Packt
ISBN-13 9781804611210
Length 172 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Authors (2):
Arrow left icon
Luca Massaron Luca Massaron
Author Profile Icon Luca Massaron
Luca Massaron
Konrad Banachewicz Konrad Banachewicz
Author Profile Icon Konrad Banachewicz
Konrad Banachewicz
Arrow right icon
View More author details
Toc

NLP Competition – Google Quest Q&A Labeling

In this chapter, we will talk about Natural Language Processing (NLP) applications – specifically, text classification. In order to demonstrate our approach, we will be using the data from the Google Quest Q&A Labeling contest: https://www.kaggle.com/competitions/google-quest-challenge.

What was this competition about? The following is the official description:

Computers are really good at answering questions with single, verifiable answers. But humans are often still better at answering questions about opinions, recommendations, or personal experiences.

Humans are better at addressing subjective questions that require a deeper, multidimensional understanding of context – something computers aren’t trained to do well…yet. Questions can take many forms – some have multi-sentence elaborations, others may be simple curiosity or a fully developed problem. They can have multiple intents or seek advice and opinions. Some may be helpful and others interesting. Some are simple right or wrong.

Unfortunately, it’s hard to build better subjective question-answering algorithms because of a lack of data and predictive models. That’s why the CrowdSource team at Google Research, a group dedicated to advancing NLP and other types of ML science via crowdsourcing, has collected data on a number of these quality scoring aspects.

In this competition, you’re challenged to use this new dataset to build predictive algorithms for different subjective aspects of question-answering. The question-answer pairs were gathered from nearly 70 different websites, in a “common-sense” fashion. Our raters received minimal guidance and training, and relied largely on their subjective interpretation of the prompts. As such, each prompt was crafted in the most intuitive fashion so that raters could simply use their common sense to complete the task. By lessening our dependency on complicated and opaque rating guidelines, we hope to increase the re-use value of this dataset. What you see is what you get!

Demonstrating these subjective labels can be predicted reliably can shine a new light on this research area. Results from this competition will inform the way future intelligent Q&A systems will get built, hopefully contributing to them becoming more human-like.

What can we gather from this introduction? First, the algorithms we build here are supposed to mimic the feedback given by the human evaluator; since this feedback constitutes our ground truth, we can expect some noise in the labels. Second, there are multiple aspects of each answer to be predicted, and those are averaged across evaluators – which means our problem is likely to be well represented by multivariate regression.

We’ve structured this chapter similarly to the previous one about computer vision problems:

  • We discuss how to start building a baseline solution
  • We then examine the top-performing solutions

The code for this chapter can be found at https://packt.link/kwbchp4.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime