In this section, we revisit the familiar task of text classification, but with a different dataset: the Jigsaw Toxic Comment Classification Challenge.
Kaggle – text categorization challenge
Getting the data
Note that you need to accept the competition's terms and conditions, including its data usage rules, before you can get this dataset.
For a direct download, you can get the train and test data from the Data tab on the challenge website.
Alternatively, you can use the official Kaggle API (available on GitHub) to download the data from a Terminal or a Python program.
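As a rough illustration of the Kaggle API route, the following sketch wraps the `kaggle competitions download` command in a small Python helper. It assumes that the `kaggle` package is installed, that an API token is configured (typically in `~/.kaggle/kaggle.json`), and that you have already accepted the competition rules. The competition slug is taken from the challenge URL, and the helper names and the `data` destination folder are hypothetical choices made for this example.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed competition slug, as it appears in the challenge URL on Kaggle.
COMPETITION = "jigsaw-toxic-comment-classification-challenge"


def build_download_command(competition: str = COMPETITION, dest: str = "data") -> List[str]:
    """Build the Kaggle CLI call that downloads a competition's files into dest."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def download_competition_data(competition: str = COMPETITION, dest: str = "data") -> Path:
    """Run the Kaggle CLI. Needs the kaggle package and a configured API token."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with an error,
    # for example when the competition rules have not been accepted.
    subprocess.run(build_download_command(competition, dest), check=True)
    return Path(dest)


if __name__ == "__main__":
    # Only print the command here; call download_competition_data() to run it.
    print(" ".join(build_download_command()))
```

Building the command separately from running it makes it easy to inspect or copy into a Terminal before anything is downloaded.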
With either the direct download or the Kaggle API, you have to split your training data into smaller training and validation...
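One way to carve a validation set out of the training data is sketched below, using only the standard library. The 80/20 ratio, the fixed seed, and the function name `split_train_validation` are assumptions made for illustration; the text does not prescribe a particular ratio or tool.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    rows: Sequence[T],
    validation_fraction: float = 0.2,  # assumed ratio, not taken from the text
    seed: int = 42,                    # fixed seed so the split is reproducible
) -> Tuple[List[T], List[T]]:
    """Shuffle rows and return (train_rows, validation_rows)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    # The first n_validation shuffled rows form the validation set.
    return shuffled[n_validation:], shuffled[:n_validation]


if __name__ == "__main__":
    train_rows, validation_rows = split_train_validation(list(range(10)))
    print(len(train_rows), len(validation_rows))
```

The rows can be any sequence, for example the records read from the downloaded training file with `csv.DictReader`.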