Understanding datasets
To develop our chatbot, we use the following two datasets:
Cornell Movie-Dialogs dataset
bAbI dataset
Cornell Movie-Dialogs dataset
This dataset is widely used for building chatbots. You can download the Cornell Movie-Dialogs Corpus from https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html. The corpus contains a large, metadata-rich collection of fictional conversations extracted from raw movie scripts.
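To show how the raw corpus can be turned into chatbot training data, here is a minimal Python sketch. It assumes the file layout described in the corpus README. `movie_lines.txt` holds one utterance per row, and `movie_conversations.txt` lists each conversation as the IDs of its utterances. In both files, fields are separated by ` +++$+++ `. The helper names `parse_lines` and `build_pairs` are hypothetical. Pairing each utterance with the next one in the same conversation is one common choice, not a step the corpus prescribes. Check the field order against your own download.

```python
import ast
from typing import Dict, Iterable, List, Tuple

# Field separator used in the corpus files (per the corpus README; verify against your copy).
SEP = " +++$+++ "


def parse_lines(rows: Iterable[str]) -> Dict[str, str]:
    """Map each line ID to its utterance text, from rows of movie_lines.txt."""
    id2text: Dict[str, str] = {}
    for row in rows:
        fields = row.rstrip("\n").split(SEP)
        # Expected fields: lineID, characterID, movieID, character name, text
        if len(fields) == 5:
            id2text[fields[0]] = fields[4]
    return id2text


def build_pairs(rows: Iterable[str], id2text: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn each conversation in movie_conversations.txt into (utterance, reply) pairs."""
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        fields = row.rstrip("\n").split(SEP)
        # Expected fields: first characterID, second characterID, movieID, list of line IDs
        if len(fields) != 4:
            continue
        line_ids = ast.literal_eval(fields[3])  # "['L1', 'L2', 'L3']" -> Python list
        # Pair every utterance with the one that follows it in the conversation.
        for first, second in zip(line_ids, line_ids[1:]):
            if first in id2text and second in id2text:
                pairs.append((id2text[first], id2text[second]))
    return pairs


# Made-up rows in the corpus format, for illustration only.
sample_lines = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Hello.",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BEN +++$+++ Hi there.",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ How are you?",
]
sample_convs = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']"]

id2text = parse_lines(sample_lines)
pairs = build_pairs(sample_convs, id2text)
print(pairs)  # [('Hello.', 'Hi there.'), ('Hi there.', 'How are you?')]
```

To process the real files, pass open file handles instead of the sample lists. The corpus files are commonly opened with `encoding="iso-8859-1"` rather than UTF-8, as in PyTorch's chatbot tutorial.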
The corpus contains 220,579 conversational exchanges between 10,292 pairs of movie characters. These involve 9,035 characters from 617 movies and 304,713 utterances in total. The corpus also includes two types of metadata:
Movie-related metadata includes the following details:
Genre of the movie
Release year
IMDb rating
Character-related metadata includes the following details:
Gender of 3,774 characters
Position in the movie credits (for 3,321 characters)
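The metadata is stored in separate files. The following minimal Python sketch reads both kinds into Python structures. It assumes the layout given in the corpus README. `movie_titles_metadata.txt` holds movie ID, title, year, IMDb rating, number of IMDb votes, and genres. `movie_characters_metadata.txt` holds character ID, name, movie ID, movie title, gender, and credits position. Fields in both files are separated by ` +++$+++ `. The helper names `parse_movies` and `count_genders` are hypothetical, and so are the sample rows. The exact gender labels are also an assumption, so verify them against the files you download.

```python
import ast
from collections import Counter
from typing import Dict, Iterable

# Field separator used in the corpus files (per the corpus README; verify against your copy).
SEP = " +++$+++ "


def parse_movies(rows: Iterable[str]) -> Dict[str, dict]:
    """Movie metadata from rows of movie_titles_metadata.txt, keyed by movie ID."""
    movies: Dict[str, dict] = {}
    for row in rows:
        fields = row.rstrip("\n").split(SEP)
        # Expected fields: movieID, title, year, IMDb rating, IMDb votes, genres
        if len(fields) != 6:
            continue
        movie_id, title, year, rating, votes, genres = fields
        movies[movie_id] = {
            "title": title,
            "year": year,  # kept as a string in case an entry is not a plain number
            "rating": float(rating),
            "votes": int(votes),
            "genres": ast.literal_eval(genres),  # "['comedy', 'romance']" -> list
        }
    return movies


def count_genders(rows: Iterable[str]) -> Counter:
    """Count gender labels from rows of movie_characters_metadata.txt."""
    genders: Counter = Counter()
    for row in rows:
        fields = row.rstrip("\n").split(SEP)
        # Expected fields: characterID, name, movieID, movie title, gender, credits position
        if len(fields) == 6:
            genders[fields[4].lower()] += 1  # assumed labels: 'f', 'm', or '?' if unknown
    return genders


# Made-up rows in the corpus format, for illustration only.
sample_movies = [
    "m0 +++$+++ example movie +++$+++ 1999 +++$+++ 6.90 +++$+++ 1000 +++$+++ ['comedy', 'romance']"
]
sample_chars = [
    "u0 +++$+++ ANNA +++$+++ m0 +++$+++ example movie +++$+++ f +++$+++ 1",
    "u1 +++$+++ BEN +++$+++ m0 +++$+++ example movie +++$+++ ? +++$+++ ?",
]

movies = parse_movies(sample_movies)
print(movies["m0"]["genres"], movies["m0"]["rating"])  # ['comedy', 'romance'] 6.9
print(count_genders(sample_chars))  # Counter({'f': 1, '?': 1})
```

Counting the labels is a quick way to see how much of the character metadata is actually filled in. Only part of the characters have a known gender.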
When you download this dataset, you...