Seq2seq for chatbots
A second target application of sequence-to-sequence networks is question answering, i.e. chatbots.
For that purpose, download the Cornell Movie-Dialogs Corpus and preprocess it:
wget http://www.mpi-sws.org/~cristian/data/cornell_movie_dialogs_corpus.zip -P /sharedfiles/
unzip /sharedfiles/cornell_movie_dialogs_corpus.zip -d /sharedfiles/cornell_movie_dialogs_corpus
python 0-preprocess_movies.py
This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts.
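The preprocessing script 0-preprocess_movies.py is not reproduced here. The sketch below shows roughly what such a step involves. It assumes the corpus's movie_lines.txt and movie_conversations.txt files and their ` +++$+++ ` field separator, and it pairs each utterance with the next one in the same conversation to produce (input, reply) training pairs. The function names are hypothetical, the Latin-1 file encoding is an assumption, and the sample rows are illustrative.

```python
import ast
from typing import Dict, Iterable, List, Tuple

SEP = " +++$+++ "  # field separator used in the corpus files


def read_rows(path: str) -> List[str]:
    # Assumption: the raw corpus files are Latin-1 encoded.
    with open(path, encoding="iso-8859-1") as f:
        return f.readlines()


def load_lines(rows: Iterable[str]) -> Dict[str, str]:
    """Map line ID (e.g. 'L1045') to utterance text, from movie_lines.txt rows."""
    id2text: Dict[str, str] = {}
    for row in rows:
        # lineID, characterID, movieID, character name, text
        fields = row.rstrip("\n").split(SEP, 4)
        if len(fields) == 5:
            id2text[fields[0]] = fields[4]
    return id2text


def load_pairs(rows: Iterable[str], id2text: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn each conversation (movie_conversations.txt row) into consecutive (input, reply) pairs."""
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        # characterID, characterID, movieID, list of line IDs
        fields = row.rstrip("\n").split(SEP, 3)
        if len(fields) != 4:
            continue
        line_ids = ast.literal_eval(fields[3])  # e.g. "['L194', 'L195', 'L196']"
        for q_id, a_id in zip(line_ids, line_ids[1:]):
            if q_id in id2text and a_id in id2text:
                pairs.append((id2text[q_id], id2text[a_id]))
    return pairs


# Illustrative rows in the corpus format.
sample_lines = [
    "L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n",
    "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n",
]
sample_convs = ["u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L1044', 'L1045']\n"]
pairs = load_pairs(sample_convs, load_lines(sample_lines))
print(pairs)  # [('They do to!', 'They do not!')]
```

On the real files, `read_rows` would supply the rows for `load_lines` and `load_pairs`. A conversation of n utterances yields n - 1 pairs.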
Since source and target sentences are in the same language, they share one vocabulary, so the decoder can reuse the encoder's word embedding:
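if opt.dataset == "chatbot":
    # questions and answers share one vocabulary: reuse the encoder's word embedding (encoder_params[0]) in the decoder
    embeddings = encoder_params[0]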
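To show the idea outside the training script, here is a framework-free sketch. It builds one vocabulary from both questions and answers, then lets the encoder and the decoder look up words in the same embedding matrix. The names `build_vocab`, `init_embeddings` and `embed`, the special tokens, and the embedding size are all hypothetical. In a deep learning framework, the equivalent would be handing the same parameter object to both embedding layers, so gradients from both sides update a single matrix.

```python
import random
from typing import Dict, List

Vector = List[float]


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """One vocabulary for both sides, since questions and answers share a language."""
    vocab = {"<pad>": 0, "<go>": 1, "<eos>": 2, "<unk>": 3}  # hypothetical special tokens
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def init_embeddings(vocab_size: int, dim: int, seed: int = 0) -> List[Vector]:
    """Small random embedding matrix, one row per vocabulary entry."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(embeddings: List[Vector], vocab: Dict[str, int], sentence: str) -> List[Vector]:
    """Look up one embedding row per word, mapping unknown words to <unk>."""
    unk = vocab["<unk>"]
    return [embeddings[vocab.get(w, unk)] for w in sentence.lower().split()]


questions = ["how are you ?"]
answers = ["fine , thanks ."]
vocab = build_vocab(questions + answers)
encoder_embeddings = init_embeddings(len(vocab), dim=8)
# Shared: the same list object, not a copy, so an update made through one
# name is visible through the other.
decoder_embeddings = encoder_embeddings

enc_in = embed(encoder_embeddings, vocab, "how are you ?")
dec_in = embed(decoder_embeddings, vocab, "fine thanks")
```

Sharing halves the number of embedding parameters. A word seen only on the answer side also gets a trained representation that the encoder can use.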
The same commands apply to the chatbot dataset:
python 1-train.py --dataset chatbot # training
python 1-train.py --dataset chatbot --model model_chatbot_e100_n2_h500 # answer my question
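The second command loads a trained model to answer questions. The text does not say how 1-train.py generates the answer. A common choice for seq2seq inference is greedy decoding: start from a "go" token, feed back the highest-scoring token at each step, and stop at the end-of-sequence token. The sketch below assumes that strategy. `greedy_decode` is a hypothetical helper, and `toy_step` is a stand-in for a trained decoder step (it simply prefers a scripted sequence of token IDs). This is not the script's actual implementation.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
StepFn = Callable[[int, State], Tuple[Sequence[float], State]]


def greedy_decode(step: StepFn, initial_state: State, go_id: int, eos_id: int,
                  max_len: int = 20) -> List[int]:
    """Feed back the highest-scoring token at each step until <eos> or max_len tokens."""
    tokens: List[int] = []
    prev, state = go_id, initial_state
    for _ in range(max_len):
        scores, state = step(prev, state)
        prev = max(range(len(scores)), key=lambda i: scores[i])  # argmax over the vocabulary
        if prev == eos_id:
            break
        tokens.append(prev)
    return tokens


# Hypothetical stand-in for a trained decoder: the state is just a step counter,
# and at step t the toy model scores token SCRIPT[t] highest (2 plays <eos>).
SCRIPT = [5, 7, 2]
VOCAB_SIZE = 10


def toy_step(prev_token: int, state: State) -> Tuple[List[float], State]:
    t = int(state[0])
    scores = [0.0] * VOCAB_SIZE
    scores[SCRIPT[min(t, len(SCRIPT) - 1)]] = 1.0
    return scores, (t + 1.0,)


answer_ids = greedy_decode(toy_step, (0.0,), go_id=1, eos_id=2)
print(answer_ids)  # [5, 7]
```

In a real chatbot, `initial_state` would come from encoding the question, and the returned IDs would be mapped back to words through the shared vocabulary. Beam search is a common alternative when greedy answers turn out too generic.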