Training: cross-entropy
To train the first approximation of the model, the cross-entropy method is used; it is implemented in train_crossent.py. During training, we randomly switch between teacher-forcing mode (when we feed the target sequence to the decoder's input) and argmax chain decoding (when we decode the sequence one step at a time, feeding back the token with the highest probability in the output distribution). The choice between the two training modes is made randomly with a fixed probability of 50%. This combines the benefits of both methods: fast convergence from teacher forcing and more stable decoding from curriculum learning.
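The switching logic can be sketched as follows. This is a minimal, framework-free illustration, not the book's actual implementation: `decode_step` is a hypothetical stand-in for the real decoder network, and only the mode-selection and feedback loop mirror the described approach.

```python
import random

TEACHER_PROB = 0.5  # probability of picking teacher forcing for a sequence


def argmax(dist):
    """Index of the highest-probability token in a distribution."""
    return max(range(len(dist)), key=dist.__getitem__)


def decode_step(prev_token):
    """Hypothetical stand-in for the decoder: maps the previous token
    to a toy probability distribution over a 4-token vocabulary."""
    dist = [0.1] * 4
    dist[(prev_token + 1) % 4] = 0.7
    return dist


def train_decode(target_seq, bos=0):
    """One training pass over a sequence, randomly choosing between
    teacher forcing and argmax chain decoding."""
    teacher_forcing = random.random() < TEACHER_PROB
    outputs, prev = [], bos
    for target in target_seq:
        dist = decode_step(prev)
        outputs.append(dist)      # kept for the cross-entropy loss later
        if teacher_forcing:
            prev = target         # feed the ground-truth token
        else:
            prev = argmax(dist)   # feed the model's own best guess
    return outputs
```

In a real model, `outputs` would be logits passed to the cross-entropy loss against `target_seq`; the key point is that only the decoder's *input* changes between the two modes, while the loss target stays the same.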
Implementation
What follows is the implementation of the cross-entropy method training from train_crossent.py.
import logging

SAVES_DIR = "saves"    # directory for model checkpoints
BATCH_SIZE = 32
LEARNING_RATE = 1e-3
MAX_EPOCHES = 100
TEACHER_PROB = 0.5     # probability of using teacher forcing for a sequence

log = logging.getLogger("train")
At the beginning, we define the hyperparameters...