The experiment results
Unfortunately, the paper provided no details about several very important aspects of the method, such as the training hyperparameters, how deeply the cubes were scrambled during training, and the convergence obtained. To fill in the blanks, I experimented with various hyperparameter values (the .ini files are available in the GitHub repo), but my results still differ significantly from those published in the paper. I observed that training with the original method is very unstable. Even with a small learning rate and a large batch size, the training eventually diverges, with the value loss component growing exponentially. Examples of this behavior are shown in Figure 21.5 and Figure 21.6 (obtained from the 2 × 2 environment):