- [Sut88]
Richard S Sutton. “Learning to predict by the methods of temporal differences”. In: Machine Learning 3 (1988), pp. 9–44.
- [HS96]
Sepp Hochreiter and Jürgen Schmidhuber. “LSTM can solve hard long time lag problems”. In: Advances in Neural Information Processing Systems 9 (1996).
- [RK04]
Reuven Y Rubinstein and Dirk P Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. Vol. 133. Springer, 2004.
- [SL08]
Alexander L Strehl and Michael L Littman. “An analysis of model-based interval estimation for Markov decision processes”. In: Journal of Computer and System Sciences 74.8 (2008), pp. 1309–1331.
- [Kro+11]
Dirk P Kroese et al. “Cross-entropy method”. In: European...