Introduction
The core of the proposed reinforcement learning framework is the Ensemble of Identical Independent Evaluators (EIIE) topology. An EIIE is a neural network that takes the price history of an asset as input and evaluates the asset's potential growth in the near future. The evaluation score of each asset is then used to calculate the portfolio weights for the next trading period.
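As a minimal sketch of how per-asset evaluation scores could be turned into portfolio weights, the snippet below applies a softmax so that the weights are positive and sum to one. The choice of softmax, the function name scores_to_weights, and the example scores are assumptions made for illustration; the text above only states that the scores are used to calculate the weights.

```python
import numpy as np

def scores_to_weights(scores):
    """Map per-asset evaluation scores to portfolio weights.

    Assumption: a softmax is used, so weights are positive and sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical scores for three assets; a higher score yields a larger weight.
weights = scores_to_weights([0.2, 1.5, -0.3])
```

Under this assumption, the asset with the highest score always receives the largest share of the portfolio, and no asset receives a negative weight.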
The portfolio weights (which we will discuss later) are the market actions of the portfolio-managing agent powered by reinforcement learning. An asset whose target weight is increased will be bought, while assets whose target weights are decreased will be sold. Because each trade depends on how the weights change from one period to the next, the portfolio weights from the previous trading period are also fed as an input to the EIIE. To make them available, the portfolio weights of each period are stored in a portfolio vector memory (PVM).
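The role of the PVM can be illustrated with a small sketch: an array indexed by trading period, read at the previous period to build the network input and overwritten with the newly computed weights. The class name, the uniform initialization, and the read/write interface are hypothetical and chosen only to make the idea concrete.

```python
import numpy as np

class PortfolioVectorMemory:
    """Stores one portfolio weight vector per trading period (illustrative)."""

    def __init__(self, n_periods, n_assets):
        # Assumption: every period starts with uniform weights.
        self.memory = np.full((n_periods, n_assets), 1.0 / n_assets)

    def read(self, t):
        """Return the weights of period t - 1, used as input for period t."""
        if t < 1:
            raise ValueError("t must be at least 1")
        return self.memory[t - 1].copy()

    def write(self, t, weights):
        """Overwrite the stored weights of period t with the new output."""
        self.memory[t] = np.asarray(weights, dtype=float)

pvm = PortfolioVectorMemory(n_periods=5, n_assets=3)
previous = pvm.read(1)              # uniform weights from period 0
pvm.write(1, [0.5, 0.3, 0.2])       # store the weights chosen for period 1
next_input = pvm.read(2)            # these become the input for period 2
```

Keeping the weights in a memory indexed by period means that any period can be sampled during training without replaying all earlier periods to recover its previous weights.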
The EIIE is trained by Online Stochastic Batch Learning (OSBL), where the reward functions of the reinforcement learning framework are the average logarithmic returns...