Hindsight experience replay (HER) was introduced by OpenAI as a method for dealing with sparse rewards, and the algorithm has also been shown to generalize well across tasks, due in part to the novel mechanism by which it works. The analogy used to explain HER is a game of shuffleboard, the object of which is to slide a disc down a long table so that it stops on a goal target. When first learning the game, we often fail repeatedly, with the disc falling off the table or out of the playing area. The insight behind HER is that we can still learn from these failures: by pretending, in hindsight, that wherever the disc actually ended up was the goal all along, we give ourselves a reward even when we miss. From these relabeled successes, the agent can then work backward, gradually shifting value toward the true goal. In some ways, this method resembles Pierarchy (a form of HRL that we looked at earlier), but without the extensive pretraining.
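The relabeling trick at the heart of HER can be sketched in a few lines. The code below is a minimal illustration, not the full OpenAI implementation: the function name her_relabel and the tuple layout are our own, and the sampling of achieved goals from later in the episode follows the "future" strategy described in the HER paper.

```python
import random

def her_relabel(trajectory, reward_fn, k=4):
    """Hindsight relabeling: for each step of a failed trajectory,
    also store copies whose goal is a state actually reached later,
    so even a 'failure' produces positive reward signal."""
    relabeled = []
    for t, (state, action, goal, next_state) in enumerate(trajectory):
        # Keep the original transition with its (likely zero-reward) goal
        relabeled.append((state, action, goal, next_state,
                          reward_fn(next_state, goal)))
        # 'future' strategy: sample up to k achieved states from later steps
        future = trajectory[t:]
        for _ in range(min(k, len(future))):
            achieved = random.choice(future)[3]
            relabeled.append((state, action, achieved, next_state,
                              reward_fn(next_state, achieved)))
    return relabeled

# Sparse reward: 1 only when the reached state matches the goal
reward_fn = lambda s, g: 1.0 if s == g else 0.0

# A toy failed episode on a 1-D table: the goal was position 5,
# but the disc only reached position 3 (tuples: state, action, goal, next)
episode = [(0, +1, 5, 1), (1, +1, 5, 2), (2, +1, 5, 3)]
data = her_relabel(episode, reward_fn)
```

Every original transition in this episode earns reward 0 against the true goal, but each relabeled copy whose hindsight goal equals its reached state earns reward 1, giving the learner something to propagate backward from.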
Using hindsight experience replay
The next collection...