Replay memory
Now, we build the experience replay buffer, which stores all of the agent's experiences. During training, we sample minibatches of experience from this buffer to update the network:
class ReplayMemoryFast:
First, we define the __init__ method and initialize the buffer:
def __init__(self, memory_size, minibatch_size):
    # max number of samples to store
    self.memory_size = memory_size
    # minibatch size
    self.minibatch_size = minibatch_size
    # preallocate the buffer and track the write position and fill level
    self.experience = [None] * self.memory_size
    self.current_index = 0
    self.size = 0
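For illustration, the buffer might be created as follows; the capacity and minibatch size here are arbitrary values chosen for the example, not ones prescribed by the text:

# illustrative sizes only: hold up to 10,000 experiences, sample 32 at a time
replay_memory = ReplayMemoryFast(memory_size=10000, minibatch_size=32)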
Next, we define the store method for storing experiences:
def store(self, observation, action, reward, newobservation, is_terminal):
We store each experience as a tuple of (current state, action, reward, next state, is it a terminal state):
    self.experience[self.current_index] = (observation, action, reward, newobservation, is_terminal)
    self.current_index += 1
    # the size grows until the buffer reaches full capacity
    self.size = min(self.size + 1, self.memory_size)
    # wrap the write index around so the oldest experiences are overwritten first
    if self.current_index >= self.memory_size:
        self.current_index -= self.memory_size
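The opening of this section mentions sampling a minibatch from the buffer for training. A minimal sketch of such a sample method is shown below, assuming a uniform random draw with Python's random module; the method name and body are illustrative, not the author's exact implementation:

import random  # this import belongs at the top of the file

def sample(self):
    # not enough stored experience yet to fill a minibatch
    if self.size < self.minibatch_size:
        return []
    # draw minibatch_size distinct experiences uniformly at random
    # from the filled portion of the buffer
    return random.sample(self.experience[:self.size], self.minibatch_size)

With store and sample in place, a training loop can push transitions in and pull minibatches out, for example (the variable names here are hypothetical):

replay_memory.store(state, action, reward, next_state, done)
minibatch = replay_memory.sample()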