Categorical DQN
The algorithm for a categorical DQN is given as follows:
- Initialize the main network parameter $\theta$ with random values
- Initialize the target network parameter $\theta'$ by copying the main network parameter $\theta$
- Initialize the replay buffer $\mathcal{D}$, the number of atoms, and also $V_{\min}$ and $V_{\max}$
- For each of $N$ episodes, perform the following step:
- For each step in the episode, that is, for $t = 0, \ldots, T-1$:
- Feed the state $s$ and the support values to the main categorical DQN parameterized by $\theta$, and get the probability $p_i(s, a)$ for each support value $z_i$. Then compute the Q value as $Q(s, a) = \sum_i z_i \, p_i(s, a)$.
- After computing the Q value, select an action using the epsilon-greedy policy: with probability epsilon, select a random action $a$, and with probability 1 - epsilon, select the action $a^* = \arg\max_a Q(s, a)$.
- Perform the selected action, move to the next state $s'$, and obtain the reward $r$.
- Store the transition information in the replay buffer $\mathcal{D}$.
- Randomly sample a transition...
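
The action-selection steps above (computing the Q value as the expectation over the support, then choosing an epsilon-greedy action) can be sketched in plain Python. This is only an illustrative sketch, not the book's implementation: the helper names `make_support`, `q_values`, and `epsilon_greedy` are hypothetical, the support is assumed to be evenly spaced between the two bounds, and the probabilities are made-up numbers standing in for the output of the main network.

```python
import random

def make_support(v_min, v_max, num_atoms):
    """Return num_atoms evenly spaced support values z_i in [v_min, v_max]."""
    delta_z = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta_z for i in range(num_atoms)]

def q_values(probs, support):
    """Compute Q(s, a) = sum_i z_i * p_i(s, a) for every action a.

    probs[a][i] is the probability assigned to support value i for action a.
    """
    return [sum(z * p for z, p in zip(support, p_a)) for p_a in probs]

def epsilon_greedy(q, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

# Assumed toy setup: 3 atoms on [-1, 1] and 2 actions.
support = make_support(v_min=-1.0, v_max=1.0, num_atoms=3)

# Made-up per-action distributions (each row sums to 1); a real agent
# would obtain these from the main categorical DQN for the current state.
probs = [
    [0.5, 0.25, 0.25],  # action 0: mass skewed toward low returns
    [0.25, 0.25, 0.5],  # action 1: mass skewed toward high returns
]

q = q_values(probs, support)
action = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0 means purely greedy
```

With epsilon set to 0 the choice is purely greedy, so the action with the larger expected return is selected; during training a nonzero epsilon would be used so that the agent keeps exploring.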