- A replay buffer is used in DQN to store past experiences (transitions). Mini-batches sampled from it are used to train the agent; sampling at random from a large buffer breaks the correlation between consecutive experiences and lets each experience be reused many times (a minimal sketch of such a buffer follows this list).
- Target networks improve the stability of training by providing slowly changing bootstrap targets. This is achieved by keeping an additional neural network whose weights are updated as an exponential moving average (Polyak average) of the main network's weights. Alternatively, another widely used approach is to copy the main network's weights to the target network once every few thousand steps, a so-called hard update (both schemes are sketched after this list).
- A single frame as the state will not work for the Atari Breakout problem, because no temporal information can be deduced from one frame alone. For instance, from a single frame, the direction of motion of the ball cannot be determined. If, however, we stack up several consecutive frames, quantities such as the ball's velocity and acceleration can be recovered (see the frame-stacking sketch below).
- L2 loss is known to overfit to outliers: because the penalty grows quadratically, a single large temporal-difference error can dominate the gradient. The Huber loss, which is quadratic for small errors and linear for large ones, is less sensitive to such outliers and is therefore commonly preferred when training a DQN (the two losses are compared numerically in the last sketch below).
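
As a concrete reference for the replay-buffer answer, here is a minimal uniform replay buffer in Python. The class name `ReplayBuffer`, the capacity of 100,000 transitions, and the batch size of 32 are illustrative assumptions, not values taken from the text:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)  # uniform, without replacement
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```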
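
The two target-update schemes from the second answer fit in a few lines each. The sketch below operates on plain lists of NumPy weight arrays to stay framework-agnostic; the function names and the rate `tau=0.005` are assumptions for illustration:

```python
import numpy as np


def soft_update(target_weights, main_weights, tau=0.005):
    """EMA (Polyak) update: target <- tau * main + (1 - tau) * target."""
    for tw, mw in zip(target_weights, main_weights):
        tw[...] = tau * mw + (1.0 - tau) * tw  # in place, so existing references stay valid


def hard_update(target_weights, main_weights):
    """Hard update: overwrite the target weights with the main network's weights."""
    for tw, mw in zip(target_weights, main_weights):
        tw[...] = mw
```

In practice, `soft_update` would be called once per training step, while `hard_update` would be called once every few thousand steps, as described above.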
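
Frame stacking from the third answer is commonly implemented with a fixed-length deque. A minimal sketch, assuming 84x84 grayscale frames and a stack size of k=4 (the configuration popularized by the original DQN paper, but an assumption as far as this text goes):

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the k most recent frames and expose them as one stacked observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At episode start, repeat the first frame so the stack is always full.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self._stacked()

    def step(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return self._stacked()

    def _stacked(self):
        return np.stack(self.frames, axis=-1)  # (84, 84) frames -> (84, 84, k)
```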
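
Finally, the contrast behind the loss answer is easy to see numerically. A minimal NumPy sketch, assuming the conventional Huber threshold delta=1.0:

```python
import numpy as np


def l2_loss(td_error):
    return 0.5 * td_error ** 2  # quadratic everywhere: one outlier dominates the batch


def huber_loss(td_error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    abs_err = np.abs(td_error)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return 0.5 * quadratic ** 2 + delta * linear


errors = np.array([0.5, 1.0, 10.0])
print(l2_loss(errors))     # [ 0.125  0.5   50.   ]
print(huber_loss(errors))  # [ 0.125  0.5    9.5  ]
```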