Implementation of A3C
We will now look at how to implement A3C using Python and TensorFlow. Here, the policy network and value network share the same feature representation. We implement two kinds of policies: one is based on the CNN architecture used in DQN, and the other is based on LSTM.
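To make the idea of a shared feature representation concrete, here is a minimal NumPy sketch, not the TensorFlow implementation discussed in this section. It assumes a single dense ReLU layer as a stand-in for the CNN or LSTM feature extractor. The names `init_params`, `forward`, `W_shared`, `W_policy`, and `W_value`, as well as the layer sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(n_features=16, n_hidden=32, n_actions=4):
    """Randomly initialise a shared trunk plus two small heads (illustrative sizes)."""
    return {
        "W_shared": rng.normal(0.0, 0.1, (n_features, n_hidden)),
        "W_policy": rng.normal(0.0, 0.1, (n_hidden, n_actions)),
        "W_value": rng.normal(0.0, 0.1, (n_hidden, 1)),
    }


def forward(params, states):
    """Return (action probabilities, state values) for a batch of states."""
    # Shared feature representation: computed once, consumed by both heads.
    # A dense ReLU layer stands in for the CNN/LSTM feature extractor here.
    features = np.maximum(0.0, states @ params["W_shared"])

    # Policy head: softmax over the action logits.
    logits = features @ params["W_policy"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits)
    probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

    # Value head: one scalar estimate per state.
    values = (features @ params["W_value"]).squeeze(axis=1)
    return probs, values


params = init_params()
states = rng.normal(size=(5, 16))  # batch of 5 made-up feature vectors
probs, values = forward(params, states)
```

Because both heads read the same `features` array, gradients from the policy loss and the value loss would both update `W_shared` during training, which is what sharing a representation means in practice.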
We implement the FFPolicy class for the policy based on CNN:
class FFPolicy:

    def __init__(self, input_shape=(84, 84, 4), n_outputs=4, network_type='cnn'):
        self.width = input_shape[0]
        self.height = input_shape[1]
        self.channel = input_shape[2]
        self.n_outputs = n_outputs
        self.network_type = network_type
        self.entropy_beta = 0.01

        self.x = tf.placeholder(dtype=tf.float32,
                                shape=(None, self.channel, self.width, self.height))
        self.build_model()
The constructor requires three arguments:
- input_shape
- n_outputs
- network_type
input_shape is the size of the input image. After data preprocessing, the input is an 84x84x4...