We will now look at how to implement A3C using Python and TensorFlow. Here, the policy network and value network share the same feature representation. We implement two kinds of policies: one is based on the CNN architecture used in DQN, and the other is based on LSTM.
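To make the idea of a shared feature representation concrete, here is a minimal sketch in plain Python, independent of TensorFlow. It is not the implementation developed in this section: the class name SharedActorCritic, the layer sizes, and the helper functions are hypothetical and chosen only for illustration. A single feature layer is computed once per observation and then feeds two separate linear heads, one producing action probabilities and the other a scalar state value.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedActorCritic:
    """Toy actor-critic (hypothetical): one shared layer, two linear heads."""

    def __init__(self, n_inputs, n_features, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; the range is an arbitrary choice.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_shared = init(n_features, n_inputs)   # shared feature layer
        self.w_policy = init(n_actions, n_features)  # policy head
        self.w_value = init(1, n_features)           # value head

    def forward(self, observation):
        # The shared representation is computed once and used by both heads.
        features = [max(0.0, h) for h in matvec(self.w_shared, observation)]
        action_probs = softmax(matvec(self.w_policy, features))
        state_value = matvec(self.w_value, features)[0]
        return action_probs, state_value


model = SharedActorCritic(n_inputs=8, n_features=16, n_actions=4)
probs, value = model.forward([0.5] * 8)
```

Because both heads read the same features, gradients from the policy loss and the value loss would both update the shared layer during training; only the final linear layers are specific to each output.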
The FFPolicy class implements the CNN-based policy:
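import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)

class FFPolicy:

    def __init__(self, input_shape=(84, 84, 4), n_outputs=4, network_type='cnn'):
        # input_shape is read as (width, height, channel)
        self.width = input_shape[0]
        self.height = input_shape[1]
        self.channel = input_shape[2]
        self.n_outputs = n_outputs
        self.network_type = network_type
        # Coefficient of the entropy term
        self.entropy_beta = 0.01
        # Batch of input states in channel-first layout
        self.x = tf.placeholder(dtype=tf.float32,
                                shape=(None, self.channel, self.width, self.height))
        self.build_model()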
The constructor requires three...