Naive Neural Network policy for Reinforcement Learning
We proceed with the policy as follows:
- Let us implement a naive neural network-based policy. Define a new policy that uses the neural network's predictions to return the actions:
import numpy as np

def policy_naive_nn(nn, obs):
    # Run the network on the single observation and pick the action
    # with the highest predicted probability
    return np.argmax(nn.predict(np.array([obs])))
- Define nn as a simple one-layer MLP network that takes the four-dimensional observations as input and produces the probabilities of the two actions:
from keras.models import Sequential
from keras.layers import Dense

# One hidden layer of 8 units on the 4-dimensional observation,
# followed by a softmax output over the 2 actions
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
This is what the model looks like:
Layer (type)                 Output Shape              Param #
=================================================================
dense_16 (Dense)             (None, 8)                 40
_________________________________________________________________
...

The first layer's 40 parameters are the 4 × 8 weights plus 8 biases.
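To see how the pieces fit together, here is a minimal sketch that runs the untrained network as a policy for one episode. It assumes gym's CartPole-v0 environment (which matches the four-dimensional observation and two-action setup above) and the classic gym API, where reset() returns an observation and step() returns (obs, reward, done, info):

import gym
import numpy as np

env = gym.make('CartPole-v0')   # assumed environment: 4-dimensional observations, 2 actions
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    # Greedy action from the (still untrained) network
    action = policy_naive_nn(model, obs)
    obs, reward, done, _ = env.step(action)
    episode_reward += reward
print('Episode reward with the untrained network:', episode_reward)

Because the weights are still at their random initial values, this naive policy should not be expected to perform well yet.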