- The term state is commonly used in the terminology of solving Markov decision processes, and it refers to the same entity as an observation in RL spaces. When describing MDPs specifically, it is consistent with the existing terminology to use the word state.
- We know the Q-function has converged when updates to the Q-table no longer change its values.
- The Q-table remains at its current values when the Q-function has converged.
- When the Q-function has converged, meaning updates to the Q-table no longer cause its values to change, we know that the agent has found the optimal path to the goal.
- The function numpy.argmax() returns the index of the maximum element in an array.
- The function numpy.max() returns the value of the maximum element in an array.
- The randomly-acting agent cannot learn from its actions. The Q-learning...