Model-based approaches
The approaches we've shown so far can do a good job of learning many kinds of tasks, but an agent trained in these ways still suffers from significant limitations:
It trains very slowly: a human can learn a game like Pong from a couple of plays, while Q-learning may need millions of playthroughs to reach a similar level.
All of these techniques perform badly on games that require long-term planning. Imagine a platform game in which a player must retrieve a key from one side of a room to open a door on the other side. Such a passage of play will rarely occur by chance, and even when it does, the chance of learning that it was the key that led to the extra reward from the door is minuscule.
It cannot formulate a strategy or adapt to a novel opponent. It may do well against the opponent it trains against, but when presented with an opponent whose play shows some novelty, it takes a long time to adapt.
If given a new goal within an environment...
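The sample-inefficiency point above can be made concrete with a small sketch. The toy chain environment, hyperparameters, and episode count below are assumptions for illustration, not from the text; the update rule is standard tabular Q-learning. Even on this trivial task, the agent needs many episodes of trial and error before the reward signal propagates back through the value table, whereas a human would solve it instantly.

```python
import random

# Tabular Q-learning on a toy 5-state chain (illustrative; the environment
# and hyperparameters are assumptions). The agent starts at state 0 and
# receives reward 1 only on reaching state 4.
N_STATES, ACTIONS = 5, (-1, +1)   # actions: move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def run_episode(rng):
    state, steps = 0, 0
    while True:
        # Epsilon-greedy action selection.
        if rng.random() < EPSILON:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best next-state value.
        best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state, steps = nxt, steps + 1
        if done:
            return steps

rng = random.Random(0)
lengths = [run_episode(rng) for _ in range(200)]
# Early episodes wander; only after many updates does the greedy
# policy reliably head for the rewarding state.
print(lengths[0], lengths[-1])
```

Scaling this from a 5-state chain to the pixel observations of an Atari game is what pushes the required experience into the millions of frames.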