Let's assume that our agent manages to figure out a consistent strategy that delivers rewards. What's next? Should they simply stick to that strategy, generating the same reward for eternity? Or should they keep trying new things? Perhaps, by not always exploiting a known strategy, the agent has a chance at a much bigger reward in the future. This is known as the explore-exploit dilemma: the question of how much an agent should explore new strategies versus exploit the ones it already knows.
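To make the trade-off concrete, here is a minimal sketch of one common way to balance the two, an epsilon-greedy rule, which is a stand-in chosen for illustration and not necessarily the approach this text has in mind. Everything in it is assumed: the function name `run_epsilon_greedy`, the three strategies with fixed payoffs 0.2, 0.5 and 0.8, and the number of steps.

```python
import random


def run_epsilon_greedy(payoffs, epsilon, steps=5000, seed=0):
    """Play `steps` rounds, choosing among strategies with fixed payoffs.

    With probability `epsilon` the agent explores (picks a random strategy);
    otherwise it exploits the strategy with the best estimated payoff so far.
    The payoffs are hypothetical and deterministic to keep the example simple.
    """
    rng = random.Random(seed)
    n = len(payoffs)
    counts = [0] * n        # how often each strategy was tried
    estimates = [0.0] * n   # running average reward per strategy
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.randrange(n)  # explore: try anything
        else:
            # exploit: best-known strategy (ties go to the lowest index)
            choice = max(range(n), key=lambda i: estimates[i])

        reward = payoffs[choice]
        counts[choice] += 1
        # incremental update of the running average
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
        total_reward += reward

    return total_reward / steps, counts


payoffs = [0.2, 0.5, 0.8]  # assumed payoffs; the agent does not know them
greedy_avg, greedy_counts = run_epsilon_greedy(payoffs, epsilon=0.0)
explore_avg, explore_counts = run_epsilon_greedy(payoffs, epsilon=0.1)

print("never explores:   ", greedy_avg, greedy_counts)
print("explores 10% of the time:", explore_avg, explore_counts)
```

In this toy setup, the agent that never explores locks onto the first strategy that pays anything at all and never discovers the better ones, while the agent that occasionally explores gives up a little reward on its random tries but ends up with a higher average.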
The dilemma is easiest to appreciate at the extreme, where relying on a known strategy for immediate reward becomes detrimental in the long run. Experiments with rats, for example, have shown that these animals will starve themselves to death if given a mechanism to trigger the release of dopamine ...