One of the problems with Imitation Learning is that it often focuses the agent down a path that limits its possible future moves. This isn't unlike being shown the improper way to perform a task and then doing it that way, perhaps without thinking, only to find out later that there was a better way. Humanity, in fact, has been prone to this type of problem over and over again throughout history. Perhaps you learned as a child that swimming right after eating was dangerous, only to learn later in life, through your own experimentation or just common knowledge, that this was a myth, one that was taken as fact for a very long time. Training an agent through observation is no different: in many ways, you limit the agent's vision to a narrow focus bounded by what it was taught. However, there is a way to allow an agent to revert...




















































