Mean field game (MFG) theory is the study of decision making in very large populations of small interacting agents. It models the behavior of many agents, each individually trying to optimize its position in space and time, but with preferences that are partly determined by the choices of all the other agents.
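For background (standard MFG theory, not material from the paper itself): one common continuous-time formulation couples a backward Hamilton-Jacobi-Bellman equation for each agent's value function $u$ with a forward Fokker-Planck equation for the population density $m$:

\[
\begin{aligned}
-\partial_t u - \nu \Delta u + H(x, \nabla u) &= F(x, m), \\
\partial_t m - \nu \Delta m - \operatorname{div}\!\big(m \, \nabla_p H(x, \nabla u)\big) &= 0,
\end{aligned}
\]

with a terminal condition on $u$ and an initial condition on $m$. The backward equation encodes each agent's optimization; the forward equation propagates the resulting population behavior.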
The paper considers the problem of representing and learning the behavior of a large population of agents in order to construct an effective predictive model of that behavior. For example, a population's behavior directly affects the ranking of a set of trending topics on social media, represented by the global population distribution over topics. Each user's observation of this global state influences their choice of the next topic in which to participate, thereby contributing to future population behavior.
Classical predictive methods such as time series analysis can also be used to build predictive models from data. However, these models do not treat behavior as the result of optimizing a reward function, and so may not provide insight into the motivations that produce a population's behavior policy. Alternatively, methods that employ the underlying population network structure assume that nodes are influenced only by a local neighborhood and do not include a representation of a global state. Hence, they have difficulty explaining population-level events as the result of the agents' implicit optimization.
Mean field games overcome these limitations by determining how a system naturally behaves according to its underlying optimal control policy. The paper proposes a novel approach for estimating the parameters of an MFG. Its main contribution is to relate the theories of MFGs and reinforcement learning (RL) within the classic context of Markov decision processes (MDPs). The proposed method uses inverse RL to learn both the reward function and the forward dynamics of the MFG from data.
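Schematically, and in the spirit of maximum-likelihood (maximum-entropy) inverse RL rather than a formulation quoted from the paper, the reward parameters $\theta$ are fit by maximizing the likelihood of demonstrated trajectories:

\[
\max_{\theta} \sum_{\tau \in \mathcal{D}} \log p_{\theta}(\tau),
\qquad
p_{\theta}(\tau) \propto \exp\!\Big(\sum_{t} R_{\theta}(s_t, a_t)\Big),
\]

where $\mathcal{D}$ is the set of observed population trajectories and $R_{\theta}$ is the neural-network reward.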
The paper covers the problem in three parts: theory, algorithm, and experiments. The theoretical contribution begins by transforming a continuous-time MFG formulation into a discrete-time formulation and then relates the MFG to an associated MDP problem.
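In the social media example, this reduction can be sketched as follows (the notation here is ours, not necessarily the paper's): the MDP state is the population distribution $\pi^{n}$ over $K$ topics at time $n$, the action is a row-stochastic transition matrix $P^{n}$, and the dynamics and objective are

\[
\pi^{n+1} = \big(P^{n}\big)^{\top} \pi^{n},
\qquad
\max_{\{P^{n}\}} \sum_{n} \gamma^{n} R\big(\pi^{n}, P^{n}\big),
\]

so that solving a single-agent MDP over distributions corresponds to the population-level MFG.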
On the algorithmic side, an RL solution to the MFG problem is proposed. The authors relate solving an optimization problem on a single-agent MDP to solving the inference problem of the population-level MFG. This leads to learning a reward function from demonstrations using a maximum likelihood approach, where the reward is represented by a deep neural network. The policy is learned through an actor-critic algorithm, based on gradient updates with respect to the policy parameters.
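As a concrete illustration, here is a minimal actor-critic sketch on a toy mean-field MDP of the kind described above. It is not the paper's implementation: the hand-coded quadratic reward, the linear critic, and the finite-difference policy gradient are all simplifying assumptions (the paper learns the reward with a deep network and trains neural policies by backpropagation).

```python
# Minimal actor-critic sketch on a toy mean-field MDP (illustrative only).
# Assumptions: quadratic reward, linear critic, finite-difference policy
# gradient; the paper instead learns a deep reward network from data and
# trains neural policies by backpropagation.
import numpy as np

rng = np.random.default_rng(0)
K = 5                                   # number of topics (toy choice)
theta = rng.normal(0.0, 0.1, (K, K))    # policy logits -> transition matrix
w = np.zeros(K)                         # linear critic: V(pi) = w . pi
target = np.full(K, 1.0 / K)            # toy reward prefers a uniform spread

def softmax_rows(x):
    """Turn a matrix of logits into a row-stochastic transition matrix."""
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def reward(pi):
    # Illustrative hand-coded reward; the paper *learns* this from data.
    return -np.sum((pi - target) ** 2)

def lookahead(theta, pi, w, gamma):
    # One-step return estimate under the critic: r(pi') + gamma * V(pi').
    pi_next = softmax_rows(theta).T @ pi
    return reward(pi_next) + gamma * (w @ pi_next)

pi = rng.dirichlet(np.ones(K))          # initial population distribution
alpha_actor, alpha_critic, gamma, eps = 0.05, 0.1, 0.9, 1e-5

for step in range(200):
    P = softmax_rows(theta)             # action: row-stochastic matrix
    pi_next = P.T @ pi                  # mean-field dynamics: pi' = P^T pi
    td = reward(pi_next) + gamma * (w @ pi_next) - (w @ pi)
    w += alpha_critic * td * pi         # semi-gradient TD(0) critic update
    # Actor: ascend the critic's one-step return estimate. Finite differences
    # keep the sketch dependency-free; a real implementation would backprop.
    grad = np.zeros_like(theta)
    for i in range(K):
        for j in range(K):
            theta[i, j] += eps
            q_plus = lookahead(theta, pi, w, gamma)
            theta[i, j] -= 2 * eps
            q_minus = lookahead(theta, pi, w, gamma)
            theta[i, j] += eps          # restore the parameter
            grad[i, j] = (q_plus - q_minus) / (2 * eps)
    theta += alpha_actor * grad
    pi = pi_next

print("final population distribution:", np.round(pi, 3))
```

The structural point the sketch preserves is that the "state" is a distribution over topics and the "action" is a transition matrix, so solving a single agent's MDP describes the flow of the whole population.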
The algorithm is then compared with previous approaches on toy problems with synthetic reward functions. The authors then demonstrate the algorithm on real-world social media data, with the aim of recovering the reward function and predicting future population trajectories.
Overall Score: 26/30
Average Score: 8.66
The reviewers are unanimous in finding the work in this paper highly novel and significant. According to the reviewers, there is still little work at the intersection of machine learning and collective behavior, and this paper could help stimulate the growth of that intersection. On the other hand, the paper was also criticized with the statement that the "scientific content of the work has critical conceptual flaws". However, the authors' rebuttal persuaded the reviewers that these concerns were largely addressed.