Future of Humanity Institute Research Fellow Jan Leike and Machine Intelligence Research Institute Research Fellows Jessica Taylor and Benya Fallenstein have just presented new results at UAI 2016 that resolve a longstanding open problem in game theory: “**A formal solution to the grain of truth problem**.”

Game theorists have techniques for specifying agents that eventually do well on iterated games against other agents, so long as their beliefs contain a “grain of truth” — nonzero prior probability assigned to the actual game they’re playing. Getting that grain of truth was previously an unsolved problem in multiplayer games, because agents can run into infinite regresses when they try to model agents that are modelling them in turn. This result shows how to break that loop: by means of reflective oracles.

In the process, Leike, Taylor, and Fallenstein provide a rigorous and general foundation for the study of multi-agent dilemmas. This work provides a surprising and somewhat satisfying basis for approximate Nash equilibria in repeated games, folding a variety of problems in decision and game theory into a common framework.

The paper’s abstract reads:

A Bayesian agent acting in a multi-agent environment learns to predict the other agents’ policies if its prior assigns positive probability to them (in other words, its prior contains a

grain of truth). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as thegrain of truth problem. Only small classes are known to have a grain of truth and the literature contains several related impossibility results.In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically. However, agents based on Thompson sampling converge to play ε-Nash equilibria in arbitrary unknown computable multi-agent environments. While these results are purely theoretical, we show that they can be computationally approximated arbitrarily closely.

Traditionally, when modeling computer programs that model the properties of other programs (such as when modeling an agent reasoning about a game), the first program is assumed to have access to an oracle (such as a halting oracle) that can answer arbitrary questions about the second program. This works, but it doesn’t help with modeling agents that can reason about *each other*.

While a halting oracle can predict the behavior of any isolated Turing machine, it cannot predict the behavior of another Turing machine that has access to a halting oracle. If this were possible, the second machine could use its oracle to figure out what the first machine-oracle pair thinks it will do, at which point it can do the opposite, setting up a liar paradox scenario. For analogous reasons, two agents with similar resources, operating in real-world environments without any halting oracles, cannot perfectly predict each other in full generality.

Game theorists know how to build formal models of asymmetric games between a weaker player and a stronger player, where the stronger player understands the weaker player’s strategy but not vice versa. For the reasons above, however, games between agents of similar strength have resisted full formalization. As a consequence of this, game theory has until now provided no method for *designing* agents that perform well on complex iterated games containing other agents of similar strength.

Usually, the way to build an ideal agent is to have the agent consider a large list of possible policies, predict how the world would respond to each policy, and then choose the best policy by some metric. However, in multi-player games, if your agent considers a big list of policies that both it and the opponent might play, then the best policy for the opponent is usually some alternative policy that was not in your list. (And if you add that policy to your list, then the new best policy for the opponent to play is now a new alternative that wasn’t in the list, and so on.)

This is the grain of truth problem, first posed by Kalai and Lehrer in 1993: define a class of policies that is large enough to be interesting and realistic, and for which *the best response to an agent that considers that policy class is inside the class*.

Taylor and Fallenstein have developed a formalism that enables a solution: *reflective* oracles capable of answering questions about agents with access to equivalently powerful oracles. Leike has led work on demonstrating that this formalism can solve the grain of truth problem, and in the process shows that the Bayes-optimal policy generally does not converge to a Nash equilibrium. Thompson sampling, however, does converge to a Nash equilibrium — a result that comes out of another paper presented at UAI 2016, Leike, Lattimore, Orseau, and Hutter’s “Thompson sampling is asymptotically optimal in general environments.”

The key feature of reflective oracles is that they avoid diagonalization and paradoxes by randomising in the relevant cases. This allows agents with access to a reflective oracle to consistently reason about the behaviour of arbitrary agents that also have access to a reflective oracle, which in turn makes it possible to model agents that converge to Nash equilibria by their own faculties (rather than by fiat or assumption).

This framework can be used to, e.g., define games between multiple copies of AIXI. As originally formulated, AIXI cannot entertain hypotheses about its own existence, or about the existence of similarly powerful agents; classical Bayes-optimal agents must be larger and more intelligent than their environments. With access to a reflective oracle, however, Fallenstein, Soares, and Taylor have shown that AIXI can meaningfully entertain hypotheses about itself and copies of itself while avoiding diagonalization.

The other main novelty of this paper is that reflective oracles turn out to be limit-computable, and so allow for approximation by anytime algorithms. The reflective oracles paradigm is therefore likely to be quite valuable for investigating game-theoretic questions involving generally intelligent agents that can understand and model each other.

This research was carried out before Jan Leike started working at the Future of Humanity Institute.

*Cross-posted from the Machine Intelligence Research Institute blog*