Future of Humanity Institute researcher Jan Leike won the ‘Best Student Paper’ award at the Uncertainty in Artificial Intelligence conference in New York this year. His paper, “Thompson sampling is asymptotically optimal in general environments”, was co-authored with Tor Lattimore of the University of Alberta, Laurent Orseau of Google DeepMind and Marcus Hutter of ANU.
The abstract reads:
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, nonergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.
This research was carried out before Jan Leike started working at the Future of Humanity Institute.