Thompson sampling is asymptotically optimal in general environments. (Leike, J., Lattimore, T., Orseau, L. & Hutter, M. (2016). Proceedings of the Thirty-Second Uncertainty in Artificial Intelligence Conference)

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, nonergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.
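The sample-then-act idea behind Thompson sampling can be illustrated with a much simpler setting than the one the paper treats. The sketch below is an assumption-laden toy, not the paper's algorithm: a small finite list of Bernoulli bandits stands in for the countable class of general environments, and the names `thompson_sampling`, `env_class`, `true_env`, and the fixed resampling interval `horizon` are all hypothetical choices made for illustration.

```python
import random


def thompson_sampling(env_class, true_env, steps=2000, horizon=5, seed=0):
    """Toy Thompson sampling over a finite class of Bernoulli bandits.

    env_class: list of candidate environments; each is a list of per-arm
               reward probabilities strictly between 0 and 1 (assumption).
    true_env:  index of the candidate that actually generates rewards.
    horizon:   number of steps to follow one sampled environment before
               resampling (a fixed value here, purely for illustration).
    """
    rng = random.Random(seed)
    n = len(env_class)
    posterior = [1.0 / n] * n  # uniform prior over the candidates
    total_reward = 0
    t = 0
    while t < steps:
        # Sample one candidate environment from the current posterior.
        sampled = rng.choices(range(n), weights=posterior)[0]
        # Act optimally for the sampled candidate: pull its best arm.
        probs = env_class[sampled]
        arm = max(range(len(probs)), key=lambda a: probs[a])
        for _ in range(horizon):
            if t >= steps:
                break
            # The true environment generates the reward.
            reward = 1 if rng.random() < env_class[true_env][arm] else 0
            total_reward += reward
            # Bayes update: reweight each candidate by the likelihood
            # it assigns to the observed reward, then renormalize.
            likelihoods = [env[arm] if reward else 1.0 - env[arm]
                           for env in env_class]
            posterior = [p * l for p, l in zip(posterior, likelihoods)]
            norm = sum(posterior)
            posterior = [p / norm for p in posterior]
            t += 1
    return posterior, total_reward


if __name__ == "__main__":
    candidates = [[0.2, 0.8], [0.8, 0.2], [0.5, 0.5]]
    posterior, total = thompson_sampling(candidates, true_env=0)
    print("posterior:", [round(p, 3) for p in posterior])
    print("total reward:", total)
```

In this toy, every arm pull carries information that separates the true candidate from the others, so the posterior concentrates and the agent settles on the best arm. General environments offer no such guarantee, which is why the abstract states its sublinear-regret result only under a recoverability assumption.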

FHI researchers cited in UK Parliamentary “Robotics and artificial intelligence” report

Posted on 20 October 2016 (updated 27 October 2016)

The UK House of Commons Science and Technology Committee have released a report concluding their recent enquiry on robotics and artificial intelligence. The report cites oral evidence given by FHI researcher Dr. Owen Cotton-Barratt, and discusses the work of FHI researcher Dr. Stuart Armstrong.

A formal solution to the grain of truth problem. (Leike, J., Taylor, J. & Fallenstein, B. (2016). Proceedings of the Thirty-Second Uncertainty in Artificial Intelligence Conference.)

In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically…

Safely interruptible agents. (Orseau, L. & Armstrong, S. (2016). Proceedings of the Thirty-Second Uncertainty in Artificial Intelligence Conference.)

The probable necessity for humans to be able to interrupt nonoptimal actions by reinforcement learning agents is a problem if the agent learns how to avoid such interruptions. This paper explores a way to make sure a learning agent will not learn to prevent (or seek!) being interrupted by the environment or a human operator.

Müller, V. C. (2014). Risks of general artificial intelligence. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 297-301.

https://doi.org/10.1080/0952813X.2014.895110

Müller, V. & Bostrom, N. (2016). Future progress in artificial intelligence: A survey of expert opinion. In Fundamental Issues of Artificial Intelligence (pp. 553-571). Berlin: Springer.

http://philpapers.org/archive/MLLFPI.pdf

Racing to the precipice: a model of artificial intelligence development. (Armstrong, S., Bostrom, N., & Shulman, C. 2016. AI & Society, 1-6)

http://link.springer.com/article/10.1007/s00146-015-0590-y/fulltext.html

Sandberg, A. (2014). Ethics of brain emulations. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 439-457.

http://www.aleph.se/papers/Ethics%20of%20brain%20emulations%20draft.pdf

Learning the preferences of ignorant, inconsistent agents. (Evans, O., Stuhlmüller, A., & Goodman, N. 2016. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence)

http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12476/11601