We recently teamed up with the Machine Intelligence Research Institute (MIRI) to co-host a 22-day Colloquium Series on Robust and Beneficial AI (CSRBAI) at the MIRI office. The colloquium aimed to bring together safety-conscious AI scientists from academia and industry to share their recent work. The event served that purpose well, sparking some new collaborations and a number of new conversations between researchers who had not interacted before or had only talked remotely.

Over 50 people attended from 25 different institutions, with an average of 15 people present on any given talk or workshop day. In all, there were 17 talks and four weekend workshops on the topics of transparency, robustness and error-tolerance, preference specification, and agent models and multi-agent dilemmas. The full schedule and talk slides are available on the event page. Videos from the first day of the event are now available, and we’ll be posting the rest of the talks online soon:




Stuart Russell, professor of computer science at UC Berkeley and co-author of Artificial Intelligence: A Modern Approach, gave the opening keynote. Russell spoke on “AI: The Story So Far” (slides). Abstract:

I will discuss the need for a fundamental reorientation of the field of AI towards provably beneficial systems. This need has been disputed by some, and I will consider their arguments. I will also discuss the technical challenges involved and some promising initial results.

About 36 minutes in, Russell discusses his recent work on cooperative inverse reinforcement learning. This paper and Dylan Hadfield-Menell’s related talk on corrigibility (slides) generated a great deal of interest and discussion at CSRBAI.




Alan Fern, associate professor of computer science at Oregon State University, discussed his work with AAAI president and OSU distinguished professor of computer science Tom Dietterich in “Toward Recognizing and Explaining Uncertainty” (slides 1, slides 2). Fern and Dietterich’s work is described in a Future of Life Institute grant proposal:

The development of AI technology has progressed from working with “known knowns”—AI planning and problem solving in deterministic, closed worlds—to working with “known unknowns”—planning and learning in uncertain environments based on probabilistic models of those environments. A critical challenge for future AI systems is to behave safely and conservatively in open worlds, where most aspects of the environment are not modeled by the AI agent—the “unknown unknowns”.

Our team, with deep experience in machine learning, probabilistic modeling, and planning, will develop principles, evaluation methodologies, and algorithms for learning and acting safely in the presence of the unknown unknowns. For supervised learning, we will develop UU-conformal prediction algorithms that extend conformal prediction to incorporate nonconformity scores based on robust anomaly detection algorithms. This will enable supervised learners to behave safely in the presence of novel classes and arbitrary changes in the input distribution. For reinforcement learning, we will develop UU-sensitive algorithms that act to minimize risk due to unknown unknowns. A key principle is that AI systems must broaden the set of variables that they consider to include as many variables as possible in order to detect anomalous data points and unknown side-effects of actions.
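The proposal does not spell out the UU-conformal algorithm itself. As a rough illustration only, the sketch below shows ordinary label-conditional split conformal prediction in which the nonconformity score for a candidate label is a simple k-nearest-neighbour anomaly score computed against that label's training points. This pairing is our assumption of how an anomaly detector can plug into conformal prediction, not Fern and Dietterich's method, and the names `knn_anomaly_score` and `LabelConditionalConformal` are hypothetical. The property it illustrates: an input unlike anything seen in training gets a low p-value for every known label, so the prediction set comes back empty and the input can be flagged as a possible "unknown unknown".

```python
import math
from collections import defaultdict


def knn_anomaly_score(x, reference, k=1):
    """Hypothetical anomaly score: mean distance from x to its k nearest points in reference."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


class LabelConditionalConformal:
    """Split conformal classifier (hypothetical sketch) whose nonconformity score
    for a pair (x, y) is the k-NN anomaly score of x against the training points of class y."""

    def __init__(self, train, calib, k=1):
        # train and calib are disjoint lists of (feature_tuple, label) pairs.
        self.k = k
        self.by_class = defaultdict(list)
        for x, y in train:
            self.by_class[y].append(x)
        # Calibration scores are kept separately per class (label-conditional conformal).
        self.calib_scores = defaultdict(list)
        for x, y in calib:
            self.calib_scores[y].append(knn_anomaly_score(x, self.by_class[y], k))

    def p_value(self, x, y):
        """Share of calibration scores for class y at least as large as x's score (+1 smoothing)."""
        s = knn_anomaly_score(x, self.by_class[y], self.k)
        scores = self.calib_scores[y]
        return (sum(1 for c in scores if c >= s) + 1) / (len(scores) + 1)

    def predict_set(self, x, epsilon=0.1):
        """Return every label whose p-value exceeds epsilon; an empty set flags a possible novelty."""
        return {y for y in self.by_class if self.p_value(x, y) > epsilon}


# Toy data: two well-separated clusters, with separate training and calibration points.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b")]
calib = [((0.5, 0.5), "a"), ((1, 1), "a"), ((0.2, 0.8), "a"),
         ((10.5, 10.5), "b"), ((11, 11), "b"), ((10.2, 10.8), "b")]

model = LabelConditionalConformal(train, calib, k=1)
print(model.predict_set((0.1, 0.1), epsilon=0.3))    # {'a'}
print(model.predict_set((10.1, 10.1), epsilon=0.3))  # {'b'}
print(model.predict_set((50, -50), epsilon=0.3))     # set(): unlike either known class
```

With only three calibration points per class, the smallest attainable p-value is 1/(3 + 1) = 0.25, which is why the toy example uses `epsilon=0.3`; a realistic calibration set would be much larger and allow a smaller error rate.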




Francesca Rossi, professor of computer science at the University of Padova in Italy, research scientist at IBM, and president of IJCAI, spoke on “Moral Preferences” (slides). Abstract:

Intelligent systems are going to be more and more pervasive in our everyday lives. They will take care of elderly people and kids, they will drive for us, and they will suggest to doctors how to cure a disease. However, we cannot let them do all these very useful and beneficial tasks if we don’t trust them. To build trust, we need to be sure that they act in a morally acceptable way. So it is important to understand how to embed moral values into intelligent machines.

Existing preference modeling and reasoning frameworks can be a starting point, since they define priorities over actions, just like an ethical theory does. However, many more issues are involved when we mix preferences (that are at the core of decision making) and morality, both at the individual level and in a social context. I will discuss some of these issues as well as some possible solutions.

Other speakers at the event included Tom Dietterich (OSU), Bart Selman (Cornell), Paul Christiano (UC Berkeley), and MIRI researchers Jessica Taylor and Andrew Critch.

The preference specification workshop attracted the most excitement and activity at CSRBAI. Other activities and discussion topics at CSRBAI included:

  • Discussions about potential applications of complexity theory to transparency: using interactive polynomial-time proof protocols or probabilistically checkable proofs to communicate complicated beliefs and reasons from powerful AI systems to humans.
  • Progress on clarifying different methods for training explanation systems for informed oversight.
  • Investigations into the theory of cooperative inverse reinforcement learning and other unobserved-reward games, led by Jan Leike and Tom Everitt of Australian National University.
  • Discussions about the hazards posed by reinforcement learning agents that manipulate the source of their reward function (the human, or a learned representation of the human).
  • Interesting discussions about corrigibility viewed as a value-of-information problem.
  • Development of AI safety environments by Rafael Cosman and other attendees for OpenAI Gym, OpenAI’s reinforcement learning toolkit, illustrating topics like interruptibility and semi-supervised learning. Ideas and conversations from Chris Olah, Dario Amodei, Paul Christiano, and Jessica Taylor helped seed these environments, and CSRBAI participants who helped develop them included Owain Evans, Sune Jakobsen, Stuart Armstrong, Tom Everitt, Rafael Cosman, and David Krueger.
  • Discussions of ideas for an OpenAI Gym environment that calls for low-impact agents, using an adversarial distinguisher.
  • Discussions of Jessica Taylor’s memoryless Cartesian environments, aimed at extending the idea to non-Cartesian worlds and logical counterfactuals using reference-class decision-making. Discussions of using “logically past” experience to learn about counterfactuals and to explore without a high chance of exploring in the real world.
  • New insights into the problem of logical counterfactuals, with new associated formalisms. Applications of MIRI’s recent logical uncertainty advances to decision theory.
  • Extensive early discussion of MIRI’s forthcoming “Alignment for Advanced Machine Learning Systems” technical agenda.

The colloquium series ran quite smoothly, and we received positive feedback from attendees, though they also noted that the event would likely have benefited from more structure. When we run similar events in the future, our main adjustment will be to compress the schedule and run a more focused agenda.

Cross-posted with edits from the Machine Intelligence Research Institute blog
