Academics at FHI bring the tools of mathematics, philosophy, social sciences, and the natural sciences to bear on big-picture questions about humanity and its prospects. Our mission is to shed light on crucial considerations that might shape humanity’s long-term future.
We currently divide our work into four focus areas: Macrostrategy, AI Safety, AI Governance, and Biosecurity.
Investigating which crucial considerations are shaping what is at stake for the future of humanity
FHI’s big picture research focuses on the long-term consequences of our actions today and the complicated dynamics that are bound to shape our future in significant ways. A key aspect of this is the study of existential risks – events that endanger the survival of Earth-originating, intelligent life or that threaten to drastically and permanently destroy our potential for realising a valuable future. Our focus within this area lies in the capabilities and impacts of future technologies (including the possibility and impact of Artificial General Intelligence or ‘Superintelligence’), existential risk assessment, anthropics, population ethics, human enhancement ethics, game theory, and consideration of the Fermi paradox.
Many of the core concepts and techniques within this field originate from research by FHI scholars. They are already having a practical impact in the effective altruism movement.
Featured macrostrategy publications
The Vulnerable World Hypothesis (2019)
Scientific and technological progress might change people’s capabilities or incentives in ways that would destabilize civilization. For example, advances in DIY biohacking tools might make it easy for anybody with basic training in biology to kill millions; novel military technologies could trigger arms races in which whoever strikes first has a decisive advantage; or some economically advantageous process may be invented that produces disastrous negative global externalities that are hard to regulate. This paper introduces the concept of a vulnerable world: roughly, one in which there is some level of technological development at which civilization almost certainly gets devastated by default, i.e. unless it has exited the ‘semi-anarchic default condition’. Several counterfactual historical and speculative future vulnerabilities are analyzed and arranged into a typology. A general ability to stabilize a vulnerable world would require greatly amplified capacities for preventive policing and global governance. The vulnerable world hypothesis thus offers a new perspective from which to evaluate the risk-benefit balance of developments towards ubiquitous surveillance or a unipolar world order.
That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi’s paradox (2017)
Anders Sandberg, Stuart Armstrong, Milan Cirkovic
If a civilization wants to maximize computation it appears rational to aestivate until the far future in order to exploit the low-temperature environment: this can produce a 10^30 multiplier of achievable computation. We hence suggest the “aestivation hypothesis”: the reason we are not observing manifestations of alien civilizations is that they are currently (mostly) inactive, patiently waiting for future cosmic eras.
Underprotection of unpredictable statistical lives compared to predictable ones (2016)
Marc Lipsitch, Nicholas G. Evans, Owen Cotton-Barratt
Existing ethical discussion considers the differences in care for identified versus statistical lives. However, there has been little attention to the different degrees of care that are taken for different kinds of statistical lives.
Superintelligence asks the questions: What happens when machines surpass humans in general intelligence? Will artificial agents save or destroy us? Nick Bostrom lays the foundation for understanding the future of humanity and intelligent life.
The unilateralist’s curse: the case for a principle of conformity
Nick Bostrom, Thomas Douglas & Anders Sandberg
This article considers agents that are purely motivated by an altruistic concern for the common good, and shows that if each agent acts on her own personal judgment as to whether the initiative should be undertaken, then the initiative will move forward more often than is optimal. It explores the unilateralist’s curse.
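The effect described above can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes, for illustration only, that each agent observes the initiative's true value plus independent zero-mean Gaussian noise, acts whenever its own estimate is positive, and that the initiative goes ahead if any single agent acts. The function names and parameters are hypothetical.

```python
import math


def prob_single_agent_acts(true_value: float, noise_sd: float) -> float:
    """Probability that one agent's noisy estimate is positive.

    Assumption (not from the article): estimate = true_value + N(0, noise_sd^2),
    so P(estimate > 0) is the standard normal CDF evaluated at true_value / noise_sd.
    """
    return 0.5 * (1.0 + math.erf(true_value / (noise_sd * math.sqrt(2.0))))


def prob_initiative_undertaken(true_value: float, noise_sd: float, n_agents: int) -> float:
    """Probability that at least one of n independent agents acts unilaterally."""
    p = prob_single_agent_acts(true_value, noise_sd)
    return 1.0 - (1.0 - p) ** n_agents


if __name__ == "__main__":
    # A harmful initiative (negative true value) is undertaken more often
    # as the number of agents able to act unilaterally grows.
    for n in (1, 2, 5, 10, 20):
        print(n, round(prob_initiative_undertaken(-1.0, 1.0, n), 3))
```

Under these assumptions the probability that a harmful initiative goes ahead rises steadily with the number of agents, even though every agent is well-intentioned and individually unbiased.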
Existential risk reduction as global priority
This paper discusses existential risks. It argues that, despite the enormous expected value of reducing existential risk, issues surrounding human-extinction risks and related hazards remain poorly understood.
Global Catastrophic Risks
Nick Bostrom, Milan M. Cirkovic
In Global Catastrophic Risks, 25 leading experts look at the gravest risks facing humanity in the 21st century, including asteroid impacts, gamma-ray bursts, Earth-based natural catastrophes, nuclear war, terrorism, global warming, biological weapons, totalitarianism, advanced nanotechnology, general artificial intelligence, and social collapse. The book also addresses over-arching issues – policy responses and methods for predicting and managing catastrophes.
Anthropic Bias explores how to reason when you suspect that your evidence is biased by “observation selection effects”–that is, evidence that has been filtered by the precondition that there be some suitably positioned observer to “have” the evidence.
Probing the improbable: methodological challenges for risks with low probabilities and high stakes
Toby Ord, Rafaela Hillerbrand, Anders Sandberg
This paper argues that important new methodological problems arise when assessing global catastrophic risks, and focuses on a problem regarding probability estimation.
The reversal test: eliminating status quo bias in bioethics
Nick Bostrom, Toby Ord
Explores whether we have reason to believe that the long-term consequences of human cognitive enhancement would be, on balance, good.
How unlikely is a doomsday catastrophe?
Max Tegmark, Nick Bostrom
This article considers existential risks and how many previous bounds on their frequency give a false sense of security. It derives a new upper bound of one per 10^9 years (99.9% c.l.) on the exogenous terminal catastrophe rate that is free of such selection bias, using planetary age distributions and the relatively late formation time of Earth.
Astronomical waste: the opportunity cost of delayed technological development
This paper considers how, with very advanced technology, a very large population of people living happy lives could be sustained in the accessible region of the universe. It emphasizes that for every year that the development of such technologies and the colonization of the universe are delayed, there is an opportunity cost.
Examination of how technological trends, geopolitics, and governance structures will affect the development of advanced artificial intelligence.
The AI Governance Research Group strives to help humanity capture the benefits and manage the risks of artificial intelligence. Its researchers investigate important and neglected issues within AI governance, drawing on Political Science, International Relations, Computer Science, Economics, Law, and Philosophy. Their research is used to advise decision-makers in private industry, civil society, and policy. More detail of their work and a full list of publications can be found here.
The group’s work is guided by their Research Agenda and Theory of Impact and includes examination of how technological trends, geopolitics, and governance structures will affect the development of advanced artificial intelligence.
Researchers are active in international policy circles, regularly hosting discussions with leading academics in the field, and advising governments and industry leaders. Recent policy engagement and events include writing in The Washington Post about COVID-19 contact tracing apps, presenting evidence to the US Congress on China’s AI strategy, and a live webinar with Daron Acemoğlu, Diane Coyle, and Joseph Stiglitz on the economics of AI and COVID-19. For all our policy writing, see Policy & Public Engagement.
The core staff comprises an interdisciplinary team of policy experts and researchers. Research affiliates work on a wide variety of domains, including China-US relations, cybersecurity, EU policy, and AI progress forecasting.
More detail of work, events, and a full list of publications can be found here.
AI Governance: A Research Agenda (2018)
This research agenda by Allan Dafoe proposes a framework for research on AI governance. It provides a foundation to introduce and orient researchers to the space of important problems in AI governance. It offers a framing of the overall problem, an enumeration of the questions that could be pivotal, and references to published articles relevant to these questions.
Artificial Intelligence: American Attitudes and Trends (2019)
Baobao Zhang and Allan Dafoe
This report by Baobao Zhang and Allan Dafoe presents the results from an extensive look at the American public’s attitudes toward AI and AI governance, with questions touching on: workplace automation; attitudes regarding international cooperation; the public’s trust in various actors to develop and regulate AI; views about the importance and likely impact of different AI governance challenges; and historical and cross-national trends in public opinion regarding AI. Our results provide preliminary insights into the character of U.S. public opinion regarding AI.
Featured in Bloomberg, Vox, Axios and the MIT Technology Review.
Deciphering China’s AI Dream: The context, components, capabilities, and consequences of China’s strategy to lead the world in AI (2018)
This report examines the intersection of two subjects, China and artificial intelligence, both of which are already difficult enough to comprehend on their own. It provides context for China’s AI strategy with respect to past science and technology plans, and it also connects the consistent and new features of China’s AI approach to the drivers of AI development (e.g. hardware, data, and talented scientists). In addition, it benchmarks China’s current AI capabilities by developing a novel index to measure any country’s AI potential and highlights the potential implications of China’s AI dream for issues of AI safety, national security, economic development, and social governance.
When Will AI Exceed Human Performance? Evidence from AI Experts
Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang and Owain Evans
Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI.
Policy Desiderata in the Development of Machine Superintelligence
Nick Bostrom, Allan Dafoe, Carrick Flynn
This paper seeks to initiate discussion of challenges and opportunities posed by the potential development of superintelligence by identifying a set of distinctive features of the transition to a machine intelligence era. From these distinctive features, we derive a correlative set of policy desiderata—considerations that should be given extra weight in long-term AI policy compared to other policy contexts.
Strategic implications of openness in AI development
This paper attempts a preliminary analysis of the global desirability of different forms of openness in AI development (including openness about source code, science, data, safety techniques, capabilities, and goals).
Unprecedented technological risks
Over the next few decades, the continued development of dual-use technologies will provide major benefits to society. They will also pose significant and unprecedented global risks. This report gives an overview of these risks and their importance, focusing on risks of extreme catastrophe.
Researching computer science techniques for building safer artificially intelligent systems
Surveys of leading AI researchers suggest a significant probability of human-level artificial intelligence being achieved this century. The goal of the AI Safety Research Group is to ensure that as the capabilities of AI systems increase, they remain aligned with human values. You can find our publications on our Google Scholar page.
FHI’s existing research on AI Safety is broad. On the theoretical end, our interests include the incentives of AI systems and the limitations of value learning. On the experimental side, we have been interested in training deep learning models to decompose complex tasks and to be more robust to large errors. Groups at DeepMind, CHAI, MIRI and OpenAI are also conducting highly relevant research. FHI collaborates with and advises leading AI research organisations, such as Google DeepMind.
In late 2020, a working group was established to use causal models to study incentive concepts and their application to AI safety. It was founded by Ryan Carey (FHI) and Tom Everitt (DeepMind) and includes other researchers from Oxford and the University of Toronto. More information about the working group can be found here.
FHI also supports the Alignment Newsletter financially and operationally. The Alignment Newsletter is a weekly publication by Rohin Shah with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here.
Featured AI safety publications
Trial without Error: Towards Safe RL with Human Intervention (2017)
William Saunders, Girish Sastry, Andreas Stuhlmueller, Owain Evans
How can AI systems learn safely in the real world? Self-driving cars have safety drivers, people who sit in the driver’s seat and constantly monitor the road, ready to take control if an accident looks imminent. Could reinforcement learning systems also learn safely by having a human overseer?
This paper introduces exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem’s reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class).
Safely interruptible agents
Laurent Orseau, Stuart Armstrong
This paper provides a formal definition of safe interruptibility and exploits the off-policy learning property to prove that either some agents are already safely interruptible, like Q-learning, or can easily be made so, like Sarsa. It shows that even ideal, uncomputable reinforcement learning agents can be made safely interruptible.
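The off-policy property mentioned above can be made concrete by comparing the standard one-step update targets of Q-learning and Sarsa. This is a minimal sketch of the textbook update rules, not the paper's formal construction. The action names, the toy Q-values, and the idea of modelling an interruption as a forced "stop" action are assumptions made purely for illustration.

```python
def q_learning_target(reward: float, next_q_values: dict, gamma: float) -> float:
    """Off-policy target: bootstraps from the best next action,
    regardless of which action is actually taken next."""
    return reward + gamma * max(next_q_values.values())


def sarsa_target(reward: float, next_q_values: dict, next_action: str, gamma: float) -> float:
    """On-policy target: bootstraps from the action actually taken next."""
    return reward + gamma * next_q_values[next_action]


if __name__ == "__main__":
    # Hypothetical Q-values in the next state.
    next_q = {"continue": 1.0, "stop": 0.0}
    gamma = 0.9

    # Without interruption the agent would pick "continue";
    # an interruption forces "stop" instead.
    print("Q-learning target:", q_learning_target(0.0, next_q, gamma))
    print("Sarsa, uninterrupted:", sarsa_target(0.0, next_q, "continue", gamma))
    print("Sarsa, interrupted:", sarsa_target(0.0, next_q, "stop", gamma))
```

In this toy setting the Q-learning target does not depend on the forced action, whereas the Sarsa target changes when the interruption overrides the agent's choice. That difference is the intuition behind treating the two algorithms differently; the paper's actual results rest on its formal definitions.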
A formal solution to the grain of truth problem
Jan Leike, Jessica Taylor, Benya Fallenstein
A Bayesian agent acting in a multi-agent environment learns to predict the other agents’ policies if its prior assigns positive probability to them (in other words, its prior contains a grain of truth). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as the grain of truth problem. This paper presents a formal and general solution to the full grain of truth problem.
Thompson sampling is asymptotically optimal in general environments
Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
This paper discusses a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. It shows that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.
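For readers unfamiliar with Thompson sampling, the basic idea can be sketched in a much simpler setting than the one the paper studies. The example below is the textbook Bernoulli multi-armed bandit version with Beta posteriors, not the nonparametric variant for general environments analysed in the paper. The function name, arm means, and seed are hypothetical choices for illustration.

```python
import random


def thompson_sampling_bernoulli(true_means, n_steps, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm.

    Assumption: each arm pays 1 with a fixed unknown probability, and the
    agent keeps a Beta posterior per arm starting from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_steps):
        # Draw one plausible mean per arm from its current posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act optimally with respect to the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling_bernoulli([0.2, 0.8], n_steps=2000))
```

The agent samples a hypothesis from its posterior, acts as if that hypothesis were true, and updates on what it observes. Over time the pulls concentrate on the better arm.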
Learning the preferences of ignorant, inconsistent agents
Owain Evans, Andreas Stuhlmüller, Noah D. Goodman
An analysis of what people value and how this relates to machine learning.
Off-policy Monte Carlo agents with variable behaviour policies
This paper looks at the convergence properties of off-policy Monte Carlo agents with variable behaviour policies. It presents results about convergence and lack of convergence.
Nate Soares, Benja Fallenstein, Eliezer Yudkowsky, Stuart Armstrong
An introduction to the notion of corrigibility and analysis of utility functions that attempt to make an agent shut down safely if a shutdown button is pressed, while avoiding incentives to prevent the button from being pressed or cause the button to be pressed, and while ensuring propagation of the shutdown behavior as it creates new subsystems or self-modifies.
Learning the preferences of bounded agents
Owain Evans, Andreas Stuhlmüller, Noah D. Goodman
This paper explicitly models structured deviations from optimality when inferring preferences and beliefs. It uses models of bounded and biased cognition as part of a generative model for human choices in decision problems, and infers preferences by inverting this model.
Working with institutions around the world to reduce risks from especially dangerous pathogens
Rapid developments in biotechnology and genetic engineering will pose novel risks and opportunities for humanity in the decades to come. Arms races involving advanced bioweapons, or the proliferation of such weapons, could pose existential risks to humanity, while advanced medical countermeasures could dramatically reduce these risks. Human enhancement technologies could radically change the human condition. FHI’s biotechnology research group conducts cutting-edge research on the impacts of advanced biotechnology on existential risk and the future of humanity. In addition to research, the group regularly advises policymakers: for example, FHI researchers have consulted with the US President’s Council on Bioethics, the US National Academy of Sciences, the Global Risk Register, and the UK Synthetic Biology Leadership Council, and have served on the board of DARPA’s SafeGenes programme and directed iGEM’s safety and security system.
Featured biotechnology publications
Beyond risk-benefit analysis: pricing externalities for gain-of-function research of concern
Owen Cotton-Barratt, Sebastian Farquhar, Andrew Snyder-Beattie
In this policy working paper, we outline an approach for handling decisions about Gain of Function research of concern.
Human Agency and Global Catastrophic Biorisks
Piers Millett, Andrew Snyder-Beattie
Given that events such as the Black Death and the introduction of smallpox to the Americas have comprised some of the greatest catastrophes in human history, it is natural to examine the possibility of global catastrophic biological risks (GCBRs). In the particularly extreme case of human extinction or permanent collapse of human civilization, such GCBRs would jeopardize the very existence of many thousands of future generations. Does the category of GCBR merit special research effort?
Existential Risk and Cost-Effective Biosecurity
Piers Millett, Andrew Snyder-Beattie
This paper provides an overview of biotechnological extinction risk, makes some estimates for how severe the risks might be, and compares the cost-effectiveness of reducing these extinction-level risks with existing biosecurity work. The authors find that reducing human extinction risk can be more cost-effective than reducing smaller-scale risks, even when using conservative estimates. This suggests that the risks are not low enough to ignore and that more ought to be done to prevent the worst-case scenarios.
Embryo Selection for Cognitive Enhancement: Curiosity or game-changer?
Carl Shulman, Nick Bostrom
In this article, we analyze the feasibility, timescale, and possible societal impacts of embryo selection for cognitive enhancement. We find that embryo selection, on its own, may have significant (but likely not drastic) impacts over the next 50 years, though large effects could accumulate over multiple generations. However, there is a complementary technology – stem cell-derived gametes – which has been making rapid progress and which could amplify the impact of embryo selection, enabling very large changes if successfully applied to humans.