Researchers have proposed several paths by which advances in AI may cause especially long-lasting harms: harms that are especially worrying if we place significant weight on the long-term impacts of our actions. This week, we’ll consider: What are some of the best arguments for and against various claims about how AI poses existential risk?
For context on the field’s current perspectives on these questions, a 2020 survey of AI safety and governance researchers (Clarke et al., 2021) found that, on average, researchers currently guess there is:
- A 10% chance of existential catastrophe from misaligned, influence-seeking AI
- A 6% chance of existential catastrophe from AI-exacerbated war or AI misuse
- A 7% chance of existential catastrophe from “other scenarios”
Note that the above survey’s results involved high levels of uncertainty and disagreement. That disagreement implies that many researchers must be wrong about important questions, which arguably makes skeptical and questioning mindsets (with skepticism toward both your own views and others’) especially valuable.
Core Readings
Additional Recommendations
Analyses of arguments that misaligned, influence-seeking AI poses existential risk:
Clarifications of / elaborations on Christiano’s “What failure looks like,” and subsequent work:
Analyses of sources of risk not focused on misaligned, influence-seeking AI:
On other risks from harmful competitive situations:
[19] I emphasize mean (not median) estimates because mean estimates weigh all surveyed opinions equally, arguably making for more useful overall risk assessments (if two doctors told you some cold medicine was very safe, and one doctor told you it would likely kill you, should you pay more attention to the average of their views or the median?). (Some work suggests the geometric mean would be better; thanks to Michael Aird for flagging this to me. I don’t emphasize that value because I don’t know it and don’t have the raw survey data.)
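As a toy illustration of how these summary statistics can come apart, here is a short sketch with made-up numbers (not the survey’s data), echoing the doctor example above:

```python
# Toy illustration (made-up numbers, not survey data): how mean, median, and
# geometric mean summaries diverge when one respondent's risk estimate is
# much higher than the others'.
from statistics import mean, median, geometric_mean

# Hypothetical probability-of-death estimates from three doctors:
# two say the medicine is very safe, one says it would likely kill you.
estimates = [0.001, 0.001, 0.9]

print(f"mean:           {mean(estimates):.3f}")            # ~0.301 (pulled up by the dissenting view)
print(f"median:         {median(estimates):.3f}")          # 0.001  (ignores the dissenting view entirely)
print(f"geometric mean: {geometric_mean(estimates):.3f}")  # ~0.010 (between the two)
```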
I also emphasize unconditional risk estimates (rather than estimates which condition on some AI existential catastrophe having occurred and just ask what form that catastrophe took); this is because the unconditional estimates seem closer to what we care about (assuming investment into mitigating various risks should depend on the absolute level of risk they pose).
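To make the distinction concrete, here is a minimal sketch with hypothetical numbers (not taken from the survey) showing how a single respondent’s conditional estimate relates to their unconditional estimate:

```python
# Hypothetical single respondent (not survey data): converting a conditional
# estimate into the corresponding unconditional estimate.
p_any_ai_catastrophe = 0.20      # P(some AI existential catastrophe occurs)
p_scenarios_1_to_3_given = 0.50  # P(scenarios 1-3 | some AI existential catastrophe)

# The unconditional probability of scenarios 1-3 is the product of the two.
p_scenarios_1_to_3 = p_any_ai_catastrophe * p_scenarios_1_to_3_given
print(p_scenarios_1_to_3)  # 0.1
```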
Still, for transparency, here are the other statistics from the survey that I was shown (in addition to the ones in the published overview), shared here with permission (see the published overview for definitions of scenarios 1-5):
- Conditional probability for scenarios 1-3 together: Mean 49%; Median 50%.
- Unconditional probability for scenarios 1-3 together: Mean 10%; Median 6%.
- Unconditional probability for scenarios 4-5 together: Mean 7%; Median 2%.
- Unconditional probability for "Other scenarios": Mean 6%; Median 3%.
[20] Some of these summary statistics (shared with permission) are from the full results rather than the public overview.
[21] This is the mean of the total risk levels that respondents assigned to scenarios 1, 2, and 3 in the survey, scenarios that all (arguably) center on misaligned, influence-seeking AI. Note that, as discussed in the linked survey overview, there is also wide disagreement over which of scenarios 1, 2, and 3 is most likely.