Artificial general intelligence

ℹ️ This is based on the 2023 AGI Safety Fundamentals curriculum.

Artificial general intelligence (AGI) is the key concept underpinning this course, so it's important to start by exploring what we mean by AGI and examining the reasons for thinking that the field of machine learning is heading towards it.

First, we will examine the current state of machine learning and then consider what AGI is. These two topics will help you form your views on whether modern machine learning is heading towards the development of AGI.

Second, we will consider how these capabilities might develop over time. We'll cover a report that estimates how long it will take before the compute required to train a human-equivalent intelligence becomes affordable (a toy version of this calculation is sketched below), as well as arguments that scaling current techniques leads to greater, and potentially more general, capabilities.
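
To make this concrete, here is a minimal back-of-envelope sketch (in Python) of the style of reasoning used in biological-anchors forecasts. Every constant below is an illustrative assumption, not a figure from the report; the point is only to show how an anchor for training compute, a hardware price-performance trend, and a spending ceiling combine into a forecast year.

    # Back-of-envelope sketch of a biological-anchors-style forecast.
    # All constants are illustrative assumptions, not the report's estimates.
    ANCHOR_TRAINING_FLOP = 1e34  # assumed total FLOP to train a human-equivalent model
    FLOP_PER_DOLLAR_BASE = 1e17  # assumed hardware price-performance in the base year
    DOUBLING_TIME_YEARS = 2.5    # assumed doubling time of FLOP per dollar
    MAX_BUDGET_DOLLARS = 1e9     # assumed largest plausible training budget

    def affordable_flop(year):
        """Total FLOP purchasable in a given year under the assumptions above."""
        growth = 2 ** ((year - 2023) / DOUBLING_TIME_YEARS)
        return FLOP_PER_DOLLAR_BASE * growth * MAX_BUDGET_DOLLARS

    # Step forward year by year until the anchor fits within the budget.
    year = 2023
    while affordable_flop(year) < ANCHOR_TRAINING_FLOP:
        year += 1

    print(f"Under these assumptions, the anchor becomes affordable around {year}.")

Shifting the anchor by a few orders of magnitude, or the doubling time by a year, moves the answer by decades; this sensitivity to assumptions is exactly what the Alexander reading below examines.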

Finally, we'll examine texts that speculate about the potential step changes in ML capabilities still to come.

Core readings:

• Visualizing the deep learning revolution (Ngo, 2022) (20 mins)
• On the opportunities and risks of foundation models (Bommasani et al., 2022) (only pages 3-6, focusing mostly on understanding what figures 1 & 2 are communicating) (10 mins)
• Four background claims (Soares, 2015) (15 mins)
• AGI safety from first principles (Ngo, 2020) (only sections 1 and 2.1) (15 mins)
• Why and how of scaling large language models (Joseph, 2022) (only first 5 minutes, stopping at 'parallelization') (5 mins)
• Biological Anchors: A Trick That Might Or Might Not Work (Alexander, 2022) (only Part I, ending at “How sensitive is this to changes in assumptions”) (20 mins)
• For those with extensive ML background:
  1. Future ML systems will be qualitatively different (Steinhardt, 2022) (10 mins)
  2. More Is Different for AI (Steinhardt, 2022) (5 mins)
• Intelligence explosion: evidence and import (Muehlhauser and Salamon, 2012) (only pages 10-15) (15 mins)

Optional readings:

Successes of deep learning:

• Collection of GPT-3 results (Sotala, 2020): Sotala collects many examples of sophisticated behavior from GPT-3.
• Creating a Space Game with OpenAI Codex (OpenAI, 2021) (10 mins)
• CICERO: an AI agent that negotiates, persuades, and cooperates with people (Bakhtin et al., 2022)
• AlphaStar: mastering the real-time strategy game StarCraft II (Vinyals et al., 2019) (20 mins)
• Generally capable agents emerge from open-ended play (DeepMind, 2021) (25 mins)

AGI:

• Three Impacts of Machine Intelligence (Christiano, 2014) (15 mins)
• AI: racing towards the brink (Harris and Yudkowsky, 2018) (110 mins) (audio here)
• Most important century (Karnofsky, 2021)
• General intelligence (Yudkowsky, 2017) and The power of intelligence (Yudkowsky, 2007) (35 mins)
• Understanding human intelligence through human limitations (Griffiths, 2020) (40 mins)

Scaling and AI forecasting:

• The Bitter Lesson (Sutton, 2019) (5 mins)
• AI and compute: how much longer can computing power drive AI progress? (Lohn and Musser, 2022) (30 mins)
• AI and efficiency (Hernandez and Brown, 2020) (15 mins)
• 2022 expert survey on progress in AI (Stein-Perlman, Weinstein-Raun and Grace, 2022) (15 mins)
• AI Forecasting: One Year in (Steinhardt, 2022) (10 mins)
• Large language models can self-improve (Huang et al., 2022)

Notes:

Instead of AGI, some people use the terms “human-level AI” or “strong AI”. “Superintelligence” refers to AGI which is far beyond human-level intelligence. The opposite of general AI is called narrow AI. Some readings instead focus on the concept of transformative AI, defined as AI which has effects as large as (or larger than) the Industrial Revolution. In theory this could be achieved using narrow AI, but in practice it seems likely to be roughly equivalent to AGI.

Next in the AGI Safety Fundamentals curriculum

Reward misspecification and instrumental convergence