Agent foundations or AI governance, and careers in alignment

This is based on the 2023 AGI Safety Fundamentals curriculum.

This week, participants can choose between two topics: agent foundations research and AI governance research. Both streams then spend the second half of the session on careers in alignment.

The first two readings cover the agent foundations research agenda, which aims to develop better theoretical frameworks for describing AIs embedded in real-world environments. This agenda is pursued primarily by the Machine Intelligence Research Institute (MIRI).

The next two readings cover AI governance. In Clarke’s (2022) taxonomy (pictured below), they focus on strategy research, tactics research and field-building, not on developing, advocating or implementing specific policies. Those interested in exploring AI governance in more detail, including evaluating individual policies, should look at the curriculum for the parallel AI governance track of this course.

We finish with a compilation of resources related to careers in alignment.

Core readings:

Embedded agents, part 1 (Demski and Garrabrant, 2018) (15 mins)

Read two of the following three blog posts, which give brief descriptions of work on agent foundations.

Logical induction: blog post (Garrabrant et al., 2016) (10 mins)

Garrabrant et al. (2016) provide an idealized algorithm for induction under logical uncertainty (e.g. uncertainty about mathematical statements).

Logical decision theory (Yudkowsky, 2017) (only up to the beginning of the “Evidential versus counterfactual conditioning” section) (10 mins)

Yudkowsky (2017) outlines a novel decision theory which accounts for correlations between the decisions of different agents.

Progress on causal influence diagrams: blog post (Everitt et al., 2021) (15 mins)

Everitt et al. (2021) formalize the concept of an RL agent having an incentive to influence different aspects of its training setup.
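
The kind of correlation that logical decision theory is meant to capture can be illustrated with a toy case: a one-shot Prisoner's Dilemma against an exact copy of yourself. This is a minimal sketch, not Yudkowsky's formalism; the payoff numbers and the function names `causal_choice` and `logical_choice_against_twin` are hypothetical. Reasoning that treats the other player's move as independent of your own always defects, while reasoning that accounts for both copies computing the same output compares only the outcomes in which both choose alike.

```python
from typing import Dict, Tuple

# Hypothetical payoffs for a one-shot Prisoner's Dilemma: (my move, other's move) -> my payoff.
# "C" = cooperate, "D" = defect.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
MOVES = ("C", "D")


def causal_choice(p_other_cooperates: float) -> str:
    """Treat the other player's move as independent of my choice.

    For any assumed probability p that the other player cooperates, defecting
    has the higher expected payoff (4p + 1 versus 3p), so this returns "D".
    """
    def expected_payoff(move: str) -> float:
        p = p_other_cooperates
        return p * PAYOFF[(move, "C")] + (1 - p) * PAYOFF[(move, "D")]

    return max(MOVES, key=expected_payoff)


def logical_choice_against_twin() -> str:
    """Assume (hypothetically) that the other player runs this exact procedure.

    Both copies must then output the same move, so only the outcomes
    (C, C) and (D, D) are possible; pick the better of the two.
    """
    return max(MOVES, key=lambda move: PAYOFF[(move, move)])


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, causal_choice(p))  # the move printed is "D" for every p
    print(logical_choice_against_twin())  # prints "C"
```

The sketch hard-codes the correlation by assuming an exact copy; a real decision theory has to say what to do when the link between two agents' decisions is less clear-cut than that.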
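
To make "an incentive to influence part of the training setup" concrete, here is a minimal sketch, assuming the graphical criterion for an instrumental control incentive in a single-decision causal influence diagram as we understand it from Everitt et al.'s work: the agent has such an incentive on a node X when some directed path runs from the decision node, through X, to the utility node. The graph `TOY_CID`, its node names (including the stand-in "RewardSensor" for a manipulable piece of the training setup), and the helper functions are all hypothetical, and this covers only one of the incentive concepts in that line of work.

```python
from typing import Dict, List, Set

# A hypothetical single-decision causal influence diagram, encoded as a DAG
# mapping each node to the nodes it directly influences.
# "Action" is the decision node and "Reward" is the utility node;
# "RewardSensor" stands in for a part of the training setup the action can affect.
TOY_CID: Dict[str, List[str]] = {
    "State": ["Action", "NextState"],
    "Action": ["NextState", "RewardSensor"],
    "NextState": ["RewardSensor"],
    "RewardSensor": ["Reward"],
    "Reward": [],
}


def descendants(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start` by a directed path (including `start`)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def instrumental_control_incentive(
    graph: Dict[str, List[str]], decision: str, utility: str, node: str
) -> bool:
    """Check for a directed path decision -> ... -> node -> ... -> utility.

    In a DAG, joining a path from `decision` to `node` with a path from `node`
    to `utility` always yields a single directed path through `node`,
    so two reachability checks suffice.
    """
    return node in descendants(graph, decision) and utility in descendants(graph, node)


if __name__ == "__main__":
    for node in ("State", "NextState", "RewardSensor"):
        print(node, instrumental_control_incentive(TOY_CID, "Action", "Reward", node))
    # State False / NextState True / RewardSensor True
```

"State" gets no incentive because it lies upstream of the decision, whereas "RewardSensor" does, because the action can reach the reward through it rather than only through the intended state change.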

AI Governance: Opportunity and Theory of Impact (Dafoe, 2020) (25 mins)

Cooperation, conflict and transformative AI: sections 1 & 2 (Clifton, 2019) (25 mins)

Careers in alignment (Ngo, 2022) (30 mins)

Optional readings:

Agent foundations:

MIRI’s approach (Soares, 2015) (25 mins)

Cheating Death in Damascus: blog post (Soares and Levenstein, 2017) (10 mins)

Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents (Critch, 2016) (40 mins)

Infra-bayesianism unwrapped (Shimi, 2021) (45 mins)

Causal inference in statistics: a primer (Pearl et al., 2016)

AI governance:

The semiconductor supply chain (Khan, 2021) (up to page 15) (15 mins)

Assessing the new semiconductor export controls (Reynolds, 2022) (10 mins)

The global AI talent tracker (Macro Polo, 2020) (5 mins)

Sharing powerful AI models (Shevlane, 2022) (10 mins)

Deciphering China’s AI dream (Ding, 2018) (95 mins) (see also his podcast on this topic)

AI Governance: a research agenda (Dafoe, 2018) (120 mins)

Some AI governance research ideas (Anderljung and Carlier, 2021) (60 mins)

Our AI governance grantmaking so far (Muehlhauser, 2020) (15 mins)

The longtermist AI governance landscape: a basic overview (Clarke, 2022) (15 mins)

Notes:

“Accident” risks, as discussed in Dafoe (2020), include the standard risks due to misalignment which we’ve been discussing for most of the course. I don’t usually use the term, because “accident” has connotations of being non-deliberate, whereas misalignment risks would be driven by “deliberate” misbehavior from AIs.

Compared with the approaches discussed over the last few weeks, agent foundations research is less closely connected to existing systems and more focused on developing new theoretical foundations for alignment. As a result, there is considerable disagreement about how relevant it is to deep-learning-based systems.
