The Alignment Problem

"The government takes the long term risk of non-aligned Artificial General Intelligence, and the unforeseeable changes that it would mean for the UK and the world, seriously." - UK National AI Strategy

Resources

Why AI alignment could be hard with modern deep learning (Cotra, 2021) (20 mins)

This article introduces two ways in which modern deep learning techniques may produce misaligned AI systems.

Specification gaming: the flip side of AI ingenuity (Krakovna et al., 2020) (15 mins)

DeepMind researchers elaborate on the difficulty of adequately specifying objectives for AI systems. Failure to solve this problem may produce what the first reading refers to as “sycophants.”

Inner Alignment: Explain Like I'm 12 Edition (Harth, 2020) (15 mins)

This piece elaborates on the risk of “deceptive alignment,” another potential challenge in aligning AI systems with human values. It may lead to what the first reading refers to as “schemers.”

AI alignment landscape (Christiano, 2020) (30 mins)

This talk maps out the strategic landscape of AI alignment, showing how people (including those working in AI governance) can contribute to solving it.