Can we truly align AI with human values? - Q&A with Brian Christian — University of Oxford, Medical Sciences Division

You are already a highly successful writer, researcher, and influencer in AI and ethics. Why did you decide – aged 39 years – to go back to being a student and start a PhD?

One of the fun things about being an author is that you get to have an existential crisis every time you finish a book. When The Alignment Problem was published in 2021, I found myself ‘unemployed’ again, wondering what my next project would be. But writing The Alignment Problem had left me with a sense of unfinished business. As with all my books, the process of researching it had been driven by a curiosity that doesn’t stop when it reaches the frontiers of knowledge: what began as scholarly and journalistic questions simply turned into research questions. I wanted to be able to go deeper into some of those questions than a lay reader would necessarily be interested in.

The Alignment Problem is widely said to be one of the best books about artificial intelligence. How would you summarise the central idea of the book?

The alignment problem is essentially about how we get machines to behave in accordance with human norms and human values. We are moving from traditional software, where behaviour is manually and explicitly specified, to machine learning systems that essentially learn by examples. How can we be sure that they are learning the right things from the right examples, and that they will go on to actually behave in the way that we want and expect?

This is a problem which is getting increasingly urgent, as not only are these models becoming more and more capable, but they are also being more and more widely deployed throughout many different levels of society. The book charts the history of the field, explains its core ideas, and traces its many open problems through personal stories of about a hundred individual researchers.

So, how does this relate to your DPhil project?

Under the supervision of Professor Chris Summerfield in the Human Information Processing lab (part of the Department of Experimental Psychology), I am exploring how cognitive science and computational neuroscience can help us to develop mathematical models that capture what humans actually value and care about. Ultimately, these could help enable AI systems that are more aligned with humans – and which could even give us a deeper understanding of ourselves.

Read the full interview on the University of Oxford website.
