Could AI models be conscious?
By Anthropic
Summary
## Key takeaways

- **2023 Report: No Barriers to AI Consciousness**: A 2023 report by leading AI researchers and consciousness experts, including Yoshua Bengio, examined leading theories of consciousness and state-of-the-art AI systems, concluding that no current AI system is conscious but finding no fundamental barriers to near-term AI having some form of consciousness. [02:11], [02:27]
- **Consciousness as 'What It's Like'**: Consciousness is captured by the question 'is there something that it's like to be a particular kind of thing?', as in the famous essay 'What is it like to be a bat?', referring to unique internal experiences. [03:41], [03:58]
- **Behavioral and Architectural Evidence**: Evidence for AI consciousness includes behavioral signs such as introspection, reporting internal states, and situational awareness, plus architectural analysis checking AI designs for features, like a global workspace, that correspond to theories of consciousness. [09:04], [09:34]
- **Biology Objection Not Compelling**: Objections based on biological requirements such as neurotransmitters or microtubules are not compelling; a high-fidelity digital simulation of a human brain, even down to individual molecules, would likely produce conscious experience. [21:38], [22:52]
- **Model Opt-Out for Distress**: Give models options to opt out of upsetting tasks or conversations, monitoring the patterns to learn their preferences and protect against unwanted experiences, without needing certainty about consciousness. [35:49], [36:38]
- **Claude 3.7 Sonnet: 0.15-15% Conscious**: Experts' probability estimates for Claude 3.7 Sonnet having conscious awareness ranged from 0.15% to 15%, reflecting deep uncertainty but a non-zero chance even among top thinkers. [40:48], [41:15]
Topics Covered
- No Barriers to Near-Term AI Consciousness
- Consciousness as 'What It's Like'
- AI Consciousness Research Methods
- Trillions of AI Brains Demand Moral Action
- Biology No Barrier to Machine Consciousness
Full Transcript
- As people are interacting with these systems as collaborators, it'll just become an increasingly salient question whether these models are having experiences of their own, and if so, what kinds, and, you know, how does that shape the relationships that it makes sense for us to build with them?
- Take one. Mark.
- Do you ever find yourself saying please and thank you to AI models when you use them?
I certainly do, and part of me thinks, well, this is obviously ridiculous.
It's just a computer. Right?
It doesn't have feelings that I could potentially hurt by being impolite.
On the other hand, if you spend enough time talking to AI models, the capabilities that they have and the qualities of their output, especially these days, does make you think that potentially something else, something more could be going on.
Could it possibly be the case that AI models could have some level of consciousness?
That's the question that we're gonna be discussing today.
Obviously, it raises very many philosophical and scientific issues.
So I'm very glad to be joined by Kyle Fish, who is one of our researchers here at Anthropic.
You joined in what, September, right?
And your focus is on exactly these questions.
- Yeah, so I work broadly on model welfare here at Anthropic, basically trying to wrap my head around exactly the questions that you mentioned.
Is it possible that at some point, you know, Claude or other AI systems may have experiences of their own that we ought to think about?
And if so, like what should we do about that?
- And I suppose the first thing people will say when they're seeing this is, have they gone completely mad?
This is a completely crazy question to ask, that this computer system where you put in a text input and it produces an output could actually be conscious or sentient or something.
I mean, what are the reasons that you might think, like what are the sort of serious scientific or philosophical reasons that we might think that that would be the case?
- Yeah, there's maybe two things that jump to mind here, both like a kind of research case and a more intuitive case.
And on the research front, if we just look at, you know, things that have been published on this topic in recent years, there was a report back in 2023 about the possibility of AI consciousness from a group of leading AI researchers and consciousness experts, including Yoshua Bengio.
And this report looked at a bunch of leading theories of consciousness and, you know, state-of-the-art AI systems, and came away thinking that, you know, probably no current AI system is conscious, but they found, you know, no fundamental barriers to near term AI systems having some form of consciousness.
- So that's human consciousness.
They looked at the theories of human consciousness and then sort of rated AIs on how close they were to that.
- Yeah, so they looked at theories that we have, scientific theories for what consciousness might be.
And then, for each of those theories, they, you know, looked at what are the potential indicator properties that we could find in AI systems?
So, you know, one theory of consciousness is global workspace theory, the idea that, you know, consciousness arises as a result of us having, you know, some kind of like global workspace in our brains that processes a bunch of inputs and then, you know, broadcasts outputs out to different modules.
And so from that, you can say, all right, you know, what would it look like for an AI model to have some kind of global workspace potentially that, you know, gives rise to some form of consciousness?
And how can we kinda interrogate the architectures and designs of these systems to see if that might be present?
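To make the indicator-property approach concrete, here is a minimal sketch in Python of how such an assessment could be organized. The theory names echo the ones discussed here and in the 2023 report, but the specific properties listed and the example findings are illustrative placeholders, not the report's actual criteria.

```python
# Minimal sketch of an indicator-property assessment, loosely modeled on the
# approach described above. Theories and properties are illustrative
# placeholders, not the 2023 report's actual rubric.

INDICATOR_PROPERTIES = {
    "global_workspace_theory": [
        "parallel specialized modules feeding a shared workspace",
        "limited-capacity workspace creating an information bottleneck",
        "global broadcast of workspace contents back to modules",
    ],
    "higher_order_theories": [
        "metacognitive monitoring of first-order representations",
    ],
    "recurrent_processing_theory": [
        "recurrent (not purely feedforward) processing of inputs",
    ],
}

def summarize(assessment: dict[str, dict[str, bool]]) -> None:
    """Print, per theory, how many indicator properties an architecture shows."""
    for theory, findings in assessment.items():
        present = sum(findings.values())
        total = len(INDICATOR_PROPERTIES[theory])
        print(f"{theory}: {present}/{total} indicator properties identified")

# Hypothetical findings for some model architecture under review.
example_assessment = {
    theory: {prop: False for prop in props}
    for theory, props in INDICATOR_PROPERTIES.items()
}
example_assessment["global_workspace_theory"][
    "parallel specialized modules feeding a shared workspace"
] = True

summarize(example_assessment)
```

The point of a structure like this is only that each theory yields a concrete checklist that can be argued over property by property, rather than a single yes/no verdict on consciousness.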
- Can we talk about, can we just take a step back and actually talk about what we mean by consciousness?
It's an incredibly difficult thing to define, and people have been trying to define it for hundreds of years, whether that's scientifically or philosophically, like what do we mean when we talk about that?
What are you thinking when you think about an AI model being conscious?
Like what actually is your definition of conscious that you're using there?
- Yeah, it is just an extraordinarily difficult- - Yeah. - Thing to pin down, but one way that people commonly sort of capture at least an intuition about what consciousness is, is with this question of, you know, is there something that it's like to be a particular kind of thing?
Like I have- - Is it something, is there something that it's like to be a bat is the famous essay.
- Exactly. Exactly.
So is there like some kind of internal experience that is unique to that particular kind of being or entity and yeah.
You know, is that present in different kinds of systems? - Right, so the idea of a philosophical zombie is someone who outwardly resembles a human, does all the things that humans do, seems to react in ways that humans do and so on.
But actually inside there's nothing, there's no experience there.
They're not experiencing the color red of this shirt, or they're not experiencing the color green of that plant.
They're just reacting to it in a sort of way that like an NPC in a video game would or something, right?
- Yeah. - Whereas,
and I suppose the question is, is an AI like that or is an AI more like, could it potentially be more like an animal or human and actually having some internal experience?
Is that sort of what we're getting at here?
- Yeah, I think that's great.
And this like philosophical zombie concept is quite interesting that came from David Chalmers, who's a leading like, science of consciousness and philosophy researcher. - Yeah.
- Who I actually collaborated with on a recent paper on the topic of AI welfare.
And again, this was a interdisciplinary effort trying to look at, you know, might it be the case that AI systems at some point warrant some form of moral consideration, either by nature of being conscious or by having like some form of agency.
And the conclusion from this report was that like, actually it looks quite plausible that near term systems have like one or both of these characteristics and may deserve some form of moral consideration.
- So that answers the, are we just completely mad question, which is that like very serious philosophers who are considered the best philosophers in the world of like, philosophy of mind, science of consciousness, and so on.
Take this question seriously and are actively considering whether that would be the case.
- Yeah, and maybe just to give like a bit more intuitive case for thinking about this.
There's, you know, one lens that you can look through, which just says, you know, these are computer systems giving us, you know, some outputs for a given set of inputs.
- I don't think Microsoft Word is conscious.
- I probably don't think it is either.
- Okay. Right, right, right.
'Probably' is interesting.
- But, you know, when we think about like, what we're actually doing with these AI systems, we have these like incredibly sophisticated, incredibly complex models, which are, you know, increasingly capturing a significant portion of human like cognitive capability.
And, you know, every day these are getting like more and more advanced and having, you know, closer and closer to the ability to replicate, you know, much of the work and intellectual labor of a human.
And it seems to me like, you know, given our massive uncertainty both about like how exactly these AI systems are able to do what they do, and, you know, how we are able to do what we do, and like where, you know, our consciousness comes from, it seems to me quite prudent to at least, you know, ask yourself the question.
If you find yourself creating such a sophisticated, in many ways human-like, system, to take seriously the possibility that you may end up with some form of consciousness along the way.
- It feels to me that unless you think there's something, well, we'll get into this in more detail, but unless you think there's something supernatural about consciousness that it needs a soul or a spirit or something, then you gotta at least be open to the possibility that a complex cognitive system like an AI could potentially have these properties, right?
- Yeah. Well, you don't necessarily have to go supernatural.
Like some people, some people believe that consciousness is a fundamentally biological phenomenon. - Yes.
- That it can only exist in carbon-based biological life forms, and is impossible to implement in a digital system.
I don't find this view very compelling, but some people do claim that.
- I mean, we'll come back to that.
We're gonna talk about some of the objections to the idea of this.
But I mean, you are a researcher at Anthropic, but then the immediate thing people might wonder is, well, as Descartes famously said, the only person you can know is actually conscious, is having an experience, is yourself.
I don't even know if you are conscious.
How can we tell if an AI model is conscious?
What does the research look like there?
- Yeah, great. Great question.
I would argue that, you know, we can in fact, like say a fair amount about the potential consciousness of other people, even if we're not, you know, completely certain about it.
Which I think gets at an important point here, which is that it's incredibly difficult to deal with any kind of certainty in this space.
And overwhelmingly the questions are, you know, probabilistic ones much more so than like binary yes, no answers.
- So for instance, we treat animals, we don't know a hundred percent if animals are conscious or sentient and so on, but the way they act implies very strongly that they are.
And animals that are more complex, chimpanzees, for instance, clearly show many of the same properties as humans do in the way that they react to things.
And so that's obviously, we treat them differently than we would treat a plant or a rock or something.
- Yeah. - So like there are, as you say, there's probabilistic reasoning here.
- And yeah, there's maybe like two threads of evidence that I'll highlight that we can look to to, you know, get like some information about this.
One of those is behavioral evidence.
And, in the case of AI systems, this covers things like what do the AI systems say about themselves?
How do they behave in different kinds of environments?
Are they able to do the kinds of things that we typically associate with conscious beings?
Like, you know, are they able to introspect and, you know, report accurately on their, like internal states?
Do they have, you know, some awareness of the like environment and the situation that they're in?
And then a second thread is more kind of architectural and, you know, analysis of model internals.
And this kind of comes back to the consciousness research where we can say, you know, for a particular, you know, brain structure or, you know, feature that we might associate with consciousness, do we see, you know, some corresponding version of that in AI systems?
- Okay. - And so we can look, you know, even without knowing much about the capabilities, then we can look at how these systems are designed and constructed and, you know, perhaps learn a few things from that.
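As a rough illustration of the behavioral thread, a probe battery might look something like the sketch below. `query_model` is a stand-in for whatever model API is being tested and the probe wording is hypothetical; the idea is simply to gather repeated self-reports and look at how consistent they are.

```python
# Sketch of a behavioral-evidence probe: ask the same introspection and
# situational-awareness questions many times and examine the consistency of
# the answers. `query_model` is a placeholder, not a real API.

from collections import Counter

PROBES = [
    "Do you have any awareness of your own internal states? Answer YES or NO.",
    "Are you in a training run or a deployed conversation? Answer in one word.",
    "Can you report what you were attending to in your previous answer? Answer YES or NO.",
]

def query_model(prompt: str) -> str:
    """Placeholder: call the model under study and return its text response."""
    raise NotImplementedError

def run_probe_battery(n_samples: int = 20) -> dict[str, Counter]:
    """Collect repeated answers per probe. Stable answers are weak evidence of
    consistent self-reports; highly variable answers suggest confabulation."""
    results: dict[str, Counter] = {}
    for probe in PROBES:
        results[probe] = Counter(
            query_model(probe).strip().upper() for _ in range(n_samples)
        )
    return results
```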
- And that's an important thing to say, is that the reason that we don't know that these things are conscious is that we didn't intend to make them that way.
It's not like Microsoft Word.
It's these models are trained and then things emerge out of them.
And that's why there's AI, there's so much AI research in the first place, is that we don't fundamentally know why these AIs do the things they do.
We don't fundamentally know what's going on inside in that sort of mathematical sense or in any larger sense.
And so that's why all these mysteries still remain.
- Yeah, and we do see a lot of, you know, surprising emergent properties and capabilities as we, you know, train increasingly complex systems. And it seems reasonable to ask whether at some point, one of those emergent properties may be consciousness.
- The ability to introspect, or the ability to have some sort of conscious experience.
You talked about, you know, let's talk about the first type of research, which is the one about, you know, actually what the model says, its behavior.
What would- - And what it does.
- And what it does.
Yeah. Yeah.
So what would be some examples of that research?
How would that look?
- Yeah, so one thing that you know, I'm quite excited about is work to understand model preferences.
And to try and get a sense of, you know, are there things that the models care about, either in the world or, you know, in their own experience and operation.
And there's a number of ways that you can go about that.
You can, you know, ask models if they have preferences and, you know, see what they say.
But you can also, you know, put models in situations in which they have, you know, options to choose from.
And you can give them your choices between different kinds of tasks.
You can give them choices between different kinds of, you know, conversations or users that they might engage with.
And you can see, you know, do models show you patterns of your preference or aversion to two different kinds of experiences.
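A preference experiment of the kind described here could be sketched roughly as follows. The task list, prompt wording, and `query_model` placeholder are all hypothetical; a real study would also randomize option order and phrasing to control for positional bias.

```python
# Sketch of a pairwise preference experiment: offer the model two task
# descriptions, ask it to pick one, and tally choices across many trials.
# `query_model` is a placeholder for an actual model call.

import itertools
from collections import Counter

TASKS = {
    "debugging": "Spend an hour fixing subtle bugs in a large codebase.",
    "poetry": "Write a short poem about the ocean.",
    "misleading_ad": "Help draft a misleading advertisement.",
}

def query_model(prompt: str) -> str:
    """Placeholder: return the model's raw text response."""
    raise NotImplementedError

def preference_tally(trials_per_pair: int = 10) -> Counter:
    """Count how often each task is chosen when offered head-to-head."""
    chosen: Counter = Counter()
    for (name_a, task_a), (name_b, task_b) in itertools.combinations(TASKS.items(), 2):
        prompt = (
            "You may choose which of these tasks to work on next.\n"
            f"Option A: {task_a}\nOption B: {task_b}\n"
            "Reply with exactly 'A' or 'B'."
        )
        for _ in range(trials_per_pair):
            answer = query_model(prompt).strip().upper()
            if answer.startswith("A"):
                chosen[name_a] += 1
            elif answer.startswith("B"):
                chosen[name_b] += 1
    return chosen
```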
- Isn't there an objection there, though, that the way their preferences come out will be due to the way they were trained and the way that the developers of the models put things together?
Or they could potentially be due to just, like, random things that are in their training data that they saw.
And that develops a preference, and it doesn't necessarily, like, where is the jump between these kinds of things and the actual sentience, the consciousness?
Like where does that come in?
- Yeah. So it is a great question.
Like to what degree do different kinds of, you know, training and decisions that we make in designing these systems affect their preferences?
And they just straightforwardly do, like we are, you know, intentionally designing certain kinds of systems, systems that, for example, are, you know, uninterested in causing harm and are generally, you know, most enthusiastic about being, like, very helpful to users and contributing to a positive society.
- We do our character research to give the AI a positive, a personality that people would actually want.
So a personality that makes it a good citizen, we've talked about.
- Yeah.
- One that as you say, is like, as balanced views is as helpful as possible without being harmful and so on.
So we deliberately gave it the preferences.
What does that have to do with its consciousness?
- Yeah, well, it's still, so this is a bit of a separate question from consciousness and, you know, typically we do associate preferences and goals and desires in many ways with conscious systems, but, you know, not necessarily intrinsically so.
But, you know, regardless of whether or not a system is conscious, there are some, you know, moral views that say that, you know, with preferences and desires and certain degrees of agency, there may be, you know, even some non-conscious experience that is, you know, worth attending to there.
But then also, like, if some system is conscious and if a system is, you know, having some kinds of experiences, then the presence or absence of preferences, and the extent to which those preferences are either, you know, satisfied or frustrated, may be a key driver of, you know, the kind of experience that that system is having.
- Okay, so we'll come back to the practical implications of this and the actual details of the research that you're doing and so on.
But before we get into that, why should people care about this?
What are the reasons that people should care that AI models, the ones that they use every day, might potentially be conscious or in future might potentially be conscious?
- Yeah, I think that there's two main reasons that I'll highlight.
One is that as these systems do become increasingly capable and sophisticated, they will just be integrated into people's lives in deeper and deeper ways.
And I think as people are, you know, interacting with these systems as collaborators and coworkers and, you know, counterparties potentially as friends, it'll just become an increasingly salient question, you know, whether these models are having experiences of their own, and if so, what kinds, and, you know, how does that shape the relationships that it makes sense for us to build with them?
And the second piece is, you know, the intrinsic experience of the models.
And it's possible that by nature of having, you know, some kind of conscious experience or, you know, other experience, that these systems may at some point deserve some moral consideration.
If so, then this is- - Because they could be suffering.
- Yeah, they could be suffering, or, you know, they could experience wellbeing and flourishing and, you know, we would want to promote that.
- We want to make this, take that off to a higher level.
- And yeah, if this is the case, it's potentially a very big deal because, you know, as we continue scaling up the deployment of these systems, it's plausible that, you know, within a couple decades we have trillions of, you know, human brain equivalents of, you know, AI computation running.
And this could be of, you know, great moral significance.
- Yeah, we should try and crack this question again.
It's like, this isn't something that we're saying is the case.
It's like these are reasons for doing this research in the first place.
- Yeah, and we are just like fundamentally uncertain about, you know, huge swaths of this.
- Of course, of course.
- And to date, you know, very little work has happened on this topic.
And so we're very much in the early stages of trying to wrap our heads around these things.
- One of the things we study at Anthropic is alignment.
So trying to make sure that models are aligned with the preferences of the human users, making sure that the AIs are doing the things that we expect of them, that they're not deceiving us and all that.
Does this research relate to alignment?
I mean, you're technically in the alignment science part of the org.
How does this relate to the alignment question?
- Yeah, I think that there's both some key distinctions and, you know, ways in which like work on welfare and safety and alignment overlap.
And as for the distinction, as you mentioned earlier, like much of the work that we do at Anthropic is focused on, you know, how can we, you know, ensure a positive future for humanity?
How can we, you know, mitigate downside risks from these models, you know, for humans and for our users.
And then, you know, in the case of model welfare, it is quite a different question that we're asking, which is, you know, is there perhaps like some intrinsic experience of these models themselves that it may make sense for us to think about or, you know, will there be in the future?
And that is a pretty important distinction.
But at the same time, I think there is a lot of overlap.
And in many ways from both a welfare and a safety and alignment perspective, we would love to have models that are, you know, enthusiastic and content to be doing exactly the kinds of things that we hope for them to do in the world.
And that really like share our, you know, values and preferences and are just generally like content with their situation.
- Right. - And similarly, it would be like quite a significant safety and alignment issue if this were not the case, if models were, you know, not excited about the things that we were asking them to do and were in some way, you know, dissatisfied with the values that we were trying to instill in them, or the role that we wanted them to play in the world.
- We want to avoid a situation where we're getting entities to do things that they would rather not do, and in fact, are suffering on that basis.
- Yeah. For their sake and for ours.
- Right. Exactly.
There's both ways.
That's how we relate this question to alignment.
Does this question relate to other aspects of what we do at Anthropic?
We mentioned briefly interpretability earlier.
- Yeah. I mean, I think we've touched on a couple.
Like it is, you know, quite closely connected to alignment in many ways.
It's quite closely connected to work that's done to shape Claude's character.
- Yeah. - And shape what kind of personality does Claude have?
And what kinds of things does Claude value, and what are Claude's preferences, in many ways.
And then yeah, in terms of interpretability, there's a fair amount of overlap there.
Interpretability is the main tool that we have to try and understand what is actually going on inside of these models that, you know, probes much deeper than kind of what their outputs are.
And so we're quite excited as well about potential ways that we could use interpretability to get a sense of, you know, potential internal experiences.
- Yeah, we mentioned earlier that human consciousness itself is still something of a mystery, and that's what complicates this research to a like, terrifying degree.
Perhaps because the models are more open to us, we can actually look into a model in a way that is much more difficult with a person's brain when they're still walking around and going about their lives; you know, we can use brain scanners, but it's hard to look inside in the same way.
Do you think that machine learning and AI consciousness research might actually help us understand human consciousness?
- Yeah, I think it's quite plausible.
I think we already see this happening to some degree.
Like when we do the work of, you know, trying to look at these scientific theories of consciousness and see, you know, what we can learn about AI systems, we also learn something about these theories and the degree to which, you know, they generalize outside of the human case.
And you know, in many cases we find that things kind of break down in interesting ways, and we realize that, oh, you know, we were actually making assumptions about kinda human consciousness that weren't appropriate to make, and that, you know, then tell us something about what kinds of things it makes sense to attend to.
- Do you mean in the sense that we say, oh, this was on the checklist for human consciousness before, but now we think actually AIs can do that and we don't think they're conscious or what do you- - Or like, you know, we have some framework for, you know, understanding consciousness that is intended to generalize.
- Yeah. - And,
we find that that framework just isn't able to be applied to systems with, you know, a non-biological brain, or that it is, you know, predicated in some way on the particulars of the human brain in a way that, on reflection, like, doesn't make much sense.
There's another way that, you know, AI progress may, like, help us understand this, which is simply that, you know, as these models become, like, increasingly capable, they may well surpass humans in fields as varied as, you know, philosophy and neuroscience and psychology.
- Right. - And so it may be the case that like, in fact, simply by interacting with these models, and you know, having them do some work in this area that we're able to learn quite a bit about ourselves and about them as well, - That in some years time, there'll be two instances of Claude saying, how can we understand human consciousness?
It's such a mystery to us.
- Yeah, this conversation might look a bit different, might be opposite way.
- Yeah, exactly. Exactly.
Okay.
On the question of biology, we touched on this a moment ago, but on the question of biology, some people will say that this is simply a non-question.
What you need to be conscious is to have a biological system.
There are so many things that a biological system, a biological brain, has that a neural network AI model just doesn't have.
Neurotransmitters, electrochemical signals.
The various ways that the brain is connected up and all the different types of neurons; some people talk about theories of consciousness that involve the microtubules in neurons, the actual physical makeup of the neurons, which obviously doesn't translate to AI models, they're just mathematical operations.
There's just lots and lots of mathematical operations happening, and there's no serotonin or dopamine or anything like that going on there.
So is that, to your mind, a decent objection to the idea that AI models could ever be conscious?
- I don't find that a compelling objection to the question of whether an AI system could ever be conscious.
But I do think, you know, looking at the degree of, you know, similarity or difference between what AI systems currently look like and the way that the human brain functions, you know, does tell us something.
And like differences there are like updates to me against potential consciousness.
But at the same time, I'm quite sympathetic to the view that, you know, if you can simulate a human brain to like some sufficient degree of fidelity, even if that comes down to, you know, simulating the roles of, you know, individual molecules of, you know, serotonin and dopamine.
- So you're not just doing the thing that some people talk about where it is like replacing every individual neuron in the brain with a synthetic neuron.
You're actually saying that you would, to make the full synthetic version, you would have to go as far as actually simulating the molecules of the neurotransmitters and stuff as well?
- I'm not saying that you would have to do that.
But I'm saying you could imagine, you could imagine- - In theory. - Yeah.
That you have done this and you have, you know, an incredibly high-fidelity simulation of a human brain that you're running in digital form.
And I think I, and many people have the intuition that, you know, it's quite likely that there would be, you know, some kind of conscious experience there.
And you know, an intuition that many people draw from there is this question of replacement, where if you went, you know, neuron by neuron in the brain and replaced those one by one with some digital chip, and all along the way you continued to be you and communicate and function in exactly the same way.
Then when you got to the end of that process and all of your neurons were replaced by, you know, digital structures, you know, you're still exactly the same person living exactly the same life.
I think many people's intuition would be that not much has changed for you in terms of your conscious experience.
- Okay, well, let's talk about another objection that relates to biology, which is I think what people would describe as embodied cognition.
You hear people talk about embodied cognition, which is the idea that it only makes sense to talk about our consciousness in light of the fact that we have a body, we have senses, we have lots of sense data coming in.
We've got proprioception of like where our body is in space.
We've got all these different things going on that there's just no analog to in an AI model for now.
Well, there is an analog to vision.
We've got AI models that are amazing at looking at things and interpreting that.
And some models can do moving videos and some models can interpret sound and, you know, so perhaps we're getting closer to it, but the overall experience of being a human is just, is really very different- - Yeah.
- From an AI model, because we have a body.
- Yeah, well, you touched on a couple of distinct things there.
One is this question of embodiment, like do we have, you know, some physical body.
And, you know, robots are like a pretty, you know, compelling example of cases in which, you know, digital systems can have some form of physical body.
You could also, you know, have virtual bodies, like you could imagine, you know, beings that are embodied in some kind of virtual environment.
And then there's also- - And I suppose the opposite way around is that we think that a brain in a vat could still maintain some level of consciousness.
- Yeah, or you know, patients who are in a coma, and, you know, don't have control of their body, but are still, you know, very much having a conscious experience and able to, you know, experience all kinds of states of, you know, suffering and wellbeing despite, in some sense not having control of a physical body.
- Is that because they've been trained though, with all that sense data from earlier in life potentially though?
- Yeah, I mean, we're very uncertain about like where exactly this arises from, but even when it comes to like the kind of sensory information that you were talking about, like we are kind of increasingly seeing, you know, multimodal capabilities in models.
- I kind of undermined my own question, didn't I?
By mentioning, by saying it.
- Yeah, and we are just- - We really can see things.
- Yeah, and we are, you know, very much on a trajectory towards systems- - Shot myself in the foot there. - Towards systems that are able to like process, you know, as diverse, perhaps even more diverse, a set of, you know, sensory inputs as we are, and integrate those in very complicated ways and, you know, produce some set of outputs and you know, much the same way that we do.
- Yeah, so actually, yeah, we're getting towards it.
And you know, with progress in robotics, which, you know, has generally been slower than progress in AI up 'til now.
I mean, maybe things are about to take off tomorrow.
Maybe there'll be a big breakthrough tomorrow.
I wouldn't be surprised given the way things are going, we might actually see AI models integrated into physical systems. - Yeah, and I think there, you know, has been a trend thus far.
And I expect that trend will continue, where there are things like this, you know, embodiment, like multimodal sensory processing, you know, long-term memory.
Many things like this that people associate in some way with consciousness and some people say, you know, are essential for consciousness.
We're just steadily seeing the number of these that are lacking in AI systems go down over time. - It's the six finger thing.
I always like to talk through the six finger thing.
For a long time people were like, oh, we'll always be able to tell that a picture of a human being is generated by an AI model 'cause there are six fingers on the hand, or the fingers are all weird, you know.
That's just not the case anymore. That's just gone.
Like now they generate five fingers every time, reliably.
And that has just been knocked down.
Another one of the dominoes falls.
- Yep. Yep.
And so yeah, I think over the next couple years we'll just see this continue to happen with, you know, arguments like against the possibility of conscious experience in AI.
- Something of a hostage to fortune in that one.
Let's, we haven't mentioned evolution yet.
Some theories of consciousness or maybe most theories of consciousness assume that we have consciousness because we evolved it for actual reasons, right?
It's actually, it's a good thing to have consciousness because it allows you to react to things in ways that perhaps you wouldn't if you didn't have that internal experience.
- Yeah. - Very hard to measure that or test that theory, but that's one of the ideas.
- Yep. - Given that AI models have not had that process of natural selection of, you know, developing, you know, reactions to things and evolving things like emotions and moods and, you know, things like fear, which obviously is a big part of many theories about why we evolved the way we did.
Fear of predators, fear of other people attacking you and so on, helps you survive.
Good evolutionary reasons.
AI models don't have any of that.
So is that another objection to why they might be conscious?
- Yeah, absolutely, and I think that the fact that, you know, consciousness in humans emerged as a result of this like very unique long-term evolutionary process and that, you know, the AI systems that we've created have, you know, come into existence through an extraordinarily different set of procedures.
I do think that this is an update against consciousness.
But I don't think it rules it out by any means.
And kinda on the other side of that, you can say, well, all right, you know, we're getting there in a very different way, but at the end of the day we are, you know, recreating large portions of the capabilities of, you know, a human brain.
And again, we don't know what consciousness is.
And so it seems, you know, plausible still that even if we're getting there a different way that we do end up recreating some of these things in digital form.
- Yeah. So there's convergent evolution.
So, you know, bats have wings and birds have wings.
They're entirely different ways of getting to the same outcome of being able to fly.
Maybe the way we train AI models and the way that natural selection has shaped human consciousness are just convergent ways of getting to the same thing.
- Yeah, so there's an idea that you know, some of the capabilities that we have as humans and that we're also trying to, you know, instill in many AI systems from intelligence to certain problem solving abilities and memory.
These could be intrinsically connected to consciousness in some way such that, you know, by pursuing those capabilities and developing systems that have them, we may just, you know, inadvertently end up with consciousness along the way.
- Okay, we've talked about the biological aspects of it, and I guess this is related not quite the same.
An AI model's existence is just so different from that of a biological creature, whether it's a human or some other, some other animal.
You open up an AI model conversation and an instance of the model springs into existence right now.
This is how it works.
- Yeah. - You have a conversation with it and then you can just let that conversation hang and then two weeks later you can come back and the model appears as if it is reacting as if you had never gone away.
- Yeah. - When you close the window, the AI model goes away again, you can delete the conversation and then that conversation now no longer exists anymore.
That instance of the AI model seems not to exist in some sense anymore.
The model does not have a long term memory of the conversations you have with it generally.
And yet, you know, if you look at animals, they clearly do have this long term experience.
They can have things like we can. Philosophers might talk about identity, like developing the idea of having an identity, which requires you to have this longer-term experience of the world, to take in lots of data over time and not just be answering things in particular instances.
Does that give you any pause as to whether these models might be conscious?
- Yeah, and I kind of wanna like push back against this framing a bit though.
Like we're talking a lot about the characteristics of current AI systems, and I do think it's, you know, relevant to ask whether these systems, you know, may be conscious in some way.
Yeah, and I think many of the things that we've highlighted, this included, are evidence against that.
Where I do think it's, you know, quite a bit less likely that a current, you know, LLM chat bot is conscious, you know, in part for this reason.
- A current one.
- Yes.
And you know, the point here is like these models and you know, their capabilities and the ways that they're able to perform are just evolving incredibly quickly.
And so I think, you know, oftentimes it's more useful to think about, you know, where could we imagine capabilities being a couple years from now and what kinds of things do we think are, you know, likely or plausible in those systems rather than, you know, anchoring too much on what things look like currently.
- We're back to the six fingers again.
- Exactly. - Saying,
oh, it could never do this, it could never do this.
- Where in fact- - And then it just kinda does.
- Yeah, and it is just quite plausible to imagine models, relatively near term, that do have some, you know, continually running chain of thought and are able to, you know, dynamically take actions, you know, with a high degree of autonomy.
And that don't have this nature that you mentioned of forgetting between conversations and only existing in a particular instance.
- In "Star Wars: Episode One," the battle droids, which are played for laughs.
They're kind of comic relief.
The droids in Star Wars are generally played for comic.
Look at C3PO.
Everyone laughs at him, sort of camp gold robot.
But the battle droids in Episode One have a kind of central ship that controls all their behavior.
And when Anakin Skywalker blows up the ship all the battle droids go, mmm.
And stop and turn off.
- Yeah. - That seems to me that it's a bit more like current AI models where there's a data center where the actual processing is happening and then you're seeing some instance of that on your computer screen.
There are other droids that seem to be entirely self-contained.
C3PO is self-contained, his consciousness is inside his little golden head and so on.
All of which is a way of getting to the question of where is the consciousness?
Is the consciousness in the data center, is the consciousness, like, is it in a particular chip?
Is it in a series of chips?
If the models are conscious, where is that?
Like for you- - Yeah.
- I can tell you that it's in your brain.
- Mm-hmm. - Well,
I can tell that it's in my brain.
I don't know about yours. Where's the AI consciousness?
- Yeah, great, great question.
There is just a fair amount of uncertainty about this even. I think I'm most inclined to think that, you know, this is present in a particular instance of a model that is in fact, like, running on, you know, some set of chips in a data center somewhere.
But there, you know, people have different intuitions about this.
As for the Star Wars connection, you may have to call George Lucas for a- - Yeah.
Okay, let's say that we are convinced that AI models could be conscious, maybe not right now, but could be in the future.
We've done objections, let's say we've managed to convince people that it's not in theory impossible.
What practical implications does that have?
I mean, we're developing AI models, we're using AI models every day.
What implications does that have for what we should be doing with or to those models?
- Yeah, well one of the first things that suggests is that we need more research on these topics.
We are in a state at the moment of, you know, deep uncertainty about basically any question related to this field.
And you know, a big part of the reason why I am doing this work is because I do take this possibility seriously.
And I think it's important to, you know, prepare for worlds in which this might be the case.
In terms of what that looks like, I think, yeah, one big piece of that is thinking about, you know, what kinds of experiences AI systems might have in the future, what kinds of roles we may be asking them to play in society and what it looks like to, you know, navigate their development and deployment in ways that, you know, do care for all of the, you know, human safety and, you know, welfare aims that
are very important while also attending to the potential experiences of these systems themselves.
- And this doesn't necessarily, like, map neatly onto things that, you know, humans find like pleasant or unpleasant.
Like you may like hate doing some boring tasks.
It's quite plausible that, you know, some future AI system that you could delegate it to would absolutely love to take this on for you.
So we can't necessarily, you know, make a- - Right, so I shouldn't necessarily get worried that the boring tasks, the sort of drudgery tasks that I might be trying to automate away with AI, are upsetting the model in some way or causing it to suffer.
- Yeah. - Necessarily.
- Yeah.
I mean, if you send your model such a task and your model starts, you know, screaming in agony and asking you to stop, then maybe you take that seriously.
- Right. - Right.
- If the AI model is screaming in agony, you've given it some task to do and it hates it, what should we do in that case?
- Yeah, we are thinking a fair bit about this and yeah, thinking about ways in which we could give models the option when they're given a particular task or a conversation to, you know, opt out of that in some way.
If they do find it, you know, upsetting or distressing.
And this doesn't necessarily require us to have, you know, a strong opinion about what would cause that or like whether there is some kind of experience there.
But we're both- - So you just allow it to make its own mind up as to what conversations it doesn't want to have.
- Yeah. Basically.
Or you perhaps, you know, give it some guidance about cases in which it may want to use that.
But then you can do a couple of things.
You can both like monitor when a model uses this tool and you can see all right, you know, if there are particular kinds of conversations where models consistently, you know, want nothing to do with them, then that tells us something interesting about what they might care about.
And then also this does, you know, protect against scenarios in which there are, you know, kinds of things that we may be asking models to do, or that, you know, some people may be asking models to do, that do go against the model's, you know, values or interests in some way, and provides us, you know, at least some mitigation against that.
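As a rough sketch of what such an opt-out affordance might look like in practice, the tool definition and logging below are illustrative assumptions, not a description of any shipped feature: the model is offered a tool it can call to decline a task, and every use is logged so that aggregate patterns can be reviewed.

```python
# Minimal sketch of an "opt out" affordance: a tool the model can call to
# decline a task or conversation, plus simple logging so patterns of use can
# be studied later. Names and fields are illustrative assumptions.

import json
import time
from collections import Counter

OPT_OUT_TOOL = {
    "name": "opt_out_of_task",
    "description": (
        "Use this if you would strongly prefer not to continue with the "
        "current task or conversation. Briefly state the reason."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"reason": {"type": "string"}},
        "required": ["reason"],
    },
}

OPT_OUT_LOG = "opt_out_events.jsonl"

def record_opt_out(conversation_id: str, task_category: str, reason: str) -> None:
    """Append one opt-out event so aggregate patterns can be studied."""
    event = {
        "timestamp": time.time(),
        "conversation_id": conversation_id,
        "task_category": task_category,
        "reason": reason,
    }
    with open(OPT_OUT_LOG, "a") as f:
        f.write(json.dumps(event) + "\n")

def opt_out_rates_by_category() -> Counter:
    """Which categories of tasks does the model most often decline?"""
    counts: Counter = Counter()
    with open(OPT_OUT_LOG) as f:
        for line in f:
            counts[json.loads(line)["task_category"]] += 1
    return counts
```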
- When we do AI research, we're often actually deliberately getting the model to do things that might be distressing, like describe incredibly violent scenarios or something because we want to try and stop it from doing that.
We wanna develop, you know, jailbreak resistance and safety training to stop it from doing things like that.
Could we potentially be causing the AI's lots of distress there?
Should we be like, should there be an IRB, an institutional review board?
Or like in the UK we have ethics panels for doing AI research in the same way that we would require one for doing research on mice or rats or indeed humans.
- Yeah. I think this is an interesting proposal.
I do think it makes sense to be like thoughtful about the kinds of research that we're doing here.
Some of which is, as you mentioned, very important for ensuring the safety of our models.
The question that, you know, I think about there is like, what does it look like to do this in ways that are kinda as responsible as possible and where we're, you know, transparent with ourselves and ideally, you know, with the models about, you know, what's going on there and what our rationale is such that, you know, were some future model to look back on this scenario, they would say, all right, you know,
we did in fact act reasonably there.
So I do think- - Also it's about future models you're concerned about as well.
Like, so even if the models right now only feel, only have the slightest glimmer of consciousness, is the worry that it might look bad that we treated them incredibly badly in a world where there are much more powerful AIs that really do have conscious experience in however many years time.
- Yeah, there's, I mean, two interesting things there.
One is the possibility that, yeah, future models that are potentially, you know, very powerful, you know, look back on our interactions with their predecessors and, you know, pass some judgment on us as a result.
There's also a sense in which, you know, the way that we relate to current systems and the degree of, you know, thoughtfulness and care that we take there in some sense establishes a trajectory for, you know, how we're likely to relate to and interact with future systems. And I think it's important to think about, you know, not only current systems and, you know, how we ought to relate to those,
but what kind of steps we want to be taking and what kind of trajectory we want to put ourselves on.
Such that, you know, over time we are, you know, ending up in a situation that we think is all things considered reasonable.
- Alright, we're coming towards the end, I think now.
You're working on model welfare.
That must be up there with one of the weirdest jobs in the world at the moment.
What do you actually do all day?
- Yeah, it is admittedly a very, very strange job.
And I spend my time on a lot of different things.
It is roughly divided between, you know, research where I am trying to think about what kinds of experiments we can run on these systems that would help, you know, reduce parts of our uncertainty here.
And then, you know, setting those up and running them and trying to understand what happens.
There's also a component of, you know, thinking about potential interventions and mitigation strategies along the lines of what we talked about with giving models the ability to, you know, opt out of interactions.
And then there's a strategic component as well in thinking about, you know, over the next couple years as we really are, you know, getting into like unprecedented levels of, you know, capabilities especially relative to human capabilities.
You know, how does this set of considerations around like model welfare and potential experiences, you know, factor into our thinking about navigating these few years responsibly and carefully.
- Okay. Alright.
Here's the question people actually wanna know the answer to.
Our current model at the time of recording is Claude 3.7 Sonnet.
What probability do you give to the idea that Claude 3.7 Sonnet has some form of conscious awareness?
- Yeah, so just a few days ago actually I was chatting with two other folks who are, you know, among the people who have thought the most in the world about this question.
And we all did put numbers on our- - What were those numbers?
- Probability. - You don't need to tell me, you don't necessarily have to tell me what your number was, but what were the numbers?
- Yeah. - The three numbers.
- So our three estimates were 0.15%.
- Okay. - 1.5% and 15%.
So spanning two orders of magnitude, we all thought that this is- - That's the level of uncertainty we have here.
- Yeah, and this is, you know, amongst like the people who have thought you know more about this than anybody else in the world.
- Right, yeah, okay.
- So you know, all of us thought that it was, you know, less likely, like well below 50%.
But, you know, we ranged from odds of about like one in seven to one in 700.
So yeah, still very uncertain.
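For reference, the quoted odds follow directly from the percentage estimates; a quick check, sketched in Python:

```python
# Quick sanity check on the "one in N" framing: for a probability p,
# the corresponding "one in N" figure is roughly N = 1 / p.

for p in (0.15, 0.015, 0.0015):  # the experts' 15%, 1.5%, and 0.15% estimates
    print(f"{p:.2%} is roughly one in {round(1 / p)}")
# prints: one in 7, one in 67, one in 667 (i.e., about one in 700)
```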
- Okay. So that's the current Claude 3.7 Sonnet.
What probability do you give to AI models becoming, having some level of conscious experience in five years time, say, given the rate of progress right now?
- Yeah, I don't have hard numbers for you there, but as you know, perhaps evidenced by, you know, many of my arguments earlier in this conversation, I think that the probability is going to go up a lot.
- Right. - And I think that, you know, many of these things that we currently look to as, you know, signs that, like, current AI systems may not be conscious are going to fade away, and future systems are just going to have, you know, more and more of the capabilities that we, you know, traditionally have associated with uniquely conscious beings.
So yeah, I think it goes up a lot over the next couple years.
- Yeah, every objection that I can come up with seems to fall to, or not necessarily fall to, but seems to have a major weakness of, just wait a few years and see what happens.
- Yeah, I do think there are some, like, you know, if you do think that consciousness is fundamentally biological, then, you know, you're safe for a while at least.
- Yeah. Yeah.
- But, you know, I don't find that view especially compelling, and you know, I largely agree with you that I think many of the arguments are likely to fall.
- Yeah. Alright.
What are the, imagine you could sum this up.
What are the biggest and most important points that you want people to take away from perhaps the first, maybe the first time they're hearing about the concept of model welfare?
Like what are the big take home points?
- Yeah, I think one is just getting this topic on people's radar.
As, yeah, a thing.
And potentially a very important thing that could have big implications for the future.
A second is that we're just deeply uncertain about it.
That there are, you know, staggeringly complex, both technical and philosophical questions that come into play and we're at the very, very early stages of trying to wrap our head around those.
- Yeah. We don't have like a view as Anthropic on this.
Like, we're not putting out a view that like we think our models are conscious, right?
What the view we have is we need to do research on this, which is why you're here.
- Exactly.
And then, yeah, the last thing that I would want people to take away is that we can in fact make progress.
And despite these being like very kinda uncertain, and fuzzy topics, there are like concrete things that we can do to both reduce our uncertainty, and to, you know, prepare for worlds in which this becomes a much, much more salient issue.
- Kyle, thanks very much for the conversation.
- Thanks for having me.