AlphaFold: Grand challenge to Nobel Prize with John Jumper
By Google DeepMind
Summary
## Key takeaways
- **Nobel call anxiety**: John stayed home, nervous, putting his odds at one in ten. He planned to sleep through the announcement but couldn't; by 10:30 he assumed he had not won, until his wife urged him to wait and Sweden called with life-changing news, withholding the word "Nobel" for 60 to 90 seconds. [02:06], [03:16]
- **Biologists' amazement**: Researchers checked the AlphaFold database expecting to mock the AI, but found unpublished structures and tweeted "How did DeepMind get my structure?"; they couldn't believe a machine had done years of lab work almost instantly. [09:42], [10:29]
- **Sperm-egg protein discovery**: Two groups used AlphaFold to test roughly 2,000 sperm surface proteins against known egg proteins, identifying one key to fusion; experiments confirmed that removing or changing it prevented fertilization, linking it to infertility. [14:01], [15:07]
- **AlphaFold 3's diffusion shift**: The model switched to a diffusion architecture that starts from noisy coordinates and refines them into precise structures, enabling DNA, RNA, and small molecules, unlike AlphaFold 2's focus on the protein backbone; it also de-emphasized evolutionary data in favor of geometry. [16:04], [22:17]
- **Unexpected protein design hack**: Users discovered that AlphaFold 2 could predict protein interactions by linking two sequences with random amino acids in the middle, making it the best available system for checking whether proteins stick together, despite no intended multi-protein design. [35:27], [35:57]
- **AlphaFold like Roman aqueducts**: Just as the Romans built bridges without Newton's equations, and modern jets are designed with wind tunnels despite an incomplete theory of turbulence, AlphaFold enables reliable advances in biology without full interpretability. [26:54], [27:50]
Topics Covered
- AI Compresses Years of Lab Work to Minutes
- AlphaFold Pinpoints Fertilization Protein
- Drop Evolution Data for Biomolecule Modeling
- Prioritize Utility Over Perfect Interpretability
- Protein Design Transforms Beyond Prediction
Full Transcript
I saw this comment from someone on Twitter saying, "How did they get a copy of my structure? How did DeepMind get this thing that I had done and not yet published?" They couldn't believe that this was literally a machine doing years of painstaking work all at once, in a flash. So you go from this broad hypothesis, there's some protein on the surface of sperm that does this. AlphaFold says, I think it's this one. And then you go do your detailed experiments to confirm. And now you can think about questions like infertility. If you see mutations in that protein, maybe that's a cause of infertility. Maybe we can think about treating that.
Welcome to Google DeepMind: The Podcast. I'm Professor Hannah Fry. Today we are talking about AlphaFold, one of the most extraordinary technological breakthroughs in modern science, a tool that has been described as the most useful thing that AI has ever done. And in truth, that might be an understatement. This is a Google DeepMind AI system that solves one of biology's grandest challenges: predicting the 3D structures of proteins, the fundamental building blocks of life. Its latest version, AlphaFold 3, can now model the structure and interactions of all of life's molecules with unprecedented accuracy. And the impact has been seismic. AlphaFold has mapped hundreds of millions of protein structures, and more than 3 million researchers across 190 countries now use its database. It is transforming drug discovery. And in 2024, the Nobel Prize in Chemistry was awarded to Google DeepMind's Demis Hassabis and John Jumper, who is our guest on today's podcast. Now, this is a story that we have been following on this podcast since season 1, nearly eight years ago, long before it hit the headlines. So if you are coming to AlphaFold for the first time and wondering what all the fuss is about, you can find our previous explainer episodes linked in the description. Welcome to the podcast, John.
>> Oh, it's exciting.
>> I don't think I've interviewed you since you won your Nobel Prize. Tell me, where were you when you found out about it?
>> I stayed home because I was nervous enough. I thought there was a chance, like a one-in-ten chance. And so I figured I would be disappointed at home, and I was kind of just sitting in bed. My original plan was, I'll sleep through it, and if a phone call wakes me up, then I've got the Nobel. But I couldn't sleep.
>> Because you knew the day that the phone call might happen, right?
>> You know the day. I knew, in fact, the kind of time: it was scheduled to be announced at 11:00, and I knew that winners were called about an hour beforehand. So by about 10:30 I said, "Oh well, I guess not this year." And I told my wife and she goes, "No, no, wait." And as she's telling me to wait, my phone lights up with a phone call from Sweden. And thankfully it was not the world's meanest prank call. And yeah, it was just kind of this extraordinary thing. You answer and they say, "Is Dr. John Jumper available?" Yes. "I have some wonderful news. Great. Can you please hold?" And so they get...
>> You hold?
>> Well, I think part of the problem was they didn't have either Demis's or my phone number initially. So anyway, they ended up calling us very late, but then they finally arranged it and they pulled the person on, and he says, you know, "I have some life-changing news," and they don't say the word Nobel for 60 to 90 seconds, which was the longest minute of my life, as there was no other explanation for this call. And I remember the very first thing I did is run to get a shower, because I knew I was going to get no time for the rest of the day. But after that, you know, it was announced, you come in, you see the team, you have this amazing kind of celebration. We bought the local Waitrose out of sparkling wine.
>> Only the best will do.
>> I'm not a connoisseur, and we were celebrating with friends, and there was just this incredible kind of party across the floors of our building. It was amazing.
>> The thing is, it's an extraordinary story of you as an individual, right? Because your first PhD, your physics PhD, you dropped out, right?
>> Yeah. Yeah.
>> And so going from that, which I think must have been, you know, quite a hard experience to live through, to being a Nobel Prize winner and having your tool used in tens of thousands of academic papers.
>> I mean, I will say dropping out was a very lucky thing for me. I was doing the wrong thing. I didn't really want to, and so I just left. And because I left, I actually fell into this computational biology group that was doing amazing work on custom computer chips to simulate proteins. And then I go back and I do my PhD, now in chemistry, by another set of accidents. And I didn't have those great computers. So why not get into AI? Why not try to use sophisticated algorithms to make up for, you know, a lack of compute? I have to be the first person to get into AI because of a lack of computational capability rather than an abundance. And then I got lucky enough to find a job that had something to do with everything I'd ever tried to do in my past, and then it worked out and I get a Nobel.
>> Do people react differently to you now then?
>> Oh, I mean, there are all sorts of people. There are the people that I did my chemistry PhD with, who knew me as a pretty good physicist and a lousy chemist. There are the people that I work with every day, to whom I'm still, I think, just John, but now John with a Nobel, so he's busy. But then there are all the people I meet. I mean, I'll get on phone calls, and a surprising number of them start with "It's such an honor to speak with you," and I sometimes think, "And also with you." There's a certain type of difference, or at least excitement, and it's a symbol of this giant AI world and what it can mean in terms of applying AI to solve real-world problems. And then you're a Nobel Prize winner, so you're allowed to have an opinion on anything and it's supposed to be a bit valid even if it's not. And so people want you to show up to things just so that you can symbolize that a Nobel Prize was won, and then you're done, which is not a very satisfying thing to do as a scientist. But you have this platform that maybe you can use to affect how the public thinks about science, how it funds science. So all of these things roll together in this wild combination. I would say I'm roughly at the midpoint of my career, in terms of time since undergrad and time until rough retirement. And so I've got to figure out what to do in the second half, and that's, you know, a fair amount to live up to.
>> Yeah, absolutely. A lot of pressure. The thing is, we're still only five years on from that CASP breakthrough, really, when AlphaFold 2 smashed the prediction challenge. Did you realize at the time the potential significance of the work that you were doing?
>> We were sure of two things and totally unsure of some others. The two things we were pretty sure of: we were very sure it worked. Even before we entered CASP we had measured very well; we knew roughly how we would do in CASP, we understood that, and we were careful. We knew that we had solved this grand challenge. But the normal thought with a grand challenge in science is that you'll solve it, there'll be a great celebration, and then you will go build effective, useful systems that use the ideas that enabled you to solve the grand challenge, and that this is kind of the beginning of an era. I think the real shock to me is that those weights we trained, that system, that piece of computer software, has been so incredibly, practically important to scientists working in this field to this day. The actual bit of software is what gets used, what makes this difference in all these different application areas, all these different types of science published on top of it as a black-box computer program. And the extent to which that has entered into scientific practice has been really, I think, beyond my imagination.
>> Yeah, I mean, it's really difficult to overstate the genuine significance that this has had. I saw one thing where AlphaFold was described as the most useful thing that AI has ever done, right? That hasn't sort of landed with the public yet, has it?
>> I think it's hard for people to appreciate, and you work in science communication, how very hard science is, how very hard curing disease is. We have to work extremely hard to get small bits of knowledge about how, say, the cell works, how the body works. For protein structure prediction, or protein structure, what AlphaFold does, I think it's hard for people to appreciate that this process, you know, takes a year in the lab. I've seen PhD theses that are "progress toward determining the structure of X." That doesn't mean they finished it, just that they feel like they're a little bit closer and they need to graduate.
>> Of one protein.
>> Of one protein, of one individual piece. And the notion that we'll turn that work into a machine that gives you a really good answer in five minutes, and that this enables so much more work downstream of it. I think there's something like, I haven't looked at the recent number, but 30 or 35,000 different scientific papers that cite AlphaFold: 35,000 different contributions to our understanding of biology that build on top of this advance. And I think the right way to think of AlphaFold is certainly not that we've solved all problems in biology; we very much haven't. For this slice of biology that cares about what structures in the cell look like, structural biology, maybe we've made it 10% faster overall. Across the whole thing, we've amplified this enormous effort and societal expense, and ultimately we will have transformative science. And there are certain narrow areas, say protein design, that are just being transformed by this understanding.
>> I think one of the ways that really demonstrated to me just how important this was as a breakthrough was the way that biologists reacted when you published the 200 million protein structures. Just tell me a little bit about that, because you just put it out there, right?
>> Oh yeah. So the original release was a bit smaller, but it was still, I think, something like 400,000 structures. What I remember is that there was maybe a week in between, when we had put out our code and the real experts were playing with it, and they were saying, this really works on hard problems, but all the other biologists were saying, no, no, these can't be real hard problems like the ones I work on. And then of course we put out this huge database, the AlphaFold database, and people said, "Well, let me just see how dumb the AI engine is," and clicked on their protein of interest, I think expecting to make fun of it. And then they sat there and they were amazed. I saw this comment from someone on Twitter saying, "How did they get a copy of my structure? How did DeepMind get this thing that I had done and not yet published?" They couldn't believe that this was literally a machine doing years of painstaking work all at once, in a flash. And what was, I think, also amazing about it is how rapidly this turned into a community understanding of what AlphaFold does and what it doesn't do, for example, that it's not sensitive to single amino acid changes, and how to build it into their workflows and do work on top of it. I thought it would take years for people to really figure out the right way to build it in: how do I make sure that I look at the confidence measures, where AlphaFold is saying this looks like a reliable answer and this doesn't? And this happened within a matter of months. Science is a community, and people developed this incredibly rapid, not totally perfect, but actually really good understanding very, very fast. And so people were doing excellent science on top of AlphaFold. We released this in, I think, July or something like that, and people were almost immediately doing really excellent science on top of it; really cool work was coming out by the end of that year. And I think that's just a testament to how much scientists are looking for really effective tools that help them push knowledge forward, and then when they find it, they use it.
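For a concrete sense of what "clicking on their protein of interest" looks like programmatically, here is a minimal sketch of fetching a predicted structure from the public AlphaFold Database by UniProt accession. The URL pattern and the v4 model suffix reflect the database at the time of writing and may change; treat this as an illustration rather than an official client.

```python
# Minimal sketch: download an AlphaFold-predicted structure by UniProt accession.
# The file-naming pattern (AF-<accession>-F1-model_v4.pdb) is the one used by the
# public AlphaFold Database at the time of writing and may change in future releases.
import urllib.request

def fetch_alphafold_pdb(uniprot_accession: str, version: int = 4) -> str:
    """Return the predicted structure for a UniProt accession as PDB-format text."""
    url = (
        "https://alphafold.ebi.ac.uk/files/"
        f"AF-{uniprot_accession}-F1-model_v{version}.pdb"
    )
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    # Example: human hemoglobin subunit alpha (UniProt P69905).
    pdb_text = fetch_alphafold_pdb("P69905")
    print(pdb_text.splitlines()[0])  # first record of the downloaded file
```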
>> How do you stay on top of the work that people are doing with AlphaFold now? Because it's so embedded in the way that people do biology that, presumably, they're not sending you an email every time.
>> Thank goodness. I actually still pretty often just put the word AlphaFold into a search on X or wherever and see the random work. What I love is the random things that pop up that say, "Oh, we used this to do this weird thing." The other way, one of the nice things about being at a company, is that if really cool things happen, someone will notice and post it to this chat room we have for collecting the things people find cool. It's so valuable to collect that experience, and it's so much fun, right? You feel this kind of vicarious ownership of a little bit of that work.
>> Well, okay. Tell me then, what is the most random, unusual use of AlphaFold that you've come across?
>> One that I really love is this protein in bumblebees. They're trying to understand bumblebee populations, you know, reproduction, their biology, to try to enable pollination and understand things like colony collapse. And so, in fact, there were some important proteins involved in the honeybee life cycle that they were studying with AlphaFold. And you can see ultimately how this kind of thing leads on to bee conservation. I think it's so interesting to see the structural biology of this echoing into all these things that we care about, from food to industrial production to everything else. They're all connected because it's all the same biology, right? Plants, animals, they have basically the same proteins. We were definitely thinking most about human health. We weren't thinking, how am I going to help honeybee populations? But here we are. There was another really nice story, where people were trying to understand human fertilization, when an egg and a sperm meet, come together, and eventually fuse.
>> They wanted to find the exact proteins that were involved in sperm sticking to eggs, right?
>> And sperm sticking to egg. They, I think, had the full picture of all the egg proteins, but not of all the sperm proteins. And in fact, there were two independent groups that did this. They said, "Well, there are only 2,000 proteins that we know are on the outside of sperm. Why don't we just try all of them and see which ones stick to the proteins that we know are on the egg?" And if you think about doing this experimentally, that's like, well, I'll spend the next two millennia, you know, a year of time and $100,000 each, and I'll get a nice paper in Nature. That's not a feasible approach. But AlphaFold is pretty fast, and they had some computers available. So they tried all of them, and they both came out with this one protein, TM-something, I can't remember the number. This was the one they didn't know the function of before, and now they find out that AlphaFold says it sticks to the egg, and that this is kind of the first step of fertilization. And of course they don't just trust AlphaFold, right? It's a computational system. So they went and asked, well, what happens if I remove that protein, or if I change that protein? And they find that if they change or remove that protein, then sperm and egg will get close but they won't fertilize. So you go from this broad hypothesis, there's some protein on the surface of sperm that does this. AlphaFold says, I think it's this one. And then you go do your detailed experiments to confirm. And now you can think about questions like infertility. If you see mutations in that protein, maybe that's a cause of infertility; maybe we can think about treating that. And we go, I think, from rough hypothesis, to AlphaFold in the middle, to confirming with experiment, and now maybe we can ultimately think about something like drug design on top of this. But we have to get this biological understanding first, so that we bring meaning to all those pieces in the cell. And that's what AlphaFold really helps with in the early stages, bringing meaning to the parts of the cell. And then later, companies like Isomorphic use it to build small molecules that have targeted effects.
>> We should talk about the difference between AlphaFold 2 and AlphaFold 3 though, right? Because AlphaFold 2 was predicting the structure of proteins from these strings of amino acids. But of course biology isn't only about proteins. You've got all of these other biomolecules. You've got DNA. You've got RNA. You've got small drug molecules, for example. You've got ions, you know, charged particles, and so on, and all of these are interacting with each other. So how early in the process did you know that you needed to change the fundamental model of AlphaFold 2 in order to incorporate those additional molecules?
>> So even before the world knew about AlphaFold 2, we were sitting there and dreaming, and we were dreaming for two reasons. One is that we had a lot of, for example, proteins that exist naturally in what are called complexes, multiple proteins stuck together. And sometimes there's no real way to predict their structure without predicting them all together. So we were already thinking about this multi-protein problem. And then, as you say, proteins bind drugs, small molecules, maybe 20 atoms; you can think of aspirin, which sticks to a protein. And we knew that this was really important. We said, but later. And we started to talk about this kind of dream, this goal we would call "whole PDB." So the Protein Data Bank, the PDB, is the data source we use, but we took it in and we threw away a lot of things. Oh, this has RNA or DNA attached to the protein? Well, let's throw away the RNA or DNA and just have AlphaFold predict the protein.
>> Because you couldn't handle those extra molecules.
>> We couldn't handle that complexity. We were very driven by: we have 20 amino acids, that produces 20 types of structures, and then we will predict, and all our code was based around that. And we thought, it's a challenge for later, but eventually we'll start doing it. And one of the things we almost immediately realized is that a lot of the decisions we made in AlphaFold were very good and very helpful and very annoying to extend to more complicated things. The other bit of work is that we were trying to figure out how to simplify AlphaFold, and we thought, okay, AlphaFold is complicated, but maybe there are some things we can remove.
>> How has the architecture shifted from AlphaFold 2 to AlphaFold 3, then?
>> There are a lot of changes, but I would say there are two big themes. When we were trying to handle much more of the DNA, the small molecules and so on, we adopted this thing called a diffusion architecture, a different way in which we handle our uncertainty, and I can tell you more about that. But the other one was really thinking a lot about the role of evolution and evolutionary data.
>> Well, let me ask you about that then, because this is one of the things that I remember being quite key to AlphaFold 2. This idea that proteins have evolved in lots of different creatures numerous times, and there are clues in their evolutionary history that indicate where amino acids are likely to end up in the final folded shape. So that even if you're starting with a string of amino acids, you're not going in totally blind, because this sort of thing has happened before, right? That ended up being quite a key part of the model, but it was also, I think, potentially one of the parts that made the model quite, I don't know, inflexible to other molecules. Is that fair?
>> So AlphaFold 2 used evolutionary information in this exuberant way: at almost every block it was saying, here's the evolutionary information in case you need it. But a lot of what we studied in AlphaFold 3, that we knew we were moving toward, didn't have evolutionary information. And so we were shouting at it with nothing. And we were kind of worried that this was both slowing down the network and possibly leading to some bad dynamics in how it works. So we decided to take that out of most of the network and otherwise emphasize the geometric information, the thing that really is always there. And that turned out to work exceedingly well, actually better than we expected.
>> I wonder if there's an analogy that we can use here for the difference in this architecture. Let's imagine that you're planning a wedding and you've got to do the seating plan for the entire wedding, right? And you have all of these guests; those are your amino acids, and you have to work out where each one has to sit. And there are a couple of different ways you can think about this. You could think about pairwise interactions: this person is sitting next to this person, is that a good interaction? But then what does that mean for this person over here, or that table over there? But you could also potentially think about the history of what you know about those people. How am I doing so far as an analogy?
>> I can go for this. You know, some people went to school together. Some people used to date and had a terrible breakup, right? Those might be...
>> Sat there together.
>> You probably don't want to sit them right next to each other, unless you're really looking for sparks. But before, we just talked about where the wedding guests sit, and now we think about where the flower arrangements are.
>> That's nice.
>> Right? We think about all these other things that come together to become the reception dinner.
>> In this analogy, then, AlphaFold 2 was very focused on the history of the guests, right? Sort of continually checking where they might best fit based on their past. And that's great for proteins, but it's difficult once you try to include other elements of the reception. Once you start bringing in other biomolecules, you don't want to focus so much on history.
>> I think that's all true. One thing I would say, though, is that we always made this history available, this evolutionary history, the analogue of what we know about the guests' pasts. And what we found is that AlphaFold, we think, was not relying on it much other than at the very beginning. It was saying, "Oh, well, these people should probably be together, these people should probably be apart, I know a couple of things," but then it kind of trained itself to ignore it. And so by inspecting it and seeing that we were probably not using that information, we decided maybe we should stop attaching it constantly throughout the processing.
>> But then as a result, you managed to massively simplify the model.
>> I wouldn't say we massively simplified it, but we made it massively more accurate, and suddenly we were doing new problems. In fact, we made light adjustments and then we made a much better model, and it turned out that even the protein-protein problem, something that has nothing to do with ligands or nucleic acids or anything else, even that got massively better from this kind of science and improvement. So diffusion is this different idea in how you train a neural network. The AlphaFold 2 system was really heavily based around proteins, around the shape of a protein backbone. In AlphaFold 3, we went to diffusion, where you basically say, "Here's a blurry image of the protein. I took all of the protein and added some noise, some error, like you looked at it through the wrong prescription glasses, and then it guesses the right answer." And you have it constantly refine that. What this gave us was a really great understanding of local geometry, of how to make things extremely precise, because that's what it does at small scales, and a way of tackling big systems. And that gave us a new approach where we didn't have to get so involved in the details of exactly how proteins look, because they're different from DNA, and they're different from RNA and small molecules. The upside is that it made it really easy to handle this wide universe of things that we study. The downside is that it led to a higher rate of hallucination, of weird stuff appearing. And so then we needed to handle that in different ways.
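To make the "blurry image, then refine" description concrete, here is a toy sketch of diffusion-style sampling over 3D atom coordinates: start from pure noise and repeatedly apply a denoiser on a decreasing noise schedule. The `denoise` function and the schedule are stand-ins for a trained network; this illustrates the sampling loop in general, not AlphaFold 3's actual architecture.

```python
# Toy illustration of diffusion-style sampling over atom coordinates.
# `denoise` stands in for a trained network that predicts a cleaner structure from a
# noisy one, and the noise schedule is arbitrary; this is a sketch of the general
# idea described above, not AlphaFold 3's actual implementation.
import numpy as np

def denoise(noisy_coords: np.ndarray, noise_level: float) -> np.ndarray:
    """Placeholder for a learned denoiser returning its guess of the clean coordinates."""
    # A real model would condition on sequence and chemistry; here we simply damp the noise.
    return noisy_coords * (1.0 - min(noise_level, 1.0))

def sample_structure(num_atoms: int, steps: int = 20, seed: int = 0) -> np.ndarray:
    """Start from random coordinates and iteratively refine them toward a structure."""
    rng = np.random.default_rng(seed)
    coords = rng.normal(scale=10.0, size=(num_atoms, 3))  # pure noise to begin with
    for step in range(steps):
        noise_level = 1.0 - step / steps              # decreasing noise schedule
        predicted_clean = denoise(coords, noise_level)
        # Move toward the prediction, re-adding a little noise so early steps stay
        # exploratory and late steps become precise.
        coords = predicted_clean + rng.normal(scale=noise_level, size=coords.shape)
    return coords

print(sample_structure(num_atoms=100).shape)  # (100, 3): one xyz position per atom
```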
>> Well, this is one of the big differences between AlphaFold 2 and AlphaFold 3, right? You have this introduction of stochasticity, of the potential for hallucinations. How much should people be concerned about that when they're using it? Is there a danger that, because AlphaFold 2 was so on the money, they think of AlphaFold 3 as though it's some kind of oracle?
>> I think one of the wonderful things about biologists is that, as scientists, they're deeply skeptical of their tools. AlphaFold 2 did have the advantage that wrong answers often looked stupid; no one looked at one and said, that's definitely a protein. Whereas wrong answers in AlphaFold 3 are sometimes more plausible. But I think people have gotten really good, not uniformly perfect, but really good, at saying, well, AlphaFold is also telling me how accurate it thinks it is in the confidence measure, and I should use that too. And so it's this kind of social knowledge. There's no experiment or tool that scientists use that is without limitation.
>> Right. Even experimental structure determination has all these known faults.
>> And so scientists, I think, use it relatively well. I haven't seen problems from people using AlphaFold 3, I think because it's now such a part of the education and community of scientists: when you use computational methods, here are the confidence measures you look at. We color our proteins by confidence, and ultimately we also think of it as a tool for inducing hypotheses that we'll then test experimentally. AlphaFold is not perfect by any means. It's just very, very useful.
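As a concrete example of "coloring by confidence": AlphaFold's PDB output stores its per-residue confidence score (pLDDT, on a 0-100 scale) in the B-factor column, so a few lines of parsing are enough to flag which regions to trust. The 70-point cutoff below is a common rule of thumb rather than an official threshold.

```python
# Minimal sketch of reading AlphaFold's per-residue confidence (pLDDT, 0-100),
# which is written into the B-factor column of its PDB output. The 70-point cutoff
# is a common rule of thumb for "reasonably confident", not an official threshold.
def plddt_by_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain_id, residue_number) -> pLDDT, read from each residue's CA atom."""
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]                 # chain identifier column
            residue_number = int(line[22:26])
            plddt = float(line[60:66])       # B-factor column holds pLDDT here
            scores[(chain, residue_number)] = plddt
    return scores

def confident_residues(pdb_text: str, cutoff: float = 70.0) -> list[tuple[str, int]]:
    """Residues whose predicted confidence clears the cutoff."""
    return [key for key, score in plddt_by_residue(pdb_text).items() if score >= cutoff]
```

Run against a file fetched as in the earlier download sketch, this gives the residue-level view researchers use to decide which parts of a prediction to lean on.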
>> How important is interpretability in all of this? I mean, this idea that humans want to understand why AlphaFold is folding a protein in a particular way.
>> There's a lot of interest in it, and you will hear people sometimes make very confident pronouncements that we can only use AI systems if we perfectly understand what they do. And they almost mean, if I can write down an algorithm that I could use instead of that AI system. I think it's this desire of: that's an annoying black box, what if it just wasn't a black box? And I feel like that narrow demand, that we must understand it perfectly, is honestly kind of a weird demand. I think about cases in which we've been perfectly happy not having that in science. One, for example, is just experimental science in general. If you look at, say, how someone crystallizes a protein, early in crystallography it wasn't clear whether those structures were going to look just like a free protein floating in liquid, and more experiments said, most of the time it's about right. So we have always, in science, worked in this kind of partial-interpretability way. I think there are really good applications of interpretability when we think about, okay, I want to understand the network so that I can change it and make a better version of, say, AlphaFold; I described some stories earlier about how we do that kind of work. Some people will say, well, I want interpretability so I can trust it. I think more important than that is, if you really want to know whether you trust an answer, well, we have pretty good characterization that our confidence metrics are a reliable guide, and people use them in practice to decide when an answer is probably true. In fact, I'd love to see a lot more interpretability work go on for AlphaFold. What exactly leads it to generalize so widely? I think there could be more done, but that won't necessarily give people what they think it will give them.
>> I think sometimes, okay, the Romans, for example, were building bridges and aqueducts without having a full understanding of gravity, right? They didn't have Newton's equations. Is there a way in which AlphaFold here is like Roman engineering, but for biology? We are able to build stuff now with the tools that you and your team are creating, even though we don't necessarily have full insight into why they're working.
>> I mean, the Romans are way back. In order to do this...
>> I mean, this feels great.
>> Think about a modern jet airplane.
>> Yeah.
>> Or a modern car.
>> Yeah.
>> Right. We understand, you know, the Navier-Stokes equations of fluid dynamics and so on, a bit about turbulence. Despite that low-level understanding, we both build wind tunnels to measure flow and build simulations that, for this precise wing geometry, show you how the air goes over it.
>> Mhm.
>> I think the Roman bridge building is maybe a better analogy for how we do AI development, where we have some intuitions, just as the Romans had intuitions and built some beautiful bridges. They didn't have all the equations and full understanding, and yet they built the things they needed and were able to drive carts across the bridges they built. So we are, in that sense, operating partially on intuition in AI. But downstream, for the people using tools like AlphaFold, I think it's more like having a great computation package that maybe you don't exactly understand. Your expertise is not exactly in how this airflow results in turbulence, but you figure out how to change it and adapt, and you work with this tool to do your larger-scale science. And I think that's really the slightly less Roman version of what AI users are doing.
>> Hey, look, there's no shade on the Romans.
>> What have the Romans given us? Yes.
>> Exactly. Exactly right.
>> Okay. Well, let's talk about some of those downstream applications, because earlier this year I got to speak to Max Jaderberg and Rebecca Paul from Isomorphic Labs about how they're using your AlphaFold tool in drug design. What has that been like, to see this thing that you built actually being implemented in drug design in that way?
>> I think it's just really extraordinary to see it carried so far and to be a part of it. One of the things about drug design is that it's not just protein structure prediction. I like to remind people that a protein structure costs about $100,000 and a drug costs about a billion, right? So that tells you it can't all be protein structure determination. I think it's really exceptional to see people trying to build on and take these ideas further, and really find a way to integrate them into application. We see this across the pharma industry: how are we going to build processes around this that let us ultimately end up with molecules that are dosed in patients, that pass all these different tests? Some of this will help with how the molecule sticks, or with the biology, is this protein a target at all? And some of it we have very little to do with: will this drug be metabolized in the liver, right? Maybe there's a protein-small molecule interaction you can use to help with that, but for the most part AlphaFold is probably not the tool. And I think it's really important that we have both the work on how we understand biology and the work on specifically how we make molecules against drug targets. It's an exciting combination.
>> I think one of the things that I hadn't quite appreciated until having that conversation with them is that finding a molecule to bind to a particular protein target is such a small subset of curing disease, right? Alzheimer's is an example where we know that proteins are involved, but there isn't even necessarily a place to target yet.
>> Well, we don't even know. We still don't know if amyloid beta accumulation is in the causal chain. Is it a symptom?
>> Right? That's a protein.
>> But breaking that up, I mean, we're starting to see a bit of an effect. I think it's an example of one of the most important things to say. Think about finding a drug that sticks to a protein, finding a drug that's non-toxic, at least for the most part in animal models, doing all these hard stages of early drug design that we rightly say are very, very difficult and take people years. The bigger problem to me is that even when you do all that, 90% of drugs fail in clinical trials. So even though you do all those things right, they still don't work, or they're still not safe, right? We determine this experimentally. And a lot of this is our grand ignorance of biology. We don't know the causes of Alzheimer's, of autism. Even when we have ideas of cause, for example Huntington's, with a very clear genetic correlate, still making a molecule that actually makes those patients' lives better is so very difficult. There are so many giant problems left. We're only starting, you know, this world of computational biology, or continuing, let's say, but still there's so much left.
>> Well, let me understand that then. Give me an example of a way that AlphaFold can be used to help understand disease.
>> You know, one case study I like from AlphaFold, one that is somewhat recent: people were trying to understand how cholesterol is moved around the body. There is this protein that is involved in the transport of fatty molecules from one location to another; I believe it is also found in some of the plaques that build up and are correlated with heart disease. And even as we start to understand that biology, we have this nice piece that AlphaFold contributed: the detailed structure of this molecule, of which they could only take an extraordinarily fuzzy picture with a method called cryo-electron microscopy, which is not an uncommon outcome for that technique. But that fuzzy picture actually matched really well with the AlphaFold structure. So now you can say, okay, this is the thing that is moving cholesterol. Maybe I can interfere with or change how it moves cholesterol. Maybe I can add a small molecule. But of course, the first thing you might say is, whoa, that's the protein, why don't you just add a drug that blocks it? And I think you would immediately find out that would be really bad. Your body doesn't have this protein by accident. The purpose of this protein is not to cause heart disease, right? The purpose is to move fatty molecules where they need to be in the cell.
>> You sort of need that, right?
>> You're going to sort of need that. So what you actually need to figure out is how this gives you some new ideas to change how it behaves in the cell without killing the patient, and to make their life better. And I think AlphaFold is a part of that story. It's not the end.
>> It feels like there's a natural next step to all of this, though. If you are predicting the shape of proteins and then using those models to interpret the function of proteins in the human body, does it then go on to designing new proteins?
>> Oh yeah, people have wanted this. They've looked at these beautiful proteins and said, I wish humans could do that, right? And so there's been all this exceptional work, a lot of it done in David Baker's lab; he won the Nobel with Demis and me. And AlphaFold has actually been shockingly transformative here, at saying, well, now that we've built these computational systems that understand proteins, how are we going to design our own? In fact, a large portion of newly approved drugs are proteins, normally antibodies, discovered initially in very interesting ways: injecting mice or llamas with something that you want to build a protein against and using their natural immune systems to find it. But we are starting to talk very seriously about how we are going to design proteins to have the effect we want. And it turns out that the most important part of that is being able to design many things you think might work, because it takes a long time, it's difficult, and it's expensive to test in the lab. So what's been so important there is using AlphaFold as a proxy for nature: trying to say, how do we integrate AlphaFold's understanding of how proteins stick together, when they do, and how do we use that to get the maximum signal for protein design? And people have been extraordinarily successful; they've gotten really, really good at getting proteins to stick just where they want.
>> Wow. Okay, but hang on. Because that wasn't the original intention of AlphaFold, to see how proteins stick together.
>> It wasn't the intention, to see how they stick together. In fact, that was an early surprise from Twitter, where two different people said, you know, if you want to know whether two proteins stick together... We were busy making a multi-protein, properly done system. They said, well, just take those two proteins and put some random amino acids in the middle and see if they stick together that way. And that was the best system in the world for seeing if proteins stick together.
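The community trick John describes amounts to feeding two chains to a single-chain predictor joined by a filler segment, then checking whether the prediction packs the two halves against each other. Below is a hedged sketch of that idea; the 30-glycine linker, the 8 angstrom contact cutoff, and the helper names are illustrative assumptions rather than a fixed protocol.

```python
# Sketch of the community "linker" trick: join two protein sequences with filler
# residues, run the result through a single-chain structure predictor (not shown),
# and count contacts between the two halves. The glycine linker length and the
# contact cutoff are illustrative assumptions, not a prescribed protocol.
import numpy as np

def build_linked_sequence(seq_a: str, seq_b: str, linker_length: int = 30) -> str:
    """Concatenate two sequences with a flexible glycine linker in the middle."""
    return seq_a + "G" * linker_length + seq_b

def inter_half_contacts(ca_coords: np.ndarray, len_a: int, linker_length: int,
                        cutoff: float = 8.0) -> int:
    """Count residue pairs, one from each original protein, whose CA atoms are within `cutoff` angstroms."""
    first = ca_coords[:len_a]
    second = ca_coords[len_a + linker_length:]
    distances = np.linalg.norm(first[:, None, :] - second[None, :, :], axis=-1)
    return int((distances < cutoff).sum())

# A real workflow would predict coordinates for build_linked_sequence(a, b), then use
# inter_half_contacts together with the model's confidence scores to judge binding.
```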
>> Wow.
>> We didn't think we were making a system that could help people design proteins in a really deep way. We thought we would go use this fundamental breakthrough and then go on and do it. And people said, actually, it already works. And I think it was this grand story that shows up again and again in AI, so maybe we should have expected it: if you train a model to be really, really good at a task, it has to learn a lot of deep facts. If you want to be really good at structure prediction, it learns some deep facts about how proteins interact, and if you do just the right experiments, you can access that knowledge. There was this whole field of what I think people started to call AlphaFold-ology, where people would find out which things worked. They would treat it as this really cool black box that they could start experimenting with and try their own ideas on. I think there was a lot of really great science that has been, and continues to be, done in that vein. You know, we're still figuring it out, and there's starting to be work on things like, how do we make enzymes, proteins that do chemistry? How do we do really complicated, sophisticated stuff? Nature would still laugh at our ability to design proteins. But we are starting to develop these really interesting tools that are maybe therapeutics, and also maybe ways to interrogate the cell: you can bring two proteins together and see how the cell changes because you do that. I think we'll get this interplay, not only in the tools we use for therapeutics but in our ability to poke the cell in exciting ways to interrogate it, and every time we develop that, we'll develop a more interventional understanding of the cell that we will bring forward to medicine and synthetic biology.
>> Is this AlphaProteo that you're describing here?
>> So AlphaProteo is Google DeepMind's internal effort to do protein design, thinking about problems in binding and enzymes, and really trying to figure out, especially for these super hard problems, how we get reliable systems. I think what we're seeing in the design space is still a lot of success, but when you're actually designing proteins, you have to go to the lab and test them; there's no other way to find out. And in finding the right ways to predict whether they're going to work, the AlphaProteo work has shown that we can get further and further.
>> Give me an example, then, of some of the types of proteins that are the target, that it would be nice to be able to design.
>> You know, I think if you ask any protein designer, they will have a favorite, and their favorite is really: can we make proteins do things like carbon capture? Can we actually build enzymes that meaningfully contribute to addressing climate change? Other ones that you really see are, for example, degrading microplastics or environmental plastics. One thing I'll also say as a caution, though, is that for all of these, when you talk about doing a real application, it's just like people's conception of drug design being "get the molecule to stick, drug design done," and that's not the case, right? There are so many more properties you need: you need it to be tolerable in all these ways, you need it to be formulatable as a pill, you need all these other things. Similarly with enzymes, you might think, oh well, you just need to make this reaction happen; an enzyme, right, is a protein that catalyzes a chemical reaction. But no, actually you need it to be able to do this many times, right? Enough that you're not constantly having to make new protein for each reaction. You need it to be fast enough. You need it to not do certain other reactions. You have all these other properties. And I think there's a lot more to be done as we think about going from "oh, maybe this is kind of interesting" to "this really, really works." Although, in fairness, interestingly, people are already using synthetically evolved enzymes. You know, there's a lot of washing powder that has designed proteins in it, which I find fascinating; it's one of the few applications of designed proteins in something people would recognize.
>> Yeah, absolutely. How much harder is it, though, to engineer biology, to design, than it is to just predict?
>> I'm very empirical. You should ask me in three years and we'll know. It's easier and it's harder. One analogy I like: if you were trying to figure out what an object is, you might ask, "Is this a bicycle?" And I would see two wheels, a chain, some handlebars, and I would say, "Yeah, that's a bicycle." But having two wheels, a handlebar, and a chain doesn't make something a working bicycle. So when you're designing something, you have to get all the details right enough that it actually works. And I think we're still figuring this out in proteins. Right now, protein structure prediction is, let's say, solved-with-an-asterisk: it's a very, very useful system, but it's not perfect. Design is not yet solved, but I think it is advancing rapidly, and I don't think we'll still be talking about protein design as incredibly difficult in 15 years.
>> Well, okay, let's just zoom out a little bit on AI and biology more generally, because this whole conversation has reminded me of something that you said when we last interviewed you a few years ago, and I've got a little clip that I can play you of what you said.
>> I think it's really important to remember that these are really powerful techniques that we've developed that are still far short of a real artificial intelligence that you can talk about thinking and making decisions and everything else.
>> I think that's so interesting. So that was 2022, right? I wonder how you reflect on that now. Do you think that machines are beginning to understand biology in an intelligent way? Have you changed your mind?
>> I think that whether or not they can think, they're extraordinarily useful for solving problems. How far they are from AI or AGI is, I think, almost beside the point. The really interesting question is whether we can characterize these systems as reliable enough and find useful things for them to do. I think we need to be much more utilitarian about it. And certainly for machines like AlphaFold, I wouldn't necessarily apply the word "think." And I don't know if we're in the situation, right, that we used to say, okay, intelligence is playing chess, and we should work on chess because once we have machines that play chess, we've basically got intelligence. And of course, we got machines that played chess really well, at a superhuman level, in, what was it, the 1997 Kasparov match, and that wasn't the path that led us to machines that can read and write. And so I think we always reach for these problems and say, well, this is the problem, or, you know, people rather optimistically name something "humanity's last exam," a problem so hard that if you solve it there's no point in posing problems to machines anymore. And I'm very interested in how we find those problems that turn out to be so easy, in a certain sense, that we can do incredibly well on them and build very useful systems before we build AGI. Those are the kind of science problems, and of course you want to use techniques related to those of the people trying to build AGI. They're powerful techniques, but we don't have to get tied up in the philosophy. We can just build useful systems. In fact, I think the whole industry is thinking a lot about how we build useful systems that matter for people doing software development, that matter for people doing writing, that expand the nature of the problems we solve. And then we'll see if we end up with AGI, but we will certainly end up with useful systems.
>> So how about the most useful system of all in biology? At DeepMind you have all of these different systems for lots of aspects of biology: AlphaFold, AlphaGenome, AlphaProteo and so on. Can you bring those together in a single system? I mean, is there a goal here to build something like a simulated cell?
>> You know, I used to work in simulation, and simulation is: I will write down the rules for how all the little pieces do their little thing locally, then I'll mash it all together, turn a big crank, and I will get the answer. But we don't even have a parts list for the cell. You have all these effects that I think are not going to give us a classical simulated cell. I think what we're going to do is build really useful systems that draw information from AlphaFold, that draw information from the literature, that draw information from the genome, and use that to say really useful things about biology that matter. And quite possibly one of the core technologies of that will be finding the right fusion of what we understand in narrow AI systems and what we're understanding about broad machine learning in terms of large language models.
>> Well, so how do you bring those systems together? Are there ideas from large language models that can be applied?
>> The very easy thing to say is, oh well, we'll just have your large language model call alphafold.exe as a tool. But I think there are all these other problems: okay, if AlphaFold produces a structure, can these large language models actually understand structure really well? To what extent can they understand these 3D coordinates as well as a human, or better than a human? How do they bring in information from, say, DNA sequencing, from all these other sources? I think it's far from trivial: how do we get these deep integrations, so that a model can understand as much about proteins and protein structure as AlphaFold but also understand the entirety of, say, the biology literature? I'm kind of hopeful we'll get there, but we have to build it.
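As a deliberately simple illustration of the "call alphafold.exe as a tool" framing, here is a sketch of the dispatch pattern a tool-using language model relies on: a registry mapping tool names the model might emit to local functions. The tool name and the stub predictor are hypothetical scaffolding, not an actual DeepMind integration.

```python
# Deliberately simple sketch of the "LLM calls a structure predictor as a tool" pattern:
# a registry mapping tool names a language model might emit to local functions.
# The tool name and the stub predictor are hypothetical; a real system would replace
# the stub with an actual structure-prediction service.
from typing import Any, Callable, Dict

def predict_structure(sequence: str) -> Dict[str, Any]:
    """Stub standing in for a structure predictor; returns a fixed, fake answer."""
    return {"sequence": sequence, "mean_confidence": 0.0, "coordinates": []}

TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "predict_structure": predict_structure,
}

def handle_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a tool call emitted by a language model to the matching function."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**arguments)

# Example: the model decides it needs a structure and emits this tool call.
print(handle_tool_call("predict_structure", {"sequence": "MKTAYIAKQR"}))
```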
>> Do you think that there are aspects of biology that are going to resist computational prediction?
>> I think there will certainly be aspects. If you ask deep questions about evolution, or the origin of life, right? What data are you learning from? What experiments? You're going to have to draw data from very far away to answer that question. You might know something about chemistry, you might be able to do these experiments a bit faster, but you're certainly not directly learning from data. Or we talk about evolution and we draw phylogenetic trees, but ultimately we just have the DNA of the species that exist right now, and a little bit into the past. These kinds of things, I think, will be very hard. The other thing that will happen, though, is that as we build these AI tools, the space of reasonable hypotheses will narrow. It will say, probably not that, for this reason; probably not that. And our experiments will be better. You know, in a certain kind of Bayesian sense, our prior over what counts as a reasonable biological answer will narrow because of our computational tools, and experiments will help resolve them. And I think this interplay will get tighter, and as we do more experiments, or as we use AI to do things like protein design that give us more tools to poke the cell, we will learn more and we will do more. But I think we'll just see that some things will be harder and some things will be easier, and the easier things will happen first.
>> The easier things will happen first.
John, thank you so much for joining us.
>> Looking forward to all those easy things falling.
I think it's really easy to have a very romantic idea of science, right? That it's about uncovering the hidden truths of the universe. That your aim as a researcher is to build this picture, piece by piece, that can help us understand the mechanisms of life. And that, I think, is what makes John's ideas about interpretability completely fascinating, because they turn things completely on their head. You know, AlphaFold is unashamedly not about the why here. Instead, this is a tool that can simply, reliably be used to accelerate the work that scientists do. And then, when you remember that John Jumper is only halfway through his professional career as a scientist and he has already got one Nobel Prize, you realize he isn't necessarily defending an old paradigm here. He is literally building the next one. And if John's focus is completely on utility rather than understanding, well, when the person who built the most useful thing that AI has ever done tells you that that is what really matters, you have to wonder if he's just showing all of us where science is headed next. You have been listening to Google DeepMind: The Podcast with me, Professor Hannah Fry. If you have enjoyed this episode, then please do leave us a comment or a review. And I should tell you that coming up we have interviews with two of Google DeepMind's co-founders, Demis Hassabis and Shane Legg. Trust me, you will not want to miss them. So why not take the opportunity to subscribe to our YouTube channel. See you soon.