
Toward Causal AI - Elias Bareinboim

By LoG Meetup NYC

Summary

Key Takeaways

  • AI Lacks Causal Reasoning: Current AI systems suffer from a lack of explainability, data inefficiency, poor generalizability, and uncontrollability, because at the core of these challenges is the absence of robust causal reasoning. [02:43], [05:47]
  • Ladder of Causation: The ladder of causation has three layers: observational data from passively seeing the world (associations), interventional data from doing actions (P(Y | do(X))), and counterfactuals from imagining different realities. [09:26], [10:30]
  • Pearl Causal Hierarchy Theorem: Every causal model induces the Pearl Causal Hierarchy (PCH), with layers of increasing expressiveness, and the Causal Hierarchy Theorem (CHT) proves that higher layers are strictly more expressive than lower ones, analogous to the Chomsky hierarchy. [10:11], [15:13]
  • 99% of Data Is Observational: 99% of available data is from layer one (passive observations), while interesting inferences require layer two (interventions) or three (counterfactuals), posing the challenge of using observational data for causal questions. [16:57], [17:52]
  • Generative ≠ Causal: Fitting a generative model perfectly to observational data does not guarantee valid inferences about interventions, as seen in examples where age-gender correlations produce flawed counterfactual image generations such as mismatched aging or gender swaps. [19:07], [22:07]
  • Causal AI Fixes These Flaws: Causal AI enables understanding, explanations, efficient decision making, robustness, surgical interventions, and counterfactual generation even when most of the data is observational. [28:14], [28:50]

Topics Covered

  • AI Scaling Ignores Causal Fundamentals
  • Current AI Lacks Causal Reasoning
  • Pearl Causal Hierarchy Trumps Associations
  • Observational Data Can't Yield Interventions
  • Causal AI Enables Surgical Interventions

Full Transcript

Today I'll be talking about a general view of the field, one very much related to graphs; that kind of graph is a key object in our discussion, and an interesting way of encoding the assumptions needed to do causal inference. And we are very excited.

What? Yes. Sorry.

Yeah, okay. We are very excited about the recent breakthroughs that have happened in AI, which I interpret as systems being able to perform extremely well at making predictions in high-dimensional settings. This has been ongoing for a few decades, and now many things are starting to work: we are able to scale up to crazy dimensions, which leads to some sort of emergence, and many good things are happening. In particular, there has been huge progress in NLP, computer vision, and reinforcement learning: perceptual capabilities, vision and language, and the kind of decision making that reinforcement learning handles. I think most of you have heard of it; it is the machine learning way of doing sequential decision making, and other fields have different names for it. Applications are everywhere, and these are just some of them: from medicine to business, agriculture to space exploration. That is really great and exciting.

Now, the question I would like us to ask, and that I ask myself and in my group, is: does this mean that we are done, done done, in the sense that if we scale more and do a little more of the same, maybe a few dozen or a few hundred PhD theses and a few billion dollars of compute later, eventually we will get there and have the most general type of AI? To try to answer the question, just for the sake of exercise, assume that we have an infinite amount of computation and data. Stick with me; let's do the gedankenexperiment, the thought experiment. And if not, what's missing?

Now, to see the other side of the coin in terms of the excitement: there are still some pretty fundamental challenges related to AI. Current AI systems suffer from a lack of explainability capabilities; there is very low understanding. We would like to formalize that; we have maybe some mathematicians in the room, and we would like some notion of understanding, and these systems in general do not perform well at explanations, at least explanations about the world. I would say there are two types, about the world and about the system itself, but I will defer that for now. They also make potentially unfair and unethical decisions. They are very data inefficient: sometimes you need billions of data points to do something that humans can do at a much smaller scale. They generalize very poorly: they can break very easily, and if you are an expert you can craft examples on which the thing will break. And they lack some notion of controllability, in the sense that not even the people building them, some of them PhDs from here who go work at these companies, or who are at Columbia, no one really knows what the systems are doing. You have no way of getting inside some complex latent space, changing something, and making the system go in a particular direction. It is very hard to do that.

Now, I am not the only one noticing this. We can Google at random, meaning in general. This is just from CNN, which I saw recently: "AI could pose extinction-level threat to humans and the US must intervene, State Department-commissioned report warns." This one is from the Nature AI and robotics briefing: there is a 5% risk that AI will wipe out humanity. The number is cute, right? What does this number even mean? But still, I would say there is a non-vanishing probability of something potentially catastrophic or bad happening. And this one: humanity faces a catastrophic future if we don't regulate AI, says "godfather of AI" Yoshua Bengio. I have my own opinion, but in any case, there is some kind of recognition.

Again, I am not the only one saying that there are potential problems. We can discuss the degree, how loudly one should scream, and what the right balance is, but still there is a recognition that there are problems. On the technical side, these are long-standing problems, some of them: interpretability and explainability, unfair and unethical decision making, data inefficiency or data hunger, lack of robustness, lack of engineerability, and so on. This has been going on for some time; it is just that the scale is crazy at the moment, which seems to raise more red flags, and people can scream a little, or substantially, louder. The question I ask myself is: do these problems have anything in common? Here is my observation, and I think it is quite natural, in my biased way: at the core of these challenges is the absence of some kind of robust causal reasoning. There is an underlying thread here, and I would like us to try to move toward having some sort of science of AI. I think that causality is at the core of that.

It is true, and there can be more discussion here, maybe in the coffee break even though I need to leave, that you could have cute airplanes before you had aeronautical engineering; we pushed that for a few decades, with very cute and good results. But it did require some type of science of how to fly in order to put an airplane 20,000 feet above the ground, and I do not think we are there yet in terms of AI. It is a cute, slightly clunky, but very exciting time in which you can see the airplane, or the AI, start doing something. But, and I will elaborate, it is not only about scale: we need a science, something more fundamental, about how to move from the skilled Wright brothers, or Santos Dumont, the folks who were able to build some kind of airplane, to the commercial airplanes you see today. In other words, and I am half entertaining here, I do not think that building a thinking machine is easier than building a flying machine, and at the moment we are playing in a trial-and-error mode, exactly as we did with airplanes. I will share a link for my book; I discuss a bit of this history at the beginning of my causality textbook. But anyhow: how do we move there? What is the insight from the area of causality?

We would like a more formal way to model the agent-environment relationship, and to use a causal language for that. We will separate two concepts: the quote-unquote real world and the agent itself. There is some kind of interaction between these things, and that interaction will be modeled through a formal language. In particular, there are three qualitatively different types of interactions between the agent and the environment. The first: we are passively observing the world and sampling from it, without any interventions. That is called observational data; most of the data today is of this kind, and I will come back to that.

The second is that we are intervening in the world and seeing how things change. And the third is that we are doing nothing, just sitting and imagining different versions of reality, different counterfactuals, as we say. There is this quote from Turing: what we want is a machine that can learn from experience. We would like to ground "experience"; it is very beautiful as poetry, but the question is what experience means. My hypothesis, shared by people in the field, is that it is related to grounding these different types of experiences. This is also acknowledged in the reference called The Book of Why, which you may have heard about, by Judea Pearl and Dana Mackenzie; they call this structure the ladder of causation. You have this increasing refinement: from passively seeing the world, to intervening in the world by doing, to imagining different versions of the world that may not exist at the moment.

Now, elaborating a little on what this hierarchy is. Mathematicians, don't shoot: I will not formally define a causal model here, because it takes some time, but for us a causal model will be a collection of mechanisms together with a probability distribution over the exogenous conditions, the boundary conditions outside the model. Every time you have a causal model, and we can consider physics as a causal model, it induces this ladder of causation, which, in Bareinboim et al. 2022, together with Thomas Icard at Stanford and some of our students, we call the Pearl Causal Hierarchy (PCH), in honor of Pearl, given that he was the one who recognized it. Let me elaborate a little on the layers.
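To make the object concrete, here is a minimal sketch of "mechanisms plus a distribution over exogenous conditions" in code. The particular variables (an unobserved confounder U, aspirin X, headache Y) and the functional forms are illustrative assumptions of mine, not something specified in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, do_x=None):
    """Toy structural causal model: mechanisms + exogenous noise.

    Exogenous conditions: u and eps_y (nature's randomness).
    Mechanisms: x := f_X(u),  y := f_Y(x, u, eps_y).
    Passing do_x replaces the mechanism for X: that is the do() operator.
    """
    u = rng.binomial(1, 0.5, n)              # unobserved confounder
    if do_x is None:
        x = rng.binomial(1, 0.2 + 0.6 * u)   # seeing: u influences who takes aspirin
    else:
        x = np.full(n, do_x)                 # doing: everyone is forced to do_x
    eps_y = rng.random(n)
    # headache persists more often under u, less often under aspirin
    y = (eps_y < 0.2 + 0.5 * u - 0.1 * x).astype(int)
    return x, y
```

Every distribution in the three layers described next can be computed by running this one object in different modes, which is exactly the sense in which a single causal model induces the whole hierarchy.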

The first layer is the one about associations, about seeing. The kind of question a human asks at this level is: how would seeing X change my belief in the proposition Y? In machine learning there is a powerful counterpart to this, called supervised and unsupervised learning. All the formalisms we have had, Bayesian networks, decision trees, support vector machines, deep neural nets, a different one depending on the decade, are different languages, different ways of syntactically evaluating expressions like P(y | x). It is true that nowadays the x can be a million-dimensional and the y may be the next word, or the label of the image, or the next frame, but still we are evaluating this probability distribution, or a function of this probability distribution. This is layer one.

Layer two is interventional. Syntactically it also has a different type of signature: it looks like P(y | do(x), c). We extend the language of probabilities a little to account for this. do(x) means that we are performing the action; we are not passively observing X at this level. What if I do X? What if I take the aspirin, will my headache be cured? It is not that there is some correlation between taking aspirin and being okay without a headache. There is a counterpart of this in machine learning as well, called reinforcement learning, which I mentioned earlier. There are different languages to express this knowledge, and once you express it you have the corresponding inference tasks: causal Bayesian networks, MDPs (Markov decision processes), partially observable MDPs (POMDPs), and so on.
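Continuing the toy model sketched above, the gap between the two layers is easy to see by simulation: under confounding, the associational quantity P(y | x) and the interventional quantity P(y | do(x)) simply disagree. The numbers below follow from my assumed mechanisms, not from anything in the talk.

```python
# Layer 1: passively observe, then condition on X = 1
x_obs, y_obs = sample_scm(1_000_000)             # uses sample_scm from above
p_y_given_x1 = y_obs[x_obs == 1].mean()

# Layer 2: force X = 1 for everyone, i.e. do(X = 1)
_, y_do = sample_scm(1_000_000, do_x=1)
p_y_do_x1 = y_do.mean()

print(f"P(Y=1 | X=1)     ~ {p_y_given_x1:.2f}")  # ~0.50, confounded association
print(f"P(Y=1 | do(X=1)) ~ {p_y_do_x1:.2f}")     # ~0.35, actual effect of aspirin
```

Aspirin takers look worse than they should because the confounder drives both who takes the drug and who keeps the headache; the do() distribution cuts that path.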

And then you have the third layer, the counterfactual layer, given by a type of syntactic expression very different from the previous ones. Note that we have this x prime and y prime; everything after the conditioning bar here is a mix. In the notation of the first layer, everything meant that we were passively observing the event that the random variable X equals this particular value little x. But now we have the do(x) in a mixed notation: this part is observation, this part is intervention. Some people, including me depending on the day, write this as a subscript on the probability distribution, because you are indexing different probability distributions, and the notation may look a little funny. David Blei is here; sometimes one puts a semicolon instead of the conditioning bar. There are different notations, but the point is that this is an operator, and there are different probability distributions being talked about here, not only one. It is a different beast from Bayesian conditioning. In the third layer we have X = x prime, say Joe took the drug, that is x prime; and Y = y prime, Joe is dead.

This happened in the factual world. Now you can ask, counterfactually: would Joe be alive, that is y, the opposite of y prime, had he not taken the drug, that is x, different from x prime? You cannot get more counterfactual than that, in the sense that of course we cannot do it in the world. Say he died: we cannot bring him back from the other side one millisecond before the drug was given, submit him to the no-drug condition, and re-run the simulation in the code of reality to see how things turn out. We cannot do that in general, right? That is why it is called "counter": contrary to the factual world. Humans do this all the time. It is very related to the idea of imagination, to imagining these different realities: what could I have done differently? I didn't, but this may give very detailed, important information that can guide us in the future. It is very related, in the real world, to the notions of blame, responsibility, credit assignment, and understanding.
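In symbols, the three signatures just described are standardly written as follows; this is a textbook-style summary, not verbatim from the slides.

```latex
\begin{align*}
  \text{Layer 1 (seeing):}    &\quad P(y \mid x)\\
  \text{Layer 2 (doing):}     &\quad P(y \mid \mathrm{do}(x), c)\\
  \text{Layer 3 (imagining):} &\quad P(y_x \mid x', y')
\end{align*}
```

The layer-three expression reads exactly as the Joe example: given that we observed X = x' (took the drug) and Y = y' (died), what is the probability that Y would have been y (alive) under X = x (no drug)? The subscript in y_x is the indexing of distributions the speaker mentions.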

There is no counterpart to this in the ML literature at the moment. I am not doing paper review here, but in general it is not represented, and it should be. The language you can use to express it is the structural causal model. The formalization of the causal hierarchy, the PCH, provides a way to measure the capability, the expressiveness, of different formalisms with respect to increasingly complex queries, or types of questions.

I will defer the details, but there is a result called the Causal Hierarchy Theorem; I will just leave the name for now. It is some kind of impossibility result about the hierarchy. I don't know how many computer scientists we have in the room, but the analogy I like is from when I was an undergrad doing formal language theory: the Chomsky hierarchy, where you study regular languages, context-sensitive languages, and keep going up to Turing machines and the recursively enumerable languages. Here it is the same, but in terms of understanding: you keep going up, and you get more and more expressiveness. And the CHT, the causal hierarchy theorem, is in some way saying that this layer is strictly more expressive than that one, and that one more expressive than the one below. There is an analogue of the pumping lemma, if you know the theory, a more complicated version of it, but the upshot is that we can show the languages are increasingly more expressive.

Now, what is hard here, what is the issue? Because so far everything makes sense. The kind of inferences we are interested in is different from the traditional ML inference. Different how? I just removed layer three here, but the same story holds there: we can imagine and extrapolate, and extrapolation and generalization are very related to the layers as well; I digress. The classical challenge here: just think about layer one and layer two. Most of the data available today comes through perception, through passively observing the world unfolding and things happening. As a ballpark number, 99% of the data available in the world today, and we have a huge amount, is coming from layer one; a tiny fraction is coming from layer two. Say the FDA here in the US decides to run a clinical trial: you spend, I don't know, three, five, ten million dollars to run it, and in reality running a trial, with the whole machinery around it, means that you are able to sample from the layer-two distribution. It is quite complex in many settings to get data from layer two, and for layer three sometimes not even realizable, despite the quantity existing. The point being: most of the available data comes from layer one, and most of the inferences about the world, or at least the interesting ones, are about layer two or layer three. We would like to know the effects of policies, treatments, decisions, and so on. Then the research question, methodological or foundational, is how to use the data collected from passive observations, coming from layer one, to answer questions about interventions, decisions, policies, or even counterfactuals; but let us stick with layer two here. How is this possible at all, right?

Because note that everyone here is in the world of probabilities. Call the distribution you are sampling from at layer one P, and the layer-two distribution we are trying to make inferences about P prime. I have a million or a billion data points sampled from P, and our goal is to make a statement about P prime, for which we may have zero data points. How on earth can you do that? In principle, those are two disconnected objects.

Now I would like to frame this in terms of generative modeling. Jacqueline and I were chatting before; I think it is an interesting topic. How is this related to GenAI? In principle, there is a belief that if something is generative, it means that it is causal in some way. I would say it is the other way around; it is not an equivalence. If it is causal, if it is the physics or something like it, then we will be able to generate things; but being able to generate does not mean that it is causal. Let me elaborate a little bit on that.

Let us say that we have the unobserved nature: there is a true causal model, M star. M star has the potential of generating the PCH, this bunch of probability distributions: layer one in red, layer two in yellow, layer three in green. That is great, well defined, clean. But now reality kicks in, and we are unable to observe yellow and green; they are unobserved. Now I have a neural model, a very big, complex neural model, you need a cluster to run it. In principle it has the potential of generating these different things, because if it is complex enough it can generate almost anything; note that I am leaving its layers open, empty. Now you can use the data we got, the red, observational data, to train this model. Sometimes it takes a week, a month, sometimes six months of dancing around the cluster to make things match. You can do likelihood, you can do GANs, you can do diffusion; whatever method you have to fit the data, you fit it, and eventually this guy wakes up and, well, it fits. And we are very happy, because it was a huge effort. But now the question is: can you make any statement about the interventional distribution, about what would have happened had we done the intervention in the world? Right? We didn't do it; we don't have yellow. Now how is this possible? This is what I call the fundamental problem from the generative perspective: under what conditions are inferences in M hat, the model on the right-hand side, valid? In other words, when do they cover the distributions induced by the true M star on the left side?
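Here is a minimal sketch of why fitting red cannot, by itself, certify yellow: the two hand-made models below induce exactly the same observational distribution over (X, Y), so a likelihood-based fit is free to land on either one, yet they disagree completely at layer two. The construction is a standard illustration of the hierarchy, with numbers chosen by me for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_a(n, do_x=None):
    # X -> Y: Y literally copies X, so X is fully causal for Y
    x = rng.binomial(1, 0.5, n) if do_x is None else np.full(n, do_x)
    y = x.copy()
    return x, y

def model_b(n, do_x=None):
    # U -> X, U -> Y: X has no effect on Y at all; U explains everything
    u = rng.binomial(1, 0.5, n)
    x = u.copy() if do_x is None else np.full(n, do_x)
    y = u.copy()
    return x, y

for name, model in [("A", model_a), ("B", model_b)]:
    x, y = model(100_000)
    _, y_do = model(100_000, do_x=1)
    print(f"model {name}: P(Y=1|X=1) = {y[x == 1].mean():.2f}, "
          f"P(Y=1|do(X=1)) = {y_do.mean():.2f}")
# model A: P(Y=1|X=1) = 1.00, P(Y=1|do(X=1)) = 1.00
# model B: P(Y=1|X=1) = 1.00, P(Y=1|do(X=1)) = 0.50
```

Both models match the red layer perfectly, so no amount of observational fit distinguishes them, while their yellow-layer answers differ by 0.5: the Causal Hierarchy Theorem in miniature.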

Of course, we can ask the same question for layer three, and we can have intermediate pieces, other versions of the problem: maybe you have a reinforcement learning agent combining observations and experiments, and then you ask about the green on the other side. And even when you are matching perfectly within one color, when you have a question about red and you get data from red, it is not simple; we spent a few decades learning to answer that. But that, I would say, is the normal case in some sense. We are interested in the case of moving across colors, let us say, without having data from the target quote-unquote color.

Now, one example of a task where this applies, related to layer three. Just a cute one; I am not a vision researcher, but just to illustrate: what would a person look like had they been dot dot dot, for example, had they been 10 years older? You ask the AI to produce a human face, you get this face, and then you ask how the person would look had they been 10 years older. And we get this person whose gender has changed as well. Or you ask for a human face, you get this blonde lady, and then you ask what would happen had she been another gender, and we get this guy, which seems plausible on its own but is somehow mismatched: it does not look like the counterpart of the same person; it is a different age group, I guess. I will skip the gray-hair example for the sake of time. This is off in some way; we do not have a guarantee connecting the P and the P prime. What is going on in reality is that in the data set, top right, there is some correlation between age and gender. It is not perfect, right; the balance between age and gender, as with everything, is never perfect. But of course gender does not cause age, and age does not cause gender, biological gender here for the sake of discussion, while age may well have an effect on gray hair: the amount of gray hair goes up, probabilistically, up to some point. Now, if you have the causal structure, if you use this type of knowledge during the training process, you can get something like this: you get a human face, and this does seem to be the older version of the fellow; you get a lady, and this seems to be the male version of her, even some sort of twin brother, and so on.

There are other implications, for fairness, which we were discussing. Maybe you have this picture, this one is from the literature, as input, and then you ask, adding a little layer of language here: make them look like flight attendants. And the gender of one of them gets changed, because there is a correlation between type of job and gender. This is totally not expected, or not desirable at least; expected it could be. Or: make them look like doctors, and then the gender of the other one is changed to male. Again, this is just based on the correlation between gender and job. Examples are all over the place; we can slice this in many different ways. The correlation, embedded in some way in the latent space, is what is driving this.

There are many other applications. I will not go into details; I am happy to discuss offline, and sometimes I give talks about them. The first is policy evaluation and fairness analysis. This is data we got from the ICUs, the intensive care units, in Australia; this is the map of Australia, and we got data for the whole country, which was amazing (I am happy to talk with people who may have data, to try a collaboration). They have the ICU admissions for a minority group, the indigenous population. There is a highly non-trivial analysis to run there to detect that there is some type of unfairness relative to this population, and then there is a policy recommendation, and we are talking with the government, about where the government should be installing hospitals, given that, with the discrimination, the minority population tends to end up in ICUs to solve basic problems; this is a simplified version. There is huge pressure in terms of cost, and it is totally unnecessary in some way. Anyhow, applications of fairness can be to the health disparities we were talking about, to criminal justice, to environmental problems, to any other kind of decision-making problem. This first one is more about explanations and understanding.

The second column here is about decision making. This is a case of a type of cancer treatment, if my memory is right, leukemia. It is a sequential treatment, and the blue curve there is the outcome. Here you are in the context of a randomized trial. The red curve is by Susan Murphy, an algorithm for doing treatment allocation, with a certain way of doing the randomization. This axis is the number of episodes, the number of interactions, and this is the cumulative regret: regret means the oracle minus what the allocation procedure is doing. You would like something log-like, where in the beginning there is some exploration and later you learn what the right policy is, or treatment regime, as it is called in the medical literature. This one is Susan's; this one is ours, which is better; and this one is where we leverage layer-one data. The first two are layer two: we go there, do the experiments, randomize, and decide that you get treatment one and you get treatment two. The green one is the one with observational data, data that is biased by the physician. We can leverage this data, and then the problem in reality becomes very easy: there is essentially no regret here, the scale makes it hard to see, but we do a little bit of experimentation in the beginning to remove the confounding, and after that we can just exploit, quote unquote, in the language of RL.

The third one is about counterfactual explanations: almost, how to open the AI black box. Are you able to distinguish between explanations, to say why the classifier or the GenAI is doing something in a particular way? I will skip this one for the sake of time, but that is the idea. It is very related not only to GenAI but to algorithmic recourse and other kinds of technology. In reality, explanations, if you go to philosophy or cognitive science, are related to causation: Lewis says that an explanation is a way of tracing the causes that led to the particular event. Cognitive science says that humans like explanations that cite the more abnormal factors. Say we have a fire, and you ask what led to the fire: humans would not say it was the oxygen that was there; we prefer saying that there was the match, that someone did something with the match, which led to the spark that led to the fire. Of course the oxygen was necessary for the fire to happen, but there is an interplay between causation and abnormality, or normality, along with other features, that makes for good explanations. Happy to elaborate or talk more.

Now, what is our goal here? I am almost done. The research program is to develop more general and trustworthy types of AI endowed with the following capabilities: causal understanding and the ability to articulate explanations; being more efficient and precise in decision making, more surgical. We do not want the quote-unquote fat-hand intervention, where there is a complex system and you put your hand in and make a big splash; we would like to be surgical. We would like them to be more generalizable and robust. And we would like to do more causal and counterfactual types of generation, I showed the example with images and the same goes for language, by the way, even though most of the data is from layer one, from observations.

And we would like the model to do learning and discovery; Jacqueline is working on that, I believe. There is a field called structure learning, or causal discovery, where we try to learn the structure of the world, and another one called causal disentanglement, or causal representation learning, where we try to learn what the causal variables in the world are. In many problems, if you are in the sciences, you already know the variables; in many machine learning problems you have the representational challenge, because we are getting data from different modalities, from pixels, from text, and the variables are not carved out yet. Both are interesting problems, and together this is model learning and discovery.

I would like to share something here that I started sharing maybe one week or ten days ago; this is almost the very beginning. It is a textbook that I have been writing for a few years and that I teach a class from, called Causal AI. Check it out: it is a draft, at causal AI book.net, with slides to come. It is a very nice project, and I am very happy to be able to share it after many years. Happy to talk about it if you have instructors, happy to talk with students; share it with your colleagues, and so on. I am still keeping a low profile with it; it is getting a lot of feedback and still changing. I change it every week, by the way, so if you do not read it now, do not download it; download it one week from now, because I make changes at least once a week given the feedback I am receiving, and there are typos. Anyhow, this is a huge collaboration; I am just the one sharing what we have accomplished. Thank you for the funding from many people; on the left is more my group at Columbia, and on the right the external collaborators. Happy to get questions. Thank you.

Nice.

Yes.

I think so. These are the instructions. All right, great, let's follow the protocol, I guess. Great talk, really thought-provoking, thank you. I will take your aspirin example as a showcase. When we say "oh, we take aspirin and then I don't have a headache anymore," most people would not count that as a mechanistic understanding of what is going on there, right? I think it relates to levels of abstraction in how you can say what X does to cause Y. Is that something you have considered? And I guess the other part of it is that different people will give you different answers: if you ask a cell biologist, they will say something about how the T-cells are interacting; a biochemist will tell you something about proteins that are binding and not binding, right? Where does that come into play in the Pearl hierarchy, and how do you think about it?

Yeah, beautiful, beautiful. Thank you for the question. At the moment, everything I talk about here in the formulas involves the so-called endogenous variables, the variables that you observe, and all inferences are at that level of description: V1, V2, ..., Vn. A lot of the problems are caused by the exogenous variables, the variables that you do not measure, the unobserved U1, U2, and so on. One type of unobserved variable is very peculiar, or very specific, in this case: a variable U that affects more than one observed variable. I have V1, the aspirin, and V2, the headache, and maybe there is another thing confounding this relationship, making some people take the aspirin and making some people get the headache, which is not causal. We would like to know whether it is V1 that is affecting V2, or this third thing: the age, the gender, the social situation, whatever other variables could play that role in this case. But going back: everything I am saying operates, I said, or there is a claim, at the level of abstraction that you work in, V1, V2, ..., Vn. Now, there is another dimension that I could not talk about today, but it is very nice and very related to what you are saying: can you operate at different levels of description, can you build abstractions on top of that? I do not remember the chapter, maybe chapter 15 or 16 in the book, but it is about causal abstractions: how you move across different levels of description. Another example: depending on the case, you are operating on diet; depending on the case, you operate at the level of calories; but calories is an abstraction over the amounts of carbs, fat, and protein in the world. Or cholesterol: at some point in the literature people talked about total cholesterol, which is a bad measure, because in reality it combines LDL and HDL, which have completely opposite types of mechanisms. The idea of moving across different levels of abstraction is a beautiful question, I would say not fully resolved, but there is a chapter there and some understanding of how to do it, and the punch line is: it depends on which kind of inference you want. Sometimes it is okay to operate on calories; sometimes you need to go to the more fine-grained type. How to navigate across this other kind of hierarchy is a topic of study. But thanks, beautiful.
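A minimal sketch of that punch line: in a made-up fine-grained model where the outcome depends on carbs and fat separately, the coarse variable "calories" supports interventions only for questions whose answer is invariant to how the calories are composed. The mechanism and coefficients here are mine, purely for illustration.

```python
def outcome(carbs_kcal, fat_kcal):
    # toy fine-grained mechanism: fat twice as harmful per kcal
    return 0.001 * carbs_kcal + 0.002 * fat_kcal

# Two fine-grained interventions with the SAME total calories...
print(outcome(carbs_kcal=1500, fat_kcal=500))   # 2.5
print(outcome(carbs_kcal=500, fat_kcal=1500))   # 3.5
# ...give different outcomes, so do(calories=2000) is ill-defined here.
```

If the mechanism depended only on total kcal, the two calls would coincide and the calorie-level do() would be well defined: whether the abstraction is safe depends on the inference you want.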

Yes. Yeah.

I know that in observational causal inference there are assumptions that need to be satisfied to identify things in general. I am curious about your thoughts on these super-high-dimensional scenarios, say sequential decision making with a giant language model. Do you think it is testable, or likely, or how would you evaluate whether there are other things that could be making up for the lack of identifying assumptions being satisfied, like functional constraints? And what is the right way to think about moving forward in these cases, where we think that the assumptions are very likely not to hold, but perhaps we could be accidentally doing something smart? Is there a way to test that?

Yeah, beautiful. I would say the question is at the frontier, right? I would not say that we have a broad, full understanding of how to work here. One way of paraphrasing the question, if I understood it: what kind of statements can you make when the assumptions required for the inference cannot be accepted? It is hard to answer that in general, but I would say there is a good amount of recent work, from the last few years, three or so, that is a building block for the answer. The general answer is to do some type of, it is not well defined but let me use the words, causal sensitivity analysis, and try to understand the robustness of the conclusion relative to violations of the assumptions. That would be the dream, the very good way of doing it. A building block for that, and there is a growing literature on it, is something called partial identification, or bounding. Identification, in reality, is almost like a CLT, the central limit theorem now, not the CHT, the hierarchy theorem: some notion of consistency. As the data grows in layer one, the observational layer, even though I have zero data in layer two, the interventional one, eventually things converge, and we can discuss which kind of convergence, but this happens. That is what identifiability means. But that is point identification, meaning it converges in a very tight way. Once we have a non-identifiable case, the natural thing to do, and it is hard to get the results, is some type of partial identification, or bounding within a certain interval: I know that my effect is within these bounds. How to do that in generality and at scale is a technical challenge, but there are new results that I think could be exploited, could be built on, to answer the general question for language models at a crazy scale, at millions of variables. I would say it is an interesting open question how to execute that, but I think you have some of the building blocks to start thinking about it. As a reference, I would say, I think, section 5.6 in the book; you can drop me a line, and apologies for not being precise, because the book is new and has, I think, 20 or 21 chapters and is changing. The other one is a technical report, a paper at ICML from a few years ago, on my website; it is the paper that starts doing that.
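A minimal sketch of the bounding idea, in its simplest Manski-style form for binary treatment and outcome: with no identifying assumptions at all, observational data still confine P(Y=1 | do(X=1)) to an interval of width P(X=0). The numbers are made up for illustration.

```python
def bounds_on_do(p_y1_given_x1, p_x1):
    """Assumption-free (Manski) bounds on P(Y=1 | do(X=1)), binary X and Y.

    Decompose: P(Y_{x=1}=1) = P(Y=1, X=1) + P(Y_{x=1}=1, X=0).
    The first term is observable; the second is unobservable and can be
    anything between 0 and P(X=0).
    """
    observable = p_y1_given_x1 * p_x1
    return observable, observable + (1 - p_x1)

# e.g. 70% of the treated respond, and 40% of the population is treated:
lo, hi = bounds_on_do(p_y1_given_x1=0.7, p_x1=0.4)
print(f"P(Y=1 | do(X=1)) lies in [{lo:.2f}, {hi:.2f}]")   # [0.28, 0.88]
```

Extra assumptions, or a few layer-two data points, shrink the interval; identifiability is the limiting case where it collapses to a point, matching the CLT-style consistency analogy above.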

Yes, one more, maybe a short yes/no question. Very interesting talk. If I had a magical way of constraining the generative models I am training to actually be causal, suppose I could do that: it seems intuitive that I might need much less data to train them, because there is a constraint, right? Is there any theoretical or empirical support for that? That is the yes/no question. And you are out of time, so it can be just yes or no, and we can probably... No, it is open. I think that is okay.

Yeah, thanks. Thanks.

Happy to talk more afterwards. Yes, hello. I think you can click there. Yes, this one, I think. Yes. My question:

Yeah, it is related to the challenge of scale that you mentioned before. I think one issue is sort of... how are you? I think we have met already. Yes, oh yeah, hello, nice to meet you again. How do we bring this causal understanding to large-scale systems? Because most of the literature, at least in causal discovery, and I am not sure about the other areas of causality, has been focused on very small-scale systems, and a lot of the methods face this challenge of scale. So I was wondering if you have thoughts on that.

Yeah, I do not think that we have a magical way of moving the results to the large-scale setting yet. The question is not even only that; there is something special going on with language and vision. I showed a fragment of it with computer vision, but with language and vision we are moving to a different type of abstraction in some way. It is almost as if things are happening at the level of the causal variables: "how would the person look had they been 10 years older" is happening at the factual, causal-variable level, and you need to translate that to the pixels, or to tokens and words in the text. So there is some tension between what is happening in the world and this level of description. This is related, even more, to the abstraction question, to these types of abstractions, and tying that together is, I would say, work in progress; we do not have it fully tied yet. The dream, or one of the dreams, but I think a big one, is: can you have a causal model there? Despite the claims, I believe we are not totally there, but can you have something with some type of closer model? What we have now is very exciting, but it is a big soup: many models, many people describing stories and things about the world, and this is not how we operate at the model level; and what can be done without a model is tough. Now, how to distill the model, how to nail that, is not clear. But the first step is even to understand what a model is, because otherwise you are just in the soup, in some way. And the challenge at this massive scale is also related to the question of bounding: can you provide... it will work in many cases, yes, it can work with little data; I do not think it will solve all the problems. But the question, and it is very important, we have mathematicians and computer scientists in the room, is: can you provide any guarantee that the thing is not going completely off on a tangent, or doing something crazy? Without the model, I think there is zero chance that you can provide a guarantee; you can put band-aids on and patch things here and there, but providing real guarantees, I think, should be the goal, one of them.

So, something like finding the right level of abstraction, and then also bounding the error rather than going for perfect?

Providing guarantees, or some kind of certificate that we understand under what conditions something is happening, right. It is not a totally dark-magic, full-emergence type of thing. Okay, thank you. Thank you.

I'll take it offline. Yes.
