
China is winning the AI race

By Theo - t3.gg

Summary

## Key takeaways

- **China dominates open-weight AI models**: Top open-weight models like Kimi, DeepSeek, and MiniMax M2 are all Chinese, wiping America off the chart, while the first US model, GPT-OSS-120B, is rough and lags far behind in tool-calling reliability. [00:16], [00:42]
- **Open weights enable multi-provider competition**: Closed models like Gemini run only on Google infra, but DeepSeek 3.2 Exp and Kimi K2 are hosted by DeepInfra, Novita, SiliconFlow, and many others, driving 10x+ throughput gains like Groq's 356 tokens/sec vs Moonshot's 20. [09:34], [10:31]
- **Chinese labs release open weights for trust**: Nobody trusts Chinese-hosted models due to government risks and data-security fears, so open weights let Americans self-host, winning the labs mindshare despite US attempts to ban downloads like DeepSeek R1. [14:39], [15:16]
- **DeepSeek's research accelerates global AI**: DeepSeek published 12 papers last year, including work on FP8 training that every lab adopted, fueling much of AI's speed gains this year, all released freely despite no direct revenue. [12:06], [12:19]
- **OpenAI targets runnable consumer hardware**: GPT-OSS-120B and 20B are sized for a single beefy GPU or a laptop, crushing competitors in that box with 80 TPS on a MacBook and 650 TPS on providers, unlike the massive trillion-parameter Chinese models. [25:19], [29:40]

Topics Covered

  • China Dominates Open-Weight AI
  • Open Weights Equal Open Source
  • Open Weights Bypass China Distrust
  • US Excels at Consumer Open Weights

Full Transcript

If you look at the current top models, they're all from America: Google, Anthropic, and OpenAI. We are clearly winning the AI race, until you zoom out a little bit. Then you see a lot of these blue bars appearing in the chart. Those blue bars are open-weight models. And if we look at the top three, Kimi, DeepSeek, and MiniMax M2, you realize that China is winning the open-weight race, and by quite a bit. The first model from the US to appear here is GPT-OSS-120B. And as a person who's used that model quite a bit: it's rough. It might score well on intelligence charts, but its ability to reliably call tools and be used in your workflows is nothing in comparison to what I've experienced with Kimi, with MiniMax, and now with DeepSeek V3.2. Huge gap between those. And if we want to look at the European entries, like Mistral Large 3, which just dropped and kind of inspired this video, they're barely even on the chart. Things are rough. Oh, and I almost forgot (much like Meta seems to have): Llama 4, all the way at the end here. There were supposed to be three versions of Llama 4. If you remember, it was supposed to be Scout, Maverick, and a larger one whose name I forget, because they never put out the larger one; they all suck so bad. There's a 20-billion-parameter model from OpenAI that beats out Mistral, and there are lots of 15-billion ones that do too. It's rough out there.

But the thing I really want to focus on today is the open-weight wars, and why China seems like they will be winning them for the foreseeable future. It's kind of crazy that when you only talk about models whose weights are downloadable and usable, all of a sudden America gets wiped off the chart. There are a lot of reasons for this and I can't wait to talk about them, but since open-weight models don't pay the bills, we're going to do a quick sponsor break first.

Here's a hard question for you: how do you know if an engineer is actually good? It's really hard to do. You might be able to look at their HTML shirt and make some assumptions, but when you're doing an interview, especially when you're reading someone's resume, how do you know they're actually good and not just AI-generating some slop in the huge pile you're going through as you fill this role? It's never been more annoying to hire good engineers. I feel like there are fewer of them in the pile, and the pile's never been bigger. If you're tired of trying to find the needle in the haystack and finally get a good engineer to work for you, you've got to check out today's sponsor, G2i. These guys are without question the best way to hire good engineers fast. They have over 8,000 of them ready to go in their incredible network. These aren't people who are fresh out of college. These are real, experienced engineers who have worked at big FAANG companies and small startups alike, know how to use all the tools you need, and are already familiar with fancy AI development stuff, so they're not going to be slow. Whether you want a couple of junior engineers to kickstart a new project or a lead that can dig you out of tech-debt hell, they have you covered. You create a shared Slack channel with them; they effectively operate like your recruiting team. You give them a handful of questions to ask the engineers. They ask the engineers and record actual video responses so you know what each person's actually like. You go through them and figure out the ones you want. They'll then do a technical interview that they've specced out so you don't have to worry, record it, and send you the results. Once they've gone through all that, you can review it, figure out who you think fits best, and then you can hire them. They're also just super generous to work with. I've referred a lot of people on a personal level, like a lot of the YC startups I work with, and every single one has had an incredible experience with G2i. Stop hiring the old way and stop wasting your time. Get good engineers fast at soyv.link/g2i.

Before we can discuss why China is winning so hard, it's important to understand what an open-weight model is. The point of open-weight models is somewhat similar to open-source models, where you're giving out a significant portion of how the thing works. There's a big difference between open weight and open source, though. With open source, the code that is used to create the thing people experience is exposed. With Linux, for example, the actual thing you download isn't the code. What you download is the binary that's compiled from the code. The code is the input that results in the output you are downloading and using. As such, I've seen a lot of people complain that open-weight models aren't open source because you can't recreate that "binary" yourself. And I don't really agree. Obviously, it would be cool if we had all of the training data and everything else that went into how the models were made. But the reason open source is valuable is because you can reproduce the actual output: I can take the code, compile it on my computer, and get a result. There are almost no consumers, almost no developers, who have everything necessary to spend the millions of dollars on really hard-to-get infrastructure to turn that data into a model. There is no real reason to expose that, and there's a ton of risk and liability if you expose all of the data used in your training: every other company will just take that data, throw it into their datasets, and suddenly be able to beat you at all the things you're good at. Depending on how you cut the lines and think about it, I would argue the data is almost the equivalent of the engineers in this case, not the equivalent of the source code. People think about source code very specifically, because you use the source code to compile the thing that you want, and as such "we should have the data if we want to call these models open." I would argue we're just drawing our lines a bit differently. So in open source, a developer creates source code that compiles into a binary that users can use. In open weights, data is used to train weights that result in generated tokens.
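To make that data-to-weights-to-tokens analogy concrete, here's a toy sketch. A bigram frequency table stands in for real training (which actually uses gradient descent over billions of parameters), but the shape of the pipeline is the same: raw data produces weights, and the weights produce tokens.

```python
from collections import Counter, defaultdict

def train(corpus: str) -> dict:
    """'Training': turn raw data into 'weights' (a bigram frequency table)."""
    words = corpus.split()
    weights = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        weights[prev][nxt] += 1
    return weights

def generate(weights: dict, start: str, n: int = 5) -> list[str]:
    """'Inference': use the weights to predict the most likely next token."""
    out = [start]
    for _ in range(n):
        options = weights.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return out

weights = train("open weights let anyone run the model anywhere they want")
print(generate(weights, "open", n=3))  # -> ['open', 'weights', 'let', 'anyone']
```

With open weights, you're handed the trained table (the second half of this pipeline) without the corpus or the training run that produced it, and that's still enough to generate tokens yourself.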

If you draw the lines as "the researchers collect the data that creates the weights that generate tokens," I would see why you'd call this not open. But I don't think that's quite how it works. I think of it this way: the data plays the role the developer plays for the weights, because it creates the thing we can use to generate the thing we actually want. In the Linux case, that's the binary, the actual installable kernel; in our case with models, it's the tokens that we get from using the model. So open weights mean you have all of the pieces you need to run the model and generate results with it yourself. The weights are the collection of parameters that are all mapped to and point at each other, so when you give the model some text, it can guess what the best next token would be, based on the text you give it and this giant, hundreds-of-gigabytes pile of vectors and data that has been collapsed into a model it can use to generate the next token as predictably and reliably as possible. I think it's really cool that open weight has gone as far as it has. And I think it's really convenient that open-weight models can use the same licenses that open-source code can. I already see people disagreeing in chat. I don't care. The weights are not the binary. The weights are a thing that can be reused and modified in very useful ways. Consider how baked these things are: compiling code costs pennies and can be done on most computers, while turning data into weights isn't even a deterministic process. And I know there's a lot of debate around this. I know there are things Richard Stallman is going to disagree with me on here. I don't really care. This all comes down to where you draw the line, and I'm not one to [ __ ] when we get something as cool as open-weight models.

There's only one lab I know of that actually puts out the data, and it's the Allen Institute. They were founded with funding from Paul Allen, the Microsoft co-founder, as an attempt to do truly open AI research in the US. Their models aren't just open-weight models; they have the data exposed too, so you could hypothetically retrain the model on the data yourself. None of it's deterministic enough that you'll get the exact same weights, but yeah, it exists. If you're wondering where this falls in the charts: right next to Llama. Not great. So it's cool that we do have a fully open lab based in the US that is sharing the data and everything, but they're not really competitive. Just wanted to call that one out quick. The harsh reality is, if we use the strict open-source definition that currently exists for code, there will never be a model that meets the definition of open source. And I agree, there probably won't be, and we shouldn't use the term open source to describe models. Open weight is still a very cool and useful thing.

So with an open-weight model, the value you get is that I can take those weights and run them on my own hardware, or look at the different providers that are hosting them. If we go to something like OpenRouter and take a look at a Gemini model like Gemini 3 Pro Preview, you can use it in two places, Google Vertex and Google AI Studio, because the weights for this model have never left Google's campus. The weights that run these models and generate these results are exclusively provided through Google's own infrastructure, because they want to sell you the API, not the model. And since Google has their own infrastructure, they don't let other companies have access to this, except potentially Apple, privately, with a really big deal of a billion-plus dollars to get the weights for some Siri stuff. If you look at something like OpenAI's GPT-5.1, your options are OpenAI. Some of these models are also available on Azure due to the OpenAI-Microsoft partnership, but that's it. Let's compare that to DeepSeek 3.2 Exp: we've got DeepInfra, Novita, Chutes, SiliconFlow, and Atlas Cloud. Let's look at Kimi K2: Chutes, SiliconFlow, Novita, DeepInfra, Parasail, BytePlus, plus seven more.

And Moonshot, the people who actually made the model? They are the eighth option in this list: Fireworks, Atlas Cloud, Baseten, Together, Groq, and then Turbo from Moonshot AI. Also notice the Turbo option from Moonshot, which costs $8 per million tokens out, is less than half the speed of Groq's solution here, with 8x the latency too. Kind of nuts. Open-weight models allow various providers to offer them, which allows for a different level of competition across infrastructure solutions. That is really, really cool. But it does also mean that the official infrastructure, in this case Moonshot's, isn't really a great option. Moonshot charges 60 cents per million tokens in and $2.50 per million out for under 20 tokens per second. Groq charges a dollar per million in and $3 per million out, so slightly more, for 356 tokens per second. That is more than a 10x increase in throughput for a very minor bump in cost. This is the difference: when you have this type of competition, the value prop of your own infrastructure goes down, which makes it a lot harder for a company like Moonshot to make money on the Kimi models, even though they are fourth on the Artificial Analysis intelligence chart.
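That pricing gap is easy to sanity-check. A rough sketch using the per-million-token prices quoted above, with a made-up request size of 2,000 tokens in and 1,000 tokens out for illustration:

```python
# Compare two hosts serving the same open weights: cost per request barely
# moves, while throughput is more than 10x higher on the fast host.

def cost_per_request(price_in, price_out, tokens_in, tokens_out):
    """price_* are dollars per million tokens."""
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Moonshot first-party: $0.60/M in, $2.50/M out, ~20 tokens/sec
# Groq:                 $1.00/M in, $3.00/M out, ~356 tokens/sec
moonshot = cost_per_request(0.60, 2.50, tokens_in=2_000, tokens_out=1_000)
groq = cost_per_request(1.00, 3.00, tokens_in=2_000, tokens_out=1_000)
print(f"Moonshot: ${moonshot:.4f}/request at ~20 tok/s")
print(f"Groq:     ${groq:.4f}/request at ~356 tok/s")
print(f"cost bump: {groq / moonshot:.2f}x, throughput gain: {356 / 20:.1f}x")
```

On these numbers the fast host costs about 1.35x per request for roughly 17.8x the throughput, which is why first-party infra has such a hard time competing.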

Google is a trillion-dollar company. Anthropic is a multi-billion-dollar company, potentially worth trillions someday. OpenAI is already worth half a trillion dollars. Kimi K2 Thinking is by Moonshot, a small company in China that isn't making real revenue yet. Do you know what's really funny, though? Do you know which of these four companies has been the kindest to work with for me as a creator? Moonshot. They've been trying really hard to get a mailing address from me so they can ship me a care package. They've been awesome to work with. They always hit me up early. They offer me free inference for any tests I want to do. They constantly send me useful resources about the things I'm talking about. Moonshot has been a genuinely awesome company to work with, and they even shout out their competitors when they have big launches. Like when Z.ai had a big release, they immediately went and supported them.

They're a very good-faith player, weirdly. DeepSeek is very similar in this regard. Not in the comms sense; I've never heard from anybody at DeepSeek. (By the way, DeepSeek guys, if you want to hit me up, I'd love to chat. Very, very big fan of what you did. I would never have built T3 Chat if it wasn't for DeepSeek V3 at the end of last year.) I'm so impressed with the work that DeepSeek has been doing for a while now, and their research is incredible. They put out 12 papers last year that were so far ahead of where everyone else was. And the discoveries that made FP8 training much more reliable resulted in every lab fundamentally changing how they did training. You could argue that a large portion of the speed at which AI accelerated this year came from the research DeepSeek put out for free last year. And yes, I have also talked to the Z.ai guys. They've been great, really, really awesome. It's crazy how good at comms the Chinese labs have been, with me at the very least. OpenAI has been really good. Google's up and down. Anthropic is interesting. But my experience with the Chinese labs has been really good as a journalist, so to speak, covering these things publicly.

But none of that answers the question: why do open weight at all? Why are these companies releasing these models in a way where they make no money off them? The real winner whenever DeepSeek drops isn't DeepSeek; it's companies like Groq and Together and all these cloud infra providers that will host them for us. We currently don't have the final official version of DeepSeek V3.2 on T3 Chat yet, because none of the providers are doing it well enough just yet. I would even argue being open weight makes things much harder for the labs, even beyond the costs. Since Kimi K2 is available for anyone to host themselves, different hosts aren't necessarily hosting it properly, and the quality of certain behaviors, like tool calls, might go down meaningfully depending on which host you're using. Kimi actually went as far as creating the Vendor Verifier, where they rank all of the companies hosting their models based on how reliably they do tool calling. These are all of the companies that they say are hitting over 73%.
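This is not Moonshot's actual Vendor Verifier code (their real harness calls each host's live API), but the core idea can be sketched in a few lines: collect raw model outputs that are supposed to be tool calls, validate each one against the tool's schema, and report the pass rate per host.

```python
import json

# Hypothetical tool schema for illustration: a get_weather call that
# must include a "city" argument.
TOOL_SCHEMA = {"name": "get_weather", "required": ["city"]}

def is_valid_tool_call(raw: str, schema=TOOL_SCHEMA) -> bool:
    """A tool call passes if it is valid JSON, names the right tool,
    and supplies every required argument."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if call.get("name") != schema["name"]:
        return False
    args = call.get("arguments", {})
    return all(k in args for k in schema["required"])

def success_rate(outputs: list[str]) -> float:
    return sum(is_valid_tool_call(o) for o in outputs) / len(outputs)

# Sample outputs as one host might return them:
outputs = [
    '{"name": "get_weather", "arguments": {"city": "Tokyo"}}',  # ok
    '{"name": "get_weather", "arguments": {}}',                 # missing arg
    'Sure! I would call get_weather for you.',                  # not JSON at all
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',  # ok
]
print(f"tool-call success rate: {success_rate(outputs):.0%}")
```

Run a check like this against every host serving your weights and you get exactly the kind of leaderboard Moonshot publishes.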

And if we scroll down, you'll see others not performing quite as well. It's cool that they're doing better now, because previously the gap was a lot bigger. But by creating this bench and making the data public, they incentivized the hosts to fix their [ __ ], and also gave them the tool-call eval Python file that they can run against their own infra to find the bugs and fix them. Doing this type of thing is really, really annoying, but they are doing it because otherwise the reputation of these models gets hurt by other hosts not hosting them properly. It's a small thing, but I also love that they're using uv. These guys get what US developers are expecting. So it's clear that doing open weight is harder, and it makes you way less money. Why the hell are they doing it?

To be frank, nobody would trust them otherwise. If you're using a Chinese model and it's being hosted in China, all the data in and out is now at real risk, like a very legitimate risk. A lot of these companies have Chinese government hands in them. There is no security team in the US that would approve of you using a Chinese model on Chinese infrastructure. Open weights allow these labs to be relevant in the space right now. The fact that I'm legitimately considering doing more work with Chinese models as an American shows that the open-weight strategy is working for them, because it's the only way they can hold any mindshare in the US. There have even been attempts to ban the use of Chinese models in the US. When DeepSeek R1 first dropped, there was a huge freak-out, and the government here was actually considering passing legislation that would make it illegal to download the weights. Wild. Insane. I have files on my computers that would suddenly become illegal if that crazy proposal were to actually go through. Absurd. So this is seriously the only way these Chinese labs will be taken seriously. And this goes a lot further than language models, too; it's the same deal with a lot of their image and video generation models. These models are not something that you'd want to run out of China, especially because they have restrictions on what GPUs they're even allowed to have access to. So you might not be able to run some of these models on their infra, and the infra they do have is limited to the use cases they're using it for, which is mostly training. There's a whole culture in China around getting cheaper GPUs and adding more VRAM to them in order to get around these import restrictions, which is kind of crazy. All of this results in these models only being viable if they are released in a way that we can host them and use them ourselves.

There is no reason to make a great model in China and not release the weights, because you won't be able to make money off it anyway right now. And this makes these companies go from entirely ignored here to genuinely very relevant to the conversations we're having. The research that kicked off a ton of this AI boom is the "Attention Is All You Need" paper from the Google Brain team, which introduced the transformer architecture that allowed us to create language models as we now know them. This went further with OpenAI's follow-up research, "Improving Language Understanding by Generative Pre-Training." These two papers kind of kickstarted what we now know as AI. And these are open papers, where they published what they did, how they did it, how they got there, and what it could do. Hypothetically speaking, any one of these labs could have sat on this information, not published it, and gone and made crazy things with it. But then other companies wouldn't have been able to innovate further.
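The core operation that paper introduced, scaled dot-product attention, fits in a few lines. A minimal NumPy sketch (real transformers add learned projections, multiple heads, masking, and much more):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V, the transformer's core operation."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # how much each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, embedding dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
print(out.shape)          # (4, 8): one mixed-context vector per token
print(w.sum(axis=-1))     # each row of attention weights sums to 1
```

Every frontier model, open weight or closed, is still built around variations of this one published idea.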

If Google didn't release that paper, OpenAI wouldn't have had the kickstart they needed. And if OpenAI didn't follow up with their paper, we wouldn't have GPT as a concept. Or maybe somebody else would have come up with it eventually. But if these were all private innovations that each lab had to come up with itself, the likelihood that any of them progressed meaningfully is way lower. The culture of sharing our learnings and understanding is rooted deeply in science and research. This is just how advancements happen in technology. On one hand, this does remind me of the open-source world, the way we're all building on top of each other. But on the other hand, it's not truly, traditionally open, because we're spending tons of money on this research and work and only publishing the things we think are worth sharing and that don't screw our competitive advantages. Back when nobody had working AI, sharing all of this made a lot of sense. Now that the American labs are in a cutthroat race competing with each other, their willingness to share has gone down a ton. It's silly, but the first cool thing I've seen of different labs supporting each other in America in 2025 was when Sam Altman tweeted that Gemini 3 seems like a good model. Other than that, I have not seen much in terms of good-faith operations between executives at Anthropic, Google, and OpenAI. There's just very little collaboration happening at this point, because they're too busy trying to fight each other.

Meanwhile, DeepSeek breaks everything again with V3.2 getting crazy scores, especially on tool-calling stuff, and Z.ai is right there in the replies with a "legend" and a heart. This is a whole different world. This is what the research was like here before the competition started. We operated like this in the US before, where these companies were supportive of each other. Now that they're all cutthroat, trying to win this economic race, they're not as willing to collaborate, and they're much more skeptical of things like distillation: people using their models to generate a bunch of synthetic data to then retrain their own models with. In fact, a lot of them are accusing companies like DeepSeek of doing this with their data. There was a point where certain DeepSeek models, if you asked them what model they were, would say ChatGPT, because their training corpus included data that came from those American models. If Chinese models want to win, they have to be open, because otherwise we won't use them. If Chinese labs want to be competitive, they have to collaborate, because we still have a lead there.

And there's one last piece that I haven't dived into much yet. I think this will make it into the video depending on how angry chat is. This is going to be fun. Don't get too mad, boys: China sucks at writing software. They're not quite as bad as Japan, but they're up there. China has incredible manufacturing, and they are surprisingly competent at research.

Chinese software, from my experience, is so atrocious that they end up spinning up satellite companies in the US so they can hire United States-based software developers to make software that works. A significant portion of TikTok's development happens here now, because we have better engineers in the US. The same way a significant portion of manufacturing happens in China because they're better at it, software development happens in America because we're better at it. People are mad about the Japan one; I don't care. Sony has even accepted that the PlayStation software is an untenable mess and has fully outsourced it to the United States. They are hiring consultancies in the US to save the PlayStation operating system, because they are so bad at software. Great at research, great at manufacturing, pretty good at logistics, not capable of writing software. A significant portion of why I made T3 Chat is that, as much as I hated the Claude interface and the ChatGPT interface, the DeepSeek one was actually unusable, entirely unusable, miserable to touch. And I wanted to use the model somewhere better. I made T3 Chat kind of as a pun on V3 Chat, because I liked DeepSeek V3 so much and I wanted a better interface for it.

The Chinese labs cannot compete on the software side. And this is where most people come in. Most users don't see that a new model came out and then go download the weights and try running it on their local GPU. Most of these weights can't be run that way anyway, because most of them are too massive for even a high-end local GPU. You're not going to run MiniMax or Kimi K2 Thinking on your RTX 5080 anytime soon. And the average consumer wouldn't have done that even if they could. They're going to go to the app store and look up the app, and the DeepSeek app will never come close to the apps by the American labs, or even by a third party like Perplexity, or like us with T3 Chat. So they cannot win at the top level, where people are adopting the thing. They cannot win at the API level, because no American companies are going to use their APIs. So they have to go even deeper: they have to provide the models, so they can win at that level some amount, and we can build everything else the way we want on top. So what about America? Can the US make a comeback here? Can we somehow get back in the actual ring with open-weight models?

The only major US-based open-weight models to come out this year have been the new models from OpenAI. GPT-OSS-120B is the fourth-best-performing open-weight model according to Artificial Analysis. That doesn't sound great. It's OpenAI; they're a half-trillion-dollar company. How are they not competing with these small Chinese labs? It's not because they don't have the resources to do it. It's because of this: MiniMax M2 is 230 billion params, almost double. And according to Artificial Analysis, it gets the same score. DeepSeek 3.2 is 685 billion params, over 5x the OpenAI open-weight model. Kimi K2 Thinking is a 1-trillion-parameter model, almost 10x the OpenAI model. None of these can be run on machines that you have in your house; that is way too much memory. The 120B model can max out my RTX 5090, and none of these other ones are going to fit on your GPUs. We're talking 500 gigs of VRAM to run K2 Thinking.
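The memory math here is simple: just holding the weights takes parameter count times bytes per parameter, before you even count KV cache and activations. A quick sketch using the parameter counts above and an optimistic 4-bit quantization:

```python
# Floor on memory needed just to hold the weights (KV cache and
# activations come on top of this).

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

for name, params in [
    ("GPT-OSS-120B", 120e9),
    ("DeepSeek V3.2", 685e9),
    ("Kimi K2 Thinking", 1e12),
]:
    print(f"{name}: ~{weight_memory_gb(params, 4):.0f} GB at 4-bit")
# Kimi K2 at 4-bit is ~500 GB: nowhere near the 16-32 GB of consumer GPUs.
```

Even quantized down to 4 bits, the trillion-parameter Chinese models need hundreds of gigabytes, which is why self-hosting them means server racks, not a gaming PC.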

The strategy that OpenAI took here is one that I actually commend. I was at one of the original listening-group sessions they did with developers who wanted the open-weight models, and they actually had Sam Altman come in and talk to us a whole bunch. It was genuinely really cool, and we got to ask a ton of questions. In Sam's opinion, the only reason you would want an open-weight model when there are good APIs with closed-weight models is because you want to run it on hardware you own. I think there's real value in the competition of different providers hosting models, which makes certain Kimi or DeepSeek models really fast on certain providers. But for the most part, he's right. If there's a model from a lab you trust that's hosted in places you trust, the only reason you would want it to be open weight is so you can run it yourself. It's not going to be cheaper to spin up a bunch of servers, spending hundreds of thousands of dollars on GPUs, than to just hit an API from somebody who's already doing it. It would be way cheaper to run it on the GPU in your house, though. It'd be even cheaper to run a much smaller model on your laptop. So they were trying to figure out what sizes to target based on what we wanted to run them on. Every other lab seems focused on how much money they have and how smart a model they can generate, not on how big or small the model is going to be. They're much more focused on how they can win and have the best scores possible. OpenAI already knows they have the best scores possible. They don't want their open-weight models to do that, because then they're just giving a free win to all their competition, but they do want to play in the open model space. They do want to give models to people who want to run them on their own hardware. And that's why they put out two models, the 120B and the 20B. The 120B you can run on a single beefy enough GPU, and the 20B you can run on a modern enough laptop with a real GPU in it. That was a very specific decision, rather than "how good of a model can we possibly make." They thought about it as: given these two performance targets, how smart can we make something that fits within that box? And they have crushed that box. There is nothing that comes close to GPT-OSS-120B within those performance constraints.

This is a really good angle for American companies to compete in the openw weight space and I am thankful somebody actually took the time to do it this way. I can use these models for real

way. I can use these models for real things. And since these models are

things. And since these models are lighter and easier to run, some of the speeds these companies are getting out of them are nuts. GBT OSS120B

is pulling on Parasale 300 TPS on Somanova. It's pulling 650

on Somanova. It's pulling 650 on Grock. It's 550. That's crazy.

on Grock. It's 550. That's crazy.

There are models that are pulling 10 tokens per second. This is 55 times faster than some of those models. And

many of them are dumber, too. The fact

that these models are usable on consumer hardware and can perform that fast in certain cases and are actually useful for things is incredible. And I see why

this is the angle OpenAI took. OpenAI

was not interested in making a model that competed with GPT5 or 5.1. They

were interested in making the best possible thing you could run yourself.

These Chinese labs aren't interested in making things you can run yourself. If

it so happens that you can, that's a cool side effect.

There are two customers of open weight consumers and enthusiasts and info providers.

OpenAI is an info provider. That's how

they make a meaningful amount of their money about 20 to 30% depending on the month. They don't want to or they don't

month. They don't want to or they don't want more info providers. Enthropic you

can use their mo with enthropic you can use their models on Google and AWS and now also on Azure. OpenAI was only usable on OpenAI's infra until somewhat recently where they partnered with Azure

and Microsoft during one of their crazy finance rounds. So Azure can now host

finance rounds. So Azure can now host some OpenAI models. Meanwhile, all of the Chinese labs are hostable on almost all of those providers and a bunch of other places too. Most of the Chinese models we're talking about cannot

reasonably be hosted by a consumer or an enthusiast. And people are wondering

enthusiast. And people are wondering about how much RAM does 120 bill per RAM mean. It doesn't mean 120 gigs. In this

mean. It doesn't mean 120 gigs. In this

case, it means 80 gigs of VRAM.

Supposedly, you can get away with 60 fine. And honestly, you can get away

fine. And honestly, you can get away with a little less in certain cases, but you can run the 120 bill model on a computer with enough VRAM. Yeah. So,
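As a rough sanity check on those numbers: weight memory scales with parameter count times bits per parameter. GPT-OSS ships with its weights quantized to roughly 4 bits per parameter (the 4.25 bits/param figure below is my approximation of MXFP4's effective rate, not an official spec), which is why 120B params fit in about 60 gigs instead of the 200-plus that 16-bit weights would need. A minimal back-of-the-envelope sketch:

```python
def weight_footprint_gib(params_billion: float, bits_per_param: float) -> float:
    """Rough weight-only memory estimate; ignores KV cache and activations."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30

# 120B at 16-bit would need far more than any consumer machine has:
fp16 = weight_footprint_gib(120, 16)     # ~223 GiB
# ...but at ~4.25 bits/param it lands right around the 60 gigs he quotes:
mxfp4 = weight_footprint_gib(120, 4.25)  # ~59 GiB
# The 20B model at the same rate is laptop-sized:
small = weight_footprint_gib(20, 4.25)   # ~10 GiB
```

The real number is a bit higher in practice because the KV cache grows with context length, which is why "80 gigs, maybe 60 if you squeeze" is the honest answer rather than a single figure.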

This laptop's 128 gigs. Let's set up LM Studio and try it real quick. One of the cool things about Apple Silicon is that the GPUs and the CPUs share memory. You don't have separate VRAM from regular RAM, which means I have 128 gigs of VRAM on this machine. Look, when you set up LM Studio, it tells you to use GPT-OSS 20B because it's one of the best small models. The 20B model is 12 gigs and the 120B model is 60-plus.

So I just got GPT-OSS 20B running on my laptop here. It's a maxed out M4 Max from Apple. Didn't have the patience to wait for the M5; I got it very recently. We've got 128 gigs of RAM, and we're using 12 gigs right now for the GPT-OSS 20B model. Can I tell it to write me three poems about JavaScript?

And it is flying. That was 117 tokens per second on my laptop, locally. Pretty cool, right? Let's switch this over to OSS 120B. It's going to take a sec to load because it has to load that all into memory, and you can see my memory consumption going up fast. We're now at 59 gigs of VRAM being used for this. It's total memory, because macOS on Apple Silicon uses the same memory for both.

A bit slower, but that chugged along. The fact that I can get almost 80 TPS on a model that smart on my own computer. Do you understand how [ __ ] cool that is?
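If you want to hit a local model like this from code rather than the chat window, LM Studio can expose an OpenAI-compatible HTTP server, by default on localhost port 1234. A minimal sketch, assuming that server is running; the model id string is an assumption, so use whatever identifier LM Studio shows for your downloaded copy:

```python
import json
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask_local(model: str, prompt: str) -> str:
    """Send the payload to LM Studio's local server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        LMSTUDIO_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running LM Studio server; model id is hypothetical):
# print(ask_local("openai/gpt-oss-20b", "Write three poems about JavaScript."))
```

Because the endpoint speaks the same shape as OpenAI's hosted API, any tooling built for the cloud API can be pointed at your laptop just by swapping the base URL.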

This is what OpenAI's choice was. They wanted to make models that you can run on real consumer hardware. And while there aren't many good GPUs you can buy and plug into your desktop that can do this, because desktop GPUs have very limited VRAM, if you get a Mac Studio or a MacBook Pro with enough RAM, you can actually do inference on it. That's the difference.

It's so different that we just got a crazy comment from chat: "Apple's RAM prices seem reasonable now." They actually kind of are when you consider this, and also the crazy squeeze happening in memory. This is cool. This is really cool.

The point of this distinction for me is very simple. OpenAI isn't interested in helping the competition here, because they are deep in that space and they see no issue with the current state of infra providers in the US. The Chinese labs can't provide their own infrastructure because nobody will use it, so they need other infra providers to host their stuff. By doing open weight models, they can convince those providers to host their things. Consumers and enthusiasts can't use most of those models because they're way too big, but they can use the two that OpenAI released.

OpenAI is interested in this space because it's one of the few they weren't really competitive in, and they wanted to go back and win it, and they did. The goal of the GPT-OSS models has been achieved. If you are running a model locally, there's a good chance you're using GPT-OSS, or you're doing something less optimal than you otherwise could. But that's not winning open weights, because that's not going to get you high up on the chart here.

And this is where my conclusion comes in. I don't think we're ever going to see an open weight model from the US win on this chart ever again. There's just very little incentive for labs to do it. Meanwhile, the Chinese labs won't make it to America, and they won't even be on charts like this, if they don't do open weight. Don't think of this as "why aren't American companies keeping up with these open weight models?" Rather, think of it as "why do the Chinese labs have to do open weight, even if nobody can use the things they're publishing other than like six companies?" There are very, very few places in the world that can handle a one trillion parameter model. But they put it out anyways, because they need a way for American labs and American companies and infrastructure providers to use it.

There is some hope left, but it's dwindling fast, because at the very end of this chart, we have Llama. Meta has released all of their models as open weight. Historically, they were one of the first people doing good open weight models. In fact, the way a lot of people were using DeepSeek models originally wasn't through DeepSeek. It was by using the DeepSeek R1 model to do a fine-tune of Llama 3, and a lot of us were using that Llama 3 fine-tune as DeepSeek, even though it was Llama bastardized into acting like DeepSeek.

There is some chance for Meta to catch up here, especially with some of the hires they've made, but I just don't see it happening. They are so, so far behind at this point. There's also some effort to try and fund this type of research, like the White House's attempts to fund open models and AI research in the US, providing everything from inference to power and paying for research to happen here. That was published in July, and I have not heard anything about it since.

There's also the potential security risk. The problem with open weight models and security is that you can't take it back once you put it out. If it turns out that you could use DeepSeek 3.2 to make a nuclear weapon, they can't take it away. That issue now exists forever in the model, in the weights. Once it is published, you can't unpublish it. Meanwhile, if some issue was discovered with OpenAI's new model, you can add a layer in front to prevent it. If it turns out GPT-5's weights are capable of telling you how to make a nuclear weapon, you can put a safeguard in front: when the API request comes in, and before the response goes out with those instructions, you can block it.
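That "layer in front" is just middleware around the model call: screen the request on the way in and the response on the way out. Only a hosted-API provider can do this, because they sit between you and the weights. A minimal sketch, with a hypothetical substring blocklist standing in for the real moderation models providers actually use:

```python
# Hypothetical blocklist; real systems use a separate moderation model.
BLOCKED_TOPICS = ("nuclear weapon",)

def guarded_completion(model_fn, prompt: str) -> str:
    """Wrap any model call (prompt -> completion) with input/output filters.

    Because the filter lives outside the weights, the provider can update
    it at any time after release -- the one thing you can never do once
    open weights are published.
    """
    # Filter on the way in, before the model ever sees the request.
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "Request blocked before reaching the model."
    response = model_fn(prompt)
    # Filter on the way out, before the response reaches the user.
    if any(topic in response.lower() for topic in BLOCKED_TOPICS):
        return "Response blocked before leaving the API."
    return response
```

With a hosted model, this wrapper can be tightened tomorrow. With published weights, anyone can simply call the model directly and skip it, which is exactly the irreversibility problem being described.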

Security risks, copyright risks, all of these types of things are a lot easier to handle if you can block the request on the way in or out before it gets to the model. But once the model is out there, you've lost your ability to do this. So there's a huge risk and liability for the labs that are publishing these open weight models. It's a lot more work to do it, and a lot more work to do it right.

But the Chinese labs don't really care. Their hesitation to put out things that are potentially actual security risks is zero. They just don't give a [ __ ]. They put it out when they can win benchmarks.

The American labs have liability to worry about. They have expectations to worry about, and investors they don't want to piss off. They have to go out of their way to make sure their models are safe and don't have potential copyright issues. And even if we start funding the creation of these open weight models here, the expectations that would be set on them by the government, of them being safe, reliable, and hitting the American government's standards, are going to make it a lot harder to do right. It's a lot easier to add these things in front of the model than in the model itself. And when you are giving the model weights out, you're giving up your ability to control what goes in.

So my conclusion is pretty clear. I do not see America competing in the open weight models that are only hostable via infra providers, these super giant models. I don't see us competing there. But I think we have a unique potential to win here. As more people get stronger computers with better GPUs, as more consumers have more reasons to try out these models, as Apple starts shipping models on our devices, as Chrome starts shipping models in Chrome itself, where there's more reason to run locally, there's a very, very good chance that America can win with those.

But we need an incentive if we're going to give out the weights. And right now, there is not much incentive for American labs to give models out for free to their competition. There is potentially incentive to give us things that we can run on our own machines. So I don't see us winning anytime soon, but I hope we can win here.

Let me know what you guys think. Am I way overblowing this, or is China definitely going to be the winner of open weight? Curious how y'all feel, and if you even care, let me know. And until next time, peace nerds.
