China is winning the AI race
By Theo - t3.gg
Summary
## Key takeaways

- **China dominates open-weight AI models**: The top open-weight models, Kimi, DeepSeek, and MiniMax M2, are all Chinese, wiping America off the chart, while the first US entry, GPT-OSS-120B, is rough and lags far behind in tool-calling reliability. [00:16], [00:42]
- **Open weights enable multi-provider competition**: Closed models like Gemini run only on Google infrastructure, but DeepSeek V3.2-Exp and Kimi K2 are hosted by DeepInfra, Novita, SiliconFlow, and many others, driving huge throughput gains like Groq's 356 tokens/sec vs Moonshot's 20. [09:34], [10:31]
- **Chinese labs release open weights for trust**: Nobody trusts Chinese-hosted models due to government risks and data-security fears, so open weights let Americans self-host, earning mindshare despite US attempts to ban downloads like DeepSeek R1. [14:39], [15:16]
- **DeepSeek's research accelerates global AI**: DeepSeek published 12 papers last year on FP8 training that every lab adopted, fueling much of AI's speed gains this year, all released freely despite no direct revenue. [12:06], [12:19]
- **OpenAI targets runnable consumer hardware**: GPT-OSS 120B and 20B are sized for a single beefy GPU or a laptop, crushing competitors in that box with 80 TPS on a MacBook and 650 TPS on hosted providers, unlike the massive trillion-parameter Chinese models. [25:19], [29:40]
Topics Covered
- China Dominates Openweight AI
- Open Weights Equal Open Source
- Open Weights Bypass China Distrust
- US Excels at Consumer Open Weights
Full Transcript
If you look at the current top models, they're all from America: Google, Anthropic, and OpenAI. We are clearly winning the AI race, until you zoom out a little bit. Then you see a lot of these blue bars appearing in the chart. Those blue bars are for open-weight models. And if we look at the top three, Kimi, DeepSeek, and MiniMax M2, you realize that China's winning the open-weight race, and by quite a bit. The first model from the US to appear here is GPT-OSS-120B. And as a person who's used that model quite a bit, it's rough. It might score well on intelligence charts, but its ability to reliably call tools and be used in your workflows is nothing in comparison to what I've experienced with Kimi, with MiniMax, and now with DeepSeek V3.2. Huge gap between those. And if we want to look at the European introductions, like Mistral Large 3, which just dropped and kind of inspired this video, they're barely even on the chart. Things are rough. Oh, and I almost forgot: Llama 4, all the way at the end here. There were supposed to be three versions of Llama 4. If you remember, it was supposed to be Scout, Maverick, and I forgot the name of the larger one, because they never put out the larger one, because they all suck so bad. There's a 20-billion-parameter model from OpenAI that beats out Mistral, and there are lots of 15-billion ones that do too. It's rough out there. But the thing I really want to focus on today is the open-weight wars and why China seems like they will be winning them for the foreseeable future. It's kind of crazy that when you only talk about models whose weights are downloadable and usable, all of a sudden America gets wiped off the chart. There are a lot of reasons for this and I can't wait to talk about them, but since open-weight models don't pay the bills, we're going to do a quick sponsor break first.
Here's a hard question for you: how do you know if an engineer is actually good? It's really hard to do. You might be able to look at their HTML shirt and make some assumptions, but when you're doing an interview, especially when you're reading someone's resume, how do you know they're actually good and not just AI-generating some slop that you're going through a huge pile of as you fill this role? It's never been more annoying to hire good engineers. I feel like there are fewer of them in the pile, and the pile's never been bigger. If you're tired of trying to find the needle in the haystack and finally get a good engineer to work for you, you've got to check out today's sponsor, G2i. These guys are without question the best way to hire good engineers fast. They have over 8,000 of them ready to go in their incredible network. These aren't people who are fresh out of college. These are real, experienced engineers who have worked at big FAANG companies and small startups alike, know how to use all the tools you need, and are already familiar with fancy AI development stuff, so they're not going to be slow. Whether you want a couple of junior engineers to kickstart a new project, or a lead that can dig you out of tech-debt hell, they have you covered. You create a shared Slack channel with them; they effectively operate like your recruiting team. You give them a handful of questions to ask the engineers. They ask the engineers and record actual video responses from them so you know what the person's actually like. You go through them and figure out the ones you want. They'll then do a technical interview that they've spec'd out so you don't have to worry, record it, and send you the results. Once they've gone through all that, you can review it, figure out who you think fits best, and then you can hire them. They're also just super generous to work with. I've referred a lot of people on a personal level, like a lot of the YC startups I work with, and every single one has had an incredible experience with G2i. Stop hiring the old way and stop wasting your time. Get good engineers fast at soyv.link/g2i.
Before we can discuss why China's winning so hard, it's important to understand what an open-weight model is. The point of open-weight models is somewhat similar to open-source models, where you're giving out a significant portion of how the thing works. There's a big difference between open weight and open source, though. With open source, the code that is used to create the thing people experience is exposed. With Linux, for example, the actual thing you download isn't the code. What you download is the binary that's compiled from the code. The code is the input that results in the output that you are downloading and using. As such, I've seen a lot of people complain that open-weight models aren't open source because you can't recreate that binary yourself. And I don't really agree. Obviously, it would be cool if we had all of the training data and everything else that went into how the models were made. But the reason open source is valuable is because you can reproduce the actual output: I can take the code, compile it on my computer, and get a result. There are almost no consumers, almost no developers, who have everything necessary to spend the millions of dollars on really-hard-to-get infrastructure to turn that data into a model. There is no real reason to expose that, and there's a ton of risk and liability if you expose all of the data that's used in your training: every other company is just going to take that data, throw it into their data sets, and suddenly be able to beat you at all the things you're good at. Depending on how you cut the lines and think about it, I would argue the data is almost the equivalent of the engineers in this case, not the equivalent of the source code. People think about source code very specifically, because you use the source code to compile the thing that you want, and as such, the argument goes, we should have the data if we want to call these models open. I would argue we're just drawing our lines a bit differently. So in open source, a developer creates source code that compiles into a binary that users can use. In open weights, data is used to train weights that result in generated tokens. If you draw the lines one way, you say the researchers collect the data that creates the weights that generate tokens, and I would see why you would call this not open. But I don't think that is quite how it works. I think of it this way: the data plays the role the developer does for the weights, because it creates the thing we can use to generate what we actually want. In one case that's the binary, the Linux installable, the actual kernel; in our case, with models, it's the tokens that we get from using the model. So open weights mean you have all of the pieces you need to run the model and generate results with it yourself. The weights are the collection of parameters that are all mapped to and point to each other, so when you give it some text, it can guess what the best next token would be, based on the text you give it and this giant, hundreds-of-gigabytes pile of vectors and data that has collapsed into a model that can generate the next token as predictably and reliably as possible. I think it's really cool that open weight has gone as far as it has. And I think it's really convenient that open-weight models can use the same licenses that open-source code can. I already see people disagreeing in chat. I don't care. The weights are not the binary. The weights are a thing that can be reused and modified in very useful ways. Consider how baked these things are: compiling code costs pennies and can be done on most computers; turning data into weights isn't even a deterministic process. And I know there's a lot of debate around this. I know there are a lot of things Richard Stallman's going to disagree with me on here. I don't really care. This all comes down to whether you put this line here or here. And I'm not one to [ __ ] when we get something as cool as open-weight models. There's only one lab I know of that actually puts out the data, and it's the Allen Institute. They were funded by Paul Allen, the co-founder of Microsoft, as an attempt to do truly open AI research in the US. And their models aren't just open-weight models; their models also have the data exposed, too. So you could hypothetically retrain the model on the data yourself. None of it's deterministic enough that you'll get the exact same weights, but yeah, it exists. If you're wondering where this falls in the charts: right next to Llama. Not great. So it's cool that we do have a fully open, US-based lab that is sharing the data and everything, but they're not really competitive. Just wanted to call that one out quick. The harsh reality is that if we use the strict open-source definition that currently exists for code, there will never be a model that meets the definition of open source. And I agree, there probably won't be, and we shouldn't use the term open source to describe models.
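To make the source-to-binary versus data-to-weights analogy concrete: "running weights" just means feeding text through a pile of downloaded parameters to score candidate next tokens. Here's a deliberately tiny, hand-written sketch of that loop. The vocabulary, scores, and greedy decoding below are all made up for illustration; real weights are billions of learned parameters, not a table you could type out.

```python
# Toy illustration of what "open weights" buys you: the weights are a big bag
# of numbers that, given some text, score every candidate next token.
# This hypothetical miniature stands in for hundreds of gigabytes of
# learned parameters in a real open-weight model.
import math

VOCAB = ["the", "cat", "sat", "on", "mat"]

# "Weights": one made-up score per (previous token, candidate next token) pair.
WEIGHTS = {
    "the": {"the": 0.0, "cat": 2.0, "sat": 0.1, "on": 0.1, "mat": 1.5},
    "cat": {"the": 0.1, "cat": 0.0, "sat": 2.5, "on": 0.5, "mat": 0.1},
    "sat": {"the": 0.2, "cat": 0.0, "sat": 0.0, "on": 2.2, "mat": 0.3},
    "on":  {"the": 2.4, "cat": 0.1, "sat": 0.0, "on": 0.0, "mat": 0.8},
    "mat": {"the": 0.5, "cat": 0.1, "sat": 0.1, "on": 0.2, "mat": 0.0},
}

def softmax(scores):
    # Turn raw scores into probabilities, like a model's final layer does.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(prev):
    # Greedy decoding: pick the highest-probability next token.
    probs = softmax([WEIGHTS[prev][t] for t in VOCAB])
    return max(zip(VOCAB, probs), key=lambda pair: pair[1])[0]

print(next_token("the"))  # -> cat
print(next_token("cat"))  # -> sat
```

If you have the weights, you can run this loop on your own machine; what you don't get is the data or the training run that produced the numbers, which is exactly the line the open-source purists are arguing over.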
Open weight is still a very cool and useful thing. With an open-weight model, the value you get out of it is that I can take those weights and run them on my own hardware, or look at the different providers that are hosting them as well. If we go to something like OpenRouter and take a look at a Gemini model like Gemini 3 Pro Preview, you can use it in two places, Google Vertex and Google AI Studio, because the weights for this model have never left Google's campus. The weights that you use to run these models and generate these results are exclusively provided through Google's own infrastructure, because they want to sell you the API, not the model. And since Google has their own infrastructure, they don't let other companies have access to this, except potentially Apple, privately, with a really, really big pay deal of a billion-plus dollars to get the weights for some Siri stuff. If you look at something like OpenAI's GPT-5.1, your options are OpenAI. Some of these models are also available on Azure too, but that's it, due to the OpenAI-Microsoft partnership. Let's compare that to DeepSeek V3.2-Exp. We've got DeepInfra, Novita, Chutes, SiliconFlow, and Atlas Cloud. Let's look at Kimi K2. Kimi K2: Chutes, SiliconFlow, Novita, DeepInfra, Parasail, BytePlus, plus seven more. Moonshot, the people who actually made the model, are the eighth option in this list, followed by Fireworks, Atlas Cloud, Baseten, Together, Groq, and a Turbo option from Moonshot.ai. Also notice that the Turbo option from Moonshot, which costs $8 per million tokens out, is less than half the speed of Groq's solution here, and 8x the latency, too. Kind of nuts. Open-weight models allow various providers to offer them, which allows for a different level of competition across infrastructure solutions. That is really, really cool. But it does also mean that the official infrastructure, in this case Moonshot's, isn't really a great option. Moonshot charges 60 cents per million tokens in and $2.50 per million out for under 20 tokens per second. Groq charges a dollar per million in and $3 per million out, so slightly more, for 356 tokens per second. That is more than a 10x increase in throughput for a very minor bump in cost. This is the difference.
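A quick sanity check on those numbers (prices and speeds as quoted in the video; treat them as a snapshot, since provider pricing changes often):

```python
# Compare Moonshot's first-party hosting of Kimi K2 against Groq's,
# using the per-million-token prices and tokens-per-second quoted above.
moonshot = {"in_per_m": 0.60, "out_per_m": 2.50, "tps": 20}
groq     = {"in_per_m": 1.00, "out_per_m": 3.00, "tps": 356}

def cost(p, tok_in=10_000, tok_out=1_000):
    # Dollar cost of one illustrative call: 10k input tokens, 1k output tokens.
    return p["in_per_m"] * tok_in / 1e6 + p["out_per_m"] * tok_out / 1e6

speedup = groq["tps"] / moonshot["tps"]   # 17.8x the throughput
cost_bump = cost(groq) / cost(moonshot)   # ~1.53x the price
print(f"{speedup:.1f}x faster for {cost_bump:.2f}x the cost")
```

So "more than 10x" is actually closer to 18x throughput, for roughly half again the price per call, which is exactly the kind of trade-off multi-provider competition makes possible.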
When you have this type of competition, the value proposition of your own infrastructure goes down, which makes it a lot harder for a company like Moonshot to make money on the Kimi models, even though they are fourth on the Artificial Analysis intelligence chart. Google is a trillion-dollar company. Anthropic is a multi-billion-dollar company, potentially worth trillions someday. OpenAI is already worth half a trillion dollars.
Moonshot, the lab behind Kimi K2 Thinking, is a small company in China that isn't making real revenue yet. Do you know what's really funny, though? Do you know which of these four companies has been the kindest to work with for me as a creator? Moonshot. They've been trying really hard to get a mailing address from me so they can ship me a care package. They've been awesome to work with. They always hit me up early. They offer me free inference for any tests I want to do. They constantly send me useful resources about the things I'm talking about. Moonshot's been a genuinely awesome company to work with, and they even shout out their competitors when they have big launches. Like when Z.ai had a big release, they immediately went and supported them. They're a very good-faith player, weirdly. DeepSeek is very similar in this regard. Not in the comms sense; I've never heard from anybody at DeepSeek. By the way, DeepSeek guys, if you want to hit me up, I'd love to chat. Very, very big fan of what you did. I would never have built T3 Chat if it wasn't for DeepSeek V3 at the end of last year. I'm so impressed with the work that DeepSeek has been doing for a while now, and their research is incredible. They put out 12 papers last year that were so far ahead of where everyone else was. And the discoveries that made FP8 training much more reliable resulted in every lab fundamentally changing how they did training. You could argue that a large portion of the speed at which AI accelerated this year came from the research DeepSeek put out for free last year. And yes, I have also talked to the Z.ai guys. They've been great. They've been really, really awesome. It's crazy how good at comms the Chinese labs have been, with me at the very least. OpenAI has been really good. Google's up and down. Anthropic is interesting. But my experience with the Chinese labs has been really good as a journalist, so to speak, covering these things publicly.
But none of that answers the question: why do open weight? Why are these companies releasing these models in a way that they make no money off of them? Like, the real winner whenever DeepSeek drops isn't DeepSeek; it's companies like Groq and Together and all these cloud inference providers that will host them for us. We currently don't have the final official version of DeepSeek V3.2 on T3 Chat yet, because none of the providers are doing it well enough just yet. I would even argue being open weight makes things much harder for the labs, even outside of the costs. Since Kimi K2 is available for anyone to host themselves, different hosts aren't necessarily hosting it properly, and the quality of certain behaviors like tool calls might go down meaningfully depending on which host you're using. Kimi actually went as far as creating the Vendor Verifier, where they rank all of the companies hosting their models based on how reliably they do tool calling. These are all of the companies that they say are hitting over 73%. And if we scroll down, you'll see others not performing quite as well. It's cool that they're doing better now, because previously the gap was a lot bigger. But by creating this bench and making this data public, they incentivize the hosts to fix their [ __ ], and they also gave them a tool-call eval Python file that they can run against their own infra to find the bugs and fix them. Doing this type of thing is really, really annoying, but they are doing it because otherwise the reputation of these models will be hurt as a result of other labs and other hosts not hosting these things properly.
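A harness like that can be sketched in a few lines: send each host a prompt that should trigger a known tool call, then check whether the response parses and matches the expected call. The `fake_host` below is a stand-in I made up for illustration; the real eval would hit each provider's actual chat-completions endpoint with tool schemas attached, and this is not Moonshot's actual code.

```python
import json

# Test cases: each prompt should produce one specific tool call.
CASES = [
    {"prompt": "What's the weather in Tokyo?",
     "expect": {"name": "get_weather", "arguments": {"city": "Tokyo"}}},
    {"prompt": "Add 2 and 3",
     "expect": {"name": "add", "arguments": {"a": 2, "b": 3}}},
]

def fake_host(prompt):
    # Hypothetical stand-in for a provider's inference API. A real harness
    # would POST to the host and extract the tool call from the response.
    if "weather" in prompt:
        return json.dumps({"name": "get_weather", "arguments": {"city": "Tokyo"}})
    return json.dumps({"name": "add", "arguments": {"a": 2, "b": 3}})

def score(host):
    # Fraction of cases where the host's tool call parses and matches exactly.
    ok = 0
    for case in CASES:
        try:
            call = json.loads(host(case["prompt"]))
        except json.JSONDecodeError:
            continue  # malformed tool-call JSON counts as a failure
        if call == case["expect"]:
            ok += 1
    return ok / len(CASES)

print(f"tool-call success rate: {score(fake_host):.0%}")
```

Run the same cases against every host and you get exactly the kind of leaderboard Moonshot publishes: hosts that mangle tool-call formatting score low, even when they're serving the same weights.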
It's a small thing, but I also love that they're using uv. Like, these guys get what US developers are expecting. So it's clear that doing open weight is harder, and it makes you way less money. Why the hell are they doing it?
To be frank, nobody would trust them otherwise. If you're using a Chinese model and it's being hosted in China, all the data in and out is now at a real risk, like a very legitimate risk. A lot of these companies have Chinese government hands in them. There is no security team in the US that would approve of you using a Chinese model on Chinese infrastructure. And open weights allow them to be relevant in the space right now. The fact that I'm legitimately considering doing more work with Chinese models as an American shows that the open-weight strategy is working for them, because it's the only way they can hold any mindshare in the US. There have even been attempts to ban the use of Chinese models in the US. When DeepSeek R1 first dropped, there was a huge freakout about it, and the government here was actually considering passing legislation that would make it illegal to download the weights. Wild. Insane. I have files on my computers that would suddenly become illegal if that crazy proposal were to actually go through. Absurd. So this is seriously the only way these Chinese labs will be taken seriously. And this goes a lot further than language models, too. It's the same deal with a lot of their image and video generation models as well. These models are not something that you'd want to run out of China, especially because they have restrictions on what GPUs they're even allowed to have access to. So you might not be able to run some of these models they're making on their infra, and the infra they have is limited to the use cases they're using it for, which is mostly training. There's a whole culture in China around getting cheaper GPUs and adding more VRAM to them in order to get around these import restrictions, which is kind of crazy. All of this results in these models only being viable if they are released in a way that we can host them ourselves and use them ourselves. There is no reason to make a great model in China and not release the weights, because you won't be able to make money off of it anyway right now. And this makes these companies go from entirely ignored here to genuinely very relevant to the conversations we're having. The research that kicked off a ton of this AI boom is the "Attention Is All You Need" paper from the Google Research and Google Brain folks over at Google, which was all about the transformer architecture that allowed us to create language models as we now know them. This then went further with OpenAI's follow-up research, "Improving Language Understanding by Generative Pre-Training."
These two papers kind of kickstarted what we now know as AI. And these are open papers, where they published what they did, how they did it, how they got there, and what it could do. Hypothetically speaking, any one of these labs could have sat on this information, not published it, and gone and made crazy things with it. But then other companies wouldn't have been able to innovate further. Like, if Google didn't release this paper, OpenAI wouldn't have had the kickstart they needed. And if OpenAI didn't follow up with their paper, we wouldn't have GPT as a concept. Or maybe somebody else would have come up with it eventually. But if these were all private innovations that each lab was hopefully coming up with itself, the likelihood that any of them progressed meaningfully is way lower. The culture of sharing our learnings and understanding is rooted deeply in science and research. This is just how advancements happen in technology. On one hand, this does remind me of the open-source world, the way that we're all building on top of each other. But on the other hand, it's not truly, traditionally open, because we're spending tons of money doing this research and work, and only publishing the things that we think are worth publishing and sharing and that don't screw our competitive advantages. Back when nobody had working AI, sharing all of this made a lot of sense. Now that the American labs are in a cutthroat race competing with each other, their willingness to share has gone down a ton. It's silly, but the first cool thing I've seen in terms of different labs supporting each other in America in 2025 was when Sam Altman tweeted that Gemini 3 seems like a good model. Other than that, I have not seen much in terms of good-faith operations between executives at Anthropic, Google, and OpenAI.
There's just very little collaboration happening at this point, because they're too busy trying to fight each other. Meanwhile, DeepSeek breaks everything again with V3.2, getting crazy scores, especially on tool-calling stuff, and Z.ai is right there in the replies with a "legend" and a heart. Like, this is a whole different world. This is what the research was like here before the competition started. We operated like this in the US before, where these companies were supportive of each other. Now that they're all cutthroat trying to win this economic race, they're not as willing to collaborate, and they're much more skeptical of things like distillation: people using their models to generate a bunch of synthetic data to then retrain their own models with. In fact, a lot of them are accusing companies like DeepSeek of doing this with their data. There was a point where certain DeepSeek models, if you asked them what model they were, would say ChatGPT, because they had data in their training corpus that came from those American models. If Chinese models want to win, they have to be open, because otherwise we won't use them. If Chinese labs want to be competitive, they have to collaborate, because we still have a lead there. And there's one last piece that I haven't dove into much yet. I think this will make it into the video depending on how angry chat is. This is going to be fun.
Don't get too mad, boys: China sucks at writing software. They're not quite as bad as Japan, but they're up there. China is incredible at manufacturing. They are surprisingly competent at research. Chinese software, from my experience, is so atrocious that they end up spinning up satellite companies in the US so they can hire United States-based software developers to make software that works. A significant portion of TikTok's development happens here now, because we have better engineers in the US. The same way that a significant portion of manufacturing happens in China because they're better at it, software development happens in America because we're better at it. People are mad about the Japanese one. I don't care. Sony's even accepted that the PlayStation software is an untenable mess and has fully outsourced it to the United States; they are hiring consultancies in the US to save the operating system for PlayStation because they are so bad at software. Great at research, great at manufacturing, pretty good at logistics, not capable of writing software. A significant portion of why I made T3 Chat is that, as much as I hated the Claude interface and the ChatGPT interface, the DeepSeek one was actually unusable, entirely unusable, miserable to touch. And I wanted to use the model somewhere better. And I made T3 Chat kind of as a pun on V3 Chat, because I liked DeepSeek V3 so much and I wanted to have a better interface for it. The Chinese labs cannot compete on the software side. And this is where most people come in. Most users don't see that a new model came out and then go download the weights and try running it on their local GPU. Most of these weights cannot be run that way, because most of them are too massive to run even on a high-end local GPU. You're not going to run MiniMax or Kimi K2 Thinking on your RTX 5080 anytime soon. And the average consumer wouldn't have done that anyway, even if they could. They're going to go to the App Store and look up the app, and the DeepSeek app will never even come close to the apps by the American labs, or even by a third party like Perplexity, or like us with T3 Chat. So they cannot win at the top level, where people are adopting the thing. They cannot win at the API level, because no American companies are going to use their APIs. So they have to go even deeper: they have to provide the models, so they can win at that level some amount, and we can build everything else the way we want to on top.
So what about America? Can the US make a comeback here? Can we somehow get back in the actual ring with open-weight models? The only major US-based open-weight models to come out this year have been the new models from OpenAI. GPT-OSS-120B is the fourth-best-performing open-weight model according to Artificial Analysis. That doesn't sound great. Like, it's OpenAI. They're a half-trillion-dollar company. How are they not out-competing these small Chinese labs? It's not because they don't have the resources to do it. It's because of this: against the 120B, MiniMax M2 is 230 billion parameters, almost double, and according to Artificial Analysis, it gets the same score. DeepSeek V3.2 is 685 billion parameters, more than 5x the OpenAI open-weight model. Kimi K2 Thinking is a one-trillion-parameter model, almost 10x the OpenAI model. None of these can be run on machines that you have in your house. That is way too much memory to run any of these things. The 120B model can max out my RTX 5090; none of these other ones are going to fit on your GPUs. We're talking 500 gigs of VRAM to run K2 Thinking.
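The back-of-envelope math here is just parameters times bytes per parameter, and that covers the weights alone; the KV cache and activations need memory on top of this:

```python
# Rough weight-memory estimates for the parameter counts quoted above.
GB = 1024**3

def weight_gb(params, bits):
    # Memory for the weights alone at a given precision (bits per parameter).
    return params * (bits / 8) / GB

models = {
    "GPT-OSS-120B": 120e9,
    "MiniMax M2": 230e9,
    "DeepSeek V3.2": 685e9,
    "Kimi K2 Thinking": 1e12,
}

for name, p in models.items():
    print(f"{name:>17}: {weight_gb(p, 16):6.0f} GB fp16, {weight_gb(p, 4):5.0f} GB int4")
```

Even aggressively quantized to 4 bits per parameter, Kimi K2 Thinking's trillion parameters come out to roughly 466 GB of weights, which is the "500 gigs of VRAM" ballpark, and far beyond any consumer GPU.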
The strategy that OpenAI took here is one that I actually commend. I was at one of the original listening-group sessions they did with developers who wanted the open-weight models, and they actually had Sam Altman come in and talk to us a whole bunch. It was genuinely really cool, and we got to ask a ton of questions. In Sam's opinion, the only reason you would want an open-weight model when there are good APIs with closed-weight models is that you want to run it on hardware you own. I think there's real value in the competition of different providers hosting models, which makes certain Kimi models or DeepSeek models really fast on certain providers. But for the most part, he is right. If there's a model from a lab you trust that's hosted in places you trust, the only reason you would want it to be open weight is so you can run it yourself. It's not going to be cheaper to spin up a bunch of servers, spending hundreds of thousands of dollars on GPUs, than it is to just hit an API from somebody who's already doing it. It would be way cheaper to run it on the GPU in your house, though. It'd be even cheaper to run a much smaller model on your laptop. So they were trying to figure out what sizes to target based on what we wanted to run them on. Every other lab seems focused on how much money they have and how smart a model they can generate, not on how big or small the model is going to be. They're much more focused on how they can win and have the best scores possible. OpenAI already knows they have the best scores possible. They don't want their open-weight models to do that, because then they're just giving a free win to all their competition, but they do want to play in the open-model space. They do want to give models to people who want to run them on their own hardware. And that's why they put out two models, the 120B and the 20B. The 120B you can run on a single beefy-enough GPU, and the 20B you can run on a modern-enough laptop with a real GPU in it. That was a very specific decision they made, rather than "how good of a model can we possibly make?" They thought about it as: given these two performance targets, how smart can we make something that fits within that box? And they have crushed that box. There is nothing that comes close to GPT-OSS-120B within those performance constraints.
This is a really good angle for American companies to compete in the openweight space, and I am thankful somebody actually took the time to do it this way. I can use these models for real things. And since these models are lighter and easier to run, some of the speeds these companies are getting out of them are nuts. GPT OSS 120B is pulling 300 TPS on Parasail. On SambaNova, it's pulling 650. On Groq, it's 550. That's crazy.
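For scale, here's that comparison as numbers. The per-provider figures are a snapshot from the video and shift constantly, and some hosted models only manage around 10 tokens per second, so treat these values as illustrative:

```python
# Quoted throughput figures in tokens/sec; snapshot values, not stable benchmarks.
speeds = {"SambaNova": 650, "Groq": 550, "Parasail": 300, "slow hosted model": 10}

baseline = speeds["slow hosted model"]
for name, tps in sorted(speeds.items(), key=lambda kv: -kv[1]):
    # Show each provider's speed relative to the slow baseline.
    print(f"{name}: {tps} tok/s ({tps / baseline:.0f}x)")
```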
There are models that are pulling 10 tokens per second. This is 55 times faster than some of those models, and many of them are dumber, too. The fact that these models are usable on consumer hardware, can perform that fast in certain cases, and are actually useful for things is incredible. And I see why this is the angle OpenAI took.
OpenAI was not interested in making a model that competed with GPT-5 or 5.1. They were interested in making the best possible thing you could run yourself. These Chinese labs aren't interested in making things you can run yourself. If it so happens that you can, that's a cool side effect.
There are two customers of open weights: consumers and enthusiasts, and infra providers. OpenAI is an infra provider. That's how they make a meaningful amount of their money, about 20 to 30% depending on the month. They don't want more infra providers.
Anthropic: you can use their models on Google and AWS, and now also on Azure. OpenAI was only usable on OpenAI's infra until somewhat recently, when they partnered with Azure and Microsoft during one of their crazy finance rounds. So Azure can now host some OpenAI models. Meanwhile, all of the Chinese labs are hostable on almost all of those providers and a bunch of other places too. Most of the Chinese models we're talking about cannot reasonably be hosted by a consumer or an enthusiast.
And people are wondering how much RAM 120B actually means. It doesn't mean 120 gigs. In this case, it means 80 gigs of VRAM.
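The reason 120B doesn't mean 120 gigs is quantization. A back-of-the-envelope sketch, assuming the figures OpenAI published for gpt-oss-120b (roughly 117B parameters, with most weights in MXFP4 at about 4.25 bits each; treat both numbers and the overhead factor as approximations):

```python
# Rough VRAM estimate for a quantized model.
# Assumes ~117B params at ~4.25 bits/weight (MXFP4), per OpenAI's
# published gpt-oss-120b figures; both values are approximate.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """GB needed just to hold the weights on disk or in memory."""
    return n_params * bits_per_weight / 8 / 1e9

weights = weight_gb(117e9, 4.25)   # ~62 GB, matching the "60 plus" download
total = weights * 1.25             # crude +25% for KV cache and runtime overhead

print(f"weights ~{weights:.0f} GB, running total ~{total:.0f} GB")
```

That lands around 62 GB for the weights and high 70s with overhead, which is why 80 gigs is the comfortable target and 60-ish is the floor.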
Supposedly, you can get away with 60 fine, and honestly you can get away with a little less in certain cases, but you can run the 120B model on a computer with enough VRAM.
This laptop has 128 gigs. Let's set up LM Studio and try it quick. One of the cool things about Apple Silicon is that the GPUs and the CPUs share memory. You don't have separate VRAM and regular RAM, which means I have 128 gigs of VRAM on this machine. Look, when you set up LM Studio, it tells you to use GPT OSS 20B because it's one of the best small models. The 20B model is 12 gigs, and the 120B model is 60 plus. So I just got GPT OSS 20B running on my laptop here. It's a maxed out M4 Max from Apple; didn't have the patience to wait for the M5, I got it very recently. We've got 128 gigs of RAM, and we're using 12 gigs right now for the GPT OSS 20B model. Can I tell it to write me three poems about JavaScript?
And it is flying. That was 117 tokens per second on my laptop, locally. Pretty cool, right? Let's switch this over to OSS 120B. It's going to take a sec to load, because it has to load all of that into memory, and you can see my memory consumption going up fast. We're now at 59 gigs of VRAM being used for this. It's total memory, because macOS on Apple Silicon uses the same memory for both. It's slower, but it chugged along. The fact that I can get almost 80 TPS on a model that smart on my own computer: do you understand how [ __ ] cool that is?
This is what OpenAI's choice was. They wanted to make models that you can run on real consumer hardware. And while there aren't many good GPUs you can buy and plug into your desktop that can do this, because desktop GPUs have very limited VRAM, if you get a Mac Studio or a MacBook Pro with enough RAM, you can actually do inference on it. That's the difference. It's so different that we just got a crazy comment from chat: "Apple's RAM prices seem reasonable now." They actually kind of are when you consider this, and also the crazy squeeze happening in memory. This is cool.
This is really cool. The point of this distinction for me is very simple. OpenAI isn't interested in helping the competition here, because they are deep in that space and they see no issue with the current state of infra providers in the US. The Chinese labs can't provide their own infrastructure, because nobody will use it. So they need other infra providers to host their stuff. By doing openweight models, they can convince those providers to host their things. Consumers and enthusiasts can't use most of those models because they're way too big, but they can use the two that OpenAI released. OpenAI is interested in this space because it's one of the few they weren't really competitive in, and they wanted to go back and win it, and they did.
The goal of the GPT OSS models has been achieved. If you are running a model locally, there's a good chance you're either using GPT OSS or doing something less optimal than you otherwise could. But that's not winning open weights, because that's not going to get you high up on the chart here. And this is where my conclusion comes in. I don't think we're ever going to see an openweight model from the US win on this chart again. There's just very little incentive for labs to do it.
Meanwhile, the Chinese labs won't make it to America, and they won't even be on charts like this, if they don't do open weights. Don't think of this as "why aren't American companies keeping up with these openweight models?" Rather, think of it as "why do the Chinese labs have to do open weights, even if nobody can use the things they're publishing other than like six companies?" There are very, very few places in the world that can handle a one-trillion-parameter model. But they put it out anyways, because they need a way for American labs, American companies, and infrastructure providers to use it.
There is some hope left, but it's dwindling fast, because at the very end of this chart, we have Llama. Meta has released all of their models as open weight. Historically, they were one of the first doing good openweight models. In fact, the way a lot of people were using DeepSeek models originally wasn't through DeepSeek; it was by using the DeepSeek R1 model to do a fine-tune on Llama 3. A lot of us were using that Llama 3 fine-tune as DeepSeek, even though it was Llama bastardized into acting like DeepSeek.
There is some chance for Meta to catch up here, especially with some of the hires they've made, but I just don't see it happening. They are so, so far behind at this point. There's also some effort to try and fund this type of research, like the White House's attempts to fund open models and AI research in the US, providing everything from inference to power and paying for research to happen here. This was published in July, and I have not heard anything about it since. There's also the potential security risk.
The problem with openweight models and security is that you can't take it back once you put it out. If it turns out that you could use DeepSeek 3.2 to make a nuclear weapon, they can't take it away. That issue now exists forever in the model, in the weights. Once it is published, you can't unpublish it. Meanwhile, if some issue was discovered with OpenAI's new model, they can add a layer in front to prevent it. If it turns out GPT-5's weights are capable of telling you how to make a nuclear weapon, you can put a safeguard in front: when the API request comes in, and before a response goes out with those instructions, you can block it.
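That "layer in front" is just middleware on the provider's servers. Here's a minimal sketch of the idea, with a made-up keyword blocklist standing in for what would really be a trained moderation model; the names and logic are illustrative, not any provider's actual system:

```python
# Toy illustration of a safety layer in front of a hosted model.
# BLOCKLIST and screen() are stand-ins for real moderation systems.

BLOCKLIST = {"enriched uranium", "nuclear weapon design"}

def screen(text: str) -> bool:
    """Return True if the text trips the (toy) safety filter."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def guarded_completion(prompt: str, model_call) -> str:
    # Filter the request before it ever reaches the model...
    if screen(prompt):
        return "[request blocked]"
    response = model_call(prompt)
    # ...and filter the response before it reaches the user.
    if screen(response):
        return "[response blocked]"
    return response
```

The structural point is that this filter runs on infrastructure the lab controls, so it can be tightened tomorrow. Once the weights are on someone else's disk, there's nowhere left to put it.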
Security risks, copyright risks: all of these things are a lot easier to handle if you can block the request on the way in or out, before it gets to the model. But once the model's out there, you've lost your ability to do this. So there's a huge risk and liability for the labs that are publishing these openweight models. It's a lot more work to do it. It's a lot more work to do it right.
But the Chinese labs don't really care. Their hesitation about putting out things that are potentially actual security risks is zero. They just don't give a [ __ ]. They put it out when they can win benchmarks.
The American labs have liability to worry about. They have expectations to worry about, and investors they don't want to piss off. They have to go out of their way to make sure their models are safe and don't have potential copyright issues. And even if we start funding the creation of these openweight models here, the expectations that would be set on them by the government, of being safe, reliable, and meeting the American government's standards, are going to make this a lot harder to do, right?
It's a lot easier to add these things in front of the model than in the model itself. And when you are giving the model weights out, you're giving up your ability to control what goes in. So my conclusion is pretty clear. I do not see America competing in the openweight models that are only hostable via infra providers, these super giant models. I don't see us competing there.
But I think we have a unique potential to win here. As more people get stronger computers with better GPUs, as more consumers have more reasons to try out these models, as Apple starts shipping models on our devices, as Chrome starts shipping models in Chrome itself, where there's more reason to run locally, there's a very, very good chance that America can win with those. But we need an incentive if we're going to give out the weights. And right now, there is not much incentive for American labs to give models out for free to their competition. There is potentially incentive to give us things that we can run on our own machines. So I don't see us winning anytime soon, but I hope we can win here. Let me know what you guys think. Am I way overblowing this, or is China definitely going to be the winner of open weights? Curious how y'all feel, and if you even care, let me know. And until next time, peace, nerds.