Deepseek V3.2 in 5 mins!
By 1littlecoder
Summary
## Key takeaways
- **685B Open-Weight Model**: DeepSeek V3.2 is a 685 billion parameter model that they have open-sourced on Hugging Face, allowing finetuning and running on your own hardware. [00:00], [00:19]
- **DeepSeek Sparse Attention**: DeepSeek Sparse Attention (DSA) optimizes the attention mechanism in transformers, substantially reducing computational complexity while handling long contexts. [00:52], [01:12]
- **Scalable RL Framework**: DeepSeek implemented a robust reinforcement learning protocol and scaled post-training compute, making V3.2 comparable to GPT-5, with the Speciale variant surpassing it and matching Gemini 3.0 Pro. [01:22], [01:48]
- **Gold Medal Olympiad Wins**: The high-compute DeepSeek V3.2-Speciale achieves gold-medal performance in the 2025 IMO and IOI, rivaling top closed models. [02:09], [02:20]
- **Agentic Task Synthesis Pipeline**: DeepSeek built a large-scale pipeline that systematically generates training data at scale, integrating reasoning into tool-use scenarios to address data shortages. [02:33], [02:57]
Topics Covered
- DeepSeek V3.2 Matches GPT-5 Openly
- Sparse Attention Cuts Long-Context Costs
- Scale RL to Rival Closed Models
- Gold Medal AI Without Secret Sauce
- Agentic Pipeline Solves Data Exhaustion
Full Transcript
DeepSeek is back, and this time with a 685 billion parameter model that they have open-sourced on Hugging Face. This is the latest model from DeepSeek, DeepSeek V3.2, and the biggest jump in this release is performance: the model is on par with GPT-5 and also Gemini 3.0 Pro. The best thing about this particular model is that it is open weight. So you can take this model, try to finetune it, try to run it on your own hardware.
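To make "open weight" concrete, here is a minimal sketch of loading the model with the Hugging Face transformers library. The repository id below is a placeholder assumption (check DeepSeek's actual Hugging Face page for the real name), and a 685B-parameter model needs a serious multi-GPU or quantized setup, so treat this purely as an illustration of the workflow.

```python
# Minimal sketch: loading an open-weight model from Hugging Face.
# NOTE: "deepseek-ai/DeepSeek-V3.2" is a placeholder repo id, check
# DeepSeek's Hugging Face page for the actual name. A 685B-parameter
# model will not fit on a single GPU; this is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3.2"  # assumed/placeholder id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard across available GPUs
    torch_dtype="auto",      # use the checkpoint's native precision
    trust_remote_code=True,  # DeepSeek models often ship custom modeling code
)

inputs = tokenizer("Explain sparse attention in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```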
And DeepSeek not only open-sourced the model, they also open-sourced the secret recipe with which they built the model. That means there's a technical paper. This is something that you don't see every day from US-based companies, but DeepSeek has decided to share it openly so that people can learn how they built the model.
So now, about this particular model in itself. As I said, this is a 685 billion parameter model, and it is built on three primary principles, or what they call technical breakthroughs. The first one is something called DeepSeek Sparse Attention. DeepSeek Sparse Attention optimizes the attention mechanism in transformers and substantially reduces its computational complexity. In a standard transformer, as the context window grows, the model requires a lot more computation, but DeepSeek's DSA, DeepSeek Sparse Attention, is optimized for exactly this. That means you don't need a lot more computation even when you are increasing the context length.
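To give you a feel for the idea, here is a toy sketch of one common flavor of sparse attention, where each query attends only to its top-k highest-scoring keys instead of the full context. This is a generic illustration, not DeepSeek's actual DSA design (the paper describes their specific mechanism), and all names here are made up for the example.

```python
# Toy sketch of top-k sparse attention: each query attends only to its
# k highest-scoring keys instead of all n keys. Generic illustration of
# the sparse-attention idea, NOT DeepSeek's actual DSA mechanism.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    # q, k, v: (batch, n_tokens, d_head)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5          # (batch, n, n)

    # Keep only the top_k scores per query; mask out the rest.
    # (Real implementations select keys without materializing the
    # full n x n score matrix; that is where the savings come from.)
    top_k = min(top_k, scores.size(-1))
    kth = scores.topk(top_k, dim=-1).values[..., -1:]  # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf"))

    weights = F.softmax(scores, dim=-1)                # sparse attention weights
    return weights @ v                                 # (batch, n, d_head)

# Full attention cost grows as O(n^2) in context length n; attending to
# a small fixed set of keys per query is what cuts the long-context cost.
q = k = v = torch.randn(1, 1024, 64)
out = topk_sparse_attention(q, k, v, top_k=64)
print(out.shape)  # torch.Size([1, 1024, 64])
```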
The next thing is the scalable reinforcement learning framework. If you have listened to the recent podcasts from either Andrej Karpathy or Ilya Sutskever, one thing you might have noticed is their emphasis on reinforcement learning, not just pre-training. What DeepSeek has done here is figure out a way to scale reinforcement learning: they have built a reinforcement learning protocol and they have also scaled the post-training compute.
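If "RL post-training" sounds abstract, here is a bare-bones sketch of the general shape: sample completions from the model, score them with a reward, and push up the probability of highly rewarded tokens. This is a generic REINFORCE-style loop, not DeepSeek's actual protocol, and reward_fn, model, and tokenizer are hypothetical stand-ins.

```python
# Bare-bones shape of RL post-training for an LLM (REINFORCE-style).
# Generic illustration only, NOT DeepSeek's protocol; reward_fn and
# the model/tokenizer objects are hypothetical stand-ins.
import torch

def rl_post_training_step(model, tokenizer, prompts, reward_fn, optimizer):
    optimizer.zero_grad()
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        prompt_len = inputs["input_ids"].size(1)

        # Sample a completion from the current policy (the model itself).
        out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                             return_dict_in_generate=True)
        completion = out.sequences[0, prompt_len:]

        # Score the completion, e.g. "did the math answer verify?".
        reward = reward_fn(prompt, tokenizer.decode(completion))

        # REINFORCE: raise log-probs of sampled tokens, scaled by reward.
        logits = model(out.sequences).logits[0, prompt_len - 1:-1]
        log_probs = torch.log_softmax(logits, dim=-1)
        chosen = log_probs.gather(-1, completion.unsqueeze(-1)).squeeze(-1)
        loss = -(reward * chosen.sum())
        loss.backward()
    optimizer.step()
```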
The result is that DeepSeek V3.2 has performance comparable to GPT-5, and a higher-compute version of DeepSeek V3.2, which they call Speciale, surpasses GPT-5 and performs on par with Gemini 3.0 Pro. This model also matches OpenAI's and Google's flagship models, which got gold medal performance in the 2025 International Mathematical Olympiad and the International Olympiad in Informatics, the IMO and the IOI. So this is probably one of the very few models in the entire world with gold medal level performance where you still get the recipe, the secret sauce, of how they went about building it. I mean, truly open science and research.
And finally, DeepSeek has built a large-scale agentic task synthesis pipeline. One thing you might have heard again and again is that we're going to run out of pre-training data. So what are we going to do once we run out of pre-training data, or if people stop using Stack Overflow, what are we going to do to build coding models? This question has been around for a very long time. What DeepSeek has done is integrate reasoning into tool-use scenarios: they have designed their own synthesis pipeline that systematically generates training data at scale.
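As a rough mental model (not DeepSeek's actual pipeline, whose details are in the paper), a synthesis pipeline like this typically loops: generate a task, have a model attempt it with tools, verify the result, and keep only verified trajectories as training data. Every function below is a hypothetical placeholder.

```python
# Rough mental model of an agentic task-synthesis loop: propose a task,
# let a model solve it with tools, verify, keep only what checks out.
# Every function here is a hypothetical placeholder, NOT DeepSeek's code.
def synthesize_training_data(task_generator, agent, verifier, n_tasks):
    dataset = []
    for _ in range(n_tasks):
        task = task_generator()            # e.g. a coding problem + unit tests
        trajectory = agent.solve(task)     # reasoning steps + tool calls
        if verifier(task, trajectory):     # e.g. run the tests, check outputs
            dataset.append({
                "task": task,
                "trajectory": trajectory,  # becomes supervised/RL training data
            })
    return dataset
```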
Overall, this is a great model across all the benchmarks: AIME 2025, HMMT 2025, Codeforces, which is a heavy competitive programming benchmark, SWE-bench Verified, Terminal-Bench 2.0 for agentic tasks, and Tau-bench. Across all of them you can see DeepSeek V3.2-Speciale, which is the higher-compute model, and DeepSeek V3.2 Thinking, which requires slightly less compute but is also a thinking model. Both models perform on par with GPT-5 High, with Speciale even exceeding it, and they also get close to Gemini 3.0 Pro. So across all these benchmarks, I would say this is a very, very strong model.
If at all you think they've done benchmark maxing, let's say benchmark hacking: I don't think they would have shared the secret sauce openly like this, and I also don't think their model would have gone on to win gold medal level performance. I think overall this is a great release. There is a lot more for us to go through in the paper and learn from it. But for now, DeepSeek is back. The blue whale is back. You can go chat with the DeepSeek model: go to chat.deepseek.com and chat with the model. Let me know what you think about it. Thanks to DeepSeek for open-sourcing the model. See you in another video.