DeepSeek AI Introduces DeepSeek V3.2 and V3.2 Speciale for Long-Context Reasoning and Agent Workloads
By Marktechpost AI
Summary
Topics Covered
- Sparse Attention Matches Dense Quality
- GRPO Distills Specialist RL Domains
- 85K Synthetic Tasks Train Agents
- Thinking Mode Persists Across Tools
- Olympiad Gold on Open Weights
Full Transcript
How do you get GPT-5-level reasoning on real long-context, tool-using workloads without paying the quadratic attention and GPU costs that usually make those systems impractical?
DeepSeek research introduces DeepSeek V3.2 and DeepSeek V3.2 Speciale, reasoning-first models built for agents that target high-quality reasoning, long context, and agent workflows with open weights and production APIs. The models combine DeepSeek Sparse Attention (DSA), a scaled GRPO reinforcement learning stack, and an agent-native tool protocol, and they report performance comparable to GPT-5, with DeepSeek V3.2 Speciale reaching Gemini 3.0 Pro level reasoning on public benchmarks and competitions.
Both DeepSeek V3.2 and DeepSeek V3.2 Speciale use the DeepSeek V3 mixture-of-experts transformer, with about 671 billion total parameters and 37 billion active parameters per token, inherited from V3.1-Terminus. The only structural change is DeepSeek sparse attention, introduced through continued pre-training. DeepSeek sparse attention splits attention into two components: a lightning indexer runs a small number of low-precision heads over all token pairs and produces relevance scores, and a fine-grained selector keeps the top-k key-value positions per query. The main attention path then runs multi-query attention and multi-head latent attention on this sparse set.
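To make the mechanism concrete, here is a minimal PyTorch sketch of that two-stage pattern, with a single indexer head standing in for DSA's low-precision indexer heads; tensor names, shapes, and the `top_k` default are illustrative rather than DeepSeek's implementation, and causal masking is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def sparse_attention_sketch(q, k, v, idx_q, idx_k, top_k=2048):
    """Toy DeepSeek-style sparse attention: an indexer scores all
    token pairs cheaply, a selector keeps top-k keys per query, and
    full attention runs only over the selected positions."""
    # Lightning-indexer stage: cheap relevance scores over all pairs.
    scores = idx_q @ idx_k.transpose(-1, -2)        # [T, T]
    top_k = min(top_k, scores.size(-1))
    sel = scores.topk(top_k, dim=-1).indices        # [T, top_k]

    # Main attention path, restricted to the selected key-value set.
    k_sel = k[sel]                                  # [T, top_k, d]
    v_sel = v[sel]                                  # [T, top_k, d]
    att = torch.einsum("td,tkd->tk", q, k_sel) / q.size(-1) ** 0.5
    w = F.softmax(att, dim=-1)
    return torch.einsum("tk,tkd->td", w, v_sel)

T, d, d_idx = 16, 64, 32
q, k, v = (torch.randn(T, d) for _ in range(3))
iq, ik = torch.randn(T, d_idx), torch.randn(T, d_idx)
out = sparse_attention_sketch(q, k, v, iq, ik, top_k=8)
print(out.shape)  # torch.Size([16, 64])
```

The point of the split is cost: the indexer touches all token pairs but is deliberately tiny and low precision, while the expensive attention path only ever sees top-k positions per query.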
DeepSeek sparse attention (DSA) is introduced by continued pre-training on top of DeepSeek V3.1-Terminus. In the dense warm-up stage, dense attention remains active, all backbone parameters are frozen, and only the lightning indexer is trained with a Kullback-Leibler loss to match the dense attention distribution on 128K-context sequences. This stage uses a small number of steps and about 2 billion tokens, enough for the indexer to learn useful scores.
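A hedged sketch of that warm-up objective, assuming a plain KL divergence between the frozen model's dense attention weights and a softmax over the indexer's scores (function and variable names are hypothetical):

```python
import torch
import torch.nn.functional as F

def indexer_warmup_loss(indexer_logits, dense_attn_probs):
    """KL alignment loss: train the indexer's score distribution to
    match the frozen backbone's dense attention distribution.
    indexer_logits:   [T, T] raw scores from the lightning indexer
    dense_attn_probs: [T, T] dense attention weights (detached target)
    """
    log_p_indexer = F.log_softmax(indexer_logits, dim=-1)
    return F.kl_div(log_p_indexer, dense_attn_probs.detach(),
                    reduction="batchmean")

# Only indexer parameters receive gradients in this stage, e.g.:
# optimizer = torch.optim.AdamW(indexer.parameters(), lr=1e-4)
```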
In the sparse stage, the selector keeps 2,048 key-value entries per query, the backbone is unfrozen, and the model continues training on about 944 billion tokens. Gradients for the indexer still come only from the alignment loss against dense attention on the selected positions. This schedule makes DSA behave as a drop-in replacement for dense attention, with similar quality and lower long-context cost.
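The distinctive detail here is that the indexer's alignment target is now restricted to the selected positions; a rough sketch under that reading, again with hypothetical names:

```python
import torch
import torch.nn.functional as F

def sparse_stage_indexer_loss(indexer_logits, dense_attn_probs, sel):
    """Sparse-stage alignment: KL between indexer and dense attention,
    computed only over the top-k positions kept by the selector.
    sel: [T, top_k] indices chosen by the fine-grained selector."""
    idx_sel = indexer_logits.gather(-1, sel)            # [T, top_k]
    dense_sel = dense_attn_probs.gather(-1, sel)        # [T, top_k]
    dense_sel = dense_sel / dense_sel.sum(-1, keepdim=True)  # renormalize
    return F.kl_div(F.log_softmax(idx_sel, dim=-1),
                    dense_sel.detach(), reduction="batchmean")
```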
On top of the sparse architecture, DeepSeek V3.2 uses group relative policy optimization (GRPO) as the main reinforcement learning method. The research team states that post-training reinforcement learning (RL) compute exceeds 10% of pre-training compute. RL is organized around specialist domains: the team trains dedicated runs for mathematics, competitive programming, general logical reasoning, browsing and agent tasks, and safety, then distills these specialists into the shared 685B-parameter base for DeepSeek V3.2 and DeepSeek V3.2 Speciale. GRPO is implemented with an unbiased KL estimator, off-policy sequence masking, and mechanisms that keep mixture-of-experts (MoE) routing and sampling masks consistent between training and sampling.
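The core of GRPO is a group-relative advantage: sample a group of responses per prompt, score them, and normalize each reward against the group statistics. A simplified sketch that omits the KL term and the masking mechanisms mentioned above:

```python
import torch

def grpo_advantages(rewards):
    """Group-relative advantages: for G sampled responses to the same
    prompt, center and scale each reward by the group's mean and
    standard deviation. rewards: [G]."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate over one group of responses.
    logprobs / old_logprobs: [G] summed token log-probs per response."""
    ratio = torch.exp(logprobs - old_logprobs)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])  # e.g. pass/fail verifier
adv = grpo_advantages(rewards)
```

Because the baseline comes from the group itself, no learned value model is needed, which is part of why GRPO scales well across specialist RL domains.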
The DeepSeek research team builds a large synthetic agent dataset by generating more than 1,800 environments and more than 85,000 tasks across code agents, search agents, general tools, and code interpreter setups. Tasks are constructed to be hard to solve and easy to verify, and they are used as RL targets together with real coding and search traces.
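The "hard to solve, easy to verify" property is what makes these tasks usable as RL rewards; a toy illustration of the pattern, with an entirely hypothetical task format:

```python
from dataclasses import dataclass

@dataclass
class AgentTask:
    """Hypothetical shape of a verifiable agent task: solving it may
    take many tool calls, but checking the answer is a cheap test."""
    prompt: str
    expected: str

    def reward(self, answer: str) -> float:
        # Binary verifier: trivial to check, usable directly as an
        # RL reward signal for GRPO-style training.
        return 1.0 if answer.strip() == self.expected else 0.0

task = AgentTask(
    prompt="Search the docs and report the library's license identifier.",
    expected="MIT",
)
print(task.reward("MIT"))  # 1.0
```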
At inference time, DeepSeek V3.2 introduces explicit thinking and non-thinking modes. The deepseek-reasoner endpoint exposes thinking mode by default, where the model produces an internal chain of thought before the final answer. The thinking-with-tools guide describes how reasoning content is kept across tool calls and cleared when a new user message arrives, and how tool calls and tool results stay in the context even when reasoning text is trimmed for budget. The chat template is updated around this behavior: the DeepSeek V3.2 Speciale repository ships Python encoder and decoder helpers instead of a Jinja template, and messages can carry a reasoning_content field alongside content, controlled by a thinking parameter. A developer role is reserved for search agents and is not accepted in general chat flows by the official API, which protects this channel from accidental misuse.
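Calling thinking mode through the OpenAI-compatible API looks roughly like the sketch below, assuming the documented deepseek-reasoner behavior where reasoning_content is returned but must not be sent back on later turns:

```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; base URL and model name follow
# the public docs, but treat this as a hedged sketch, not a spec.
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

messages = [{"role": "user", "content": "What is 9.11 minus 9.8?"}]
resp = client.chat.completions.create(
    model="deepseek-reasoner",  # thinking mode is on by default here
    messages=messages,
)
msg = resp.choices[0].message
print(msg.reasoning_content)    # internal chain of thought
print(msg.content)              # final answer

# Per the thinking-with-tools guidance, reasoning text is not fed
# back on the next turn; only the final content is appended.
messages.append({"role": "assistant", "content": msg.content})
```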
On standard reasoning and coding benchmarks, DeepSeek V3.2 and especially DeepSeek V3.2 Speciale are reported as comparable to GPT-5 and close to Gemini 3.0 Pro on suites such as AIME 2025, HMMT 2025, GPQA, and LiveCodeBench, with improved cost efficiency on long-context workloads. For formal competitions, the DeepSeek research team states that DeepSeek V3.2 Speciale achieves gold-medal-level performance at the International Mathematical Olympiad 2025, the Chinese Mathematical Olympiad 2025, and the International Olympiad in Informatics 2025, and competitive gold-medal-level performance at the ICPC World Finals 2025.