| Benchmark Metric | Value |
|---|---|
| Avg. Total Time | 32.26 s |
| Avg. TTFT | 8.65 s |
| Avg. Prefill TPS | 616.33 |
| Avg. Gen TPS | 25.59 |

| Model Info | Value |
|---|---|
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Gemma31B |
| Chat Template | Gemma4 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 31B |
| Added At | 5/2/2026 |
---
license: apache-2.0
language:
---
K1 is a model post-trained on datasets including iannicity/KIMI-K2.5-1000000x, iannicity/Hunter-Alpha-SFT, and stepfun-ai/Step-3.5-Flash-SFT.
It was then further trained with GRPO-based reinforcement learning built from Chinese logical problems, which yields relatively consistent reasoning and stronger cognition in complex scenarios.
I have also observed side benefits from GRPO, such as improved numerical computation, which appears related to its influence on layers such as the MLP blocks.
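The GRPO training step described above does not need a learned value model: each prompt is sampled several times and each completion's reward is normalized against its own group. A minimal sketch of that group-relative advantage computation (the function name and the rule-based reward values are illustrative, not taken from the actual training setup):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Core of GRPO: normalize each completion's reward by the mean
    and std of its sampling group, so no critic network is required."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# e.g. 4 completions for one logic problem, scored 1/0 by a verifier
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat their group's average get a positive advantage and are reinforced; the per-group baseline is what makes training on pass/fail logic problems stable.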
If you like my work, you are welcome to support me by buying me a coffee on Ko-fi.
Every bit of your support directly helps me continue creating and allows me to spend more time producing better work: