| Benchmark Metric | Value |
|---|---|
| Avg. Total Time | 32.26 s |
| Avg. TTFT | 8.65 s |
| Avg. Prefill TPS | 616.33 |
| Avg. Gen TPS | 25.59 |

| Model Info | Value |
|---|---|
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Gemma31B |
| Chat Template | Gemma4 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 31B |
| Added At | 5/2/2026 |
---
license: apache-2.0
language:
---
K1 is a model post-trained on datasets including iannicity/KIMI-K2.5-1000000x, iannicity/Hunter-Alpha-SFT, and stepfun-ai/Step-3.5-Flash-SFT.
It was then further trained with GRPO-based reinforcement learning built from Chinese logical problems, which yields relatively consistent reasoning and stronger cognition in complex scenarios.
I have also observed side benefits from GRPO, such as improved numerical computation, which appears related to its influence on layers such as the MLP blocks.
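The GRPO training step described above does not need a learned value model: each prompt is sampled several times and each completion's reward is normalized against its own group. A minimal sketch of that group-relative advantage computation (the function name and the rule-based reward values are illustrative, not taken from the actual training setup):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Core of GRPO: normalize each completion's reward by the mean
    and std of its sampling group, so no critic network is required."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# e.g. 4 completions for one logic problem, scored 1/0 by a verifier
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat their group's average get a positive advantage and are reinforced; the per-group baseline is what makes training on pass/fail logic problems stable.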
If you like my work, you are welcome to support me by buying me a coffee on Ko-fi.
Every bit of your support directly helps me continue creating and allows me to spend more time producing better work: