| Field | Value |
| --- | --- |
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Gemma31B |
| Chat Template | Gemma4 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 31B |
| Added At | 5/1/2026 |
> [!NOTE]
> Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first". While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for answer quality, structure, clarity, and consistency.

This model was trained in a post-fix Unsloth environment, after Unsloth's official gradient-accumulation and loss-accounting fixes for Gemma-family training. In practice, I used a bug-fixed stack aligned with `unsloth_zoo>=2026.4.6` and `transformers==5.5.0`, in order to avoid misleading loss inflation under gradient accumulation and to obtain more reliable optimization behavior for Gemma 4 31B fine-tuning. My fine-tuning strategy therefore deliberately does not follow other teams in aggressive direct distillation from Claude; instead, we opted for a more conservative and controllable path.
Gemopus-4-31B-it is a supervised fine-tuned version of the Gemma 4 31B Instruction model.
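As a rough sketch, the pinned environment described in the note above could be reproduced with something like the following. The package names and version bounds come from the note itself; the exact install flags for your CUDA version and platform may differ, so treat this as an assumption rather than the author's verified setup:

```shell
# Hypothetical environment setup matching the versions pinned above.
# Adjust for your CUDA version / platform as needed.
pip install "unsloth_zoo>=2026.4.6" "transformers==5.5.0"
```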

Recent work:

- Ren et al., 2026, "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability" (arXiv:2604.06628).
- Short-epoch reasoning SFT can underestimate generalization: in-domain gains may appear early, while out-of-domain improvements often require sufficient optimization.
- This paper suggests that generalization in reasoning SFT is not fixed but conditional, shaped jointly by optimization dynamics, training data quality, and base-model capability.
Key takeaways:
For Gemopus-4-31B-it, this evidence supports a more conditional interpretation of reasoning supervision. My strategy is therefore not based on the simplistic claim that reasoning SFT never generalizes, but on a practical judgment about which kind of reasoning supervision is worth applying to Gemma 4. Since Gemma 4 31B already has a relatively orderly and restrained reasoning-chat prior, I chose not to aggressively overwrite it with public "Claude-style" traces of uneven quality. Instead, the SFT objective focuses on preserving Gemma 4's native reasoning order while improving answer quality, structure, clarity, and interaction consistency.
This also suggests that reasoning SFT should be viewed as a dynamic optimization process, rather than a static training outcome. For this project, that means prioritizing data quality, optimization discipline, and compatibility with the base model's native strengths, rather than assuming that longer visible reasoning alone will automatically produce a better student.
Based on the methodological deduction above, I chose to focus my optimization efforts on the lower-risk, more consistently rewarding dimensions of final-answer quality and interactive experience.
For the best performance, use the following configurations and best practices.

Use this standardized sampling configuration across all use cases:

- `temperature=1.0`
- `top_p=0.95`
- `top_k=64`

Compared to Gemma 3, the models use standard system, assistant, and user roles. To properly manage the thinking process, use the following control tokens:

- Place the `<|think|>` token at the start of the system prompt to enable thinking; to disable thinking, remove the token.
- Thinking enabled: `<|channel>thought\n [Internal reasoning] <channel|>` precedes the final answer.
- Thinking disabled: `<|channel>thought\n<channel|> [Final answer]` (an empty thought block).

> [!NOTE]
> Many libraries like Transformers and llama.cpp handle the complexities of the chat template for you.
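As a minimal sketch of how the recommended sampling settings and the `<|think|>` control token might be wired up in client code: the dict below can be passed as keyword arguments to most inference engines, and the helper prepends the thinking token to the system prompt. The function name and system-prompt text are illustrative, not taken from the model card:

```python
# Recommended sampling settings from the model card, as a plain dict
# usable with e.g. vLLM SamplingParams or transformers generate kwargs.
SAMPLING_CONFIG = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}


def build_messages(user_prompt: str, enable_thinking: bool = True) -> list[dict]:
    """Assemble a chat in the standard system/user role format.

    Per the model card, prepending the <|think|> token to the system
    prompt enables the thinking process; omitting it disables thinking.
    """
    system_content = "You are a helpful assistant."  # illustrative prompt
    if enable_thinking:
        system_content = "<|think|>" + system_content
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_prompt},
    ]
```

Libraries that implement the chat template (Transformers, llama.cpp) will handle the thought-channel markers themselves; this helper only controls whether thinking is requested.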
The complete fine-tuning code and related notebooks for this model will be published soon. Stay tuned!
GitHub Repository: Jackrong-llm-finetuning-guide
Welcome to visit this repository to gain a deeper understanding of the codebase and reproduce the training results locally or on Colab.
Qwopus3.5-27b Complete Fine-Tuning Guide (PDF)
No one starts out as an expert, but all experts bravely took the first step.
All training and testing for this project are self-funded. If you find this model or guide helpful, a Star on GitHub is the greatest encouragement to me.
```
Base Model (google/gemma-4-31B-it)
                │
                ▼
Targeted Supervised Fine-Tuning (SFT)
(Focus on Answer Quality & Structural Alignment, Retaining Restrained CoT)
                │
                ▼
Gemopus-4-31B-it
```
The training data was specifically curated from the open-source community: highly coherent, well-structured instruction pairs alongside natural multi-turn conversations. The goal is to guide the model toward more mature ways of organizing and presenting conclusions, rather than mechanically imitating "fake chain of thought" without internalized logic.
Tool Calling Compatibility: The Gemma 4 series models still have known compatibility issues with tool calling in local inference ecosystems like llama.cpp / LM Studio (including call failures, format mismatches, and continuous loops). This has been widely reported in the community and is not unique to this model. If your workflow relies heavily on tool calling, test it thoroughly before deploying, or temporarily consider solutions with more mature ecosystem support.
Regarding Fine-Tuning Characteristics of the Gemma Architecture: From an engineering-practice perspective, the Gemma series does exhibit different training dynamics from the Qwen series during fine-tuning, including wider loss-curve fluctuations and greater sensitivity of gradient stability to hyperparameters. This may be related to Google's architectural design choices. Furthermore, the base Gemma 4 model objectively still lags the Qwen 3.5 series in certain dimensions of raw capability. We believe that stating these observations truthfully is more beneficial to the community's technical judgment than selectively avoiding them.
Project Positioning: The core value of Gemopus-4-31B-it lies in providing a methodology-backed engineering reference for SFT fine-tuning under the Gemma 4 architecture, rather than a fully production-ready solution. If you are looking for a productivity model that has undergone more iterative validation and offers more stable ecosystem compatibility, I recommend the Qwopus-3.5-v3 series; its post-fine-tuning performance is much more robust.
Special thanks to the developers in the open-source community for building such a thriving ecosystem. Thank you to the Unsloth team for providing excellent and highly efficient LLM fine-tuning support, and sincere respect to the Google team for open-sourcing the outstanding Gemma 4 base model. Finally, thanks to all the researchers who have contributed profound insights into CoT Faithfulness and the interpretability of LLM reasoning. It is exactly these rigorous frontier academic discussions that deeply inspired the core fine-tuning methodology of this project.