---
license: apache-2.0
base_model: google/gemma-4-31B-it
---

| Field | Value |
|---|---|
| Avg. Total Time | 29.57 s |
| Avg. TTFT | 12.55 s |
| Avg. Prefill TPS | 445.10 |
| Avg. Gen TPS | 22.01 |
| Context Size | 262,144 tokens |
| Quantization | r64 |
| Engine | vLLM |
| Creation Method | LoRA Finetune |
| Model Type | Gemma31B |
| Chat Template | Gemma4 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 31B |
| Added At | 5/1/2026 |

Full-parameter fine-tune of google/gemma-4-31B-it on 12,680 Claude Opus 4.6 reasoning traces.
This is the first full-parameter fine-tune of Gemma 4 31B.
| Detail | Value |
|---|---|
| Base | google/gemma-4-31B-it |
| Method | Full-parameter SFT (not LoRA) |
| Framework | TRL SFTTrainer + PyTorch FSDP |
| Hardware | 8x NVIDIA H200 (141 GB each) |
| Precision | bf16 |
| Total epochs | 4 (2 at lr=1e-5, then 2 more at lr=5e-6) |
| Sequence length | 8,192 tokens |
| Effective batch size | 10 |
Training used a two-phase learning-rate schedule:
| Phase | Epochs | Learning rate | Result |
|---|---|---|---|
| Initial | 2 | 1e-5 (cosine) | 80.8% accuracy |
| Continued | 2 | 5e-6 (cosine) | 89.7% accuracy |
Continuing from the warm phase-1 checkpoint at a lower learning rate improved token accuracy by roughly 9 percentage points.
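
A schedule like this can be expressed with TRL's SFTTrainer, as in the sketch below. Only the learning rates, epoch counts, cosine scheduler, bf16 precision, and 8,192-token sequence length come from the tables above; the dataset choice, output paths, and per-device batch size are placeholders, and the sequence-length field is named differently across TRL versions (`max_seq_length` here).

```python
# Hypothetical two-phase SFT sketch with TRL; in practice this would be
# launched under PyTorch FSDP across 8 GPUs via `accelerate launch`.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: one of the three mixture datasets (full mixture shown in the
# dataset section below).
train_dataset = load_dataset("Crownelius/Opus-4.6-Reasoning-3300x", split="train")

def run_phase(model_or_ckpt: str, lr: float, output_dir: str) -> None:
    config = SFTConfig(
        output_dir=output_dir,
        num_train_epochs=2,             # each phase ran for 2 epochs
        learning_rate=lr,
        lr_scheduler_type="cosine",     # cosine decay in both phases
        bf16=True,                      # training precision from the table
        max_seq_length=8192,            # sequence length from the table
        per_device_train_batch_size=1,  # assumption; effective batch was 10
        logging_steps=10,
    )
    trainer = SFTTrainer(model=model_or_ckpt, args=config, train_dataset=train_dataset)
    trainer.train()
    trainer.save_model(output_dir)

# Phase 1: cold start from the base model at lr=1e-5.
run_phase("google/gemma-4-31B-it", 1e-5, "ckpt-phase1")
# Phase 2: continue from the warm phase-1 checkpoint at lr=5e-6.
run_phase("ckpt-phase1", 5e-6, "ckpt-phase2")
```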
| Metric | After phase 1 | After phase 2 (final) |
|---|---|---|
| Loss | 27.5 | 13.6 |
| Token accuracy | 80.8% | 89.7% |
| Grad norm | 15.3 | 15.3 |
| Entropy | 0.69 | 0.34 |
All training data was generated by Claude Opus 4.6; no mixed-model data.
| Dataset | Samples | Description |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | Cleaned Claude Opus 4.6 reasoning — math, code, diverse |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Tool-use reasoning + vague prompt handling |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Math/logic reasoning with verified solutions |
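
A sketch of assembling the mixture with the datasets library follows; the split names and schema compatibility across the three sets are assumptions.

```python
# Hypothetical reconstruction of the 12,680-sample mixture; assumes all three
# datasets share a "train" split and compatible columns.
from datasets import concatenate_datasets, load_dataset

parts = [
    load_dataset("Crownelius/Opus-4.6-Reasoning-3300x", split="train"),
    load_dataset("TeichAI/Claude-Opus-4.6-Reasoning-887x", split="train"),
    load_dataset("Roman1111111/claude-opus-4.6-10000x", split="train"),
]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(len(mixture))  # 12,680 if the splits match the sample counts above
```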
```python
from transformers import AutoProcessor, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EganAI/gemma4-31b-opus-reasoning",
    torch_dtype="auto",   # loads in bf16, matching the training precision
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("EganAI/gemma4-31b-opus-reasoning")

messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]

# Build the prompt with the chat template; enable_thinking=True requests a
# reasoning trace before the final answer.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = processor(text=text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,   # required for temperature/top_p/top_k to take effect
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)
# Decode only the newly generated tokens; keep special tokens so the
# reasoning markers stay visible.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```
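
The throughput figures at the top of this card were measured with vLLM. Below is a minimal offline-inference sketch using vLLM's Python API, assuming vLLM supports this architecture; the sampling settings mirror the transformers example above.

```python
# Sketch of offline inference with vLLM; max_model_len matches the advertised
# 262,144-token context (reduce it if KV-cache memory is tight).
from vllm import LLM, SamplingParams

llm = LLM(model="EganAI/gemma4-31b-opus-reasoning", max_model_len=262144)
params = SamplingParams(temperature=1.0, top_p=0.95, top_k=64, max_tokens=2048)

messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```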
| Format | VRAM | Device |
|---|---|---|
| bf16 | ~62GB | 1x A100/H100 80GB |
| Q8 | ~31GB | 2x RTX 4090 |
| Q4_K_M | ~17GB | RTX 4090 |
| Q3_K_M | ~14GB | RTX 4080 |
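
The quantized rows refer to GGUF builds for llama.cpp-style runtimes. As a rough transformers-side analogue, a 4-bit bitsandbytes load looks like the sketch below; NF4 is not the same scheme as Q4_K_M, so quality and VRAM use will differ from the table.

```python
# Rough 4-bit loading sketch with bitsandbytes (not equivalent to GGUF Q4_K_M).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "EganAI/gemma4-31b-opus-reasoning",
    quantization_config=bnb,
    device_map="auto",
)
```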
Apache 2.0 (same as Gemma 4)