Gemma-4-31B-Claude-4.6-Opus-Reasoning-Distilled

Creative model

Performance Metrics

Avg. Total Time: 29.57s
Avg. TTFT: 12.55s
Avg. Prefill TPS: 445.10
Avg. Gen TPS: 22.01

Model Information

Context Size: 262144
Quantization: r64
Engine: vllm
Creation Method: LoRA Finetune
Model Type: Gemma31B
Chat Template: Gemma4
Reasoning: Yes
Vision: Yes
Parameters: 31B
Added At: 5/1/2026


---
license: apache-2.0
base_model:
  - google/gemma-4-31B-it
library_name: transformers
tags:
  - gemma4
  - gemma
  - reasoning
  - claude-opus
  - distillation
  - full-finetune
  - sft
language:
  - en
pipeline_tag: image-text-to-text
---

Gemma 4 31B Claude Opus Reasoning

Full parameter fine-tune of google/gemma-4-31B-it on 12,680 Claude Opus 4.6 reasoning traces.

First full-parameter fine-tune of Gemma 4 31B.

Highlights

  • 89.7% token accuracy after 4 epochs
  • Full parameter SFT on 8x NVIDIA H200 — all 31B parameters updated, not LoRA
  • 12,680 pure Claude Opus 4.6 traces — consistent reasoning style, no mixed-model data
  • Native Gemma 4 thinking format — uses built-in thinking tokens
  • Runs on a 4090 at Q4_K_M (~17GB VRAM)

Training

Base: google/gemma-4-31B-it
Method: Full parameter SFT (not LoRA)
Framework: TRL SFTTrainer + PyTorch FSDP
Hardware: 8x NVIDIA H200 (141GB each)
Precision: bf16
Total epochs: 4 (2 at lr=1e-5, then 2 more at lr=5e-6)
Sequence length: 8,192
Batch size (effective): 10

Training Schedule

Training used a two-phase schedule:

| Phase | Epochs | Learning rate | Result |
|---|---|---|---|
| Initial | 2 | 1e-5 (cosine) | 80.8% accuracy |
| Continued | 2 | 5e-6 (cosine) | 89.7% accuracy |

Continuing at lower LR on a warm checkpoint improved accuracy by 9 percentage points.
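
A minimal sketch of this two-phase schedule with TRL's SFTTrainer is below. It assumes the traces load as a chat-formatted datasets.Dataset; the output paths, the stand-in dataset choice, and the exact SFTConfig argument names (which shift between TRL releases) are illustrative, not the project's actual training script.

from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Stand-in dataset; the full ~12,680-sample mixture is listed under Training Data below
dataset = load_dataset("Crownelius/Opus-4.6-Reasoning-3300x", split="train")

def run_phase(model_path, lr, output_dir):
    # bf16 full-parameter SFT; FSDP sharding comes from `accelerate launch`, not shown here
    args = SFTConfig(
        output_dir=output_dir,
        num_train_epochs=2,
        learning_rate=lr,
        lr_scheduler_type="cosine",
        bf16=True,
        max_seq_length=8192,  # renamed to `max_length` in newer TRL releases
    )
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
    trainer = SFTTrainer(model=model, args=args, train_dataset=dataset)
    trainer.train()
    trainer.save_model(output_dir)

run_phase("google/gemma-4-31B-it", 1e-5, "out/phase1")  # phase 1: 2 epochs at 1e-5
run_phase("out/phase1", 5e-6, "out/phase2")             # phase 2: 2 more epochs at 5e-6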

Training Metrics

| Metric | After phase 1 | After phase 2 (final) |
|---|---|---|
| Loss | 27.5 | 13.6 |
| Token accuracy | 80.8% | 89.7% |
| Grad norm | 15.3 | 15.3 |
| Entropy | 0.69 | 0.34 |

Training Data (~12,680 samples)

All Claude Opus 4.6. No mixed-model data.

| Dataset | Samples | Description |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | Cleaned Claude Opus 4.6 reasoning — math, code, diverse |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Tool-use reasoning + vague prompt handling |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Math/logic reasoning with verified solutions |
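
As a rough sketch, the mixture above can be assembled with the datasets library; the split names and the assumption that the three sets share a compatible schema are mine, since this card does not describe their column layout.

from datasets import concatenate_datasets, load_dataset

sources = [
    "Crownelius/Opus-4.6-Reasoning-3300x",
    "TeichAI/Claude-Opus-4.6-Reasoning-887x",
    "Roman1111111/claude-opus-4.6-10000x",
]
# Assumes a "train" split and matching features; in practice the sets may need
# remapping to a shared chat/messages column before concatenation
parts = [load_dataset(name, split="train") for name in sources]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(len(mixture))  # ~12,680 if every sample is kept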

Usage

from transformers import AutoProcessor, AutoModelForCausalLM

# Load the model and shard it across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "EganAI/gemma4-31b-opus-reasoning",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("EganAI/gemma4-31b-opus-reasoning")

messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]

# Build the prompt with the Gemma 4 chat template, with thinking enabled
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=2048, temperature=1.0, top_p=0.95, top_k=64
)
# Decode only the newly generated tokens, keeping the thinking tokens visible
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))

Hardware Requirements

| Format | VRAM | Device |
|---|---|---|
| bf16 | ~62GB | 1x A100/H100 80GB |
| Q8 | ~31GB | 2x RTX 4090 |
| Q4_K_M | ~17GB | RTX 4090 |
| Q3_K_M | ~14GB | RTX 4080 |
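
For the quantized rows, a minimal sketch with llama-cpp-python is below; this card does not link a GGUF upload, so the file path is a placeholder for whatever Q4_K_M conversion you download or produce yourself.

from llama_cpp import Llama

# Placeholder path: point at your own Q4_K_M GGUF conversion of this model
llm = Llama(
    model_path="gemma4-31b-opus-reasoning.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer; ~17GB fits on a single RTX 4090
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_tokens=2048,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
)
print(out["choices"][0]["message"]["content"])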

Implementation Notes

  • Gemma 4 requires mm_token_type_ids even for text-only training — custom data collator injects zeros (sketched after this list)
  • SDPA attention only — flash attention is incompatible with Gemma's soft-capping
  • FSDP over DeepSpeed — simpler config for day-zero model support
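
A minimal sketch of that collator idea, assuming pre-tokenized text-only features carrying input_ids/attention_mask and leaving label padding aside; the helper name and wrapping approach are illustrative, not the exact collator used for this run.

import torch

def make_text_only_collator(processor):
    tokenizer = processor.tokenizer

    def collate(features):
        # Pad input_ids / attention_mask to a common length (label handling omitted)
        batch = tokenizer.pad(features, padding=True, return_tensors="pt")
        # Gemma 4's forward pass expects mm_token_type_ids even with no images,
        # so inject an all-zeros tensor shaped like input_ids for text-only batches
        batch["mm_token_type_ids"] = torch.zeros_like(batch["input_ids"])
        return batch

    return collate

The collator is handed to the trainer via data_collator=make_text_only_collator(processor), and loading the model with attn_implementation="sdpa" covers the second note above.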

License

Apache 2.0 (same as Gemma 4)