Avg. Total Time: 49.67s
Avg. TTFT: 17.46s
Avg. Prefill TPS: 350.65
Avg. Gen TPS: 14.54
Context Size: 262144
Quantization: r64
Engine: vllm
Creation Method: LoRA Finetune
Model Type: Gemma31B
Chat Template: Gemma4
Reasoning: Yes
Vision: Yes
Parameters: 31B
Added At: 5/2/2026
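These figures relate by simple arithmetic: the time spent after the first token, multiplied by the generation throughput, gives the average response length implied by the benchmark. A quick back-of-envelope check (variable names are only for illustration):

```python
# Rough estimate of average response length from the listed figures.
avg_total_time_s = 49.67   # Avg. Total Time
avg_ttft_s = 17.46         # Avg. TTFT (prompt processing + first token)
avg_gen_tps = 14.54        # Avg. Gen TPS

gen_time_s = avg_total_time_s - avg_ttft_s       # ~32.2 s spent generating
approx_output_tokens = gen_time_s * avg_gen_tps  # ~468 tokens per response
print(f"~{approx_output_tokens:.0f} generated tokens per response on average")
```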
license: apache-2.0
A finetune of Gemma 4 31B designed for creative tasks.
Another model from Google that is difficult to work with but extremely good.
Compared to the original, this finetune has slightly better swipe diversity and a less flowery, verbose writing style. Reasoning does tend to average out a bit longer than the original, however. Intelligence appears to be on par with the original.
Supports both thinking and non-thinking modes.
Creation Process: SFT > Merge
SFT on approximately 49 million tokens.
Despite that 49-million-token figure, the dataset is fairly modest in size: the trainable portion is somewhere in the rough ballpark of 10-15 million tokens, because every dataset was trained on the last turn only to faithfully mirror the Gemma 4 chat template.
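As a rough illustration of what last-turn-only training means (this is not the exact pipeline behind the pre-tokenized zerofata/pretok dataset, and the tokenizer repo id is only a placeholder taken from the merge config): tokens before the final assistant turn keep their place in the context but receive the ignore label -100, so only the last reply contributes to the loss.

```python
from transformers import AutoTokenizer

# Placeholder repo id; any chat model with a Gemma-style template works the same way.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")

def last_turn_only_example(messages):
    """Mask everything except the final assistant turn with -100."""
    # Full conversation, tokenized with the model's chat template.
    full_ids = tokenizer.apply_chat_template(messages, tokenize=True)
    # Everything up to the final reply, ending at the assistant turn header,
    # assuming the template makes this a strict prefix of the full conversation.
    prefix_ids = tokenizer.apply_chat_template(
        messages[:-1], tokenize=True, add_generation_prompt=True
    )
    labels = [-100] * len(prefix_ids) + full_ids[len(prefix_ids):]
    return {"input_ids": full_ids, "labels": labels}
```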
The approach was very similar to the 26B A4B MeroMero. I trained the model aggressively for 2 epochs on my data and, after testing various checkpoints, settled on the one at 1 epoch, which had the desired style and the fewest signs of overfitting.
I merged this checkpoint back into the original instruct model, which cleaned up any remaining overfitting while still retaining the changes from the finetune.
Trained using Axolotl.
Mergekit config:
models:
  - model: google/gemma-4-31B-it
  - model: ApocalypseParty/G4-31B-SFT-v3-1-1ep
merge_method: slerp
parameters:
  t: 0.5
base_model: google/gemma-4-31B-it
dtype: bfloat16
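With t: 0.5, slerp places each merged tensor halfway along the arc between the original instruct weights and the SFT checkpoint. A minimal per-tensor sketch of the interpolation, not mergekit's actual implementation (which also handles degenerate angles, embeddings, and dtype details):

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_dir = a_flat / (a_flat.norm() + eps)
    b_dir = b_flat / (b_flat.norm() + eps)
    omega = torch.arccos(torch.clamp(a_dir @ b_dir, -1.0, 1.0))  # angle between the two tensors
    if omega < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return ((1 - t) * a_flat + t * b_flat).reshape(a.shape).to(a.dtype)
    so = torch.sin(omega)
    merged = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape).to(a.dtype)
```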
Axolotl config:
base_model: google/gemma-4-31B-it
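# Memory/throughput plugins: Cut Cross-Entropy for the loss and Liger fused kernels.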
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
  - axolotl.integrations.liger.LigerPlugin
liger_layer_norm: true
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_rms_norm_gated: true
strict: false
cut_cross_entropy: true
datasets:
  - path: zerofata/pretok
val_set_size: 0.02
output_dir: ./G4-31B-SFT-v3-1
sequence_len: 10756
pad_to_sequence_len: true
sample_packing: true
load_in_4bit: false
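# LoRA adapter on the full-precision weights: rank 64, alpha 64, rank-stabilized scaling, no dropout.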
adapter: lora
lora_r: 64
lora_alpha: 64
peft_use_rslora: true
lora_dropout: 0.0
freeze_mm_modules: true
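# Adapt only the language model's attention and MLP projections (q/k/v/o, up/down/gate);
# the optional _checkpoint_wrapped_module segment matches activation-checkpoint-wrapped layers.
# The multimodal (vision) modules stay frozen via freeze_mm_modules above.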
lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'
wandb_project: G4-31B-SFT
wandb_name: G4-31B-SFT-v3-1
gradient_accumulation_steps: 1
micro_batch_size: 4
num_epochs: 2
optimizer: adamw_torch_fused
lr_scheduler: constant_with_warmup
learning_rate: 1e-5
max_grad_norm: 1.0
bf16: auto
tf32: true
logging_steps: 1
# FA2 not supported
sdp_attention: true
#flex_attention: true
#torch_compile: true
flash_attention: false
warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 2
weight_decay: 0.05
special_tokens:
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Gemma4TextDecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true
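With this config saved to a YAML file (the filename here is just a placeholder), training is launched through the Axolotl CLI, e.g. `axolotl train g4-31b-sft.yaml` on recent releases or `accelerate launch -m axolotl.cli.train g4-31b-sft.yaml` on older ones.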