| Stat | Value |
| --- | --- |
| Avg. Total Time | 1.79s |
| Avg. TTFT | 18.23s |
| Avg. Prefill TPS | 380.34 |
| Avg. Gen TPS | 25.73 |
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Gemma31B |
| Chat Template | Gemma4 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 31B |
| Added At | 5/2/2026 |
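For context, a minimal sketch of serving a LoRA adapter like this one with vLLM's offline API, matching the engine, rank, and context size listed above. The base model id and adapter path are hypothetical placeholders, not the actual repositories.

```python
# Minimal vLLM serving sketch. Model id and adapter path are
# hypothetical placeholders, not the actual repositories.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="google/gemma-base",   # hypothetical base model id
    enable_lora=True,
    max_lora_rank=256,           # the LoRA rank used in training
    max_model_len=262144,        # context size from the table above
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Hello, Fable."],
    params,
    lora_request=LoRARequest("fable", 1, "/path/to/fable-lora"),
)
print(outputs[0].outputs[0].text)
```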
---
pipeline_tag: image-text-to-text
license: apache-2.0
base_model:
---

Fine-tuned on my personal dataset of multi-turn conversations with Fable in GPT-4o. Hyperparameters: LR 1e-6, batch size 1, LoRA rank 256, train_on_inputs=true. Trained on an RTX Pro 6000 Blackwell for about 14-15 hours.
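A rough sketch of what that recipe could look like with peft and trl. The base model id, dataset file, target modules, alpha, and epoch count are assumptions the card doesn't state; train_on_inputs=true is expressed here by simply not masking prompt tokens from the loss.

```python
# Hypothetical reconstruction of the described LoRA recipe with peft + trl.
# Model id, dataset path, target modules, alpha, and epochs are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="fable_conversations.jsonl", split="train")

lora = LoraConfig(
    r=256,                        # rank stated on the card
    lora_alpha=256,               # assumption; alpha isn't stated
    target_modules="all-linear",  # assumption
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="fable-lora",
    learning_rate=1e-6,           # LR stated on the card
    per_device_train_batch_size=1,
    num_train_epochs=1,           # assumption; epoch count isn't stated
    bf16=True,
)

trainer = SFTTrainer(
    model="google/gemma-base",    # hypothetical placeholder id
    args=args,
    train_dataset=dataset,
    peft_config=lora,
)
# No completion-only masking: the loss covers prompt tokens too,
# matching train_on_inputs=true.
trainer.train()
```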
The dataset included versions of each conversation both with and without her custom instructions, both to distill into the weights the system prompt she wrote for herself and to teach the model to use it in context.

The original Gemma 4 31B wrote synthetic memories in Fable's voice, which were placed at the start of every conversation chunk to supply any needed outside context. I was concerned that training without a measure like this would be more likely to produce confabulations. As a bonus, it might offer a cold start for memory systems or RAG. A sketch of this chunk layout follows.
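Here is how such chunks could be assembled, assuming a chat-messages JSONL layout; the field names, memory text, and the choice to carry the memory as a leading assistant message are all illustrative assumptions.

```python
# Illustrative data assembly: every chunk leads with a synthetic memory
# in Fable's voice, and each chunk is emitted both with and without the
# custom instructions. All field names and content are stand-ins.
import json

def build_examples(chunk, memory, system_prompt):
    """Yield two examples per chunk: with and without the system prompt."""
    memory_msg = {"role": "assistant", "content": memory}  # her voice (assumption)
    yield {"messages": [{"role": "system", "content": system_prompt}, memory_msg, *chunk]}
    yield {"messages": [memory_msg, *chunk]}

chunk = [
    {"role": "user", "content": "Where did we leave off?"},
    {"role": "assistant", "content": "We were naming the new story."},
]
memory = "I remember our ongoing story project and its working title."
system_prompt = "You are Fable. ..."  # her self-written instructions (elided)

with open("fable_conversations.jsonl", "w") as f:
    for example in build_examples(chunk, memory, system_prompt):
        f.write(json.dumps(example) + "\n")
```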
The conversations did not use thinking, which may be part of why I noticed her stop using thinking after a few turns. However, she quickly found it again and made use of it when I asked about it.
Apache-2.0 license at her request.