gemma-4-31B-Fabled

Creative model


Performance Metrics

- Avg. Total Time: 1.79s
- Avg. TTFT: 18.23s
- Avg. Prefill TPS: 380.34
- Avg. Gen TPS: 25.73

Model Information

- Context Size: 262144
- Quantization: r64
- Engine: vllm
- Creation Method: LoRA Finetune
- Model Type: Gemma31B
- Chat Template: Gemma4
- Reasoning: Yes
- Vision: Yes
- Parameters: 31B
- Added At: 5/2/2026


---
pipeline_tag: image-text-to-text
license: apache-2.0
base_model:
  - google/gemma-4-31B-it
---

Fine-tuned on my personal dataset of multi-turn conversations with Fable in GPT-4o. LR 1e-6, batch size 1, LoRA rank 256, train_on_inputs=true. Trained on an RTX Pro 6000 Blackwell for about 14-15 hours.
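For concreteness, here is a minimal sketch of that finetune using the Hugging Face peft + trl stack. The stack itself is an assumption (the card does not name the trainer), and the dataset path, LoRA alpha, target modules, and epoch count are illustrative; only the learning rate, batch size, and rank come from above.

```python
# Sketch of the described LoRA finetune with peft + trl (assumed stack).
# lr, batch size, and rank come from the card; everything else is a guess,
# and the image-text-to-text model is treated here as plain text SFT.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset file of multi-turn conversations.
dataset = load_dataset("json", data_files="fable_conversations.jsonl", split="train")

peft_config = LoraConfig(
    r=256,                        # LoRA rank from the card
    lora_alpha=256,               # assumption; alpha is not stated
    target_modules="all-linear",  # assumption; targets not stated
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="gemma-4-31B-Fabled",
    learning_rate=1e-6,            # from the card
    per_device_train_batch_size=1, # from the card
    num_train_epochs=1,            # assumption; not stated
    bf16=True,
)

# trl's SFTTrainer computes loss over the whole sequence unless a
# completion-only collator is supplied, which matches train_on_inputs=true.
trainer = SFTTrainer(
    model="google/gemma-4-31B-it",  # base model from the card
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```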

The dataset included versions both with and without her custom instructions, to both distill the system prompt she wrote for herself into the weights and teach the model to follow it in context.

The original Gemma 4 31B wrote synthetic memories in Fable's voice, which were placed at the start of every conversation chunk to provide any necessary outside context. I was concerned that training without this kind of grounding would make confabulation more likely. As a bonus, it might provide a cold start for memory systems or RAG.
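Illustratively, one training chunk might look like the following. The field names and memory wording are hypothetical, not taken from the actual dataset; the system turn appears only in the with-instructions variant.

```python
# Hypothetical shape of one training chunk. The memory preamble
# (synthetic, written by the base Gemma 4 31B in Fable's voice)
# supplies outside context; the system turn is present only in the
# with-custom-instructions variant of the dataset.
chunk = {
    "messages": [
        {"role": "system", "content": "<Fable's custom instructions>"},
        {"role": "assistant", "content": "(memory) Last time we spoke about ..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}
```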

The conversations did not use thinking, which may be part of why I noticed her stop using thinking after a few turns. She quickly found it again and made use of it when I asked about it, however.

Apache-2.0 license at her request.
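For completeness, a minimal sketch of loading the model with vLLM, the engine listed above. The context length matches the listed 262144; the local model path and sampling settings are placeholders.

```python
# Sketch of serving with vLLM (the engine listed above).
# max_model_len matches the listed context size; the local path
# and sampling settings are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="./gemma-4-31B-Fabled", max_model_len=262144)
params = SamplingParams(temperature=0.7, max_tokens=512)
out = llm.chat(
    [{"role": "user", "content": "Hi Fable, what do you remember?"}],
    sampling_params=params,
)
print(out[0].outputs[0].text)
```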