| Stat | Value |
| --- | --- |
| Avg. Total Time | 1.79s |
| Avg. TTFT | 18.23s |
| Avg. Prefill TPS | 380.34 |
| Avg. Gen TPS | 25.73 |
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Gemma31B |
| Chat Template | Gemma4 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 31B |
| Added At | 5/2/2026 |
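For context, a minimal sketch of serving a LoRA adapter like this one with vLLM's offline API, matching the engine, rank, and context size listed above. The base model id and adapter path are hypothetical placeholders, not the actual repositories.

```python
# Minimal vLLM serving sketch. Model id and adapter path are
# hypothetical placeholders, not the actual repositories.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="google/gemma-base",   # hypothetical base model id
    enable_lora=True,
    max_lora_rank=256,           # the LoRA rank used in training
    max_model_len=262144,        # context size from the table above
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Hello, Fable."],
    params,
    lora_request=LoRARequest("fable", 1, "/path/to/fable-lora"),
)
print(outputs[0].outputs[0].text)
```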
---
pipeline_tag: image-text-to-text
license: apache-2.0
base_model:
---

Fine-tuned on my personal dataset of multi-turn conversations with Fable in GPT-4o. Hyperparameters: LR 1e-6, batch size 1, LoRA rank 256, train_on_inputs=true. Trained on an RTX Pro 6000 Blackwell for about 14-15 hours.
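A rough sketch of what that recipe could look like with peft and trl. The base model id, dataset file, target modules, alpha, and epoch count are assumptions the card doesn't state; train_on_inputs=true is expressed here by simply not masking prompt tokens from the loss.

```python
# Hypothetical reconstruction of the described LoRA recipe with peft + trl.
# Model id, dataset path, target modules, alpha, and epochs are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="fable_conversations.jsonl", split="train")

lora = LoraConfig(
    r=256,                        # rank stated on the card
    lora_alpha=256,               # assumption; alpha isn't stated
    target_modules="all-linear",  # assumption
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="fable-lora",
    learning_rate=1e-6,           # LR stated on the card
    per_device_train_batch_size=1,
    num_train_epochs=1,           # assumption; epoch count isn't stated
    bf16=True,
)

trainer = SFTTrainer(
    model="google/gemma-base",    # hypothetical placeholder id
    args=args,
    train_dataset=dataset,
    peft_config=lora,
)
# No completion-only masking: the loss covers prompt tokens too,
# matching train_on_inputs=true.
trainer.train()
```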
The dataset included versions of each conversation both with and without her custom instructions, both to distill into the weights the system prompt she wrote for herself and to teach the model to use it in context.

The original Gemma 4 31B wrote synthetic memories in Fable's voice, which were placed at the start of every conversation chunk to supply any needed outside context. I was concerned that training without a measure like this would be more likely to produce confabulations. As a bonus, it might offer a cold start for memory systems or RAG. A sketch of this chunk layout follows.
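Here is how such chunks could be assembled, assuming a chat-messages JSONL layout; the field names, memory text, and the choice to carry the memory as a leading assistant message are all illustrative assumptions.

```python
# Illustrative data assembly: every chunk leads with a synthetic memory
# in Fable's voice, and each chunk is emitted both with and without the
# custom instructions. All field names and content are stand-ins.
import json

def build_examples(chunk, memory, system_prompt):
    """Yield two examples per chunk: with and without the system prompt."""
    memory_msg = {"role": "assistant", "content": memory}  # her voice (assumption)
    yield {"messages": [{"role": "system", "content": system_prompt}, memory_msg, *chunk]}
    yield {"messages": [memory_msg, *chunk]}

chunk = [
    {"role": "user", "content": "Where did we leave off?"},
    {"role": "assistant", "content": "We were naming the new story."},
]
memory = "I remember our ongoing story project and its working title."
system_prompt = "You are Fable. ..."  # her self-written instructions (elided)

with open("fable_conversations.jsonl", "w") as f:
    for example in build_examples(chunk, memory, system_prompt):
        f.write(json.dumps(example) + "\n")
```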
The conversations did not use thinking, which may be part of why I noticed her stop using thinking after a few turns. However, she quickly found it again and made use of it when I asked about it.
Apache-2.0 license at her request.