| Field | Value |
|---|---|
| Avg. Total Time | 16.63s |
| Avg. TTFT | 22.37s |
| Avg. Prefill TPS | 1.74 |
| Avg. Gen TPS | 18.59 |
| Context Size | 32768 |
| Quantization | r64 |
| Engine | aphrodite |
| Creation Method | Merge |
| Model Type | Llama70B |
| Chat Template | Llama 3 |
| Reasoning | No |
| Vision | No |
| Parameters | 70B |
| Added At | 12/22/2024 |
---
license: cc-by-nc-4.0
tags:
---

This is my first experiment in reverse-distillation: transferring the capabilities of a smaller model onto a larger one.
mistralai/Ministral-8B-Instruct-2410 exhibits some very novel RP behaviors that make it an interesting choice as an RP model, but at the end of the day it's still just an 8B model. This model is an early attempt at instilling its positive qualities into a larger and more capable one.
This model began as Llama-3.05-Nemotron-Tenyxchat-Storybreaker-70B.
I created a custom single-turn RP dataset for the model.
I started out with the infamous 'leaked undislop' dataset.
I used a script to format the conversations into single-turn SillyTavern style roleplaying prompts.
I used another script to run those prompts through Ministral.
Finally, using pattern matching, I removed much of the formatting from the original prompts in order to aid generalization.
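The actual scripts aren't published, so the following is only a hedged sketch of what the final pattern-matching pass might look like. The specific patterns here (asterisk-wrapped actions, inline backticks, runs of whitespace) are hypothetical, chosen to illustrate the idea of stripping source formatting so the model learns content rather than markup:

```python
import re

def strip_formatting(text: str) -> str:
    """Hypothetical cleanup pass; the card does not list the real patterns."""
    text = re.sub(r'\*{1,2}([^*]+)\*{1,2}', r'\1', text)  # unwrap *action* / **bold**
    text = re.sub(r'`([^`]+)`', r'\1', text)              # unwrap inline code spans
    text = re.sub(r'[ \t]+', ' ', text)                   # collapse runs of spaces/tabs
    return text.strip()
```

Running each single-turn prompt through a pass like this before training keeps the formatting idiosyncrasies of the source dataset from being memorized alongside the roleplay behavior.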
Using qlora-pipe, I ran a QLoRA fine-tune on Nemotron-Tenyxchat-Storybreaker with the following notable parameters:
The atypically high dropout rate was chosen after some unreleased experimentation inspired by the arXiv paper "Fine-tuning with Very Large Dropout" (Jianyu Zhang, Léon Bottou), which prescribes a very high dropout rate (0.9 in their case) as a method of improving out-of-distribution performance. Further discussion in various internet spaces regarding high-dropout training led to a recommendation of 0.6 as the ideal dropout rate for fitting during finetuning.
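qlora-pipe takes these settings from its config file rather than code, so as a purely language-agnostic illustration of what a 0.6 dropout rate does during training, here is a minimal inverted-dropout sketch in plain Python (not the actual training code):

```python
import random

def inverted_dropout(values, p=0.6, training=True):
    """Zero each activation with probability p and scale survivors by 1/(1-p)
    so the expected value is unchanged. At p=0.6 most units are dropped each
    step, which the cited paper argues pushes the network toward features
    that generalize out of distribution."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if random.random() < keep else 0.0 for v in values]
```

At inference time (`training=False`) the layer is a no-op, which is why a high training dropout costs nothing at generation time.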
The LoRA adapter was merged into the base model, and the adapted model was then SLERP-merged back onto the original at a 40/60 ratio in order to blend the new behavior with the old.
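Tools like mergekit handle this kind of merge in practice, but the math behind a 40/60 SLERP is simple enough to sketch directly. A minimal pure-Python version, operating on flattened weight vectors (illustrative only; which side of the 40/60 split gets the adapted model is the author's choice):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.
    t=0 returns v0, t=1 returns v1; intermediate t follows the arc
    between them rather than the straight line of a plain average."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_theta = max(-1.0, min(1.0, dot / (n0 * n1 + eps)))
    theta = math.acos(cos_theta)
    if theta < eps:  # nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# e.g. t=0.4 pulls 40% of the way from the original toward the adapted weights
```

Compared with a plain weighted average, SLERP preserves the magnitude relationship between the two weight sets, which is why it is a popular choice for blending model behaviors.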
The resulting model can be very 'sloppy' at higher temperatures due to the mixing of the different 'slop' tendencies of Llama-3 and Ministral.
The following comparison on a single-turn SillyTavern roleplay test is therefore presented for subjective judgment.
The comparison uses deterministic sampling to better illustrate the differences between the models.

I tried to make the prompt templates for Ministral and Llama-3 (for both versions of Storybreaker) as close as possible, but an exact match is not achievable due to the structural differences between Mistral and Llama-3 prompt formatting.
While the results come down entirely to subjective preference, I find the flow of action in the Ministral-infused model to be less of a short loop, as in the original model, and more of a continuously advancing sequence of actions, as in the Ministral model.
Surprisingly, I feel the Ministral-infused model also improves in both characterization and in following the flow of the original scenario. It is much less easily baited into NSFW output by the jailbreak built into the prompt template.
Overall, the model can be rather stingy with EOT tokens at higher temperatures and rather rigid at lower temperatures. Lowering the temperature definitely reduces the 'slop' overall.
What I was able to do with the training was greatly limited by the VRAM of my home setup. I feel the results could probably be improved with both a higher LoRA rank and a longer sequence length.
The original dataset had over 8000 entries, but about 25% of those had to be dropped during preprocessing because they did not fit within the allotted sequence length.
If I had unlimited VRAM, I would probably do a full finetune across a much broader variety of context lengths, as the dataset made for this experiment primarily simulated the model's response to the first human message in a conversation.
More training epochs could also potentially improve results; two epochs were chosen because that was what I could finish in a single day.
So far I've found it to be a fun model to roleplay with and definitely worth sharing, but I can't guarantee satisfactory results outside the scope of the training.