| Metric | Value |
| --- | --- |
| Avg. Total Time | 60.44s |
| Avg. TTFT | 21.94s |
| Avg. Prefill TPS | 184.25 |
| Avg. Gen TPS | 15.46 |
| Context Size | 32768 |
| Quantization | r64 |
| Engine | aphrodite |
| Creation Method | Merge |
| Model Type | Llama70B |
| Chat Template | Llama 3 |
| Reasoning | No |
| Vision | No |
| Parameters | 70B |
| Added At | 1/18/2025 |
Like with everything, I have to start somewhere, right? As such, this model is named Ginny.

L3.1-70b-Ginny is a TIES merge of four models made with LazyMergekit (config below).

Using Hermes as a base, I mixed in Glitz and Euryale, both of which I liked. (I think I actually prefer Glitz.) Additionally, I decided to throw in cyberagent's Japanese Instruct in the hopes it would boost the model's Japanese capabilities.
(Though on recommendations from others, I've steeled myself to never use Hermes as a base ever again.)

Hermes likes ChatML, but the other three models use L3 Instruct.
...Use L3 Instruct.
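For reference, this is roughly what a Llama 3 Instruct prompt looks like (a minimal sketch using Meta's published Llama 3 chat-format tokens; in practice the tokenizer's `apply_chat_template` builds this for you, as in the usage snippet below):

```python
def l3_instruct_prompt(system: str, user: str) -> str:
    # Hand-rolled Llama 3 Instruct layout: each turn is wrapped in
    # header tokens and terminated with <|eot_id|>. The trailing
    # assistant header cues the model to start generating.
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(l3_instruct_prompt("You are a helpful assistant.", "Hello!"))
```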
```yaml
models:
  - model: NousResearch/Hermes-3-Llama-3.1-70B
    parameters:
      density: 0.33
      weight: 0.25
  - model: Fizzarolli/L3.1-70b-glitz-v0.2
    parameters:
      density: 0.7
      weight: 0.5
  - model: cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
    parameters:
      density: 0.5
      weight: 0.25
  - model: Sao10K/L3-70B-Euryale-v2.1
    parameters:
      density: 0.7
      weight: 0.5
merge_method: ties
base_model: NousResearch/Hermes-3-Llama-3.1-70B
parameters:
  normalize: true
dtype: bfloat16
```
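Note that the per-model weights above (0.25, 0.5, 0.25, 0.5) sum to 1.5, not 1; `normalize: true` tells the merge to rescale them to sum to 1 before combining. A minimal sketch of that rescaling (weights taken from the config; this illustrates the arithmetic, not mergekit's exact internals):

```python
# Merge weights as written in the YAML config above.
weights = {
    "Hermes-3-Llama-3.1-70B": 0.25,
    "L3.1-70b-glitz-v0.2": 0.5,
    "Llama-3.1-70B-Japanese-Instruct-2407": 0.25,
    "L3-70B-Euryale-v2.1": 0.5,
}

total = sum(weights.values())  # 1.5
# With normalize: true, each weight is divided by the total so the
# effective contributions sum to 1 (1/6, 1/3, 1/6, 1/3 here).
normalized = {name: w / total for name, w in weights.items()}
print(normalized)
```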
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "KaraKaraWitch/L3.1-70b-Ginny"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build a Llama 3 Instruct prompt from the chat messages.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```