Llama-3.3+(3.1v3.3)-70B-Ginny

Creative Model


Performance Metrics

  • Avg. Total Time: 60.44 s
  • Avg. TTFT (time to first token): 21.94 s
  • Avg. Prefill TPS: 184.25 tokens/s
  • Avg. Gen TPS: 15.46 tokens/s

Model Information

  • Context Size: 32768 tokens
  • Quantization: r64
  • Engine: aphrodite
  • Creation Method: Merge
  • Model Type: Llama70B
  • Chat Template: Llama 3
  • Reasoning: No
  • Vision: No
  • Parameters: 70B
  • Added At: 1/18/2025


base_model:
  - NousResearch/Hermes-3-Llama-3.1-70B
  - Fizzarolli/L3.1-70b-glitz-v0.2
  - cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
  - Sao10K/L3-70B-Euryale-v2.1
tags:
  - merge
  - mergekit
  - lazymergekit
  - NousResearch/Hermes-3-Llama-3.1-70B
  - Fizzarolli/L3.1-70b-glitz-v0.2
  - cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
  - Sao10K/L3-70B-Euryale-v2.1

L3.1-70b-Ginny

As with everything, I have to start somewhere, right? As such, this model is named Ginny.


L3.1-70b-Ginny is a merge of the following models using LazyMergekit:

  • NousResearch/Hermes-3-Llama-3.1-70B
  • Fizzarolli/L3.1-70b-glitz-v0.2
  • cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
  • Sao10K/L3-70B-Euryale-v2.1

Using Hermes as a base, I mixed in Glitz and Euryale, both of which I liked. Of the two, I think I actually prefer Glitz.

Additionally, I decided to throw in cyberagent's Japanese Instruct in the hope that it would boost Japanese capabilities.

(Though on recommendations from others, I've steeled myself to never use Hermes as base ever again.)

Model Testing

  • The Japanese Instruct was a success in my book. The model appears to be more capable in Japanese, and I think it can handle translations given the right prompt (see the sketch after this list).
  • The model can get quite rambly; a lower temperature helps.
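
To that end, here is a minimal translation sketch in the same style as the Usage section below. The system prompt, the sample sentence, and the sampling values (temperature 0.4) are illustrative assumptions, not tested settings:

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "KaraKaraWitch/L3.1-70b-Ginny"

# Illustrative translation prompt; the exact wording is an assumption.
messages = [
    {"role": "system", "content": "Translate the user's Japanese into natural English."},
    {"role": "user", "content": "吾輩は猫である。名前はまだ無い。"},
]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Lower temperature than the default example, to keep the output from rambling.
outputs = pipeline(prompt, max_new_tokens=128, do_sample=True, temperature=0.4, top_p=0.9)
print(outputs[0]["generated_text"])
```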

Quants & Hosts

  • GGUF
  • GGUF-i1
  • Featherless

Yap / Chat Format

Hermes likes ChatML, but the other three models use L3 Instruct.

...Use L3 Instruct. (A sketch of what that format looks like follows.)
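
For reference, a rendered Llama 3 Instruct turn looks roughly like this. In practice `tokenizer.apply_chat_template` (as in the Usage section) produces it for you; the `{...}` placeholders here are just illustrative:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```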

🧩 Configuration


```yaml
models:
  - model: NousResearch/Hermes-3-Llama-3.1-70B
    parameters:
      density: 0.33
      weight: 0.25
  - model: Fizzarolli/L3.1-70b-glitz-v0.2
    parameters:
      density: 0.7
      weight: 0.5
  - model: cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
    parameters:
      density: 0.5
      weight: 0.25
  - model: Sao10K/L3-70B-Euryale-v2.1
    parameters:
      density: 0.7
      weight: 0.5

merge_method: ties
base_model: NousResearch/Hermes-3-Llama-3.1-70B
parameters:
  normalize: true
dtype: bfloat16
```
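
If you'd rather reproduce the merge locally than through the LazyMergekit notebook, mergekit's command-line entry point can consume the config above directly. A minimal sketch, assuming the YAML is saved as config.yaml, that mergekit installs from PyPI (otherwise install from its GitHub repo), and that you have the disk and memory headroom a 70B TIES merge requires:

```python
# A sketch of reproducing the merge locally with mergekit's CLI
# (assumes the configuration above is saved as config.yaml).
!pip install -qU mergekit
!mergekit-yaml config.yaml ./L3.1-70b-Ginny
```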

💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "KaraKaraWitch/L3.1-70b-Ginny"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the messages with the model's chat template (Llama 3 Instruct).
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
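
A usage note: a 70B model in float16 needs on the order of 140 GB of accelerator memory, so on smaller setups consider the GGUF quants linked above or 4-bit loading via bitsandbytes. A minimal sketch of the latter, with illustrative (not tuned) settings; and per the testing notes, a temperature below 0.7 may help with rambling:

```python
# Requires the bitsandbytes package (pip install bitsandbytes).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "KaraKaraWitch/L3.1-70b-Ginny"

# Illustrative 4-bit quantization settings; tune for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```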