Llama-3.3+(3.1v3.3)-70B-Inori

Creative Model


Performance Metrics

  • Avg. Total Time: 35.37s
  • Avg. TTFT: 17.23s
  • Avg. Prefill TPS: 233.21
  • Avg. Gen TPS: 19.71

Model Information

  • Context Size: 32768
  • Quantization: r64
  • Engine: aphrodite
  • Creation Method: Merge
  • Model Type: Llama70B
  • Chat Template: Llama 3
  • Reasoning: No
  • Vision: No
  • Parameters: 70B
  • Added At: 1/18/2025


base_model:

  • abacusai/Dracarys-Llama-3.1-70B-Instruct
  • Sao10K/L3-70B-Euryale-v2.1
  • gbueno86/Cathallama-70B
  • sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1
  • nothingiisreal/L3.1-70B-Celeste-V0.1-BF16
  • Fizzarolli/L3.1-70b-glitz-v0.2
  • cyberagent/Llama-3.1-70B-Japanese-Instruct-2407

library_name: transformers

tags:

  • mergekit
  • merge
  • abacusai/Dracarys-Llama-3.1-70B-Instruct
  • Sao10K/L3-70B-Euryale-v2.1
  • gbueno86/Cathallama-70B
  • sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1
  • nothingiisreal/L3.1-70B-Celeste-V0.1-BF16
  • Fizzarolli/L3.1-70b-glitz-v0.2
  • cyberagent/Llama-3.1-70B-Japanese-Instruct-2407

KaraKaraWitch/L3.1-70b-Inori

Inori is the second 70B model I put together this weekend to play around with.

Learning from the previous model, I yeeted Hermes into the atmosphere and used Glitz as a base.

Inori takes a different approach by using Model Stock.

  • Dracarys (I just threw it in, but can be useful for code)
  • Euryale (You all know it!)
  • Cathallama (Athene + turboderp_cat)
  • New Dawn (I heard people like it so I added it in)
  • Celeste (RP)
  • Japanese-Instruct (Enhancement of Japanese Language for the weebs out there.)

No Hermes was harmed in the making of this model stock merge.
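
For anyone unfamiliar with Model Stock, here is a rough pure-Python sketch of the interpolation rule as described in the Model Stock paper, applied to a single flattened weight tensor. It is an illustration under stated assumptions, not mergekit's actual implementation: the function names (`cosine`, `model_stock_merge`) and the toy vectors are hypothetical, the base model is assumed to be kept out of the fine-tune list, and real merges apply this per tensor on full-size weights.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def model_stock_merge(base, finetuned):
    """Merge one flattened weight tensor from several fine-tunes of `base`.

    Assumes at least two fine-tunes, none of which is identical to the base
    (a zero task vector would make the cosine undefined).
    """
    n = len(finetuned)
    # Task vectors: how far each fine-tune moved away from the base weights.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]
    # Average pairwise cosine similarity between the task vectors.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    # Interpolation ratio from the paper: t = N*cos / (1 + (N-1)*cos).
    t = n * cos_theta / (1 + (n - 1) * cos_theta)
    # Plain average of the fine-tuned weights.
    avg = [sum(col) / n for col in zip(*finetuned)]
    # Blend the average with the base: t near 1 keeps the average,
    # t near 0 falls back to the base weights.
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t


# Toy example: two fine-tunes that moved in orthogonal directions.
merged, t = model_stock_merge([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(merged, t)
```

The takeaway is that the more the fine-tunes disagree about which direction to move (low cosine similarity), the more the merged weights get pulled back toward the base, which here is Glitz.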

L3.1-70b-Inori is a merge of the models in the configuration below, made with LazyMergekit.

General Thoughts

  • This model has some weird censorship issues. Sometimes it refuses to generate explicit text, and sometimes it doesn't.
  • For that reason, I don't recommend using this model.

Yap / Chat Format

L3 Instruct.
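
As a quick illustration of what "L3 Instruct" looks like on the wire, here is a minimal sketch that renders a message list in the Llama 3 Instruct layout by hand. The helper name `format_llama3_prompt` is hypothetical, and the tokenizer's own chat template (via `apply_chat_template`) is the authoritative source; it may differ in details such as a default system prompt.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 Instruct layout (hand-rolled sketch)."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n")
        parts.append(f"{msg['content'].strip()}<|eot_id|>")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a large language model?"},
])
print(prompt)
```

This is mostly handy for frontends that want a raw prompt string; when using transformers directly, letting the tokenizer build the prompt is the safer choice.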

Quants & Hosts

GGUF GGUF-i1 Featherless

🧩 Configuration


models:
  - model: Fizzarolli/L3.1-70b-glitz-v0.2
  - model: cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
  - model: Sao10K/L3-70B-Euryale-v2.1
  - model: nothingiisreal/L3.1-70B-Celeste-V0.1-BF16
  - model: sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1
  - model: gbueno86/Cathallama-70B
  - model: abacusai/Dracarys-Llama-3.1-70B-Instruct

merge_method: model_stock
base_model: Fizzarolli/L3.1-70b-glitz-v0.2
parameters:
  normalize: true
dtype: bfloat16
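
To reproduce the merge, the configuration above can be saved to a file and handed to mergekit's `mergekit-yaml` command. The sketch below only writes the file and builds the command; it does not run the merge. It assumes mergekit is installed, and the helper name `prepare_merge`, the file name `config.yaml`, and the `merged` output directory are arbitrary choices for illustration. A real run needs enough disk and memory for seven 70B checkpoints.

```python
import tempfile
from pathlib import Path

# The merge configuration from this card, verbatim.
CONFIG_YAML = """\
models:
  - model: Fizzarolli/L3.1-70b-glitz-v0.2
  - model: cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
  - model: Sao10K/L3-70B-Euryale-v2.1
  - model: nothingiisreal/L3.1-70B-Celeste-V0.1-BF16
  - model: sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1
  - model: gbueno86/Cathallama-70B
  - model: abacusai/Dracarys-Llama-3.1-70B-Instruct

merge_method: model_stock
base_model: Fizzarolli/L3.1-70b-glitz-v0.2
parameters:
  normalize: true
dtype: bfloat16
"""


def prepare_merge(workdir):
    """Write the config into `workdir` and return the merge command as a list."""
    workdir = Path(workdir)
    config_path = workdir / "config.yaml"   # arbitrary file name
    out_dir = workdir / "merged"            # arbitrary output directory
    config_path.write_text(CONFIG_YAML, encoding="utf-8")
    return ["mergekit-yaml", str(config_path), str(out_dir)]


command = prepare_merge(tempfile.mkdtemp())
print(" ".join(command))
# To actually run the merge (slow and resource-heavy):
#   import subprocess; subprocess.run(command, check=True)
```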

💻 Usage

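# Notebook cell: install dependencies (drop the leading "!" in a regular shell).
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "KaraKaraWitch/L3.1-70b-Inori"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the messages with the tokenizer's built-in chat template (L3 Instruct).
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and spread it across the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; the returned text includes the prompt by default.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])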