Llama-3.3+(3.1v3.3)-70B-Brinebreath

Creative Model


Performance Metrics

  • Avg. Total Time: 26.31s
  • Avg. TTFT: 60.79s
  • Avg. Prefill TPS: 189.90
  • Avg. Gen TPS: 18.16

Model Information

  • Context Size: 32768
  • Quantization: r64
  • Engine: aphrodite
  • Creation Method: Merge
  • Model Type: Llama70B
  • Chat Template: Llama 3
  • Reasoning: No
  • Vision: No
  • Parameters: 70B
  • Added At: 12/22/2024
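Since the listing reports the aphrodite engine with a 32768-token context and the Llama 3 chat template, the deployment can presumably be queried through an OpenAI-compatible endpoint, which Aphrodite exposes. A minimal sketch follows; the base URL, API key, and model ID are placeholders, not values taken from this page.

```python
# Minimal sketch of querying the deployment, assuming an OpenAI-compatible
# endpoint as exposed by the Aphrodite engine. Base URL, API key, and model ID
# are placeholders; substitute the actual values for the service.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:2242/v1",  # placeholder endpoint
    api_key="not-needed-for-local",       # placeholder key
)

response = client.chat.completions.create(
    model="Llama-3.3+(3.1v3.3)-70B-Brinebreath",  # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a helpful creative writing assistant."},
        {"role": "user", "content": "Sketch an opening paragraph for a sea-faring story."},
    ],
    max_tokens=512,
    temperature=0.9,
)
print(response.choices[0].message.content)
```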


Brinebreath-Llama-3.1-70B

I made this after running into some problems with Cathallama. It has behaved well over a few days of testing.

Notable Performance

  • 7 percentage point increase in overall MMLU-PRO success rate over LLaMA 3.1 70B at Q4_0
  • Strong performance in MMLU-PRO categories overall
  • Great performance during manual testing

Creation workflow

Models merged

  • meta-llama/Meta-Llama-3.1-70B-Instruct
  • NousResearch/Hermes-3-Llama-3.1-70B
  • abacusai/Dracarys-Llama-3.1-70B-Instruct
  • VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct
flowchart TD
    A[Hermes 3] -->|Merge with| B[Meta-Llama-3.1]
    C[Dracarys] -->|Merge with| D[Meta-Llama-3.1]
    B -->| | E[Merge]
    D -->| | E[Merge]
    G[SauerkrautLM] -->|Merge with| E[Merge]
    E[Merge] -->| | F[Brinebreath]
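
The card does not say which tool or merge method was used, only that the model is a merge of the four parents above. As a rough, hypothetical illustration, the first pairwise step of the diagram (Hermes 3 merged with Meta-Llama-3.1) could be expressed as a mergekit config and run as follows; the method (linear), weights, and dtype are assumptions, not the author's actual recipe.

```python
# Hypothetical reproduction of ONE step of the merge tree with mergekit.
# The actual tool, merge method, weights, and dtype used for Brinebreath are
# not documented in this card; everything below is assumed for illustration.
import subprocess
from pathlib import Path

config = """\
# Hermes 3 merged with Meta-Llama-3.1 (assumed equal-weight linear merge)
merge_method: linear
models:
  - model: meta-llama/Meta-Llama-3.1-70B-Instruct
    parameters:
      weight: 0.5
  - model: NousResearch/Hermes-3-Llama-3.1-70B
    parameters:
      weight: 0.5
dtype: bfloat16
"""

Path("hermes_step.yaml").write_text(config)

# mergekit provides the `mergekit-yaml` CLI. The remaining steps of the tree
# (Dracarys + base, merging the two intermediates, then folding in SauerkrautLM)
# would each be another config run the same way on the previous output.
subprocess.run(["mergekit-yaml", "hermes_step.yaml", "./hermes-step-merge"], check=True)
```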


Testing

Hyperparameters

  • Temperature: 0.0 for automated, 0.9 for manual
  • Penalize repeat sequence: 1.05
  • Consider N tokens for penalize: 256
  • Penalize repetition of newlines
  • Top-K sampling: 40
  • Top-P sampling: 0.95
  • Min-P sampling: 0.05
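
These tests were run with llama.cpp directly; purely for illustration, the same sampling settings could be applied through the llama-cpp-python bindings roughly as below. The prompt is a placeholder, and the flash_attn flag assumes a reasonably recent version of the bindings.

```python
# Illustrative only: the card ran llama.cpp directly; this sketch applies the
# same sampling settings through the llama-cpp-python bindings instead.
from llama_cpp import Llama

llm = Llama(
    model_path="Brinebreath-Llama-3.1-70B.Q4_0.gguf",
    n_gpu_layers=-1,          # offload all layers, mirroring `-ngl -1`
    flash_attn=True,          # mirroring `-fa`
    use_mmap=False,           # mirroring `--no-mmap`
    last_n_tokens_size=256,   # tokens considered for the repeat penalty
)

out = llm.create_completion(
    "Write a short poem about the sea.",  # placeholder prompt
    temperature=0.9,          # 0.0 was used for automated tests, 0.9 for manual
    repeat_penalty=1.05,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    max_tokens=256,
)
print(out["choices"][0]["text"])
```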

llama.cpp Version

  • b3600-1-g2339a0be
  • -fa -ngl -1 -ctk f16 --no-mmap

Tested Files

  • Brinebreath-Llama-3.1-70B.Q4_0.gguf
  • Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

Manual testing

| Category | Test Case | Brinebreath-Llama-3.1-70B.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
|---|---|---|---|
| Common Sense | Ball on cup | OK | OK |
| | Big duck small horse | OK | OK |
| | Killers | OK | OK |
| | Strawberry r's | KO | KO |
| | 9.11 or 9.9 bigger | KO | KO |
| | Dragon or lens | KO | KO |
| | Shirts | OK | KO |
| | Sisters | OK | KO |
| | Jane faster | OK | OK |
| Programming | JSON | OK | OK |
| | Python snake game | OK | KO |
| Math | Door window combination | OK | KO |
| Smoke | Poem | OK | OK |
| | Story | OK | OK |

Note: See sample_generations.txt in the main folder of the repo for the raw generations.

MMLU-PRO

| Model | Success % |
|---|---|
| Brinebreath-3.1-70B.Q4_0.gguf | 49.0% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |

| MMLU-PRO category | Brinebreath-3.1-70B.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
|---|---|---|
| Business | 45.0% | 40.0% |
| Law | 40.0% | 35.0% |
| Psychology | 85.0% | 80.0% |
| Biology | 80.0% | 75.0% |
| Chemistry | 50.0% | 45.0% |
| History | 65.0% | 60.0% |
| Other | 55.0% | 50.0% |
| Health | 70.0% | 65.0% |
| Economics | 80.0% | 75.0% |
| Math | 35.0% | 30.0% |
| Physics | 45.0% | 40.0% |
| Computer Science | 60.0% | 55.0% |
| Philosophy | 50.0% | 45.0% |
| Engineering | 45.0% | 40.0% |

Note: Overall MMLU-PRO was tested with 100 questions; each category was tested with 20 questions, so per-category scores move in 5-point steps.

PubmedQA

| Model Name | Success % |
|---|---|
| Brinebreath-3.1-70B.Q4_0.gguf | 71.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 68.00% |

Note: PubmedQA tested with 100 questions.

Request

If you are hiring in the EU or can sponsor a visa, PM me :D

PS. Thank you mradermacher for the GGUFs!

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 36.29 |
| IFEval (0-Shot) | 55.33 |
| BBH (3-Shot) | 55.46 |
| MATH Lvl 5 (4-Shot) | 29.98 |
| GPQA (0-shot) | 12.86 |
| MuSR (0-shot) | 17.49 |
| MMLU-PRO (5-shot) | 46.62 |
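
The reported Avg. is consistent with the arithmetic mean of the six benchmark scores above; a quick sanity check using the values from the table:

```python
# Quick check that the reported Avg. matches the mean of the six scores above.
scores = {
    "IFEval (0-Shot)": 55.33,
    "BBH (3-Shot)": 55.46,
    "MATH Lvl 5 (4-Shot)": 29.98,
    "GPQA (0-shot)": 12.86,
    "MuSR (0-shot)": 17.49,
    "MMLU-PRO (5-shot)": 46.62,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 36.29, matching the Avg. row
```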