Avg. Total Time: 12.64s
Avg. TTFT: 8.40s
Avg. Prefill TPS: 338.91
Avg. Gen TPS: 13.21
Context Size: 32768
Quantization: r64
Engine: aphrodite
Creation Method: Unknown
Model Type: Llama70B
Chat Template: Llama 3
Reasoning: No
Vision: No
Parameters: 70B
Added At: 12/22/2024

I applied the last step of my continuous finetuning method to the Nemotron-70b model from Nvidia. More details below:
Quants: (Coming Soon)
Open-LLM-Leaderboard scores: (Coming Soon)