Avg. Total Time
18.27s
Avg. TTFT
4.96s
Avg. Prefill TPS
1194.45
Avg. Gen TPS
35.73
Context Size
262144
Quantization
r64
Engine
vllm
Creation Method
LoRA Finetune
Model Type
Qwen35
Chat Template
Qwen3.5
Reasoning
Yes
Vision
Yes
Parameters
27B
Added At
5/2/2026
base_model: Qwen/Qwen3.5-27B tags:
An improved 3-way weight-space merge of Qwen3.5-27B reasoning-distilled fine-tunes using the Omnimerge v2 method — combining four recent advances in model merging.
GGUF quantizations available at ManniX-ITA/Qwen3.5-27B-Omnimerge-v2-GGUF
| Benchmark | Omnimerge v1 | Omnimerge v2 | Delta |
|---|---|---|---|
| GPQA Diamond (198q, flex) | 61.11% | 69.19% | +8.08 pp |
| MBPP pass@1 | 71.80% | 74.60% | +2.80 pp |
| HumanEval pass@1 | 79.88% | 79.27% | -0.61 pp |
| Benchmark | Claude-distill | Omnimerge v2 | Delta |
|---|---|---|---|
| GPQA Diamond (198q, flex) | 53.03% | 69.19% | +16.16 pp |
| MBPP pass@1 | 71.20% | 74.60% | +3.40 pp |
| HumanEval pass@1 | 76.22% | 79.27% | +3.05 pp |
Four enhancements over standard DARE-TIES (v1):
OBIM-lite magnitude masking (based on OBIM, arXiv 2502.12217): Deterministic top-k masking by |delta| magnitude instead of random Bernoulli drop. Keeps the most informative parameter changes.
DAREx rescaling (based on DAREx, arXiv 2410.09344, ICLR 2025): Survivors divided by configurable q instead of density. Lower variance than standard DARE rescaling.
EMR election (based on EMR-Merging, arXiv 2405.17461, NeurIPS 2024): Sign from weighted-sum consensus, amplitude from max abs across sources. Each parameter gets the strongest signal from whichever source specialized most.
The merge script also supports GPU-accelerated computation (chunks offloaded to CUDA for ~35x speedup over CPU-only).
Not yet implemented (available in the script for future iterations):
python dare_ties_merge.py \
--base Qwen/Qwen3.5-27B \
--source Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
--source ValiantLabs/Qwen3.5-27B-Esper3.1 \
--source Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill \
--method omnimerge_v2 --density 0.53 --weights 0.40,0.35,0.25 \
--darex-q 0.75 --seed 42
| Source | Weight | Focus |
|---|---|---|
| Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 0.40 | Claude 4.6 Opus reasoning distillation |
| ValiantLabs/Qwen3.5-27B-Esper3.1 | 0.35 | Code / DevOps specialist |
| Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill | 0.25 | Gemini 3.1 Pro reasoning distillation |
Base: Qwen/Qwen3.5-27B
llama-server -m Qwen3.5-27B-Omnimerge-v2-Q6_K.gguf -c 32768 -ngl 99 \
--reasoning-format deepseek --reasoning-budget 16384 \
--temp 0.6 --top-p 0.95 --top-k 20 --dry-multiplier 0.5
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
"ManniX-ITA/Qwen3.5-27B-Omnimerge-v2",
torch_dtype=torch.bfloat16,
device_map="auto",
)
tok = AutoTokenizer.from_pretrained("ManniX-ITA/Qwen3.5-27B-Omnimerge-v2")
| Model | Description |
|---|---|
| Qwen3.5-27B-Omnimerge | v1 (DARE-TIES baseline) |
| Qwen3.5-27B-Omnimerge-GGUF | v1 GGUF quants |
| Qwen3.5-27B-Omnimerge-v2-GGUF | v2 GGUF quants |
Apache-2.0