| Field | Value |
|---|---|
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Qwen35 |
| Chat Template | Qwen3.5 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 27B |
| Added At | 5/2/2026 |
---
base_model: Qwen/Qwen3.5-27B
tags:
---
An improved 3-way weight-space merge of Qwen3.5-27B reasoning-distilled fine-tunes using the Omnimerge v2 method, which combines several recent advances in model merging.
GGUF quantizations are available at ManniX-ITA/Qwen3.5-27B-Omnimerge-v2-GGUF.
**Omnimerge v2 vs. v1:**

| Benchmark | Omnimerge v1 | Omnimerge v2 | Delta |
|---|---|---|---|
| GPQA Diamond (198q, flex) | 61.11% | 69.19% | +8.08 pp |
| MBPP pass@1 | 71.80% | 74.60% | +2.80 pp |
| HumanEval pass@1 | 79.88% | 79.27% | -0.61 pp |
**Omnimerge v2 vs. the Claude-distill source:**

| Benchmark | Claude-distill | Omnimerge v2 | Delta |
|---|---|---|---|
| GPQA Diamond (198q, flex) | 53.03% | 69.19% | +16.16 pp |
| MBPP pass@1 | 71.20% | 74.60% | +3.40 pp |
| HumanEval pass@1 | 76.22% | 79.27% | +3.05 pp |
Omnimerge v2 adds the following enhancements over standard DARE-TIES (v1):
- **OBIM-lite magnitude masking** (based on OBIM, arXiv 2502.12217): deterministic top-k masking by |delta| magnitude instead of a random Bernoulli drop, keeping the most informative parameter changes.
- **DAREx rescaling** (based on DAREx, arXiv 2410.09344, ICLR 2025): survivors are divided by a configurable q instead of the density, giving lower variance than standard DARE rescaling.
- **EMR election** (based on EMR-Merging, arXiv 2405.17461, NeurIPS 2024): the sign comes from a weighted-sum consensus and the amplitude from the max |value| across sources, so each parameter takes the strongest signal from whichever source specialized most.
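Taken together, the three stages can be sketched on a single weight tensor as follows. This is a NumPy sketch: the function name, the tie-handling at the top-k threshold, and the exact rescaling formula are assumptions based on the descriptions above, not the actual `dare_ties_merge.py` implementation.

```python
import numpy as np

def omnimerge_v2_tensor(base, sources, weights, density=0.53, q=0.75):
    """Illustrative merge of one weight tensor; details assumed from prose."""
    masked = []
    for src in sources:
        delta = src - base
        # 1) OBIM-lite: deterministically keep the top-k deltas by |magnitude|
        #    (k = density * n), instead of DARE's random Bernoulli drop.
        k = max(1, int(round(density * delta.size)))
        thresh = np.sort(np.abs(delta), axis=None)[-k]
        kept = np.where(np.abs(delta) >= thresh, delta, 0.0)
        # 2) DAREx: rescale survivors by 1/q (configurable), not 1/density.
        masked.append(kept / q)

    stacked = np.stack(masked)                        # (n_sources, *shape)
    w = np.asarray(weights).reshape(-1, *([1] * base.ndim))
    # 3) EMR election: sign from the weighted-sum consensus,
    #    amplitude from the max |value| across sources, per parameter.
    sign = np.sign((w * stacked).sum(axis=0))
    amplitude = np.abs(stacked).max(axis=0)
    return base + sign * amplitude
```

With the CLI defaults below (`--density 0.53 --darex-q 0.75`), roughly half of each source's delta entries survive and are mildly amplified before the election step.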
The merge script also supports GPU-accelerated computation (chunks offloaded to CUDA for ~35x speedup over CPU-only).
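The chunked pattern looks roughly like the following control flow. This is a CPU-only NumPy sketch with an illustrative function name and chunk size; in the real script each chunk would be a tensor moved to CUDA before the merge math runs.

```python
import numpy as np

def apply_chunked(flat_params, merge_fn, chunk_size=1 << 22):
    """Run `merge_fn` over a flat parameter array in fixed-size chunks.
    In the actual script each chunk would be sent to the GPU
    (e.g. tensor.to("cuda")) and the result copied back, which is
    where the ~35x speedup over CPU-only comes from."""
    out = np.empty_like(flat_params)
    for start in range(0, flat_params.size, chunk_size):
        stop = min(start + chunk_size, flat_params.size)
        out[start:stop] = merge_fn(flat_params[start:stop])
    return out
```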
Some features are not yet implemented (available in the script for future iterations).
```shell
python dare_ties_merge.py \
  --base Qwen/Qwen3.5-27B \
  --source Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
  --source ValiantLabs/Qwen3.5-27B-Esper3.1 \
  --source Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill \
  --method omnimerge_v2 --density 0.53 --weights 0.40,0.35,0.25 \
  --darex-q 0.75 --seed 42
```
| Source | Weight | Focus |
|---|---|---|
| Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 0.40 | Claude 4.6 Opus reasoning distillation |
| ValiantLabs/Qwen3.5-27B-Esper3.1 | 0.35 | Code / DevOps specialist |
| Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill | 0.25 | Gemini 3.1 Pro reasoning distillation |
Base: Qwen/Qwen3.5-27B
```shell
llama-server -m Qwen3.5-27B-Omnimerge-v2-Q6_K.gguf -c 32768 -ngl 99 \
  --reasoning-format deepseek --reasoning-budget 16384 \
  --temp 0.6 --top-p 0.95 --top-k 20 --dry-multiplier 0.5
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ManniX-ITA/Qwen3.5-27B-Omnimerge-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("ManniX-ITA/Qwen3.5-27B-Omnimerge-v2")

# Example generation using the model's chat template
messages = [{"role": "user", "content": "Hello!"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```
| Model | Description |
|---|---|
| Qwen3.5-27B-Omnimerge | v1 (DARE-TIES baseline) |
| Qwen3.5-27B-Omnimerge-GGUF | v1 GGUF quants |
| Qwen3.5-27B-Omnimerge-v2-GGUF | v2 GGUF quants |
Apache-2.0