Qwen3.5-27B-Omnimerge-v2-Derestricted


Model Information

| Field | Value |
| --- | --- |
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA Finetune |
| Model Type | Qwen35 |
| Chat Template | Qwen3.5 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 27B |
| Added At | 5/2/2026 |


---
base_model: Qwen/Qwen3.5-27B
tags:
  - merge
  - omnimerge-v2
  - qwen3.5
  - reasoning
  - obim
  - darex
  - emr
license: apache-2.0
---

Qwen3.5-27B-Omnimerge-v2

An improved 3-way weight-space merge of Qwen3.5-27B reasoning-distilled fine-tunes using the Omnimerge v2 method, which combines three recent advances in model merging (with a fourth, Fisher weighting, planned).

GGUF quantizations available at ManniX-ITA/Qwen3.5-27B-Omnimerge-v2-GGUF

Benchmark Results (Q6_K)

| Benchmark | Omnimerge v1 | Omnimerge v2 | Delta |
| --- | --- | --- | --- |
| GPQA Diamond (198q, flex) | 61.11% | 69.19% | +8.08 pp |
| MBPP pass@1 | 71.80% | 74.60% | +2.80 pp |
| HumanEval pass@1 | 79.88% | 79.27% | -0.61 pp |

vs Best Source Model (Claude-distill)

| Benchmark | Claude-distill | Omnimerge v2 | Delta |
| --- | --- | --- | --- |
| GPQA Diamond (198q, flex) | 53.03% | 69.19% | +16.16 pp |
| MBPP pass@1 | 71.20% | 74.60% | +3.40 pp |
| HumanEval pass@1 | 76.22% | 79.27% | +3.05 pp |

Method: Omnimerge v2

Three enhancements over standard DARE-TIES (v1):

  1. OBIM-lite magnitude masking (based on OBIM, arXiv 2502.12217): Deterministic top-k masking by |delta| magnitude instead of random Bernoulli drop. Keeps the most informative parameter changes.

  2. DAREx rescaling (based on DAREx, arXiv 2410.09344, ICLR 2025): Surviving deltas are divided by a configurable q rather than by the density, which gives lower variance than standard DARE rescaling.

  3. EMR election (based on EMR-Merging, arXiv 2405.17461, NeurIPS 2024): Sign from weighted-sum consensus, amplitude from max abs across sources. Each parameter gets the strongest signal from whichever source specialized most.
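Taken together, the three steps can be sketched on a single parameter tensor. This is an illustrative NumPy toy, not the actual merge script's API: the function names are hypothetical, and `density` and `q` are modeled on the `--density` / `--darex-q` flags shown in the merge configuration.

```python
import numpy as np

def obim_topk_mask(delta, density):
    """OBIM-lite: deterministically keep the top-`density` fraction of
    entries by |delta|; everything else is zeroed (no random Bernoulli drop)."""
    k = max(1, int(round(density * delta.size)))
    thresh = np.partition(np.abs(delta).ravel(), -k)[-k]  # k-th largest magnitude
    return np.where(np.abs(delta) >= thresh, delta, 0.0)

def darex_rescale(delta, q):
    """DAREx: divide survivors by a configurable q instead of the density."""
    return delta / q

def emr_elect(deltas, weights):
    """EMR: sign from the weighted-sum consensus, amplitude from the
    max |delta| across sources, elementwise."""
    consensus = sum(w * d for w, d in zip(weights, deltas))
    return np.sign(consensus) * np.max(np.abs(np.stack(deltas)), axis=0)

def omnimerge_v2(base, sources, weights, density=0.53, q=0.75):
    """Mask each source's delta, rescale survivors, then elect per-parameter."""
    deltas = [darex_rescale(obim_topk_mask(s - base, density), q)
              for s in sources]
    return base + emr_elect(deltas, weights)
```

The real script applies this per checkpoint tensor across three sources; the toy shows why the elected amplitude can exceed any single weighted contribution.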

The merge script also supports GPU-accelerated computation (chunks offloaded to CUDA for ~35x speedup over CPU-only).
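The chunked-offload pattern behind that speedup can be sketched like this, assuming PyTorch; the function and parameter names are hypothetical and the real script's chunking granularity may differ:

```python
import torch

def merge_chunked(base_tensors, source_tensors, merge_fn,
                  device="cuda" if torch.cuda.is_available() else "cpu"):
    """Process the checkpoint tensor-by-tensor: move each chunk to the
    accelerator, run the merge math there, return the result to CPU memory.
    Peak GPU memory stays at one tensor per source instead of a full model."""
    merged = {}
    for name, base in base_tensors.items():
        base_dev = base.to(device)
        srcs_dev = [src[name].to(device) for src in source_tensors]
        merged[name] = merge_fn(base_dev, srcs_dev).to("cpu")
    return merged
```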

Not yet implemented (available in the script for future iterations):

  • Fisher weighting (based on Fisher-Merging, Matena & Raffel 2022): Per-parameter adaptive weighting using diagonal Fisher information. Requires a calibration pre-computation step per source model. Currently uses fixed source weights.
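As a rough sketch of what that pre-computation would produce and consume, here is a NumPy toy for one parameter tensor; the names are illustrative and the per-example gradients are assumed to come from a calibration pass over each source model:

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    """Diagonal Fisher estimate for one parameter tensor: the mean of
    squared per-example log-likelihood gradients over the calibration set."""
    return np.mean(np.stack(per_example_grads) ** 2, axis=0)

def fisher_weighted_merge(params, fishers, eps=1e-8):
    """Per-parameter adaptive average: each source's value is weighted by
    its Fisher information instead of a fixed scalar source weight."""
    num = sum(f * p for f, p in zip(fishers, params))
    return num / (sum(fishers) + eps)
```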

Merge Configuration

```bash
python dare_ties_merge.py \
    --base Qwen/Qwen3.5-27B \
    --source Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
    --source ValiantLabs/Qwen3.5-27B-Esper3.1 \
    --source Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill \
    --method omnimerge_v2 --density 0.53 --weights 0.40,0.35,0.25 \
    --darex-q 0.75 --seed 42
```

Source Models

| Source | Weight | Focus |
| --- | --- | --- |
| Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 0.40 | Claude 4.6 Opus reasoning distillation |
| ValiantLabs/Qwen3.5-27B-Esper3.1 | 0.35 | Code / DevOps specialist |
| Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill | 0.25 | Gemini 3.1 Pro reasoning distillation |

Base: Qwen/Qwen3.5-27B

Usage

llama.cpp (recommended)

```bash
llama-server -m Qwen3.5-27B-Omnimerge-v2-Q6_K.gguf -c 32768 -ngl 99 \
    --reasoning-format deepseek --reasoning-budget 16384 \
    --temp 0.6 --top-p 0.95 --top-k 20 --dry-multiplier 0.5
```

Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ManniX-ITA/Qwen3.5-27B-Omnimerge-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("ManniX-ITA/Qwen3.5-27B-Omnimerge-v2")

# Chat-formatted generation, using the sampler settings from the
# llama.cpp command above (temp 0.6, top-p 0.95, top-k 20).
messages = [{"role": "user", "content": "Explain model merging in one paragraph."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(
    inputs, max_new_tokens=1024, do_sample=True,
    temperature=0.6, top_p=0.95, top_k=20,
)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Related Models

| Model | Description |
| --- | --- |
| Qwen3.5-27B-Omnimerge | v1 (DARE-TIES baseline) |
| Qwen3.5-27B-Omnimerge-GGUF | v1 GGUF quants |
| Qwen3.5-27B-Omnimerge-v2-GGUF | v2 GGUF quants |

License

Apache-2.0