Qwen3.5-27B-Omnimerge-v2-Derestricted-Lite

Creative model

View on Hugging FaceBack to Models

Hourly Usage

Performance Metrics

Avg. Total Time

18.27s

Avg. TTFT

4.96s

Avg. Prefill TPS

1194.45

Avg. Gen TPS

35.73

Model Information

Context Size

262144

Quantization

r64

Engine

vllm

Creation Method

LoRA Finetune

Model Type

Qwen35

Chat Template

Qwen3.5

Reasoning

Yes

Vision

Yes

Parameters

27B

Added At

5/2/2026


base_model: Qwen/Qwen3.5-27B tags:

  • merge
  • omnimerge-v2
  • qwen3.5
  • reasoning
  • obim
  • darex
  • emr license: apache-2.0

Qwen3.5-27B-Omnimerge-v2

An improved 3-way weight-space merge of Qwen3.5-27B reasoning-distilled fine-tunes using the Omnimerge v2 method — combining four recent advances in model merging.

GGUF quantizations available at ManniX-ITA/Qwen3.5-27B-Omnimerge-v2-GGUF

Benchmark Results (Q6_K)

BenchmarkOmnimerge v1Omnimerge v2Delta
GPQA Diamond (198q, flex)61.11%69.19%+8.08 pp
MBPP pass@171.80%74.60%+2.80 pp
HumanEval pass@179.88%79.27%-0.61 pp

vs Best Source Model (Claude-distill)

BenchmarkClaude-distillOmnimerge v2Delta
GPQA Diamond (198q, flex)53.03%69.19%+16.16 pp
MBPP pass@171.20%74.60%+3.40 pp
HumanEval pass@176.22%79.27%+3.05 pp

Method: Omnimerge v2

Four enhancements over standard DARE-TIES (v1):

  1. OBIM-lite magnitude masking (based on OBIM, arXiv 2502.12217): Deterministic top-k masking by |delta| magnitude instead of random Bernoulli drop. Keeps the most informative parameter changes.

  2. DAREx rescaling (based on DAREx, arXiv 2410.09344, ICLR 2025): Survivors divided by configurable q instead of density. Lower variance than standard DARE rescaling.

  3. EMR election (based on EMR-Merging, arXiv 2405.17461, NeurIPS 2024): Sign from weighted-sum consensus, amplitude from max abs across sources. Each parameter gets the strongest signal from whichever source specialized most.

The merge script also supports GPU-accelerated computation (chunks offloaded to CUDA for ~35x speedup over CPU-only).

Not yet implemented (available in the script for future iterations):

  • Fisher weighting (based on Fisher-Merging, Matena & Raffel 2022): Per-parameter adaptive weighting using diagonal Fisher information. Requires a calibration pre-computation step per source model. Currently uses fixed source weights.

Merge Configuration

python dare_ties_merge.py \
    --base Qwen/Qwen3.5-27B \
    --source Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
    --source ValiantLabs/Qwen3.5-27B-Esper3.1 \
    --source Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill \
    --method omnimerge_v2 --density 0.53 --weights 0.40,0.35,0.25 \
    --darex-q 0.75 --seed 42

Source Models

SourceWeightFocus
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled0.40Claude 4.6 Opus reasoning distillation
ValiantLabs/Qwen3.5-27B-Esper3.10.35Code / DevOps specialist
Jackrong/Qwen3.5-27B-Gemini-3.1-Pro-Reasoning-Distill0.25Gemini 3.1 Pro reasoning distillation

Base: Qwen/Qwen3.5-27B

Usage

llama.cpp (recommended)

llama-server -m Qwen3.5-27B-Omnimerge-v2-Q6_K.gguf -c 32768 -ngl 99 \
    --reasoning-format deepseek --reasoning-budget 16384 \
    --temp 0.6 --top-p 0.95 --top-k 20 --dry-multiplier 0.5

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "ManniX-ITA/Qwen3.5-27B-Omnimerge-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("ManniX-ITA/Qwen3.5-27B-Omnimerge-v2")

Related Models

ModelDescription
Qwen3.5-27B-Omnimergev1 (DARE-TIES baseline)
Qwen3.5-27B-Omnimerge-GGUFv1 GGUF quants
Qwen3.5-27B-Omnimerge-v2-GGUFv2 GGUF quants

License

Apache-2.0

Qwen3.5-27B-Omnimerge-v2-Derestricted-Lite - Arli AI - Arli AI