Qwen3.5-27B-Marvin-V2-Derestricted

Creative model


Performance Metrics

| Metric | Value |
|---|---|
| Avg. total time | 23.06 s |
| Avg. TTFT | 18.51 s |
| Avg. prefill TPS | 428.57 |
| Avg. generation TPS | 23.28 |
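From the averages above one can back out a rough mean generated length: decode time is total time minus TTFT, multiplied by generation throughput. A quick sketch (the token figure is an estimate derived from these averages, not a logged metric):

```python
# Rough decode-length estimate from the serving averages above.
avg_total_s = 23.06   # average end-to-end request time
avg_ttft_s = 18.51    # average time to first token
gen_tps = 23.28       # average generation throughput (tokens/s)

decode_s = avg_total_s - avg_ttft_s   # time spent generating
est_tokens = decode_s * gen_tps       # estimated mean generated tokens
print(f"{decode_s:.2f}s decoding, roughly {est_tokens:.0f} tokens per request")
```

The high TTFT relative to total time suggests most request latency is spent in prefill, which is consistent with the large 262144-token context window.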

Model Information

| Field | Value |
|---|---|
| Context size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation method | LoRA |
| Model type | Qwen35 |
| Chat template | Qwen3.5 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 27B |
| Added at | 4/27/2026 |


```yaml
library_name: peft
model_name: qwen35-27b-derestricted-full-marvin-seed
tags:
- base_model:adapter:ArliAI/Qwen3.5-27B-Derestricted
- lora
- sft
- transformers
- trl
licence: license
base_model: ArliAI/Qwen3.5-27B-Derestricted
pipeline_tag: text-generation
```

qwen35-27b-derestricted-full-marvin-seed

This model is a LoRA adapter trained with supervised fine-tuning (SFT) on ArliAI/Qwen3.5-27B-Derestricted.

Training procedure

Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 1e-05 |
| LR scheduler | constant_with_warmup |
| Per-device batch size | 1 |
| Gradient accumulation | 4 |
| Effective batch size | 4 |
| Epochs | 1 |
| Max sequence length | 6144 |
| Optimizer | paged_adamw_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Loss type | nll |
| Chunked cross-entropy | yes |
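The batch-size rows above are related: the effective batch is the per-device batch times the gradient-accumulation steps, and the warmup ratio translates into a concrete step count once the dataset size is known. A sketch of that arithmetic, using the 5,974 samples reported in the dataset statistics on this card (the exact step count depends on how the trainer batches and drops remainders, so treat it as approximate):

```python
import math

per_device_batch = 1     # per-device train batch size
grad_accum = 4           # gradient accumulation steps
samples = 5_974          # training samples, from the dataset statistics
epochs = 1
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum   # 1 * 4 = 4
steps = math.ceil(samples / effective_batch) * epochs
warmup_steps = int(steps * warmup_ratio)
print(effective_batch, steps, warmup_steps)
```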

LoRA configuration

| Parameter | Value |
|---|---|
| Rank (r) | 32 |
| Alpha | 16 |
| Target modules | down_proj, gate_proj, in_proj_a, in_proj_b, in_proj_qkv, in_proj_z, k_proj, o_proj, out_proj, q_proj, up_proj, v_proj |
| rsLoRA | yes |
| Quantization | 4-bit (nf4) |
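The rsLoRA row changes how the adapter update is scaled: standard LoRA multiplies the low-rank product by alpha/r, while rank-stabilized LoRA uses alpha/sqrt(r), which keeps the update magnitude from shrinking as rank grows. With the r=32, alpha=16 values above:

```python
import math

r, alpha = 32, 16
standard_scale = alpha / r            # classic LoRA scaling: 0.5
rslora_scale = alpha / math.sqrt(r)   # rank-stabilized scaling: ~2.83
print(standard_scale, rslora_scale)
```

At this rank, rsLoRA gives the adapter roughly 5.7x the effective scale of classic LoRA for the same alpha.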

Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
|---|---|---|---|
| json (/home/aibox/data/marvin-style-bible/train_full_marvin_seed_mix.jsonl) | 5,974 | 25,473,483 | 25,473,483 |
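Two things fall out of the numbers above: the mean sample is well under the 6144-token length cap, and trainable tokens equal total tokens, which suggests loss was computed on every token rather than on completions only. A quick check:

```python
samples = 5_974
total_tokens = 25_473_483
trainable_tokens = 25_473_483
max_length = 6_144

avg_tokens = total_tokens / samples       # mean sample length in tokens
print(round(avg_tokens))
assert avg_tokens < max_length            # average fits the length cap
assert trainable_tokens == total_tokens   # no prompt-token masking
```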
Training config

```yaml
model_name_or_path: Qwen3.5-27B-Derestricted
output_dir: runs/qwen35-27b-derestricted-full-marvin-seed
attn_implementation: flash_attention_2
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
model_parallel: true
max_memory:
  0: 18GiB
  1: 18GiB
chunked_mlp: true
chunked_mlp_chunks: 8
max_length: 6144
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
use_peft: true
load_in_4bit: true
bnb_4bit_quant_type: nf4
lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
use_rslora: true
lora_target_modules:
- in_proj_qkv
- in_proj_z
- in_proj_a
- in_proj_b
- out_proj
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
data_config: configs/qwen35-27b-derestricted-full-marvin-seed/data.yaml
prepared_dataset: runs/qwen35-27b-derestricted-full-marvin-seed/prepared
learning_rate: 1.0e-05
lr_scheduler_type: constant_with_warmup
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
optim: paged_adamw_8bit
num_train_epochs: 1
logging_steps: 1
disable_tqdm: false
save_strategy: epoch
save_total_limit: 1
report_to: none
run_name: qwen35-27b-derestricted-full-marvin-seed
```
Data config

```yaml
datasets:
- path: json
  data_files: /home/aibox/data/marvin-style-bible/train_full_marvin_seed_mix.jsonl
  split: train
```
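The training data is a single JSONL file, one record per line. A minimal sketch of reading such a file with the standard library, using a hypothetical chat-style record (the actual field layout of train_full_marvin_seed_mix.jsonl is not shown on this card):

```python
import io
import json

# One hypothetical record; the real schema may differ.
jsonl = io.StringIO(
    '{"messages": [{"role": "user", "content": "Hi"}, '
    '{"role": "assistant", "content": "Hello."}]}\n'
)

records = [json.loads(line) for line in jsonl if line.strip()]
print(len(records), records[0]["messages"][0]["role"])
```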

Framework versions

  • PEFT: 0.18.1
  • Loft: 0.1.0
  • Transformers: 5.2.0
  • PyTorch: 2.6.0+cu124
  • Datasets: 4.6.1
  • Tokenizers: 0.22.2