Qwen3.5-27B-earica-Derestricted

Creative model

Performance Metrics

  • Avg. Total Time: 25.12 s
  • Avg. TTFT: 12.57 s
  • Avg. Prefill TPS: 1000.29 tokens/s
  • Avg. Gen TPS: 28.72 tokens/s
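
These averages follow the usual serving-metric definitions (TTFT covers the prefill phase, generation TPS covers decode). The sketch below is a hypothetical helper, not part of this repository, showing how such numbers are typically derived from per-request timestamps and token counts.

```python
from dataclasses import dataclass

@dataclass
class RequestTiming:
    start: float          # request submitted (seconds, monotonic clock)
    first_token: float    # first output token received
    end: float            # last output token received
    prompt_tokens: int
    output_tokens: int

def serving_metrics(r: RequestTiming) -> dict:
    """Derive total time, TTFT, prefill TPS, and generation TPS for one request."""
    ttft = r.first_token - r.start          # time to first token (prefill phase)
    decode_time = r.end - r.first_token     # decode (generation) phase
    return {
        "total_time_s": r.end - r.start,
        "ttft_s": ttft,
        "prefill_tps": r.prompt_tokens / ttft if ttft > 0 else 0.0,
        "gen_tps": r.output_tokens / decode_time if decode_time > 0 else 0.0,
    }
```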

Model Information

  • Context Size: 262144 tokens
  • Quantization: r64
  • Engine: vllm
  • Creation Method: LoRA
  • Model Type: Qwen35
  • Chat Template: Qwen3.5
  • Reasoning: Yes
  • Vision: Yes
  • Parameters: 27B
  • Added At: 4/27/2026
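
The card lists vLLM as the serving engine with a 262144-token context window. A minimal offline-inference sketch under those assumptions follows; the Hub ID, tensor_parallel_size, and whether the full context fits in memory all depend on the actual deployment, and a multimodal (vision) workflow would need the corresponding multimodal inputs instead.

```python
from vllm import LLM, SamplingParams

# Assumed Hub ID from this card; adjust max_model_len and parallelism to your hardware.
llm = LLM(
    model="voidful/Qwen3.5-27B-earica",
    max_model_len=262144,
    tensor_parallel_size=2,
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Write a short story about a lighthouse keeper."], params)
print(outputs[0].outputs[0].text)
```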


---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3.5-27B
tags:
  - axolotl
  - generated_from_trainer
datasets:
  - voidful/earica_text_train_v2
model-index:
  - name: Qwen3.5-27B-earica
    results: []
---

Built with Axolotl

See axolotl config

axolotl version: 0.16.0.dev0

base_model: Qwen/Qwen3.5-27B
low_cpu_mem_usage: true

plugins:
  - axolotl.integrations.drift.DriftPlugin
  - axolotl.integrations.liger.LigerPlugin

liger_rms_norm: true
liger_glu_activation: true

drift_trainer: true
drift_rho: 0.999
drift_beta: 0.5
drift_tau: 1.0
drift_lambda: 1.0

datasets:
  - path: voidful/earica_text_train_v2
    type: chat_template
    field_messages: conversations
    split: train
    split_thinking: true

dataset_prepared_path: ./prepared_data/drift_27b


chat_template: qwen3_5

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: false

gradient_accumulation_steps: 2
micro_batch_size: 1
batch_flattening: false
group_by_length: true
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

bf16: true
gradient_checkpointing: true
flash_attention: true
dataloader_num_workers: 4
val_set_size: 0.05
save_strategy: epoch
output_dir: ./outputs/drift-27b

deepspeed: deepspeed_configs/zero2.json

hub_model_id: voidful/Qwen3.5-27B-earica
push_to_hub: true
hub_strategy: end

log_on_each_node: false
logging_steps: 1
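
For reference, the chat_template dataset type reads each row's turns from the field named by field_messages (here, conversations). A hypothetical record shape is sketched below; the actual per-message keys (role/content vs. from/value) and any system turns depend on voidful/earica_text_train_v2 itself and on any message-property mappings, so check the dataset before relying on this.

```python
# Hypothetical row shape for a chat_template-style dataset whose turns live
# under "conversations", as field_messages specifies in the config above.
example_row = {
    "conversations": [
        {"role": "user", "content": "Summarize the plot of Moby-Dick in one paragraph."},
        {"role": "assistant", "content": "Captain Ahab's obsessive hunt for the white whale..."},
    ]
}
```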

Qwen3.5-27B-earica

This model is a fine-tuned version of Qwen/Qwen3.5-27B on the voidful/earica_text_train_v2 dataset.
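
A minimal text-only usage sketch, assuming the checkpoint is published as voidful/Qwen3.5-27B-earica and loads through the standard AutoModelForCausalLM/AutoTokenizer path with the bundled Qwen3.5 chat template; a multimodal (vision) workflow would use the corresponding processor and model classes instead, and device placement should be adapted to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "voidful/Qwen3.5-27B-earica"  # assumed Hub ID from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Explain what a LoRA adapter is in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```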

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 160
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 320
  • total_eval_batch_size: 160
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 17
  • training_steps: 576
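
The derived batch sizes follow directly from the per-device settings: 1 micro-batch × 2 gradient-accumulation steps × 160 devices gives the 320-sequence train batch, and dropping the accumulation factor gives the 160-sequence eval batch, as this quick check illustrates.

```python
micro_batch_size = 1
gradient_accumulation_steps = 2
num_devices = 160

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation during evaluation

assert total_train_batch_size == 320
assert total_eval_batch_size == 160
```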

Training results

Framework versions

  • Transformers 5.3.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2