| Benchmark / metadata | Value |
|---|---|
| Avg. Total Time | 25.12s |
| Avg. TTFT | 12.57s |
| Avg. Prefill TPS | 1000.29 |
| Avg. Gen TPS | 28.72 |
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA |
| Model Type | Qwen3.5 |
| Chat Template | Qwen3.5 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 27B |
| Added At | 4/27/2026 |
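The serving metrics above can be cross-checked against each other. A minimal sketch, assuming (the page does not state this) that total time decomposes as TTFT plus decode time, that Gen TPS is generated tokens per second of decode time, and that Prefill TPS is prompt tokens per second of TTFT:

```python
# Sanity-check the relationship between the benchmark metrics above.
# Assumed (not stated on the page):
#   total_time ≈ ttft + generated_tokens / gen_tps
#   prompt_tokens ≈ ttft * prefill_tps
ttft = 12.57           # s, time to first token
total_time = 25.12     # s, average per request
prefill_tps = 1000.29  # prompt tokens processed per second
gen_tps = 28.72        # tokens generated per second

decode_time = total_time - ttft
approx_generated = decode_time * gen_tps  # implied generated tokens
approx_prompt = ttft * prefill_tps        # implied prompt tokens

print(f"decode time:       {decode_time:.2f}s")
print(f"~generated tokens: {approx_generated:.0f}")
print(f"~prompt tokens:    {approx_prompt:.0f}")
```

Under these assumptions the benchmark works out to roughly 12.5k-token prompts with ~360-token responses, which is consistent with the long TTFT relative to total time.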
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3.5-27B
tags:
---
axolotl version: `0.16.0.dev0`
```yaml
base_model: Qwen/Qwen3.5-27B
low_cpu_mem_usage: true
plugins:
- axolotl.integrations.drift.DriftPlugin
- axolotl.integrations.liger.LigerPlugin
liger_rms_norm: true
liger_glu_activation: true
drift_trainer: true
drift_rho: 0.999
drift_beta: 0.5
drift_tau: 1.0
drift_lambda: 1.0
datasets:
- path: voidful/earica_text_train_v2
  type: chat_template
  field_messages: conversations
  split: train
  split_thinking: true
dataset_prepared_path: ./prepared_data/drift_27b
chat_template: qwen3_5
sequence_len: 16384
sample_packing: true
pad_to_sequence_len: false
gradient_accumulation_steps: 2
micro_batch_size: 1
batch_flattening: false
group_by_length: true
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5
bf16: true
gradient_checkpointing: true
flash_attention: true
dataloader_num_workers: 4
val_set_size: 0.05
save_strategy: epoch
output_dir: ./outputs/drift-27b
deepspeed: deepspeed_configs/zero2.json
hub_model_id: voidful/Qwen3.5-27B-earica
push_to_hub: true
hub_strategy: end
log_on_each_node: false
logging_steps: 1
```
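A few back-of-the-envelope numbers implied by the config above, as a sketch. The GPU count is an assumption (the config does not state the world size); with `sample_packing` enabled, each packed sequence is close to `sequence_len` tokens, and the cosine formula ignores any warmup:

```python
import math

# Values copied from the config above.
micro_batch_size = 1
gradient_accumulation_steps = 2
sequence_len = 16384
peak_lr = 2e-5

# Hypothetical: the config does not state how many GPUs were used.
world_size = 8

# Effective batch per optimizer step, and approximate tokens per step
# (with sample_packing, each packed sequence is ~sequence_len tokens).
effective_batch = micro_batch_size * gradient_accumulation_steps * world_size
tokens_per_step = effective_batch * sequence_len
print(effective_batch, tokens_per_step)  # 16 262144

# Cosine schedule: learning rate at fraction p of training, no warmup.
def cosine_lr(p: float) -> float:
    return peak_lr * 0.5 * (1 + math.cos(math.pi * p))

print(f"{cosine_lr(0.5):.1e}")  # 1.0e-05 at the halfway point
```

So under the 8-GPU assumption, each optimizer step sees 16 packed sequences (~262k tokens), and the cosine schedule decays the learning rate from 2e-5 to half its peak by the midpoint of training.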
This model is a fine-tuned version of Qwen/Qwen3.5-27B on the voidful/earica_text_train_v2 dataset.
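The `type: chat_template` / `field_messages: conversations` settings in the config mean each training row carries a list of chat turns. A hypothetical sketch of that record shape — the example content and the `role`/`content` field names are assumptions, and the actual voidful/earica_text_train_v2 rows may use different keys:

```python
# Hypothetical record shape consumed by axolotl's chat_template dataset
# type with field_messages: conversations. The content is invented for
# illustration; the real dataset rows may differ.
record = {
    "conversations": [
        {"role": "user", "content": "What does TTFT measure?"},
        {"role": "assistant",
         "content": "Time from request start to the first generated token."},
    ]
}

# The trainer renders these turns through the qwen3_5 chat template;
# here we just inspect the turn roles.
roles = [turn["role"] for turn in record["conversations"]]
print(roles)  # ['user', 'assistant']
```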
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
The following hyperparameters were used during training: