| Metric | Value |
|---|---|
| Avg. Total Time | 23.06s |
| Avg. TTFT | 18.51s |
| Avg. Prefill TPS | 428.57 |
| Avg. Gen TPS | 23.28 |
| Context Size | 262144 |
| Quantization | r64 |
| Engine | vllm |
| Creation Method | LoRA |
| Model Type | Qwen35 |
| Chat Template | Qwen3.5 |
| Reasoning | Yes |
| Vision | Yes |
| Parameters | 27B |
| Added At | 4/27/2026 |
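As a rough sanity check on the benchmark figures above, the prompt and generation lengths they imply can be back-calculated. This is a sketch under one assumption not stated in the card: that TTFT is dominated by prefill time, so prompt length ≈ prefill TPS × TTFT.

```python
# Back-of-envelope from the benchmark table above.
# Assumption: TTFT is almost entirely prefill, so
# prompt length ~ prefill TPS * TTFT.
avg_total_s = 23.06
avg_ttft_s = 18.51
prefill_tps = 428.57
gen_tps = 23.28

prompt_tokens = prefill_tps * avg_ttft_s     # implied prompt length
decode_s = avg_total_s - avg_ttft_s          # time spent decoding
generated_tokens = gen_tps * decode_s        # implied output length

print(round(prompt_tokens), round(decode_s, 2), round(generated_tokens))
# → 7933 4.55 106
```

So the averages are consistent with roughly 8k-token prompts producing ~100-token responses.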
---
library_name: peft
model_name: qwen35-27b-derestricted-full-marvin-seed
tags:
---
This model was fine-tuned from Qwen3.5-27B-Derestricted using supervised fine-tuning (SFT).
| Training parameter | Value |
|---|---|
| Learning rate | 1e-05 |
| LR scheduler | constant_with_warmup |
| Per-device batch size | 1 |
| Gradient accumulation | 4 |
| Effective batch size | 4 |
| Epochs | 1 |
| Max sequence length | 6144 |
| Optimizer | paged_adamw_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Loss type | nll |
| Chunked cross-entropy | yes |
| LoRA parameter | Value |
|---|---|
| Rank (r) | 32 |
| Alpha | 16 |
| Target modules | down_proj, gate_proj, in_proj_a, in_proj_b, in_proj_qkv, in_proj_z, k_proj, o_proj, out_proj, q_proj, up_proj, v_proj |
| rsLoRA | yes |
| Quantization | 4-bit (nf4) |
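With rsLoRA enabled, the adapter update is scaled by alpha/√r instead of the classic alpha/r, which keeps the update magnitude stable at higher ranks. A minimal sketch of what that means for the r=32, alpha=16 values in the table:

```python
import math

# Scaling factor applied to the LoRA update BA, using the values above.
r, alpha = 32, 16

standard_scale = alpha / r            # classic LoRA scaling
rslora_scale = alpha / math.sqrt(r)   # rank-stabilized LoRA scaling

print(round(standard_scale, 2), round(rslora_scale, 2))
# → 0.5 2.83
```

In other words, at this rank rsLoRA applies a noticeably larger effective scale (~2.83) than standard LoRA (0.5) for the same alpha.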
| Dataset | Samples | Total tokens | Trainable tokens |
|---|---|---|---|
| /home/aibox/data/marvin-style-bible/train_full_marvin_seed_mix.jsonl (json) | 5,974 | 25,473,483 | 25,473,483 |
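The dataset size and the batch settings above determine the step count for the single epoch. A quick sketch, assuming one optimizer step per effective batch of samples (no sequence packing) as in a standard HF Trainer setup:

```python
import math

# Derived from the dataset table and hyperparameters above.
samples = 5_974
per_device_batch = 1
grad_accum = 4
epochs = 1
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum        # 4
steps = math.ceil(samples / effective_batch) * epochs  # optimizer steps
warmup_steps = math.ceil(steps * warmup_ratio)         # LR warmup steps

print(steps, warmup_steps)
# → 1494 45
```

So the constant_with_warmup schedule ramps the learning rate over roughly the first 45 of ~1,500 steps, then holds it at 1e-05.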
```yaml
model_name_or_path: Qwen3.5-27B-Derestricted
output_dir: runs/qwen35-27b-derestricted-full-marvin-seed
attn_implementation: flash_attention_2
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
model_parallel: true
max_memory:
  0: 18GiB
  1: 18GiB
chunked_mlp: true
chunked_mlp_chunks: 8
max_length: 6144
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
use_peft: true
load_in_4bit: true
bnb_4bit_quant_type: nf4
lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
use_rslora: true
lora_target_modules:
  - in_proj_qkv
  - in_proj_z
  - in_proj_a
  - in_proj_b
  - out_proj
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
data_config: configs/qwen35-27b-derestricted-full-marvin-seed/data.yaml
prepared_dataset: runs/qwen35-27b-derestricted-full-marvin-seed/prepared
learning_rate: 1.0e-05
lr_scheduler_type: constant_with_warmup
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
optim: paged_adamw_8bit
num_train_epochs: 1
logging_steps: 1
disable_tqdm: false
save_strategy: epoch
save_total_limit: 1
report_to: none
run_name: qwen35-27b-derestricted-full-marvin-seed
datasets:
  - path: json
    data_files: /home/aibox/data/marvin-style-bible/train_full_marvin_seed_mix.jsonl
    split: train
```
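The `load_in_4bit` and `max_memory` settings above can be sanity-checked with rough arithmetic. A sketch under two assumptions: nf4 weights take about 0.5 bytes per parameter, and the LoRA adapter, activations, and optimizer state (paged to CPU by paged_adamw_8bit) are ignored.

```python
# Rough VRAM estimate for a 27B model loaded in nf4 across two GPUs.
# Assumptions: ~0.5 bytes/param for 4-bit weights; adapter, activations,
# and optimizer state excluded.
params = 27e9
bytes_per_param_nf4 = 0.5

weight_gib = params * bytes_per_param_nf4 / 2**30  # quantized weight footprint
budget_gib = 2 * 18                                # two GPUs, 18 GiB each

print(round(weight_gib, 1), budget_gib)
# → 12.6 36
```

The ~12.6 GiB of quantized weights split comfortably across the 36 GiB budget, which is why `model_parallel` with per-GPU `max_memory: 18GiB` plus gradient checkpointing and chunked MLP/CCE suffices to train at a 6144-token sequence length.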