Qwen3.5-27B-BlueStar-v2-Derestricted

Creative model

Performance Metrics

  • Avg. Total Time: 90.05s
  • Avg. TTFT: 18.53s
  • Avg. Prefill TPS: 1245.88
  • Avg. Gen TPS: 13.87

Model Information

  • Context Size: 262144
  • Quantization: r64
  • Engine: vllm
  • Creation Method: LoRA
  • Model Type: Qwen35
  • Chat Template: Qwen3.5
  • Reasoning: Yes
  • Vision: Yes
  • Parameters: 27B
  • Added At: 4/4/2026


---
license: mit
datasets:
  - zerofata/Instruct-Anime
  - zerofata/Gemini-3.1-Pro-SmallWiki
  - zerofata/Gemini-3.1-Pro-GLM5-Characters
  - zerofata/Roleplay-Anime-Characters
base_model:
  - Qwen/Qwen3.5-27B
---


BlueStar v2

Qwen3.5 27B
01 Overview

Designed for RP and writing tasks.

Feels like a good improvement on v1. This version aims to fix the repetition and improve intelligence while keeping the creativity.

Both non-thinking and thinking modes are supported. To use thinking, you must prefill <think>\n, as that is how the model was trained.
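As a sketch of what the prefill means in practice: the assistant turn is started for the model, with <think>\n already appended, before generation begins. The template below is a simplified ChatML-style stand-in, not the exact Qwen3.5 chat template.

```python
# Minimal sketch: appending the <think>\n prefill to the start of the
# assistant turn so the model continues from inside a thinking block.
# The <|im_start|>/<|im_end|> template here is a simplified assumption.

def build_prompt(messages, prefill_thinking=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if prefill_thinking:
        parts.append("<think>\n")  # the model was trained expecting this prefill
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hello!"}])
print(prompt.endswith("<think>\n"))  # True
```

Most frontends (SillyTavern included) expose this as an assistant-prefix or "start reply with" field; putting <think>\n there achieves the same thing.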

02 SillyTavern Settings
Recommended Roleplay Format

  • Actions: in plaintext
  • Dialogue: "in quotes"
  • Thoughts: *in asterisks*

Recommended Samplers

  • Temp: 0.8 - 1.0
  • MinP: 0.05 - 0.075
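For API users, the sampler ranges above translate into a request payload roughly like the following sketch for a vLLM OpenAI-compatible endpoint (min_p is a vLLM extension field; the model name and message content are placeholders):

```python
# Hypothetical request payload for a vLLM OpenAI-compatible server,
# using the recommended sampler ranges for this model.
payload = {
    "model": "Qwen3.5-27B-BlueStar-v2-Derestricted",  # placeholder name
    "messages": [{"role": "user", "content": "Write an opening scene."}],
    "temperature": 0.9,  # recommended range: 0.8 - 1.0
    "min_p": 0.05,       # recommended range: 0.05 - 0.075 (vLLM extension)
}
```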
03 Quantizations

  • GGUF
  • iMatrix
04 Creation Process

Creation Process: SFT

SFT on approximately 27 million tokens.

I've confirmed the repetition comes from the RP datasets, despite the extensive filtering, human editing, rewriting, and deduping. Compared to other types of data like chat and writing, RP is simply somewhat repetitive in nature. One idea is to drop the RP datasets, or use less of them. This does seem to *sort of* work, but the model performs noticeably worse at RP as a result, which makes sense, given that's the entire point of having RP data to begin with.

The current solution I'm testing is using custom loss masking with the RP datasets. Most common phrases of slop are masked out, so the model doesn't get rewarded for learning these patterns. Overused words within a conversation also get masked out in later turns.
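The masking idea can be sketched as follows. This is an illustrative reconstruction, not the actual training code: tokens belonging to a banned phrase get the standard ignore label -100, so the cross-entropy loss (and therefore the gradient) skips them during SFT.

```python
IGNORE_INDEX = -100  # label value ignored by HF/PyTorch cross-entropy loss

def mask_phrase_tokens(input_ids, labels, phrase_id_seqs):
    """Return a copy of labels with every occurrence of a banned
    phrase (given as token-id sequences) masked to IGNORE_INDEX."""
    masked = list(labels)
    for phrase in phrase_id_seqs:
        n = len(phrase)
        for i in range(len(input_ids) - n + 1):
            if input_ids[i:i + n] == list(phrase):
                masked[i:i + n] = [IGNORE_INDEX] * n
    return masked

# Toy example: token ids [7, 8] form a slop phrase, so their labels
# are masked while the surrounding tokens still contribute loss.
ids    = [5, 7, 8, 9]
labels = [5, 7, 8, 9]
print(mask_phrase_tokens(ids, labels, [[7, 8]]))  # [5, -100, -100, 9]
```

The per-conversation part (masking overused words in later turns) would work the same way, except the set of masked token sequences is built from word counts over the earlier turns rather than a fixed slop list.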

It... seems to have worked? In my testing, repetition stays greatly reduced even after a few hours of using the model. It can still latch onto phrases, but I've seen much less verbatim repetition.

Trained using Axolotl.

Axolotl Config
SFT (4×H200)
base_model: Qwen/Qwen3.5-27B
 
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false
 
datasets:
  - path: ./data/bluestar_v2_sft_3_all_rp_attempt_masked_20260318_075236.jsonl
 
val_set_size: 0.02
output_dir: ./Qwen3.5-27B-v2-SFT-5
 
sequence_len: 10756
sample_packing: true
 
load_in_8bit: true
adapter: lora
lora_r: 128
lora_alpha: 128
peft_use_rslora: true
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - down_proj
  - up_proj
  # Qwen3.5-specific linear attention projections, which use
  # separate in_proj_qkv / in_proj_z / out_proj modules:
  - linear_attn.in_proj_qkv
  - linear_attn.in_proj_z
  - linear_attn.out_proj
 
wandb_project: Qwen3.5-27B-SFT
wandb_name: Qwen3.5-27B-v2-SFT-5
 
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 1.2e-5
weight_decay: 0.01
warmup_ratio: 0.05
 
bf16: auto
tf32: true
 
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
 
evals_per_epoch: 4
saves_per_epoch: 4
special_tokens:
 
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Qwen3_5DecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true
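For reference, the effective batch size implied by the config above, assuming the "4×H200" header means four pure data-parallel ranks:

```python
micro_batch_size = 1             # from the config
gradient_accumulation_steps = 4  # from the config
num_gpus = 4                     # assumption: "SFT (4xH200)" = 4 data-parallel ranks

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 16
```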