license: mit datasets:

zerofata/Instruct-Anime
zerofata/Gemini-3.1-Pro-SmallWiki
zerofata/Gemini-3.1-Pro-GLM5-Characters
zerofata/Roleplay-Anime-Characters
allura-org/gryphe-sonnet-3.5-charcards-names-added
allura-forge/doubao-seed2.0-distill-multiturn-expr-rp base_model:
Qwen/Qwen3.5-27B

BlueStar v3

Qwen3.5 27B

01 Overview

Designed for RP and writing tasks.

Dunno if it's better than v2 but I like it. Main difference is just the addition of some RP reasoning data from GLM5 & K2.5.

Non thinking and thinking are both supported. If you want to use thinking, it is required to prefill the <think>\n as that is how it was trained.

02 SillyTavern Settings

Recommended Roleplay Format

ActionsIn plaintext

Dialogue"In quotes"

Thoughts*In asterisks*

Recommended Samplers

Temp0.8 - 1.0

MinP0.05 - 0.075

Instruct

ChatML - Think

ChatML - NoThink

03 Quantizations

GGUF

iMatrix

04 Creation Process

Creation Process: SFT

SFT on approx 56 million tokens.

Same as v2 for the most part with one big difference. Chub dataset was replaced with another version that has reasoning that was trained on the last turn only. This explodes the dataset out to 56 million tokens, but means the multi-turn reasoning gets trained correctly.

Also added a subset of 200 Gryphe RP samples that were shown as having a high lexical difference from my current dataset.

Trained using Axolotl.

Axolotl Config

SFT (4×H200)

base_model: Qwen/Qwen3.5-27B
 
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false
 
datasets:
  - path: ./data/bluestar_v4_sft_2_masked_20260402_120553.jsonl
 
val_set_size: 0.03
output_dir: ./Qwen3.5-27B-v3-SFT-2
 
sequence_len: 10756
sample_packing: true
 
load_in_8bit: true
adapter: lora
lora_r: 128
lora_alpha: 128
peft_use_rslora: true
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - down_proj
  - up_proj
  # Uncomment below to also target the linear attention projections.
  # These use separate in_proj_qkv / in_proj_z / out_proj (Qwen3.5-specific).
  - linear_attn.in_proj_qkv
  - linear_attn.in_proj_z
  - linear_attn.out_proj
 
wandb_project: Qwen3.5-27B-SFT
wandb_name: Qwen3.5-27B-v3-SFT-2
 
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 1.2e-5
weight_decay: 0.01
warmup_ratio: 0.05
 
bf16: auto
tf32: true
 
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
 
evals_per_epoch: 4
saves_per_epoch: 4
special_tokens:
 
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Qwen3_5DecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true

Qwen3.5-27B-BlueStar-v3-Derestricted-Lite

Hourly Usage

Performance Metrics

Model Information

BlueStar v3