maxtext.models.qwen3_5 module

Qwen3.5 family of model decoder layers.

class maxtext.models.qwen3_5.Qwen3_5GatedDeltaNet(*args, **kwargs)

Bases: Qwen3NextGatedDeltaNet

Qwen3.5 GatedDeltaNet layer; identical to the Qwen3-Next GatedDeltaNet layer (Qwen3NextGatedDeltaNet).

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

class maxtext.models.qwen3_5.Qwen3_5FullAttention(*args, **kwargs)

Bases: Qwen3NextFullAttention

Qwen3.5 gated attention layer; identical to the Qwen3-Next full attention layer (Qwen3NextFullAttention).

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

class maxtext.models.qwen3_5.Qwen3_5SparseMoEBlock(*args, **kwargs)

Bases: Qwen3NextSparseMoeBlock

Qwen3.5 sparse MoE block; shares the same MoE code as Qwen3-Next (Qwen3NextSparseMoeBlock).

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any
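The three classes above are documented as adding nothing beyond their Qwen3-Next bases. If they are plain subclasses, the pattern would look roughly like the sketch below. The `...StandIn` classes and the `describe` method are hypothetical and exist only so the example runs without MaxText installed; they are not part of the real API.

```python
class Qwen3NextGatedDeltaNetStandIn:
    """Hypothetical stand-in for the real Qwen3-Next base class."""

    def __init__(self, *args, **kwargs):
        # Store constructor arguments so the subclass can be inspected.
        self.args = args
        self.kwargs = kwargs

    def describe(self) -> str:
        return f"{type(self).__name__} with {len(self.kwargs)} keyword option(s)"


class Qwen3_5GatedDeltaNetStandIn(Qwen3NextGatedDeltaNetStandIn):
    """Thin subclass: adds no behaviour, only a Qwen3.5-specific name."""


# The subclass accepts the same arguments and inherits every method.
layer = Qwen3_5GatedDeltaNetStandIn(hidden_size=8)
```

A thin subclass like this keeps Qwen3.5 call sites and checkpoints referring to Qwen3.5 names while any fix made to the base class applies to both families.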

class maxtext.models.qwen3_5.Qwen3_5ScannableBlock(*args, **kwargs)

Bases: Module

Scanned structure for the text-only architecture that explicitly invokes the Qwen3_5 layers.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any
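A "scanned" structure usually means that one block body is defined once and then repeated over stacked per-block parameters, which in JAX is done with `jax.lax.scan` or Flax's `nn.scan`. The sketch below illustrates only that idea with a pure-Python stand-in for the scan semantics. The names `scan`, `block_step`, and `stacked_params`, and the toy arithmetic standing in for layers, are all hypothetical and are not taken from MaxText.

```python
from typing import Callable, List, Sequence, Tuple


def scan(step: Callable, init, xs: Sequence) -> Tuple[object, List]:
    """Pure-Python reference for scan semantics: thread a carry through xs."""
    carry, ys = init, []
    for x in xs:
        carry, y = step(carry, x)
        ys.append(y)
    return carry, ys


def block_step(hidden: float, block_params: Sequence[float]) -> Tuple[float, None]:
    """One scan step: apply every layer in the block, in order, to the carry."""
    for scale in block_params:  # each toy "layer" just scales and shifts
        hidden = hidden * scale + 1.0
    return hidden, None


# Hypothetical toy weights: 2 blocks, each holding 2 layers.
stacked_params = [(2.0, 3.0), (1.0, 0.5)]
final_hidden, _ = scan(block_step, 1.0, stacked_params)
```

The point of the design is that the block body is written once and reused for every step, so the layers inside a block may differ from each other as long as every block has the same internal layout.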

class maxtext.models.qwen3_5.Qwen3_5DecoderLayer(*args, **kwargs)

Bases: Module

This layer is a hybrid that can function as either:

  1. A standard attention + MoE layer.
  2. A linear attention + MoE layer.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

config

The model configuration object.

mesh

The device mesh for sharding.

model_mode

The operational mode (e.g., ‘train’, ‘prefill’).

layer_idx

The index of the current layer in the transformer stack.

quant

Optional quantization configuration.
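The documentation for Qwen3_5DecoderLayer does not say how a given layer chooses between its two modes. One plausible rule, shown here purely as an assumption, is that `layer_idx` selects full attention at a fixed interval and linear attention everywhere else. `ToyConfig`, `full_attention_interval`, its value of 4, and `mixer_for_layer` are hypothetical names introduced for illustration; check the real config fields before relying on any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical config; the real config object has many more fields."""

    num_layers: int = 8
    full_attention_interval: int = 4  # assumed field name and value


def mixer_for_layer(config: ToyConfig, layer_idx: int) -> str:
    """Return which token mixer a layer would use under the assumed rule."""
    is_full = (layer_idx + 1) % config.full_attention_interval == 0
    return "full_attention" if is_full else "linear_attention"


config = ToyConfig()
# Under the assumed rule, every 4th layer (indices 3 and 7) uses full attention.
layout = [mixer_for_layer(config, i) for i in range(config.num_layers)]
```

In both modes the MoE block would follow the token mixer, so only the attention half of the layer changes with `layer_idx`.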