maxtext.models.qwen3_5 module
Qwen3.5 family of model decoder layers.
- class maxtext.models.qwen3_5.Qwen3_5GatedDeltaNet(*args, **kwargs)
Bases: Qwen3NextGatedDeltaNet
Qwen3.5 GatedDeltaNet layer, identical to the Qwen3-Next GatedDeltaNet layer.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
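The "identical to the parent" relationship described above can be pictured as a thin subclass that adds a new name and nothing else. The sketch below is only an illustration under that assumption: both `...Stub` classes and their placeholder arithmetic are hypothetical stand-ins, not the MaxText classes or their real computation.

```python
class Qwen3NextGatedDeltaNetStub:
    """Hypothetical stand-in for the parent layer (not the MaxText class)."""

    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size

    def __call__(self, x: list[float]) -> list[float]:
        # Placeholder arithmetic so the example runs; the real layer
        # performs a gated linear-attention computation instead.
        return [2.0 * v for v in x]


class Qwen3_5GatedDeltaNetStub(Qwen3NextGatedDeltaNetStub):
    """Thin subclass: a distinct name that inherits all behavior unchanged."""


parent = Qwen3NextGatedDeltaNetStub(hidden_size=4)
child = Qwen3_5GatedDeltaNetStub(hidden_size=4)

# Both objects produce the same output for the same input.
same_output = parent([1.0, 2.0]) == child([1.0, 2.0])
```

A subclass of this shape gives the Qwen3.5 family its own class names while leaving a single implementation to maintain.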
- class maxtext.models.qwen3_5.Qwen3_5FullAttention(*args, **kwargs)
Bases: Qwen3NextFullAttention
Qwen3.5 gated attention layer, identical to the Qwen3-Next full attention layer.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
- class maxtext.models.qwen3_5.Qwen3_5SparseMoEBlock(*args, **kwargs)
Bases: Qwen3NextSparseMoeBlock
Shares the same MoE code as Qwen3-Next.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
- class maxtext.models.qwen3_5.Qwen3_5ScannableBlock(*args, **kwargs)
Bases: Module
Scanned structure for the text-only architecture that explicitly invokes Qwen3_5 layers.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
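As a rough mental model of a scannable block, one block can be thought of as a fixed group of layers applied in order, with the whole block then repeated across the depth of the model. The pure-Python sketch below illustrates only that control flow. The class `ScannableBlockSketch`, the helper `run_scanned`, the toy layers, and the three-linear-plus-one-full grouping are all hypothetical. A real scanned implementation would typically use a scan primitive with separate weights per repetition rather than a Python loop with shared toy functions.

```python
from typing import Callable, Sequence

Layer = Callable[[float], float]


class ScannableBlockSketch:
    """Hypothetical block: applies a fixed group of layers in order."""

    def __init__(self, layers: Sequence[Layer]):
        self.layers = tuple(layers)

    def __call__(self, carry: float) -> float:
        # Explicitly invoke each layer of the group, threading the carry.
        for layer in self.layers:
            carry = layer(carry)
        return carry


def run_scanned(block: ScannableBlockSketch, carry: float, num_repeats: int) -> float:
    """Repeat the block; a plain loop stands in for a scan primitive."""
    for _ in range(num_repeats):
        carry = block(carry)
    return carry


def linear_layer(x: float) -> float:
    # Toy stand-in for a linear attention + MoE layer.
    return x + 1.0


def full_layer(x: float) -> float:
    # Toy stand-in for a standard attention + MoE layer.
    return x * 2.0


# Assumed grouping for illustration only: three linear layers, then one full layer.
block = ScannableBlockSketch([linear_layer, linear_layer, linear_layer, full_layer])
result = run_scanned(block, 0.0, num_repeats=2)
```

Grouping heterogeneous layers into one repeated unit is what lets a model with more than one layer type still be expressed as a uniform loop.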
- class maxtext.models.qwen3_5.Qwen3_5DecoderLayer(*args, **kwargs)
Bases: Module
This layer is a hybrid that can function as either:
1. A standard attention + MoE layer.
2. A linear attention + MoE layer.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
- config
The model configuration object.
- mesh
The device mesh for sharding.
- model_mode
The operational mode (e.g., ‘train’, ‘prefill’).
- layer_idx
The index of the current layer in the transformer stack.
- quant
Optional quantization configuration.
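This reference does not state how the hybrid decoder layer chooses between its two modes. One plausible scheme, shown here purely as an assumption, is to derive the mode from `layer_idx` so that every N-th layer uses standard attention and the rest use linear attention. The names `HybridConfig`, `full_attention_interval`, and `mixer_kind` are hypothetical and are not taken from the MaxText API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical config; the field name and default are assumptions."""

    full_attention_interval: int = 4


def mixer_kind(layer_idx: int, config: HybridConfig) -> str:
    """Return the attention type a layer would use under the assumed rule."""
    # Assumed rule: the last layer of every interval uses standard attention.
    is_full = (layer_idx + 1) % config.full_attention_interval == 0
    return "full_attention" if is_full else "linear_attention"


def describe_layer(layer_idx: int, config: HybridConfig) -> str:
    """Both modes pair the chosen attention type with an MoE block."""
    return f"{mixer_kind(layer_idx, config)} + MoE"


config = HybridConfig()
layout = [mixer_kind(i, config) for i in range(8)]
```

Under these assumptions, layers 3 and 7 of an eight-layer stack would be standard attention + MoE layers, and the remaining six would be linear attention + MoE layers.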