maxtext.models.qwen3_custom module
Custom Qwen3 model decoder layer.
- class maxtext.models.qwen3_custom.Qwen3CustomAttention(*args, **kwargs)
Bases: Attention
Custom grouped-query attention (GQA) that supports sub-dimensional output.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
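The class is documented only as "GQA attention that supports sub-dimensional output", so the NumPy sketch below is illustrative, not the MaxText implementation. It assumes that "sub-dimensional output" means the output projection `wo` maps the attention heads to a width `d_out` that can be smaller than the input width `d_model`. The function name `gqa_attention` and all shapes are hypothetical. It shows the core GQA idea: a small number of key/value heads, each shared by a group of consecutive query heads.

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, wo, num_kv_heads):
    """Hypothetical causal grouped-query attention with a possibly narrower output.

    x:      [seq, d_model]
    wq:     [d_model, num_q_heads, head_dim]
    wk, wv: [d_model, num_kv_heads, head_dim]
    wo:     [num_q_heads, head_dim, d_out]  # d_out < d_model: assumed "sub-dimensional" case
    """
    q = np.einsum("sd,dnh->snh", x, wq)
    k = np.einsum("sd,dkh->skh", x, wk)
    v = np.einsum("sd,dkh->skh", x, wv)
    num_q_heads, head_dim = q.shape[1], q.shape[2]
    group = num_q_heads // num_kv_heads
    # Share each KV head across `group` consecutive query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("snh,tnh->nst", q, k) / np.sqrt(head_dim)
    causal = np.tri(x.shape[0], dtype=bool)  # query s may attend keys t <= s
    scores = np.where(causal, scores, -1e30)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)  # softmax over keys
    ctx = np.einsum("nst,tnh->snh", probs, v)  # [seq, num_q_heads, head_dim]
    return np.einsum("snh,nho->so", ctx, wo)   # [seq, d_out]

rng = np.random.default_rng(0)
seq, d_model, n_q, n_kv, head_dim, d_out = 5, 16, 4, 2, 8, 6
x = rng.standard_normal((seq, d_model))
wq = rng.standard_normal((d_model, n_q, head_dim))
wk = rng.standard_normal((d_model, n_kv, head_dim))
wv = rng.standard_normal((d_model, n_kv, head_dim))
wo = rng.standard_normal((n_q, head_dim, d_out))
out = gqa_attention(x, wq, wk, wv, wo, num_kv_heads=n_kv)
print(out.shape)  # (5, 6)
```

With four query heads and two KV heads, the key/value projections are half the size of full multi-head attention. The output width is set only by the last dimension of `wo`.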
- class maxtext.models.qwen3_custom.Qwen3CustomMoeDecoderLayer(*args, **kwargs)
Bases: AttentionWithNorm
Qwen3 Transformer decoder layer (Custom MoE).
- Parameters:
args (Any)
kwargs (Any)
- Return type:
Any
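The page does not say how the "Custom MoE" feed-forward differs from a standard mixture-of-experts block. The sketch below therefore shows only a generic, dense reference for top-k routing with SwiGLU experts. Three details are assumptions, not documented behavior of this class: softmax routing, renormalizing the selected top-k weights, and the SwiGLU expert shape. The function name `moe_feed_forward` and all weight names are hypothetical. In a decoder layer of this kind, the block would typically run on the post-attention-normalized hidden states, and its output would be added back to the residual stream.

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def moe_feed_forward(h, router_w, w_gate, w_up, w_down, top_k):
    """Hypothetical dense reference for a top-k mixture-of-experts SwiGLU MLP.

    h:            [seq, d_model]  normalized hidden states
    router_w:     [d_model, num_experts]
    w_gate, w_up: [num_experts, d_model, d_ff]
    w_down:       [num_experts, d_ff, d_model]
    """
    logits = h @ router_w
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax over experts
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]      # [seq, top_k], highest first
    top_w = np.take_along_axis(probs, top_idx, axis=-1)
    top_w /= top_w.sum(axis=-1, keepdims=True)            # renormalize selected weights (assumption)
    out = np.zeros_like(h)
    for s in range(h.shape[0]):                           # loop per token for clarity, not speed
        for j in range(top_k):
            e = top_idx[s, j]
            act = silu(h[s] @ w_gate[e]) * (h[s] @ w_up[e])
            out[s] += top_w[s, j] * (act @ w_down[e])
    return out, top_idx

rng = np.random.default_rng(0)
seq, d_model, d_ff, n_exp = 3, 8, 16, 4
h = rng.standard_normal((seq, d_model))
router_w = rng.standard_normal((d_model, n_exp))
w_gate = rng.standard_normal((n_exp, d_model, d_ff))
w_up = rng.standard_normal((n_exp, d_model, d_ff))
w_down = rng.standard_normal((n_exp, d_ff, d_model))
out, top_idx = moe_feed_forward(h, router_w, w_gate, w_up, w_down, top_k=2)
print(out.shape, top_idx.shape)  # (3, 8) (3, 2)
```

Production implementations dispatch tokens to experts in batches, rather than looping per token. The loop here only makes the routing arithmetic easy to follow.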
- apply_attention_with_norm(inputs, decoder_segment_ids, decoder_positions, deterministic, model_mode, kv_cache=None, attention_metadata=None)
Applies self-attention with pre- and post-layer normalization.
- Parameters:
inputs (Array)
decoder_segment_ids (None | Array)
decoder_positions (None | Array)
deterministic (bool)
model_mode (str)
kv_cache (None | Array)
attention_metadata (None | dict[str, Any])
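The method's description names only the ordering: normalize, attend, normalize. The NumPy sketch below spells that ordering out under several assumptions. It uses RMSNorm for both norms and adds a residual connection before the second norm. It treats `decoder_segment_ids` as a packing mask that stops tokens from attending across sequence boundaries. It returns both the post-norm hidden states and the pre-norm residual sum. That return layout is an assumption, since the page lists no return type. The names `apply_attention_with_norm_sketch`, `masked_self_attention`, `pre_scale` and `post_scale` are hypothetical. `decoder_positions` (e.g. for rotary embeddings), `kv_cache`, `deterministic`, `model_mode` and `attention_metadata` are not modeled.

```python
import numpy as np

def rms_norm(x, scale, eps=1e-6):
    # RMSNorm over the last (feature) axis.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * scale

def segment_causal_mask(segment_ids):
    # True where query i may attend key j: same packed segment and j <= i.
    same_segment = segment_ids[:, None] == segment_ids[None, :]
    return same_segment & np.tri(len(segment_ids), dtype=bool)

def masked_self_attention(x, mask):
    # Hypothetical single-head attention with identity projections, for illustration only.
    scores = np.where(mask, x @ x.T / np.sqrt(x.shape[-1]), -1e30)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ x

def apply_attention_with_norm_sketch(inputs, segment_ids, attn_fn, pre_scale, post_scale):
    """Pre-attention norm -> self-attention -> residual add -> post-attention norm."""
    lnx = rms_norm(inputs, pre_scale)
    attn_out = attn_fn(lnx, segment_causal_mask(segment_ids))
    intermediate = inputs + attn_out                 # residual connection
    hidden = rms_norm(intermediate, post_scale)      # normalized input for the next sublayer
    return hidden, intermediate                      # tuple layout is an assumption

rng = np.random.default_rng(0)
seq, d_model = 4, 8
inputs = rng.standard_normal((seq, d_model))
segment_ids = np.array([0, 0, 1, 1])  # two packed sequences in one row
ones = np.ones(d_model)
hidden, intermediate = apply_attention_with_norm_sketch(
    inputs, segment_ids, masked_self_attention, ones, ones)
print(hidden.shape, intermediate.shape)  # (4, 8) (4, 8)
```

Returning the un-normalized `intermediate` alongside `hidden` lets a caller add the feed-forward output back to the residual stream without recomputing it.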