maxtext.models.qwen3_custom module

Custom Qwen3 model decoder layer.

class maxtext.models.qwen3_custom.Qwen3CustomAttention(*args, **kwargs)

Bases: Attention

Custom grouped-query attention (GQA) that supports sub-dimensional output.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any
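To illustrate the GQA part of this class, here is a minimal NumPy sketch of grouped-query attention over a single sequence. It is not the MaxText implementation. The function name `gqa_attention`, the `[seq, heads, head_dim]` layout, and the default causal mask are assumptions chosen for illustration. The key idea is that several query heads share one key/value head, so the K/V heads are repeated to line up with the query heads.

```python
import numpy as np


def gqa_attention(q, k, v, causal=True):
    """Grouped-query attention over one sequence (illustrative sketch).

    q: [seq, num_q_heads, head_dim]
    k, v: [seq, num_kv_heads, head_dim]; num_q_heads must be a multiple
    of num_kv_heads.
    Returns: [seq, num_q_heads, head_dim]
    """
    seq, num_q_heads, head_dim = q.shape
    num_kv_heads = k.shape[1]
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group = num_q_heads // num_kv_heads
    # Each KV head serves `group` consecutive query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    # Scaled dot-product scores: [heads, q_pos, kv_pos].
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    if causal:
        # Mask out future key positions (kv_pos > q_pos).
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over key positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)
```

With 8 query heads and 2 KV heads, each KV head is read by 4 query heads. This shrinks the KV cache by a factor of 4 compared with full multi-head attention.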

init_out_w(output_dim)

Initializes the output projection.

Parameters:

output_dim (int)

Return type:

Module
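The following NumPy sketch shows what an output projection parameterized by `output_dim` computes. It assumes that "sub-dimensional output" means `output_dim` may be smaller than the model's embedding width. That reading, the helper names `init_out_kernel` and `project_out`, and the fan-in-scaled normal initializer are all hypothetical; none is taken from MaxText. The projection contracts the head and head-dimension axes of the attention output into a single feature axis of size `output_dim`.

```python
import numpy as np


def init_out_kernel(num_heads, head_dim, output_dim, seed=0):
    """Hypothetical stand-in for an output-projection initializer.

    Returns a kernel of shape [num_heads, head_dim, output_dim] drawn with a
    fan-in scaled normal init (an assumption, not MaxText's initializer).
    """
    rng = np.random.default_rng(seed)
    fan_in = num_heads * head_dim
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in),
                      size=(num_heads, head_dim, output_dim))


def project_out(attn_out, kernel):
    """Contract heads and head_dim: [seq, heads, head_dim] -> [seq, output_dim]."""
    return np.einsum("shd,hdo->so", attn_out, kernel)
```

For example, `project_out(attn, init_out_kernel(8, 4, 16))` maps a `[seq, 8, 4]` attention output to `[seq, 16]`. Using `output_dim=64` instead gives `[seq, 64]`.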

class maxtext.models.qwen3_custom.Qwen3CustomMoeDecoderLayer(*args, **kwargs)

Bases: AttentionWithNorm

Qwen3 Transformer decoder layer, custom mixture-of-experts (MoE) variant.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any
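For readers unfamiliar with MoE layers, the sketch below shows generic top-k expert routing in NumPy. It uses a softmax router, selects the top-k experts per token, and optionally renormalizes the selected weights, as in the published Qwen3-MoE configuration (`norm_topk_prob`). Whether this custom layer routes the same way is an assumption. For brevity, each expert is reduced to a single linear map rather than a gated MLP. The names `moe_forward`, `router_w`, and `expert_ws` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(x)
    return exp / exp.sum(axis=axis, keepdims=True)


def moe_forward(x, router_w, expert_ws, top_k=2, norm_topk_prob=True):
    """Top-k routed mixture of experts (illustrative sketch).

    x: [tokens, d]; router_w: [d, num_experts];
    expert_ws: list of num_experts matrices [d, d] (each expert is a single
    linear map here, a simplification of a real gated MLP expert).
    """
    probs = softmax(x @ router_w)                        # [tokens, E]
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # [tokens, k]
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    if norm_topk_prob:
        # Renormalize so each token's selected weights sum to 1.
        topk_w = topk_w / topk_w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(top_k):
            expert = topk_idx[t, j]
            out[t] += topk_w[t, j] * (x[t] @ expert_ws[expert])
    return out
```

A production implementation groups tokens by expert and runs batched matmuls instead of a Python loop. The loop is used here only for readability.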

apply_attention_with_norm(inputs, decoder_segment_ids, decoder_positions, deterministic, model_mode, kv_cache=None, attention_metadata=None)

Applies self-attention with pre- and post-layer normalization.

Parameters:
  • inputs (Array)

  • decoder_segment_ids (None | Array)

  • decoder_positions (None | Array)

  • deterministic (bool)

  • model_mode (str)

  • kv_cache (None | Array)

  • attention_metadata (None | dict[str, Any])
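As a rough illustration of the data flow in `apply_attention_with_norm`, here is a NumPy sketch of a pre-norm attention block. The input is normalized, passed through attention, and added back to the residual stream, and the sum is then normalized again for the following MLP/MoE sublayer. This reading of "post-layer normalization" is an assumption modeled on Llama-style decoders, as are the use of RMSNorm, the `(hidden, hidden_normed)` return pair, and the helper names. The sketch ignores `decoder_segment_ids`, `decoder_positions`, `deterministic`, `model_mode`, `kv_cache`, and `attention_metadata`.

```python
import numpy as np


def rms_norm(x, scale, eps=1e-6):
    """RMSNorm over the last axis."""
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * scale


def attention_with_norm(inputs, attn_fn, pre_scale, post_scale):
    """Pre-norm self-attention plus residual, then a post-attention norm.

    Returns (hidden, hidden_normed): the residual stream after attention and
    its normalized copy, which would feed the following MLP/MoE sublayer.
    """
    normed = rms_norm(inputs, pre_scale)          # pre-attention norm
    hidden = inputs + attn_fn(normed)             # residual connection
    hidden_normed = rms_norm(hidden, post_scale)  # post-attention norm
    return hidden, hidden_normed
```

Here `attn_fn` can be any function mapping `[seq, d]` to `[seq, d]`, such as a wrapper around an attention module. Keeping the un-normalized `hidden` lets the next sublayer add its own residual on top.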