maxtext.models.qwen2 module

Decoder layers for the Qwen2 family of models.

class maxtext.models.qwen2.AttentionWithNorm(*args, **kwargs)

Bases: Module

Base class holding the components shared by decoder layers: a self-attention block with normalization.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any

apply_attention_with_norm(inputs, decoder_segment_ids, decoder_positions, deterministic, model_mode, kv_cache=None, attention_metadata=None)

Applies self-attention with pre- and post-layer normalization.

Parameters:
  • inputs (Array)

  • decoder_segment_ids (None | Array)

  • decoder_positions (None | Array)

  • deterministic (bool)

  • model_mode (str)

  • kv_cache (None | Array)

  • attention_metadata (None | dict[str, Any])
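
The ordering that this method's description implies (normalize, attend, then normalize again) can be sketched in plain Python. This is an illustration under stated assumptions, not MaxText's implementation: RMS normalization without learned weights, single-head causal attention with no projections, a residual add before the second normalization, and a two-value return are all assumptions, and the names `rms_norm`, `causal_self_attention`, and `attention_with_norm` are hypothetical.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale a vector to roughly unit root-mean-square (no learned weight)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def causal_self_attention(seq):
    """Single-head attention using the inputs directly as queries, keys, and values."""
    dim = len(seq[0])
    out = []
    for t, query in enumerate(seq):
        visible = seq[: t + 1]  # causal mask: position t sees positions 0..t
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * value[d] for w, value in zip(weights, visible))
            for d in range(dim)
        ])
    return out


def attention_with_norm(inputs):
    """Pre-norm, self-attention, residual add, post-norm (assumed ordering).

    Returns (post_normed, residual); the two-value return is an assumption.
    """
    normed = [rms_norm(x) for x in inputs]
    attended = causal_self_attention(normed)
    residual = [[a + b for a, b in zip(x, y)] for x, y in zip(inputs, attended)]
    post_normed = [rms_norm(x) for x in residual]
    return post_normed, residual


tokens = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.5, -0.5, 0.0], [2.0, 2.0, 2.0, 2.0]]
hidden, residual = attention_with_norm(tokens)
```

In the sketch, `tokens` stands in for `inputs`. The remaining arguments in the real signature (`decoder_segment_ids`, `decoder_positions`, `deterministic`, `model_mode`, `kv_cache`, `attention_metadata`) have no counterpart here.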

class maxtext.models.qwen2.Qwen2DecoderLayer(*args, **kwargs)

Bases: AttentionWithNorm

Qwen2 Transformer decoder layer (dense).

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Any
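
"Dense" here presumably means that each layer uses a single feed-forward network rather than a mixture of experts. As a rough sketch of what such a layer adds on top of the shared attention block, the code below applies a SwiGLU-style gated MLP and a residual add to one token vector. The gated-MLP form follows the published Qwen2 architecture, but the function names, weight layout, and tiny identity weights are hypothetical and do not reflect MaxText's code.

```python
import math


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def gated_mlp(x, w_gate, w_up, w_down):
    """SwiGLU-style feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def dense_block(residual, normed, w_gate, w_up, w_down):
    """Add the MLP output (computed from the normalized input) to the residual."""
    mlp_out = gated_mlp(normed, w_gate, w_up, w_down)
    return [r + m for r, m in zip(residual, mlp_out)]


# Hypothetical 2x2 identity weights keep the arithmetic easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
residual_in = [0.5, 0.5]   # stands in for the attention block's residual stream
normed_in = [1.0, 0.0]     # stands in for the post-attention normalized output
layer_out = dense_block(residual_in, normed_in, identity, identity, identity)
```

With identity weights, the first output component is `0.5 + silu(1.0)` and the second stays at `0.5`, because `silu(0.0)` is zero and gates that channel off.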