maxtext.models package
Submodules
- maxtext.models.deepseek module
  - DeepSeekGenericLayer
    - DeepSeekGenericLayer.mlp_op()
    - DeepSeekGenericLayer.with_logical_constraint()
    - DeepSeekGenericLayer.dropout_op()
    - DeepSeekGenericLayer.pre_attention_norm_op()
    - DeepSeekGenericLayer.post_attention_norm_op()
    - DeepSeekGenericLayer.attention_op()
    - DeepSeekGenericLayer.logical_axis_names
    - DeepSeekGenericLayer.mlp_logical_axis_names
    - DeepSeekGenericLayer.post_process()
    - DeepSeekGenericLayer.self_attention_with_norm_op()
    - DeepSeekGenericLayer.engram_op()
  - DeepSeekDenseLayer
  - DeepSeekMoELayer
- maxtext.models.deepseek_batchsplit module
  - scheduling_group()
  - fetch_weights()
  - split()
  - merge()
  - extract_layer_weights()
  - insert_layer_ws_grad()
  - gather_weights()
  - reduce_scatter_ws_grad()
  - all_reduce_ws_grad_dcn()
  - init_splash_kernel()
  - tpu_flash_attention()
  - tpu_flash_attention_bwd()
  - scan_batch_split_layers()
  - batch_split_schedule()
  - batch_split_schedule_bwd()
  - staggered_call()
  - dot()
  - mla_with_norms()
  - mla_with_norms_remat()
  - mla_with_norms_bwd()
  - mla()
  - mla_remat()
  - mla_bwd()
  - query_projection()
  - kv_projection()
  - get_key_value()
  - rms_norm()
  - initialize_yarn_mask()
  - initialize_yarn_freqs()
  - yarn()
  - shared_expert_and_route()
  - shared_expert()
  - expert_group_mask()
  - expert_indices_and_weights()
  - expert_selection()
  - route()
  - unroute()
  - route_impl_fwd()
  - route_impl_bwd()
  - unroute_impl_fwd()
  - unroute_impl_bwd()
  - gmm()
  - compute_gating()
  - compute_linear()
  - route_compute_unroute()
  - unroute_ubatch_shard_mapped()
  - unroute_ubatch_fn()
  - unroute_ubatch_remat_and_bwd_shard_mapped()
  - unroute_ubatch_fn_remat()
  - unroute_ubatch_fn_bwd()
  - sum_grads()
  - route_compute_unroute_bwd()
  - moe()
  - moe_bwd()
- maxtext.models.deepseek_batchsplit_fp8 module
  - fetch_weights()
  - split()
  - merge()
  - gather_weights()
  - scan_batch_split_layers()
  - batch_split_schedule()
  - staggered_call()
  - with_data_parallel_constraint()
  - dot()
  - mla_with_norms()
  - mla()
  - query_projection()
  - kv_projection()
  - get_key_value()
  - rms_norm()
  - yarn()
  - moe()
  - expert_indices_and_weights()
  - expert_selection()
  - route()
  - unroute()
  - compute()
  - route_compute_unroute()
  - process_activations()
- maxtext.models.gemma module
- maxtext.models.gemma2 module
- maxtext.models.gemma3 module
- maxtext.models.gemma4 module
- maxtext.models.gemma4_vision module
- maxtext.models.gpt3 module
  - Gpt3LayerNorm
  - gpt3_layer_norm()
  - Gpt3MultiHeadAttention
    - Gpt3MultiHeadAttention.num_heads
    - Gpt3MultiHeadAttention.head_dim
    - Gpt3MultiHeadAttention.max_target_length
    - Gpt3MultiHeadAttention.max_prefill_predict_length
    - Gpt3MultiHeadAttention.mesh
    - Gpt3MultiHeadAttention.dtype
    - Gpt3MultiHeadAttention.dropout_rate
    - Gpt3MultiHeadAttention.kernel_init
    - Gpt3MultiHeadAttention.float32_qk_product
    - Gpt3MultiHeadAttention.float32_logits
    - Gpt3MultiHeadAttention.fused_qkv
    - Gpt3MultiHeadAttention.quant
    - Gpt3MultiHeadAttention.use_bias
    - Gpt3MultiHeadAttention.create_projection_layer()
    - Gpt3MultiHeadAttention.qkv_projection()
    - Gpt3MultiHeadAttention.projection()
  - Gpt3DecoderLayer
- maxtext.models.gpt_oss module
- maxtext.models.llama2 module
- maxtext.models.llama4 module
  - Llama4UnfoldConvolution
  - pixel_shuffle()
  - Llama4VisionMLP
  - Llama4VisionMLP2
  - Llama4VisionPixelShuffleMLP
  - Llama4MultiModalProjector
  - llama4multimodalprojector_as_linen()
  - determine_is_nope_layer()
  - determine_is_moe_layer()
  - Llama4DecoderLayer
  - Llama4ScannableBlock
  - Llama4VisionEncoderLayer
  - Llama4VisionEncoder
  - Llama4VisionModel
  - llama4visionmodel_as_linen()
- maxtext.models.mistral module
- maxtext.models.mixtral module
- maxtext.models.models module
  - TransformerLinenPure
    - TransformerLinenPure.config
    - TransformerLinenPure.mesh
    - TransformerLinenPure.quant
    - TransformerLinenPure.model_mode
    - TransformerLinenPure.init()
    - TransformerLinenPure.apply()
    - TransformerLinenPure.setup()
    - TransformerLinenPure.logits_from_hidden_states()
    - TransformerLinenPure.name
    - TransformerLinenPure.parent
    - TransformerLinenPure.scope
  - transformer_as_linen()
  - TransformerLinen
  - Transformer
- maxtext.models.olmo3 module
- maxtext.models.qwen2 module
- maxtext.models.qwen3 module
  - naive_jax_chunk_gated_delta_rule()
  - jax_chunk_gated_delta_rule()
  - Qwen3NextGatedDeltaNet
  - Qwen3NextFullAttention
  - Qwen3NextSparseMoeBlock
  - Qwen3NextScannableBlock
  - Qwen3NextDecoderLayer
  - AttentionWithNorm
  - Qwen3DecoderLayer
  - Qwen3MoeDecoderLayer
  - Qwen3OmniMoeVisionPatchMerger
    - Qwen3OmniMoeVisionPatchMerger.config
    - Qwen3OmniMoeVisionPatchMerger.hidden_size
    - Qwen3OmniMoeVisionPatchMerger.use_postshuffle_norm
    - Qwen3OmniMoeVisionPatchMerger.dtype
    - Qwen3OmniMoeVisionPatchMerger.weight_dtype
    - Qwen3OmniMoeVisionPatchMerger.kernel_init
    - Qwen3OmniMoeVisionPatchMerger.rngs
    - Qwen3OmniMoeVisionPatchMerger.ln_q
    - Qwen3OmniMoeVisionPatchMerger.mlp_0
    - Qwen3OmniMoeVisionPatchMerger.mlp_2
  - Qwen3OmniMoeVisionMLP
  - Qwen3OmniMoeVisionPatchEmbed
    - Qwen3OmniMoeVisionPatchEmbed.config
    - Qwen3OmniMoeVisionPatchEmbed.patch_size
    - Qwen3OmniMoeVisionPatchEmbed.temporal_patch_size
    - Qwen3OmniMoeVisionPatchEmbed.in_channels
    - Qwen3OmniMoeVisionPatchEmbed.embed_dim
    - Qwen3OmniMoeVisionPatchEmbed.dtype
    - Qwen3OmniMoeVisionPatchEmbed.weight_dtype
    - Qwen3OmniMoeVisionPatchEmbed.rngs
    - Qwen3OmniMoeVisionPatchEmbed.proj
  - Qwen3OmniMoeVisionAttention
  - Qwen3OmniMoeVisionBlock
  - Qwen3OmniMoeVisionEncoder
  - Qwen3OmniMoeVisionProjector
  - qwen3omni_visionencoder_as_linen()
  - qwen3omni_visionprojector_as_linen()
  - Qwen3OmniAudioEncoderLayer
  - Qwen3OmniAudioEncoder
  - Qwen3OmniAudioProjector
  - qwen3omni_audioencoder_as_linen()
  - qwen3omni_audioprojector_as_linen()
- maxtext.models.qwen3_5 module
- maxtext.models.qwen3_custom module
- maxtext.models.simple_layer module