maxtext.checkpoint_conversion.utils.param_mapping module

Parameter mappings and transformation hooks for checkpoint conversion.

This module defines the necessary components to convert model checkpoints between MaxText and Hugging Face formats for various architectures (e.g., Gemma, Qwen). It provides two key types of mappings for each model:

  1. Parameter Name Mappings (`PARAM_MAPPING`): Dictionaries that map a MaxText parameter key to its corresponding Hugging Face parameter(s). These mappings are generated by functions like GEMMA2_MAXTEXT_TO_HF_PARAM_MAPPING.

    Key: MaxText parameters, in one of the following forms:

      • atomic_mt_key: A single string representing one MaxText parameter.

      • composite_mt_key: A tuple of strings representing multiple MaxText parameters (e.g., GPT-OSS).

    Value: The corresponding Hugging Face parameters, in one of the following forms:

      • unscanned: A single string.

      • scanned: A list of strings, to be stacked along the layer axis.

      • unscanned with expert stacking: A list of strings, to be stacked along the expert axis.

      • scanned with expert stacking: A nested list of strings, to be stacked along both the layer and expert axes.

    Note: Expert stacking applies only to a subset of MoE models (e.g., Qwen MoE, DeepSeek, Mixtral), not to others (e.g., GPT-OSS).

  2. Hook Functions (`HOOK_FNS`): Dictionaries that map a MaxText parameter name to a specific transformation function (a “hook”). These hooks handle the actual value conversion, which can include operations like reshaping, transposing, scaling, or padding tensors to match the target format’s requirements. These are generated by functions like GEMMA2_MAXTEXT_TO_HF_PARAM_HOOK_FN.
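To make the value forms and the hook idea concrete, here is a minimal sketch of a toy mapping and hook table. Every MaxText key, Hugging Face name, and size below is hypothetical, and the hooks are simplified to take only the tensor; the real hook signatures in this module may differ. Note that a flat list cannot be classified from its shape alone: it is the scanned form for a dense parameter, but the unscanned-with-expert-stacking form for an MoE parameter.

```python
import numpy as np

# Hypothetical sizes, MaxText keys, and HF names; illustration only.
NUM_LAYERS, NUM_EXPERTS = 2, 2

PARAM_MAPPING = {
    # atomic_mt_key -> unscanned form: a single HF name.
    "params-token_embedder-embedding": "model.embed_tokens.weight",
    # atomic_mt_key -> scanned form: one HF name per layer.
    "params-decoder-layers-post_self_attention_layer_norm-scale": [
        f"model.layers.{i}.post_attention_layernorm.weight" for i in range(NUM_LAYERS)
    ],
    # atomic_mt_key -> scanned with expert stacking: nested list indexed [layer][expert].
    "params-decoder-layers-moe_block-wi_0": [
        [f"model.layers.{i}.mlp.experts.{e}.gate_proj.weight" for e in range(NUM_EXPERTS)]
        for i in range(NUM_LAYERS)
    ],
}


def value_form(value):
    """Classify a mapping value by nesting depth.

    A flat list is ambiguous on its own: it is the scanned form for a dense
    parameter, or the unscanned-with-expert-stacking form for an MoE parameter.
    """
    if isinstance(value, str):
        return "single string"
    if all(isinstance(v, str) for v in value):
        return "list"
    return "nested list"


def transpose_kernel(x):
    # Flax Dense kernels are laid out [in, out]; torch.nn.Linear weights are [out, in].
    return np.transpose(x)


# Hooks are simplified here to take only the tensor.
HOOK_FNS = {"params-decoder-layers-mlp-wi_0-kernel": transpose_kernel}
```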

The main conversion script uses these mappings to systematically transform each parameter from the source checkpoint and build the target checkpoint.
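The conversion loop described above can be sketched roughly as follows for the MaxText-to-Hugging Face direction. This is not the actual conversion script: `convert_to_hf` and its `layer_axis` parameter are hypothetical, hooks take only the tensor, and composite keys and expert stacking are deliberately left out.

```python
import numpy as np


def convert_to_hf(mt_params, param_mapping, hook_fns, layer_axis=0):
    """Minimal MaxText -> HF conversion loop (sketch).

    Handles atomic keys in the unscanned and scanned forms only. `layer_axis`
    is a hypothetical parameter: where the layer axis of a scanned tensor sits
    depends on the model configuration.
    """
    hf_params = {}
    for mt_key, hf_names in param_mapping.items():
        tensor = np.asarray(mt_params[mt_key])
        hook = hook_fns.get(mt_key, lambda x: x)  # identity if no hook is registered
        if isinstance(hf_names, str):
            hf_params[hf_names] = hook(tensor)
            continue
        if not all(isinstance(n, str) for n in hf_names):
            raise NotImplementedError("expert stacking is not covered by this sketch")
        # Scanned form: slice the stacked tensor back into one tensor per layer.
        for i, name in enumerate(hf_names):
            hf_params[name] = hook(np.take(tensor, i, axis=layer_axis))
    return hf_params
```

For example, a scanned norm scale of shape (2, 4) mapped to two HF names yields two tensors of shape (4,), while an unscanned embedding with a transpose hook is emitted once, transposed.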

maxtext.checkpoint_conversion.utils.param_mapping.GEMMA3_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Generates a parameter mapping from MaxText to Hugging Face for Gemma3.

This function creates a dictionary that maps the parameter names from a MaxText Gemma3 checkpoint to their corresponding names in the Hugging Face Gemma3ForCausalLM format. It handles both the text and vision components of the model.

Parameters:
  • config (dict) – The Hugging Face model configuration dictionary, which must contain ‘text_config’ and ‘vision_config’ sub-dictionaries.

  • scan_layers (bool, optional) – If True, generates mappings for scanned layers, where multiple layers are stacked into a single tensor. If False, generates mappings for individual, unscanned layers. Defaults to False.

Returns:

A mapping where keys are atomic_mt_key (single MaxText parameter names). Values are either a single Hugging Face parameter name (unscanned form) or a list of Hugging Face parameter names (scanned form) for stacked text layers.

Return type:

dict

maxtext.checkpoint_conversion.utils.param_mapping.GEMMA3_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Hook functions for Gemma3 parameter conversion.

This function provides a dictionary of transformation functions (hooks) for converting Gemma3 model parameters between MaxText and Hugging Face formats. It handles embedding padding/scaling, RMSNorm scaling, kernel reshaping, and vision-specific tensor manipulations.

Parameters:
  • config (dict) – The Hugging Face model configuration dictionary.

  • scan_layers (bool, optional) – Whether the model uses scanned layers. Defaults to False.

  • saving_to_hf (bool, optional) – The direction of conversion. True for MaxText to Hugging Face, False for the reverse. Defaults to False.

Returns:

A dictionary mapping MaxText parameter names to their corresponding transformation functions.

Return type:

dict
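As one illustration of the embedding padding mentioned above, the sketch below pads the vocabulary axis with zero rows when loading from Hugging Face and slices the padding off when saving. It assumes embeddings are laid out [vocab, hidden] on both sides and that MaxText uses a larger padded vocabulary; `make_embedding_hook`, `hf_vocab_size`, and `mt_vocab_size` are hypothetical names, and the real Gemma3 hook (which also scales embeddings and RMSNorm weights) is not reproduced here.

```python
import numpy as np


def make_embedding_hook(hf_vocab_size, mt_vocab_size, saving_to_hf):
    """Return a direction-specific vocab padding hook (hypothetical sketch)."""

    def to_hf(x):
        # Drop the padded rows beyond the Hugging Face vocabulary.
        return x[:hf_vocab_size]

    def from_hf(x):
        pad = mt_vocab_size - x.shape[0]
        if pad < 0:
            raise ValueError("MaxText vocab must be at least as large as the HF vocab")
        # Append zero rows along the vocab axis only.
        return np.pad(x, ((0, pad), (0, 0)))

    return to_hf if saving_to_hf else from_hf
```

Selecting the function once from `saving_to_hf` mirrors the direction flag in the signature above, so the conversion loop never needs to branch on direction itself.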

maxtext.checkpoint_conversion.utils.param_mapping.GEMMA2_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns mapping between MaxText and HuggingFace Gemma2 weight paths.

Parameters:
  • config (dict) – Model configuration dictionary containing at least ‘num_hidden_layers’.

  • scan_layers (bool, optional) – Whether the MaxText model uses layer scanning optimization. When True, decoder layers are stacked into a single tensor. Defaults to False.

Returns:

A mapping where keys are atomic_mt_key (single MaxText parameter names). Values are either a single string (unscanned form) or a list of strings (scanned form) for stacked layers when scan_layers=True.

Return type:

dict

Notes

  • MaxText uses a paired layer approach where two HF decoder layers are treated as one MaxText decoder layer.

  • MaxText layer i corresponds to HF layers 2i and 2i+1.

  • Local components map to even-numbered HF decoder layers (0, 2, 4…).

  • Global components map to odd-numbered HF decoder layers (1, 3, 5…).
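The paired-layer indexing in these notes can be written out directly. This sketch only computes indices and name lists; the helper names and the `mlp.down_proj.weight` suffix are illustrative, not taken from the module.

```python
def gemma2_layer_pairs(num_hidden_layers):
    """Map each MaxText layer index i to its (local, global) HF layer indices."""
    if num_hidden_layers % 2:
        raise ValueError("expected an even number of HF decoder layers")
    return {i: (2 * i, 2 * i + 1) for i in range(num_hidden_layers // 2)}


def gemma2_scanned_names(num_hidden_layers, suffix):
    """Build scanned-form HF name lists for the local and global sub-layers."""
    pairs = gemma2_layer_pairs(num_hidden_layers)
    local = [f"model.layers.{l}.{suffix}" for l, _ in pairs.values()]  # even HF layers
    global_ = [f"model.layers.{g}.{suffix}" for _, g in pairs.values()]  # odd HF layers
    return local, global_
```

With 6 HF layers, MaxText layer 1 covers HF layers 2 (local) and 3 (global).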

maxtext.checkpoint_conversion.utils.param_mapping.GEMMA2_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for Gemma2 conversion.

This function generates a mapping of transformation functions that handle the necessary conversions between MaxText and HuggingFace parameter formats for Gemma2, including operations like padding, reshaping, and scaling.

Parameters:
  • config (dict) – Model configuration dictionary that must contain:

      • num_hidden_layers (int): Number of layers in the model.

      • head_dim (int): Dimension of the attention heads.

      • hidden_size (int): Model’s hidden dimension size.

  • scan_layers (bool, optional) – Controls the output format for layer parameters: True for stacked (scanned) layer parameters, False for individual per-layer parameters. Defaults to False.

  • saving_to_hf (bool, optional) – Determines the direction of transformation. True for MaxText to HuggingFace, False for the reverse. Defaults to False.

Returns:

A mapping from MaxText parameter names to transformation functions. The value can be a single function or a list of functions to be applied sequentially.

Return type:

dict
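Because a hook entry may be either a single function or a list of functions applied in sequence, a caller needs to handle both. A minimal sketch, where `apply_hooks` is a hypothetical helper and the arithmetic hooks stand in for real tensor operations such as padding or reshaping:

```python
from functools import reduce


def apply_hooks(x, hooks):
    """Apply a single hook, or a list of hooks from left to right."""
    if callable(hooks):
        return hooks(x)
    # Each hook receives the previous hook's output.
    return reduce(lambda acc, fn: fn(acc), hooks, x)
```

Order matters: applying "+1" then "×10" to 3 gives 40, while "×10" then "+1" gives 31, so the list order has to match the direction of conversion.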

maxtext.checkpoint_conversion.utils.param_mapping.QWEN_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns mapping from MaxText to HuggingFace Qwen weight paths.

This function generates a dictionary that maps parameter names from a MaxText Qwen checkpoint to their corresponding names in the Hugging Face format. It handles both dense and Mixture-of-Experts (MoE) model variants.

Parameters:
  • config (dict) – Model configuration dictionary, including ‘num_hidden_layers’ and optionally ‘num_experts’.

  • scan_layers (bool, optional) – Whether the MaxText model uses scanned layers. Defaults to False.

Returns:

A mapping where keys are atomic_mt_key (single MaxText parameter names). Values are Hugging Face parameter names in one of four forms: unscanned (string), scanned (list of strings), unscanned with expert stacking (list of strings), or scanned with expert stacking (nested list of strings).

Return type:

dict
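When loading from Hugging Face, each of the four value forms can be turned into a single stacked tensor with one recursive helper: a string loads one tensor, a list adds one stacking axis, and a nested list adds two. This is a sketch only. `stack_mapping_value` is hypothetical, and it always stacks on new leading axes (giving [layer, expert, ...] for the nested form), whereas the real axis placement depends on the model and its scan configuration.

```python
import numpy as np


def stack_mapping_value(hf_params, names):
    """Gather the HF tensors named by a mapping value into one stacked array."""
    if isinstance(names, str):
        return np.asarray(hf_params[names])
    # Recurse so a list stacks into one new leading axis and a nested list into two.
    return np.stack([stack_mapping_value(hf_params, n) for n in names], axis=0)
```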

maxtext.checkpoint_conversion.utils.param_mapping.QWEN_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for Qwen.

This function provides a dictionary of transformation functions (hooks) for converting Qwen model parameters between MaxText and Hugging Face formats. It handles embedding padding and kernel reshaping.

Parameters:
  • config (dict) – Model configuration dictionary, including ‘num_hidden_layers’ and optionally ‘num_experts’.

  • scan_layers (bool, optional) – Whether the model uses scanned layers. Defaults to False.

  • saving_to_hf (bool, optional) – The direction of conversion. True for MaxText to Hugging Face, False for the reverse. Defaults to False.

Returns:

A dictionary mapping MaxText parameter names to their corresponding transformation functions.

Return type:

dict
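Kernel reshaping is typically needed for attention projections, where MaxText keeps separate head axes and Hugging Face uses a flat `torch.nn.Linear` weight. The sketch below assumes an unscanned query kernel of shape [embed, heads, head_dim] on the MaxText side and [heads * head_dim, embed] on the Hugging Face side; the function names are hypothetical and scanned kernels (with an extra layer axis) are not handled.

```python
import numpy as np


def q_kernel_to_hf(kernel):
    """[embed, heads, head_dim] -> [heads * head_dim, embed] (torch Linear layout)."""
    embed, heads, head_dim = kernel.shape
    # Merge the head axes, then swap to [out, in].
    return kernel.reshape(embed, heads * head_dim).T


def q_kernel_from_hf(weight, heads, head_dim):
    """[heads * head_dim, embed] -> [embed, heads, head_dim]."""
    return weight.T.reshape(-1, heads, head_dim)
```

The inverse needs `heads` and `head_dim` explicitly because the flat HF weight no longer records how its output axis splits.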

maxtext.checkpoint_conversion.utils.param_mapping.QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns mapping from MaxText to HuggingFace Qwen3-Next weight paths. All MaxText keys start with ‘params-’ and use ‘-’ separators for scanned layers.
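The hyphenated key convention can be illustrated by flattening a parameter path tuple. The path components below are hypothetical, and the sketch assumes no component itself contains a hyphen (otherwise splitting would not round-trip).

```python
def to_mt_key(path):
    """Join a parameter path into a hyphenated key that starts with 'params-'."""
    parts = [str(p) for p in path]
    if not parts or parts[0] != "params":
        parts = ["params", *parts]
    return "-".join(parts)


def from_mt_key(key):
    """Split a hyphenated key back into its path components (drops 'params')."""
    parts = key.split("-")
    if parts[0] != "params":
        raise ValueError(f"not a 'params-' key: {key!r}")
    return tuple(parts[1:])
```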

maxtext.checkpoint_conversion.utils.param_mapping.QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates transformation hooks for Qwen3-Next parameters, keyed by hyphenated ‘params-’ MaxText names.

maxtext.checkpoint_conversion.utils.param_mapping.DEEPSEEK_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Generates a parameter mapping from MaxText to HuggingFace DeepSeek weight paths.

Returns:

A mapping where keys are atomic_mt_key (single MaxText parameter names). Values are Hugging Face parameter names in one of four forms: unscanned (string), scanned (list of strings), unscanned with expert stacking (list of strings), or scanned with expert stacking (nested list of strings).

Return type:

dict

maxtext.checkpoint_conversion.utils.param_mapping.DEEPSEEK_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for converting DeepSeek parameters between MaxText and Hugging Face formats.

maxtext.checkpoint_conversion.utils.param_mapping.DEEPSEEK_NNX_TO_VLLM_PARAM_HOOK_FN()

Creates parameter transformation functions for converting DeepSeek parameters from NNX to vLLM format.

maxtext.checkpoint_conversion.utils.param_mapping.GPT_OSS_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Generates mapping from MaxText gpt-oss to Hugging Face weight paths.

Returns:

A mapping where keys are atomic_mt_key (a single MaxText parameter) or composite_mt_key (a tuple of MaxText parameters). Values are Hugging Face parameter names, either a single string (unscanned form) or a list of strings (scanned form).

Return type:

dict

Notes:

  • Handles the inhomogeneous scan block structure, based on inhomogeneous_layer_cycle_interval.

  • Handles composite_mt_key, where multiple MaxText keys map to HF key(s):

      • (GptOssMlp-wi_0, GptOssMlp-wi_1): mlp.experts.gate_up_proj

      • (GptOssMlp-wi_0_bias, GptOssMlp-wi_1_bias): mlp.experts.gate_up_proj_bias

maxtext.checkpoint_conversion.utils.param_mapping.GPT_OSS_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Transformation hooks for gpt-oss parameters.

Notes:

  • Handles the inhomogeneous scan block structure (inhomogeneous_layer_cycle_interval).

  • Handles composite_mt_key, where multiple MaxText keys map to HF key(s):

      • (GptOssMlp-wi_0, GptOssMlp-wi_1): mlp.experts.gate_up_proj

      • (GptOssMlp-wi_0_bias, GptOssMlp-wi_1_bias): mlp.experts.gate_up_proj_bias

  • The composite keys are transformed via an interleave function.
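The interleave step for composite keys can be sketched as alternating the columns of the two MaxText tensors along the last axis, with the inverse taking even and odd columns back out. This assumes wi_0 (gate) lands at even positions and wi_1 (up) at odd positions of gate_up_proj, and that the composite hook receives the tensors positionally; both are assumptions, and `interleave`, `deinterleave`, and `COMPOSITE_HOOKS` are hypothetical names.

```python
import numpy as np


def interleave(wi_0, wi_1):
    """Interleave two [..., d] tensors into one [..., 2d] tensor along the last axis."""
    if wi_0.shape != wi_1.shape:
        raise ValueError("wi_0 and wi_1 must have the same shape")
    out = np.empty(wi_0.shape[:-1] + (2 * wi_0.shape[-1],), dtype=np.result_type(wi_0, wi_1))
    out[..., 0::2] = wi_0  # assumed gate columns
    out[..., 1::2] = wi_1  # assumed up columns
    return out


def deinterleave(gate_up):
    """Inverse of interleave: split even and odd columns."""
    return gate_up[..., 0::2], gate_up[..., 1::2]


# A composite_mt_key is a tuple, so the hook table can be keyed by the tuple itself.
COMPOSITE_HOOKS = {("GptOssMlp-wi_0", "GptOssMlp-wi_1"): interleave}
```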

maxtext.checkpoint_conversion.utils.param_mapping.QWEN3_OMNI_MOE_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns mapping from MaxText to HuggingFace Qwen3-Omni weight paths.

This function combines mappings from different modalities (text, vision, audio, etc.) into a unified parameter mapping for the multi-modal Qwen3-Omni model.

Parameters:
  • config (dict) – Model configuration dictionary containing modality-specific configs.

  • scan_layers (bool, optional) – Whether the model uses scanned layers. Defaults to False.

Returns:

Combined mapping from all modalities.

Return type:

dict
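Combining per-modality mappings amounts to a dictionary union. The sketch below adds a duplicate-key check so that one modality cannot silently overwrite another's entry; that check is a design choice of this example, not necessarily what the module does, and the keys shown in the usage are hypothetical.

```python
def merge_mappings(*mappings):
    """Union per-modality mappings, refusing silent overwrites of duplicate keys."""
    combined = {}
    for mapping in mappings:
        overlap = combined.keys() & mapping.keys()
        if overlap:
            # str() so composite (tuple) keys can be sorted alongside string keys.
            raise ValueError(f"duplicate MaxText keys: {sorted(map(str, overlap))}")
        combined.update(mapping)
    return combined
```

For example, `merge_mappings(text_mapping, vision_mapping, audio_mapping)` returns one flat dictionary that the conversion loop can consume unchanged.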

maxtext.checkpoint_conversion.utils.param_mapping.QWEN3_OMNI_MOE_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for Qwen3-Omni.

This function provides a dictionary of transformation functions (hooks) for converting Qwen3-Omni model parameters between MaxText and Hugging Face formats. It handles embedding padding and kernel reshaping.

Parameters:
  • config (dict) – Model configuration dictionary, including ‘num_hidden_layers’ and optionally ‘num_experts’.

  • scan_layers (bool, optional) – Whether the model uses scanned layers. Defaults to False.

  • saving_to_hf (bool, optional) – The direction of conversion. True for MaxText to Hugging Face, False for the reverse. Defaults to False.

Returns:

A dictionary mapping MaxText parameter names to their corresponding transformation functions.

Return type:

dict

maxtext.checkpoint_conversion.utils.param_mapping.QWEN3_NNX_TO_VLLM_PARAM_HOOK_FN(target_shape=None)

Creates parameter transformation functions for Qwen3.

This function provides a dictionary of transformation functions (hooks) for converting Qwen3 model parameters between NNX and vLLM formats.

Returns:

A dictionary mapping NNX parameter names to their corresponding transformation functions.

Return type:

dict

maxtext.checkpoint_conversion.utils.param_mapping.LLAMA31_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns a dictionary mapping from MaxText parameter names to HuggingFace LLaMA3.1 parameter names.

Parameters:
  • config (dict) – Model configuration dictionary containing: - num_hidden_layers (int): The number of decoder layers.

  • scan_layers (bool, optional) – If True, MaxText layers are ‘stacked’ into a single param. Defaults to False.

Returns:

A mapping where keys are atomic_mt_key (single MaxText parameter names). Values are either a single string (unscanned form) or a list of strings (scanned form) for stacked layers when scan_layers=True.

Return type:

dict

maxtext.checkpoint_conversion.utils.param_mapping.LLAMA31_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for converting LLaMA3.1 parameters between MaxText and HuggingFace formats.

This function generates a mapping of transformation functions that handle the necessary conversions between MaxText and HuggingFace parameter formats, including operations like reshaping.

maxtext.checkpoint_conversion.utils.param_mapping.LLAMA31_NNX_TO_VLLM_PARAM_HOOK_FN()

Defines and returns hook functions for weight transformations.

These hooks are applied to specific weights during the conversion from MaxText to a HuggingFace-compatible format. They handle transformations like RoPE reordering and query scaling that are not simple re-mappings.

Returns:

A dictionary where keys are MaxText parameter names and values are the corresponding transformation functions.
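RoPE reordering usually means permuting the head_dim axis between an interleaved layout, where rotation pairs are adjacent ([a0, b0, a1, b1, ...]), and a half-split layout, where the first half holds one element of each pair and the second half the other ([a0, a1, ..., b0, b1, ...]). The sketch below assumes one side uses each layout and that query scaling folds a 1/sqrt(head_dim) factor into the kernel; which side uses which layout, and the exact scale factor, are assumptions rather than details confirmed by this page. All function names are hypothetical.

```python
import numpy as np


def _half_split_perm(d):
    # Even positions first, then odd positions.
    return np.concatenate([np.arange(0, d, 2), np.arange(1, d, 2)])


def interleaved_to_half_split(x):
    """Reorder the last (head_dim) axis from interleaved to half-split pairs."""
    return x[..., _half_split_perm(x.shape[-1])]


def half_split_to_interleaved(x):
    """Inverse permutation of interleaved_to_half_split."""
    return x[..., np.argsort(_half_split_perm(x.shape[-1]))]


def scale_query(x, head_dim, inverse=False):
    """Fold (or, with inverse=True, remove) an assumed 1/sqrt(head_dim) query scale."""
    factor = head_dim ** -0.5
    return x / factor if inverse else x * factor
```

Because the permutation touches only the last axis, the same helpers work on a full [..., heads, head_dim] kernel as well as on a single head vector.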

maxtext.checkpoint_conversion.utils.param_mapping.MIXTRAL_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Generates the mapping of parameter names from MaxText to Hugging Face for Mixtral.

Returns:

A mapping where keys are atomic_mt_key (single MaxText parameter names). Values are Hugging Face parameter names in one of four forms: unscanned (string), scanned (list of strings), unscanned with expert stacking (list of strings), or scanned with expert stacking (nested list of strings).

Return type:

dict

maxtext.checkpoint_conversion.utils.param_mapping.MIXTRAL_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Generates parameter conversion hooks for Mixtral between MaxText and Hugging Face.

maxtext.checkpoint_conversion.utils.param_mapping.GEMMA4_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns mapping between MaxText and HuggingFace Gemma4 weight paths.

maxtext.checkpoint_conversion.utils.param_mapping.GEMMA4_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for Gemma4.

maxtext.checkpoint_conversion.utils.param_mapping.OLMO3_MAXTEXT_TO_HF_PARAM_MAPPING(config, maxtext_config, scan_layers=False)

Returns mapping from MaxText to HuggingFace Olmo3 weight paths.

maxtext.checkpoint_conversion.utils.param_mapping.OLMO3_MAXTEXT_TO_HF_PARAM_HOOK_FN(config, maxtext_config, scan_layers=False, saving_to_hf=False)

Creates parameter transformation functions for Olmo3.