maxtext.multimodal.processor_gemma4 module

Gemma4-specific utilities for multimodal features.

class maxtext.multimodal.processor_gemma4.Gemma4PreprocessorOutput(pixel_values=None, pixel_mask=None, aspect_ratios=None, num_images=0, audio_values=None, audio_mask=None, positions_xy=None)

Bases: PreprocessorOutput

The output of the Gemma4 multimodal preprocessor: image features and, when present, audio features.

Parameters:
  • pixel_values (None | ndarray)

  • pixel_mask (None | ndarray)

  • aspect_ratios (None | ndarray)

  • num_images (int)

  • audio_values (None | ndarray)

  • audio_mask (None | ndarray)

  • positions_xy (None | ndarray)

num_images: int = 0
pixel_values: None | ndarray = None
pixel_mask: None | ndarray = None
positions_xy: None | ndarray = None
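For orientation, the container can be pictured as a dataclass with the fields and defaults from the signature above. The sketch below is a hypothetical stand-in, not the MaxText class: `PreprocessorOutput` is an empty placeholder base, the class name `Gemma4PreprocessorOutputSketch` is invented, and `Any` replaces `numpy.ndarray` so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PreprocessorOutput:
  """Hypothetical empty stand-in for MaxText's PreprocessorOutput base."""


@dataclass
class Gemma4PreprocessorOutputSketch(PreprocessorOutput):
  """Mirrors the documented fields; Any stands in for numpy.ndarray."""
  pixel_values: Optional[Any] = None
  pixel_mask: Optional[Any] = None
  aspect_ratios: Optional[Any] = None
  num_images: int = 0
  audio_values: Optional[Any] = None
  audio_mask: Optional[Any] = None
  positions_xy: Optional[Any] = None


# An image-only sample: the audio fields keep their None defaults.
out = Gemma4PreprocessorOutputSketch(pixel_values=[[0.0]], num_images=1)
```

Because every field has a default, callers can fill only the modalities a sample actually contains.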
maxtext.multimodal.processor_gemma4.preprocess_mm_data_gemma4(images)

Preprocesses multimodal data for Gemma4 models.

maxtext.multimodal.processor_gemma4.get_image_offsets_gemma4(processor_output)

Returns the increase in the total token count after image token placeholders are inserted.

Parameters:
  • processor_output (PreprocessorOutput | None)
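The offset is plain arithmetic once each image's expanded length is known. The helper below is a hypothetical sketch, not the MaxText code. It assumes each image starts as a single placeholder token that is replaced by a longer run. Whether Gemma4 uses a fixed or a per-image token count is not stated here, so the sketch accepts a per-image list; the 258-token length in the example is made up.

```python
def image_token_offset(expanded_lengths: list[int]) -> int:
  """Net token-count increase when each single-token image placeholder is
  replaced by a run of expanded_lengths[i] tokens (hypothetical helper)."""
  # The placeholder itself is consumed, so image i adds (length - 1) tokens.
  return sum(length - 1 for length in expanded_lengths)


# Two images, each expanded to a hypothetical 258 tokens.
print(image_token_offset([258, 258]))  # 514
```

With no images (for example, when `processor_output` is None) the list is empty and the offset is 0.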

maxtext.multimodal.processor_gemma4.reformat_prompt_gemma4(prompt, image_placeholder, num_images)

Reformats prompt for Gemma4 models by inserting image placeholders.
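A minimal sketch of one way such reformatting can work. It assumes the user-facing `image_placeholder` is mapped onto a model-specific marker, and that markers are prepended for any images the prompt does not reference. `MODEL_IMAGE_MARKER` and `reformat_prompt_sketch` are hypothetical names. The real Gemma4 marker string, and any chat-template wrapping, are not documented here.

```python
# Hypothetical marker; the actual Gemma4 placeholder string is not documented here.
MODEL_IMAGE_MARKER = "<image>"


def reformat_prompt_sketch(prompt: str, image_placeholder: str, num_images: int) -> str:
  """Ensure the prompt carries one model image marker per image (hypothetical)."""
  # Map the user-facing placeholder onto the model's marker (skip empty strings,
  # since str.replace("") would insert the marker between every character).
  if image_placeholder and image_placeholder != MODEL_IMAGE_MARKER:
    prompt = prompt.replace(image_placeholder, MODEL_IMAGE_MARKER)
  # Prepend markers for images the prompt does not reference explicitly.
  missing = num_images - prompt.count(MODEL_IMAGE_MARKER)
  if missing > 0:
    prompt = MODEL_IMAGE_MARKER * missing + prompt
  return prompt


print(reformat_prompt_sketch("Describe {img}.", "{img}", 2))
# <image>Describe <image>.
```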

maxtext.multimodal.processor_gemma4.insert_sequence(tokens, *, at, sequence, max_num_images)

Inserts a sequence of tokens into the given tokens array at the specified token position.

Parameters:
  • tokens (ndarray)

  • at (int)

  • sequence (list[int])

  • max_num_images (int)

Return type:

ndarray
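The sketch below is a pure-Python illustration of the insertion on a batch of token lists, not the MaxText implementation. It rests on one assumption: `at` is a placeholder token id, and every occurrence is replaced by `sequence`. The `max_num_images` parameter suggests this reading, because it fixes how much room each row needs. Rows are right-padded with a hypothetical `PAD_ID` so the batch stays rectangular, and all rows are assumed to have equal length.

```python
PAD_ID = 0  # hypothetical padding token id


def insert_sequence_sketch(tokens, *, at, sequence, max_num_images):
  """Replace each occurrence of token id `at` in every row with `sequence`.

  Rows are right-padded with PAD_ID to a fixed length that leaves room for up
  to max_num_images expansions per row (hypothetical helper).
  """
  if not tokens:
    return []
  out_len = len(tokens[0]) + max_num_images * (len(sequence) - 1)
  result = []
  for row in tokens:
    if row.count(at) > max_num_images:
      raise ValueError("more placeholders than max_num_images")
    new_row = []
    for tok in row:
      # Expand the placeholder; copy every other token unchanged.
      new_row.extend(sequence if tok == at else [tok])
    result.append(new_row + [PAD_ID] * (out_len - len(new_row)))
  return result


batch = [[5, 9, 7], [5, 7, 7]]
print(insert_sequence_sketch(batch, at=9, sequence=[1, 2, 3], max_num_images=1))
# [[5, 1, 2, 3, 7], [5, 7, 7, 0, 0]]
```

The fixed output length is what keeps every row the same shape. A vectorized NumPy version would need that property too.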

maxtext.multimodal.processor_gemma4.add_extra_tokens_for_images_gemma4(tokens, *, max_num_images=1)

Replaces image placeholder tokens with the full sequence of Gemma 4 image tokens.

Parameters:
  • tokens (ndarray | list)

  • max_num_images (int)
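To show what a "full sequence of image tokens" can look like, the sketch assumes a layout similar to Gemma 3's: a begin-of-image marker, then a run of soft tokens, then an end-of-image marker. Every id and count below is hypothetical, and whether Gemma 4 uses exactly this layout is an assumption. The helper works on one flat list and skips the batching and padding the real function presumably handles.

```python
# All ids and counts are hypothetical; real values come from the Gemma4 tokenizer/config.
IMAGE_PLACEHOLDER_ID = 100
BEGIN_IMAGE_ID = 101
IMAGE_SOFT_TOKEN_ID = 102
END_IMAGE_ID = 103
NUM_SOFT_TOKENS = 4


def image_token_run():
  """Token run that stands in for one image in the text stream."""
  return [BEGIN_IMAGE_ID] + [IMAGE_SOFT_TOKEN_ID] * NUM_SOFT_TOKENS + [END_IMAGE_ID]


def add_extra_image_tokens_sketch(tokens):
  """Expand every image placeholder in a flat token list (no batching, no padding)."""
  expanded = []
  for tok in tokens:
    expanded.extend(image_token_run() if tok == IMAGE_PLACEHOLDER_ID else [tok])
  return expanded


print(add_extra_image_tokens_sketch([1, 100, 2]))
# [1, 101, 102, 102, 102, 102, 103, 2]
```

The soft-token positions are where image embeddings are later scattered into the text sequence, so the run length must match the vision encoder's output length.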

maxtext.multimodal.processor_gemma4.get_dummy_image_shape_for_init_gemma4(batch_size=1, num_image_per_sequence=1)

Returns the shape of the dummy image used when initializing the Gemma4 model.
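A hedged sketch of the kind of tuple such a helper returns. It assumes a channels-last layout of (batch, images per sequence, height, width, channels), similar to the one MaxText uses for Gemma 3. `IMAGE_SIZE` is a made-up value, and Gemma4 may well use a different layout (for example, pre-patchified inputs).

```python
# Hypothetical constants; the real Gemma4 values live in MaxText's config.
IMAGE_SIZE = 896   # hypothetical height/width in pixels
NUM_CHANNELS = 3   # RGB


def dummy_image_shape(batch_size=1, num_image_per_sequence=1):
  """(batch, images per sequence, height, width, channels), channels-last."""
  return (batch_size, num_image_per_sequence, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS)


print(dummy_image_shape(batch_size=2))  # (2, 1, 896, 896, 3)
```

A tuple like this can be passed to `jax.numpy.zeros(shape)` to build a placeholder input. Model initialization needs only the shapes, not real pixel data.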