maxtext.multimodal.processor_gemma4 module

Gemma4-specific utilities for multimodal features.

class maxtext.multimodal.processor_gemma4.Gemma4PreprocessorOutput(pixel_values=None, pixel_mask=None, aspect_ratios=None, num_images=0, audio_values=None, audio_mask=None, positions_xy=None)

Bases: PreprocessorOutput

The output of the Gemma4 image preprocessor.

Parameters:

pixel_values (None | ndarray)
pixel_mask (None | ndarray)
aspect_ratios (None | ndarray)
num_images (int)
audio_values (None | ndarray)
audio_mask (None | ndarray)
positions_xy (None | ndarray)

num_images: int = 0

pixel_values: None | ndarray = None

pixel_mask: None | ndarray = None

positions_xy: None | ndarray = None
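
As a rough illustration of how a container with these fields behaves, the sketch below defines a stand-in dataclass with the same field names and defaults as the signature above. The class name `ExamplePreprocessorOutput` is hypothetical, it does not inherit from the real `PreprocessorOutput`, and the array layout used for `pixel_values` is an assumption rather than something this page specifies.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class ExamplePreprocessorOutput:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    pixel_values: Optional[np.ndarray] = None
    pixel_mask: Optional[np.ndarray] = None
    aspect_ratios: Optional[np.ndarray] = None
    num_images: int = 0
    audio_values: Optional[np.ndarray] = None
    audio_mask: Optional[np.ndarray] = None
    positions_xy: Optional[np.ndarray] = None


# Text-only input: every field keeps its default.
empty = ExamplePreprocessorOutput()

# One image; the (num_images, height, width, channels) layout is an assumption.
single = ExamplePreprocessorOutput(
    pixel_values=np.zeros((1, 8, 8, 3), dtype=np.float32),
    num_images=1,
)
```

Because every field defaults to `None` or `0`, callers can check `num_images` or test a field against `None` before touching image data.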

maxtext.multimodal.processor_gemma4.preprocess_mm_data_gemma4(images)

Preprocesses multimodal data for Gemma4 models.

maxtext.multimodal.processor_gemma4.get_image_offsets_gemma4(processor_output)

Gets the increase in total token count after inserting image token placeholders.

Parameters:

processor_output (PreprocessorOutput | None)
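
The arithmetic behind such an offset can be sketched as follows. This is not the library's implementation: it assumes each image starts as a single placeholder token that is later expanded into a fixed-length block, and both the helper name `image_token_offset` and the `tokens_per_image` parameter are hypothetical.

```python
def image_token_offset(num_images: int, tokens_per_image: int) -> int:
    """Return how many tokens are added when placeholders are expanded.

    Assumes each image is represented by one placeholder token before
    expansion and by `tokens_per_image` tokens afterwards, so each image
    contributes `tokens_per_image - 1` additional tokens.
    """
    if num_images < 0 or tokens_per_image < 1:
        raise ValueError("num_images must be >= 0 and tokens_per_image >= 1")
    return num_images * (tokens_per_image - 1)


# Two images, each expanding from 1 placeholder to 10 tokens.
offset = image_token_offset(num_images=2, tokens_per_image=10)
```

Under these assumptions, a prompt with no images has an offset of zero, which matches the case where no processor output is available.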

maxtext.multimodal.processor_gemma4.reformat_prompt_gemma4(prompt, image_placeholder, num_images)

Reformats prompt for Gemma4 models by inserting image placeholders.
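
One plausible reading of "inserting image placeholders" is sketched below. It assumes that placeholders already present in the prompt are kept and that any missing ones are prepended; the actual placement rule and any chat-template wrapping are not described on this page. The function name `reformat_prompt_sketch` and the `<img>` placeholder string are hypothetical.

```python
def reformat_prompt_sketch(prompt: str, image_placeholder: str, num_images: int) -> str:
    """Ensure the prompt contains at least `num_images` placeholders.

    Assumption: missing placeholders are prepended to the prompt. The real
    function may place them differently.
    """
    present = prompt.count(image_placeholder)
    missing = max(num_images - present, 0)
    return image_placeholder * missing + prompt


# No placeholder in the prompt yet: two are prepended.
with_two = reformat_prompt_sketch("Describe the scene.", "<img>", 2)

# The prompt already has its placeholder: it is returned unchanged.
unchanged = reformat_prompt_sketch("<img> What is this?", "<img>", 1)
```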

maxtext.multimodal.processor_gemma4.insert_sequence(tokens, *, at, sequence, max_num_images)

Inserts a sequence of tokens into the given tokens array at the specified token position.

Parameters:

tokens (ndarray)
at (int)
sequence (list[int])
max_num_images (int)

Return type:

ndarray
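
The core operation can be illustrated with NumPy's `np.insert`. This is a minimal sketch for a 1-D token array only; it leaves out `max_num_images`, whose role is not described here, and the name `insert_sequence_sketch` is hypothetical.

```python
import numpy as np


def insert_sequence_sketch(tokens: np.ndarray, *, at: int, sequence: list[int]) -> np.ndarray:
    """Insert `sequence` into a 1-D `tokens` array before index `at`.

    Assumption: the token already at position `at` is kept and shifted
    right; the real function may instead replace it.
    """
    return np.insert(tokens, at, sequence)


tokens = np.array([1, 2, 3])
result = insert_sequence_sketch(tokens, at=1, sequence=[9, 9])
```

`np.insert` returns a new array, so the original `tokens` array is left unmodified.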

maxtext.multimodal.processor_gemma4.add_extra_tokens_for_images_gemma4(tokens, *, max_num_images=1)

Replaces image placeholder tokens with the full sequence of Gemma4 image tokens.

Parameters:

tokens (ndarray | list)
max_num_images (int)
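
The replacement step can be sketched as below for a 1-D token sequence. All token IDs, the begin/soft/end block layout, and the number of soft tokens per image are placeholders chosen for illustration and are not the real Gemma4 values. The sketch also ignores `max_num_images` and any padding the real function may apply.

```python
import numpy as np

# Hypothetical token IDs and block size, for illustration only.
PLACEHOLDER_ID = -2
BEGIN_IMAGE_ID = 1001
SOFT_IMAGE_ID = 1002
END_IMAGE_ID = 1003
SOFT_TOKENS_PER_IMAGE = 4


def expand_image_placeholders(tokens) -> np.ndarray:
    """Replace each placeholder token with a begin/soft.../end block."""
    block = [BEGIN_IMAGE_ID] + [SOFT_IMAGE_ID] * SOFT_TOKENS_PER_IMAGE + [END_IMAGE_ID]
    expanded = []
    for token in np.asarray(tokens).tolist():
        if token == PLACEHOLDER_ID:
            expanded.extend(block)  # one placeholder becomes the whole block
        else:
            expanded.append(token)  # ordinary text tokens pass through
    return np.asarray(expanded, dtype=np.int32)


expanded = expand_image_placeholders([5, PLACEHOLDER_ID, 6])
```

With these illustrative sizes, each placeholder grows from one token to six, which is the kind of length increase an image offset has to account for.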

maxtext.multimodal.processor_gemma4.get_dummy_image_shape_for_init_gemma4(batch_size=1, num_image_per_sequence=1)

Returns the shape of the dummy image for Gemma4 model’s initialization.
