maxtext.multimodal.processor_gemma4 module
Gemma4-specific utilities for multimodal features.
- class maxtext.multimodal.processor_gemma4.Gemma4PreprocessorOutput(pixel_values=None, pixel_mask=None, aspect_ratios=None, num_images=0, audio_values=None, audio_mask=None, positions_xy=None)
Bases: PreprocessorOutput
The output of the Gemma4 image preprocessor.
- Parameters:
pixel_values (None | ndarray)
pixel_mask (None | ndarray)
aspect_ratios (None | ndarray)
num_images (int)
audio_values (None | ndarray)
audio_mask (None | ndarray)
positions_xy (None | ndarray)
- num_images: int = 0
- pixel_values: None | ndarray = None
- pixel_mask: None | ndarray = None
- positions_xy: None | ndarray = None
- maxtext.multimodal.processor_gemma4.preprocess_mm_data_gemma4(images)
Preprocesses multimodal data for Gemma4 models.
- maxtext.multimodal.processor_gemma4.get_image_offsets_gemma4(processor_output)
Gets the increase in total token count after inserting image token placeholders.
- Parameters:
processor_output (PreprocessorOutput | None)
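As a rough illustration of the arithmetic, the sketch below assumes that each image replaces a single placeholder token with a fixed number of image tokens, so every image adds `tokens_per_image - 1` tokens to the sequence. The helper `image_token_offset` and its `tokens_per_image` parameter are hypothetical and not part of this module; the actual function takes a `PreprocessorOutput` and may compute the offset differently, for example per image.

```python
def image_token_offset(num_images: int, tokens_per_image: int) -> int:
    """Hypothetical helper: extra tokens added when placeholders are expanded.

    Assumes one placeholder token per image is replaced by
    `tokens_per_image` tokens, so each image adds `tokens_per_image - 1`.
    """
    if num_images < 0 or tokens_per_image < 1:
        raise ValueError("num_images must be >= 0 and tokens_per_image >= 1")
    return num_images * (tokens_per_image - 1)


# Illustrative values only; 4 is not a real Gemma4 constant.
offset = image_token_offset(num_images=2, tokens_per_image=4)
print(offset)  # 6
```

With no images the offset is zero, which matches the case where `processor_output` is `None` or carries `num_images = 0`.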
- maxtext.multimodal.processor_gemma4.reformat_prompt_gemma4(prompt, image_placeholder, num_images)
Reformats prompt for Gemma4 models by inserting image placeholders.
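One plausible reading of this step, sketched under stated assumptions: if the prompt contains fewer placeholders than there are images, the missing ones are added so that each image has a slot. The helper `ensure_image_placeholders` is hypothetical, and the choice to prepend missing placeholders is an assumption; this page does not describe where the module inserts them or whether it also applies a chat template.

```python
def ensure_image_placeholders(prompt: str, image_placeholder: str, num_images: int) -> str:
    """Hypothetical helper: make sure the prompt has one placeholder per image."""
    if not image_placeholder:
        raise ValueError("image_placeholder must be a non-empty string")
    present = prompt.count(image_placeholder)
    missing = max(0, num_images - present)
    # Assumption: missing placeholders are prepended to the prompt.
    return image_placeholder * missing + prompt


prompt = ensure_image_placeholders("Compare <image> with the other one.", "<image>", 2)
print(prompt)  # <image>Compare <image> with the other one.
```

A prompt that already contains enough placeholders is returned unchanged.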
- maxtext.multimodal.processor_gemma4.insert_sequence(tokens, *, at, sequence, max_num_images)
Inserts a sequence of tokens into the given tokens array at the specified token position.
- Parameters:
tokens (ndarray)
at (int)
sequence (list[int])
max_num_images (int)
- Return type:
ndarray
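The basic idea of splicing a token sequence into an array can be sketched with NumPy. This page does not say whether `at` is an index into `tokens` or the id of a placeholder token, nor how `max_num_images` is used, so the sketch below takes the index reading, handles only 1-D arrays, and omits `max_num_images`. The helper `insert_tokens_1d` is hypothetical and is not the module's implementation.

```python
import numpy as np


def insert_tokens_1d(tokens: np.ndarray, index: int, sequence: list[int]) -> np.ndarray:
    """Hypothetical helper: insert `sequence` before position `index` in a 1-D array."""
    if tokens.ndim != 1:
        raise ValueError("this sketch only handles 1-D token arrays")
    if not 0 <= index <= tokens.shape[0]:
        raise IndexError("index out of range")
    # Match the dtype of the existing tokens so the result stays homogeneous.
    seq = np.asarray(sequence, dtype=tokens.dtype)
    return np.concatenate([tokens[:index], seq, tokens[index:]])


tokens = np.array([10, 11, 12, 13])
out = insert_tokens_1d(tokens, index=2, sequence=[7, 7, 7])
print(out.tolist())  # [10, 11, 7, 7, 7, 12, 13]
```

The result is longer than the input by `len(sequence)`, which is the kind of growth that `get_image_offsets_gemma4` is described as reporting.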