maxtext.multimodal.processor module

Multimodal data preprocessor router.

maxtext.multimodal.processor.preprocess_mm_data(config)

Preprocesses multimodal data based on the provided configuration. Routes to the appropriate preprocessing function based on the model name.

Parameters:

config – A pyconfig.Config object containing configuration parameters.

Returns:

A PreprocessorOutput object containing the processed multimodal data.
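The routing described above can be pictured with a minimal, self-contained sketch. It does not call MaxText: `SketchConfig`, `SketchOutput`, the `model-a`/`model-b` prefixes, and every function name below are hypothetical stand-ins, and the real router may select preprocessors differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SketchConfig:
    """Hypothetical stand-in for pyconfig.Config; only two fields are modeled."""
    model_name: str
    image_path: str = ""


@dataclass
class SketchOutput:
    """Hypothetical stand-in for PreprocessorOutput."""
    model_name: str
    num_images: int


def _preprocess_model_a(config: SketchConfig) -> SketchOutput:
    # A real preprocessor would load and resize pixels here.
    return SketchOutput(config.model_name, 1 if config.image_path else 0)


def _preprocess_model_b(config: SketchConfig) -> SketchOutput:
    # Same interface, different (hypothetical) model family.
    return SketchOutput(config.model_name, 1 if config.image_path else 0)


# Assumption: models are matched by name prefix.
_ROUTES: Dict[str, Callable[[SketchConfig], SketchOutput]] = {
    "model-a": _preprocess_model_a,
    "model-b": _preprocess_model_b,
}


def route_preprocess(config: SketchConfig) -> SketchOutput:
    """Dispatch to the preprocessor registered for the config's model name."""
    for prefix, fn in _ROUTES.items():
        if config.model_name.startswith(prefix):
            return fn(config)
    raise ValueError(f"No multimodal preprocessor for model {config.model_name!r}")
```

Keeping the dispatch table separate from the per-model functions means supporting a new model only requires registering one more entry.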

maxtext.multimodal.processor.preprocess_image_for_training(image, model_name)

Preprocesses a single image for training based on the model name.

maxtext.multimodal.processor.get_image_offsets(config, processor_output)

Get the increase in total token count after inserting image token placeholders.

Parameters:

processor_output (PreprocessorOutput | None)
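As a rough illustration of what such an offset measures, the sketch below assumes each image already occupies exactly one placeholder token in the prompt and is later expanded to a fixed number of tokens. Both the function name and that counting rule are assumptions; the real function may also account for begin/end markers or per-image variation.

```python
def image_token_offset(num_images: int, tokens_per_image: int) -> int:
    """Net growth in sequence length when one placeholder becomes many tokens.

    Assumes one placeholder per image is already counted in the prompt, so
    each image adds (tokens_per_image - 1) extra positions.
    """
    if num_images < 0 or tokens_per_image < 1:
        raise ValueError("num_images must be >= 0 and tokens_per_image >= 1")
    return num_images * (tokens_per_image - 1)
```

For example, under these assumptions two images at 256 tokens each lengthen the sequence by 510 tokens.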

maxtext.multimodal.processor.reformat_prompt(prompt, image_placeholder, model_name, num_images, video_placeholder='<|video|>', num_videos=0)

Reformat prompt for different models.

maxtext.multimodal.processor.reformat_response(response, model_name)

Reformat response for different models.

maxtext.multimodal.processor.prepare_text_for_image_fusion(tokens, config, processor_output=None)

Prepare text by adding extra tokens for image fusion based on the model.
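One common way to add extra tokens for image fusion is to expand each image placeholder into a run of reserved token ids that image embeddings later overwrite. The sketch below shows that idea on a plain list of token ids. The function name, its parameters, and the fixed-length expansion are hypothetical and not taken from MaxText.

```python
from typing import List


def expand_image_placeholders(
    tokens: List[int],
    placeholder_id: int,
    tokens_per_image: int,
    soft_token_id: int,
) -> List[int]:
    """Replace each placeholder id with tokens_per_image copies of soft_token_id."""
    expanded: List[int] = []
    for token in tokens:
        if token == placeholder_id:
            # Reserve one position per image embedding vector.
            expanded.extend([soft_token_id] * tokens_per_image)
        else:
            expanded.append(token)
    return expanded
```

The text tokens keep their relative order; only the sequence length and the positions following each image change.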

maxtext.multimodal.processor.get_dummy_image_shape_for_init(model_name, batch_size=1, num_image_per_sequence=1)

Return the shape of the dummy image used to initialize a specific model.

maxtext.multimodal.processor.get_dummy_audio_shape_for_init(config)

Return the shape of the dummy audio used to initialize a specific model.

Parameters:

config – Model configuration containing audio parameters

Returns:

A tuple representing the audio shape, (batch, num_mel_bins, audio_length), or an empty tuple if audio is not configured for the model.

Return type:

tuple
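The return contract (a three-element shape, or an empty tuple when audio is not configured) can be sketched as follows. `AudioSketchConfig` and all of its field names and default values are hypothetical; the real configuration keys are not documented on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AudioSketchConfig:
    """Hypothetical config; field names and defaults are illustrative only."""
    use_audio: bool = False
    batch_size: int = 1
    num_mel_bins: int = 128
    audio_length: int = 3000


def dummy_audio_shape(config: AudioSketchConfig) -> Tuple[int, ...]:
    """Return (batch, num_mel_bins, audio_length), or () if audio is disabled."""
    if not config.use_audio:
        return ()
    return (config.batch_size, config.num_mel_bins, config.audio_length)
```

Because an empty tuple is falsy, a caller can use `if shape:` to skip creating a dummy audio input for models without audio.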

maxtext.multimodal.processor.get_bidirectional_mask_vision(config, decoder_input_tokens)

Get the bidirectional mask for vision tokens for specific models.

maxtext.multimodal.processor.get_bidirectional_mask_audio(config, decoder_input_tokens)

Get the bidirectional mask for audio tokens for specific models.
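To make the idea of a bidirectional mask concrete, here is a minimal pure-Python sketch. It assumes the mask is derived by comparing decoder input tokens against a media placeholder id, and that marked positions may attend to each other regardless of order while all other positions stay causal. The function names and this exact rule are assumptions rather than MaxText's implementation, and the sketch does not distinguish between separate images or audio clips.

```python
from typing import List


def placeholder_positions(tokens: List[int], placeholder_id: int) -> List[bool]:
    """Mark which positions hold a media placeholder token."""
    return [token == placeholder_id for token in tokens]


def attention_allowed(tokens: List[int], placeholder_id: int) -> List[List[bool]]:
    """Causal mask, relaxed so placeholder positions can attend to each other.

    Entry [i][j] is True when query position i may attend to key position j.
    """
    is_media = placeholder_positions(tokens, placeholder_id)
    n = len(tokens)
    return [
        [j <= i or (is_media[i] and is_media[j]) for j in range(n)]
        for i in range(n)
    ]
```

With tokens `[5, 9, 9, 6]` and placeholder id `9`, position 1 may attend forward to position 2 because both are placeholders, while position 0 still cannot see position 1.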