maxtext.multimodal.processor module

Multimodal data preprocessor router.

maxtext.multimodal.processor.preprocess_mm_data(config)

Preprocesses multimodal data based on the provided configuration. Routes to the appropriate preprocessing function based on the model name.

Parameters: config – A pyconfig.Config object containing configuration parameters.
Returns: A PreprocessorOutput object containing the processed multimodal data.
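
The routing described above can be pictured as a lookup from model name to preprocessing function. The following is only a minimal sketch of that dispatch pattern, not the module's actual code: `SketchOutput`, the `family-a` / `family-b` names, and `route_preprocess` are hypothetical stand-ins, and the real `PreprocessorOutput` fields are not documented on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchOutput:
    """Hypothetical stand-in for PreprocessorOutput."""
    model_family: str
    steps: List[str] = field(default_factory=list)


def _preprocess_family_a(image_path: str) -> SketchOutput:
    # Placeholder for a model-specific pipeline (illustrative only).
    return SketchOutput("family-a", [f"load {image_path}", "resize"])


def _preprocess_family_b(image_path: str) -> SketchOutput:
    # A second, different pipeline to show why routing is needed.
    return SketchOutput("family-b", [f"load {image_path}", "tile"])


# Registry mapping a (hypothetical) model-name prefix to its preprocessor.
_ROUTES: Dict[str, Callable[[str], SketchOutput]] = {
    "family-a": _preprocess_family_a,
    "family-b": _preprocess_family_b,
}


def route_preprocess(model_name: str, image_path: str) -> SketchOutput:
    """Pick the preprocessor whose prefix matches model_name."""
    for prefix, preprocess in _ROUTES.items():
        if model_name.startswith(prefix):
            return preprocess(image_path)
    raise ValueError(f"No multimodal preprocessor registered for {model_name!r}")


result = route_preprocess("family-a-4b", "cat.png")
```

Failing loudly on an unknown model name, rather than silently falling back to a default pipeline, keeps a mismatched preprocessor from producing inputs the model was never trained on.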

maxtext.multimodal.processor.preprocess_image_for_training(image, model_name)

Preprocesses a single image for training based on the model name.

maxtext.multimodal.processor.get_image_offsets(config, processor_output)

Get the increase in total token count after inserting image token placeholders.

Parameters: processor_output (PreprocessorOutput | None)
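
To make "increase in total token count" concrete, here is a small sketch of the arithmetic under one stated assumption: each image appears in the prompt as a single placeholder token that is later expanded to a fixed number of tokens. The helper names and the placeholder id `99` are hypothetical, and the real function may account for model-specific extras such as begin or end markers.

```python
from typing import List


def token_count_increase(num_images: int, tokens_per_image: int) -> int:
    """Extra tokens added when each single placeholder expands to tokens_per_image."""
    if num_images < 0 or tokens_per_image < 1:
        raise ValueError("num_images must be >= 0 and tokens_per_image >= 1")
    # One placeholder was already counted, so each image adds (tokens_per_image - 1).
    return num_images * (tokens_per_image - 1)


def expand_placeholders(tokens: List[int], placeholder_id: int,
                        tokens_per_image: int) -> List[int]:
    """Replace every placeholder token with tokens_per_image copies of itself."""
    expanded: List[int] = []
    for token in tokens:
        if token == placeholder_id:
            expanded.extend([placeholder_id] * tokens_per_image)
        else:
            expanded.append(token)
    return expanded


tokens = [1, 99, 5, 99, 2]          # two image placeholders (id 99 is illustrative)
expanded = expand_placeholders(tokens, placeholder_id=99, tokens_per_image=4)
offset = token_count_increase(num_images=2, tokens_per_image=4)
```

Here `len(expanded) - len(tokens)` equals `offset`, which is the quantity a caller would need in order to shift positions or adjust sequence-length budgets.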

maxtext.multimodal.processor.reformat_prompt(prompt, image_placeholder, model_name, num_images, video_placeholder='<|video|>', num_videos=0)

Reformat prompt for different models.
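
As an illustration of what per-model prompt reformatting can involve, the sketch below makes sure the prompt carries one placeholder per image and then wraps it in a model-specific template. This is an assumption about the general approach, not the library's behavior: the template strings, the `family-a` model name, and both helper functions are hypothetical.

```python
from typing import Dict


def ensure_placeholders(prompt: str, placeholder: str, count: int) -> str:
    """Prepend placeholders until the prompt contains exactly `count` of them."""
    present = prompt.count(placeholder)
    if present > count:
        raise ValueError(f"prompt has {present} placeholders but only {count} inputs")
    return placeholder * (count - present) + prompt


# Hypothetical chat templates keyed by a hypothetical model name.
_TEMPLATES: Dict[str, str] = {
    "family-a": "<user>{prompt}</user><assistant>",
}


def reformat_prompt_sketch(prompt: str, image_placeholder: str,
                           model_name: str, num_images: int) -> str:
    """Insert missing image placeholders, then apply the model's template."""
    prompt = ensure_placeholders(prompt, image_placeholder, num_images)
    template = _TEMPLATES.get(model_name, "{prompt}")  # pass through if unknown
    return template.format(prompt=prompt)


formatted = reformat_prompt_sketch("Describe this.", "<|image|>", "family-a", 1)
```

The same pattern would extend to `video_placeholder` and `num_videos` by calling `ensure_placeholders` a second time before applying the template.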

maxtext.multimodal.processor.reformat_response(response, model_name)

Reformat response for different models.

maxtext.multimodal.processor.prepare_text_for_image_fusion(tokens, config, processor_output=None)

Prepare text by adding extra tokens for image fusion based on the model.

maxtext.multimodal.processor.get_dummy_image_shape_for_init(model_name, batch_size=1, num_image_per_sequence=1)

Return the shape of the dummy image for a specific model’s initialization.

maxtext.multimodal.processor.get_dummy_audio_shape_for_init(config)

Return the shape of the dummy audio for a specific model’s initialization.

Parameters: config – Model configuration containing audio parameters.
Returns: The audio shape as (batch, num_mel_bins, audio_length), or an empty tuple if audio is not configured for the model.
Return type: Tuple
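
The "shape or empty tuple" contract can be sketched as follows. The config attribute names (`use_audio`, `batch_size`, `num_mel_bins`, `audio_length`) are assumptions chosen for illustration and are not confirmed by this page; only the returned layout (batch, num_mel_bins, audio_length) comes from the description above.

```python
from types import SimpleNamespace
from typing import Tuple


def dummy_audio_shape(config) -> Tuple[int, ...]:
    """Return (batch, num_mel_bins, audio_length), or () when audio is disabled."""
    # `use_audio` is a hypothetical flag; a missing attribute counts as disabled.
    if not getattr(config, "use_audio", False):
        return ()
    return (config.batch_size, config.num_mel_bins, config.audio_length)


audio_config = SimpleNamespace(use_audio=True, batch_size=1,
                               num_mel_bins=128, audio_length=3000)
text_only_config = SimpleNamespace(use_audio=False)

shape = dummy_audio_shape(audio_config)
no_shape = dummy_audio_shape(text_only_config)
```

Because an empty tuple is falsy in Python, a caller can write `if shape:` to decide whether to allocate a dummy audio input at all.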

maxtext.multimodal.processor.get_bidirectional_mask_vision(config, decoder_input_tokens)

Get the bidirectional mask over vision inputs for specific models.

maxtext.multimodal.processor.get_bidirectional_mask_audio(config, decoder_input_tokens)

Get the bidirectional mask over audio inputs for specific models.
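
One common way to build such a mask, shown here only as a hedged sketch, is to mark every position whose token equals a modality placeholder id and then let marked positions attend to each other regardless of order, while all other positions stay causal. The placeholder id `99` and both helper names are hypothetical, plain Python lists stand in for arrays, and this simplified version does not separate the tokens of different images or audio clips.

```python
from typing import List


def placeholder_mask(decoder_input_tokens: List[List[int]],
                     placeholder_id: int) -> List[List[bool]]:
    """True wherever a token is the (hypothetical) modality placeholder."""
    return [[token == placeholder_id for token in row]
            for row in decoder_input_tokens]


def combine_with_causal(mask_row: List[bool]) -> List[List[bool]]:
    """allowed[i][j]: query i may attend to key j.

    Causal attention (j <= i) is always allowed; placeholder positions may
    additionally attend to later placeholder positions.
    """
    n = len(mask_row)
    return [[(j <= i) or (mask_row[i] and mask_row[j]) for j in range(n)]
            for i in range(n)]


tokens = [[7, 99, 99, 8]]            # one sequence with two placeholder tokens
mask = placeholder_mask(tokens, placeholder_id=99)
allowed = combine_with_causal(mask[0])
```

In array code the first helper reduces to an elementwise comparison of `decoder_input_tokens` against the placeholder id, which is why the function only needs the config and the token ids.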