maxtext.multimodal.processor module
Multimodal data preprocessor router.
- maxtext.multimodal.processor.preprocess_mm_data(config)
Preprocesses multimodal data based on the provided configuration. Routes to the appropriate preprocessing function based on the model name.
- Parameters:
config – A pyconfig.Config object containing configuration parameters.
- Returns:
A PreprocessorOutput object containing the processed multimodal data.
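As a rough illustration of the routing described above, the sketch below dispatches on a model name through a dictionary of preprocessing functions. Everything in it is hypothetical: `SketchConfig`, `SketchOutput`, the `model_name` and `image_path` fields, the `model-a`/`model-b` names, and the registry are stand-ins for illustration, not MaxText's actual implementation or config schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchConfig:
  """Hypothetical stand-in for pyconfig.Config; field names are assumptions."""
  model_name: str
  image_path: str = ""


@dataclass
class SketchOutput:
  """Hypothetical stand-in for PreprocessorOutput."""
  model_name: str
  steps: List[str] = field(default_factory=list)


def _preprocess_model_a(config: SketchConfig) -> SketchOutput:
  # Placeholder for one model family's image pipeline.
  return SketchOutput(config.model_name, ["resize", "normalize"])


def _preprocess_model_b(config: SketchConfig) -> SketchOutput:
  # Placeholder for a different model family's image pipeline.
  return SketchOutput(config.model_name, ["tile", "normalize"])


# Hypothetical registry mapping a model-name prefix to its preprocessor.
_ROUTES: Dict[str, Callable[[SketchConfig], SketchOutput]] = {
    "model-a": _preprocess_model_a,
    "model-b": _preprocess_model_b,
}


def route_preprocess(config: SketchConfig) -> SketchOutput:
  """Picks a preprocessor by model-name prefix; raises on unknown models."""
  for prefix, fn in _ROUTES.items():
    if config.model_name.startswith(prefix):
      return fn(config)
  raise ValueError(f"No multimodal preprocessor for {config.model_name!r}")
```

Failing loudly on an unrecognized model name, rather than silently skipping preprocessing, is a design choice of this sketch and may differ from the library's behavior.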
- maxtext.multimodal.processor.preprocess_image_for_training(image, model_name)
Preprocesses a single image for training based on the model name.
- maxtext.multimodal.processor.get_image_offsets(config, processor_output)
Get the increase in total token count after inserting image token placeholders.
- Parameters:
processor_output (PreprocessorOutput | None)
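One way to read "increase in total token count": if each image starts as a single placeholder token in the prompt and is expanded into a fixed number of image tokens, the sequence grows by `num_images * (tokens_per_image - 1)`. The sketch below encodes only that arithmetic. The one-placeholder-per-image assumption, the function name, and its parameters are illustrative; the real function takes `config` and `processor_output` and may compute the offset differently per model.

```python
def image_token_offset(num_images: int, tokens_per_image: int) -> int:
  """Extra tokens added when each 1-token placeholder becomes tokens_per_image tokens."""
  if num_images < 0 or tokens_per_image < 1:
    raise ValueError("num_images must be >= 0 and tokens_per_image >= 1")
  return num_images * (tokens_per_image - 1)


# Example with arbitrary values: 2 images at 256 tokens each -> 2 * 255 = 510.
extra = image_token_offset(num_images=2, tokens_per_image=256)
```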
- maxtext.multimodal.processor.reformat_prompt(prompt, image_placeholder, model_name, num_images, video_placeholder='<|video|>', num_videos=0)
Reformat prompt for different models.
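The docstring does not say what the reformatting involves. One plausible ingredient, shown here purely as an assumption, is making sure the prompt contains one placeholder per image by prepending any that are missing. The helper name and the `<|image|>` string are hypothetical, and model-specific chat templates are not reproduced.

```python
def add_missing_placeholders(prompt: str, image_placeholder: str, num_images: int) -> str:
  """Prepends placeholders until the prompt contains num_images of them."""
  missing = max(0, num_images - prompt.count(image_placeholder))
  return image_placeholder * missing + prompt


# A prompt with no placeholders and two images gains two leading placeholders.
fixed = add_missing_placeholders("Describe the scene.", "<|image|>", 2)
```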
- maxtext.multimodal.processor.reformat_response(response, model_name)
Reformat response for different models.
- maxtext.multimodal.processor.prepare_text_for_image_fusion(tokens, config, processor_output=None)
Prepare text by adding extra tokens for image fusion based on the model.
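A minimal sketch of what "adding extra tokens for image fusion" could look like, assuming each image is marked by a single placeholder token ID that is expanded to a fixed number of slots for image embeddings. The function name, the placeholder ID, and the fixed per-image length are assumptions; the actual behavior is model-dependent.

```python
from typing import List


def expand_image_placeholders(
    tokens: List[int], placeholder_id: int, tokens_per_image: int
) -> List[int]:
  """Replaces each placeholder token with tokens_per_image copies of itself."""
  out: List[int] = []
  for t in tokens:
    if t == placeholder_id:
      # Reserve one slot per image embedding that will be fused in later.
      out.extend([placeholder_id] * tokens_per_image)
    else:
      out.append(t)
  return out


# Token 99 is a made-up placeholder ID; it expands from 1 slot to 3.
expanded = expand_image_placeholders([1, 99, 2], placeholder_id=99, tokens_per_image=3)
```

Under these assumptions the sequence grows by `tokens_per_image - 1` for every placeholder.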
- maxtext.multimodal.processor.get_dummy_image_shape_for_init(model_name, batch_size=1, num_image_per_sequence=1)
Return the shape of the dummy image used to initialize a specific model.
- maxtext.multimodal.processor.get_dummy_audio_shape_for_init(config)
Return the shape of the dummy audio used to initialize a specific model.
- Parameters:
config – Model configuration containing audio parameters
- Returns:
A tuple representing the audio shape, (batch, num_mel_bins, audio_length). Returns an empty tuple if audio is not configured for the model.
- Return type:
tuple
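The return contract above can be sketched as follows. `AudioSketchConfig` and its fields (`use_audio`, `batch_size`, `num_mel_bins`, `audio_length`) are hypothetical names chosen for illustration, as are the default values; the real function reads its audio parameters from the model configuration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AudioSketchConfig:
  """Hypothetical config; field names and defaults are assumptions."""
  use_audio: bool = False
  batch_size: int = 1
  num_mel_bins: int = 128
  audio_length: int = 3000


def dummy_audio_shape(config: AudioSketchConfig) -> Tuple[int, ...]:
  """Returns (batch, num_mel_bins, audio_length), or () when audio is disabled."""
  if not config.use_audio:
    return ()
  return (config.batch_size, config.num_mel_bins, config.audio_length)
```

Callers can test the result for truthiness (`if shape:`) to decide whether to allocate a dummy audio input at all.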