maxtext.multimodal package
Submodules
- maxtext.multimodal.processor module
- maxtext.multimodal.processor_gemma3 module
- maxtext.multimodal.processor_gemma4 module
- maxtext.multimodal.processor_llama4 module
  - Llama4PreprocessorOutput
  - get_factors()
  - find_supported_resolutions()
  - get_best_resolution()
  - pad_to_best_fit_jax()
  - pad_to_max_tiles()
  - split_to_tiles()
  - preprocess_mm_data_llama4()
  - get_num_tokens_for_this_image()
  - get_image_offsets_llama4()
  - reformat_prompt_llama4()
  - get_tokens_for_this_image()
  - add_extra_tokens_for_images_llama4()
  - get_dummy_image_shape_for_init_llama4()
- maxtext.multimodal.processor_qwen3_omni module
  - Qwen3OmniPreprocessorOutput
    - Qwen3OmniPreprocessorOutput.num_images
    - Qwen3OmniPreprocessorOutput.pixel_values
    - Qwen3OmniPreprocessorOutput.pixel_grid_thw
    - Qwen3OmniPreprocessorOutput.num_videos
    - Qwen3OmniPreprocessorOutput.video_values
    - Qwen3OmniPreprocessorOutput.video_grid_thw
    - Qwen3OmniPreprocessorOutput.video_second_per_grid
    - Qwen3OmniPreprocessorOutput.num_audios
    - Qwen3OmniPreprocessorOutput.audio_values
    - Qwen3OmniPreprocessorOutput.audio_mask
    - Qwen3OmniPreprocessorOutput.audio_lengths
  - smart_resize()
  - pre_process_qwen3_image()
  - calculate_video_frame_range()
  - smart_nframes()
  - preprocess_video()
  - pre_process_audio_qwen3_omni()
  - preprocess_mm_data_qwen3_omni()
  - add_extra_tokens_for_qwen3_omni()
  - get_dummy_image_shape_for_init_qwen3_omni()
  - get_dummy_audio_shape_for_init_qwen3_omni()
  - get_llm_pos_ids_for_vision()
  - get_chunked_index()
  - get_rope_index()
  - reformat_prompt_qwen3_omni()
  - get_mm_offsets_qwen3_omni()
- maxtext.multimodal.utils module