maxtext.input_pipeline.tfds_data_processing module


Input pipeline for an LM1B dataset.

maxtext.input_pipeline.tfds_data_processing.get_datasets(dataset_name, data_split, shuffle_files, shuffle_seed, dataloading_host_index, dataloading_host_count, dataset_path=None)

Load a TFDS dataset.
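The `dataloading_host_index` and `dataloading_host_count` arguments suggest that each data-loading host reads its own slice of the dataset. The sketch below illustrates that idea in plain Python, assuming a simple round-robin split. `shard_for_host` is a hypothetical helper, not part of MaxText, and this page does not state which sharding strategy `get_datasets` actually uses.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_for_host(examples: Iterable[T], host_index: int, host_count: int) -> Iterator[T]:
    """Yield every host_count-th example, starting at offset host_index.

    Hypothetical helper for illustration only (assumed round-robin split).
    """
    if not 0 <= host_index < host_count:
        raise ValueError("host_index must be in [0, host_count)")
    for i, example in enumerate(examples):
        if i % host_count == host_index:
            yield example


# Ten toy examples split across four hosts.
examples = list(range(10))
shards = [list(shard_for_host(examples, h, 4)) for h in range(4)]
```

Under this assumption the shards are disjoint and together cover the whole dataset, so no example is read by two hosts.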

maxtext.input_pipeline.tfds_data_processing.preprocessing_pipeline(dataset, tokenizer_path, tokenizer_type, global_batch_size, max_target_length, data_column_names, shuffle=False, data_shuffle_seed=0, tokenize=True, add_bos=True, add_eos=True, num_epochs=1, pack_examples=True, shuffle_buffer_size=1024, shift=True, drop_remainder=True, prefetch_size=-1, use_dpo=False, hf_access_token='')

Pipeline for preprocessing a TFDS dataset.

Parameters:

tokenizer_type (str)
global_batch_size (int)
max_target_length (int)
shuffle (bool)
tokenize (bool)
add_bos (bool)
add_eos (bool)
num_epochs (None | int)
pack_examples (bool)
shuffle_buffer_size (int)
shift (bool)
drop_remainder (bool)
use_dpo (bool)
hf_access_token (str)
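The `add_bos`, `add_eos`, and `shift` flags are not described on this page. Their names suggest the usual next-token-prediction setup, in which special tokens are added around each sequence and the targets are the inputs offset by one position. The sketch below shows one common convention in plain Python. The token ids and both helper functions are hypothetical, and the real pipeline may implement the shift differently (for example, by padding rather than truncating).

```python
from typing import List, Sequence, Tuple

BOS_ID = 1  # hypothetical beginning-of-sequence id
EOS_ID = 2  # hypothetical end-of-sequence id


def add_special_tokens(ids: Sequence[int], add_bos: bool = True, add_eos: bool = True) -> List[int]:
    """Optionally prepend BOS and append EOS to a token id sequence."""
    out = list(ids)
    if add_bos:
        out = [BOS_ID] + out
    if add_eos:
        out = out + [EOS_ID]
    return out


def make_inputs_and_targets(ids: Sequence[int], shift: bool = True) -> Tuple[List[int], List[int]]:
    """Build (inputs, targets); with shift=True, targets[i] is the token after inputs[i]."""
    ids = list(ids)
    if shift:
        return ids[:-1], ids[1:]
    return ids, list(ids)


tokens = add_special_tokens([10, 11, 12])
inputs, targets = make_inputs_and_targets(tokens)
```

With this convention, the model at each position sees all tokens up to and including `inputs[i]` and is trained to predict `targets[i]`.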

maxtext.input_pipeline.tfds_data_processing.make_tfds_train_iterator(config, global_mesh, process_indices_train)

Load the training dataset, preprocess it, and return iterators.

Parameters:

config (ConfigDict)

maxtext.input_pipeline.tfds_data_processing.make_tfds_eval_iterator(config, global_mesh, process_indices_eval)

Load the eval dataset, preprocess it, and return iterators.

Parameters:

config (ConfigDict)
