Full fine-tuning on single-host TPUs#

Full Fine-Tuning (FFT) is a common technique used in post-training to adapt a pre-trained Large Language Model (LLM) to a specific downstream task or dataset. In this process, all the parameters (weights) of the original model are “unfrozen” and updated during training on the new task-specific data. This allows the entire model to adjust and specialize, potentially leading to the best performance on the new task.

This tutorial gives step-by-step instructions for setting up the environment, converting a checkpoint, and then training the model on the C4 dataset using FFT.

This tutorial uses a single-host TPU VM, such as v6e-8 or v5p-8. Let’s get started!

Install dependencies#

For instructions on installing MaxText on your VM, please refer to the official documentation and use the maxtext[tpu] installation path to include all necessary dependencies.

Set up environment variables#

Log in to Hugging Face. Provide your access token when prompted:

hf auth login

Set up the following environment variables to configure your training run. Replace placeholders with your actual values.

# -- Model configuration --
# The MaxText model name. See `ModelName` in `src/maxtext/configs/types.py` for the
# full list of supported models.
export MODEL=<MODEL_NAME> # e.g., 'llama3.1-8b-Instruct'

# -- MaxText configuration --
# Use a GCS bucket you own to store logs and checkpoints. Ideally in the same
# region as your TPUs to minimize latency and costs.
# You can list your buckets and their locations in the
# [Cloud Console](https://console.cloud.google.com/storage/browser).
export BASE_OUTPUT_DIRECTORY=<GCS_BUCKET> # e.g., gs://my-bucket/maxtext-runs

# An arbitrary string to identify this specific run.
# We recommend including the model, user, and timestamp.
# Note: Kubernetes requires workload names to be valid DNS labels (lowercase, no underscores or periods).
export RUN_NAME=<RUN_NAME>
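
The DNS-label note is easy to trip over when the run name is built from a mixed-case model name such as the example above. As a minimal sketch, the hypothetical helper `make_run_name` below (not part of MaxText) lowercases its inputs, replaces every run of characters outside `a-z0-9` with a single hyphen, and trims the result to 63 characters, the DNS label length limit. The use of `$USER` and the timestamp format are assumptions; adapt them to your own naming scheme.

```shell
# Hypothetical helper (not part of MaxText): build a DNS-label-safe run name
# from a model name, a user name, and a timestamp.
# Pipeline: join with hyphens, lowercase, squeeze every run of characters
# outside a-z0-9 into one hyphen, cut to 63 characters, then drop a leading
# or trailing hyphen.
make_run_name() {
  printf '%s-%s-%s' "$1" "$2" "$3" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | cut -c1-63 \
    | sed -e 's/^-//' -e 's/-$//'
}

# Assumed inputs: MODEL as set above, the login name in USER, a UTC timestamp.
RUN_NAME="$(make_run_name "${MODEL:-model}" "${USER:-user}" "$(date -u +%Y%m%d-%H%M%S)")"
export RUN_NAME
echo "RUN_NAME=${RUN_NAME}"
```

With these inputs, `make_run_name "llama3.1-8b-Instruct" "Jane_Doe" "20250101-120000"` yields `llama3-1-8b-instruct-jane-doe-20250101-120000`.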

Hugging Face checkpoint to MaxText checkpoint#

This section explains how to prepare your model checkpoint for use with MaxText. You have two options: using an existing MaxText checkpoint or converting a Hugging Face checkpoint.

Option 1: Using an existing MaxText checkpoint#

If you already have a MaxText-compatible model checkpoint, simply set the following environment variable and move on to the next section.

export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-model-checkpoint/0/items

Option 2: Converting a Hugging Face checkpoint#

Follow the steps in Hugging Face to MaxText to convert a Hugging Face checkpoint to the MaxText format. Make sure the converted checkpoint files are complete and saved. As in Option 1, set the following environment variable and move on to the next section.

export MAXTEXT_CKPT_PATH=<CKPT_PATH> # e.g., gs://my-bucket/my-checkpoint-directory/0/items
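
Both options end with `MAXTEXT_CKPT_PATH` pointing at a checkpoint location, and a quick check before training can catch a mistyped path early. The sketch below assumes the checkpoint lives in GCS, as in the examples above. `looks_like_gcs_path` and `check_ckpt_path` are hypothetical helper names, and the listing step only runs when the `gcloud` CLI is installed.

```shell
# Hypothetical helper: succeed only if the argument has the shape
# gs://<bucket>/<path>. This checks the string, not the bucket contents.
looks_like_gcs_path() {
  case "$1" in
    gs://?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: validate the shape, then list the path when gcloud
# is available. Returns non-zero if the shape is wrong or the listing fails.
check_ckpt_path() {
  looks_like_gcs_path "$1" || {
    echo "not a gs://bucket/path URI: $1" >&2
    return 1
  }
  if command -v gcloud >/dev/null 2>&1; then
    gcloud storage ls "$1" >/dev/null
  fi
}
```

For example, `check_ckpt_path "${MAXTEXT_CKPT_PATH?}" && echo "checkpoint path looks usable"` could be run right after the export.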

Dataset#

MaxText provides examples that work with the C4 dataset, which is derived from Common Crawl. The dataset is available in TFRecord format in a cloud bucket, and MaxText provides scripts to copy it to your own Google Cloud Storage (GCS) bucket.

Common Crawl (c4) dataset setup#

Run these steps once per project prior to any local development or cluster experiments.

Create two GCS buckets in your project: one to hold the downloaded dataset and the other to store logs.
Download the dataset to your GCS bucket.

MaxText assumes these GCS buckets are in the same project and that it has permission to read from and write to them.
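
If the buckets do not exist yet, they can be created with the `gcloud storage buckets create` command. As a hedged sketch, the hypothetical helpers below only print the commands so they can be reviewed before running. The project ID, location, and bucket URIs passed in are placeholders for illustration, not values from MaxText.

```shell
# Hypothetical helper: reduce gs://bucket/some/path to gs://bucket.
bucket_root() {
  rest="${1#gs://}"                 # strip the gs:// scheme
  printf 'gs://%s' "${rest%%/*}"    # keep only the part before the first slash
}

# Hypothetical helper: print (not run) one bucket-creation command per URI.
# Arguments: <project-id> <location> <gs-uri>...
print_bucket_cmds() {
  project="$1"
  location="$2"
  shift 2
  for uri in "$@"; do
    printf 'gcloud storage buckets create %s --project=%s --location=%s\n' \
      "$(bucket_root "$uri")" "$project" "$location"
  done
}

# Placeholder values for illustration only.
print_bucket_cmds my-project us-central1 \
  gs://my-dataset-bucket/c4 gs://my-logs-bucket/maxtext-runs
```

Review the printed lines, then run them in your shell. Picking the same location as your TPUs follows the recommendation given earlier for `BASE_OUTPUT_DIRECTORY`.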

export PROJECT_ID=<PROJECT_ID>
export DATASET_GCS_BUCKET=<DATASET_PATH> # e.g., gs://my-bucket/my-dataset

bash tools/data_generation/download_dataset.sh ${PROJECT_ID?} ${DATASET_GCS_BUCKET?}

The command above downloads the C4 dataset to the bucket path set in DATASET_GCS_BUCKET.

Sample full fine-tuning script#

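Below is a sample training command. It loads the parameters from MAXTEXT_CKPT_PATH and runs 10 training steps with a per-device batch size of 1.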

python3 -m maxtext.trainers.pre_train.train \
  run_name=${RUN_NAME?} \
  base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
  load_parameters_path=${MAXTEXT_CKPT_PATH?} \
  model_name=${MODEL?} \
  dataset_path=${DATASET_GCS_BUCKET?} \
  async_checkpointing=False  \
  steps=10 per_device_batch_size=1
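
The `${VAR?}` expansions in the command above make the shell abort with an error if a variable is unset, so a missing export stops the run before training starts. To see every missing variable at once, a small pre-flight check can help. The sketch below is an illustration, not part of MaxText; `missing_vars` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names, separated by spaces, of the given
# variables that are unset or empty. Call it inside $(...) so its working
# variables stay in a subshell.
missing_vars() {
  _missing=""
  for _name in "$@"; do
    # Look up the variable named by $_name. The names come from the fixed
    # list passed by the caller, never from untrusted input.
    eval "_value=\${$_name:-}"
    [ -n "$_value" ] || _missing="${_missing:+$_missing }$_name"
  done
  printf '%s' "$_missing"
}

# The five variables referenced by the training command.
missing="$(missing_vars RUN_NAME BASE_OUTPUT_DIRECTORY MAXTEXT_CKPT_PATH MODEL DATASET_GCS_BUCKET)"
if [ -n "$missing" ]; then
  echo "Unset or empty: $missing" >&2
else
  echo "All training variables are set."
fi
```

Note that `${VAR?}` only catches unset variables, while this check also reports variables that are set to an empty string.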

You can find some end-to-end scripts here. They can serve as a reference point for your own scripts.

Parameters to achieve high MFU#

This content is in progress.