Getting Started

Welcome to MaxText! This guide helps you run your first MaxText workloads, whether on a single host or in a multihost environment using Cloud TPUs or NVIDIA GPUs. Follow the steps below to install MaxText, train your first model, and run inference.

Prerequisites

To store logs and checkpoints, create a Cloud Storage bucket in your project. The TPU or GPU VMs that run MaxText must have read/write permissions for the bucket. These permissions are granted through service account roles, such as the Storage Admin role.
MaxText reads its configuration from a YAML file. We recommend reviewing the configurable options in configs/base.yml. This file defines a decoder-only model of ~1B parameters. Any option can be overridden from the command line. For instance, you can change steps or log_period either by modifying configs/base.yml or by passing steps and log_period as additional arguments to the train.py call. Set base_output_directory to a folder in the bucket you just created.
Checkpoint Conversion: To run MaxText with Hugging Face checkpoints, you must first convert them to the MaxText/Orbax format. For detailed instructions, see the Checkpoint Conversion Guide.
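
The override behavior described above (values in configs/base.yml replaced by key=value arguments on the command line) can be pictured with a small sketch. This is not MaxText's actual parser: the function name apply_overrides, the BASE_CONFIG dictionary, its numeric defaults, and the bucket path gs://my-bucket/maxtext-runs are all hypothetical and exist only to illustrate the precedence rule.

```python
# Hypothetical stand-in for a few options from configs/base.yml.
# The values are illustrative and are NOT MaxText's real defaults.
BASE_CONFIG = {
    "steps": 100,
    "log_period": 10,
    "base_output_directory": "",
}


def apply_overrides(base, args):
    """Return a copy of `base` with `key=value` arguments applied on top."""
    config = dict(base)  # copy so the base configuration is left untouched
    for arg in args:
        key, sep, raw = arg.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key not in config:
            raise KeyError(f"unknown option: {key}")
        try:
            value = int(raw)  # numeric options such as steps
        except ValueError:
            value = raw  # everything else stays a string
        config[key] = value
    return config


if __name__ == "__main__":
    overrides = [
        "steps=10",
        "log_period=5",
        "base_output_directory=gs://my-bucket/maxtext-runs",  # hypothetical bucket
    ]
    print(apply_overrides(BASE_CONFIG, overrides))
```

In this sketch, options that are not overridden keep their base values, and an unknown key is rejected rather than silently added. That is a design choice of the illustration, not a statement about how MaxText itself treats unknown keys.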

Running MaxText on a Single Host

This procedure describes how to run MaxText on a single GPU or TPU host.

1. Installation

Before running MaxText, you must install it on your VM.

For detailed installation instructions, see the Installation Guide.
For TPU VMs, install maxtext[tpu] for pre-training, or maxtext[tpu-post-train] for post-training.
For GPU VMs, ensure you install maxtext[cuda12].
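
As a compact summary of the choices above, the following sketch maps an accelerator type and workload to the package extra named in this guide. The helper maxtext_requirement is hypothetical, and the plain pip install form it prints is an assumption; the Installation Guide remains the authority on the exact command and any version pins.

```python
def maxtext_requirement(accelerator, post_training=False):
    """Return the MaxText requirement string for a VM type (hypothetical helper)."""
    kind = accelerator.lower()
    if kind == "tpu":
        # TPU VMs use a separate extra for post-training workloads.
        return "maxtext[tpu-post-train]" if post_training else "maxtext[tpu]"
    if kind == "gpu":
        # This guide lists a single extra for GPU VMs.
        return "maxtext[cuda12]"
    raise ValueError(f"unsupported accelerator: {accelerator!r}")


if __name__ == "__main__":
    for accelerator, post_training in [("tpu", False), ("tpu", True), ("gpu", False)]:
        requirement = maxtext_requirement(accelerator, post_training)
        # Assumed install form; check the Installation Guide before running it.
        print(f'pip install "{requirement}"')
```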

2. Running Pre-training

To get started with training your first model, refer to the Pre-training Tutorial.

3. Running Post-training

To fine-tune your model or apply post-training techniques (such as SFT or RL), refer to the Post-training Tutorial. This guide covers various post-training workflows.

4. Running Inference

To run inference (decoding) using MaxText models, refer to the Inference Tutorial. This guide covers offline and online inference, as well as integration with vLLM.

Running MaxText on Multiple Hosts

Google Kubernetes Engine (GKE) is the recommended way to run MaxText on multiple hosts. It provides a managed environment for deploying and scaling containerized applications, including those that require TPUs or GPUs. For details, see Running MaxText with Cluster Toolkit or Running MaxText with XPK.

Running MaxText in Notebooks

You can run MaxText interactively using Jupyter notebooks, Google Colab, or Visual Studio Code. Refer to the Notebook Guide for instructions on setting up your notebook environment on TPUs.

Next steps: preflight optimizations

After your workloads are running, you can apply optimizations to improve performance. For more information, see Optimization Tips.