# Optimized models tiering
For each of the TPU platforms listed below, we present a list of optimized models[1][2] for pre-training. If you're new to MaxText or want to maximize performance, we recommend choosing a Gold model and its accompanying pre-training recipe.
Gold Tier: Fully optimized models, certified to run with maximum efficiency on Cloud TPUs. They are thoroughly tuned for the highest possible performance, making them ideal for production-critical workloads that require peak throughput.
Silver Tier: High-performance models that are well optimized to deliver reliable performance on Cloud TPUs. They are effective for most use cases, but expert tuning may still be needed to reach peak (Gold Tier) performance.
## Trillium (v6e)
### Gold
### Silver
## v5p
### Gold
| Model | Recipe | Benchmark Configuration | MFU | Approx. tokens/sec/device |
|---|---|---|---|---|
| Llama 2 70B | | 512 chips, BF16, SL=4096 | 65.4% | 692 |
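MFU (model FLOPs utilization) ties the last two columns together: it is the model FLOPs a chip actually performs per second divided by that chip's peak FLOP/s. The sketch below is a rough sanity check, not MaxText's own accounting. It assumes the common 6·N training-FLOPs-per-token approximation for a dense decoder plus a PaLM-style attention term, an approximate Llama 2 70B parameter count of 69e9 with 80 layers and a model width of 8192, and a v5p peak of 459 bf16 TFLOP/s per chip. Only the throughput (692) and sequence length (4096) come from the table above; the function names are hypothetical.

```python
# Rough MFU estimate from per-device throughput (a sketch, not MaxText's exact accounting).


def train_flops_per_token(n_params: float, n_layers: int, d_model: int, seq_len: int) -> float:
    """Approximate training FLOPs per token for a dense decoder-only transformer.

    6 * n_params covers the matmuls in the forward (2N) and backward (4N) passes;
    12 * n_layers * d_model * seq_len adds the attention-score matmuls
    (PaLM-style formula, with no discount for causal masking).
    """
    return 6 * n_params + 12 * n_layers * d_model * seq_len


def mfu(tokens_per_sec_per_device: float, flops_per_token: float, peak_flops_per_device: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s divided by peak FLOP/s."""
    return tokens_per_sec_per_device * flops_per_token / peak_flops_per_device


# Assumed values (not stated on this page):
LLAMA2_70B_PARAMS = 69e9  # approximate Llama 2 70B parameter count
V5P_PEAK_BF16 = 459e12    # assumed peak bf16 FLOP/s per v5p chip

flops = train_flops_per_token(LLAMA2_70B_PARAMS, n_layers=80, d_model=8192, seq_len=4096)
estimate = mfu(692, flops, V5P_PEAK_BF16)  # 692 tokens/sec/device from the table
print(f"Estimated MFU: {estimate:.1%}")
```

Under these assumptions the estimate comes out at about 67%, in the same range as the 65.4% reported above; the gap reflects differences in how parameters and attention FLOPs are counted.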
### Silver
| Model | Recipe | Benchmark Configuration | MFU | Approx. tokens/sec/device |
|---|---|---|---|---|
| Mixtral 8x7B | | 256 chips (8x4x4), BF16, SL=4096 | 52.56% | 2,909 |
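For a mixture-of-experts model like Mixtral 8x7B, each token passes through only its routed experts, so MFU should be computed from the *active* parameters rather than the total parameter count. The following sketch assumes the published Mixtral 8x7B configuration (32 layers, width 4096, expert FFN width 14336, 32 query heads, 8 KV heads, 8 experts with top-2 routing, 32,000-token vocabulary) and a v5p peak of 459 bf16 TFLOP/s per chip. The function names are hypothetical, and only the throughput (2,909) and sequence length (4096) come from the table; MaxText's own MFU accounting may differ.

```python
# Sketch: MFU for a mixture-of-experts model counts only the *active* parameters per token.
# Architecture defaults below are assumptions based on the published Mixtral 8x7B config.


def mixtral_active_matmul_params(n_layers=32, d_model=4096, d_ff=14336, n_heads=32,
                                 n_kv_heads=8, n_experts=8, top_k=2, vocab=32000) -> int:
    head_dim = d_model // n_heads
    # q and o projections are d_model x d_model; k and v are shrunk by grouped-query attention.
    attn = 2 * d_model * d_model + 2 * d_model * n_kv_heads * head_dim
    expert = 3 * d_model * d_ff        # SwiGLU expert: gate, up, and down projections
    router = d_model * n_experts       # gating layer scores every expert
    per_layer = attn + top_k * expert + router  # only top_k experts run per token
    lm_head = d_model * vocab          # output projection (embedding lookup is not a matmul)
    return n_layers * per_layer + lm_head


def mfu(tokens_per_sec_per_device, n_active_params, n_layers, d_model, seq_len, peak_flops):
    # 6*N for forward+backward matmuls, plus a PaLM-style attention term (no causal discount).
    flops_per_token = 6 * n_active_params + 12 * n_layers * d_model * seq_len
    return tokens_per_sec_per_device * flops_per_token / peak_flops


V5P_PEAK_BF16 = 459e12  # assumed peak bf16 FLOP/s per v5p chip
active = mixtral_active_matmul_params()
estimate = mfu(2909, active, n_layers=32, d_model=4096, seq_len=4096, peak_flops=V5P_PEAK_BF16)
print(f"Active matmul params: {active / 1e9:.2f}B, estimated MFU: {estimate:.1%}")
```

Under these assumptions about 12.75B parameters are active per token, and the estimate lands close to the 52.56% in the table. Counting all eight experts instead would more than triple the FLOPs per token and badly overstate MFU.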