Optimized models tiering

For each TPU platform listed below, we provide a list of models optimized[1] [2] for pre-training. If you are getting started with MaxText or want to push performance, we recommend starting with a Gold model and its accompanying pre-training recipe.

Gold Tier: Fully Optimized Models certified to run with maximum efficiency on Cloud TPUs. They are thoroughly refined for the highest possible performance, making them ideal for production-critical workloads requiring peak throughput.
Silver Tier: High Performance Models that are well-optimized to deliver high, reliable performance on Cloud TPUs. They are effective for most use cases but may offer opportunities for expert tuning to achieve peak (Gold Tier) performance.
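
As a rough sanity check on how the MFU and tokens/sec/device columns in the tables below relate, the following sketch estimates MFU from throughput. It is not the method used to produce the published numbers. It assumes the common approximation of 6 × (parameter count) training FLOPs per token, which ignores attention FLOPs, and it assumes peak dense BF16 figures of 918 TFLOP/s per Trillium (v6e) chip and 459 TFLOP/s per v5p chip. The function name `estimate_mfu`, the peak-FLOPs table, and the nominal parameter counts are illustrative assumptions and are not part of MaxText.

```python
# Rough MFU estimate from throughput, using the 6 * N FLOPs-per-token
# approximation for dense transformer training. Attention FLOPs are ignored,
# so this tends to underestimate MFU, more so at long sequence lengths.

# ASSUMPTION: peak dense BF16 throughput per chip, in FLOP/s.
# Verify against the Cloud TPU documentation before relying on these.
PEAK_BF16_FLOPS = {
    "v6e": 918e12,
    "v5p": 459e12,
}


def estimate_mfu(tokens_per_sec_per_device: float,
                 num_params: float,
                 platform: str) -> float:
    """Return approximate model FLOPs utilization as a fraction (0 to 1)."""
    # Forward + backward matmul FLOPs per token (hypothetical helper logic).
    flops_per_token = 6.0 * num_params
    achieved_flops = tokens_per_sec_per_device * flops_per_token
    return achieved_flops / PEAK_BF16_FLOPS[platform]


if __name__ == "__main__":
    # Llama 2 70B on v5p: ~692 tokens/sec/device, nominal 70e9 parameters.
    print(f"Estimated MFU: {estimate_mfu(692, 70e9, 'v5p'):.1%}")
```

For the v5p Llama 2 70B row, this estimate lands a couple of points below the listed 65.4%, which is consistent with the attention FLOPs that the approximation leaves out. For mixture-of-experts models such as Mixtral, the active parameter count per token would be the more appropriate input than the total parameter count.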

Trillium (v6e)

Gold

Model	Recipe	Benchmark Configuration	MFU	Approx tokens/sec/device
Llama 2 70B	Link	256 Chips, BF16, SL=4096	43.8%	900
Llama 3.1 8B	Link	256 Chips, BF16, SL=8192	45.46%	7,207
Llama 3.1 70B	Link	256 Chips, BF16, SL=8192	50.33%	960

Silver

Model	Recipe	Benchmark Configuration	MFU	Approx tokens/sec/device
Llama 3.1 405B	Link	256 Chips, BF16, SL=8192	38.55%	123
Mixtral 8X7B	Link	256 Chips, BF16, SL=4096	35.23%	3,899
Mixtral 8X22B	Link	256 Chips, BF16, SL=4096	36.2%	1,326

v5p

Gold

Model	Recipe	Benchmark Configuration	MFU	Approx tokens/sec/device
Llama 2 70B	Link	512 Chips, BF16, SL=4096	65.4%	692

Silver

Model	Recipe	Benchmark Configuration	MFU	Approx tokens/sec/device
Mixtral 8X7B	Link	256 Chips (8x4x4), BF16, SL=4096	52.56%	2,909