Monitoring and debugging
🕵️ Features & Diagnostics
Diagnostic tools and features for monitoring MaxText.
☁️ GCP Observability
Observability for workloads running on Google Cloud Platform.
🚫 Hang Playbook
Troubleshooting guide for training hangs at megascale.
📈 Goodput
Measuring Goodput, the share of total job time spent making productive training progress.
📊 Logs & Metrics
Understanding MaxText logs and performance metrics.
📉 TensorBoard
Using Vertex AI TensorBoard to visualize training metrics.
⏱️ XProf
Profiling performance with XProf.