Quartet: Native FP4 Training Can Be Optimal for Large Language Models

Published as an arXiv preprint, 2025

Recommended citation: R. L. Castro, A. Panferov, S. Tabesh, O. Sieberling, J. Chen, M. Nikdan, S. Ashkboos, D. Alistarh. (2025). "Quartet: Native FP4 Training Can Be Optimal for Large Language Models." arXiv preprint arXiv:2505.14669. https://arxiv.org/abs/2505.14669

We introduce Quartet, a new approach for training large language models (LLMs) directly in low-precision FP4 arithmetic, which significantly reduces computational costs. Our method overcomes the typical accuracy degradation of low-precision training and enables accurate, end-to-end FP4 training. Through optimized implementations for NVIDIA’s Blackwell GPUs, we show that Quartet achieves state-of-the-art accuracy for FP4, making it a viable and efficient alternative to standard-precision training.
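To give a concrete sense of what "FP4 arithmetic" means in this setting, the sketch below simulates round-to-nearest quantization to the FP4 E2M1 format (the 4-bit floating-point type natively supported on NVIDIA Blackwell GPUs) with a per-tensor scale. This is a generic, illustrative helper; the function name quantize_fp4 and the overall scheme are assumptions for demonstration and do not reproduce Quartet's actual training algorithm or kernels.

import torch

# Representable magnitudes of the FP4 E2M1 format
# (sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_VALUES = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Round-to-nearest FP4 (E2M1) quantization with a per-tensor scale.

    Returns the dequantized tensor (for simulation) and the scale used.
    Hypothetical helper for illustration only, not Quartet's quantizer.
    """
    # Scale so the largest magnitude maps to the largest FP4 value (6.0).
    scale = x.abs().max().clamp(min=1e-12) / FP4_E2M1_VALUES.max()
    scaled = x / scale
    # Snap each scaled magnitude to the nearest representable FP4 value.
    idx = torch.argmin((scaled.abs().unsqueeze(-1) - FP4_E2M1_VALUES).abs(), dim=-1)
    quantized = torch.sign(scaled) * FP4_E2M1_VALUES[idx]
    return quantized * scale, scale

if __name__ == "__main__":
    # Simulate a low-precision forward matmul by quantizing weights and activations.
    w = torch.randn(256, 256)
    a = torch.randn(32, 256)
    w_q, _ = quantize_fp4(w)
    a_q, _ = quantize_fp4(a)
    out = a_q @ w_q.t()
    print("max weight quantization error:", (w - w_q).abs().max().item())

In actual FP4 training the matrix multiplications themselves run in 4-bit precision on hardware; the simulation above only illustrates the rounding error such formats introduce, which is the accuracy degradation Quartet is designed to overcome.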

Access the paper here: https://arxiv.org/abs/2505.14669