HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
Published in arXiv pre-print, 2025
Recommended citation: S. Ashkboos†, M. Nikdan†, S. Tabesh†, R. L. Castro, T. Hoefler, D. Alistarh. (2025). "HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs." arXiv pre-print. https://arxiv.org/abs/2501.02625
HALO introduces a quantization-aware training pipeline that inserts lightweight Hadamard rotations before every large matrix multiplication, smoothing activation and gradient outliers so that the entire forward and backward pass can run in 8-bit (or lower) precision without accuracy loss. Combined with custom CUDA kernels and FSDP-compatible communication, HALO delivers up to 1.4× end-to-end fine-tuning speedups on Llama models while remaining compatible with PEFT techniques.
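A minimal sketch of the core idea (not the paper's CUDA kernels): because a normalized Hadamard matrix H is orthogonal, (xH)(HᵀW) = xW exactly in full precision, so the rotation changes nothing mathematically but spreads outlier energy across dimensions, which makes per-tensor low-precision quantization far less lossy. The helper `fake_int8_quant` below is an illustrative simulation, not HALO's actual quantizer.

```python
import torch
from scipy.linalg import hadamard

def fake_int8_quant(t: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor fake int8 quantization (simulation only)."""
    scale = t.abs().max() / 127.0
    return (t / scale).round().clamp(-128, 127) * scale

d = 512                                                      # hidden size (power of two)
H = torch.tensor(hadamard(d), dtype=torch.float32) / d**0.5  # orthonormal Hadamard matrix

x = torch.randn(4, d)
x[:, 0] *= 50.0                         # inject an activation outlier
W = torch.randn(d, d) / d**0.5

ref = x @ W                             # full-precision reference

# Quantize directly: the outlier column dominates the per-tensor scale.
naive = fake_int8_quant(x) @ fake_int8_quant(W)

# Rotate first: outlier energy is spread across all dimensions before quantizing.
rotated = fake_int8_quant(x @ H) @ fake_int8_quant(H.T @ W)

print("relative error without rotation:", ((naive - ref).norm() / ref.norm()).item())
print("relative error with Hadamard rotation:", ((rotated - ref).norm() / ref.norm()).item())
```

In the paper this rotation is fused into low-precision matmul kernels on both the forward and backward paths rather than applied as a separate full-precision step as in this toy example.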