HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs

Published as an arXiv preprint, 2025

Recommended citation: S. Ashkboos, M. Nikdan, S. Tabesh, R. L. Castro, T. Hoefler, D. Alistarh. (2025). "HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs." arXiv preprint. https://arxiv.org/abs/2501.02625

HALO introduces a quantization-aware training pipeline that inserts lightweight Hadamard rotations before every large matrix multiplication, smoothing activation and gradient outliers so that both the forward and backward passes can run in 8-bit (or lower) precision without accuracy loss. Combined with custom CUDA kernels and FSDP-compatible communication, HALO delivers up to 1.4× end-to-end fine-tuning speedups on Llama models while remaining compatible with PEFT techniques.
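To make the rotation idea concrete, here is a minimal PyTorch sketch (not the paper's CUDA kernels). Because a normalized Hadamard matrix H is orthogonal, Y = XWᵀ = (XH)(WH)ᵀ, and rotating both operands spreads outlier channels across all dimensions, shrinking the quantization scales. The `hadamard` and `quantize_int8` helpers, the tensor shapes, and the injected outlier channel are illustrative assumptions, not code from HALO.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Unnormalized Hadamard matrix via Sylvester's construction (n must be a power of 2)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def quantize_int8(t: torch.Tensor):
    """Symmetric per-tensor INT8 quantization; returns quantized tensor and scale."""
    scale = t.abs().max() / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

d = 512
X = torch.randn(64, d)
X[:, 3] *= 50.0                      # inject an activation outlier channel
W = torch.randn(256, d)

H = hadamard(d) / d ** 0.5           # normalized so that H @ H.T = I

# Rotate, quantize, multiply in integer arithmetic (emulated in int32), dequantize.
Xq, sx = quantize_int8(X @ H)
Wq, sw = quantize_int8(W @ H)
Y_rotated = (Xq.int() @ Wq.int().T).float() * (sx * sw)

# Baseline: quantize without rotation; the outlier channel inflates the scale.
Xq0, sx0 = quantize_int8(X)
Wq0, sw0 = quantize_int8(W)
Y_naive = (Xq0.int() @ Wq0.int().T).float() * (sx0 * sw0)

Y_ref = X @ W.T
print("rotated INT8 rel. error:", ((Y_rotated - Y_ref).norm() / Y_ref.norm()).item())
print("naive   INT8 rel. error:", ((Y_naive - Y_ref).norm() / Y_ref.norm()).item())
```

Running this sketch shows the rotated INT8 matmul tracking the FP32 reference far more closely than the unrotated one; HALO applies the same principle to the forward and backward matmuls of each layer, with fused kernels so the rotations add little overhead.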

Access paper here