HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs

Published as an arXiv preprint, 2025

Recommended citation: S. Ashkboos, M. Nikdan, S. Tabesh, R. L. Castro, T. Hoefler, D. Alistarh. (2025). "HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs." arXiv preprint. https://arxiv.org/abs/2501.02625

HALO introduces a quantization-aware training pipeline that inserts lightweight Hadamard rotations before every large matrix multiplication, smoothing activation and gradient outliers so that both the forward and backward passes can run in 8-bit (or lower) precision without accuracy loss. Combined with custom CUDA kernels and FSDP-compatible communication, HALO delivers up to 1.4× end-to-end fine-tuning speedups on Llama models while remaining compatible with PEFT techniques.
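To make the rotation idea concrete, here is a minimal PyTorch sketch (not the paper's CUDA kernels). Because a normalized Hadamard matrix H is orthogonal, Y = XWᵀ = (XH)(WH)ᵀ, and rotating both operands spreads outlier channels across all dimensions, shrinking the quantization scales. The `hadamard` and `quantize_int8` helpers, the tensor shapes, and the injected outlier channel are illustrative assumptions, not code from HALO.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Unnormalized Hadamard matrix via Sylvester's construction (n must be a power of 2)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def quantize_int8(t: torch.Tensor):
    """Symmetric per-tensor INT8 quantization; returns quantized tensor and scale."""
    scale = t.abs().max() / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

d = 512
X = torch.randn(64, d)
X[:, 3] *= 50.0                      # inject an activation outlier channel
W = torch.randn(256, d)

H = hadamard(d) / d ** 0.5           # normalized so that H @ H.T = I

# Rotate, quantize, multiply in integer arithmetic (emulated in int32), dequantize.
Xq, sx = quantize_int8(X @ H)
Wq, sw = quantize_int8(W @ H)
Y_rotated = (Xq.int() @ Wq.int().T).float() * (sx * sw)

# Baseline: quantize without rotation; the outlier channel inflates the scale.
Xq0, sx0 = quantize_int8(X)
Wq0, sw0 = quantize_int8(W)
Y_naive = (Xq0.int() @ Wq0.int().T).float() * (sx0 * sw0)

Y_ref = X @ W.T
print("rotated INT8 rel. error:", ((Y_rotated - Y_ref).norm() / Y_ref.norm()).item())
print("naive   INT8 rel. error:", ((Y_naive - Y_ref).norm() / Y_ref.norm()).item())
```

Running this sketch shows the rotated INT8 matmul tracking the FP32 reference far more closely than the unrotated one; HALO applies the same principle to the forward and backward matmuls of each layer, with fused kernels so the rotations add little overhead.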

Access paper here