CV

Education

Ph.D. in Computer Science (Machine Learning), Institute of Science and Technology Austria (ISTA), Sep 2024-present
- Advisor: Prof. Dan Alistarh
B.Sc. in Computer Science, Sharif University of Technology, Sep 2019-Jul 2024

Research Interests

Quantization-Aware Training (QAT)
Efficient Deep Learning
Parameter-Efficient Fine-Tuning (PEFT)
Neural Network Compression

Research Experience

Distributed Algorithms and Systems Laboratory, ISTA, Sep 2022-Oct 2023
- Research Intern (in-person)
- Projects:
  - RoSA: Robust Adaptation of Large Language Models
  - TACO: Task-Aware Compression for Vision Models
- Supervisor: Prof. Dan Alistarh
Bachelor's Thesis, Sharif University of Technology, Jun 2023-Jul 2024
- Optimal Transport for Knowledge Distillation in Pruned Vision Models
- Supervisor: Prof. M. H. Yassaee

Work Experience

Data Scientist @ Divar, Mar 2023-Jul 2023
- Semantic retrieval of real estate ads
- Geo-search
ML Engineer @ Hezardastan, Mar 2022-Dec 2022
- Image retrieval of mathematical text

Skills

Technical Skills: Python, C/C++, CUDA, Julia, Java, SLURM
Machine Learning Frameworks:
- PyTorch (Advanced)
- JAX
- Ray
Languages:
- Persian (native)
- English (C2)
- German (A1)

Selected Courses

Graduate Courses:
- Foundations of Theoretical Deep Learning
- High Dimensional Statistics
- Introductory Natural Language Processing
- Statistical Methods and Time Series
- Differential Privacy
- Modern Machine Learning
- Numerical Methods and Algorithms
- Stochastic Processes

Teaching Experience

Teaching Assistant, Sharif University of Technology
- Deep Learning (Graduate Course)
- Data Structures and Algorithm Design
- Introduction to Statistics
- Advanced Programming

Extracurricular Activities

Problem Setter, Codeforces, 2018
- Educational Codeforces Round 57
Math Teacher/Mentor, Rastaa (non-profit education NGO), Tehran, 2019-2020

Publications

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

S. Tabesh^†, M. Safaryan^†, A. Panferov, A. Volkova, D. Alistarh. (2025). "CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training." arXiv preprint.

Quartet: Native FP4 Training Can Be Optimal for Large Language Models

R. L. Castro, A. Panferov, S. Tabesh, O. Sieberling, J. Chen, M. Nikdan, S. Ashkboos, D. Alistarh. (2025). "Quartet: Native FP4 Training Can Be Optimal for Large Language Models." Advances in Neural Information Processing Systems (NeurIPS), 2025.

ASIDE: Architectural Separation of Instructions and Data in Language Models

E. Zverev, E. Kortukov, A. Panfilov, A. Volkova, S. Tabesh, S. Lapuschkin, W. Samek, C. H. Lampert. (2024). "ASIDE: Architectural Separation of Instructions and Data in Language Models." Workshop on Building Trust in LMs, ICLR 2025.

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

A. Panferov, J. Chen, S. Tabesh, R. L. Castro, M. Nikdan, D. Alistarh. (2025). "QuEST: Stable Training of LLMs with 1-Bit Weights and Activations." International Conference on Machine Learning (ICML), 2025.

HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs

S. Tabesh^†, S. Ashkboos^†, M. Nikdan^†, R. L. Castro, T. Hoefler, D. Alistarh. (2025). "HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs." Advances in Neural Information Processing Systems (NeurIPS), 2025.

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, C. H. Lampert. (2024). "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" International Conference on Learning Representations (ICLR), 2025.

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

S. Tabesh^†, M. Nikdan^†, E. Crnčević, D. Alistarh. (2024). "RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation." International Conference on Machine Learning (ICML), 2024.

TACO: Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

S. Tabesh^†, D. Kuznedelev^†, K. Noorbakhsh^†, E. Frantar^†, S. Beery, E. Kurtic, D. Alistarh. (2025). "TACO: Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression." Transactions on Machine Learning Research (TMLR), 2025.

References

Prof. Dan Alistarh
- Professor, Institute of Science and Technology Austria (ISTA)
- Email: dan.alistarh@ista.ac.at
Prof. Mohammad Hossein Yassaee
- Assistant Professor, EE Department, Sharif University of Technology
- Email: yassaee@sharif.edu

Rush Tabesh
