Sarit Khirirat received the Bachelor's degree in Electrical Engineering from Chulalongkorn University, Thailand, in 2013; the Master's degree in Systems, Control and Robotics from KTH Royal Institute of Technology, Sweden, in 2017; and the PhD degree from the Division of Decision and Control Systems at the same institution in 2022. During his PhD, he did research on optimization for communication-efficient applications under the supervision of Professor Mikael Johansson. This research was funded by the Wallenberg AI, Autonomous Systems and Software Program (WASP), Sweden's largest individual research funding program. He is now a postdoctoral researcher at King Abdullah University of Science and Technology (KAUST), advised by Professor Peter Richtárik.
Interests
Numerical Optimization
Machine Learning
Federated Learning
Energy-aware IoT Systems
Education
PhD at the Division of Decision and Control Systems, EECS, 2022
KTH Royal Institute of Technology
MSc in Systems, Control and Robotics, 2017
KTH Royal Institute of Technology
BSc in Electrical Engineering, 2013
Chulalongkorn University
Experience
Summer Intern
Yokogawa, Thailand, Ltd.
Jun 2012 –
Jul 2012
Bangkok, Thailand
Distributed control and automation systems for large-scale chemical processes
Programmed with CENTUM VP, PLC, SCADA and AutoCAD
Installed communications networks and power lines of distributed control systems
Inspected electrical and field instrument panels, and their blueprints
Noisy gradient algorithms have emerged as some of the most popular methods for distributed optimization with massive data. Choosing proper step-size schedules is important for tuning these algorithms to achieve good performance. For the algorithms to attain fast convergence and high accuracy, it is intuitive to use large step-sizes in the initial iterations, when the gradient noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient descent. However, similar results are lacking for other methods using approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient algorithms. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the convergence performance of many algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate. We exemplify our results on stochastic compression algorithms. Our experiments validate the fast convergence of these algorithms with step-decay schedules.
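As a rough illustration of the step-decay idea described in this abstract, the Python sketch below runs a generic noisy-gradient method and cuts the step-size by a constant factor at regular intervals. The gradient oracle, decay factor, and epoch length are illustrative assumptions, not the tuned schedule analyzed in the paper.

```python
import numpy as np

def step_decay_sgd(grad_oracle, x0, gamma0=0.005, decay=0.5, epoch_len=100, n_iters=1000):
    """Gradient descent with a noisy gradient oracle and a step-decay schedule.

    grad_oracle(x) returns a noisy or approximate gradient at x.  The step-size
    starts at gamma0 and is cut by `decay` every `epoch_len` iterations
    (illustrative values, not the tuned schedule from the paper).
    """
    x = np.asarray(x0, dtype=float)
    gamma = gamma0
    for t in range(n_iters):
        if t > 0 and t % epoch_len == 0:
            gamma *= decay  # shrink the step-size once noise starts to dominate
        x = x - gamma * grad_oracle(x)
    return x

# Example: least squares with additive gradient noise (a stand-in for compression noise)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
noisy_grad = lambda x: A.T @ (A @ x - b) + 0.1 * rng.standard_normal(10)
x_hat = step_decay_sgd(noisy_grad, np.zeros(10))
```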
Innovations in numerical optimization, statistics and high performance computing have enabled tremendous advances in machine learning algorithms, fuelling applications from natural language processing to autonomous driving. To deal with increasing data volumes, and to keep the training times of increasingly complex machine learning models reasonable, modern optimization algorithms distribute both data and computations over a large number of machines. However, these algorithms face challenges, from hardware and data heterogeneity (as different machines have different processors and data) to privacy and security issues (where data can be extracted from the transmitted parameters). Among these challenges, communication overhead constitutes a major performance bottleneck of the algorithms. Communicating millions of problem parameters between machines has been reported to consume up to 80% of the training time. To alleviate the communication bottleneck, we develop theory and strategies in this thesis to design communication-efficient optimization algorithms.
In the first part, we provide unified analysis frameworks for first-order algorithms using direct or error-compensated compression. We first identify key characteristics of the compression errors induced by many of the most popular compression strategies in the literature. We then perform a unified analysis of the convergence properties of first-order algorithms for general families of deterministic and stochastic gradient compression algorithms. Our results give explicit expressions for how compression accuracy and the amount of asynchrony affect the step-sizes and guaranteed convergence times. We next turn our attention to error-compensated compression algorithms. We develop theoretical explanations for why error-compensated compression algorithms attain solutions with arbitrarily high accuracy, in contrast to direct compression algorithms. Our results provide strong convergence guarantees for error-compensated compression algorithms on distributed and federated learning problems.
In the second part, we provide flexible tuning frameworks to optimize the convergence performance of compression algorithms for a variety of system architectures. We start by analyzing a data-dependent complexity measure that explains why direct compression algorithms are more communication-efficient than full-precision algorithms in practice. This complexity measure leads to automatic tuning strategies that enable popular compression algorithms on different communication networks to maximize both the convergence progress towards the solution and the communication efficiency. We then turn our attention to diminishing step-size schedules that maximize the convergence speed of algorithms using noisy gradients. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the speed of noisy gradient algorithms. Our results show that such step-size schedules enable these algorithms to enjoy the optimal rate.
Applications of the algorithms in the thesis to central machine learning problems on benchmark data validate our theoretical results.
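For concreteness, two standard ways of modeling the compressors discussed in the abstract above are sketched below; these contraction- and variance-type conditions are common in the compression literature, and the exact assumptions used in the thesis may differ.

```latex
% Two common models of a compressor C : \mathbb{R}^d \to \mathbb{R}^d
% (standard in the compression literature; the thesis' exact conditions may differ).
\begin{align}
  % (i) contractive compressors, e.g. deterministic top-k with \delta = k/d:
  \mathbb{E}\,\|C(x) - x\|^2 &\le (1-\delta)\,\|x\|^2, \qquad \delta \in (0,1], \\
  % (ii) unbiased randomized quantizers with relative variance \omega \ge 0:
  \mathbb{E}[C(x)] = x, \quad \mathbb{E}\,\|C(x) - x\|^2 &\le \omega\,\|x\|^2 .
\end{align}
```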
Federated learning is an emerging framework for collaborative machine learning on devices that do not share their local data. State-of-the-art methods in federated learning reduce the communication frequency, but are not guaranteed to converge to the optimal model parameters. These methods also experience a communication bottleneck, especially when the devices are power-constrained and communicate over a shared medium. This paper presents ECO-FedSplit, an algorithm that increases the communication efficiency of federated learning without sacrificing solution accuracy. The key is to compress inter-device communication and to compensate for information losses in a theoretically justified manner. We prove strong convergence properties of ECO-FedSplit on strongly convex optimization problems and show that the algorithm yields a highly accurate solution with dramatically reduced communication. Extensive numerical experiments validate our theoretical results on real data sets.
Zeroth-order methods have become important tools for solving problems where we have access only to function evaluations. However, zeroth-order methods that rely only on gradient approximations are n times slower than classical first-order methods on n-dimensional problems. To accelerate the convergence rate, this paper proposes the zeroth-order randomized subspace Newton (ZO-RSN) method, which estimates projections of the gradient and Hessian by random sketching and finite differences. This allows us to compute the Newton step in a lower-dimensional subspace at a small computational cost. We prove that ZO-RSN can attain lower iteration complexity than existing zeroth-order methods for strongly convex problems. Our numerical experiments show that ZO-RSN can perform black-box attacks under a more restrictive limit on the number of function queries than the state-of-the-art Hessian-aware zeroth-order method.
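The sketch below illustrates the general idea stated in the abstract: estimate a sketched gradient and Hessian from function values via finite differences, and take a Newton-type step restricted to the random subspace. The sampling distribution of the sketch, the difference step h, and the damping term are illustrative assumptions, not the exact ZO-RSN updates from the paper.

```python
import numpy as np

def zo_subspace_newton_step(f, x, sketch_dim=5, h=1e-4, damping=1e-6, rng=None):
    """One zeroth-order subspace Newton-type step (illustrative sketch).

    Estimates the sketched gradient S^T grad f(x) and sketched Hessian
    S^T Hess f(x) S from function values via finite differences along the
    columns of a random sketching matrix S, then takes a Newton step
    restricted to the subspace spanned by S.
    """
    rng = rng or np.random.default_rng()
    n = x.size
    S = rng.standard_normal((n, sketch_dim)) / np.sqrt(sketch_dim)
    f0 = f(x)
    # central differences approximate the directional derivatives S^T grad f(x)
    g = np.array([(f(x + h * S[:, i]) - f(x - h * S[:, i])) / (2 * h)
                  for i in range(sketch_dim)])
    # second-order differences approximate the sketched Hessian S^T Hess f(x) S
    H = np.empty((sketch_dim, sketch_dim))
    for i in range(sketch_dim):
        for j in range(sketch_dim):
            H[i, j] = (f(x + h * (S[:, i] + S[:, j])) - f(x + h * S[:, i])
                       - f(x + h * S[:, j]) + f0) / h**2
    H = 0.5 * (H + H.T) + damping * np.eye(sketch_dim)  # symmetrize and regularize
    return x - S @ np.linalg.solve(H, g)                # Newton step in the subspace

# Example: minimize a strongly convex quadratic using function values only
Q = np.diag(np.linspace(1.0, 10.0, 20))
f = lambda x: 0.5 * x @ Q @ x
x, rng = np.ones(20), np.random.default_rng(1)
for _ in range(30):
    x = zo_subspace_newton_step(f, x, rng=rng)
```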
With the increasing scale of machine learning tasks, it has become essential to reduce the communication between computing nodes. Early work on gradient compression focused on the bottleneck between CPUs and GPUs, but communication efficiency is now needed in a variety of different system architectures, from high-performance clusters to energy-constrained IoT devices. In current practice, compression levels are typically chosen before training, and settings that work well for one task may be vastly suboptimal for another dataset on another architecture. In this paper, we propose a flexible framework which adapts the compression level to the true gradient at each iteration, maximizing the improvement in the objective function that is achieved per communicated bit. Our framework is easy to adapt from one technology to the next by modeling how the communication cost depends on the compression level for the specific technology. Theoretical results and practical experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes.
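As a toy illustration of the per-iteration idea above, the sketch below picks a top-k sparsification level that maximizes an estimated objective improvement per communicated bit. Both the improvement estimate and the bit-cost model are illustrative assumptions, not the framework's actual gain and cost functions.

```python
import numpy as np

def choose_topk_level(grad, gamma, bits_per_entry=32, overhead_bits=64):
    """Pick a top-k sparsification level that maximizes descent per communicated bit.

    For a gradient step with step-size gamma, sending top_k(grad) is credited with
    an estimated objective improvement proportional to gamma * ||top_k(grad)||^2,
    while communication is modeled as overhead_bits plus k * (value bits + index bits).
    Both the gain estimate and the cost model are illustrative assumptions.
    """
    d = grad.size
    sorted_sq = np.sort(grad**2)[::-1]        # largest squared entries first
    cum_gain = gamma * np.cumsum(sorted_sq)   # gain of sending the k largest entries
    index_bits = np.ceil(np.log2(d))          # bits to encode each kept index
    ks = np.arange(1, d + 1)
    cost = overhead_bits + ks * (bits_per_entry + index_bits)
    return int(ks[np.argmax(cum_gain / cost)])

# Example: gradients whose entries decay rapidly favor aggressive sparsification
rng = np.random.default_rng(0)
g = rng.standard_normal(1000) * np.exp(-np.linspace(0.0, 5.0, 1000))
k_star = choose_topk_level(g, gamma=0.1)
```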
The emergence of big data has caused a dramatic shift in the operating regime for optimization algorithms. The performance bottleneck, which used to be computations, is now often communications. Several gradient compression techniques have been proposed to reduce the communication load at the price of a loss in solution accuracy. Recently, it has been shown how compression errors can be compensated for in the optimization algorithm to improve the solution accuracy. Even though convergence guarantees for error-compensated algorithms have been established, there is very limited theoretical support for quantifying the observed improvements in solution accuracy. In this paper, we show that Hessian-aided error compensation, unlike other existing schemes, avoids accumulation of compression errors on quadratic problems. We also present strong convergence guarantees of Hessian-based error compensation for stochastic gradient descent. Our numerical experiments highlight the benefits of Hessian-based error compensation, and demonstrate that similar convergence improvements are attained when only a diagonal Hessian approximation is used.
Technological developments in devices and storage have made large volumes of data more accessible than ever. This transformation leads to optimization problems with data that are massive in both volume and dimension. In response to this trend, optimization on high-performance computing architectures has become more popular than ever. These scalable optimization solvers can achieve high efficiency by splitting computational loads among multiple machines, but they also incur large communication overhead. To solve optimization problems with millions of parameters, communication between machines has been reported to consume up to 80% of the training time. To alleviate this communication bottleneck, many optimization algorithms with data compression techniques have been studied. In practice, they have been reported to significantly reduce communication costs while exhibiting convergence almost comparable to that of the full-precision algorithms. To explain these observations, we develop theory and techniques in this thesis to design communication-efficient optimization algorithms.
In the first part, we analyze the convergence of optimization algorithms with direct compression. First, we outline definitions of compression techniques that cover many compressors of practical interest. Then, we provide a unified analysis framework for optimization algorithms with compressors that can be either deterministic or randomized. In particular, we show how the tuning parameters of compressed optimization algorithms must be chosen to guarantee performance. Our results show an explicit dependence on the compression accuracy and on the delay effects caused by asynchrony. This allows us to characterize the trade-off between iteration and communication complexity under gradient compression.
In the second part, we study how error compensation schemes can improve the performance of compressed optimization algorithms. Even though convergence guarantees for optimization algorithms with error compensation have been established, there is very limited theoretical support for the improvements in solution accuracy observed in practice. We therefore develop theoretical explanations showing that error compensation guarantees arbitrarily high solution accuracy from compressed information. In particular, error compensation helps remove accumulated compression errors, thus improving solution accuracy especially for ill-conditioned problems. We also provide a strong convergence analysis of error compensation for parallel stochastic gradient descent across multiple machines. In particular, the error-compensated algorithms, unlike direct compression, achieve a significant reduction of the accumulated compression error.
Applications of the algorithms in this thesis to real-world problems with benchmark data sets validate our theoretical results.
The sheer scale of modern data necessitates information compression in parallel/distributed big-data optimization. Compression schemes using memory-based error compensation have displayed superior performance in practice; however, to date there are no theoretical explanations for these observed advantages. This paper provides the first theoretical support for why such compression schemes yield higher-accuracy solutions in optimization. Our results cover both gradient and incremental gradient algorithms for quadratic optimization. Unlike previous works, our theoretical results explicitly quantify the accuracy gains from error compensation, especially for ill-conditioned problems. Finally, numerical results on linear least-squares problems validate the benefit of error compensation and demonstrate the tightness of our convergence guarantees.
Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. To date, gradient sparsification methods, in which each node sorts its gradient components by magnitude and only communicates a subset of them while accumulating the rest locally, are known to yield some of the largest practical gains. Such methods can reduce the amount of communication per step by up to three orders of magnitude, while preserving model accuracy. Yet, this family of methods currently has no theoretical justification. This is the question we address in this paper. We prove that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees, for both convex and non-convex smooth objectives, for data-parallel SGD. The main insight is that sparsification methods implicitly maintain bounds on the maximum impact of stale updates, thanks to selection by magnitude. Our analysis and empirical validation also reveal that these methods do require analytical conditions to converge well, justifying existing heuristics.
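A minimal sketch of the magnitude-based sparsification with local error accumulation described above is given below; the class interface, dimension, and sparsity level are illustrative choices rather than the exact algorithm configuration analyzed in the paper.

```python
import numpy as np

class TopKWithMemory:
    """Magnitude-based gradient sparsification with local error accumulation.

    Each node adds its accumulated residual to the fresh gradient, transmits only
    the k largest-magnitude components of the sum, and keeps the untransmitted
    remainder in memory for later iterations.
    """

    def __init__(self, dim, k):
        self.memory = np.zeros(dim)
        self.k = k

    def compress(self, grad):
        corrected = grad + self.memory                # add back what was left out before
        idx = np.argpartition(np.abs(corrected), -self.k)[-self.k:]
        sparse = np.zeros_like(corrected)
        sparse[idx] = corrected[idx]                  # keep only the k largest components
        self.memory = corrected - sparse              # accumulate the rest locally
        return sparse

# Example: one worker's compressed message inside data-parallel SGD
rng = np.random.default_rng(0)
compressor = TopKWithMemory(dim=10_000, k=100)        # ~1% of the entries communicated
gradient = rng.standard_normal(10_000)
message = compressor.compress(gradient)               # sparse vector the node would send
```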
Our article titled “Improved Step-Size Schedules for Proximal Noisy Methods” has been accepted to the IEEE TSP journal in 2023.
Abstract: Noisy gradient algorithms have emerged as some of the most popular methods for distributed optimization with massive data. Choosing proper step-size schedules is important for tuning these algorithms to achieve good performance. For the algorithms to attain fast convergence and high accuracy, it is intuitive to use large step-sizes in the initial iterations, when the gradient noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses.
Our article titled “Improved Step-size Schedules for Noisy Gradient Methods” has been accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in 2021.
Abstract: Noise is inherent in many optimization methods, such as stochastic gradient methods, zeroth-order methods and compressed gradient methods. For such methods to converge toward a global optimum, it is intuitive to use large step-sizes in the initial iterations, when the noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses.
Our article titled “A Flexible Framework for Communication-Efficient Machine Learning” has been accepted (in the poster session) to the 35th AAAI conference in 2021.
Abstract: With the increasing scale of machine learning tasks, it has become essential to reduce the communication between computing nodes. Early work on gradient compression focused on the bottleneck between CPUs and GPUs, but communication-efficiency is now needed in a variety of different system architectures, from high-performance clusters to energy-constrained IoT devices.