ICSA Colloquium - 22/04/2021

Title: Resource Efficient Machine Learning

Abstract: Training ML models is expensive in both time and cost; training jobs use expensive accelerators like GPUs or TPUs and often run for many hours or days. With a wide variety of ML model architectures and data sources (e.g., text, videos, graphs), training jobs also have varied resource requirements. This variety creates challenges in ensuring resources are used efficiently, both within a single training job and across jobs in the entire cluster. In this talk, I will present system designs that exploit the structure and properties of machine learning workloads to enable resource-efficient execution. I will first introduce our work on designing adaptive algorithms that detect critical epochs in ML training and minimize resource use during the other epochs. Following that, I will describe our work on developing ML cluster schedulers that can effectively share resources across training jobs. I will conclude by outlining some remaining challenges in achieving resource-efficient ML at scale.

Bio: Shivaram Venkataraman is an Assistant Professor in the Computer Science Department at the University of Wisconsin–Madison. His research interests are in designing systems and algorithms for large-scale data analysis and machine learning. Before coming to Madison, he was a post-doctoral researcher in the Systems Research Group at Microsoft Research in Redmond. Previously, he completed his PhD at UC Berkeley, where he was advised by Ion Stoica and Mike Franklin. He is the recipient of a SACM Student Choice Professor of the Year Award and a Facebook Hardware and Software Systems Research Award.

Shivaram Venkataraman (University of Wisconsin, Madison)