University of Texas at Austin

Past Event: Oden Institute Seminar

Machine learning meets dynamics: understanding large-learning rates, & variational Stiefel optimization

Molei Tao, Assistant Professor, School of Performance, Visualization & Fine Arts and Assistant Director for Project Development, Texas A&M Institute of Data Science, Texas A&M University

3:30 – 5PM
Tuesday Oct 4, 2022

POB 6.304 & Zoom

Abstract

The interaction between machine learning and dynamics can lead both to new methodology for dynamics and to deeper understanding of, and/or more efficacious algorithms for, machine learning. This talk will focus on the latter. Specifically, in one half of the talk, I will describe some of the nontrivial (and pleasant) effects of large learning rates, which are often used in the practical training of machine learning models but lie beyond traditional optimization theory. More precisely, I will first show how large learning rates can lead to quantitative escapes from local minima via chaos, an alternative mechanism to the commonly known noisy escapes due to stochastic gradients. I will then report how large learning rates provably bias toward flatter minimizers, which arguably generalize better.
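
As a rough illustration of why step size interacts with sharpness (a textbook linear-stability argument, not the result presented in the talk), gradient descent on the quadratic f(x) = 0.5 * a * x^2 with learning rate h multiplies x by (1 - h * a) at each step, so a minimum with curvature a is stable only when h < 2 / a. The function name and numerical values below are illustrative assumptions.

```python
def gradient_descent(curvature, lr, x0=1.0, steps=50):
    """Run plain gradient descent on f(x) = 0.5 * curvature * x**2."""
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x  # the gradient of f at x is curvature * x
    return x


lr = 0.5  # with this step size, minima with curvature above 2 / lr = 4 are unstable

# Flat minimum: the per-step factor is 1 - 0.5 * 1.0 = 0.5, so iterates contract.
flat = abs(gradient_descent(curvature=1.0, lr=lr))

# Sharp minimum: the per-step factor is 1 - 0.5 * 5.0 = -1.5, so iterates grow.
sharp = abs(gradient_descent(curvature=5.0, lr=lr))

print(f"flat minimum:  |x| after 50 steps = {flat:.3e}")
print(f"sharp minimum: |x| after 50 steps = {sharp:.3e}")
```

In this toy setting, a fixed large learning rate cannot settle into the sharper minimum, which gives some intuition for a preference toward flatter ones; the mechanisms discussed in the talk are for genuinely nonlinear objectives.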


In the other half, I will report the construction of momentum-accelerated algorithms that optimize functions defined on Riemannian manifolds, focusing on a particular case known as the Stiefel manifold. The treatment will be based on the design of continuous- and discrete-time dynamics. Two practical applications will also be described: (1) we markedly improved the performance of a trained-from-scratch Vision Transformer by appropriately wiring orthogonality into its self-attention mechanism, and (2) our optimizer also makes the useful notion of Projection Robust Wasserstein Distance for high-dimensional optimal transport even more effective.
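
For readers unfamiliar with the setting, the Stiefel manifold is the set of n-by-p matrices X with orthonormal columns (X^T X = I). The sketch below shows a standard baseline, plain Riemannian gradient descent with a QR retraction, and is not the momentum-accelerated method from the talk. The test problem (minimizing -trace(X^T A X) for a random symmetric A), the step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def project_to_tangent(X, G):
    """Project a Euclidean gradient G onto the tangent space of the Stiefel manifold at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2


def qr_retract(Y):
    """Map a full-rank matrix back onto the Stiefel manifold using its QR factor."""
    Q, R = np.linalg.qr(Y)
    # Flip column signs so that R has a positive diagonal, making the map well defined.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def cost(X, A):
    """Illustrative objective: minimizing it favors a dominant eigenspace of A."""
    return -np.trace(X.T @ A @ X)


rng = np.random.default_rng(0)
n, p = 8, 3
A = rng.standard_normal((n, n))
A = A + A.T  # symmetric test matrix
X = qr_retract(rng.standard_normal((n, p)))  # random starting point on the manifold

initial_cost = cost(X, A)
for _ in range(200):
    G = -2 * A @ X  # Euclidean gradient of the cost
    X = qr_retract(X - 0.01 * project_to_tangent(X, G))
final_cost = cost(X, A)

print("cost before:", initial_cost, "cost after:", final_cost)
print("orthonormal columns:", np.allclose(X.T @ X, np.eye(p)))
```

The retraction keeps every iterate exactly on the manifold, so orthogonality is enforced by construction rather than through a penalty term.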

Biography

Molei Tao received a B.S. in Math & Physics from Tsinghua University, China, and a Ph.D. in Control & Dynamical Systems from Caltech. He then worked as a postdoc at Caltech and as a Courant Instructor at NYU, before becoming an assistant, and then associate, professor at Georgia Tech. He is a recipient of the W.P. Carey Ph.D. Prize in Applied Mathematics (2011), American Control Conference Best Student Paper Finalist (2013), the NSF CAREER Award (2019), the AISTATS Best Paper Award (2020), IEEE EFTF-IFCS Best Student Paper Finalist (2021), and the Cullen-Peck Scholar Award (2022).
 

Event information

Hosted by Richard Tsai