University of Texas at Austin

Past Event: Oden Institute Seminar

Toward a theory of optimization for deep learning

Mikhail Belkin, Halicioglu Data Science Institute, UCSD

3:30 – 5PM
Thursday Nov 12, 2020

Zoom Meeting

Abstract

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this talk I will discuss some general mathematical principles that allow for efficient optimization in over-parameterized non-linear systems, a setting that includes deep neural networks. Remarkably, optimization of such systems appears to be "easy". In particular, the optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Łojasiewicz (PL) condition, which allows for efficient optimization by gradient descent/SGD. We connect the PL condition of these systems to the condition number associated with the tangent kernel and develop a non-linear theory parallel to classical analyses of over-parameterized linear equations.

In a related but conceptually separate development, I will discuss a new perspective on the recently discovered, remarkable phenomenon of the transition to linearity (constancy of the NTK) in certain classes of large neural networks. I will discuss how this transition to linearity results from the scaling of the network Hessian with the size of the network, and will point out that the "lazy training" explanation cannot account for this phenomenon. Also, as we will discuss, while constancy of the NTK is a convenient mathematical tool, it is not a general property of large networks and is not necessary for successful optimization.

Joint work with Chaoyue Liu and Libin Zhu.

Bio

Mikhail Belkin is a Professor at the Halicioglu Data Science Institute at the University of California, San Diego. He received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in the theory and applications of machine learning and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization, and Manifold Regularization algorithms, and, more recently, the “double descent” risk curve, which extends the textbook U-shaped bias-variance trade-off curve beyond the interpolation point. Mikhail Belkin is a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the Journal of Machine Learning Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, and the SIAM Journal on Mathematics of Data Science.
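For readers unfamiliar with the PL condition mentioned in the abstract, the following is a minimal LaTeX sketch of a standard statement of the condition and of its link to the tangent kernel. The notation (loss L with infimum L*, PL constant μ, smoothness constant β, model map F with Jacobian DF, tangent kernel K) is a conventional gloss supplied here for orientation, not taken verbatim from the talk.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Polyak-Lojasiewicz (PL) condition for a (possibly non-convex) loss L with infimum L^*:
\[
  \tfrac{1}{2}\,\lVert \nabla L(w) \rVert^{2} \;\ge\; \mu\,\bigl(L(w) - L^{*}\bigr)
  \qquad \text{for all } w \text{ and some } \mu > 0 .
\]
% Under the PL condition and \beta-smoothness of L, gradient descent with step size
% 1/\beta converges linearly even though L need not be convex:
\[
  L(w_{t}) - L^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)^{\!t}\,\bigl(L(w_{0}) - L^{*}\bigr) .
\]
% For the square loss L(w) = (1/2) ||F(w) - y||^2 of an over-parameterized model F
% (so that L^* = 0), the PL constant is controlled by the smallest eigenvalue of the
% tangent kernel K(w) = DF(w) DF(w)^T, where DF(w) is the Jacobian of F at w:
\[
  \tfrac{1}{2}\,\lVert \nabla L(w) \rVert^{2} \;\ge\; \lambda_{\min}\!\bigl(K(w)\bigr)\, L(w) .
\]
\end{document}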

Event information

Hosted by Rachel Ward