Exploring the loss landscape of neural networks with thermal noise

Type: Statistical & Bio Seminar
Speaker: Prof. Yohai Bar Sinai
Location: Lidow Nathan Rosen (300)

The training of neural networks is a complex, high-dimensional, non-convex and noisy optimization problem whose theoretical understanding is interesting both from an applicative perspective and for fundamental reasons. A core challenge is to understand the geometry and topography of the landscape that guides the optimization. The talk will focus on using Langevin dynamics to explore the "low-energy" landscape of a neural model. We show that analyzing the fluctuation statistics, in analogy to thermal dynamics at constant temperature, allows us to study the geometrical properties of the loss landscape and to infer a clear geometric description of the low-loss region. Specifically, we find that for an over-parameterized fully connected network performing a classification task, the low-loss region is a low-dimensional manifold whose dimension can be readily obtained from the fluctuations. Furthermore, this dimension is controlled by the number of data points that reside near the classification decision boundary. Importantly, we find that a quadratic approximation of the loss near the minimum is fundamentally inadequate, due to the exponential nature of the decision boundary and the flatness of the low-loss region.
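The idea of reading off landscape geometry from thermal fluctuations can be illustrated on a toy problem (this is only an illustrative sketch, not the speaker's actual model or method). Below, the "loss" is a function of two parameters whose minimum set is the unit circle, i.e. a one-dimensional low-loss manifold inside a two-dimensional parameter space. Running overdamped Langevin dynamics at a small temperature T and invoking equipartition, each stiff (locally quadratic) direction contributes T/2 to the mean excess loss, while flat directions along the manifold contribute nothing, so the manifold dimension can be estimated from the fluctuation statistics alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy loss with a one-dimensional "low-loss manifold": the unit circle.
# L(x, y) = (x^2 + y^2 - 1)^2 is exactly flat along the circle and
# locally quadratic (stiff) in the radial direction.
def loss(p):
    return (p @ p - 1.0) ** 2

def grad(p):
    return 4.0 * (p @ p - 1.0) * p

# Overdamped Langevin dynamics at temperature T:
#   p <- p - eta * grad(L) + sqrt(2 * eta * T) * xi,   xi ~ N(0, I)
T, eta, steps, burn_in = 0.01, 1e-3, 100_000, 10_000
p = np.array([1.0, 0.0])
losses = []
for t in range(steps):
    p = p - eta * grad(p) + np.sqrt(2.0 * eta * T) * rng.standard_normal(2)
    if t >= burn_in:
        losses.append(loss(p))

# Equipartition: each stiff quadratic mode contributes T/2 to the mean
# loss above the minimum, so the number of stiff modes is ~ 2<L>/T and
# the manifold dimension is the ambient dimension minus that.
mean_L = float(np.mean(losses))
n_stiff = 2.0 * mean_L / T
print(f"<L> = {mean_L:.4f}, stiff modes ~ {n_stiff:.2f}, manifold dim ~ {2.0 - n_stiff:.2f}")
```

Here the estimated number of stiff modes comes out close to 1, recovering the fact that the low-loss set is a one-dimensional manifold; the temperature, step size, and toy loss are all assumptions made for the sketch.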