Landscape and training regimes in deep learning

Type: Statistical & Bio Seminar
Speaker: Prof. Matthieu Wyart
Organizer: Yariv Kafri
Time: 16:00 - 17:30

Deep learning algorithms are responsible for a technological revolution across a variety of tasks, yet understanding why they work remains a challenge. Puzzles include: (i) learning corresponds to minimizing a loss in high dimension that is in general not convex and could well get stuck in bad minima; (ii) the predictive power of deep networks increases with the number of fitting parameters, even in a regime where the data are perfectly fitted. I will review recent results on these questions based on analogies with physical systems and scaling arguments testable on real data. For classification, the deep learning landscape displays a sharp "jamming" transition and becomes glassy as the number of parameters is lowered. The same transition occurs in the packing problem of non-spherical particles. In the over-parametrized regime, where the landscape has many flat directions, learning can operate in two regimes, "Feature Learning" and "Lazy Training", depending on the scale of initialisation. I will provide and test a quantitative explanation of why performance increases with the number of parameters in both regimes, and discuss their relative merits based on empirical evidence and simple models. If time permits, I will present empirical observations based on a maximal-entropy model for diffeomorphisms supporting the idea that stability toward smooth transformations is critical to the success of state-of-the-art architectures.
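As an illustrative sketch only (not material from the talk), the lazy-vs-feature distinction mentioned in the abstract can be demonstrated numerically by scaling a network's output by a factor alpha, in the style of the alpha-scaling f(w, x) = alpha * (g(w, x) - g(w0, x)): for large alpha the loss can be driven to zero with weight changes of order 1/alpha, so the network stays near its initialization ("lazy"), while for alpha of order one the weights move appreciably ("feature learning"). All names and parameter choices below are hypothetical.

```python
import numpy as np

n_hidden = 200
x, y = 1.0, 0.5  # a single scalar training example (illustrative assumption)


def g(a, w):
    """Two-layer scalar network: g(x) = a . tanh(w * x)."""
    return a @ np.tanh(w * x)


def train(alpha, steps=500, eta=1e-3):
    rng = np.random.default_rng(0)  # identical initialization for every alpha
    a = rng.normal(size=n_hidden) / np.sqrt(n_hidden)
    w = rng.normal(size=n_hidden)
    a0, w0 = a.copy(), w.copy()
    g0 = g(a0, w0)  # subtract the value at init so the scaled model starts at 0
    lr = eta / alpha**2  # rescaling keeps the function-space dynamics comparable
    for _ in range(steps):
        err = alpha * (g(a, w) - g0) - y  # prediction error of the scaled model
        t = np.tanh(w * x)
        grad_a = alpha * err * t  # gradient of 0.5 * err**2 w.r.t. a
        grad_w = alpha * err * a * (1 - t**2) * x  # ... and w.r.t. w
        a -= lr * grad_a
        w -= lr * grad_w
    move = np.sqrt(np.sum((a - a0) ** 2) + np.sum((w - w0) ** 2))
    loss = 0.5 * (alpha * (g(a, w) - g0) - y) ** 2
    return move, loss


move_feat, loss_feat = train(alpha=1.0)
move_lazy, loss_lazy = train(alpha=100.0)
# Both runs fit the data, but the large-alpha ("lazy") run moves its
# weights far less, staying close to the linearization around w0.
```

Both runs reach essentially zero loss, yet the parameter displacement of the lazy run is roughly 1/alpha times that of the feature-learning run, which is the sense in which the over-parametrized landscape's flat directions allow fitting without feature change.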

Part of the NSCS webinar series