Landscape and training regimes in deep learning |
Type: | Statistical & Bio Seminar |
Speaker: | Prof. Matthieu Wyart |
Affiliation: | EPFL |
Organizer: | Yariv Kafri |
Date: | 04.05.2021 |
Time: | 16:00 - 17:30 |
Abstract: | Deep learning algorithms are responsible for a technological revolution in a variety of tasks, yet understanding why they work remains a challenge. Two puzzles stand out: (i) learning corresponds to minimizing a loss in high dimension that is in general not convex, so training could well get stuck in bad minima; (ii) the predictive power of deep learning increases with the number of fitting parameters, even in a regime where the data are perfectly fitted. I will review recent results on these questions based on analogies with physical systems and scaling arguments testable on real data. For classification, the landscape in deep learning displays a sharp “jamming” transition and becomes glassy as the number of parameters is lowered. This transition also occurs in the packing problem of non-spherical particles. In the over-parametrized regime, where the landscape has many flat directions, learning can operate in two regimes, “feature learning” and “lazy training”, depending on the scale of initialisation. I will provide and test a quantitative explanation as to why performance increases with the number of parameters in both regimes. I will discuss the relative merits of these regimes based on empirical evidence and simple models. If time permits, I will discuss empirical observations, based on a maximum-entropy model for diffeomorphisms, supporting the view that stability toward smooth transformations is critical to the success of state-of-the-art architectures.
Part of the NSCS webinar series https://softmatterisrael.wixsite.com/nscs |