The Physics of Learnable Data |
| Type: | Statistical & Bio Seminar |
| Speaker: | Noam Levi |
| Affiliation: | EPFL |
| Date: | 14.12.2025 |
| Time: | 12:00 - 13:00 |
| Location: | Lidow Nathan Rosen (300) |
| Abstract: | The equivalence between vastly different, complex physical systems, when observed from afar, allows us to make accurate predictions without analyzing the microscopic details.
Conversely, by reducing such systems to their minimal constituents, we can describe phenomena that would otherwise seem inscrutable.
In this talk, I will discuss how these notions of universality and reductionism extend beyond the natural universe, to the world of natural data.
First, I will review some of the major open problems in modern machine learning where physics can make an impact. Next, I will focus on the role of data in learnability and study the “Gaussian” approximation of real-world datasets. I will then adopt a reductionist approach to motivate a tractable hierarchical model that captures the compositional properties of real data, properties that seem to be universal. Next, I will discuss how the latent hierarchical structure of datasets can be probed using diffusion models and observables typically used in statistical physics. Finally, I will discuss future prospects, relating hierarchical compositionality to semantic structures in languages, and going beyond diffusion models. |