Effective Theory for Online Learning with Structured Data

Type: Statistical & Bio Seminar
Speaker: Dr. Inbar Seroussi
Affiliation: Tel Aviv University
Organizer: Yariv Kafri
Date: 14.01.2024
Time: 12:00 - 13:30
Location: Lidow Nathan Rosen (300)
Abstract: Stochastic gradient descent (SGD) is a fundamental optimization technique in modern machine learning, yet a comprehensive understanding of its exceptional performance remains a challenge. Drawing on the rich history of this problem in statistical physics, which has provided insights into simple neural networks with isotropic Gaussian data, this talk reviews existing results and introduces a theory for SGD in high dimensions. Our theory extends to a broader class of models, accommodating data with general covariance structures and loss functions. We present limiting deterministic dynamics governed by low-dimensional order parameters, applicable to a spectrum of optimization problems, including linear and logistic regression, as well as two-layer neural networks. This framework also reveals the implicit bias in SGD. For each problem, the deterministic equivalent of SGD allows us to derive an equation for the generalization error. Moreover, we establish explicit conditions on the step size, ensuring the convergence and stability of SGD.
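For concreteness, here is a minimal sketch (not the speaker's code) of the online-SGD setting the abstract describes: one fresh sample per step, Gaussian data with a non-isotropic covariance, squared loss, and the population generalization error evaluated directly through the covariance. The dimension, spectrum, teacher vector, and step size below are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 500
# General (non-isotropic) covariance: a power-law spectrum, as an example
# of "structured data"; the talk's theory allows general covariances.
eigs = 1.0 / np.arange(1, d + 1) ** 0.5
Sigma = np.diag(eigs)
w_star = rng.standard_normal(d) / np.sqrt(d)  # teacher (target) weights, illustrative
eta = 0.05                                    # step size, chosen small by hand here;
                                              # the talk derives explicit stability conditions

w = np.zeros(d)
steps = 20_000
for t in range(steps):
    x = rng.standard_normal(d) * np.sqrt(eigs)  # fresh sample each step (online / one-pass)
    y = x @ w_star                               # noiseless teacher label
    grad = (w @ x - y) * x                       # gradient of the squared loss at this sample
    w -= eta * grad                              # online SGD update

# Population (generalization) error: E[(x^T (w - w_star))^2] = (w - w_star)^T Sigma (w - w_star)
delta = w - w_star
gen_err = delta @ (Sigma @ delta)
print(f"generalization error after {steps} steps: {gen_err:.4e}")
```

The quantity printed at the end is the population risk tracked as a function of training time; the deterministic equations discussed in the talk describe the limiting behavior of exactly such curves in high dimensions.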