Lecture 2

Theory

Lecture slides:

Adaptive step-size: Here
Accelerated GD: Here

These hands-on exercises first augment the Gradient Descent (GD) algorithm by implementing (i) adaptive step-size methods and (ii) accelerated GD (AGD) methods, such as momentum (a.k.a. Polyak’s heavy ball [Pol64]) and Nesterov acceleration [Nes83].
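As a point of reference for the accelerated methods, here is a minimal NumPy sketch of the heavy-ball and Nesterov updates on a toy strongly convex quadratic. The function names (`heavy_ball`, `nesterov`), the test problem, and the step-size and momentum choices are illustrative assumptions based on the textbook tuning for quadratics with known constants L and mu. They are not the reference solution for the exercises.

```python
import numpy as np

def heavy_ball(grad, x0, step, beta, n_iters):
    """Polyak's heavy ball: x_{k+1} = x_k - step * grad(x_k) + beta * (x_k - x_{k-1})."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_iters):
        x_next = x - step * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

def nesterov(grad, x0, step, beta, n_iters):
    """Nesterov acceleration: the gradient is evaluated at the extrapolated point y_k."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_iters):
        y = x + beta * (x - x_prev)      # look-ahead (extrapolation) step
        x_next = y - step * grad(y)      # gradient step taken from y, not from x
        x_prev, x = x, x_next
    return x

# Toy problem (assumed for illustration): f(x) = 0.5 * x^T A x - b^T x
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)           # exact minimizer, used only for checking

L, mu = 10.0, 1.0                        # largest and smallest eigenvalues of A
sL, smu = np.sqrt(L), np.sqrt(mu)

# Classical parameter choices for quadratics with known L and mu
x_hb = heavy_ball(grad, np.zeros(2), step=4.0 / (sL + smu) ** 2,
                  beta=((sL - smu) / (sL + smu)) ** 2, n_iters=200)
x_nag = nesterov(grad, np.zeros(2), step=1.0 / L,
                 beta=(sL - smu) / (sL + smu), n_iters=200)

print("heavy ball error:", np.linalg.norm(x_hb - x_star))
print("Nesterov error:  ", np.linalg.norm(x_nag - x_star))
```

The only difference between the two loops is where the gradient is evaluated: heavy ball uses the current iterate, while Nesterov uses the extrapolated point.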

The second part augments the Stochastic GD (SGD) algorithm by implementing adaptive SGD variants, such as AdaGrad [DHS11], RMSprop [HSS12], and Adam [KB15]. These implementations will be tested on multiclass classification problems (the MNIST and CIFAR10 datasets) and compared to the previously developed SGD.
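To make the adaptive variants concrete, the following sketch writes each update rule as a single step function in NumPy. It uses a small synthetic least-squares problem instead of MNIST or CIFAR10 so that it runs in a second. The helper names (`adagrad_step`, `rmsprop_step`, `adam_step`, `run_sgd_variant`), the `state` dictionary, and all hyperparameter values are assumptions made for illustration, not the interface used in the exercises.

```python
import numpy as np

def adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    """AdaGrad: divide by the root of the running SUM of squared gradients."""
    state["G"] = state.get("G", 0.0) + g ** 2
    return w - lr * g / (np.sqrt(state["G"]) + eps)

def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    """RMSprop: replace AdaGrad's sum with an exponential moving average."""
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g ** 2
    return w - lr * g / (np.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: moving averages of g and g^2, each with bias correction."""
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g ** 2
    m_hat = state["m"] / (1 - beta1 ** t)   # correct the bias toward zero at small t
    v_hat = state["v"] / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

def loss(w, X, y):
    """Full-batch least-squares loss."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def run_sgd_variant(step_fn, X, y, n_iters=2000, batch_size=20, seed=0):
    """Run a mini-batch loop with the given update rule, starting from w = 0."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    state = {}                               # per-optimizer running statistics
    for _ in range(n_iters):
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        g = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size   # mini-batch gradient
        w = step_fn(w, g, state)
    return w

# Synthetic, noiseless regression data (assumed for illustration only)
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

results = {name: run_sgd_variant(fn, X, y)
           for name, fn in [("AdaGrad", adagrad_step),
                            ("RMSprop", rmsprop_step),
                            ("Adam", adam_step)]}
for name, w in results.items():
    print(name, "final loss:", loss(w, X, y))
```

Keeping each rule as a function of `(w, g, state)` lets the same training loop be reused for every variant, which makes a side-by-side comparison with plain SGD straightforward.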

Contents:

Simulations