Bibliography

Pol64: B. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4:1–17, Dec. 1964.
Nes83: Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372–376, 1983.
DHS11: J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
HSS12: G. Hinton, N. Srivastava, and K. Swersky. Neural networks for machine learning. Coursera lecture notes, 2012.
KB15: D. Kingma and J. Ba. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
Yua08: Y. Yuan. Step-sizes for the gradient method. AMS/IP Studies in Advanced Mathematics, 42(2):785–796, 2008.
BB88: J. Barzilai and J. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141–148, 1988. doi:10.1093/imanum/8.1.141.
LW19: T. Li and Z. Wan. New adaptive Barzilai–Borwein step size and its application in solving large-scale optimization problems. ANZIAM Journal, 61(1):76–98, 2019.
RS02: M. Raydan and B. Svaiter. Relaxed steepest descent and Cauchy–Barzilai–Borwein method. Computational Optimization and Applications, 21(2):155–167, Feb. 2002.
HCS06: G. Huang, L. Chen, and C. Siew. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Transactions on Neural Networks, 17:879–892, Jul. 2006.