Exploring Stochastic Gradient Descent with Restarts (SGDR)

Mark Hoffmann
5 min readNov 16, 2017
Example of Cyclic Learning Rates: From paper Snapshot Ensembles

This is my first deep learning blog post. I started my deep learning journey around January of 2017 after I heard about fast.ai from a presentation at a ChiPy meetup I attended (the Chicago Python user group). It was my first introduction to the topic and the only thing I knew about neural networks before were simple multi layer perception (MLP) models that acted as decent non-linear approximators for various prediction problems. Once I started my first couple of lessons, I was completely hooked on the amount of knowledge exploding about his subfield, especially because I had recently received my masters degree in analytics and was trying to stay up to date on becoming the best data science practitioner I could be. I am currently going through the third fast.ai course, which is part1_v2 using a framework built on top of PyTorch. One of the very important skills I have been picking up from these courses is the ability to decompose research papers that are coming out in the field and understanding / implement them. One of the interesting concepts I have seen come out is the idea of Stochastic Gradient Descent with Restarts, or SGDR.

Before, we get too far into SGDR, let’s first briefly cover what is the idea behind normal stochastic gradient descent and why do we use it.

--

--

Mark Hoffmann

AI Engineer — Meta | Previously Chief Architect — Ubiety, AI/ML — NASA Jet Propulsion Lab / DARPA || AI / Software Engineering / Systems