Hi Palash, thanks for the read and response!
When fitting a model with SGDR, there are 3 main parameters: the number of cycles, the cycle length, and the cycle mult. The number of cycles is how many times we decrease the learning rate down our cosine function (it is reset to its starting value at the beginning of each cycle). The cycle length is how many epochs the first of these decreases spans. The cycle mult is the factor by which the cycle length is multiplied after each cycle, so that later cycles span more epochs.
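To make the "down our cosine function" part concrete, here is a minimal sketch of the decrease within a single cycle. The function name `cosine_lr` and the values `lr_max=0.01` and `lr_min=0.0` are made up for illustration and are not any library's API.

```python
import math

def cosine_lr(progress, lr_max=0.01, lr_min=0.0):
    """Learning rate at a point within one cycle.

    progress runs from 0.0 (start of the cycle) to 1.0 (end of the cycle).
    lr_max and lr_min are illustrative values, not recommendations.
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# The rate starts at lr_max, falls to lr_min, and the next cycle restarts at lr_max.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{p:.2f} of the way through the cycle -> lr = {cosine_lr(p):.5f}")
```

A longer cycle just stretches this same curve over more epochs.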
It’s always best to look at examples, so here are a couple.
Ex. 1
Number of cycles = 3
Cycle Length = 1
Cycle Mult = 2
In this scenario we would have the first cycle of our learning rate decreasing over a single epoch (1 pass through our entire training dataset). The second cycle would span 2 epochs, and the third cycle would span 4 epochs. This results in a total of 7 epochs in all.
Ex. 2
Number of cycles = 2
Cycle Length = 2
Cycle Mult = 1
In this scenario the cycle mult is 1, so the cycles do not grow from one to the next, but our starting cycle length is now 2. The first cycle would last 2 epochs, and the second cycle would also last 2 epochs, resulting in 4 epochs overall.
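If it helps, the arithmetic in both examples can be checked with a few lines of Python. This is only a sketch of the epoch counting: `cycle_epochs` is a hypothetical helper written for this reply, not a function from any library.

```python
def cycle_epochs(n_cycles, cycle_len, cycle_mult=1):
    """Epochs spanned by each cycle: the length is multiplied by cycle_mult after every cycle."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

ex1 = cycle_epochs(n_cycles=3, cycle_len=1, cycle_mult=2)
ex2 = cycle_epochs(n_cycles=2, cycle_len=2, cycle_mult=1)

print(ex1, "total epochs:", sum(ex1))  # Ex. 1
print(ex2, "total epochs:", sum(ex2))  # Ex. 2
```

The first list is the per-cycle epoch count and the sum is the total number of epochs trained.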
Does this answer your question? I can try to elaborate further if need be.