SLIDE 6: GD, SGD & Batch SGD
• Gradient Descent (GD): choose w_0 randomly, then iterate
  w_{t+1} = w_t − η ∇L(A, w_t),
  where η is the step size and ∇L is the gradient of the loss L over the full dataset A
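The GD update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the slide's method in full: the least-squares loss L(A, w) = ||Aw − y||² / (2n) and all names (`A`, `y`, `w_true`, `grad_L`, `eta`) are assumptions made for the example, since the slide's loss L is generic.

```python
import numpy as np

# Assumed example setup: least-squares loss L(A, w) = ||A w - y||^2 / (2n)
rng = np.random.default_rng(0)
n, d = 100, 3
A = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = A @ w_true                       # noiseless targets for the illustration

def grad_L(A, y, w):
    # Gradient of the assumed loss over the FULL dataset A
    return A.T @ (A @ w - y) / len(y)

eta = 0.1                            # step size (eta)
w = rng.normal(size=d)               # w_0 chosen randomly
for t in range(500):
    w = w - eta * grad_L(A, y, w)    # w_{t+1} = w_t - eta * grad L(A, w_t)
```

Note that every iteration touches all n rows of A, which is the cost that SGD avoids.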
• When the dataset A is large, computing ∇L(A, w) is cumbersome
• Stochastic Gradient Descent (SGD): at each iteration, update w_t based on one row of A ∈ ℝ^{n×d} chosen uniformly at random:
  w_{t+1} = w_t − η ∇L(a, w_t),
  where a is a randomly chosen data vector (row) from A
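The SGD variant can be sketched the same way: each step computes the gradient on a single randomly chosen row instead of the whole dataset. As before, the least-squares loss and all variable names are assumptions made for the illustration.

```python
import numpy as np

# Assumed example: SGD on the per-row loss (a_i . w - y_i)^2 / 2
rng = np.random.default_rng(1)
n, d = 200, 3
A = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = A @ w_true                       # noiseless targets for the illustration

eta = 0.05                           # step size
w = np.zeros(d)                      # w_0 (here fixed for reproducibility)
for t in range(5000):
    i = rng.integers(n)              # one row of A chosen uniformly at random
    a_i, y_i = A[i], y[i]
    grad = (a_i @ w - y_i) * a_i     # gradient of the single-row loss at w_t
    w = w - eta * grad               # w_{t+1} = w_t - eta * grad L(a, w_t)
```

Each step costs O(d) instead of O(nd), at the price of a noisier search direction.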
• Batch SGD: choose a batch B of b < n data vectors uniformly at random:
  w_{t+1} = w_t − η ∇L(B, w_t),
  where B is a random batch of data vectors from A
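Batch SGD sits between the two extremes: each step averages the gradient over a random batch of b rows. Again a minimal sketch under the same assumed least-squares loss; the batch size `b = 16` is an arbitrary choice for the example.

```python
import numpy as np

# Assumed example: batch SGD with a batch of b < n rows per step
rng = np.random.default_rng(2)
n, d, b = 200, 3, 16
A = rng.normal(size=(n, d))
w_true = np.array([0.5, 1.5, -1.0])
y = A @ w_true                       # noiseless targets for the illustration

eta = 0.1                            # step size
w = np.zeros(d)                      # w_0 (here fixed for reproducibility)
for t in range(2000):
    idx = rng.choice(n, size=b, replace=False)   # batch B, uniform at random
    grad = A[idx].T @ (A[idx] @ w - y[idx]) / b  # mean gradient over the batch
    w = w - eta * grad               # w_{t+1} = w_t - eta * grad L(B, w_t)
```

Averaging over b rows reduces the variance of the stochastic gradient relative to single-row SGD, while each step still costs only O(bd).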
• SGD and Batch SGD can still converge to the minimizer w*, but typically require a higher number of iterations than GD
[Figure: the dataset A depicted as an n×d matrix; SGD samples 1 row at random, Batch SGD samples a batch of b rows at random]