RandomShuffle Beats SGD after Finite Epochs
Jeff HaoChen (Tsinghua University), Suvrit Sra (Massachusetts Institute of Technology)
Introduction
Goal: to minimize the function $F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)$
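$x_k = x_{k-1} - \gamma \nabla f_{s(k)}(x_{k-1})$, with $s(k)$ drawn uniformly at random from $\{1, \dots, n\}$
We call this SGD
$x_k^t = x_{k-1}^t - \gamma \nabla f_{\sigma_t(k)}(x_{k-1}^t)$, with $\sigma_t$ a random permutation of $\{1, \dots, n\}$ drawn for epoch $t$
We call this RandomShuffle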
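The two sampling schemes can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (`sgd_indices`, `random_shuffle_indices`, `run`), the toy 1-D components, the step size, and the seed are all assumptions made for the example.

```python
import random

def sgd_indices(n, num_steps, rng):
    """With-replacement sampling: each step draws an index uniformly from range(n)."""
    return [rng.randrange(n) for _ in range(num_steps)]

def random_shuffle_indices(n, num_epochs, rng):
    """Without-replacement sampling: a fresh random permutation of range(n) per epoch."""
    order = []
    for _ in range(num_epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order

def run(grads, x0, step, indices):
    """Apply x <- x - step * grads[i](x) for each index i, in the given order."""
    x = x0
    for i in indices:
        x = x - step * grads[i](x)
    return x

# Hypothetical toy components f_i(x) = 0.5 * (x - c_i)^2, so grad f_i(x) = x - c_i.
centers = [-2.0, -1.0, 1.0, 2.0]
grads = [lambda x, c=c: x - c for c in centers]

rng = random.Random(0)
n, epochs = len(centers), 50
x_sgd = run(grads, 5.0, 0.05, sgd_indices(n, n * epochs, rng))
x_rs = run(grads, 5.0, 0.05, random_shuffle_indices(n, epochs, rng))
print(x_sgd, x_rs)  # both should end near the minimizer 0; exact values depend on the seed
```

The only difference between the two runs is the index sequence: RandomShuffle visits every component exactly once per epoch, while SGD may repeat or skip components.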
SGD RandomShuffle
(Recht and Ré, 2012)
$f_i(x) = (a_i^\top x - b_i)^2$
conjecture:
[…] convergence rate
We analyze RandomShuffle in the following settings:
Dheeraj Nagaraj et al. get rid
this talk
[…]? E.g., […]?
$F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)$, where $f_i(x)$ is defined by cases: […] for $i$ odd, […] for $i$ even.
$\mathbb{E}\|x_T - x^*\|^2 = \|(I - \gamma A)^T (x_0 - x^*)\|^2 + \mathbb{E}\,\bigl\|\sum_{t=1}^{T} (-1)^{\sigma(t)}\, \gamma\, (I - \gamma A)^{T-t} A b\bigr\|^2$
$= \sum_{i=1}^{d} (1 - \gamma\lambda_i)^{2T} c_i^2 + R,$
$R = \gamma^2 \sum_{i=1}^{d} b_i^2 \lambda_i^2\, \mathbb{E}\bigl[\bigl(\sum_{t=1}^{T} (-1)^{\sigma(t)} (1 - \gamma\lambda_i)^{T-t}\bigr)^2\bigr]$
[…] $(2 - \gamma\lambda_i) = \frac{1}{\lambda_i} + o(1)$
Cannot be true for different $\lambda_i$!
[…], RandomShuffle behaves better
Long time: $O(1/T^2)$
Short time: $O(1/T)$
What happens in between?
For general second-order differentiable functions with Lipschitz Hessian:
RandomShuffle: $O(1/T^2 + n^3/T^3)$, SGD: $O(1/T)$. RandomShuffle is provably better than SGD after $O(\sqrt{n})$ epochs!
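A short worked comparison shows where an $O(\sqrt{n})$-epoch threshold comes from. It ignores constants and assumes the rates $O(1/T^2 + n^3/T^3)$ for RandomShuffle and $O(1/T)$ for SGD, with $T$ the total number of iterations and $n$ the number of components, so that $T/n$ is the number of epochs:

```latex
\[
\frac{1}{T^2} \le \frac{1}{T} \quad \text{for all } T \ge 1,
\qquad
\frac{n^3}{T^3} \le \frac{1}{T}
\iff T^2 \ge n^3
\iff T \ge n^{3/2}
\iff \frac{T}{n} \ge \sqrt{n}.
\]
```

Both terms of the RandomShuffle rate are therefore dominated by $1/T$ once the number of epochs $T/n$ reaches the order of $\sqrt{n}$.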
$F(x) = \sum_{i=1}^{n} f_i(x_{e_i})$
$\rho = \max_{1 \le i \le n} \frac{|\{e_j : e_i \cap e_j \neq \emptyset\}|}{n}$
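The sparsity factor can be computed directly from the coordinate supports. The sketch below assumes the definition $\rho = \max_{1 \le i \le n} |\{e_j : e_i \cap e_j \neq \emptyset\}| / n$, where $e_i$ is the set of coordinates that component $f_i$ depends on; the function name `sparsity_factor` and the sample supports are hypothetical.

```python
def sparsity_factor(supports):
    """Return rho = max_i |{j : e_i and e_j intersect}| / n for a list of coordinate sets."""
    sets = [set(e) for e in supports]
    n = len(sets)
    if n == 0:
        raise ValueError("need at least one component")
    # For each e_i, count the supports e_j (including e_i itself) that share a coordinate.
    most_conflicts = max(sum(1 for e_j in sets if e_i & e_j) for e_i in sets)
    return most_conflicts / n

# Hypothetical supports: components 0 and 1 share coordinate 1, the rest are isolated.
supports = [{0, 1}, {1, 2}, {3}, {4}]
print(sparsity_factor(supports))  # 0.5
```

With pairwise disjoint, non-empty supports the value is $1/n$, and when every pair of components shares a coordinate it is $1$.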
When $1/n \le \rho \le 1/[\ldots]$, there is an $O(1/T^2)$ convergence rate!
$\nabla f_i(x^*) = 0, \quad \forall i$
initial upper bound on distance
$f_i$ is $\mu_i$-strongly convex, $L_i$-Lipschitz continuous
RandomShuffle is provably better than SGD after ANY number of iterations!
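A toy 1-D instance of this setting, given only as an illustration: the components $f_i(x) = \frac{1}{2} a_i (x - x^*)^2$, the curvatures, step size, and seed are assumptions, not taken from the talk. Each $f_i$ is $a_i$-strongly convex with an $a_i$-Lipschitz gradient, and $\nabla f_i(x^*) = 0$ for every $i$.

```python
import math
import random

# Hypothetical 1-D components f_i(x) = 0.5 * a_i * (x - X_STAR)**2 with a_i > 0.
X_STAR = 3.0
CURVATURES = [0.5, 1.0, 2.0, 4.0]

def grad(i, x):
    """Gradient of f_i at x; it vanishes at X_STAR for every i."""
    return CURVATURES[i] * (x - X_STAR)

def one_epoch(x, step, order):
    """Run x <- x - step * grad f_i(x) over the indices in `order`."""
    for i in order:
        x = x - step * grad(i, x)
    return x

rng = random.Random(0)
n, step, x0 = len(CURVATURES), 0.1, 0.0

perm = list(range(n))
rng.shuffle(perm)                                  # RandomShuffle: every index exactly once
with_repl = [rng.randrange(n) for _ in range(n)]   # SGD: indices may repeat or be skipped

# In 1-D the per-step contractions (1 - step * a_i) commute, so any permutation
# yields the same error after one epoch; with-replacement sampling generally does not.
shuffle_err = abs(one_epoch(x0, step, perm) - X_STAR)
sgd_err = abs(one_epoch(x0, step, with_repl) - X_STAR)
predicted = abs(x0 - X_STAR) * math.prod(1 - step * a for a in CURVATURES)
print(shuffle_err, predicted, sgd_err)
```

The commuting-contraction shortcut is specific to this scalar example; it illustrates the assumptions of the setting, not the general proof.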