Convex Optimization for Data Science
Gasnikov Alexander
gasnikov.av@mipt.ru
Lecture 6. Gradient-free methods. Coordinate descent
February, 2017
Main books: Spall J.C. Introduction to stochastic search and optimization: estimation, simulation
$$\mathbb{E}_\xi\left[\|\nabla_x f(x,\xi)\|_2^2\right] \le M_2^2,$$

$$\|\nabla f(y) - \nabla f(x)\|_2 \le L_2\|y - x\|_2,$$

$$\mathbb{E}_\xi\left[\|\nabla_x f(x,\xi) - \nabla f(x)\|_2^2\right] \le D.$$
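Gradient-free methods replace the gradient by an estimate built from function values only. As an illustration, here is a minimal sketch of one common two-point scheme, assuming a random direction e uniform on the unit Euclidean sphere and the estimate g = n (f(x + t e) - f(x)) / t · e. The function names, the smoothing parameter t and the step size h are illustrative assumptions, not necessarily the exact scheme on the original slides.

```python
import numpy as np


def random_unit_direction(n, rng):
    """Sample e uniformly from the unit Euclidean sphere in R^n."""
    e = rng.standard_normal(n)
    return e / np.linalg.norm(e)


def two_point_estimate(f, x, t, rng):
    """Gradient-free estimate g = n * (f(x + t e) - f(x)) / t * e.

    Only two function values are used. For smooth f its mean equals
    grad f(x) up to a bias that vanishes as t -> 0.
    """
    n = x.size
    e = random_unit_direction(n, rng)
    return n * (f(x + t * e) - f(x)) / t * e


def gradient_free_descent(f, x0, h, t, num_iters, seed=0):
    """Run x^{k+1} = x^k - h * g^k, where g^k is the two-point estimate."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)  # np.array copies, so x0 is left unchanged
    for _ in range(num_iters):
        x = x - h * two_point_estimate(f, x, t, rng)
    return x


# Usage: f(x) = 0.5 * ||x - c||_2^2 in R^5, so L = 1 and we take h = 1 / n.
c = np.arange(1.0, 6.0)
x_hat = gradient_free_descent(lambda x: 0.5 * np.sum((x - c) ** 2),
                              np.zeros(5), h=0.2, t=1e-6, num_iters=500)
print(np.linalg.norm(x_hat - c))  # a small number; the floor is driven by t
```

The factor n compensates for E[e eᵀ] = I/n, which is why the step size has to shrink roughly like 1/n compared with a method that sees the true gradient.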
– s-sparse on average – for
$\|\cdot\|_i$ – norm in the corresponding
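Coordinate descent works with a separate Lipschitz constant L_i for each coordinate. A minimal sketch, assuming the smooth quadratic f(x) = ½ xᵀAx − bᵀx with A symmetric positive definite (so L_i = A_ii), the step x_i ← x_i − ∂_i f(x) / L_i, and coordinate i sampled with probability L_i / Σ_j L_j, which is one common choice. The function name is hypothetical.

```python
import numpy as np


def random_coordinate_descent(A, b, x0, num_iters, seed=0):
    """Minimize f(x) = 0.5 * x^T A x - b^T x, A symmetric positive definite.

    L_i = A[i, i] is the Lipschitz constant of the i-th partial derivative.
    Coordinate i is drawn with probability L_i / sum_j L_j and updated as
    x_i <- x_i - grad_i f(x) / L_i.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    L = np.diag(A).astype(float)
    probs = L / L.sum()
    grad = A @ x - b  # full gradient once; afterwards updated incrementally
    for _ in range(num_iters):
        i = rng.choice(x.size, p=probs)
        delta = -grad[i] / L[i]
        x[i] += delta
        grad += delta * A[:, i]  # changing x_i shifts the gradient by delta * A[:, i]
    return x


# Usage on a small random SPD system; the minimizer solves A x = b.
rng = np.random.default_rng(42)
M = rng.standard_normal((6, 6))
A = M.T @ M + np.eye(6)
b = rng.standard_normal(6)
x_cd = random_coordinate_descent(A, b, np.zeros(6), num_iters=5000)
print(np.max(np.abs(x_cd - np.linalg.solve(A, b))))
```

For a quadratic the step 1/L_i minimizes f exactly along coordinate i, and one iteration costs O(n) because only column i of A is needed to refresh the gradient.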
are not available a priori). This can be combined
have on average s nonzero elements in the whole n-vector, then
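With sparse data a coordinate step can be far cheaper than a full gradient step. A sketch, assuming f(x) = ½‖Ax − b‖₂², A stored column by column as (row indices, values) pairs (a hypothetical storage format chosen for this example), L_i = ‖A_i‖₂², and coordinates chosen uniformly. The residual r = Ax − b is kept up to date, so one step touches only the nonzeros of column i.

```python
import numpy as np


def sparse_coordinate_descent(columns, b, n, num_iters, seed=0):
    """Coordinate descent for f(x) = 0.5 * ||A x - b||_2^2 with sparse A.

    columns[i] = (rows, vals) holds the nonzero entries of column i of A.
    grad_i f(x) = A_i^T r with r = A x - b, and L_i = ||A_i||_2^2.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    r = -np.asarray(b, dtype=float)  # residual A x - b at x = 0
    L = np.array([np.dot(v, v) for _, v in columns])
    for _ in range(num_iters):
        i = rng.integers(n)  # uniform coordinate choice
        if L[i] == 0.0:
            continue  # empty column: f does not depend on x_i
        rows, vals = columns[i]
        delta = -np.dot(vals, r[rows]) / L[i]  # -grad_i f(x) / L_i
        x[i] += delta
        r[rows] += delta * vals  # only rows where column i is nonzero change
    return x


# Usage: a 6 x 4 matrix with full column rank and a consistent right-hand side.
A = np.array([[1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [4., 0., 0., 0.],
              [0., 0., 1., 2.],
              [0., 1., 0., 0.],
              [2., 0., 0., 1.]])
columns = []
for j in range(A.shape[1]):
    rows = np.nonzero(A[:, j])[0]
    columns.append((rows, A[rows, j]))
x_true = np.array([1.0, -2.0, 0.5, 3.0])
x_sp = sparse_coordinate_descent(columns, A @ x_true, n=4, num_iters=3000)
print(x_sp)
```

If the columns have on average s nonzeros, the expected cost of a step is O(s), whereas one full gradient Aᵀ(Ax − b) costs a pass over all nonzeros of A.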