Regularization with Lipschitz Loss

Pierre Alquier
Sequential, structured, and/or statistical learning, IHES, May 17, 2017


SLIDE 1

Motivation Oracle inequalities Applications

Regularization with Lipschitz Loss

Pierre Alquier Sequential, structured, and/or statistical learning IHES - May 17, 2017

Pierre Alquier Regularized Procedures with Lipschitz Loss Functions

SLIDES 2-4

Motivation: user ratings

Stan: 7 3 8
Pierre: 8 10 9 10 9 10 10 10 8
Zoe: 8 3 7
Bob: 6 4 2
Oscar: 6 10 7
Léa: 8 4 9
Tony: 9 3 4 8

Bob's row with the missing ratings marked: Bob: 6 4 2 ? ? ?

Bob's row with a value filled in: Bob: 6 4 2 7

SLIDES 5-7

A possible model

Notation: $\langle A, B\rangle_F = \mathrm{Tr}(A^T B)$. Let $E_{j,k}$ be the matrix with zeros everywhere except the $(j,k)$-th entry, which equals 1.

Observations: $Y_i = \langle M^*, X_i\rangle_F + \varepsilon_i$ with $\mathbb{E}(\varepsilon_i) = 0$, where $X_i$ takes values in the set of matrices $\{E_{j,k}\}$.

Idea: $M^*$ is (approximately) low-rank.

  • E. Candès & T. Tao (2009). The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory.
  • E. Candès & Y. Plan (2010). Matrix completion with noise. Proceedings of the IEEE.
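The observation model above can be simulated in a few lines. The sketch below is only an illustration: the rank-1 matrix `M_STAR`, the noise level, and the uniform choice of the observed entry are assumptions made for the example, not part of the slides. It uses the fact that $\langle M, E_{j,k}\rangle_F$ is simply the entry $M_{j,k}$.

```python
import random

def sample_observations(m_star, n_obs, noise_sd=0.5, seed=0):
    """Draw (j, k, y) triples with y = <M*, E_{j,k}>_F + eps = M*[j][k] + eps."""
    rng = random.Random(seed)
    n_rows, n_cols = len(m_star), len(m_star[0])
    observations = []
    for _ in range(n_obs):
        # X_i = E_{j,k}; the entry is picked uniformly (an assumption of this sketch).
        j = rng.randrange(n_rows)
        k = rng.randrange(n_cols)
        eps = rng.gauss(0.0, noise_sd)  # centred noise, E(eps) = 0
        observations.append((j, k, m_star[j][k] + eps))
    return observations

# Hypothetical rank-1 "taste" matrix M* = u v^T (7 users, 4 items).
u = [1.0, 1.2, 0.9, 0.7, 1.0, 0.9, 0.8]
v = [8.0, 4.0, 9.0, 7.0]
M_STAR = [[ui * vk for vk in v] for ui in u]

obs = sample_observations(M_STAR, n_obs=10)
for j, k, y in obs[:3]:
    print(f"user {j}, item {k}: noisy rating {y:.2f}")
```

Each triple records which entry was observed and its noisy value; the unobserved entries are what matrix completion tries to recover.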

SLIDES 8-9

Penalized ERM

First idea:

$$\hat{M} \in \arg\min_{M} \left\{ \frac{1}{N}\sum_{i=1}^{N} \left(Y_i - \langle M, X_i\rangle_F\right)^2 + \lambda\,\mathrm{rank}(M) \right\}$$

but the rank is not convex. With the nuclear norm $\|M\|_*$ instead:

$$\hat{M} \in \arg\min_{M} \left\{ \frac{1}{N}\sum_{i=1}^{N} \left(Y_i - \langle M, X_i\rangle_F\right)^2 + \lambda\|M\|_* \right\}$$

Minimax rates of convergence derived in

  • V. Koltchinskii, K. Lounici & A. Tsybakov (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Annals of Statistics.
  • O. Klopp (2014). Noisy low-rank matrix completion with general sampling distribution. Bernoulli.

SLIDES 10-12

Is the quadratic loss always a good idea?

Stan: 7 3 8
Pierre: 8 10 9 10 9 10 10 10 8
Zoe: 8 3 7
Bob: 6 4 2
Oscar: 6 10 7
Léa: 8 4 9
Tony: 9 3 4 8

Bob's row with the missing ratings marked: Bob: 6 4 2 ? ? ?

Bob's row with an interval in place of a single value: Bob: 6 4 2 [6,8]

SLIDES 13-14

The quantile loss

... suggests replacing the quadratic loss by the quantile loss

$$\ell_\tau(f(x), y) = (y - f(x))\left[\tau - \mathbf{1}(y - f(x) \le 0)\right].$$

$$\hat{M} \in \arg\min_{M} \left\{ \frac{1}{N}\sum_{i=1}^{N} \ell_\tau\left(\langle M, X_i\rangle_F, Y_i\right) + \lambda\|M\|_* \right\}$$

Source: http://www.lokad.com/
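To make the quantile loss concrete, here is a minimal sketch that implements $\ell_\tau$ exactly as written above and looks for the constant prediction with the smallest average loss. The ratings list and the helper name `best_constant` are illustrative assumptions. For $\tau = 0.5$ the minimizer is a median of the sample; larger $\tau$ pushes the fit towards higher quantiles.

```python
def quantile_loss(prediction, y, tau):
    """Quantile (pinball) loss: (y - f(x)) * (tau - 1{y - f(x) <= 0})."""
    residual = y - prediction
    indicator = 1.0 if residual <= 0 else 0.0
    return residual * (tau - indicator)

def best_constant(ys, tau):
    """Sample value minimising the average quantile loss over constant predictions."""
    return min(ys, key=lambda c: sum(quantile_loss(c, y, tau) for y in ys))

ratings = [6, 4, 2, 7, 8, 3, 9, 10, 5]  # illustrative ratings, not from the slides
median_fit = best_constant(ratings, tau=0.5)
upper_fit = best_constant(ratings, tau=0.9)
print(median_fit, upper_fit)
```

Fitting two levels, for example $\tau = 0.1$ and $\tau = 0.9$, gives an interval rather than a single predicted rating.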

SLIDES 15-17

1-bit matrix completion

[Figure: 1-bit ratings table for users Stan, Pierre, Zoe, Bob, Oscar, Léa and Tony, with Bob's missing entries marked "? ? ?"]

SLIDES 18-21

1-bit matrix completion

$$\hat{M} \in \arg\min_{M} \left\{ \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left(\mathrm{sign}(\langle M, X_i\rangle_F) \neq Y_i\right) + \lambda\|M\|_* \right\}$$

Problem: the indicator function is not convex. Replace it by a convex loss $\ell$:

$$\hat{M} \in \arg\min_{M} \left\{ \frac{1}{N}\sum_{i=1}^{N} \ell\left(\langle M, X_i\rangle_F, Y_i\right) + \lambda\|M\|_* \right\}$$

logistic loss: $\ell(y', y) = \log(1 + \exp(-y' y))$

  • J. Lafond, O. Klopp, E. Moulines & J. Salmon (2014). Probabilistic low-rank matrix completion on finite alphabets. NIPS.

hinge loss: $\ell(y', y) = (1 - y' y)_+$, etc.

SLIDE 22

Lipschitz losses

All the aforementioned losses (hinge, logistic, quantile) are Lipschitz. So are other popular losses: Huber, ...
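The Lipschitz property can be probed numerically. The sketch below is a sanity check rather than a proof: it samples random pairs of predictions and records the largest observed ratio $|\ell(a, y) - \ell(b, y)| / |a - b|$. The sampling range, the Huber threshold `delta`, the quantile level `tau` and the helper `max_slope` are all assumptions of this example. The quadratic loss is included for contrast, since its slope is not bounded by 1.

```python
import math
import random

def hinge(pred, y):
    return max(0.0, 1.0 - pred * y)

def logistic(pred, y):
    return math.log1p(math.exp(-pred * y))

def quantile(pred, y, tau=0.3):
    r = y - pred
    return r * (tau - (1.0 if r <= 0 else 0.0))

def huber(pred, y, delta=1.0):
    r = abs(y - pred)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

def squared(pred, y):
    return (y - pred) ** 2

def max_slope(loss, labels, n_trials=20000, seed=0):
    """Largest observed |loss(a, y) - loss(b, y)| / |a - b| over random a, b in [-5, 5]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        y = rng.choice(labels)
        a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
        if a != b:
            worst = max(worst, abs(loss(a, y) - loss(b, y)) / abs(a - b))
    return worst

for name, loss, labels in [
    ("hinge", hinge, [-1.0, 1.0]),
    ("logistic", logistic, [-1.0, 1.0]),
    ("quantile", quantile, [0.0, 2.0, -3.0]),
    ("huber", huber, [0.0, 2.0, -3.0]),
    ("squared", squared, [0.0, 2.0, -3.0]),
]:
    print(f"{name:9s} largest observed slope: {max_slope(loss, labels):.3f}")
```

The first four losses stay at or below slope 1 in the prediction argument (for labels in $\{-1, +1\}$ where relevant), while the quadratic loss does not.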

SLIDE 23

Outline of the talk

1 Motivation
    Matrix completion: the L2 point of view
    Matrix completion: Lipschitz losses?
2 Oracle inequalities
    Notations and overview
    The main ingredients
    Sharp oracle inequality
3 Applications
    Logistic LASSO
    Logistic SLOPE
    Matrix completion with hinge loss


SLIDES 25-29

Notations

Pairs $(X_1, Y_1), \dots, (X_N, Y_N)$ in $\mathcal{X} \times \mathbb{R}$, i.i.d. from $P$.

A space $E \subseteq L_2(P)$ of functions $f : \mathcal{X} \to \mathbb{R}$ equipped with a norm $\|\cdot\|$, generally different from $\|\cdot\|_{L_2}$. A convex set $F \subseteq E$.

A loss function $\ell$ that is 1-Lipschitz: $|\ell(f_1(x), y) - \ell(f_2(x), y)| \le |f_1(x) - f_2(x)|$.

Oracle:

$$f^* \in \arg\min_{f \in F} R(f), \qquad R(f) := \mathbb{E}_P[\ell(f(X), Y)].$$

Estimator (penalized ERM):

$$\hat{f} \in \arg\min_{f \in F} \left\{ \frac{1}{N}\sum_{i=1}^{N} \ell(f(X_i), Y_i) + \lambda\|f\| \right\}.$$

SLIDES 30-33

Three main ingredients to study $\hat{f}$

The Bernstein condition with parameters $A$ and $\kappa$ quantifies the “identifiability” of $f^*$.

The complexity parameter $\mathrm{comp}(B)$ measures the “size” or “complexity” of (the unit ball $B$ of) $E$. It allows us to define the complexity function

$$r(\rho) = \left[\frac{A\rho\,\mathrm{comp}(B)}{\sqrt{N}}\right]^{\frac{1}{2\kappa}}.$$

The sparsity function $\Delta(\cdot)$ measures the size of the sub-differential of $\|\cdot\|$ in a $\rho$-neighborhood of $f^*$. Find a solution $\rho^*$ to the sparsity equation $\Delta(\rho^*) \ge (4/5)\rho^*$.

Then with high probability,

$$\|\hat{f} - f^*\| \le \rho^*, \qquad \|\hat{f} - f^*\|_{L_2} \le r(2\rho^*), \qquad R(\hat{f}) - R(f^*) \lesssim [r(2\rho^*)]^{2\kappa}.$$

SLIDE 34

The Bernstein condition

There are $\kappa \ge 1$ and $A > 0$ such that

$$\forall f \in F, \quad \|f - f^*\|_{L_2}^{2\kappa} \le A[R(f) - R(f^*)].$$

SLIDES 35-41

The Bernstein condition and strongly convex losses

$$\forall f \in F, \quad \|f - f^*\|_{L_2}^{2\kappa} \le A[R(f) - R(f^*)].$$

  • P. Bartlett, M. Jordan & J. McAuliffe (2006). Convexity, classification and risk bounds. JASA.

Theorem: $\ell$ is strongly convex $\Rightarrow$ the condition is satisfied with $\kappa = 1$.

Proof: by strong convexity,

$$\frac{\ell(f(X), Y) + \ell(f^*(X), Y)}{2} - \ell\left(\frac{f(X) + f^*(X)}{2}, Y\right) \ge \alpha[f(X) - f^*(X)]^2.$$

Taking expectations,

$$\frac{R(f) + R(f^*)}{2} - \underbrace{R\left(\frac{f + f^*}{2}\right)}_{\ge R(f^*)} \ge \alpha\|f - f^*\|_{L_2}^2,$$

and therefore

$$R(f) - R(f^*) \ge 2\alpha\|f - f^*\|_{L_2}^2.$$

SLIDES 42-44

The Bernstein condition and the hinge loss

$$\forall f \in F, \quad \|f - f^*\|_{L_2}^{2\kappa} \le A[R(f) - R(f^*)].$$

  • G. Lecué (2006). Optimal Rates of Aggregation in Classification Under Low Noise Assumption. PhD Thesis.

Theorem: $Y \in \{-1, 1\}$, $\eta(x) := \mathbb{E}(Y | X = x)$ and $f^*(x) = \mathrm{sign}(\eta(x))$.

$|\eta(X)| \ge \tau > 0$ a.s. $\Rightarrow$ Bernstein condition with $\kappa \ge 1$.

$P(|\eta(X)| \le t) \le c\,t^{\frac{1}{\kappa - 1}}$ with $\kappa > 1$ $\Rightarrow$ Bernstein.

SLIDES 45-47

The complexity parameter 1 - the bounded case

Let us assume that $\sup_{f \in F} \|f\|_\infty \le b$.

Ex: matrix completion case, $X_i \in \{E_{j,k}\}$ and

$$F = \left\{ \langle M, \cdot\rangle_F : \sup_{i,j} |M_{i,j}| \le b \right\}.$$

Rademacher complexity: in this case we define, for $B$ the unit ball in $E$,

$$\mathrm{comp}(B) = \mathbb{E} \sup_{f \in B} \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \epsilon_i f(X_i) \right|, \qquad (\epsilon_i) \text{ i.i.d. Rademacher}.$$

SLIDES 48-50

The complexity parameter 2 - subgaussian case

Put $H = \{f - g, (f, g) \in F^2\}$. Assume that

$$\forall h \in H, \ \forall \lambda, \quad \mathbb{E} \exp\left(\frac{\lambda |h(X)|}{\|h\|_{L_2}}\right) \le \exp\left(\lambda^2 L^2\right).$$

Ex: $X$ is Gaussian and $F = \{\langle t, \cdot\rangle, t \in \mathbb{R}^p\}$.

Gaussian mean width: with $(G_h)_{h \in E}$ the canonical Gaussian process,

$$\mathrm{comp}(B) = \mathbb{E} \sup_{h \in B} G_h.$$

SLIDE 51

The complexity function

$$r(\rho) := \left[\frac{A\rho\,\mathrm{comp}(B)}{\sqrt{N}}\right]^{\frac{1}{2\kappa}}.$$
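As a small numerical companion, the complexity function can be written as a one-line helper. This is a sketch under the reading $r(\rho) = [A\rho\,\mathrm{comp}(B)/\sqrt{N}]^{1/(2\kappa)}$, which matches the rates given later in the talk for $\kappa = 1$; the function name and the example values of $p$, $N$ and $A$ are hypothetical.

```python
import math

def complexity_function(rho, comp_b, n, a=1.0, kappa=1.0):
    """r(rho) = (A * rho * comp(B) / sqrt(N)) ** (1 / (2 * kappa))."""
    return (a * rho * comp_b / math.sqrt(n)) ** (1.0 / (2.0 * kappa))

# Illustrative values: comp(B) of order sqrt(log p), as in the l1 / Gaussian case.
p, n = 1000, 10000
comp_b = math.sqrt(math.log(p))
for rho in (0.1, 1.0, 10.0):
    print(f"rho = {rho:5.1f}  ->  r(rho) = {complexity_function(rho, comp_b, n):.4f}")
```

With $\kappa = 1$ the function grows like $\sqrt{\rho}$, which is what makes the ratio $\rho / r(2\rho)$ increase with $\rho$ in the sparsity equation.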

SLIDE 52

The sparsity equation

Example: the $\|\cdot\|_1$ penalty.

Idea: $t^*$ sparse (easier to estimate) $\leftrightarrow$ $\partial\|\cdot\|(t^*)$ is a large set.

SLIDES 53-54

The sparsity equation

The sparsity parameter:

$$\Delta(\rho) := \inf_{h \in \rho S \cap r(2\rho) B_{L_2}} \ \sup_{f \in \partial\|\cdot\|(f^*)} \langle h, f\rangle$$

where $B_{L_2}$ is the unit ball in $L_2$ and $S$ is the unit sphere in $E$.

The sparsity equation: find (the smallest possible) $\rho^*$ such that $\Delta(\rho^*) \ge (4/5)\rho^*$.

SLIDE 55

Sharp oracle inequality

$C$ stands for a constant that depends on $A$, $\kappa$, ... and may change from line to line.

Theorem: take $\lambda = 720\,\mathrm{comp}(B)/(7\sqrt{N})$. Then with probability at least

$$1 - C\exp\left(-C N^{\frac{1}{2\kappa}} \left(\rho^*\,\mathrm{comp}(B)\right)^{\frac{2\kappa - 1}{\kappa}}\right)$$

we have simultaneously

$$\|\hat{f} - f^*\| \le \rho^*, \qquad \|\hat{f} - f^*\|_{L_2} \le r(2\rho^*), \qquad R(\hat{f}) - R(f^*) \le C[r(2\rho^*)]^{2\kappa}.$$


SLIDE 58

Logistic LASSO: context

$E = F = \{\langle t, \cdot\rangle, t \in \mathbb{R}^p\}$ equipped with $\|\cdot\| = \|\cdot\|_1$.

Logistic LASSO:

$$\hat{f} \in \arg\min_{f \in F} \left\{ \frac{1}{N}\sum_{i=1}^{N} \log\left(1 + \exp(-Y_i f(X_i))\right) + \lambda\|f\|_1 \right\}.$$
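One way to compute a logistic LASSO estimator is proximal gradient descent (ISTA): a gradient step on the empirical logistic risk followed by soft-thresholding. This is a minimal sketch and not the method used by the authors; the step size, iteration count, simulated data and the constant 1 in $\lambda = \sqrt{\log(p)/N}$ are all assumptions of the example.

```python
import math
import random

def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| (shrinks towards zero)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def objective(t, xs, ys, lam):
    """Empirical logistic risk plus lam * ||t||_1."""
    risk = sum(
        math.log1p(math.exp(-y * sum(a * b for a, b in zip(t, x))))
        for x, y in zip(xs, ys)
    ) / len(xs)
    return risk + lam * sum(abs(a) for a in t)

def logistic_lasso(xs, ys, lam, step=0.5, n_iter=200):
    """Proximal gradient descent for the l1-penalised logistic risk."""
    n, p = len(xs), len(xs[0])
    t = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for x, y in zip(xs, ys):
            margin = y * sum(tj * xj for tj, xj in zip(t, x))
            weight = -y / (1.0 + math.exp(margin))  # derivative of log(1 + exp(-y f)) in f
            for j in range(p):
                grad[j] += weight * x[j] / n
        t = [soft_threshold(tj - step * gj, step * lam) for tj, gj in zip(t, grad)]
    return t

# Simulated data: Gaussian design, 2-sparse hypothetical parameter T_STAR.
rng = random.Random(0)
N, P = 200, 10
T_STAR = [2.0, -2.0] + [0.0] * (P - 2)
XS = [[rng.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N)]
YS = [
    1.0 if rng.random() < 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(T_STAR, x)))) else -1.0
    for x in XS
]
LAM = math.sqrt(math.log(P) / N)
t_hat = logistic_lasso(XS, YS, LAM)
print([round(v, 2) for v in t_hat])
```

The soft-thresholding step is what sets small coordinates exactly to zero; a very large $\lambda$ keeps the whole vector at zero.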

SLIDES 59-62

Logistic LASSO: Bernstein & complexity

Assume that $X \sim \mathcal{N}(0, I_p)$.

Bernstein condition satisfied with $\kappa = 1$.

$$\mathrm{comp}(B) = \mathbb{E} \sup_{\|t\|_1 \le 1} \langle t, X\rangle = \mathbb{E}\|X\|_\infty \sim \sqrt{\log(p)}.$$

$$r(\rho) = \left[\frac{A\rho\,\mathrm{comp}(B)}{\sqrt{N}}\right]^{\frac{1}{2\kappa}} \lesssim \left[\frac{\rho\sqrt{\log(p)}}{\sqrt{N}}\right]^{\frac{1}{2}}.$$
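The order of magnitude $\mathbb{E}\|X\|_\infty \sim \sqrt{\log(p)}$ can be checked by simulation. The sketch below estimates the expectation by Monte Carlo and prints it next to $\sqrt{\log p}$ and the classical upper bound $\sqrt{2\log(2p)}$ for the maximum of $p$ absolute standard Gaussians. The number of draws and the helper name are assumptions of the example.

```python
import math
import random

def expected_sup_norm(p, n_draws=2000, seed=0):
    """Monte Carlo estimate of E ||X||_inf for X ~ N(0, I_p)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(p))
    return total / n_draws

for p in (10, 100, 1000):
    estimate = expected_sup_norm(p)
    print(
        f"p = {p:5d}  estimate = {estimate:.3f}  "
        f"sqrt(log p) = {math.sqrt(math.log(p)):.3f}  "
        f"sqrt(2 log 2p) = {math.sqrt(2 * math.log(2 * p)):.3f}"
    )
```

The estimate grows slowly with $p$, staying between the two reference curves, which is the logarithmic dependence on the dimension that appears in the LASSO rates.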

SLIDES 63-67

Logistic LASSO: sparsity

Sparsity parameter:

$$\Delta(\rho) = \inf_{h \in \rho S \cap r(2\rho) B_{L_2}} \ \sup_{f \in \partial\|\cdot\|_1(f^*)} \langle h, f\rangle$$

$$f \in \partial\|\cdot\|_1(f^*) \iff \begin{cases} f_j = +1 & \text{when } f^*_j > 0, \\ f_j = -1 & \text{when } f^*_j < 0, \\ f_j \in [-1, +1] & \text{when } f^*_j = 0. \end{cases}$$

Choose $h$ and define $P$ as the projector on the sparsity pattern of $f^*$. Let $s$ denote the sparsity of $f^*$. For a well-chosen $f$,

$$\langle h, f\rangle = \langle (I - P)h, f\rangle + \langle Ph, f\rangle \ge \|(I - P)h\|_1 - \|Ph\|_1 = \|h\|_1 - 2\|Ph\|_1 = \rho - 2\|Ph\|_1,$$

$$\|Ph\|_1 \le \sqrt{s}\,\|Ph\|_2 \le \sqrt{s}\,\|h\|_2 \le \sqrt{s}\,r(2\rho).$$

Hence $\Delta(\rho) \ge \rho - 2\sqrt{s}\,r(2\rho)$.

Sparsity equation: $\Delta(\rho) \ge (4/5)\rho$ $\Leftrightarrow$ $\rho$ such that $\dfrac{\rho}{r(2\rho)} \ge C\sqrt{s}$.
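The two inequalities used in this argument can be verified numerically. In the sketch below, the "well-chosen" subgradient equals $\mathrm{sign}(f^*_j)$ on the support of $f^*$ and $\mathrm{sign}(h_j)$ elsewhere; the sparse vector `T_STAR` and the random directions `h` are illustrative assumptions.

```python
import math
import random

def well_chosen_subgradient(h, t_star):
    """Element of the subdifferential of ||.||_1 at t_star, aligned with h off the support."""
    f = []
    for hj, tj in zip(h, t_star):
        if tj > 0:
            f.append(1.0)
        elif tj < 0:
            f.append(-1.0)
        else:
            f.append(1.0 if hj >= 0 else -1.0)  # free coordinate: pick sign(h_j)
    return f

def check_bounds(h, t_star):
    """Return (<h, f>, ||h||_1 - 2 ||Ph||_1, ||Ph||_1, sqrt(s) ||h||_2)."""
    f = well_chosen_subgradient(h, t_star)
    inner = sum(hj * fj for hj, fj in zip(h, f))
    l1 = sum(abs(hj) for hj in h)
    ph1 = sum(abs(hj) for hj, tj in zip(h, t_star) if tj != 0)
    s = sum(1 for tj in t_star if tj != 0)
    l2 = math.sqrt(sum(hj * hj for hj in h))
    return inner, l1 - 2.0 * ph1, ph1, math.sqrt(s) * l2

T_STAR = [1.5, -0.7, 0.0, 0.0, 0.0, 0.0]  # hypothetical 2-sparse vector
rng = random.Random(0)
for _ in range(3):
    h = [rng.gauss(0.0, 1.0) for _ in T_STAR]
    inner, lower, ph1, cap = check_bounds(h, T_STAR)
    print(f"<h,f> = {inner:.3f} >= {lower:.3f};  ||Ph||_1 = {ph1:.3f} <= {cap:.3f}")
```

The gap between $\langle h, f\rangle$ and $\|h\|_1$ is controlled only by the $s$ coordinates on the support, which is why sparsity of $f^*$ helps.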

SLIDES 68-70

Logistic LASSO: solving the sparsity equation

$$r(\rho) \sim \left[\frac{\rho\sqrt{\log(p)}}{\sqrt{N}}\right]^{\frac{1}{2}}.$$

$$C\sqrt{s} \le \frac{\rho}{r(2\rho)} \sim \left[\frac{\rho\sqrt{N}}{\sqrt{\log(p)}}\right]^{\frac{1}{2}}.$$

$$\rho^* \sim s\sqrt{\frac{\log(p)}{N}}, \qquad r(\rho^*) \sim \sqrt{\frac{s\log(p)}{N}}.$$
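For readers who want the intermediate algebra, the steps from the sparsity equation to $\rho^*$ and $r(\rho^*)$ can be spelled out as follows. Constants are absorbed into $C$ and $\sim$ throughout, so this is a sketch of the orders of magnitude only.

```latex
\begin{align*}
\frac{\rho}{r(2\rho)}
  &\sim \rho \left[\frac{\rho\sqrt{\log(p)}}{\sqrt{N}}\right]^{-\frac{1}{2}}
   = \left[\frac{\rho\sqrt{N}}{\sqrt{\log(p)}}\right]^{\frac{1}{2}}, \\
\left[\frac{\rho\sqrt{N}}{\sqrt{\log(p)}}\right]^{\frac{1}{2}} \ge C\sqrt{s}
  &\iff \rho \ge C^2 s \sqrt{\frac{\log(p)}{N}}
  \quad\Longrightarrow\quad \rho^* \sim s\sqrt{\frac{\log(p)}{N}}, \\
r(\rho^*)
  &\sim \left[ s\sqrt{\frac{\log(p)}{N}} \cdot \frac{\sqrt{\log(p)}}{\sqrt{N}} \right]^{\frac{1}{2}}
   = \sqrt{\frac{s\log(p)}{N}}.
\end{align*}
```

Squaring $r(\rho^*)$ gives the excess-risk rate $s\log(p)/N$ for $\kappa = 1$.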

SLIDE 71

Logistic LASSO: conclusion

Theorem: take $\lambda \sim \sqrt{\log(p)/N}$. Then with probability at least $1 - C\exp[-Cs\log(p)]$ we have simultaneously

$$\|\hat{f} - f^*\|_1 \le Cs\sqrt{\frac{\log(p)}{N}}, \qquad \|\hat{f} - f^*\|_2 \le C\sqrt{\frac{s\log(p)}{N}}, \qquad R(\hat{f}) - R(f^*) \le C\,\frac{s\log(p)}{N}.$$

SLIDE 72

The SLOPE penalty

Quantity | LASSO | SLOPE
$\|t\|$ | $\sum_{i=1}^{p} |t_i|$ | $\sum_{i=1}^{p} \sqrt{\log\left(\frac{ep}{i}\right)}\,|t_{(i)}|$
$\mathrm{comp}(B)$ | $\sqrt{\log p}$ | $1$
$\rho^*$ | $\frac{s}{\sqrt{N}}\sqrt{\log p}$ | $\frac{s}{\sqrt{N}}\sqrt{\log\frac{ep}{s}}$
$r(\rho^*)$ | $\sqrt{\frac{s}{N}\log p}$ | $\sqrt{\frac{s}{N}\log\frac{ep}{s}}$

where $|t_{(1)}| \ge \dots \ge |t_{(p)}|$.
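The SLOPE norm in the table is easy to compute: sort the absolute values in decreasing order and weight the $i$-th largest by $\sqrt{\log(ep/i)}$. The sketch below follows that definition; the function names and the example vector are illustrative.

```python
import math

def slope_norm(t):
    """SLOPE norm: sum_i sqrt(log(e * p / i)) * |t_(i)|, with |t_(1)| >= ... >= |t_(p)|."""
    p = len(t)
    ordered = sorted((abs(v) for v in t), reverse=True)
    return sum(
        math.sqrt(math.log(math.e * p / i)) * v
        for i, v in enumerate(ordered, start=1)
    )

def lasso_norm(t):
    """LASSO (l1) norm: sum_i |t_i|."""
    return sum(abs(v) for v in t)

t = [0.0, -3.0, 0.5, 0.0, 1.0]
print(f"l1 norm    = {lasso_norm(t):.3f}")
print(f"SLOPE norm = {slope_norm(t):.3f}")
```

Since every weight is at least $\sqrt{\log e} = 1$, the SLOPE norm dominates the $\ell_1$ norm, and the largest coefficients receive the heaviest weights.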

SLIDE 73

Logistic SLOPE: conclusion

Theorem: take $\lambda \sim 1/\sqrt{N}$. Then with probability at least $1 - C\exp[-Cs\log(ep/s)]$ we have simultaneously

$$\|\hat{f} - f^*\|_1 \le Cs\sqrt{\frac{\log(ep/s)}{N}}, \qquad \|\hat{f} - f^*\|_2 \le C\sqrt{\frac{s\log(ep/s)}{N}}, \qquad R(\hat{f}) - R(f^*) \le C\,\frac{s\log(ep/s)}{N}.$$

SLIDES 74-75

Matrix completion: context

$E = F = \{\langle M, \cdot\rangle_F, M \in [-1, +1]^{m \times p}\}$ with $\|\cdot\| = \|\cdot\|_*$.

Matrix completion via hinge loss + nuclear norm:

$$\hat{f} \in \arg\min_{f \in F} \left\{ \frac{1}{N}\sum_{i=1}^{N} (1 - Y_i f(X_i))_+ + \lambda\|f\|_* \right\}.$$

Assume that $X$ is uniformly distributed on $\{E_{j,k}\}$.

SLIDES 76-78

Matrix completion: Bernstein and complexity

Obvious that $f^*(E_{j,k}) = \mathrm{sign}(\langle E_{j,k}, M^*\rangle) = \mathrm{sign}(\eta(E_{j,k}))$. As soon as $|\eta(E_{j,k})| \ge \beta > 0$, the Bernstein condition is satisfied with $\kappa = 1$.

$$\mathrm{comp}(B) = \mathbb{E} \sup_{\|M\|_* \le 1} \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \epsilon_i \langle M, X_i\rangle \right| = \mathbb{E} \sup_{\|M\|_* \le 1} \left| \left\langle M, \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \epsilon_i X_i \right\rangle \right| = \mathbb{E} \left\| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \epsilon_i X_i \right\|_{\mathrm{op}} \lesssim \sqrt{\frac{\log(m + p)}{\min(m, p)}}$$

thanks to the “matrix Bernstein” inequality.

$$r(\rho) \sim \left[\rho\sqrt{\frac{\log(m + p)}{N\min(m, p)}}\right]^{1/2}.$$

SLIDE 79

Matrix completion: sparsity

Sparsity equation: $\Delta(\rho) \ge (4/5)\rho$ $\Leftrightarrow$ $\rho$ such that $\dfrac{\rho}{r(2\rho)} \ge C\sqrt{\mathrm{rank}(M^*)\,mp}$.

Put $r = \mathrm{rank}(M^*)$.

SLIDE 80

Matrix completion: conclusion

Theorem: take $\lambda \sim \sqrt{\log(m + p)/[N\min(m, p)]}$. Then with probability at least $1 - C\exp[-Cr(m + p)\log(m + p)]$ we have simultaneously

$$\|\hat{f} - f^*\|_* \le Cr\sqrt{\frac{\log(m + p)}{N\min(m, p)}}, \qquad \|\hat{f} - f^*\|_F \le C\sqrt{\frac{r\max(m, p)\log(m + p)}{N}}, \qquad R(\hat{f}) - R(f^*) \le C\,\frac{r\max(m, p)\log(m + p)}{N}.$$

SLIDE 81

  • P. Alquier, V. Cottet & G. Lecué (2017). Estimation Bounds and Sharp Oracle Inequalities of Regularized Procedures with Lipschitz Loss Functions. Preprint arXiv:1702.01402.

Jupyter notebooks: https://sites.google.com/site/vincentcottet/code

Thank you!