Regret Bounds for Lifelong Learning - Pierre Alquier - Groupe de Travail de Machine Learning du CMLA - PowerPoint PPT Presentation


Regret Bounds for Lifelong Learning
Pierre Alquier
Groupe de Travail de Machine Learning du CMLA, ENS Paris-Saclay



1. Transfer learning, multitask learning, lifelong learning...
2. A strategy for lifelong learning, with regret analysis
3. Open questions


Generic learning task

Given object-label pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, learn to predict labels from objects.
- self-driving car: road scene / presence of a pedestrian?
- recommender system: customer / will they buy my stuff?
- Cambridge Analytica: Facebook user / do they believe fake news?
- ...

Batch learning

- data often assumed i.i.d. from $P$,
- build $\hat f$ based on the whole dataset,
- minimize $R(\hat f)$, where $R(f) = \mathbb{E}_{(X,Y)\sim P}[\ell(Y, f(X))]$.

Batch learning: more books

Online learning

- no probabilistic assumption, data revealed sequentially,
- at time $t$, build $\hat f_t$ based on the data seen so far,
- minimize $\sum_{t=1}^{T} \ell(Y_t, \hat f_t(X_t))$.
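The online protocol can be sketched in a few lines of Python. This is a hypothetical one-dimensional example: the squared loss and the running-mean predictor are illustrative choices, not part of the talk.

```python
# Online learning sketch: at each time t we predict with a rule built
# only from past data, then observe y_t and suffer the loss.
# Illustrative choices: squared loss, predictor = running mean of labels.
def online_cumulative_loss(ys):
    total, mean, count = 0.0, 0.0, 0
    for y in ys:
        pred = mean                  # f_hat_t depends only on data seen so far
        total += (y - pred) ** 2     # suffer loss ell(y_t, f_hat_t(x_t))
        count += 1
        mean += (y - mean) / count   # update the predictor for time t + 1
    return total

print(online_cumulative_loss([1.0, 1.0, 1.0]))
```

The key constraint is that the prediction at time $t$ is made before $y_t$ is revealed; there is no probabilistic assumption on the sequence.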

Online learning: a good starting point

A few facts - motivation for transfer learning

- when we solve different tasks, it seems we start from scratch at each task,
- still, our knowledge of "solving tasks" improves with each task,
- for similar tasks, it seems reasonable to transfer information from one task to another.

Tentative definition - from Thrun and Pratt

Given a task, a training experience, and a performance measure, a program is said to learn if its performance at the task improves with experience.

Given a family of tasks, training experience for each of these tasks, and a family of performance measures, an algorithm is said to learn to learn if its performance at each task improves with experience and with the number of tasks.

Multitask learning

Given $M$ tasks $t$, with $M$ risks $R_t(\cdot)$ and $M$ datasets $S_t := \{(X_{t,1}, Y_{t,1}), \ldots, (X_{t,n_t}, Y_{t,n_t})\}$, propose $M$ predictors $\hat f_t(\cdot) = \hat f_t(S_1, \ldots, S_M; \cdot)$ that aim at minimizing (for example) $R_1(\hat f_1) + \cdots + R_M(\hat f_M)$.

Nice, but what if yet another new task appears?

Learning-to-learn

Learning-to-learn (LTL). Given $M$ tasks $t$ with risks $R_t(\cdot)$ and $M$ datasets $S_t := \{(X_{t,1}, Y_{t,1}), \ldots, (X_{t,n_t}, Y_{t,n_t})\}$, learn information $I = I(S_1, \ldots, S_M)$ such that, when a new task with risk $R(\cdot)$ and a new dataset $S := \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ arrives, $I$ can be used to build a predictor $\hat f(\cdot) = \hat f(S, I; \cdot)$ such that $R(\hat f)$ is small.

Probabilistic setting for LTL

Possible probabilistic setting:
- $P_1, \ldots, P_M$ i.i.d. from $\mathcal{P}$,
- $(X_{t,1}, Y_{t,1}), \ldots, (X_{t,n_t}, Y_{t,n_t})$ i.i.d. from $P_t$,
- $R_t(f) = \mathbb{E}_{(X,Y)\sim P_t}[\ell(Y, f(X))]$,
- quantitative criterion to minimize w.r.t. $I$:
  $$R_{\mathrm{LTL}}(I) = \mathbb{E}_{P\sim\mathcal{P}}\Big[\min_{f\in\mathcal{C}} \mathbb{E}_{(X,Y)\sim P}\big[\ell(Y, f(I, X))\big]\Big].$$

Note the strong Bayesian flavor...

Example of LTL: dictionary learning

Example taken from Maurer, Pontil and Romera-Paredes. The $X_{t,i} \in \mathbb{R}^K$, but all the relevant information is in $D X_{t,i} \in \mathbb{R}^k$, $k \ll K$. The matrix $D$ is unknown.
- $\beta_1, \ldots, \beta_M$ i.i.d. from $\mathcal{P}$,
- $(X_{t,1}, Y_{t,1}), \ldots, (X_{t,n}, Y_{t,n})$ i.i.d. from $P_{\beta_t}$: $Y = \beta_t^T D X + \varepsilon$,
- $R_t(\beta, \Delta) = \mathbb{E}_{(X,Y)\sim P_{\beta_t}}[\ell(Y, \beta^T \Delta X)]$,
- quantitative criterion to minimize w.r.t. $\Delta$:
  $$R_{\mathrm{LTL}}(\Delta) = \mathbb{E}_{\beta\sim\mathcal{P}}\Big[\mathbb{E}_{(X,Y)\sim P_\beta}\big[\ell(Y, \beta^T \Delta X)\big]\Big].$$
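For concreteness, data from this model can be simulated as follows; the dimensions, distributions, and noise level are illustrative assumptions, not the talk's choices.

```python
import numpy as np

# Simulate the dictionary-learning model: X lives in R^K, the relevant
# representation is D X in R^k with k << K, and each task t draws its
# own beta_t, with Y = beta_t^T D X + noise.
rng = np.random.default_rng(0)
K, k, M, n = 10, 2, 5, 50

D = rng.standard_normal((k, K))      # shared, unknown dictionary
tasks = []
for _ in range(M):
    beta = rng.standard_normal(k)    # task-specific coefficients
    X = rng.standard_normal((n, K))
    Y = X @ D.T @ beta + 0.1 * rng.standard_normal(n)
    tasks.append((X, Y))

print(len(tasks), tasks[0][0].shape, tasks[0][1].shape)
```

All $M$ tasks share the same $D$, which is exactly the information worth transferring between tasks.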

Maurer, Pontil and Romera-Paredes propose:
$$\hat D = \arg\min_{\Delta} \sum_{t=1}^{M}\, \min_{\|\beta_t\|_1 \le \alpha}\, \sum_{i=1}^{n} \ell(Y_{t,i}, \beta_t^T \Delta X_{t,i}).$$

Theorem (Maurer et al). Under suitable assumptions, with probability at least $1-\delta$,
$$R_{\mathrm{LTL}}(\hat D) \le \inf_{\Delta} R_{\mathrm{LTL}}(\Delta) + C\left(\alpha k\sqrt{\frac{1}{M}} + \sqrt{\frac{\log\frac{1}{\delta}}{M}} + \alpha\sqrt{\frac{1}{n}}\right).$$

Note that $C$ can depend on $(k, K)$ or not, depending on assumptions on the distribution of $X$ under $P_\beta$...

Going online: lifelong learning

Lifelong learning (LL): an online version of learning-to-learn? Recent work with The Tien Mai and Massimiliano Pontil. Objectives:
- consider that tasks can be revealed sequentially,
- use the tools of online learning theory: avoid probabilistic assumptions,
- if possible, define a general strategy that does not depend on the learning algorithm used within each task.


Joint work with Massimiliano Pontil (UCL, IIT) and The Tien Mai (U. of Oslo).

Setting

- objects in $\mathcal{X}$, labels in $\mathcal{Y}$,
- sets of functions $\mathcal{G} : \mathcal{X} \to \mathcal{Z}$ and $\mathcal{H} : \mathcal{Z} \to \mathcal{Y}$,
- loss function $\ell$.

Lifelong-learning problem (LL). Propose an initial $g$. For $t = 1, 2, \ldots$:
1. propose an initial $h_t$; for $i = 1, \ldots, n_t$:
   1. $x_{t,i}$ revealed,
   2. predict $\hat y_{t,i} = h_t \circ g(x_{t,i})$,
   3. $y_{t,i}$ revealed, suffer loss $\hat\ell_{t,i} := \ell(y_{t,i}, \hat y_{t,i})$,
   4. update $h_t$;
2. update $g$.
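The protocol reads directly as a nested loop. The sketch below is a literal transcription with placeholder update rules; the toy $g$, $h$, and updates are illustrative assumptions.

```python
# Lifelong-learning protocol: tasks arrive sequentially; within a task,
# predict with h_t(g(x)), suffer the loss, update h_t; update g at the
# end of each task.
def lifelong_loop(tasks, g, make_h, update_h, update_g, loss):
    total = 0.0
    for xs, ys in tasks:                  # task t revealed
        h = make_h()                      # propose initial h_t
        for x, y in zip(xs, ys):
            y_hat = h(g(x))               # predict h_t(g(x_{t,i}))
            total += loss(y, y_hat)       # y_{t,i} revealed, suffer loss
            h = update_h(h, g(x), y)      # update h_t
        g = update_g(g, xs, ys)           # update g
    return total

# Toy instantiation: g is the identity, h is refit to the last example.
tasks = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])]
loss = lambda y, p: (y - p) ** 2
g = lambda x: x
make_h = lambda: (lambda z: 0.0)
update_h = lambda h, z, y: (lambda z2: y / z * z2 if z else 0.0)
update_g = lambda g_old, xs, ys: g_old    # placeholder: keep g fixed
print(lifelong_loop(tasks, g, make_h, update_h, update_g, loss))
```

Note that each new task restarts $h_t$ from scratch, while $g$ persists across tasks: $g$ is the transferred knowledge.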

Within-task algorithm

For $t = 1, 2, \ldots$:
1. solve a usual online task, with inputs $z_{t,i} = g(x_{t,i})$ and outputs $y_{t,i}$;
2. update $g$.

Step 1 can be done with any online algorithm, referred to as the "within-task algorithm". For many algorithms, bounds are known on the (normalized) regret:
$$\mathcal{R}_t(g) = \underbrace{\frac{1}{n_t}\sum_{i=1}^{n_t} \hat\ell_{t,i}}_{=\,\hat L_t(g)} \;-\; \frac{1}{n_t}\,\inf_{h\in\mathcal{H}}\,\sum_{i=1}^{n_t} \ell(y_{t,i}, h(z_{t,i})).$$

Examples of within-task algorithms

Online gradient for convex $\ell$. Initialize $h = 0$; update $h \leftarrow h - \eta \nabla_{f=h}\, \ell(y_{t,i}, f(z_{t,i}))$. Many variants and improvements (projected gradient, online Newton step, ...). $\mathcal{R}_t(g)$ in $1/\sqrt{n_t}$ or $1/n_t$ depending on assumptions on $\ell$.

EWA (Exponentially Weighted Aggregation). Prior $\rho_1 = \pi$, initialize $h \sim \rho_1$; update $\rho_{i+1}(df) \propto \exp[-\eta\, \ell(y_{t,i}, f(z_{t,i}))]\, \rho_i(df)$, $h \sim \rho_{i+1}$. $\mathbb{E}[\mathcal{R}_t(g)]$ in $1/\sqrt{n_t}$ under a boundedness assumption. Integrated variant: $\mathcal{R}_t(g)$ in $1/n_t$ if $\ell$ is exp-concave.
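As an illustration, online gradient descent within one task might look as follows; the linear predictor $h(z) = \langle h, z\rangle$, the squared loss, and the step size are assumptions made for the example.

```python
import numpy as np

# Online gradient descent as a within-task algorithm: initialize h = 0,
# then take a gradient step on the current example's loss.
def ogd_task(Z, y, eta=0.05):
    h = np.zeros(Z.shape[1])
    losses = []
    for z_i, y_i in zip(Z, y):
        pred = h @ z_i
        losses.append((y_i - pred) ** 2)          # suffered loss
        h = h - eta * 2.0 * (pred - y_i) * z_i    # gradient of the loss at f = h
    return h, float(np.mean(losses))

rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 2))                 # z_{t,i} = g(x_{t,i})
h_star = np.array([1.0, -2.0])
y = Z @ h_star                                    # noiseless linear labels
h_hat, avg_loss = ogd_task(Z, y)
print(avg_loss)
```

The average suffered loss is small when the representation $z = g(x)$ makes the labels (nearly) linearly predictable, which is exactly what the regret term $\hat L_t(g)$ measures.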

EWA for lifelong learning

EWA-LL. Prior $\pi = \rho_1$ on $\mathcal{G}$; draw $g \sim \pi$. For $t = 1, 2, \ldots$:
1. run the within-task algorithm on task $t$; suffer $\hat L_t(g)$;
2. update $\rho_{t+1}(dg) \propto \exp[-\eta \hat L_t(g)]\, \rho_t(dg)$;
3. draw $g \sim \rho_{t+1}$.

Next: two examples that are corollaries of a general result (stated later).
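A toy numerical sketch of EWA-LL in the finite setting: the sets G and H, the bounded quadratic loss, and the integrated within-task EWA below are all illustrative choices, and instead of sampling $g \sim \rho_t$ the sketch reports the $\rho_t$-expected suffered loss to stay deterministic.

```python
import math

# Toy EWA-LL with finite G and H. The within-task algorithm is an
# "integrated" EWA over H (predict with the weighted mixture); the meta
# level keeps exponential weights over G, updated with each task's
# average loss L_hat_t(g).
def ewa_within(zs, ys, H, loss, eta=2.0):
    w = [1.0] * len(H)
    total = 0.0
    for z, y in zip(zs, ys):
        s = sum(w)
        pred = sum(wi * h(z) for wi, h in zip(w, H)) / s
        total += loss(y, pred)
        w = [wi * math.exp(-eta * loss(y, h(z))) for wi, h in zip(w, H)]
    return total / len(ys)

def ewa_ll(tasks, G, H, loss, eta=2.0):
    wg = [1.0] * len(G)                  # uniform prior pi on G
    history = []
    for xs, ys in tasks:
        # L_hat_t(g) for every g: this replay over all of G is why
        # EWA-LL is not "truly online" (it must keep the task's data).
        Ls = [ewa_within([g(x) for x in xs], ys, H, loss) for g in G]
        history.append(sum(w * L for w, L in zip(wg, Ls)) / sum(wg))
        wg = [w * math.exp(-eta * L) for w, L in zip(wg, Ls)]
    return history

# The second representation g(x) = 2x matches the labels y = 2x.
G = [lambda x: x, lambda x: 2 * x]
H = [lambda z: z, lambda z: -z]
loss = lambda y, p: min((y - p) ** 2, 1.0)   # bounded loss
task = ([0.1 * i for i in range(20)], [0.2 * i for i in range(20)])
hist = ewa_ll([task] * 5, G, H, loss)
print(hist[0] > hist[-1])
```

The per-task loss decreases across tasks as the meta-weights concentrate on the representation under which the within-task algorithm performs well.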

Example 1: dictionary learning

$$\mathcal{X} = \mathbb{R}^K \to \mathcal{Z} = \mathbb{R}^k \to \mathcal{Y} = \mathbb{R}, \qquad x \mapsto Dx \mapsto \langle h, Dx \rangle = h^T D x.$$
- within-task algorithm: online gradient descent on $h$,
- EWA-LL prior: columns of $D$ i.i.d. uniform on the unit sphere.

Theorem (Corollary 4.4; $\ell$ bounded by $B$ and $L$-Lipschitz).
$$\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}\frac{1}{n_t}\sum_{i=1}^{n_t}\hat\ell_{t,i}\right] \le \inf_{D}\,\frac{1}{T}\sum_{t=1}^{T}\,\inf_{\|h_t\|\le C}\,\frac{1}{n_t}\sum_{i=1}^{n_t}\ell(y_{t,i}, h_t^T D x_{t,i}) + C\sqrt[4]{\frac{Kk}{T}}\,(\log(T)+7) + \frac{BL}{\sqrt{T}} + \frac{1}{T}\sum_{t=1}^{T}\frac{BL\sqrt{2k}}{\sqrt{n_t}}.$$

When all tasks have the same length $\bar n$, the last term reads $BL\sqrt{2k}/\sqrt{\bar n}$.

Example 1 (dictionary learning): simulations

Simulations: $\mathcal{X} = \mathbb{R}^5 \to \mathcal{Z} = \mathbb{R}^2 \to \mathcal{Y} = \mathbb{R}$ with $\ell$ the quadratic loss, $T = 150$, each $n_t = 100$. Implementation of EWA-LL: at each step, $D$ is updated using $N$ iterations of Metropolis-Hastings.
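One way to implement the EWA-LL update over $D$ is random-walk Metropolis-Hastings targeting $\rho(D) \propto \exp(-\eta \hat L(D))\,\pi(D)$. The sketch below uses an illustrative quadratic placeholder for the cumulative loss and a flat prior, so it only shows the mechanics, not the talk's actual objective.

```python
import numpy as np

# Random-walk Metropolis-Hastings targeting rho(D) ~ exp(-eta * lhat(D)).
# The "loss" lhat is a placeholder (minimized at D = 1 entrywise), so we
# can check that the chain moves toward the target's mode.
rng = np.random.default_rng(2)
k, K, eta, step, N = 2, 5, 1.0, 0.3, 500

def lhat(D):
    return float(np.sum((D - 1.0) ** 2))   # placeholder cumulative loss

def log_target(D):
    return -eta * lhat(D)                  # flat prior for brevity

D = np.zeros((k, K))
accepts = 0
for _ in range(N):
    prop = D + step * rng.standard_normal(D.shape)  # symmetric proposal
    # accept with probability min(1, target(prop) / target(D))
    if rng.random() < np.exp(log_target(prop) - log_target(D)):
        D, accepts = prop, accepts + 1

print(accepts, float(np.mean(np.abs(D - 1.0))))
```

With a symmetric proposal, the acceptance ratio reduces to the ratio of target densities, so only $\hat L(D)$ needs to be evaluated at each iteration.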

Example 2: finite set of predictors

$$x \xrightarrow{\;g\in\mathcal{G}\;} g(x) \xrightarrow{\;h\in\mathcal{H}\;} h(g(x)), \qquad \mathrm{card}(\mathcal{G}) = G < +\infty, \quad \mathrm{card}(\mathcal{H}) = H < +\infty.$$
- within-task algorithm: EWA with a uniform prior,
- EWA-LL with a uniform prior.

Theorem (Corollary 4.2; $\ell$ bounded by $C$ and $\alpha$-exp-concave).
$$\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}\frac{1}{m}\sum_{i=1}^{m}\hat\ell_{t,i}\right] \le \inf_{g\in\mathcal{G}}\,\frac{1}{T}\sum_{t=1}^{T}\,\inf_{h_t\in\mathcal{H}}\,\frac{1}{m}\sum_{i=1}^{m}\ell(y_{t,i}, h_t\circ g(x_{t,i})) + C\sqrt{\frac{\log G}{2T}} + \frac{\log H}{\alpha\,\bar n}.$$

Example 2: improvement on existing results

The "online-to-batch" trick lets us deduce from our online method a statistical estimator with a controlled LTL risk in $O\big(\sqrt{\log G / T} + \log H / n\big)$. In this case, a previous bound by Pentina and Lampert was in $O\big(\sqrt{\log G / T} + \sqrt{\log H / n}\big)$.

General regret bound

Theorem (Theorem 3.1; $\ell$ bounded by $C$). If, for any $g \in \mathcal{G}$, the within-task algorithm has a regret bound $\mathcal{R}_t(g) \le \beta(g, n_t)$, then
$$\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}\frac{1}{n_t}\sum_{i=1}^{n_t}\hat\ell_{t,i}\right] \le \inf_{\rho}\int\left[\frac{1}{T}\sum_{t=1}^{T}\,\inf_{h_t\in\mathcal{H}}\,\frac{1}{n_t}\sum_{i=1}^{n_t}\ell\big(y_{t,i}, h_t\circ g(x_{t,i})\big) + \frac{1}{T}\sum_{t=1}^{T}\beta(g, n_t)\right]\rho(dg) + \frac{\eta C^2}{8} + \frac{\mathcal{K}(\rho,\pi)}{\eta T},$$
where $\mathcal{K}(\rho,\pi)$ denotes the Kullback-Leibler divergence.


Efficient algorithms?

Our online analysis avoids explicit probabilistic assumptions on the data and allows a free choice of the within-task algorithm. However, EWA-LL is not "truly online": computing it requires storing all the data seen so far. Moreover, its computation is not scalable.

Efficient Lifelong Learning Algorithm: ELLA

- dictionary learning,
- fast update of $D$ and $\beta$ at each step; truly online: no need to store the data,
- very good empirical performance,
- no regret bound.

SLIDE 83

More progress on dictionary learning

  • dictionary learning,
  • fast update of β at each step,
  • fast update of D at the end of each task,
  • truly online,
  • very good empirical performances,
  • LTL bound in O(1/√T + 1/√n).
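A purely illustrative sketch of this kind of scheme (the ridge penalty, step size, and Gaussian toy data are all assumptions, not the algorithm analyzed in these papers) : keep a shared dictionary D, solve for the task code β from the current task’s data only, and take one gradient step on D when the task ends, so nothing is stored between tasks.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n, T = 6, 3, 40, 30       # feature dim, dictionary atoms, samples per task, tasks
lam, eta = 1.0, 0.01            # ridge penalty on the code, step size on the dictionary

D_true = rng.normal(size=(d, k))   # ground-truth dictionary generating the task vectors
D = rng.normal(size=(d, k))        # learned shared dictionary, refined task after task

for t in range(T):
    # one task: regression vector w_t = D_true @ s_t, observed through noisy linear data
    s_t = rng.normal(size=k)
    X = rng.normal(size=(n, d))
    y = X @ (D_true @ s_t) + 0.1 * rng.normal(size=n)

    # within the task: ridge solve for the code beta given the current dictionary
    A = X @ D
    beta = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ y)

    # end of the task: one gradient step on D, using only this task's data
    resid = A @ beta - y
    D -= eta * (X.T @ np.outer(resid, beta)) / n
```

The point of the sketch is the memory profile : each task touches (X, y) once, updates (β, D), and the data can then be discarded.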

SLIDE 86

Algorithms : open questions

Open question 1 An efficient algorithm with theoretical guarantees (if possible beyond dictionary learning).

  • theoretical analysis of ELLA ?
  • can we justify updating D at each step ? This leads to the next big open problem...

SLIDE 89

Optimality of the bounds

ELLA updates D at each step. Doing so, after T tasks with n steps in each task, we would expect a bound in O(1/√(nT) + 1/√n).

Denevi et al. : bound in O(1/√T + 1/√n).

So, what are the optimal rates in LL & LTL ?

SLIDE 92

Insights from a toy model

  • θ1 fixed once and for all,
  • task t : θ2,t fixed for the task,
  • for i = 1, . . . , n, yt,i = (θ1 + ε1,i,t, θ2,t + ε2,i,t) with εj,i,t ∼ N(0, 1).

θ̂1 = (1/(nT)) Σ_{t=1}^{T} Σ_{i=1}^{n} (yt,i)1 can be computed in the online setting and one has E|θ̂1 − θ1| = O(1/√(nT)).

Fits our setting with x = ∅, gθ1(x) = θ1, hθ2(z) = (z, θ2).
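An illustrative simulation of this toy model (the parameter values are arbitrary choices, not from the slides) : the first coordinates give nT independent N(θ1, 1) observations, and the grand mean θ̂1 concentrates at the 1/√(nT) rate.

```python
import numpy as np

rng = np.random.default_rng(0)
theta1, n, T = 2.0, 50, 200
reps = 200                                  # Monte Carlo repetitions to estimate E|θ̂1 − θ1|

errs = []
for _ in range(reps):
    # first coordinates (yt,i)1 = θ1 + ε1,i,t with iid N(0,1) noise: nT independent draws
    first_coords = theta1 + rng.normal(size=(T, n))
    errs.append(abs(first_coords.mean() - theta1))   # θ̂1 = grand mean over all nT samples

mean_err = np.mean(errs)
rate = np.sqrt(2 / np.pi) / np.sqrt(n * T)  # exact E|θ̂1 − θ1| when θ̂1 − θ1 ∼ N(0, 1/(nT))
```

With this seed, mean_err lands within a small factor of the theoretical rate, matching the O(1/√(nT)) claim.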

SLIDE 93

Insights from a toy model

  • θ1 fixed once and for all,
  • task t : θ2,t and ε1,t ∼ N(0, 1) fixed for the task,
  • for i = 1, . . . , n, yt,i = (θ1 + ε1,t, θ2,t + ε2,i,t) with ε2,i,t ∼ N(0, 1).

θ̂1 = (1/T) Σ_{t=1}^{T} (yt,1)1 can be computed in the online setting and one has E|θ̂1 − θ1| = O(1/√T) : averaging over i does not help, since ε1,t is shared within each task.

Still fits our setting and LTL !
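The same simulation for the modified model (again with arbitrary illustrative parameters) : since ε1,t is drawn once per task and shared by all n observations, averaging over i brings nothing, and the error of θ̂1 stalls at the 1/√T rate even though nT values are observed.

```python
import numpy as np

rng = np.random.default_rng(0)
theta1, n, T, reps = 2.0, 50, 200, 200

errs = []
for _ in range(reps):
    eps1 = rng.normal(size=T)                       # ε1,t: one draw per task, shared by all i
    first_coords = theta1 + np.repeat(eps1[:, None], n, axis=1)  # first coords of all nT points
    errs.append(abs(first_coords.mean() - theta1))  # the grand mean still equals θ1 + mean(ε1)

mean_err = np.mean(errs)
rate_T = np.sqrt(2 / np.pi) / np.sqrt(T)            # E|mean of T iid N(0,1)| : the 1/√T floor
```

Comparing with the previous simulation, mean_err here tracks 1/√T rather than 1/√(nT), which is the gap between the two candidate rates discussed above.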

SLIDE 96

Optimal rates : open questions

Open question 2 What are the optimal rates in lifelong learning and in LTL ?

  • this requires defining properly the class of predictors,
  • the optimal rate will also depend on the setting. This leads to the next question...

SLIDE 99

Are our definitions even right ?

Note that the terminology is not even fixed : for example, Pentina and Lampert call lifelong learning what we call learning to learn (we don’t claim we are right !). We used :

1 LTL : samples from all the tasks presented at once (“batch-within-batch”).

2 LL : tasks presented sequentially and, within each task, pairs presented sequentially (“online-within-online”).

3 why not tasks presented sequentially, but, within each task, samples presented all at once ? (“batch-within-online”, see our paper and Denevi et al.)

SLIDE 103

Towards more models ?

One can imagine even more settings :

  • observations not ordered by tasks ?
  • for some tasks the information is complete, for other tasks it is not : for example, some tasks are sequential prediction problems, others are bandit problems.
  • more complicated : within tasks we use an algorithm for which we don’t have a regret bound, for example a deep neural network for image classification in self-driving cars, and the feedback is partial : not the misclassification rate itself, but quantities that depend on it (number of accidents, user feedback...).

Do we really need a paper for each possible variant ?...

SLIDE 106

Setting : open questions

Open question 3 Which settings are relevant ? Which settings are not ? To what extent is a general theory possible ?

  • depends on the applications.
  • we should also have a look at other existing approaches (econometrics of panel data ↔ multitask learning).