An Optimal Transport View on Generalization. Nemo Fournier. January 13, 2020. (PowerPoint presentation)



SLIDE 1

Framework Main results An application Deep Neural Networks Conclusion

An Optimal Transport View on Generalization

Nemo Fournier January 13, 2020

An Optimal Transport View on Generalization 1 / 10

SLIDE 2


Outline

Framework Main results An application Deep Neural Networks


SLIDE 3

instance space Z = X × Y
hypothesis space W
loss function ℓ : Z × W → R₊
learning algorithm A : Z^n → W
underlying distribution D
training sample S_n ∼ D^⊗n

SLIDE 4

risk R(w) = E_{z∼D}[ℓ(z, w)]
empirical risk R_{S_n}(w) = E_{z∼S_n}[ℓ(z, w)] = (1/n) ∑_{i=1}^n ℓ(z_i, w)
generalization error G(D, P_{W|S_n}) = E[R(W) − R_{S_n}(W)]
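As a concrete toy illustration of these three definitions, the sketch below estimates the risk, the empirical risk, and one draw of their gap. The loss, distribution, and algorithm here are illustrative assumptions, not taken from the talk.

```python
import random

random.seed(0)

def ell(z, w):
    # toy 1-Lipschitz loss (illustrative assumption): ℓ((x, y), w) = |y - w|
    return abs(z[1] - w)

def empirical_risk(sample, w):
    # R_{S_n}(w) = (1/n) * sum_i ℓ(z_i, w)
    return sum(ell(z, w) for z in sample) / len(sample)

def risk(w, num_mc=100_000):
    # R(w) = E_{z~D}[ℓ(z, w)], estimated by Monte Carlo;
    # D here (again an illustrative choice): x ~ U[0,1], y = x
    return sum(ell((x, x), w) for x in (random.random() for _ in range(num_mc))) / num_mc

sample = [(x, x) for x in (random.random() for _ in range(50))]
w = sum(y for _, y in sample) / len(sample)   # toy algorithm A: mean of the labels
gap = risk(w) - empirical_risk(sample, w)     # one draw of R(W) - R_{S_n}(W)
```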

SLIDE 5

µ and ν two measures on W
coupling: T a measure on W × W such that T(X, W) = µ(X) and T(W, X) = ν(X) for every measurable X
Wasserstein distance W_1(µ, ν) = inf_{T∈Γ(µ,ν)} E_{(W,W′)∼T}[d_W(W, W′)]
algorithmic transport cost of algorithm A (P_{W|S_n}):
Opt(D, P_{W|S_n}) = E_{z∼D}[W_1(P_W, P_{W|z})]
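In one dimension W_1 is easy to compute: the optimal coupling pairs the two samples in sorted order (the quantile coupling). A minimal sketch, not from the talk:

```python
def w1_empirical(u, v):
    # 1-D Wasserstein-1 distance between two equal-size empirical measures:
    # the infimum over couplings is attained by matching in sorted order
    assert len(u) == len(v)
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

# against a Dirac mass δ_t this reduces to E_{X~µ}[|X - t|]
xs = [0.1, 0.4, 0.7]
t = 0.4
dist = w1_empirical(xs, [t] * len(xs))  # mean of |x - t| over the sample
```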

SLIDE 6

algorithmic transport cost (recall): Opt(D, P_{W|S_n}) = E_{z∼D}[W_1(P_W, P_{W|z})]

main theorem: if ℓ(z, ·) is K-Lipschitz for every z, then

G(D, P_{W|S_n}) = E[R(W) − R_{S_n}(W)] ≤ K × Opt(D, P_{W|S_n})
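The inequality rests on Kantorovich–Rubinstein duality; a sketch of the key step, under the assumption (my reading of the constant K) that ℓ(z, ·) is K-Lipschitz:

```latex
% For each fixed z, the map w \mapsto \ell(z, w) is K-Lipschitz, so by
% Kantorovich--Rubinstein duality,
\Big| \mathbb{E}_{W \sim P_{W|z}}[\ell(z, W)] - \mathbb{E}_{W \sim P_W}[\ell(z, W)] \Big|
  \;\le\; K \, \mathcal{W}_1\!\left(P_W,\, P_{W|z}\right),
% and taking the expectation over z \sim \mathcal{D} yields
% G(\mathcal{D}, P_{W|S_n}) \le K \times \mathrm{Opt}(\mathcal{D}, P_{W|S_n}).
```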

SLIDE 7

Framework Main results An application Deep Neural Networks Conclusion

1 a A :              Zn → W Sn = {(x1,y1),...,(xn,yn)} →          max

1≤i≤n s.t. yi=0

xi if {i | yi = 0} ∅ 0 otherwise ❲ ❲

An Optimal Transport View on Generalization 6 / 10

SLIDE 8

Framework Main results An application Deep Neural Networks Conclusion

1 a A :              Zn → W Sn = {(x1,y1),...,(xn,yn)} →          max

1≤i≤n s.t. yi=0

xi if {i | yi = 0} ∅ 0 otherwise PW(w) = (1 − a)n−k + n(w + 1 − a)n PW|z = δx if x ≤ a PW|z = δ0 otherwise ❲1(µ,δt) = EX∼µ[d(X,t)] = ⇒ ❲1(PW,δx) = a |x − w|PW(w)dw

An Optimal Transport View on Generalization 6 / 10
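A quick simulation (my transcription of the algorithm, with the data distribution x ∼ U[0, 1], y = 0 exactly when x ≤ a, as read off the slides) checks the two claims: W has an atom of mass (1 − a)^n at 0 and CDF (w + 1 − a)^n on [0, a].

```python
import random

random.seed(1)
a, n = 0.3, 5

def A(sample):
    # the slide's algorithm: largest x_i whose label is 0, or 0 if there is none
    zeros = [x for x, y in sample if y == 0]
    return max(zeros) if zeros else 0.0

def draw_sample():
    # assumed data distribution: x ~ U[0,1], y = 0 exactly when x <= a
    return [(x, 0 if x <= a else 1) for x in (random.random() for _ in range(n))]

ws = [A(draw_sample()) for _ in range(100_000)]
atom = sum(w == 0.0 for w in ws) / len(ws)     # should be close to (1 - a)^n
cdf_02 = sum(w <= 0.2 for w in ws) / len(ws)   # should be close to (0.2 + 1 - a)^n
```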

SLIDE 9

For x ∈ [0, a], the Wasserstein distance has a closed form, which simplifies to

W_1(P_W, δ_x) = (a − x) + (2(x + 1 − a)^{n+1} − (1 − a)^{n+1} − 1) / (n + 1)
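The closed form (shown here in a simplified shape I derived from the atom-plus-density description of P_W; the slide displays an equivalent expanded expression) can be sanity-checked by Monte Carlo:

```python
import random

random.seed(2)

def w1_closed(x, a, n):
    # W_1(P_W, δ_x) for x in [0, a], simplified closed form
    b = 1.0 - a
    return (a - x) + (2 * (x + b) ** (n + 1) - b ** (n + 1) - 1) / (n + 1)

def w1_mc(x, a, n, runs=200_000):
    # W_1(P_W, δ_x) = E_{W~P_W}[|W - x|]: simulate W = max{x_i : x_i <= a}, 0 if none
    total = 0.0
    for _ in range(runs):
        w = max((u for u in (random.random() for _ in range(n)) if u <= a), default=0.0)
        total += abs(w - x)
    return total / runs
```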

[Figure: graphs of the Wasserstein distance W_1(P_W, δ_x) as a function of x, for several values of a and n. First row: n = 10 with a = 0.2 (left) and …]


SLIDE 10

Opt(D, P_{W|S_n}) = ∫_0^a W_1(P_W, δ_x) dx + (1 − a) W_1(P_W, δ_0)

which evaluates (again after simplification) to

Opt(D, P_{W|S_n}) = a(2 − a)/2 + (n(1 − a)^{n+2} − (n + 2) a (1 − a)^{n+1} − n) / ((n + 1)(n + 2))

and the main theorem gives (with K = 1 here)

G(D, P_{W|S_n}) ≤ 1 × Opt(D, P_{W|S_n})
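Again the closed form (in the simplified equivalent shape used here) can be checked against direct numerical evaluation of the integral defining Opt:

```python
def w1_closed(x, a, n):
    # W_1(P_W, δ_x) for x in [0, a] (previous slide, simplified form)
    b = 1.0 - a
    return (a - x) + (2 * (x + b) ** (n + 1) - b ** (n + 1) - 1) / (n + 1)

def opt_closed(a, n):
    # simplified closed form of Opt(D, P_{W|S_n})
    b = 1.0 - a
    return a * (2 - a) / 2 + (
        n * b ** (n + 2) - (n + 2) * a * b ** (n + 1) - n
    ) / ((n + 1) * (n + 2))

def opt_numeric(a, n, m=20_000):
    # Opt = ∫_0^a W_1(P_W, δ_x) dx + (1 - a) · W_1(P_W, δ_0), by the midpoint rule
    h = a / m
    integral = sum(w1_closed((i + 0.5) * h, a, n) for i in range(m)) * h
    return integral + (1 - a) * w1_closed(0.0, a, n)
```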

SLIDE 11

Comparison: the information-theoretic generalization bound for deep neural networks (H layers, each contracting by a factor η < 1):

E[R(W) − R_{S_n}(W)] ≤ exp(−(H/2) log(1/η)) · √(K² R² I(S_n; W) / (2n))

where I(S_n; W) is the mutual information between the training sample and the learned hypothesis.
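The depth-dependent factor simplifies, which is the point of the comparison: with contracting layers the bound decays geometrically in the depth H.

```latex
\exp\!\left(-\frac{H}{2}\log\frac{1}{\eta}\right)
  = \left(e^{\log \eta}\right)^{H/2}
  = \eta^{H/2}
  \xrightarrow[H \to \infty]{} 0
  \qquad \text{when } 0 < \eta < 1 .
```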

SLIDE 12

Conclusion

A powerful theoretical tool (average-case analysis, links with information theory), but one that quickly becomes too convoluted to yield concrete bounds.