
An Optimal Transport View on Generalization, Nemo Fournier, January 13, 2020



  1. An Optimal Transport View on Generalization. Nemo Fournier, January 13, 2020. Sections: Framework, Main results, An application, Deep Neural Networks, Conclusion.

  2. Outline: Framework, Main results, An application, Deep Neural Networks.

  3. Framework
     - learning algorithm: $A : Z^n \to W$
     - instance space: $Z = X \times Y$
     - hypothesis space: $W$
     - underlying distribution: $D$
     - training sample: $S_n \sim D^{\otimes n}$
     - loss function: $\ell : Z \times W \to \mathbb{R}_+$

  4. Framework (continued)
     - risk: $R(w) = E_{z \sim D}[\ell(z, w)]$
     - empirical risk: $R_{S_n}(w) = E_{z \sim S_n}[\ell(z, w)] = \frac{1}{n} \sum_{i=1}^{n} \ell(z_i, w)$
     - generalization error: $G(D, P_{W|S_n}) = E[R(W) - R_{S_n}(W)]$
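
The risk and empirical risk defined above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration and does not come from the slides: the distribution `D` (x uniform on [0, 1], label 0 exactly when x <= a), the 0-1 loss of a threshold classifier, and the helper names `zero_one_loss`, `draw_sample` and `monte_carlo_risk`.

```python
import random

def zero_one_loss(z, w):
    """Assumed loss: error of the rule 'predict 0 iff x <= w' on z = (x, y)."""
    x, y = z
    predicted = 0 if x <= w else 1
    return float(predicted != y)

def empirical_risk(sample, w, loss=zero_one_loss):
    """R_{S_n}(w): average loss of hypothesis w over the sample."""
    return sum(loss(z, w) for z in sample) / len(sample)

def draw_sample(n, a, rng):
    """n i.i.d. examples from the assumed D: x uniform on [0, 1], y = 0 iff x <= a."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 0 if x <= a else 1) for x in xs]

def monte_carlo_risk(w, a, rng, m=100_000):
    """Approximate R(w) = E_{z ~ D}[loss(z, w)] with m fresh draws from D."""
    return empirical_risk(draw_sample(m, a, rng), w)
```

For a fixed `w`, the difference `monte_carlo_risk(w, a, rng) - empirical_risk(train, w)` is a single-sample estimate of the quantity inside the expectation that defines the generalization error; averaging it over many training samples and outputs `W` would estimate `G` itself.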

  5. Optimal transport definitions
     - $\mu$ and $\nu$: two measures on $W$
     - coupling: a measure $T$ on $W \times W$ such that $T(X, W) = \mu(X)$ and $T(W, X) = \nu(X)$; $\Gamma(\mu, \nu)$ is the set of such couplings
     - Wasserstein distance: $\mathbb{W}_1(\mu, \nu) = \inf_{T \in \Gamma(\mu, \nu)} E_{(W, W') \sim T}[d_W(W, W')]$
     - algorithmic transport cost of algorithm $A$ (that is, of $P_{W|S_n}$): $\mathrm{Opt}(D, P_{W|S_n}) = E_{z \sim D}[\mathbb{W}_1(P_W, P_{W|z})]$
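
To make the Wasserstein distance concrete, here is a minimal sketch for the special case of measures on the real line with $d_W(w, w') = |w - w'|$, which is an assumption of this example rather than something fixed by the slides. It relies on two standard facts: the only coupling with a Dirac mass is the product measure, and for two empirical measures with equally many atoms the sorted (monotone) matching is an optimal coupling. The function names are hypothetical.

```python
def w1_to_dirac(points, t):
    """W_1 between the empirical measure of `points` and the Dirac mass at t.

    The only coupling is the product measure, so the distance is E|X - t|.
    """
    return sum(abs(p - t) for p in points) / len(points)

def w1_empirical(xs, ys):
    """W_1 between two empirical measures on the real line with equally many atoms.

    On the line, matching the i-th smallest atom of one measure with the
    i-th smallest atom of the other is an optimal coupling.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)
```

For unequal sample sizes or weighted atoms, a library routine such as SciPy's `scipy.stats.wasserstein_distance` covers the general one-dimensional case.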

  6. Main results
     - A.G.T.: $\mathrm{Opt}(D, P_{W|S_n}) = E_{z \sim D}[\mathbb{W}_1(P_W, P_{W|z})]$
     - main theorem: $G(D, P_{W|S_n}) = E[R(W) - R_{S_n}(W)] \le K \times \mathrm{Opt}(D, P_{W|S_n})$

  7. An application
     [Diagram: an axis with the marks 0, $a$ and 1]
     $$A : Z^n \to W, \quad S_n = \{(x_1, y_1), \dots, (x_n, y_n)\} \mapsto \begin{cases} \max_{1 \le i \le n,\ y_i = 0} x_i & \text{if } \{i \mid y_i = 0\} \neq \emptyset \\ 0 & \text{otherwise} \end{cases}$$

  8. An application (continued)
     - $P_W(w)$: density of the output $W = A(S_n)$ on $[0, a]$ [expression garbled in the source; it involves powers of $(1 - a)$ and of $(w + 1 - a)$]
     - $P_{W|z} = \delta_x$ if $x \le a$, and $P_{W|z} = \delta_0$ otherwise
     - $\mathbb{W}_1(\mu, \delta_t) = E_{X \sim \mu}[d(X, t)] \implies \mathbb{W}_1(P_W, \delta_x) = \int_0^a |x - w| \, P_W(w) \, dw$
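
The distance $\mathbb{W}_1(P_W, \delta_x) = E|x - W|$ can also be estimated by simulating the learning algorithm directly, without any closed form. This is a minimal sketch under assumptions that the slides only suggest: x is drawn uniformly on [0, 1] and an example is labelled 0 exactly when x <= a. The names `threshold_learner`, `draw_output` and `w1_to_dirac_mc` are hypothetical.

```python
import random

def threshold_learner(sample):
    """A(S_n): the largest x_i whose label is 0, or 0 if no example has label 0."""
    zeros = [x for x, y in sample if y == 0]
    return max(zeros) if zeros else 0.0

def draw_output(n, a, rng):
    """One draw of W = A(S_n), with S_n sampled from the assumed distribution."""
    xs = [rng.random() for _ in range(n)]
    sample = [(x, 0 if x <= a else 1) for x in xs]
    return threshold_learner(sample)

def w1_to_dirac_mc(x, n, a, rng, m=20_000):
    """Monte Carlo estimate of W_1(P_W, delta_x) = E|x - W| from m draws of W."""
    return sum(abs(x - draw_output(n, a, rng)) for _ in range(m)) / m
```

Calling `w1_to_dirac_mc(x, 10, 0.2, random.Random(0))` for a grid of `x` values in [0, 0.2] gives a curve of the kind plotted on the next slide; the estimate is noisy, so a larger `m` may be needed for smooth plots.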

  9. An application (continued)
     - closed-form expression of $\mathbb{W}_1(P_W, \delta_x)$ as a function of $a$, $n$ and $x$ [expression garbled in the source; it involves $(1 - a)^n$ and $(1 - a + x)^n$]
     - Figure: Graphs of the Wasserstein distance $\mathbb{W}_1(P_W, \delta_x)$ between $P_W$ and $\delta_x$, plotted against $x$, for several values of $a$ and $n$. First row: $n = 10$ with $a = 0.2$ (left) and …

  10. An application (continued)
      - $\mathrm{Opt}(D, P_{W_n}) = \int_0^a \mathbb{W}_1(P_W, \delta_x) \, dx + (1 - a) \, \mathbb{W}_1(P_W, \delta_0)$
      - closed form of $\mathrm{Opt}(D, P_{W_n})$: a fraction with denominator $6(n^2 + 3n + 2)$ [numerator garbled in the source; it involves $a$, $n$ and $(1 - a)^n$]
      - $G(D, P_{W|S_n}) \le 1 \times \mathrm{Opt}(D, P_{W_n})$
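
The integral defining the transport cost of this algorithm can be approximated by sampling, which gives a quick sanity check on any closed form. The sketch below again assumes that x is uniform on [0, 1] with label 0 exactly when x <= a, and uses the slide's simplification $P_{W|z} = \delta_x$ if $x \le a$ and $\delta_0$ otherwise. Since a fresh $z$ is independent of $W \sim P_W$, the cost is the expectation of $|t(x) - W|$ over independent pairs, where $t(x)$ is the location of the Dirac mass. All function names are hypothetical.

```python
import random

def threshold_learner(sample):
    """A(S_n): the largest x_i whose label is 0, or 0 if no example has label 0."""
    zeros = [x for x, y in sample if y == 0]
    return max(zeros) if zeros else 0.0

def draw_output(n, a, rng):
    """One draw of W = A(S_n) under the assumed distribution."""
    xs = [rng.random() for _ in range(n)]
    return threshold_learner([(x, 0 if x <= a else 1) for x in xs])

def conditional_target(x, a):
    """Location t of the Dirac mass P_{W|z} = delta_t for z with coordinate x."""
    return x if x <= a else 0.0

def transport_cost_mc(n, a, rng, m=50_000):
    """Monte Carlo estimate of Opt = E_z[ W_1(P_W, P_{W|z}) ]."""
    total = 0.0
    for _ in range(m):
        x = rng.random()            # coordinate of a fresh z ~ D
        w = draw_output(n, a, rng)  # independent draw from P_W
        total += abs(conditional_target(x, a) - w)
    return total / m
```

With `n = 10` and `a = 0.2`, `transport_cost_mc(10, 0.2, random.Random(0))` returns a value that can be compared with the closed form on the slide, and with an estimate of the generalization error once a loss function is fixed.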

  11. Deep Neural Networks
      $$E[R(W) - R_{S_n}(W)] \le \exp\left(-\frac{H}{2} \log \frac{1}{\eta}\right) \sqrt{\frac{K^2 R^2 \, I(S_n; W)}{2n}}$$

  12. Conclusion
      - Powerful theoretical tool (average case, link with information theory)
      - Quickly becomes too convoluted to provide concrete bounds
