An Optimal Transport View on Generalization. Nemo Fournier. January 13, 2020. (PowerPoint presentation)



SLIDE 1

Framework Main results An application Deep Neural Networks Conclusion

An Optimal Transport View on Generalization

Nemo Fournier January 13, 2020

An Optimal Transport View on Generalization 1 / 10

SLIDE 2


Outline

Framework Main results An application Deep Neural Networks


SLIDE 3

instance space Z = X × Y
hypothesis space W
loss function ℓ : Z × W → R₊
learning algorithm A : Z^n → W
underlying distribution D
training sample S_n ∼ D^⊗n

SLIDE 4

risk R(w) = E_{z∼D}[ℓ(z, w)]
empirical risk R_{S_n}(w) = E_{z∼S_n}[ℓ(z, w)] = (1/n) ∑_{i=1}^n ℓ(z_i, w)
generalization error G(D, P_{W|S_n}) = E[R(W) − R_{S_n}(W)]
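As a concrete toy illustration of these three definitions, the sketch below estimates the risk, the empirical risk, and one draw of their gap. The loss, distribution, and algorithm here are illustrative assumptions, not taken from the talk.

```python
import random

random.seed(0)

def ell(z, w):
    # toy 1-Lipschitz loss (illustrative assumption): ℓ((x, y), w) = |y - w|
    return abs(z[1] - w)

def empirical_risk(sample, w):
    # R_{S_n}(w) = (1/n) * sum_i ℓ(z_i, w)
    return sum(ell(z, w) for z in sample) / len(sample)

def risk(w, num_mc=100_000):
    # R(w) = E_{z~D}[ℓ(z, w)], estimated by Monte Carlo;
    # D here (again an illustrative choice): x ~ U[0,1], y = x
    return sum(ell((x, x), w) for x in (random.random() for _ in range(num_mc))) / num_mc

sample = [(x, x) for x in (random.random() for _ in range(50))]
w = sum(y for _, y in sample) / len(sample)   # toy algorithm A: mean of the labels
gap = risk(w) - empirical_risk(sample, w)     # one draw of R(W) - R_{S_n}(W)
```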

SLIDE 5

µ and ν two measures on W
coupling: T a measure on W × W such that T(X, W) = µ(X) and T(W, X) = ν(X) for every measurable X
Wasserstein distance W_1(µ, ν) = inf_{T∈Γ(µ,ν)} E_{(W,W′)∼T}[d_W(W, W′)]
algorithmic transport cost of algorithm A (P_{W|S_n}):
Opt(D, P_{W|S_n}) = E_{z∼D}[W_1(P_W, P_{W|z})]
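In one dimension W_1 is easy to compute: the optimal coupling pairs the two samples in sorted order (the quantile coupling). A minimal sketch, not from the talk:

```python
def w1_empirical(u, v):
    # 1-D Wasserstein-1 distance between two equal-size empirical measures:
    # the infimum over couplings is attained by matching in sorted order
    assert len(u) == len(v)
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

# against a Dirac mass δ_t this reduces to E_{X~µ}[|X - t|]
xs = [0.1, 0.4, 0.7]
t = 0.4
dist = w1_empirical(xs, [t] * len(xs))  # mean of |x - t| over the sample
```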

SLIDE 6

algorithmic transport cost (recall): Opt(D, P_{W|S_n}) = E_{z∼D}[W_1(P_W, P_{W|z})]

main theorem: if ℓ(z, ·) is K-Lipschitz for every z, then

G(D, P_{W|S_n}) = E[R(W) − R_{S_n}(W)] ≤ K × Opt(D, P_{W|S_n})
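The inequality rests on Kantorovich–Rubinstein duality; a sketch of the key step, under the assumption (my reading of the constant K) that ℓ(z, ·) is K-Lipschitz:

```latex
% For each fixed z, the map w \mapsto \ell(z, w) is K-Lipschitz, so by
% Kantorovich--Rubinstein duality,
\Big| \mathbb{E}_{W \sim P_{W|z}}[\ell(z, W)] - \mathbb{E}_{W \sim P_W}[\ell(z, W)] \Big|
  \;\le\; K \, \mathcal{W}_1\!\left(P_W,\, P_{W|z}\right),
% and taking the expectation over z \sim \mathcal{D} yields
% G(\mathcal{D}, P_{W|S_n}) \le K \times \mathrm{Opt}(\mathcal{D}, P_{W|S_n}).
```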

SLIDE 7

Framework Main results An application Deep Neural Networks Conclusion

1 a A :              Zn → W Sn = {(x1,y1),...,(xn,yn)} →          max

1≤i≤n s.t. yi=0

xi if {i | yi = 0} ∅ 0 otherwise ❲ ❲

An Optimal Transport View on Generalization 6 / 10

SLIDE 8

Framework Main results An application Deep Neural Networks Conclusion

1 a A :              Zn → W Sn = {(x1,y1),...,(xn,yn)} →          max

1≤i≤n s.t. yi=0

xi if {i | yi = 0} ∅ 0 otherwise PW(w) = (1 − a)n−k + n(w + 1 − a)n PW|z = δx if x ≤ a PW|z = δ0 otherwise ❲1(µ,δt) = EX∼µ[d(X,t)] = ⇒ ❲1(PW,δx) = a |x − w|PW(w)dw

An Optimal Transport View on Generalization 6 / 10
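A quick simulation (my transcription of the algorithm, with the data distribution x ∼ U[0, 1], y = 0 exactly when x ≤ a, as read off the slides) checks the two claims: W has an atom of mass (1 − a)^n at 0 and CDF (w + 1 − a)^n on [0, a].

```python
import random

random.seed(1)
a, n = 0.3, 5

def A(sample):
    # the slide's algorithm: largest x_i whose label is 0, or 0 if there is none
    zeros = [x for x, y in sample if y == 0]
    return max(zeros) if zeros else 0.0

def draw_sample():
    # assumed data distribution: x ~ U[0,1], y = 0 exactly when x <= a
    return [(x, 0 if x <= a else 1) for x in (random.random() for _ in range(n))]

ws = [A(draw_sample()) for _ in range(100_000)]
atom = sum(w == 0.0 for w in ws) / len(ws)     # should be close to (1 - a)^n
cdf_02 = sum(w <= 0.2 for w in ws) / len(ws)   # should be close to (0.2 + 1 - a)^n
```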

SLIDE 9

For x ∈ [0, a], the Wasserstein distance has a closed form, which simplifies to

W_1(P_W, δ_x) = (a − x) + (2(x + 1 − a)^{n+1} − (1 − a)^{n+1} − 1) / (n + 1)
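The closed form (shown here in a simplified shape I derived from the atom-plus-density description of P_W; the slide displays an equivalent expanded expression) can be sanity-checked by Monte Carlo:

```python
import random

random.seed(2)

def w1_closed(x, a, n):
    # W_1(P_W, δ_x) for x in [0, a], simplified closed form
    b = 1.0 - a
    return (a - x) + (2 * (x + b) ** (n + 1) - b ** (n + 1) - 1) / (n + 1)

def w1_mc(x, a, n, runs=200_000):
    # W_1(P_W, δ_x) = E_{W~P_W}[|W - x|]: simulate W = max{x_i : x_i <= a}, 0 if none
    total = 0.0
    for _ in range(runs):
        w = max((u for u in (random.random() for _ in range(n)) if u <= a), default=0.0)
        total += abs(w - x)
    return total / runs
```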

[Figure: graphs of the Wasserstein distance W_1(P_W, δ_x) as a function of x, for several values of a and n. First row: n = 10 with a = 0.2 (left) and …]


SLIDE 10

Opt(D, P_{W|S_n}) = ∫_0^a W_1(P_W, δ_x) dx + (1 − a) W_1(P_W, δ_0)

which evaluates (again after simplification) to

Opt(D, P_{W|S_n}) = a(2 − a)/2 + (n(1 − a)^{n+2} − (n + 2) a (1 − a)^{n+1} − n) / ((n + 1)(n + 2))

and the main theorem gives (with K = 1 here)

G(D, P_{W|S_n}) ≤ 1 × Opt(D, P_{W|S_n})
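Again the closed form (in the simplified equivalent shape used here) can be checked against direct numerical evaluation of the integral defining Opt:

```python
def w1_closed(x, a, n):
    # W_1(P_W, δ_x) for x in [0, a] (previous slide, simplified form)
    b = 1.0 - a
    return (a - x) + (2 * (x + b) ** (n + 1) - b ** (n + 1) - 1) / (n + 1)

def opt_closed(a, n):
    # simplified closed form of Opt(D, P_{W|S_n})
    b = 1.0 - a
    return a * (2 - a) / 2 + (
        n * b ** (n + 2) - (n + 2) * a * b ** (n + 1) - n
    ) / ((n + 1) * (n + 2))

def opt_numeric(a, n, m=20_000):
    # Opt = ∫_0^a W_1(P_W, δ_x) dx + (1 - a) · W_1(P_W, δ_0), by the midpoint rule
    h = a / m
    integral = sum(w1_closed((i + 0.5) * h, a, n) for i in range(m)) * h
    return integral + (1 - a) * w1_closed(0.0, a, n)
```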

SLIDE 11

Comparison: the information-theoretic generalization bound for deep neural networks (H layers, each contracting by a factor η < 1):

E[R(W) − R_{S_n}(W)] ≤ exp(−(H/2) log(1/η)) · √(K² R² I(S_n; W) / (2n))

where I(S_n; W) is the mutual information between the training sample and the learned hypothesis.
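The depth-dependent factor simplifies, which is the point of the comparison: with contracting layers the bound decays geometrically in the depth H.

```latex
\exp\!\left(-\frac{H}{2}\log\frac{1}{\eta}\right)
  = \left(e^{\log \eta}\right)^{H/2}
  = \eta^{H/2}
  \xrightarrow[H \to \infty]{} 0
  \qquad \text{when } 0 < \eta < 1 .
```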

SLIDE 12

Conclusion

A powerful theoretical tool (average-case analysis, links with information theory), but one that quickly becomes too convoluted to yield concrete bounds.