
Generalisation Bounds for Neural Networks (Pascale Gourdeau)



  1. Generalisation Bounds for Neural Networks. Pascale Gourdeau, University of Oxford, 15 November 2018.

  2. Overview: (1) Introduction, (2) General Strategies to Obtain Generalisation Bounds, (3) Survey of Generalisation Bounds for Neural Networks, (4) A Compression Approach [Arora et al., 2018], (5) Conclusion, Research Directions.

  3. Overview: (1) Introduction, (2) General Strategies to Obtain Generalisation Bounds, (3) Survey of Generalisation Bounds for Neural Networks, (4) A Compression Approach [Arora et al., 2018], (5) Conclusion, Research Directions.

  4. What is generalisation? The ability to perform well on unseen data.

  5. What is generalisation? The ability to perform well on unseen data. Assumption: the data (both for training and testing) comes i.i.d. from a distribution D. We usually work in a distribution-agnostic setting.

  6. What are generalisation bounds? Classification setting: input space X and output space Y := {1, ..., k}, with a distribution D on X × Y.

  7. What are generalisation bounds? Classification setting: input space X and output space Y := {1, ..., k}, with a distribution D on X × Y. Goal: to learn a function f : X → Y from a sample $S := \{(x_i, y_i)\}_{i=1}^m \subseteq X \times Y$.

  8. What are generalisation bounds? Classification setting: input space X and output space Y := {1, ..., k}, with a distribution D on X × Y. Goal: to learn a function f : X → Y from a sample $S := \{(x_i, y_i)\}_{i=1}^m \subseteq X \times Y$. Generalisation bounds: bounding the difference between the expected and empirical losses of f with high probability over S.
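
The expected loss in this difference cannot be computed exactly, since D is unknown, but the gap itself is easy to illustrate on a toy problem. Below is a minimal sketch, not taken from the talk, comparing the empirical 0-1 loss of a fixed classifier on a small sample with a large-sample Monte Carlo estimate of its expected loss; the distribution, the classifier f, and all names are assumptions made only for illustration.

```python
# Minimal sketch (illustrative): empirical loss on a sample S versus a
# Monte Carlo estimate of the expected loss under a toy distribution D.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Toy distribution D on X x Y: label y is 0 or 1, x is Gaussian around 2y - 1.
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=2.0 * y - 1.0, scale=1.0, size=n)
    return x, y

def f(x):
    # A fixed threshold classifier, assumed for illustration.
    return (x > 0).astype(int)

def zero_one_loss(x, y):
    return float(np.mean(f(x) != y))

x_train, y_train = sample(100)        # training sample S of size m = 100
x_test, y_test = sample(1_000_000)    # large fresh sample approximating D

empirical = zero_one_loss(x_train, y_train)   # empirical loss on S
expected = zero_one_loss(x_test, y_test)      # ~ expected loss under D
print(empirical, expected, abs(expected - empirical))  # the generalisation gap
```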

  9. What are generalisation bounds?

  10. What are generalisation bounds? For neural networks, we use the expected classification loss: $L_0(f) := \Pr_{(x,y)\sim D}\!\left[\, f(x)_y \le \max_{y' \ne y} f(x)_{y'} \right]$,

  11. What are generalisation bounds? For neural networks, we use the expected classification loss: $L_0(f) := \Pr_{(x,y)\sim D}\!\left[\, f(x)_y \le \max_{y' \ne y} f(x)_{y'} \right]$, and the empirical margin loss: $\hat{L}_\gamma(f) := \frac{1}{m}\sum_{i=1}^m \mathbf{1}\!\left[\, f(x_i)_{y_i} \le \gamma + \max_{y' \ne y_i} f(x_i)_{y'} \right]$.
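
As a concrete illustration (not from the talk), here is a minimal sketch of how these two quantities are evaluated from a network's per-class scores; the function names and example arrays are assumptions, and since D is unknown, classification_loss below computes an empirical estimate of $L_0$ on a sample rather than the expectation itself.

```python
# Minimal sketch (illustrative): the 0-1 classification loss and the
# empirical margin loss, given an (m, k) array of per-class scores f(x_i).
import numpy as np

def classification_loss(scores, labels):
    # Fraction of points where the true-class score does not beat the best
    # competing class (ties count as errors, matching the definition of L_0,
    # evaluated here empirically on the sample).
    m = scores.shape[0]
    true_scores = scores[np.arange(m), labels]
    others = scores.copy()
    others[np.arange(m), labels] = -np.inf
    return float(np.mean(true_scores <= others.max(axis=1)))

def margin_loss(scores, labels, gamma):
    # Fraction of points whose margin (true-class score minus best competing
    # score) is at most gamma, i.e. the empirical margin loss.
    m = scores.shape[0]
    true_scores = scores[np.arange(m), labels]
    others = scores.copy()
    others[np.arange(m), labels] = -np.inf
    return float(np.mean(true_scores <= gamma + others.max(axis=1)))

scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.3, 0.0]])  # m = 2 points, k = 3 classes
labels = np.array([0, 2])
print(classification_loss(scores, labels), margin_loss(scores, labels, gamma=0.5))
```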

  12. Why are generalisation bounds useful? They allow us to quantify a given model’s expected generalisation performance.

  13. Why are generalisation bounds useful? They allow us to quantify a given model’s expected generalisation performance. E.g.: With probability 95% over the training sample, the error is at most 1%.

  14. Why are generalisation bounds useful? They allow us to quantify a given model’s expected generalisation performance. E.g.: With probability 95% over the training sample, the error is at most 1%. They can also:

  15. Why are generalisation bounds useful? They allow us to quantify a given model’s expected generalisation performance. E.g.: With probability 95% over the training sample, the error is at most 1%. They can also: Provide insight on the ability of a model to generalise.

  16. Why are generalisation bounds useful? They allow us to quantify a given model’s expected generalisation performance. E.g.: With probability 95% over the training sample, the error is at most 1%. They can also: Provide insight on the ability of a model to generalise. This is of particular interest for us: neural networks have many counter-intuitive properties.

  17. Why are generalisation bounds useful? They allow us to quantify a given model’s expected generalisation performance. E.g.: With probability 95% over the training sample, the error is at most 1%. They can also: Provide insight on the ability of a model to generalise. This is of particular interest for us: neural networks have many counter-intuitive properties. Inspire new algorithms or regularisation techniques.

  18. Overview: (1) Introduction, (2) General Strategies to Obtain Generalisation Bounds, (3) Survey of Generalisation Bounds for Neural Networks, (4) A Compression Approach [Arora et al., 2018], (5) Conclusion, Research Directions.

  19. General Strategies. Generalisation bounds (GB) for neural networks are usually obtained by (1) defining a class H of functions computed by neural networks with certain properties (e.g., weight matrices with bounded norms, number of layers, etc.),

  20. General Strategies. Generalisation bounds (GB) for neural networks are usually obtained by (1) defining a class H of functions computed by neural networks with certain properties (e.g., weight matrices with bounded norms, number of layers, etc.), (2) deriving a generalisation bound in terms of a complexity measure M(H) (e.g., size of H, Rademacher complexity),

  21. General Strategies. Generalisation bounds (GB) for neural networks are usually obtained by (1) defining a class H of functions computed by neural networks with certain properties (e.g., weight matrices with bounded norms, number of layers, etc.), (2) deriving a generalisation bound in terms of a complexity measure M(H) (e.g., size of H, Rademacher complexity), (3) upper bounding M(H) in terms of model parameters (e.g., norm of weight matrices, number of layers, etc.).

  22. General Strategies: Rademacher Complexity. Definition (Rademacher complexity): Let G be a family of functions from a set Z to R. Let $\sigma_1, \dots, \sigma_m$ be Rademacher variables: $P(\sigma_i = 1) = P(\sigma_i = -1) = 1/2$. The empirical Rademacher complexity of G w.r.t. a sample $S = \{z_i\}_{i=1}^m$ is

  23. General Strategies: Rademacher Complexity. Definition (Rademacher complexity): Let G be a family of functions from a set Z to R. Let $\sigma_1, \dots, \sigma_m$ be Rademacher variables: $P(\sigma_i = 1) = P(\sigma_i = -1) = 1/2$. The empirical Rademacher complexity of G w.r.t. a sample $S = \{z_i\}_{i=1}^m$ is $\hat{R}_S(G) = \mathbb{E}_\sigma\!\left[\sup_{g \in G} \frac{1}{m}\sum_{i=1}^m \sigma_i g(z_i)\right]$.

  24. General Strategies: Rademacher Complexity. Definition (Rademacher complexity): Let G be a family of functions from a set Z to R. Let $\sigma_1, \dots, \sigma_m$ be Rademacher variables: $P(\sigma_i = 1) = P(\sigma_i = -1) = 1/2$. The empirical Rademacher complexity of G w.r.t. a sample $S = \{z_i\}_{i=1}^m$ is $\hat{R}_S(G) = \mathbb{E}_\sigma\!\left[\sup_{g \in G} \frac{1}{m}\sum_{i=1}^m \sigma_i g(z_i)\right]$. Intuition: how much G correlates with random noise on S.

  25. General Strategies: Rademacher Complexity. Definition (Rademacher complexity): Let G be a family of functions from a set Z to R. Let $\sigma_1, \dots, \sigma_m$ be Rademacher variables: $P(\sigma_i = 1) = P(\sigma_i = -1) = 1/2$. The empirical Rademacher complexity of G w.r.t. a sample $S = \{z_i\}_{i=1}^m$ is $\hat{R}_S(G) = \mathbb{E}_\sigma\!\left[\sup_{g \in G} \frac{1}{m}\sum_{i=1}^m \sigma_i g(z_i)\right]$. Intuition: how much G correlates with random noise on S. Simple examples...
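
One simple example (an assumption of this write-up, not necessarily the one used in the talk) is the class of bounded-norm linear functions $G = \{z \mapsto \langle w, z \rangle : \|w\|_2 \le B\}$, for which the supremum has the closed form $\frac{B}{m}\,\|\sum_{i=1}^m \sigma_i z_i\|_2$, so $\hat{R}_S(G)$ can be estimated by averaging this quantity over random sign vectors. A minimal sketch, with illustrative names:

```python
# Minimal sketch (illustrative): Monte Carlo estimate of the empirical
# Rademacher complexity of G = { z -> <w, z> : ||w||_2 <= B }.
# For a fixed sign vector sigma, sup_{||w|| <= B} (1/m) sum_i sigma_i <w, z_i>
# equals (B/m) * || sum_i sigma_i z_i ||_2; we average this over random sigma.
import numpy as np

rng = np.random.default_rng(0)

def empirical_rademacher_linear(Z, B, n_trials=10_000):
    m = Z.shape[0]
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)          # Rademacher variables
        total += (B / m) * np.linalg.norm(sigma @ Z)     # closed-form supremum
    return total / n_trials                              # average over sigma

S = rng.normal(size=(50, 10))   # a sample of m = 50 points in R^10
print(empirical_rademacher_linear(S, B=1.0))
```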

  26. General Strategies: Rademacher Complexity. Theorem: Let G be a family of functions from Z to [0, 1], and let S be a sample of size m drawn from Z according to D. Let $L(g) = \mathbb{E}_{z \sim D}[g(z)]$ and $\hat{L}(g) = \frac{1}{m}\sum_{i=1}^m g(z_i)$. Then for any $\delta > 0$, with probability at least $1 - \delta$ over S, for all functions $g \in G$, $L(g) \le \hat{L}(g) + 2\hat{R}_S(G) + O\!\left(\sqrt{\frac{\log(1/\delta)}{m}}\right)$.
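
To get a feel for the scale of such a bound, one can plug in numbers. The sketch below is not from the talk: the $O(\cdot)$ term hides a constant, and $\sqrt{\log(1/\delta)/(2m)}$ is used here purely as one illustrative choice of that term.

```python
# Minimal sketch (illustrative): evaluating the bound
# L(g) <= L_hat(g) + 2 * R_S(G) + sqrt(log(1/delta) / (2m)),
# where the last term stands in for the O(.) term of the theorem.
import math

def rademacher_bound(emp_loss, rad_complexity, m, delta):
    return emp_loss + 2.0 * rad_complexity + math.sqrt(math.log(1.0 / delta) / (2.0 * m))

# E.g. empirical loss 0.01, estimated R_S(G) = 0.03, m = 10_000, delta = 0.05:
print(rademacher_bound(0.01, 0.03, 10_000, 0.05))  # ~0.082
```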
