

  1. Limit Distributions for Smooth Total Variation and $\chi^2$-Divergence in High Dimensions. Ziv Goldfeld and Kengo Kato, Cornell University. The 2020 International Symposium on Information Theory, June 2020.

  2-8. Statistical Distances

  Definition: a statistical distance measures the discrepancy between probability distributions:

  $\delta : \mathcal{P}(\mathbb{R}^d) \times \mathcal{P}(\mathbb{R}^d) \to [0, \infty)$ such that $\delta(P, Q) = 0 \iff P = Q$.

  If $\delta$ is also symmetric and satisfies $\delta(P, Q) \le \delta(P, R) + \delta(R, Q)$, then $\delta$ is a metric.

  Popular examples (a small numerical illustration follows this list):
  - $f$-divergence: $D_f(P \| Q) := \mathbb{E}_Q\big[ f\big( \tfrac{dP}{dQ} \big) \big]$, for convex $f : \mathbb{R} \to [0, \infty)$ (KL divergence, total variation, $\chi^2$-divergence, etc.)
  - $p$-Wasserstein distance: $W_p(P, Q) := \big( \inf_{\pi \in \Pi(P, Q)} \mathbb{E}_\pi\big[ \| X - Y \|^p \big] \big)^{1/p}$, where $\Pi(P, Q)$ is the set of couplings of $P$ and $Q$
  - Integral probability metrics: $\gamma_{\mathcal{F}}(P, Q) := \sup_{f \in \mathcal{F}} \mathbb{E}_P[f] - \mathbb{E}_Q[f]$ ($W_1$, TV, MMD, Dudley, Sobolev)
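To make the $f$-divergence definition concrete, here is a minimal Python sketch (not part of the talk; the helper name `f_divergence` is illustrative) evaluating total variation and $\chi^2$-divergence between two discrete distributions on a common finite support:

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P||Q) = E_Q[f(dP/dQ)] for discrete P, Q on a common support.

    Assumes q > 0 wherever p > 0, so the likelihood ratio dP/dQ is finite.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

# Total variation: f(t) = |t - 1| / 2, giving (1/2) * sum_i |p_i - q_i|.
tv = lambda p, q: f_divergence(p, q, lambda t: 0.5 * np.abs(t - 1.0))

# Chi-squared divergence: f(t) = (t - 1)^2, giving sum_i (p_i - q_i)^2 / q_i.
chi2 = lambda p, q: f_divergence(p, q, lambda t: (t - 1.0) ** 2)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(tv(p, q), chi2(p, q))  # 0.1 0.05
```

Both choices of $f$ are convex and vanish at $t = 1$, so each divergence is zero exactly when $P = Q$, as the definition requires.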

  9-18. Statistical Distances: Why are they useful?

  Historically: probability theory, mathematical statistics, information theory
  - Topological and metric structure of $\mathcal{P}(\mathbb{R}^d)$
  - Inequalities (Pinsker, Talagrand, joint range, etc.)
  - Hypothesis testing, goodness-of-fit tests, etc.
  - Fundamental performance limits of operational problems, ...

  Recently: a variety of applications in machine learning
  - Implicit generative modeling
  - Barycenter computation
  - Anomaly detection, model ensembling, etc.

  19-25. Implicit (Latent Variable) Generative Models

  Goal: learn a model $Q_\theta \approx P$ that approximates the data distribution.

  Method: apply a complicated transformation to a simple latent variable (a sampling sketch follows below):
  - Latent variable $Z \sim Q_Z \in \mathcal{P}(\mathbb{R}^p)$, with $p \ll d$
  - Expand $Z$ to the target space $\mathbb{R}^d$ via a (random) transformation $Q^{(\theta)}_{X|Z}$

  $\Rightarrow$ Generative model: $Q_\theta(\cdot) := \int Q^{(\theta)}_{X|Z}(\cdot \mid z) \, dQ_Z(z)$

  [Figure: a sample in the latent space $\mathbb{R}^p$ is mapped through $Q^{(\theta)}_{X|Z}$ into the target space $\mathbb{R}^d$.]

  Minimum distance estimation: solve $\theta^\star \in \operatorname{argmin}_\theta \, \delta(P, Q_\theta)$.
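The following Python sketch illustrates how such an implicit model produces samples. The affine map is a hypothetical stand-in for a trained neural generator, and in general the transformation $Q^{(\theta)}_{X|Z}$ may itself be random; only the pushforward structure is from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions as in the talk: latent dimension p much smaller than data dimension d.
p_dim, d_dim = 2, 10

# Illustrative parameters theta of a deterministic affine generator.
theta = {"W": rng.normal(size=(d_dim, p_dim)), "b": rng.normal(size=d_dim)}

def sample_Q_theta(n, theta):
    """Draw n samples from Q_theta by pushing latent draws Z ~ Q_Z = N(0, I_p)
    through the generator; Q_theta is defined only through this sampling."""
    Z = rng.normal(size=(n, p_dim))        # latent samples Z ~ Q_Z
    return Z @ theta["W"].T + theta["b"]   # map each Z into R^d

X_model = sample_Q_theta(1000, theta)      # samples from Q_theta in R^10
```

This makes the key design point explicit: sampling from $Q_\theta$ is cheap, but its density is generally intractable, which is why the fitting criterion $\delta(P, Q_\theta)$ must be one that can be estimated from samples.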

  26-30. Implicit (Latent Variable) Generative Models (cont.)

  Goal: solve $\mathsf{OPT} := \inf_\theta \delta(P, Q_\theta)$ exactly (i.e., find $\theta^\star$).

  Estimation: we do not have access to $P$ itself, only to data:
  - $\{X_i\}_{i=1}^n$ are i.i.d. samples from $P \in \mathcal{P}(\mathbb{R}^d)$
  - Empirical distribution: $P_n := \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$

  $\Rightarrow$ Inherently, we work with the plug-in quantity $\delta(P_n, Q_\theta)$.
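As an illustration of the plug-in quantity $\delta(P_n, Q_\theta)$, the sketch below takes $\delta = W_1$ (one of the example distances above; the slides leave $\delta$ generic) in one dimension, where the optimal coupling between two equal-size empirical measures simply matches order statistics. The Gaussian data and model samples are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

# X_1, ..., X_n ~ P (stand-in data distribution) and n samples from Q_theta.
n = 500
data = rng.normal(loc=0.0, scale=1.0, size=n)    # defines P_n = (1/n) sum_i delta_{X_i}
model = rng.normal(loc=0.3, scale=1.2, size=n)   # samples from a stand-in Q_theta

def w1_empirical(x, y):
    """W_1 between two 1-D empirical measures with equal sample sizes:
    the optimal coupling pairs the i-th order statistics of x and y."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

print(w1_empirical(data, model))  # a plug-in value of delta(P_n, Q_theta)
```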
