Limit Distributions for Smooth Total Variation and χ²-Divergence in High Dimensions

Ziv Goldfeld and Kengo Kato

Cornell University

The 2020 International Symposium on Information Theory, June 2020

Statistical Distances

Definition: Measure discrepancy between prob. distributions
δ : P(R^d) × P(R^d) → [0, ∞) s.t. δ(P, Q) = 0 ⟺ P = Q
If symmetric & δ(P, Q) ≤ δ(P, R) + δ(R, Q), then δ is a metric

Popular Examples:
◮ f-divergence: D_f(P‖Q) := E_Q[f(dP/dQ)], for convex f : R → [0, ∞)
  (KL divergence, total variation, χ²-divergence, etc.)
◮ p-Wasserstein dist.: W_p(P, Q) := (inf_{π∈Π(P,Q)} E_π[‖X − Y‖^p])^{1/p}, where Π(P, Q) is the set of couplings of P and Q
◮ Integral probability metrics: γ_F(P, Q) := sup_{f∈F} E_P[f] − E_Q[f]
  (W_1, TV, MMD, Dudley, Sobolev)
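
[Editorial illustration, not from the talk.] To ground the definitions, here is a minimal numerical sketch, assuming only standard numpy/scipy: TV between two explicit 1-D Gaussian densities via a Riemann sum, and W_1 between two empirical samples using scipy's 1-D routine.

```python
# Minimal sketch: evaluating two of the distances above for 1-D Gaussians,
# where simple estimators exist.
import numpy as np
from scipy.stats import norm, wasserstein_distance

rng = np.random.default_rng(0)

# Total variation between explicit densities via a Riemann sum:
# delta_TV(P, Q) = (1/2) * integral of |p(x) - q(x)| dx
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
p = norm.pdf(x, loc=0.0, scale=1.0)
q = norm.pdf(x, loc=1.0, scale=1.0)
tv = 0.5 * np.sum(np.abs(p - q)) * dx

# W1 between empirical samples: in d = 1, W1 equals the L1 distance
# between quantile functions, which scipy computes directly.
X = rng.normal(0.0, 1.0, size=5000)
Y = rng.normal(1.0, 1.0, size=5000)
w1 = wasserstein_distance(X, Y)

print(f"TV ~ {tv:.3f}")   # closed form: 2*Phi(1/2) - 1 ~ 0.383
print(f"W1 ~ {w1:.3f}")   # population value: |0 - 1| = 1
```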

Statistical Distances: Why are they useful?

Historically: Prob. theory, mathematical statistics, information theory
◮ Topological and metric structure of P(R^d)
◮ Inequalities (Pinsker, Talagrand, joint-range, etc.)
◮ Hypothesis testing, goodness-of-fit tests, etc.
◮ Fundamental performance limits of operational problems . . .
Recently: Variety of applications in machine learning
◮ Implicit generative modeling
◮ Barycenter computation
◮ Anomaly detection, model ensembling, etc.

Implicit (Latent Variable) Generative Models

Goal: Learn a model Q_θ ≈ P to approximate the data distribution
Method: Complicated transformation of a simple latent variable
◮ Latent variable Z ∼ Q_Z ∈ P(R^{d₀}), d₀ ≪ d
◮ Expand Z to R^d via a (random) transformation Q^{(θ)}_{X|Z}
⇒ Generative model: Q_θ(·) := ∫_{R^{d₀}} Q^{(θ)}_{X|Z}(·|z) dQ_Z(z)

Minimum Distance Estimation: Solve θ⋆ ∈ argmin_θ δ(P, Q_θ)

[Figure: a simple latent space mapped into the high-dimensional target space]
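
[Editorial illustration.] As a concrete toy instance of this construction, assuming a hypothetical map g_θ (not the talk's): take Q_Z = N(0, I_{d₀}) and let Q^{(θ)}_{X|Z} add Gaussian noise to a nonlinear push-forward of Z, so that Q_θ is supported near a low-dimensional manifold in R^d.

```python
# Minimal sketch of an implicit generative model: Z ~ N(0, I_{d0}) in a
# low-dimensional latent space, pushed through a nonlinear map g_theta into
# R^d, plus observation noise (the random part of Q_{X|Z}).
import numpy as np

rng = np.random.default_rng(1)
d0, d = 2, 10                       # latent dim << ambient dim

# "theta" = weights of a one-hidden-layer map g_theta: R^{d0} -> R^d
W1 = rng.normal(size=(32, d0))
W2 = rng.normal(size=(d, 32))

def sample_Q_theta(n, noise=0.1):
    """Draw n samples from Q_theta = int Q_{X|Z}(.|z) dQ_Z(z)."""
    Z = rng.normal(size=(n, d0))                     # Z ~ Q_Z
    H = np.tanh(Z @ W1.T)                            # deterministic part of g_theta
    X_mean = H @ W2.T
    return X_mean + noise * rng.normal(size=(n, d))  # random Q_{X|Z}

X = sample_Q_theta(1000)   # X_i ~ Q_theta, concentrated near a 2-D manifold in R^10
print(X.shape)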
Implicit (Latent Variable) Generative Models 2

Goal: Solve OPT := inf_θ δ(P, Q_θ) exactly (find θ⋆)
Estimation: We don't have P, but data
◮ {X_i}_{i=1}^n are i.i.d. samples from P ∈ P(R^d)
◮ Empirical distribution P_n := (1/n) Σ_{i=1}^n δ_{X_i} ⇒ inherently we work with δ(P_n, Q_θ)
Optimization: Can solve inf_θ δ(P_n, Q_θ) approximately
◮ Find θ̂_n s.t. δ(P_n, Q_{θ̂_n}) ≤ inf_θ δ(P_n, Q_θ) + ε
Generalization [Zhang et al.'18]: δ(P, Q_{θ̂_n}) − OPT ≤ 2δ(P_n, P) + ε
⇒ Boils down to an empirical approximation question under δ
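
[Editorial illustration.] A toy minimum distance estimator, under assumptions of my choosing (model Q_θ = N(θ, 1), data P = N(2, 1), δ = W_1, grid search over θ): fit θ by approximately minimizing δ(P_n, Q_θ), exactly the "+ ε" step above.

```python
# Toy minimum distance estimation sketch: Q_theta = N(theta, 1), delta = W1,
# P_n = empirical measure of n i.i.d. samples from the unknown P = N(2, 1).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)
X = rng.normal(2.0, 1.0, size=2000)          # data ~ P; P itself is never used below

def delta_Pn_Qtheta(theta, m=4000):
    Y = rng.normal(theta, 1.0, size=m)       # samples from the model Q_theta
    return wasserstein_distance(X, Y)        # W1(P_n, Q_theta), up to MC error

thetas = np.linspace(-1.0, 5.0, 121)
vals = [delta_Pn_Qtheta(t) for t in thetas]
theta_hat = thetas[int(np.argmin(vals))]     # approximate argmin (the "+ eps" step)
print(f"theta_hat = {theta_hat:.2f}")        # lands near the true mean 2
```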

Empirical Approximation

Question: What can we say about δ(P_n, P)?
Problem 1: δ(P_n, P) can be ill-defined/vacuous when P ≪ Leb(R^d)
◮ KL or χ² divergence: D_KL(P_n‖P) = χ²(P_n‖P) = ∞
◮ Total variation: δ_TV(P_n, P) = 1 . . .
Solution 1: Use a more sophisticated estimate P̂_n of P (KDE, kNN, etc.)
Problem 2: Empirical approximation rates are n^{−1/d}
◮ f-divergence: [Nguyen-Wainwright-Jordan'15], [Kandasamy et al.'15]
◮ p-Wasserstein dist.: [Fournier-Guillin'15], [Singh-Póczos'18]
◮ Integral probability metrics: [Sriperumbudur et al.'09], [Liang'19]
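
[Editorial illustration.] One heuristic way to see the n^{−1/d} curse numerically: any coupling of P and P_n must transport a fresh X ∼ P to some atom of P_n, so W_1(P_n, P) ≥ E[min_i ‖X − X_i‖], and that nearest-atom distance scales like n^{−1/d}. A sketch, assuming P = Unif[0,1]^d:

```python
# Heuristic sketch of the n^{-1/d} curse: W1(P_n, P) >= E[min_i ||X - X_i||]
# for fresh X ~ P, since any coupling sends X to some atom of P_n.
# Watch the lower bound barely decay once d is moderately large.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
n = 10_000
for d in (1, 2, 5, 10):
    sample = rng.uniform(size=(n, d))        # X_1..X_n ~ P = Unif[0,1]^d
    fresh = rng.uniform(size=(2000, d))      # independent fresh draws from P
    dist, _ = cKDTree(sample).query(fresh)   # nearest-atom distances
    print(f"d={d:2d}: E[min_i ||X - X_i||] ~ {dist.mean():.4f}")
```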

Smooth Statistical Distances

Definition (ZG-Greenewald-Polyanskiy-Weed'19): For σ ≥ 0, the smooth δ statistical distance between P and Q is δ^(σ)(P, Q) := δ(P ∗ N_σ, Q ∗ N_σ), where N_σ := N(0, σ²I_d) is a d-dimensional isotropic Gaussian.

Interpretation: X ∼ P, Y ∼ Q and Z_1, Z_2 ∼ N_σ
◮ X ⊥ Z_1 ⇒ X + Z_1 ∼ P ∗ N_σ  &  Y ⊥ Z_2 ⇒ Y + Z_2 ∼ Q ∗ N_σ

Robustness to Supp. Mismatch: δ^(σ)(P, Q) < ∞, ∀ P, Q ∈ P(R^d)
Preserves metric structure: If δ is a metric on P(R^d), then so is δ^(σ)
◮ Pf. idea: Use characteristic functions Φ_P(t) := E_P[e^{i⟨t,X⟩}] and
  Φ_{P∗N_σ} = Φ_P · Φ_{N_σ}  &  Φ_{N_σ}(t) = e^{−σ²‖t‖²/2} ≠ 0, ∀t,
  so δ^(σ)(P, Q) = 0 forces Φ_P = Φ_Q, i.e., P = Q.
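
[Editorial illustration.] The support-mismatch robustness is easy to see numerically in d = 1 (my example, with point masses): δ_TV(δ_0, δ_{0.5}) = 1 no matter how close the atoms are, but the smooth version is strictly below 1 and decays as σ grows.

```python
# Sketch: Gaussian smoothing makes TV informative under support mismatch.
# P = point mass at 0, Q = point mass at 0.5: delta_TV(P, Q) = 1, but
# delta_TV^{(sigma)}(P, Q) = TV(N(0, s^2), N(0.5, s^2)) < 1 and -> 0 as s grows.
import numpy as np
from scipy.stats import norm

x = np.linspace(-8, 8, 40001)
dx = x[1] - x[0]
for s in (0.1, 0.5, 1.0, 2.0):
    p_s = norm.pdf(x, 0.0, s)            # (P * N_sigma) density
    q_s = norm.pdf(x, 0.5, s)            # (Q * N_sigma) density
    tv_s = 0.5 * np.sum(np.abs(p_s - q_s)) * dx
    print(f"sigma={s:3.1f}: smooth TV ~ {tv_s:.3f}")   # equals 2*Phi(0.25/s) - 1
```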

Smooth Statistical Distances: Empirical Approx.

High Level: Alleviate curse of dimensionality & get limit distributions

Theorem (ZG-Greenewald-Polyanskiy-Weed'19): For any d ≥ 1 and σ > 0, E[δ^(σ)(P_n, P)] → 0 at a dimension-free rate:
◮ Distance: E[δ^(σ)(P_n, P)] ≍ n^{−1/2} for δ^(σ)_TV and W_1^(σ)
◮ Distance²: E[δ^(σ)(P_n, P)] ≍ n^{−1} for (W_2^(σ))², D^(σ)_KL and χ²_σ
under a sub-Gaussian condition on P.

Theorem (ZG-Kato'20): For sub-Gaussian P: √n W_1^(σ)(P_n, P) →_D sup_{f∈Lip_1(R^d)} G^(σ)_P(f), ∀ d ≥ 1

◮ The limit distribution shows the n^{−1/2} rate is sharp
◮ The W_1 limit distribution is known only for d = 1 (where W_1(P_n, P) = ‖F_n − F‖_{L¹(R)})
◮ W_1^(σ) has a stable limit (characterized) in all d!
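
[Editorial illustration.] A Monte Carlo sanity check of the n^{−1/2} rate, under assumptions of my choosing (d = 1, P = N(0, 1), σ = 1, so P ∗ N_σ = N(0, 1 + σ²) in closed form):

```python
# Sketch: Monte Carlo check of the n^{-1/2} rate for smooth TV in d = 1.
# delta_TV^{(sigma)}(P_n, P) is computed on a grid from the two smoothed densities.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sigma = 1.0
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
p_s = norm.pdf(x, 0.0, np.sqrt(1.0 + sigma**2))   # (P * N_sigma) density

def smooth_tv(sample):
    pn_s = norm.pdf(x[:, None], sample[None, :], sigma).mean(axis=1)  # P_n * N_sigma
    return 0.5 * np.sum(np.abs(pn_s - p_s)) * dx

for n in (100, 400, 1600):
    vals = [smooth_tv(rng.normal(size=n)) for _ in range(20)]
    print(f"n={n:4d}: E ~ {np.mean(vals):.4f}, sqrt(n)*E ~ {np.sqrt(n) * np.mean(vals):.3f}")
# sqrt(n) * E[delta_TV^{(sigma)}(P_n, P)] stabilizes, matching the n^{-1/2} rate.
```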

New Limit Distribution Results: Smooth TV

TV distance: δ_TV(P, Q) := E_Q[(1/2)|dP/dQ − 1|] = (1/2)‖p − q‖_1
Smooth TV: δ^(σ)_TV(P, Q) := δ_TV(P ∗ N_σ, Q ∗ N_σ) = (1/2)‖p_σ − q_σ‖_1

Theorem (ZG-Kato'20): For any d ≥ 1 and σ > 0, if
C^(TV)_{P,σ} := ∫_{R^d} √(Var_P(ϕ_σ(x − X))) dx < ∞,
where ϕ_σ is the N_σ density, then
√n δ^(σ)_TV(P_n, P) →_D (1/2)‖G^(σ)_P‖_1,
for a centered Gaussian process (G^(σ)_P(x))_{x∈R^d} with sample paths in L¹(R^d) and
E[G^(σ)_P(x) G^(σ)_P(y)] = Cov_P(ϕ_σ(x − X), ϕ_σ(y − X)).

Comments:
1. The n^{−1/2} rate is sharp for E[δ^(σ)_TV(P_n, P)], and a concentration inequality holds
2. The condition C^(TV)_{P,σ} < ∞ is sharp: lim inf_n √n E[δ^(σ)_TV(P_n, P)] ≥ C^(TV)_{P,σ}/√(2π)
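
[Editorial illustration.] The limit (1/2)‖G^(σ)_P‖_1 can be simulated directly from the covariance above. A sketch, under assumptions of my choosing (d = 1, P = N(0, 1), σ = 1, GP sampled on a truncated grid, covariance estimated by Monte Carlo over X ∼ P):

```python
# Sketch: simulate the limit (1/2)||G_P^{(sigma)}||_1 by sampling the Gaussian
# process on a grid from Cov_P(phi_sigma(x - X), phi_sigma(y - X)).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
sigma = 1.0
x = np.linspace(-6, 6, 201)
dx = x[1] - x[0]

X = rng.normal(size=20_000)                        # X ~ P = N(0, 1)
Phi = norm.pdf(x[:, None], X[None, :], sigma)      # rows: phi_sigma(x_j - X_i)
C = np.cov(Phi)                                    # MC estimate of the GP covariance

G = rng.multivariate_normal(np.zeros(len(x)), C, size=2000, check_valid="ignore")
limit = 0.5 * np.sum(np.abs(G), axis=1) * dx       # samples of (1/2)||G||_1
print(f"limit law: mean ~ {limit.mean():.3f}, "
      f"95% quantile ~ {np.quantile(limit, 0.95):.3f}")
# Compare against sqrt(n) * delta_TV^{(sigma)}(P_n, P) from the previous sketch.
```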

Proof Outline

PDFs: P_n ∗ ϕ_σ(x) = (1/n) Σ_{i=1}^n ϕ_σ(x − X_i) ⇒ P ∗ ϕ_σ(x) = E[P_n ∗ ϕ_σ(x)]
Smooth TV: Z_i(x) := ϕ_σ(x − X_i) − P ∗ ϕ_σ(x) & Z̄_n(x) := (1/√n) Σ_{i=1}^n Z_i(x)
⇒ √n δ^(σ)_TV(P_n, P) = (1/2)‖Z̄_n‖_1

Theorem (CLT in Banach Spaces): For p ∈ [1, ∞) and Z_1, . . . , Z_n i.i.d. L^p-valued centered RVs with Z̄_n = (1/√n) Σ_{i=1}^n Z_i:
P(‖Z_1‖_p > t) = o(t^{−2}) as t → ∞ & ∫_{R^d} (E|Z_1(x)|²)^{p/2} dx < ∞
⟺ Z̄_n converges in L^p to a centered Gaussian G with the same covariance as Z_1

Verify for p = 1: ‖Z_1‖_1 ≤ 2 & ∫_{R^d} (E[|Z_1(x)|²])^{1/2} dx < ∞ by assumption
⇒ Z̄_n →_w G, and by the CMT, √n δ^(σ)_TV(P_n, P) →_D (1/2)‖G‖_1 for G = G^(σ)_P
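
[Editorial illustration.] The key identity √n δ^(σ)_TV(P_n, P) = (1/2)‖Z̄_n‖_1 is just algebra, and easy to confirm on a grid (my sketch, again with d = 1, P = N(0, 1), σ = 1):

```python
# Sketch: numerically check sqrt(n) * deltaTV^{(sigma)}(P_n, P) = (1/2)||Zbar_n||_1
# on a grid, with Z_i(x) = phi_sigma(x - X_i) - P*phi_sigma(x).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
sigma, n = 1.0, 500
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

X = rng.normal(size=n)
phi = norm.pdf(x[:, None], X[None, :], sigma)        # phi_sigma(x - X_i)
p_smooth = norm.pdf(x, 0.0, np.sqrt(1 + sigma**2))   # P * phi_sigma in closed form
Z = phi - p_smooth[:, None]                          # Z_i(x), centered summands
Zbar = Z.sum(axis=1) / np.sqrt(n)                    # (1/sqrt(n)) sum_i Z_i

lhs = np.sqrt(n) * 0.5 * np.sum(np.abs(phi.mean(axis=1) - p_smooth)) * dx
rhs = 0.5 * np.sum(np.abs(Zbar)) * dx
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")           # agree up to rounding
```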

New Limit Distribution Results: Smooth χ²

χ²-divergence: χ²(P‖Q) := E_Q[(dP/dQ − 1)²]
Smooth χ²: χ²_σ(P‖Q) := χ²(P ∗ N_σ ‖ Q ∗ N_σ) = ∫_{R^d} (p_σ(x)/q_σ(x) − 1)² q_σ(x) dx

Theorem (ZG-Kato'20): For any d ≥ 1 and σ > 0, if ∫_{R^d} Var_P(ϕ_σ(x − X)) / (P ∗ ϕ_σ(x)) dx < ∞, then
n χ²_σ(P_n‖P) →_D ∫_{R^d} (G^(σ)_P(x))² / (P ∗ ϕ_σ(x)) dx,
for the centered GP G^(σ)_P with the same covariance as before, s.t. G^(σ)_P / √(P ∗ ϕ_σ) has sample paths in L²(R^d).

Comments:
1. The condition holds for any β-sub-Gaussian P with β < σ/√2
2. Pf. like δ^(σ)_TV, but with Z_i(x) := ϕ_σ(x − X_i)/(P ∗ ϕ_σ(x)) − 1 and the CLT in L²(R^d, P ∗ N_σ)
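
[Editorial illustration.] The O(n^{−1}) scaling is also visible numerically: n · χ²_σ(P_n‖P) should stabilize in n. A sketch, under assumptions of my choosing (d = 1, P = N(0, 1), σ = 1, grid integration):

```python
# Sketch: compute n * chi^2_sigma(P_n || P) on a grid and watch its mean
# stabilize across n, as the limit theorem predicts.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
sigma = 1.0
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
p_s = norm.pdf(x, 0.0, np.sqrt(1 + sigma**2))       # (P * N_sigma) density

def n_chi2_sigma(n):
    X = rng.normal(size=n)
    pn_s = norm.pdf(x[:, None], X[None, :], sigma).mean(axis=1)   # P_n * N_sigma
    return n * np.sum((pn_s / p_s - 1.0) ** 2 * p_s) * dx

for n in (100, 400, 1600):
    vals = [n_chi2_sigma(n) for _ in range(30)]
    print(f"n={n:4d}: E[n * chi2_sigma] ~ {np.mean(vals):.3f}")
```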

Summary

Classic statistical distances: Rich history and modern applications
◮ Support mismatch issues
◮ Slow empirical approximation (n^{−1/d} rates)
Smooth statistical distances: Convolve distributions with N_σ
◮ Robust to mismatched support
◮ Inherits metric structure
◮ Fast (parametric) empirical convergence in all dimensions
◮ Limit distributions for scaled δ^(σ)(P_n, P) in all dimensions (W_1^(σ), δ^(σ)_TV, χ²_σ)
Applications: Based on the smooth statistical distance paradigm
◮ Generative modeling via minimum distance estimation
◮ Goodness-of-fit testing, two-sample testing, etc.

Thank you!