 
Quantitative CLTs via Martingale Embeddings

Dan Mikulincer
Weizmann Institute of Science

Joint work with Ronen Eldan and Alex Zhai
The central limit theorem

Let $\{X_i\}_{i=1}^{\infty}$ be i.i.d. copies of a random vector $X$ in $\mathbb{R}^d$ with $\mathbb{E}[X] = 0$ and $\mathrm{Cov}(X) = \Sigma$.

If $S_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$ and $G \sim N(0, \Sigma)$ then
\[ S_n \xrightarrow{n \to \infty} G, \]
in an appropriate sense.

• We usually normalize $X$ to be isotropic, that is, $\Sigma = I_d$.
• We are interested in bounding the convergence rate.
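To make the isotropic normalization and the definition of $S_n$ concrete, here is a minimal sketch in Python. The choice of distribution (uniform on the $2d$ points $\pm\sqrt{d}\,e_i$) and the helper names `support`, `covariance` and `sample_S_n` are assumptions made for illustration only.

```python
import math
import random

def support(d):
    """The 2d points ±sqrt(d)·e_i; X is uniform on them (illustrative choice)."""
    r = math.sqrt(d)
    points = []
    for i in range(d):
        for sign in (1.0, -1.0):
            p = [0.0] * d
            p[i] = sign * r
            points.append(p)
    return points

def covariance(points):
    """Exact second-moment matrix of the uniform distribution on `points`.

    The mean is zero by symmetry, so this equals Cov(X).
    """
    d = len(points[0])
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        for i in range(d):
            for j in range(d):
                cov[i][j] += p[i] * p[j] / len(points)
    return cov

def sample_S_n(d, n, rng):
    """One draw of S_n = n^{-1/2} * (X_1 + ... + X_n)."""
    pts = support(d)
    total = [0.0] * d
    for _ in range(n):
        x = rng.choice(pts)
        for k in range(d):
            total[k] += x[k]
    return [t / math.sqrt(n) for t in total]

rng = random.Random(0)          # fixed seed so the run is reproducible
print(covariance(support(3)))   # identity matrix up to rounding: X is isotropic
print(sample_S_n(3, 1000, rng)) # one sample of S_n in R^3
```

The covariance is computed exactly by enumerating the support rather than by sampling, so the isotropy check does not depend on the random seed.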
Quantitative central limit theorem

Berry-Esseen is an early example of a quantitative bound.

Theorem (Berry-Esseen)
In the 1-dimensional case, for any $t \in \mathbb{R}$,
\[ |\mathbb{P}(S_n \le t) - \mathbb{P}(G \le t)| \le \frac{\mathbb{E}\left[|X|^3\right]}{\sqrt{n}}. \]

This estimate is sharp.
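The $1/\sqrt{n}$ rate can be checked numerically in a simple case. The sketch below assumes $X = \pm 1$ with probability $1/2$ each (a Rademacher variable, chosen here only for illustration), for which $\mathbb{E}|X|^3 = 1$ and the law of $S_n$ is an explicit rescaled binomial. The function names are hypothetical.

```python
import math

def gaussian_cdf(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def kolmogorov_distance_rademacher(n):
    """Exact sup_t |P(S_n <= t) - P(G <= t)| when X = ±1 with probability 1/2.

    S_n = (2B - n)/sqrt(n) with B ~ Binomial(n, 1/2). The CDF of S_n is a step
    function, so the supremum is attained at a jump, from the left or the right.
    """
    total = 2 ** n
    below = 0  # number of sign patterns with B < k (exact integer)
    worst = 0.0
    for k in range(n + 1):
        t = (2 * k - n) / math.sqrt(n)
        at_or_below = below + math.comb(n, k)
        phi = gaussian_cdf(t)
        worst = max(worst,
                    abs(below / total - phi),        # left limit at the jump
                    abs(at_or_below / total - phi))  # value at the jump
        below = at_or_below
    return worst

for n in (100, 400, 1000):
    dist = kolmogorov_distance_rademacher(n)
    # E|X|^3 = 1 here, so the right-hand side of the bound is 1/sqrt(n).
    print(n, dist, 1.0 / math.sqrt(n))
```

In this example the distance shrinks by roughly a factor of two each time $n$ is multiplied by four, which is the behaviour the theorem predicts cannot be improved in general.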
Quantitative central limit theorem

In higher dimensions the current best known result is due to Bentkus.

Theorem (Bentkus, 2003)
In the $d$-dimensional case, for any convex set $K \subset \mathbb{R}^d$,
\[ |\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)| \le \frac{d^{1/4}\, \mathbb{E}\left[\|X\|^3\right]}{\sqrt{n}}. \]

• The $d^{1/4}$ term is the maximal Gaussian surface area of a convex set in $\mathbb{R}^d$. If $K_\varepsilon$ is the $\varepsilon$-enlargement of $K$ then
\[ \mathbb{P}(G \in K_\varepsilon \setminus K) \le 4 \varepsilon d^{1/4}. \]
• Whether one can omit $d^{1/4}$ remains an open question.
Other metrics

We consider stronger notions of distance.

Definition (Relative entropy between $X$ and $G$)
\[ \mathrm{Ent}(X \,\|\, G) := \mathbb{E}\left[\ln(f(X))\right], \]
where $f$ is the density of $X$ with respect to $G$.

Definition (Wasserstein distance between $X$ and $G$)
\[ W_2(X, G) := \inf_{\pi} \left( \mathbb{E}_{\pi}\left[\|X - G\|^2\right] \right)^{1/2}, \]
where $\pi$ ranges over all possible couplings of $X$ and $G$.
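Both definitions can be evaluated in closed form in a simple case. The following worked example is an illustration added here, not part of the talk: it assumes $d = 1$, $G \sim N(0,1)$ and $X \sim N(m, \sigma^2)$ with $\sigma > 0$.

```latex
% Density of X with respect to G (ratio of the two Gaussian densities):
\[ f(x) = \frac{1}{\sigma} \exp\left( -\frac{(x-m)^2}{2\sigma^2} + \frac{x^2}{2} \right). \]
% Relative entropy: take the expectation of ln f(X), using
% E[(X-m)^2] = sigma^2 and E[X^2] = sigma^2 + m^2.
\[ \mathrm{Ent}(X \,\|\, G)
   = -\ln \sigma - \frac{1}{2} + \frac{\sigma^2 + m^2}{2}
   = \frac{1}{2}\left( \sigma^2 + m^2 - 1 - \ln \sigma^2 \right). \]
% Wasserstein distance: the coupling X = m + sigma * G is optimal in one dimension.
\[ W_2(X, G)^2 = \mathbb{E}\left[ (m + (\sigma - 1) G)^2 \right] = m^2 + (\sigma - 1)^2. \]
```

Both quantities vanish exactly when $m = 0$ and $\sigma = 1$, that is, when $X$ and $G$ have the same law.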
Relative entropy

For relative entropy, if $A \subset \mathbb{R}^d$ is any measurable set, then by Pinsker's inequality,
\[ |\mathbb{P}(S_n \in A) - \mathbb{P}(G \in A)| \le \sqrt{\mathrm{Ent}(S_n \,\|\, G)}. \]

• In '84 Barron showed that if $\mathrm{Ent}(X \,\|\, G) < \infty$ then
\[ \lim_{n \to \infty} \mathrm{Ent}(S_n \,\|\, G) = 0. \]
• In 2011, Bobkov, Chistyakov and Götze showed that if, in addition, $X$ has a finite fourth moment then
\[ \mathrm{Ent}(S_n \,\|\, G) \le \frac{C}{n}. \]
• The constant $C$ may depend on $X$ as well as on the dimension.
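Chaining the two displays above gives an explicit rate for set probabilities. The short derivation below is a sketch that only combines Pinsker's inequality with the $C/n$ entropy bound, under the same assumptions (finite entropy and finite fourth moment).

```latex
\[ |\mathbb{P}(S_n \in A) - \mathbb{P}(G \in A)|
   \le \sqrt{\mathrm{Ent}(S_n \,\|\, G)}
   \le \sqrt{\frac{C}{n}}
   = \frac{\sqrt{C}}{\sqrt{n}}. \]
```

So the entropy bound recovers the $n^{-1/2}$ rate of Berry-Esseen, now for every measurable set $A$, at the price of a constant that is not explicit in $X$ or $d$.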
Wasserstein distance

The approximation error on a convex set $K \subset \mathbb{R}^d$ can be related to the Wasserstein distance using the following inequality by Zhai:
\[ |\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)| \le d^{1/6}\, W_2(S_n, G)^{2/3}. \]

Proof.
Take the optimal coupling, so $\mathbb{E}\left[\|S_n - G\|^2\right] = W_2(S_n, G)^2$. Then
\begin{align*}
\mathbb{P}(S_n \in K) &\le \mathbb{P}(\|S_n - G\| \le \varepsilon,\ S_n \in K) + \mathbb{P}(\|S_n - G\| > \varepsilon) \\
&\le \mathbb{P}(G \in K_\varepsilon) + \varepsilon^{-2} W_2(S_n, G)^2 \\
&\le \mathbb{P}(G \in K) + \varepsilon d^{1/4} + \varepsilon^{-2} W_2(S_n, G)^2.
\end{align*}
Now, optimize over $\varepsilon$.
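The final optimization step can be spelled out as follows. This is a sketch with the shorthand $W := W_2(S_n, G)$ introduced here for brevity, and with constants tracked only to show that they are universal.

```latex
% Minimize g(eps) = eps * d^{1/4} + eps^{-2} * W^2 over eps > 0.
\[ g'(\varepsilon) = d^{1/4} - 2 \varepsilon^{-3} W^2 = 0
   \quad\Longleftrightarrow\quad
   \varepsilon_* = \left( \frac{2 W^2}{d^{1/4}} \right)^{1/3}. \]
% Substitute eps_* back into each term.
\[ \varepsilon_* d^{1/4} = 2^{1/3}\, d^{1/6}\, W^{2/3},
   \qquad
   \varepsilon_*^{-2} W^2 = 2^{-2/3}\, d^{1/6}\, W^{2/3}. \]
% Add the two terms: 2^{1/3} + 2^{-2/3} = 3 * 2^{-2/3}.
\[ g(\varepsilon_*) = 3 \cdot 2^{-2/3}\, d^{1/6}\, W^{2/3}. \]
```

The exponent of $d$ comes from $\tfrac14 - \tfrac1{12} = \tfrac16$. The displayed chain bounds $\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)$ from above; the matching lower bound presumably follows from the symmetric argument applied to the set of points at distance at least $\varepsilon$ from the complement of $K$.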
Wasserstein distance

Theorem (Zhai)
If $\|X\| \le \beta$ almost surely then
\[ W_2(S_n, G) \le \frac{\sqrt{d}\, \beta \log(n)}{\sqrt{n}}. \]

• Plugging this into the previous inequality shows, up to a logarithmic factor in $n$,
\[ |\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)| \le \frac{d^{1/2} \beta^{2/3}}{n^{1/3}}. \]
• Substituting $\beta^3$ for $\mathbb{E}\left[\|X\|^3\right]$ in Bentkus' bound gives
\[ |\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)| \le \frac{d^{1/4} \beta^3}{n^{1/2}}. \]
• The bounds are not comparable.
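For reference, the substitution behind the first bullet can be written out explicitly. This sketch keeps the logarithmic factor that the displayed bound appears to suppress, and ignores universal constants.

```latex
\[ d^{1/6}\, W_2(S_n, G)^{2/3}
   \le d^{1/6} \left( \frac{\sqrt{d}\, \beta \log(n)}{\sqrt{n}} \right)^{2/3}
   = d^{1/6 + 1/3}\, \beta^{2/3}\, \frac{\log(n)^{2/3}}{n^{1/3}}
   = \frac{d^{1/2}\, \beta^{2/3} \log(n)^{2/3}}{n^{1/3}}. \]
% For the second bullet, ||X|| <= beta almost surely implies
\[ \mathbb{E}\left[\|X\|^3\right] \le \beta^3. \]
```

The Wasserstein route has the better dependence on $\beta$ ($\beta^{2/3}$ against $\beta^3$) but the worse dependence on $n$ ($n^{-1/3}$ against $n^{-1/2}$), which is why neither bound dominates the other.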
Wasserstein distance

Consider $X$ distributed uniformly on $\{\pm\sqrt{d}\, e_i\}$. In this case, $\beta = \sqrt{d}$ and Zhai's bound gives
\[ |\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)| \le \frac{d^{5/6}}{n^{1/3}}. \]
So, we can expect the CLT to hold whenever $d^{5/2} \ll n$.

On the other hand, Bentkus' bound gives
\[ |\mathbb{P}(S_n \in K) - \mathbb{P}(G \in K)| \le \frac{d^{7/4}}{n^{1/2}}. \]
In this case, we would require $d^{7/2} \ll n$ for convergence.
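The exponent bookkeeping in this example is easy to get wrong by hand, so here is a small sketch that redoes it with exact fractions. The helper `exponent_threshold` and the variable names are hypothetical; logarithmic factors and constants are ignored, as in the bounds above.

```python
from fractions import Fraction

def exponent_threshold(d_power, n_power):
    """For a bound d^{d_power} / n^{n_power}, return a such that the bound
    tends to zero exactly when d^a << n."""
    return d_power / n_power

# beta = d^{1/2} for X uniform on {±sqrt(d) e_i}
beta_power = Fraction(1, 2)

# Zhai-based bound: d^{1/2} * beta^{2/3} / n^{1/3}
zhai_d_power = Fraction(1, 2) + Fraction(2, 3) * beta_power
zhai_threshold = exponent_threshold(zhai_d_power, Fraction(1, 3))

# Bentkus' bound: d^{1/4} * beta^3 / n^{1/2}
bentkus_d_power = Fraction(1, 4) + 3 * beta_power
bentkus_threshold = exponent_threshold(bentkus_d_power, Fraction(1, 2))

print(zhai_d_power, zhai_threshold)        # 5/6 5/2
print(bentkus_d_power, bentkus_threshold)  # 7/4 7/2
```

Changing `beta_power` shows how the comparison between the two bounds shifts for other scalings of $\beta$ with $d$.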