Concentration inequalities
Gábor Lugosi, ICREA and Pompeu Fabra University

example: kernel density estimation

Let $X_1,\dots,X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is

$$\phi_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x-X_i}{h}\right),$$

where $h > 0$ and $K$ is a nonnegative “kernel” with $\int K = 1$. The $L_1$ error is

$$Z = f(X_1,\dots,X_n) = \int |\phi(x)-\phi_n(x)|\,dx .$$

It is easy to see that

$$|f(x_1,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n)| \le \frac{1}{nh}\int \left|K\left(\frac{x-x_i}{h}\right) - K\left(\frac{x-x_i'}{h}\right)\right| dx \le \frac{2}{n},$$

so we get $\mathrm{Var}(Z) \le \frac{2}{n}$.
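The bounded difference $2/n$ can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: a Gaussian kernel, a standard normal target density $\phi$, a Riemann sum on a fixed grid in place of the integral, and the hypothetical names `gaussian_kernel` and `l1_error`.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density: nonnegative and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def l1_error(sample, h, grid, dx):
    """Riemann-sum approximation of Z = int |phi(x) - phi_n(x)| dx,
    with phi taken to be the standard normal density (an assumption)."""
    n = len(sample)
    total = 0.0
    for x in grid:
        phi_n = sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)
        total += abs(gaussian_kernel(x) - phi_n) * dx
    return total

random.seed(0)
n, h, dx = 40, 0.3, 0.02
grid = [-8.0 + k * dx for k in range(801)]          # covers [-8, 8]
sample = [random.gauss(0.0, 1.0) for _ in range(n)]

z = l1_error(sample, h, grid, dx)

# Replace one observation at a time by an arbitrary new value and record
# the largest resulting change in the L1 error.
max_change = 0.0
for i in range(5):
    perturbed = list(sample)
    perturbed[i] = 4.0
    max_change = max(max_change, abs(z - l1_error(perturbed, h, grid, dx)))

print(f"L1 error {z:.4f}, largest change {max_change:.4f}, bound 2/n = {2 / n:.4f}")
```

The observed change never exceeds $2/n$ whatever the bandwidth $h$, which is what makes the variance bound free of $h$ and of the density.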

example: uniform deviations

Let $\mathcal{A}$ be a collection of subsets of $\mathcal{X}$, and let $X_1,\dots,X_n$ be $n$ random points in $\mathcal{X}$ drawn i.i.d. Let

$$P(A) = \mathbf{P}\{X_1 \in A\} \quad\text{and}\quad P_n(A) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{X_i\in A}.$$

If $Z = \sup_{A\in\mathcal{A}} |P(A) - P_n(A)|$, then

$$\mathrm{Var}(Z) \le \frac{1}{2n}$$

regardless of the distribution and the richness of $\mathcal{A}$.
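As a rough sanity check, the variance bound can be compared with a simulation. The sketch assumes a specific setting not taken from the slides: $X_i$ uniform on $[0,1]$ and $\mathcal{A}$ the class of intervals $[0,t]$, so that $Z$ is the Kolmogorov-Smirnov statistic. The function name `sup_deviation` is hypothetical.

```python
import random
import statistics

def sup_deviation(sample):
    """Z = sup_t |P_n([0, t]) - t| for a sample from Uniform[0, 1].
    The supremum over the intervals [0, t] is attained at a sample point."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(1)
n, trials = 50, 2000
values = [sup_deviation([random.random() for _ in range(n)]) for _ in range(trials)]
empirical_var = statistics.pvariance(values)

print(f"empirical Var(Z) = {empirical_var:.5f}, bound 1/(2n) = {1 / (2 * n):.5f}")
```

The empirical variance is a Monte Carlo estimate, so it only illustrates the bound; it does not prove it.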

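bounding the expectation

Let $P_n'(A) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{X_i'\in A}$, where $X_1',\dots,X_n'$ is an independent copy of the sample, and let $\mathbf{E}'$ denote expectation only with respect to $X_1',\dots,X_n'$. Then

$$\mathbf{E}\sup_{A\in\mathcal{A}} |P_n(A)-P(A)| = \mathbf{E}\sup_{A\in\mathcal{A}} \left|\mathbf{E}'[P_n(A)-P_n'(A)]\right| \le \mathbf{E}\sup_{A\in\mathcal{A}} |P_n(A)-P_n'(A)| = \frac{1}{n}\,\mathbf{E}\sup_{A\in\mathcal{A}} \left|\sum_{i=1}^n \left(\mathbb{1}_{X_i\in A}-\mathbb{1}_{X_i'\in A}\right)\right| .$$

Second symmetrization: if $\varepsilon_1,\dots,\varepsilon_n$ are independent Rademacher variables, then

$$\frac{1}{n}\,\mathbf{E}\sup_{A\in\mathcal{A}} \left|\sum_{i=1}^n \left(\mathbb{1}_{X_i\in A}-\mathbb{1}_{X_i'\in A}\right)\right| = \frac{1}{n}\,\mathbf{E}\sup_{A\in\mathcal{A}} \left|\sum_{i=1}^n \varepsilon_i\left(\mathbb{1}_{X_i\in A}-\mathbb{1}_{X_i'\in A}\right)\right| \le \frac{2}{n}\,\mathbf{E}\sup_{A\in\mathcal{A}} \left|\sum_{i=1}^n \varepsilon_i \mathbb{1}_{X_i\in A}\right| .$$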

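conditional rademacher average

If

$$R_n = \mathbf{E}_\varepsilon \sup_{A\in\mathcal{A}} \left|\sum_{i=1}^n \varepsilon_i \mathbb{1}_{X_i\in A}\right|$$

then

$$\mathbf{E}\sup_{A\in\mathcal{A}} |P_n(A)-P(A)| \le \frac{2}{n}\,\mathbf{E} R_n .$$

$R_n$ is a data-dependent quantity!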

concentration of conditional rademacher average

Define

$$R_n^{(i)} = \mathbf{E}_\varepsilon \sup_{A\in\mathcal{A}} \left|\sum_{j\ne i} \varepsilon_j \mathbb{1}_{X_j\in A}\right| .$$

One can show easily that

$$0 \le R_n - R_n^{(i)} \le 1 \quad\text{and}\quad \sum_{i=1}^n \left(R_n - R_n^{(i)}\right) \le R_n .$$

By the Efron-Stein inequality,

$$\mathrm{Var}(R_n) \le \mathbf{E}\sum_{i=1}^n \left(R_n - R_n^{(i)}\right)^2 \le \mathbf{E} R_n .$$

Standard deviation is at most $\sqrt{\mathbf{E} R_n}$! Such functions are called self-bounding.
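The two self-bounding conditions can be verified exactly on a small instance. The sketch below assumes a hypothetical finite class given directly by its traces $\{i : X_i \in A\}$ (random subsets of the index set) and computes $R_n$ by enumerating all $2^n$ sign vectors, which is only feasible for small $n$. The name `rademacher_average` is illustrative.

```python
import itertools
import random

def rademacher_average(traces, points):
    """Exact E_eps max_A |sum_{i in A} eps_i| with the sum restricted to
    `points`, computed by enumerating every sign vector."""
    points = list(points)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(points)):
        eps = dict(zip(points, signs))
        total += max(abs(sum(eps[i] for i in a if i in eps)) for a in traces)
    return total / 2 ** len(points)

random.seed(2)
n = 8
# Hypothetical class: 12 random subsets of {0, ..., n-1}, standing in for
# the index sets {i : X_i in A}.
traces = [frozenset(i for i in range(n) if random.random() < 0.5) for _ in range(12)]

r_n = rademacher_average(traces, range(n))
r_leave_one_out = [rademacher_average(traces, [j for j in range(n) if j != i])
                   for i in range(n)]
gaps = [r_n - r_i for r_i in r_leave_one_out]   # R_n - R_n^{(i)}

print(f"R_n = {r_n:.4f}, sum of gaps = {sum(gaps):.4f}")
```

Each gap lies in $[0,1]$ and the gaps sum to at most $R_n$, which is exactly what the Efron-Stein step uses.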

bounding the conditional rademacher average

If $S(X_1^n,\mathcal{A})$ is the number of different sets of the form

$$\{X_1,\dots,X_n\}\cap A, \quad A\in\mathcal{A},$$

then $R_n$ is the expected maximum of $S(X_1^n,\mathcal{A})$ sub-Gaussian random variables. By the maximal inequality,

$$\frac{1}{n} R_n \le \sqrt{\frac{\log S(X_1^n,\mathcal{A})}{2n}} .$$

In particular,

$$\mathbf{E}\sup_{A\in\mathcal{A}} |P_n(A)-P(A)| \le 2\,\mathbf{E}\sqrt{\frac{\log S(X_1^n,\mathcal{A})}{2n}} .$$

random VC dimension

Let $V = V(x_1^n,\mathcal{A})$ be the size of the largest subset of $\{x_1,\dots,x_n\}$ shattered by $\mathcal{A}$. By Sauer’s lemma,

$$\log S(X_1^n,\mathcal{A}) \le V(X_1^n,\mathcal{A})\,\log(n+1) .$$

$V$ is also self-bounding:

$$\sum_{i=1}^n \left(V - V^{(i)}\right)^2 \le V,$$

so by Efron-Stein, $\mathrm{Var}(V) \le \mathbf{E} V$.
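Both statements can be checked by brute force on a toy instance. The sketch assumes a hypothetical finite class of random subsets of a 12-point universe and a sample of 7 distinct points; `traces` and `vc_dimension` are illustrative names, and $V^{(i)}$ is taken to be the same quantity computed with the $i$-th point removed.

```python
import itertools
import random

def traces(points, sets):
    """Distinct intersections of the point set with the sets of the class."""
    pts = frozenset(points)
    return {pts & a for a in sets}

def vc_dimension(points, sets):
    """Size of the largest subset of `points` shattered by `sets` (brute force)."""
    best = 0
    for k in range(1, len(points) + 1):
        if any(len(traces(sub, sets)) == 2 ** k
               for sub in itertools.combinations(points, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best

random.seed(3)
universe = range(12)
sets = [frozenset(x for x in universe if random.random() < 0.5) for _ in range(40)]
points = random.sample(universe, 7)        # hypothetical sample of distinct points
n = len(points)

S = len(traces(points, sets))              # shatter coefficient S(x_1^n, A)
V = vc_dimension(points, sets)
V_loo = [vc_dimension(points[:i] + points[i + 1:], sets) for i in range(n)]

print(f"S = {S}, V = {V}, (n+1)^V = {(n + 1) ** V}")
print("sum of squared gaps:", sum((V - v) ** 2 for v in V_loo))
```

Sauer's lemma is checked in the equivalent integer form $S \le (n+1)^V$.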

vapnik and chervonenkis

[Photos: Alexey Chervonenkis, Vladimir Vapnik]

beyond the variance

$X_1,\dots,X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f:\mathcal{X}^n\to\mathbb{R}$ and $Z = f(X_1,\dots,X_n)$. Recall the Doob martingale representation:

$$Z - \mathbf{E} Z = \sum_{i=1}^n \Delta_i \quad\text{where}\quad \Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z,$$

with $\mathbf{E}_i[\,\cdot\,] = \mathbf{E}[\,\cdot \mid X_1,\dots,X_i]$.

To get exponential inequalities, we bound the moment generating function $\mathbf{E}\, e^{\lambda(Z-\mathbf{E} Z)}$.

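azuma’s inequality

Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then, conditioning on $X_1,\dots,X_{n-1}$,

$$\mathbf{E}\, e^{\lambda(Z-\mathbf{E} Z)} = \mathbf{E}\, e^{\lambda\sum_{i=1}^n \Delta_i} = \mathbf{E}\,\mathbf{E}_{n-1}\, e^{\lambda\sum_{i=1}^{n-1}\Delta_i + \lambda\Delta_n} = \mathbf{E}\left[e^{\lambda\sum_{i=1}^{n-1}\Delta_i}\,\mathbf{E}_{n-1}\, e^{\lambda\Delta_n}\right]$$

$$\le \mathbf{E}\left[e^{\lambda\sum_{i=1}^{n-1}\Delta_i}\right] e^{\lambda^2 c_n^2/2} \quad\text{(by Hoeffding)}$$

$$\le \cdots \le e^{\lambda^2\left(\sum_{i=1}^n c_i^2\right)/2} .$$

This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.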

bounded differences inequality

If $Z = f(X_1,\dots,X_n)$ and $f$ is such that

$$|f(x_1,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n)| \le c_i$$

then the martingale differences are bounded.

Bounded differences inequality: if $X_1,\dots,X_n$ are independent, then

$$\mathbf{P}\{|Z-\mathbf{E} Z| > t\} \le 2e^{-2t^2/\sum_{i=1}^n c_i^2} .$$

McDiarmid’s inequality. [Photo: Colin McDiarmid]
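A quick simulation illustrates the inequality on an example that is not in the slides: $Z$ is the number of empty bins after throwing $n$ balls into $n$ bins. Moving a single ball changes $Z$ by at most 1, so $c_i = 1$ and the bound is $2e^{-2t^2/n}$. The function name `empty_bins` and the chosen sizes are assumptions of this sketch.

```python
import math
import random

def empty_bins(n_balls, n_bins, rng):
    """Z = number of empty bins after throwing the balls independently and
    uniformly at random. Moving one ball changes Z by at most 1."""
    occupied = {rng.randrange(n_bins) for _ in range(n_balls)}
    return n_bins - len(occupied)

rng = random.Random(4)
n_balls = n_bins = 100
expected = n_bins * (1 - 1 / n_bins) ** n_balls    # exact value of E Z
trials = 5000
zs = [empty_bins(n_balls, n_bins, rng) for _ in range(trials)]

results = {}
for t in (5, 10, 15):
    freq = sum(abs(z - expected) > t for z in zs) / trials
    bound = 2 * math.exp(-2 * t * t / n_balls)
    results[t] = (freq, bound)
    print(f"t = {t}: empirical {freq:.4f}, bound {bound:.4f}")
```

The empirical frequencies sit well below the bound here because the bound ignores the variance of $Z$, which is much smaller than $n/4$ in this example.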

hoeffding in a hilbert space

Let $X_1,\dots,X_n$ be independent zero-mean random variables in a separable Hilbert space such that $\|X_i\| \le c/2$ and denote $v = nc^2/4$. Then, for all $t \ge \sqrt{v}$,

$$\mathbf{P}\left\{\left\|\sum_{i=1}^n X_i\right\| > t\right\} \le e^{-(t-\sqrt{v})^2/(2v)} .$$

Proof: By the triangle inequality, $\left\|\sum_{i=1}^n X_i\right\|$ has the bounded differences property with constants $c$, so

$$\mathbf{P}\left\{\left\|\sum_{i=1}^n X_i\right\| > t\right\} = \mathbf{P}\left\{\left\|\sum_{i=1}^n X_i\right\| - \mathbf{E}\left\|\sum_{i=1}^n X_i\right\| > t - \mathbf{E}\left\|\sum_{i=1}^n X_i\right\|\right\} \le \exp\left(-\frac{\left(t - \mathbf{E}\left\|\sum_{i=1}^n X_i\right\|\right)^2}{2v}\right).$$

Also,

$$\mathbf{E}\left\|\sum_{i=1}^n X_i\right\| \le \sqrt{\mathbf{E}\left\|\sum_{i=1}^n X_i\right\|^2} = \sqrt{\sum_{i=1}^n \mathbf{E}\|X_i\|^2} \le \sqrt{v} .$$

bounded differences inequality

Easy to use. Distribution free. Often close to optimal (e.g., $L_1$ error of kernel density estimate). Does not exploit “variance information.” Often too rigid. Other methods are necessary.

shannon entropy

If $X, Y$ are random variables taking values in a set of size $N$,

$$H(X) = -\sum_x p(x)\log p(x),$$

$$H(X\mid Y) = H(X,Y) - H(Y) = -\sum_{x,y} p(x,y)\log p(x\mid y),$$

$$H(X) \le \log N \quad\text{and}\quad H(X\mid Y) \le H(X).$$

[Photo: Claude Shannon (1916–2001)]

han’s inequality

If $X = (X_1,\dots,X_n)$ and $X^{(i)} = (X_1,\dots,X_{i-1},X_{i+1},\dots,X_n)$, then

$$\sum_{i=1}^n \left(H(X) - H(X^{(i)})\right) \le H(X) .$$

Proof:

$$H(X) = H(X^{(i)}) + H(X_i\mid X^{(i)}) \le H(X^{(i)}) + H(X_i\mid X_1,\dots,X_{i-1}) .$$

Since $\sum_{i=1}^n H(X_i\mid X_1,\dots,X_{i-1}) = H(X)$, summing the inequality, we get

$$(n-1)H(X) \le \sum_{i=1}^n H(X^{(i)}) .$$

[Photo: Te Sun Han]
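Han's inequality is easy to confirm numerically for a joint distribution with dependent coordinates. The sketch draws a hypothetical random probability mass function on $\{0,1,2\}^4$; `entropy` and `marginal_without` are illustrative helper names.

```python
import itertools
import math
import random

def entropy(pmf):
    """Shannon entropy (natural logarithm) of a dict outcome -> probability."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal_without(pmf, i):
    """Law of X^{(i)}: drop coordinate i and add up the probabilities."""
    out = {}
    for x, p in pmf.items():
        key = x[:i] + x[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(5)
n = 4
outcomes = list(itertools.product((0, 1, 2), repeat=n))
weights = [random.random() for _ in outcomes]
total = sum(weights)
pmf = {x: w / total for x, w in zip(outcomes, weights)}   # generic dependent law

h_full = entropy(pmf)
h_loo = [entropy(marginal_without(pmf, i)) for i in range(n)]
lhs = sum(h_full - h for h in h_loo)

print(f"sum_i (H(X) - H(X^(i))) = {lhs:.4f} <= H(X) = {h_full:.4f}")
```

The same numbers also confirm the equivalent form $(n-1)H(X) \le \sum_i H(X^{(i)})$ and the bound $H(X) \le \log N$.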

edge isoperimetric inequality on the hypercube

Let $A \subset \{-1,1\}^n$. Let $E(A)$ be the collection of pairs $x, x' \in A$ such that $d_H(x,x') = 1$. Then

$$|E(A)| \le \frac{|A|}{2}\times\log_2 |A| .$$

Proof: Let $X = (X_1,\dots,X_n)$ be uniformly distributed over $A$. Then $p(x) = \mathbb{1}_{x\in A}/|A|$. Clearly, $H(X) = \log|A|$. Also,

$$H(X) - H(X^{(i)}) = H(X_i\mid X^{(i)}) = -\sum_{x\in A} p(x)\log p(x_i\mid x^{(i)}) .$$

For $x\in A$,

$$p(x_i\mid x^{(i)}) = \begin{cases} 1/2 & \text{if } \overline{x}^{(i)}\in A \\ 1 & \text{otherwise} \end{cases}$$

where $\overline{x}^{(i)} = (x_1,\dots,x_{i-1},-x_i,x_{i+1},\dots,x_n)$. Hence

$$H(X) - H(X^{(i)}) = \frac{\log 2}{|A|}\sum_{x\in A} \mathbb{1}_{x,\overline{x}^{(i)}\in A}$$

and therefore

$$\sum_{i=1}^n \left(H(X) - H(X^{(i)})\right) = \frac{\log 2}{|A|}\sum_{x\in A}\sum_{i=1}^n \mathbb{1}_{x,\overline{x}^{(i)}\in A} = \frac{|E(A)|}{|A|}\,2\log 2 .$$

Thus, by Han’s inequality,

$$\frac{|E(A)|}{|A|}\,2\log 2 = \sum_{i=1}^n \left(H(X) - H(X^{(i)})\right) \le H(X) = \log|A| .$$

This is equivalent to the edge isoperimetric inequality on the hypercube: if

$$\partial_E(A) = \left\{(x,x') : x\in A,\ x'\in A^c,\ d_H(x,x') = 1\right\}$$

is the edge boundary of $A$, then

$$|\partial_E(A)| \ge \log_2\frac{2^n}{|A|}\times |A| .$$

Equality is achieved for sub-cubes.
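The inequality and its equality case can be checked by direct counting. The sketch below uses a hypothetical random subset of $\{-1,1\}^6$ and the sub-cube obtained by fixing the first two coordinates; `edge_count` is an illustrative name, and pairs are counted as unordered, as in the proof.

```python
import itertools
import math
import random

def edge_count(subset):
    """|E(A)|: number of unordered pairs in A at Hamming distance 1."""
    members = set(subset)
    count = 0
    for x in members:
        for i in range(len(x)):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if flipped in members:
                count += 1
    return count // 2    # every pair was counted once from each endpoint

random.seed(6)
n = 6
cube = list(itertools.product((-1, 1), repeat=n))
random_set = random.sample(cube, 20)
subcube = [x for x in cube if x[:2] == (1, 1)]     # 2^(n-2) = 16 points

for name, a in (("random", random_set), ("sub-cube", subcube)):
    bound = len(a) / 2 * math.log2(len(a))
    boundary = n * len(a) - 2 * edge_count(a)      # size of the edge boundary
    print(f"{name}: |E(A)| = {edge_count(a)}, bound = {bound:.2f}, boundary = {boundary}")
```

The boundary size uses the identity $n|A| = 2|E(A)| + |\partial_E(A)|$, which is how the two forms of the inequality are seen to be equivalent.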

VC entropy is self-bounding

Let $\mathcal{A}$ be a class of subsets of $\mathcal{X}$ and $x = (x_1,\dots,x_n)\in\mathcal{X}^n$. Recall that $S(x,\mathcal{A})$ is the number of different sets of the form

$$\{x_1,\dots,x_n\}\cap A, \quad A\in\mathcal{A}.$$

Let $f_n(x) = \log_2 S(x,\mathcal{A})$ be the VC entropy. Then

$$0 \le f_n(x) - f_{n-1}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n) \le 1$$

and

$$\sum_{i=1}^n \left(f_n(x) - f_{n-1}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)\right) \le f_n(x) .$$

Proof: Put the uniform distribution on the class of sets $\{x_1,\dots,x_n\}\cap A$ and use Han’s inequality.

Corollary: if $X_1,\dots,X_n$ are independent, then

$$\mathrm{Var}(\log_2 S(X,\mathcal{A})) \le \mathbf{E}\log_2 S(X,\mathcal{A}) .$$

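subadditivity of entropy

The entropy of a random variable $Z \ge 0$ is

$$\mathrm{Ent}(Z) = \mathbf{E}\,\Phi(Z) - \Phi(\mathbf{E} Z)$$

where $\Phi(x) = x\log x$. By Jensen’s inequality, $\mathrm{Ent}(Z) \ge 0$.

Han’s inequality implies the following sub-additivity property. Let $X_1,\dots,X_n$ be independent and let $Z = f(X_1,\dots,X_n)$, where $f \ge 0$. Denote

$$\mathrm{Ent}^{(i)}(Z) = \mathbf{E}^{(i)}\Phi(Z) - \Phi(\mathbf{E}^{(i)} Z),$$

where $\mathbf{E}^{(i)}$ denotes expectation with respect to $X_i$ only, the other coordinates being held fixed. Then

$$\mathrm{Ent}(Z) \le \mathbf{E}\sum_{i=1}^n \mathrm{Ent}^{(i)}(Z) .$$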
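For two independent discrete coordinates, both sides of the sub-additivity inequality can be computed exactly. The sketch assumes hypothetical random laws for $X_1$ (3 values) and $X_2$ (4 values), a random positive table for $f$, and reads $\mathbf{E}^{(i)}$ as integration over $X_i$ alone with the other coordinate held fixed.

```python
import math
import random

def phi(x):
    """Phi(x) = x log x, extended by continuity to Phi(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def ent(values, probs):
    """Ent(Z) = E Phi(Z) - Phi(E Z) for a discrete nonnegative variable."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * phi(v) for v, p in zip(values, probs)) - phi(mean)

def random_pmf(k):
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

random.seed(7)
p1, p2 = random_pmf(3), random_pmf(4)       # laws of X_1 and X_2
f = [[random.uniform(0.1, 5.0) for _ in range(4)] for _ in range(3)]   # f(x1, x2) > 0

# Ent(Z) under the product law of (X_1, X_2).
joint_vals = [f[a][b] for a in range(3) for b in range(4)]
joint_probs = [p1[a] * p2[b] for a in range(3) for b in range(4)]
ent_z = ent(joint_vals, joint_probs)

# E Ent^{(1)}(Z): entropy in X_1 for fixed X_2 = b, averaged over X_2.
e_ent1 = sum(p2[b] * ent([f[a][b] for a in range(3)], p1) for b in range(4))
# E Ent^{(2)}(Z): entropy in X_2 for fixed X_1 = a, averaged over X_1.
e_ent2 = sum(p1[a] * ent(f[a], p2) for a in range(3))

print(f"Ent(Z) = {ent_z:.5f} <= {e_ent1 + e_ent2:.5f}")
```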

a logarithmic sobolev inequality on the hypercube

Let $X = (X_1,\dots,X_n)$ be uniformly distributed over $\{-1,1\}^n$. If $f:\{-1,1\}^n\to\mathbb{R}$ and $Z = f(X)$,

$$\mathrm{Ent}(Z^2) \le \frac{1}{2}\,\mathbf{E}\sum_{i=1}^n (Z - Z_i')^2 .$$

The proof uses subadditivity of the entropy and calculus for the case $n = 1$.

Implies Efron-Stein.
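The inequality can be verified exactly for a small cube. The sketch below assumes that $Z_i'$ denotes the value of $f$ with the $i$-th coordinate of $X$ flipped (the reading under which the constant $1/2$ is sharp); this convention is an assumption, as are the random choice of $f$ and the helper names `flip` and `phi`.

```python
import itertools
import math
import random

random.seed(8)
n = 4
cube = list(itertools.product((-1, 1), repeat=n))
f = {x: random.uniform(-2.0, 2.0) for x in cube}    # arbitrary real-valued f

def flip(x, i):
    """Return x with its i-th coordinate negated."""
    return x[:i] + (-x[i],) + x[i + 1:]

def phi(u):
    return u * math.log(u) if u > 0 else 0.0

# Ent(Z^2) = E Phi(Z^2) - Phi(E Z^2) under the uniform law on the cube.
mean_sq = sum(f[x] ** 2 for x in cube) / len(cube)
ent_z2 = sum(phi(f[x] ** 2) for x in cube) / len(cube) - phi(mean_sq)

# (1/2) E sum_i (Z - Z_i')^2 with Z_i' = f(X with coordinate i flipped).
energy = 0.5 * sum((f[x] - f[flip(x, i)]) ** 2
                   for x in cube for i in range(n)) / len(cube)

print(f"Ent(Z^2) = {ent_z2:.5f} <= {energy:.5f}")
```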

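herbst’s argument: exponential concentration

If $f:\{-1,1\}^n\to\mathbb{R}$, the log-Sobolev inequality may be used with

$$g(x) = e^{\lambda f(x)/2} \quad\text{where } \lambda\in\mathbb{R}.$$

If $F(\lambda) = \mathbf{E}\, e^{\lambda Z}$ is the moment generating function of $Z = f(X)$,

$$\mathrm{Ent}(g(X)^2) = \lambda\,\mathbf{E}\left[Z e^{\lambda Z}\right] - \mathbf{E}\left[e^{\lambda Z}\right]\log\mathbf{E}\left[e^{\lambda Z}\right] = \lambda F'(\lambda) - F(\lambda)\log F(\lambda) .$$

Differential inequalities are obtained for $F(\lambda)$.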

herbst’s argument

As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i')_+^2 \le v$. Then by the log-Sobolev inequality,

$$\lambda F'(\lambda) - F(\lambda)\log F(\lambda) \le \frac{v\lambda^2}{4}\,F(\lambda) .$$

If $G(\lambda) = \log F(\lambda)$, this becomes

$$\left(\frac{G(\lambda)}{\lambda}\right)' \le \frac{v}{4} .$$

This can be integrated: $G(\lambda) \le \lambda\,\mathbf{E} Z + \lambda^2 v/4$, so

$$F(\lambda) \le e^{\lambda\,\mathbf{E} Z + \lambda^2 v/4} .$$

This implies

$$\mathbf{P}\{Z > \mathbf{E} Z + t\} \le e^{-t^2/v} .$$

Stronger than the bounded differences inequality!
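The steps between the differential inequality and the tail bound can be written out as follows. This is a sketch of the standard argument for $\lambda > 0$, using only the symbols $F$, $G$, $v$ and $t$ introduced above; the choice $\lambda = 2t/v$ is the minimizer of the exponent.

```latex
% Divide the differential inequality by \lambda^2 F(\lambda) > 0:
\frac{F'(\lambda)}{\lambda F(\lambda)} - \frac{\log F(\lambda)}{\lambda^2}
  = \left(\frac{G(\lambda)}{\lambda}\right)' \le \frac{v}{4}.
% Since G(\lambda)/\lambda \to G'(0) = \mathbf{E} Z as \lambda \to 0,
% integrating over (0, \lambda) gives
\frac{G(\lambda)}{\lambda} - \mathbf{E} Z \le \frac{\lambda v}{4}
  \quad\Longrightarrow\quad
  F(\lambda) \le e^{\lambda \mathbf{E} Z + \lambda^2 v / 4}.
% Markov's inequality, followed by the choice \lambda = 2t/v:
\mathbf{P}\{Z > \mathbf{E} Z + t\}
  \le e^{-\lambda t}\, \mathbf{E}\, e^{\lambda (Z - \mathbf{E} Z)}
  \le e^{-\lambda t + \lambda^2 v / 4}
  = e^{-t^2 / v}.
```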

gaussian log-sobolev inequality

Let $X = (X_1,\dots,X_n)$ be a vector of i.i.d. standard normal random variables. If $f:\mathbb{R}^n\to\mathbb{R}$ and $Z = f(X)$,

$$\mathrm{Ent}(Z^2) \le 2\,\mathbf{E}\left[\|\nabla f(X)\|^2\right]$$

(Gross, 1975).

Proof sketch: By the subadditivity of entropy, it suffices to prove it for $n = 1$. Approximate $Z = f(X)$ by

$$f\left(\frac{1}{\sqrt{m}}\sum_{i=1}^m \varepsilon_i\right)$$

where the $\varepsilon_i$ are i.i.d. Rademacher random variables. Use the log-Sobolev inequality of the hypercube and the central limit theorem.

gaussian concentration inequality

Herbst’s argument may now be repeated. Suppose $f$ is Lipschitz: for all $x, y\in\mathbb{R}^n$,

$$|f(x) - f(y)| \le L\|x - y\| .$$

Then, for all $t > 0$,

$$\mathbf{P}\{f(X) - \mathbf{E} f(X) \ge t\} \le e^{-t^2/(2L^2)}$$

(Tsirelson, Ibragimov, and Sudakov, 1976).

an application: supremum of a gaussian process

Let $(X_t)_{t\in\mathcal{T}}$ be an almost surely continuous centered Gaussian process. Let $Z = \sup_{t\in\mathcal{T}} X_t$. If

$$\sigma^2 = \sup_{t\in\mathcal{T}} \mathbf{E}\left[X_t^2\right],$$

then

$$\mathbf{P}\{|Z - \mathbf{E} Z| \ge u\} \le 2e^{-u^2/(2\sigma^2)} .$$

Proof: We may assume $\mathcal{T} = \{1,\dots,n\}$. Let $\Gamma$ be the covariance matrix of $X = (X_1,\dots,X_n)$. Let $A = \Gamma^{1/2}$. If $Y$ is a standard normal vector, then

$$f(Y) = \max_{i=1,\dots,n} (AY)_i \stackrel{\text{distr.}}{=} \max_{i=1,\dots,n} X_i .$$

By Cauchy-Schwarz,

$$|(Au)_i - (Av)_i| = \left|\sum_j A_{i,j}(u_j - v_j)\right| \le \left(\sum_j A_{i,j}^2\right)^{1/2}\|u - v\| \le \sigma\|u - v\| .$$
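The finite case used in the proof is easy to simulate. The sketch below takes a hypothetical $5\times 5$ matrix $A$ (any $A$ with $AA^T = \Gamma$ gives the same law for $AY$), sets $\sigma^2 = \max_i \sum_j A_{i,j}^2$, and compares empirical two-sided tail frequencies of $Z = \max_i (AY)_i$ with $2e^{-u^2/(2\sigma^2)}$ at $u = k\sigma$. The sample mean stands in for $\mathbf{E} Z$, so this is an illustration rather than a proof.

```python
import math
import random

rng = random.Random(9)
n = 5
# Hypothetical matrix A; X = A Y is centered Gaussian with covariance A A^T.
A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
sigma2 = max(sum(a * a for a in row) for row in A)     # max_i E X_i^2

def sample_max():
    """Draw Y standard normal and return max_i (A Y)_i."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return max(sum(a * yj for a, yj in zip(row, y)) for row in A)

trials = 20000
zs = [sample_max() for _ in range(trials)]
mean_z = sum(zs) / trials        # Monte Carlo stand-in for E Z

results = {}
for k in (1.0, 2.0, 3.0):
    u = k * math.sqrt(sigma2)
    freq = sum(abs(z - mean_z) >= u for z in zs) / trials
    results[k] = (freq, 2 * math.exp(-u * u / (2 * sigma2)))
    print(f"u = {k:.0f} sigma: empirical {freq:.4f}, bound {results[k][1]:.4f}")
```

Note that the bound depends only on the largest coordinate variance, not on $n$ or on the correlations.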

beyond bernoulli and gaussian: the entropy method

For general distributions, logarithmic Sobolev inequalities are not available.

Solution: modified logarithmic Sobolev inequalities. Suppose $X_1,\dots,X_n$ are independent. Let $Z = f(X_1,\dots,X_n)$ and

$$Z_i = f_i(X^{(i)}) = f_i(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n) .$$

Let $\phi(x) = e^x - x - 1$. Then for all $\lambda\in\mathbb{R}$,

$$\lambda\,\mathbf{E}\left[Z e^{\lambda Z}\right] - \mathbf{E}\left[e^{\lambda Z}\right]\log\mathbf{E}\left[e^{\lambda Z}\right] \le \sum_{i=1}^n \mathbf{E}\left[e^{\lambda Z}\phi\left(-\lambda(Z - Z_i)\right)\right] .$$

[Photo: Michel Ledoux]

the entropy method

Define $Z_i = \inf_{x_i'} f(X_1,\dots,x_i',\dots,X_n)$ and suppose

$$\sum_{i=1}^n (Z - Z_i)^2 \le v .$$

Then for all $t > 0$,

$$\mathbf{P}\{Z - \mathbf{E} Z > t\} \le e^{-t^2/(2v)} .$$

This implies the bounded differences inequality and much more.

example: the largest eigenvalue of a symmetric matrix

Let $A = (X_{i,j})_{n\times n}$ be symmetric, the $X_{i,j}$ independent ($i \le j$) with $|X_{i,j}| \le 1$. Let

$$Z = \lambda_1 = \sup_{u:\|u\|=1} u^T A u$$

and suppose $v$ is such that $Z = v^T A v$. $A_{i,j}'$ is obtained by replacing $X_{i,j}$ by $x_{i,j}'$, and $Z_{i,j}$ denotes its largest eigenvalue. Then

$$(Z - Z_{i,j})_+ \le \left(v^T A v - v^T A_{i,j}' v\right)\mathbb{1}_{Z > Z_{i,j}} = \left(v^T (A - A_{i,j}') v\right)\mathbb{1}_{Z > Z_{i,j}} \le 2\left(v_i v_j (X_{i,j} - x_{i,j}')\right)_+ \le 4|v_i v_j| .$$

Therefore,

$$\sum_{1\le i\le j\le n} (Z - Z_{i,j})_+^2 \le \sum_{1\le i\le j\le n} 16|v_i v_j|^2 \le 16\left(\sum_{i=1}^n v_i^2\right)^2 = 16 .$$

example: convex lipschitz functions

Let $f:[0,1]^n\to\mathbb{R}$ be a convex function. Let $Z_i = \inf_{x_i'} f(X_1,\dots,x_i',\dots,X_n)$ and let $X_i'$ be the value of $x_i'$ for which the minimum is achieved. Then, writing $\overline{X}^{(i)} = (X_1,\dots,X_{i-1},X_i',X_{i+1},\dots,X_n)$,

$$\sum_{i=1}^n (Z - Z_i)^2 = \sum_{i=1}^n \left(f(X) - f(\overline{X}^{(i)})\right)^2$$

$$\le \sum_{i=1}^n \left(\frac{\partial f}{\partial x_i}(X)\right)^2 (X_i - X_i')^2 \quad\text{(by convexity)}$$

$$\le \sum_{i=1}^n \left(\frac{\partial f}{\partial x_i}(X)\right)^2 = \|\nabla f(X)\|^2 \le L^2 .$$

convex lipschitz functions

If $f:[0,1]^n\to\mathbb{R}$ is a convex Lipschitz function and $X_1,\dots,X_n$ are independent taking values in $[0,1]$, $Z = f(X_1,\dots,X_n)$ satisfies

$$\mathbf{P}\{Z > \mathbf{E} Z + t\} \le e^{-t^2/(2L^2)} .$$

A similar lower tail bound also holds.

self-bounding functions

Suppose $Z$ satisfies

$$0 \le Z - Z_i \le 1 \quad\text{and}\quad \sum_{i=1}^n (Z - Z_i) \le Z .$$

Recall that $\mathrm{Var}(Z) \le \mathbf{E} Z$. We have much more:

$$\mathbf{P}\{Z > \mathbf{E} Z + t\} \le e^{-t^2/(2\mathbf{E} Z + 2t/3)}$$

and

$$\mathbf{P}\{Z < \mathbf{E} Z - t\} \le e^{-t^2/(2\mathbf{E} Z)} .$$

Rademacher averages, random VC dimension, random VC entropy, and the longest increasing subsequence in a random permutation are all examples of self-bounding functions. So are configuration functions.
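The longest increasing subsequence gives a concrete instance that can be checked directly. The sketch assumes i.i.d. uniform $X_i$ (whose relative order is a uniform random permutation) and takes $Z_i$ to be the same functional applied to the sample with $X_i$ removed; `lis_length` is an illustrative name.

```python
import bisect
import random

def lis_length(seq):
    """Length of the longest strictly increasing subsequence, computed by
    patience sorting in O(n log n)."""
    tails = []   # tails[k] = smallest last element of an increasing run of length k+1
    for x in seq:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

random.seed(10)
n = 200
xs = [random.random() for _ in range(n)]
z = lis_length(xs)

# Z - Z_i, where Z_i is computed on the sample with X_i deleted.
gaps = [z - lis_length(xs[:i] + xs[i + 1:]) for i in range(n)]

print(f"Z = {z}, number of i with Z - Z_i = 1: {sum(gaps)}")
```

A gap equals 1 only when $X_i$ belongs to every longest increasing subsequence, and there can be at most $Z$ such indices, which is why the gaps sum to at most $Z$.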

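exponential efron-stein inequality

Define

$$V_+ = \mathbf{E}'\left[\sum_{i=1}^n (Z - Z_i')_+^2\right]$$

and

$$V_- = \mathbf{E}'\left[\sum_{i=1}^n (Z - Z_i')_-^2\right] .$$

By Efron-Stein,

$$\mathrm{Var}(Z) \le \mathbf{E} V_+ \quad\text{and}\quad \mathrm{Var}(Z) \le \mathbf{E} V_- .$$

The following exponential versions hold for all $\lambda, \theta > 0$ with $\lambda\theta < 1$:

$$\log\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)} \le \frac{\lambda\theta}{1 - \lambda\theta}\,\log\mathbf{E}\, e^{\lambda V_+/\theta} .$$

If also $Z_i' - Z \le 1$ for every $i$, then for all $\lambda\in(0,1/2)$,

$$\log\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)} \le \frac{2\lambda}{1 - 2\lambda}\,\log\mathbf{E}\, e^{\lambda V_-} .$$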

weakly self-bounding functions

$f:\mathcal{X}^n\to[0,\infty)$ is weakly $(a,b)$-self-bounding if there exist $f_i:\mathcal{X}^{n-1}\to[0,\infty)$ such that for all $x\in\mathcal{X}^n$,

$$\sum_{i=1}^n \left(f(x) - f_i(x^{(i)})\right)^2 \le a f(x) + b .$$

Then $Z = f(X_1,\dots,X_n)$ satisfies

$$\mathbf{P}\{Z \ge \mathbf{E} Z + t\} \le \exp\left(-\frac{t^2}{2(a\,\mathbf{E} Z + b + at/2)}\right).$$

If, in addition, $f(x) - f_i(x^{(i)}) \le 1$, then for $0 < t \le \mathbf{E} Z$,

$$\mathbf{P}\{Z \le \mathbf{E} Z - t\} \le \exp\left(-\frac{t^2}{2(a\,\mathbf{E} Z + b + c_- t)}\right)$$

where $c = (3a - 1)/6$.

the isoperimetric view

Let $X = (X_1,\dots,X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is

$$d(X,A) = \min_{y\in A} d(X,y) = \min_{y\in A}\sum_{i=1}^n \mathbb{1}_{X_i\ne y_i} .$$

[Photo: Michel Talagrand]

$$\mathbf{P}\left\{d(X,A) \ge t + \sqrt{\frac{n}{2}\log\frac{1}{\mathbf{P}[A]}}\right\} \le e^{-2t^2/n} .$$

Concentration of measure!

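the isoperimetric view

Proof: By the bounded differences inequality,

$$\mathbf{P}\{\mathbf{E}\, d(X,A) - d(X,A) \ge t\} \le e^{-2t^2/n} .$$

Taking $t = \mathbf{E}\, d(X,A)$, the left-hand side becomes $\mathbf{P}\{d(X,A) \le 0\} = \mathbf{P}\{A\}$, so we get

$$\mathbf{E}\, d(X,A) \le \sqrt{\frac{n}{2}\log\frac{1}{\mathbf{P}\{A\}}} .$$

By the bounded differences inequality again,

$$\mathbf{P}\left\{d(X,A) \ge t + \sqrt{\frac{n}{2}\log\frac{1}{\mathbf{P}\{A\}}}\right\} \le e^{-2t^2/n} .$$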
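On a small discrete cube both steps of the proof can be checked with exact probabilities. The sketch below assumes $X$ uniform on $\{0,1\}^8$ and a hypothetical random set $A$ of 12 points; `hamming_to_set` is an illustrative name.

```python
import itertools
import math
import random

random.seed(11)
n = 8
cube = list(itertools.product((0, 1), repeat=n))    # X uniform on {0,1}^n
A = random.sample(cube, 12)                          # hypothetical set A
p_A = len(A) / len(cube)

def hamming_to_set(x, a_set):
    """d(x, A): smallest number of coordinates in which x differs from a point of A."""
    return min(sum(xi != yi for xi, yi in zip(x, y)) for y in a_set)

dist = [hamming_to_set(x, A) for x in cube]
mean_dist = sum(dist) / len(cube)                    # exact E d(X, A)
shift = math.sqrt(n / 2 * math.log(1 / p_A))

checks = []
for t in (0.5, 1.0, 2.0, 3.0):
    prob = sum(d >= t + shift for d in dist) / len(cube)    # exact probability
    checks.append((t, prob, math.exp(-2 * t * t / n)))

print(f"E d(X,A) = {mean_dist:.3f} <= {shift:.3f}")
for t, prob, bound in checks:
    print(f"t = {t}: P = {prob:.4f}, bound = {bound:.4f}")
```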

talagrand’s convex distance

The weighted Hamming distance is

$$d_\alpha(x,A) = \inf_{y\in A} d_\alpha(x,y) = \inf_{y\in A}\sum_{i: x_i\ne y_i} |\alpha_i|$$

where $\alpha = (\alpha_1,\dots,\alpha_n)$. The same argument as before gives

$$\mathbf{P}\left\{d_\alpha(X,A) \ge t + \sqrt{\frac{\|\alpha\|^2}{2}\log\frac{1}{\mathbf{P}\{A\}}}\right\} \le e^{-2t^2/\|\alpha\|^2} .$$

This implies

$$\sup_{\alpha:\|\alpha\|=1} \min\left(\mathbf{P}\{A\},\ \mathbf{P}\{d_\alpha(X,A) \ge t\}\right) \le e^{-t^2/2} .$$

convex distance inequality

Convex distance:

$$d_T(x,A) = \sup_{\alpha\in[0,\infty)^n:\|\alpha\|=1} d_\alpha(x,A) .$$

Talagrand’s convex distance inequality:

$$\mathbf{P}\{A\}\,\mathbf{P}\{d_T(X,A) \ge t\} \le e^{-t^2/4} .$$

Follows from the fact that $d_T(X,A)^2$ is $(4,0)$ weakly self-bounding (by a saddle point representation of $d_T$).

Talagrand’s original proof was different.

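convex lipschitz functions

For $A \subset [0,1]^n$ and $x\in[0,1]^n$, define

$$D(x,A) = \inf_{y\in A}\|x - y\| .$$

If $A$ is convex, then

$$D(x,A) \le d_T(x,A) .$$

Proof: Writing $\mathcal{M}(A)$ for the set of probability measures on $A$,

$$D(x,A) = \inf_{\nu\in\mathcal{M}(A)} \|x - \mathbf{E}_\nu Y\| \quad\text{(since $A$ is convex)}$$

$$\le \inf_{\nu\in\mathcal{M}(A)} \sqrt{\sum_{j=1}^n \left(\mathbf{E}_\nu \mathbb{1}_{x_j\ne Y_j}\right)^2} \quad\text{(since $x_j, Y_j\in[0,1]$)}$$

$$= \inf_{\nu\in\mathcal{M}(A)}\ \sup_{\alpha:\|\alpha\|\le 1}\ \sum_{j=1}^n \alpha_j\,\mathbf{E}_\nu \mathbb{1}_{x_j\ne Y_j} \quad\text{(by Cauchy-Schwarz)}$$

$$= d_T(x,A) \quad\text{(by minimax theorem)}.$$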

John von Neumann (1903–1957)

Sergei Lvovich Sobolev (1908–1989)
