Matrix-valued Chernoff Bounds and Applications


  1. Matrix-valued Chernoff Bounds and Applications. China Theory Week. Anastasios Zouzias, University of Toronto, September 2010.

  2. Introduction. Probability theory is the backbone of the analysis of randomized algorithms, and random sampling is its most fundamental technique. Several inequalities are used to analyze the quality of an approximation: Markov, Chebyshev, Chernoff, Azuma, etc. This talk discusses recent matrix-valued probabilistic inequalities and their applications. Agenda: (1) review real-valued probabilistic inequalities; (2) present recent matrix-valued variants; (3) a low-rank matrix-valued inequality; (4) two applications: matrix sparsification and approximate matrix multiplication.


  3. Law of Large Numbers. The fundamental principle behind random sampling is the Law of Large Numbers (LLN): the empirical average converges to the true average. Its classical form is stated for reals rather than matrices. Let X_1, ..., X_t be independent copies of a random variable X. Goal: estimate the mean E[X] using the samples X_1, ..., X_t. Approximate it by the empirical mean: (1/t) Σ_{i=1}^t X_i ≈ E[X]. How good is this approximation (non-asymptotically)? Question: is there a matrix-valued LLN?
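
A minimal sketch of the non-asymptotic question (my illustration, not from the slides): estimate E[X] = 0.5 for X ~ Uniform[0, 1] by the empirical mean and watch the average error shrink, roughly like 1/√t.

```python
# Illustration only: empirical mean vs. true mean for X ~ Uniform[0, 1].
import numpy as np

rng = np.random.default_rng(0)

def empirical_mean_error(t: int, trials: int = 500) -> float:
    """Average |(1/t) * sum_i X_i - E[X]| over many independent trials."""
    samples = rng.uniform(0.0, 1.0, size=(trials, t))
    return float(np.mean(np.abs(samples.mean(axis=1) - 0.5)))

for t in [10, 100, 1000, 10000]:
    print(t, empirical_mean_error(t))  # error decays roughly like 1/sqrt(t)
```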

  4. Matrix-valued Random Variables. Let (Ω, F, P) be a probability space. A matrix-valued random variable is a measurable function M : Ω → R^{d×d}; its expectation, denoted E[M] ∈ R^{d×d}, is a d × d matrix. A self-adjoint matrix-valued random variable is a function M : Ω → S^{d×d}. Caveat: the entries may or may not be correlated with each other. In short, a matrix-valued random variable is a random matrix with (possibly) correlated entries.
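
A minimal sketch (my construction, not from the slides): the rank-one matrix M = x xᵀ with x a standard normal vector is a self-adjoint matrix-valued r.v. whose entries are correlated; averaging independent draws estimates E[M], which here equals the identity.

```python
# Illustration only: a self-adjoint matrix-valued r.v. with correlated entries.
import numpy as np

rng = np.random.default_rng(0)
d = 4

def draw_M() -> np.ndarray:
    x = rng.standard_normal(d)
    return np.outer(x, x)  # symmetric d x d; all entries share the same x

# Empirical estimate of E[M]; for standard normal x, E[x x^T] = I_d.
M_bar = np.mean([draw_M() for _ in range(5000)], axis=0)
print(np.round(M_bar, 2))
```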

  5. Real-valued Probabilistic Inequalities.
  Lemma (Markov). Let X ≥ 0 be a real-valued random variable (r.v.) and α > 0. Then P(X ≥ α) ≤ E[X]/α.
  Lemma (Chernoff-Hoeffding). Let X_1, X_2, ..., X_t be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then P(|(1/t) Σ_{i=1}^t X_i − E[X]| > ε) ≤ 2 exp(−C ε² t / γ²).
  Lemma (Bernstein). Let X_1, X_2, ..., X_t be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ and Var(X) ≤ ρ², then P(|(1/t) Σ_{i=1}^t X_i − E[X]| > ε) ≤ 2 exp(−C ε² t / (ρ² + γ ε / 3)).
  ...and many more. Question: what would the matrix-valued generalizations look like?
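
A minimal sketch (assumed illustration, not from the slides) checking the Chernoff-Hoeffding bound numerically. For X supported on [−γ, γ], Hoeffding's inequality holds with C = 1/2 in the form above; the empirical deviation probability should sit below the bound.

```python
# Illustration only: empirical deviation probability vs. the Hoeffding bound.
import numpy as np

rng = np.random.default_rng(0)
t, eps, gamma, trials = 200, 0.1, 1.0, 20000

# X uniform on [-gamma, gamma], so |X| <= gamma and E[X] = 0.
means = rng.uniform(-gamma, gamma, size=(trials, t)).mean(axis=1)
empirical = np.mean(np.abs(means) > eps)
bound = 2 * np.exp(-0.5 * eps**2 * t / gamma**2)  # C = 1/2 for this range
print(f"empirical {empirical:.4f} <= bound {bound:.4f}")
```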

  6. Real-valued to Matrix-valued. Is there a meaningful way to generalize the real-valued inequalities to matrix-valued ones? Would these inequalities be useful to us? A dictionary between the two settings (see the sketch below):

      A, B ∈ S^{d×d}    α, β ∈ R    Comments
      A ⪰ B             α ≥ β       A − B is p.s.d.
      ‖A‖               |α|         spectral norm
      e^A               e^α         matrix exponential
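
A minimal sketch (assumed illustration, not from the slides) of each row of the dictionary, using NumPy and SciPy: the Loewner order via the eigenvalues of A − B, the spectral norm, and the matrix exponential.

```python
# Illustration only: the scalar -> matrix dictionary, row by row.
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.eye(2)

# A >= B in the Loewner order: A - B is p.s.d. (all eigenvalues >= 0).
print(np.all(np.linalg.eigvalsh(A - B) >= 0))  # True

# The spectral norm ||A|| plays the role of |alpha|.
print(np.linalg.norm(A, 2))  # largest singular value: 3.0

# The matrix exponential e^A plays the role of e^alpha.
print(expm(A))
```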

  7. Matrix-valued Probabilistic Inequalities. Recall the real-valued Markov inequality: for a r.v. X ≥ 0 and α > 0, P(X ≥ α) ≤ E[X]/α. Its matrix-valued analogue:
  Lemma (Matrix-valued Markov [AW02]). Let M ⪰ 0 be a self-adjoint matrix-valued r.v. and α > 0. Then P(M ⋠ α·I) ≤ tr(E[M])/α, where M ⋠ α·I denotes the event that M ⪯ α·I fails.
  Remark: P(M ⋠ α·I) = P(λ_max(M) > α).
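
A minimal sketch (assumed illustration, not from the slides) of the matrix-valued Markov inequality, using the remark: for M = x xᵀ ⪰ 0 with x standard normal, the frequency of λ_max(M) > α should stay below tr(E[M])/α = d/α.

```python
# Illustration only: frequency of lambda_max(M) > alpha vs. tr(E[M]) / alpha.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, trials = 3, 10.0, 20000

lam_max = np.empty(trials)
for i in range(trials):
    x = rng.standard_normal(d)
    M = np.outer(x, x)  # p.s.d. matrix-valued r.v.
    lam_max[i] = np.linalg.eigvalsh(M).max()

# Here E[M] = I_d, so tr(E[M]) = d.
print(np.mean(lam_max > alpha), "<=", d / alpha)
```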

  8. Matrix-valued Probabilistic Inequalities. Theorem (Chernoff; restated for comparison). Let X_1, X_2, ..., X_t be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then P(|(1/t) Σ_{i=1}^t X_i − E[X]| > ε) ≤ 2 exp(−C ε² t / γ²).
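
The matrix-valued version of this theorem is not included in this excerpt, but here is a minimal sketch (my illustration) of the quantity such a bound would control: the spectral-norm deviation ‖(1/t) Σ_i M_i − E[M]‖ of the empirical matrix mean, which shrinks as t grows.

```python
# Illustration only: spectral-norm deviation of the empirical matrix mean.
import numpy as np

rng = np.random.default_rng(0)
d = 5

def mean_deviation(t: int, trials: int = 100) -> float:
    devs = []
    for _ in range(trials):
        xs = rng.standard_normal((t, d))
        M_bar = (xs[:, :, None] * xs[:, None, :]).mean(axis=0)  # avg of x x^T
        devs.append(np.linalg.norm(M_bar - np.eye(d), 2))       # E[x x^T] = I_d
    return float(np.mean(devs))

for t in [50, 200, 800, 3200]:
    print(t, mean_deviation(t))  # decays roughly like 1/sqrt(t)
```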
