Matrix-valued Chernoff Bounds and Applications China Theory Week - PowerPoint PPT Presentation

Matrix-valued Chernoff Bounds and Applications China Theory Week Anastasios Zouzias University of Toronto September 2010

Introduction Probability theory: backbone in analysis of randomized algorithms Random sampling is the most fundamental technique Several inequalities for analyzing approximation: Markov, Chebyshev, Chernoff, Azuma, etc. In this talk: Discuss recent matrix-valued probabilistic inequalities and their applications Agenda: Review real-valued probabilistic inequalities 1 Present recent matrix-valued variants 2 A low rank matrix-valued inequality 3 Two applications: matrix sparsification, approximate matrix 4 multiplication

Law of Large Numbers Fundamental principle of random sampling: Law of Large Numbers (LLN) It states that the empirical average converges to true average Classical form: for reals rather than matrices Let X 1 ,..., X t be independent copies of a random variable X Goal: estimate the mean E [ X ] using samples X 1 ,..., X t Approximate by the empirical mean t � 1 X t ≈ E [ X ] t i = 1 How good is the approximation (non-asymptotics)?

Law of Large Numbers Fundamental principle of random sampling: Law of Large Numbers (LLN) It states that the empirical average converges to true average Classical form: for reals rather than matrices Let X 1 ,..., X t be independent copies of a random variable X Goal: estimate the mean E [ X ] using samples X 1 ,..., X t Approximate by the empirical mean t � 1 X t ≈ E [ X ] t i = 1 How good is the approximation (non-asymptotics)? Question: Is there a matrix-valued LLN?

Matrix-valued Random Variables Let ( Ω , F , P ) be a probability space. A matrix-valued random variable is a measurable function M : Ω → R d × d Its expectation is a d × d matrix, denote by E [ M ] ∈ R d × d Self-adjoint matrix-valued random variable: M : Ω → S d × d Caveat: Entries may or may not be correlated with each other

Matrix-valued Random Variables Let ( Ω , F , P ) be a probability space. A matrix-valued random variable is a measurable function M : Ω → R d × d Its expectation is a d × d matrix, denote by E [ M ] ∈ R d × d Self-adjoint matrix-valued random variable: M : Ω → S d × d Caveat: Entries may or may not be correlated with each other Matrix-valued random variable is a random matrix with (possibly) correlated entries

Real-valued Probabilistic Inequalities Lemma (Markov) Let X ≥ 0 be a real-valued random variable (r.v.) and α > 0 . Then P ( X ≥ α ) ≤ E [ X ] . α

Real-valued Probabilistic Inequalities Lemma (Markov) Let X ≥ 0 be a real-valued random variable (r.v.) and α > 0 . Then P ( X ≥ α ) ≤ E [ X ] . α Lemma (Chernoff-Hoeffding) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ , then  � �  � � � � t  �  � � − C ε 2 t   1  � �    X i − E [ X ]  � �  P > ε  ≤ 2exp .    � � t γ 2 � � i = 1

Real-valued Probabilistic Inequalities Lemma (Chernoff-Hoeffding) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ , then � �   � � � � t  �  � � − C ε 2 t   1   � �   X i − E [ X ]  � �  > ε  ≤ 2exp . P    � � t γ 2 � � i = 1 Lemma (Bernstein) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ and Var( X ) ≤ ρ 2 , then  � �  � � � � t  �  � � ε 2 t   1   � �   X i − E [ X ] − C  � �  P > ε  ≤ 2exp .    � � t ρ 2 + γε/ 3 � � i = 1

Real-valued Probabilistic Inequalities Lemma (Chernoff-Hoeffding) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ , then � �   � � � � t  �  � � − C ε 2 t   1   � �   X i − E [ X ]  � �  > ε  ≤ 2exp . P    � � t γ 2 � � i = 1 Lemma (Bernstein) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ and Var( X ) ≤ ρ 2 , then  � �  � � � � t  �  � � ε 2 t   1   � �   X i − E [ X ] − C  � �  P > ε  ≤ 2exp .    � � t ρ 2 + γε/ 3 � � i = 1 ...and many more...

Real-valued Probabilistic Inequalities Lemma (Chernoff-Hoeffding) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ , then � �   � � � � t  �  � � − C ε 2 t   1   � �   X i − E [ X ]  � �  > ε  ≤ 2exp . P    � � t γ 2 � � i = 1 Lemma (Bernstein) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ and Var( X ) ≤ ρ 2 , then  � �  � � � � t  �  � � ε 2 t   1   � �   X i − E [ X ] − C  � �  P > ε  ≤ 2exp .    � � t ρ 2 + γε/ 3 � � i = 1 Question: How would the matrix-valued generalizations look like?

Real-valued to Matrix-valued Is there a meaningful way to generalize the real-valued inequalities to matrix -valued? Would these inequalities be useful to us?

Real-valued to Matrix-valued Is there a meaningful way to generalize the real-valued inequalities to matrix -valued? Would these inequalities be useful to us? A , B ∈ S d × d α,̙ ∈ R Comments A � B A − B is p.s.d. α > ̙ � A � | α | Spectral norm e A e α Matrix Exponential

Matrix-valued Probabilistic Inequalities Lemma (Markov) Let X ≥ 0 be a real-valued r.v. and α > 0 . Then P ( X ≥ α ) ≤ E [ X ] . α

Matrix-valued Probabilistic Inequalities Lemma (Markov) Let X ≥ 0 be a real-valued r.v. and α > 0 . Then P ( X ≥ α ) ≤ E [ X ] . α Lemma (Matrix-valued Markov [AW02]) Let M � 0 be a self adjoint matrix-valued r.v. and α > 0 . Then P ( M � α · I ) ≤ tr ( E [ M ]) . α Remark: P ( M � α · I ) = P ( λ max ( M ) > α )

Matrix-valued Probabilistic Inequalities Theorem (Chernoff) Let X 1 , X 2 ,..., X t be i.i.d. copies of a real-valued r.v. X and ε > 0 . If | X | ≤ γ , then  � �  � � � � t  �  � � − C ε 2 t   1   � �  X i − E [ X ]   � �  P > ε  ≤ 2exp .    � � t γ 2 � � i = 1

Matrix-valued Chernoff Bounds and Applications China Theory Week - PowerPoint PPT Presentation

Matrix-valued Chernoff Bounds and Applications China Theory Week Anastasios Zouzias University of Toronto September 2010 Introduction Probability theory: backbone in analysis of randomized algorithms Random sampling is the most fundamental

Randomness in Computing L ECTURE 10 Last time Chernoff Bounds Today Hoeffding Bounds

Chernoff approximation of diffusions and further applications Yana A. Butko Analysis and

Alex Psomas: Lecture 20. Chernoff and Erd os 1. Confidence intervals 2. Chernoff 3.

Randomized Algorithms Randomized Algorithms The Chernoff bound The Chernoff bound Speaker:

Many-Valued Logic Daniel Bonevac February 27, 2013 Daniel Bonevac Many-Valued Logic Rationales

Lower Bounds on Matrix Rigidity via a Quantum Argument Ronald de Wolf CWI Amsterdam Lower

CS70: Jean Walrand: Lecture 32. Chernoff, Jensen, Polling, Confidence Intervals, Linear Regression

Circuit Lower-bounds Lecture 24 Weak circuits are indeed weak 1 Circuit Lower-bounds 2

Balls-into-Bins Model and Chernoff Bounds Advanced Algorithms Nanjing University, Fall 2018

Randomized Algorithms Lecture 5: The Principle of Deferred Decisions. Chernoff Bounds

Algebraic Study of Lattice-Valued Logic and Lattice-Valued Modal Logic Yoshihiro Maruyama

Shuffle algebra perspective on operator valued probability theory 30 mars 2020 1/25 Operator

VECTOR-VALUED FUNCTIONS MATH 200 MAIN QUESTIONS FOR TODAY Whats a vector valued function?

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

I/O Lower Bounds and Algorithms for Matrix-Matrix Multiplication Tyler M. Smith July 5, 2017 1

Model Selection and Fast Rates for Regularized Least-Squares Andrea Caponnetto 1 Plan

Kernels to detect abrupt changes in time series Alain Celisse 1 UMR 8524 CNRS - Universit e

Self-bounding functions and concentration of variance Andreas Maurer Advances in stochastic

A connection between the Uncertainty Principles on the real line (Heisenberg) and on the circle

SzegTaikov inequality for conjugate polynomials Polina Glazyrina Ural Federal University

Supremacy Experiments Complexity-Theoretic Foundations of Quantum . . . . . . 1 / 29 UT

New characterizations of completely monotone functions and Bernstein functions, a converse to

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W . Moore