Modern Discrete Probability VI – Spectral Techniques: Background


  1. Modern Discrete Probability VI – Spectral Techniques: Background
Sébastien Roch, UW–Madison Mathematics
December 1, 2014
Sébastien Roch, UW–Madison. Modern Discrete Probability – Spectral Techniques

  2. Outline
1. Review
2. Bounding the mixing time via the spectral gap
3. Applications: random walk on cycle and hypercube
4. Infinite networks

  3. Mixing time I
Theorem (Convergence to stationarity). Consider a finite state space V. Suppose the transition matrix P is irreducible, aperiodic and has stationary distribution π. Then, for all x, y, P^t(x, y) → π(y) as t → +∞.
For probability measures µ, ν on V, let their total variation distance be ‖µ − ν‖_TV := sup_{A ⊆ V} |µ(A) − ν(A)|.
Definition (Mixing time). The mixing time is t_mix(ε) := min{t ≥ 0 : d(t) ≤ ε}, where d(t) := max_{x ∈ V} ‖P^t(x, ·) − π(·)‖_TV.
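The definitions above can be checked numerically. The sketch below, which assumes numpy and uses a made-up example (the lazy simple random walk on a 6-cycle), computes d(t) and t_mix(ε) directly from the definitions; the chain and all names are ours, not from the slides.

```python
import numpy as np

# Lazy simple random walk on a 6-cycle (irreducible and, thanks to
# the holding probability 1/2, aperiodic); pi is uniform.
n = 6
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    P[x, (x - 1) % n] = 0.25
    P[x, (x + 1) % n] = 0.25
pi = np.full(n, 1.0 / n)

def tv_distance(mu, nu):
    # ||mu - nu||_TV = (1/2) * sum_y |mu(y) - nu(y)|
    return 0.5 * np.abs(mu - nu).sum()

def d(t):
    # d(t) = max_x ||P^t(x, .) - pi||_TV
    Pt = np.linalg.matrix_power(P, t)
    return max(tv_distance(Pt[x], pi) for x in range(n))

def t_mix(eps):
    # t_mix(eps) = min{ t >= 0 : d(t) <= eps }
    t = 0
    while d(t) > eps:
        t += 1
    return t

tm = t_mix(0.25)   # mixing time at the conventional threshold eps = 1/4
```

By the convergence theorem, d(t) → 0, so the loop in `t_mix` terminates for any ε > 0.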

  4. Mixing time II
Definition (Separation distance). The separation distance is defined as s_x(t) := max_{y ∈ V} [1 − P^t(x, y)/π(y)], and we let s(t) := max_{x ∈ V} s_x(t). Because both {π(y)} and {P^t(x, y)} are non-negative and sum to 1, we have that s_x(t) ≥ 0.
Lemma (Separation distance vs. total variation distance). d(t) ≤ s(t).
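The lemma can be sanity-checked numerically. A minimal sketch, assuming numpy and a made-up test chain (lazy walk on a 4-cycle), computing both d(t) and s(t) from their definitions:

```python
import numpy as np

# Lazy simple random walk on a 4-cycle; pi is uniform.
n = 4
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    P[x, (x - 1) % n] = 0.25
    P[x, (x + 1) % n] = 0.25
pi = np.full(n, 1.0 / n)

def d_and_s(t):
    Pt = np.linalg.matrix_power(P, t)
    # d(t) = max_x ||P^t(x, .) - pi||_TV
    d = max(0.5 * np.abs(Pt[x] - pi).sum() for x in range(n))
    # s(t) = max_x max_y [1 - P^t(x, y)/pi(y)]
    s = max((1.0 - Pt[x] / pi).max() for x in range(n))
    return d, s

# Lemma: d(t) <= s(t) at every time t.
for t in range(1, 8):
    dt, st = d_and_s(t)
    assert dt <= st + 1e-12
```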

  5. Mixing time III
Proof: Because 1 = Σ_y π(y) = Σ_y P^t(x, y),
Σ_{y : P^t(x,y) < π(y)} [π(y) − P^t(x, y)] = Σ_{y : P^t(x,y) ≥ π(y)} [P^t(x, y) − π(y)].
So
‖P^t(x, ·) − π(·)‖_TV = (1/2) Σ_y |π(y) − P^t(x, y)|
= Σ_{y : P^t(x,y) < π(y)} [π(y) − P^t(x, y)]
= Σ_{y : P^t(x,y) < π(y)} π(y) [1 − P^t(x, y)/π(y)]
≤ s_x(t).

  6. Reversible chains
Definition (Reversible chain). A transition matrix P is reversible w.r.t. a measure η if η(x) P(x, y) = η(y) P(y, x) for all x, y ∈ V. By summing over y, such a measure is necessarily stationary.
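A short numeric sketch of the definition, assuming numpy and a made-up birth-death chain (tridiagonal chains are always reversible w.r.t. their stationary distribution); the helper name `is_reversible` is ours:

```python
import numpy as np

# A made-up 3-state birth-death chain.
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

def is_reversible(P, eta, tol=1e-10):
    # Detailed balance: the matrix eta(x) P(x, y) is symmetric.
    B = eta[:, None] * P
    return np.allclose(B, B.T, atol=tol)

assert is_reversible(P, pi)       # eta(x)P(x,y) = eta(y)P(y,x)
assert np.allclose(pi @ P, pi)    # summing over y: pi is stationary
```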

  7. Example I
Recall:
Definition (Random walk on a graph). Let G = (V, E) be a finite or countable, locally finite graph. Simple random walk on G is the Markov chain on V, started at an arbitrary vertex, which at each time picks a uniformly chosen neighbor of the current state.
Let (X_t) be simple random walk on a connected graph G. Then (X_t) is reversible w.r.t. η(v) := δ(v), where δ(v) is the degree of vertex v.
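A minimal sketch of this example, assuming numpy and a made-up 4-vertex graph, checking that the degree measure satisfies detailed balance:

```python
import numpy as np

# Made-up connected graph on 4 vertices.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)
P = A / deg[:, None]      # simple random walk: uniform over neighbors

# Detailed balance: deg(x) P(x, y) = A(x, y) = A(y, x) = deg(y) P(y, x).
B = deg[:, None] * P
assert np.allclose(B, B.T)

# Normalizing eta gives the stationary distribution pi(v) = deg(v) / 2|E|.
pi = deg / deg.sum()
assert np.allclose(pi @ P, pi)
```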

  8. Example II
Definition (Random walk on a network). Let G = (V, E) be a finite or countable, locally finite graph. Let c : E → R_+ be a positive edge weight function on G. We call N = (G, c) a network. Random walk on N is the Markov chain on V, started at an arbitrary vertex, which at each time picks a neighbor of the current state proportionally to the weight of the corresponding edge.
Any countable, reversible Markov chain can be seen as a random walk on a network (not necessarily locally finite) by setting c(e) := π(x) P(x, y) = π(y) P(y, x) for all e = {x, y} ∈ E.
Let (X_t) be random walk on a network N = (G, c). Then (X_t) is reversible w.r.t. η(v) := c(v), where c(v) := Σ_{x ∼ v} c(v, x).
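The weighted case works the same way. A sketch, assuming numpy and made-up positive edge weights on a triangle:

```python
import numpy as np

# Made-up network: triangle with symmetric positive edge weights c(x, y).
c = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
cv = c.sum(axis=1)        # c(v) = sum of weights of edges at v
P = c / cv[:, None]       # pick a neighbor proportionally to edge weight

# Detailed balance: c(x) P(x, y) = c(x, y) = c(y, x) = c(y) P(y, x).
B = cv[:, None] * P
assert np.allclose(B, B.T)

pi = cv / cv.sum()        # normalized eta is stationary
assert np.allclose(pi @ P, pi)
```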

  9. Eigenbasis I
We let n := |V| < +∞. Assume that P is irreducible and reversible w.r.t. its stationary distribution π > 0. Define
⟨f, g⟩_π := Σ_{x ∈ V} π(x) f(x) g(x),    ‖f‖²_π := ⟨f, f⟩_π,    (Pf)(x) := Σ_y P(x, y) f(y).
We let ℓ²(V, π) be the Hilbert space of real-valued functions on V equipped with the inner product ⟨·, ·⟩_π (equivalent to the vector space (R^n, ⟨·, ·⟩_π)).
Theorem. There is an orthonormal basis of ℓ²(V, π) formed of eigenfunctions {f_j}_{j=1}^n of P with real eigenvalues {λ_j}_{j=1}^n.

  10. Eigenbasis II
Proof: We work over (R^n, ⟨·, ·⟩_π). Let D_π be the diagonal matrix with π on the diagonal. By reversibility,
M(x, y) := π(x)^{1/2} P(x, y) π(y)^{−1/2} = π(y)^{1/2} P(y, x) π(x)^{−1/2} =: M(y, x).
So M = (M(x, y))_{x,y} = D_π^{1/2} P D_π^{−1/2}, as a symmetric matrix, has real eigenvalues {λ_j}_{j=1}^n with corresponding eigenvectors {φ_j}_{j=1}^n forming an orthonormal basis of R^n. Define f_j := D_π^{−1/2} φ_j. Then
P f_j = P D_π^{−1/2} φ_j = D_π^{−1/2} [D_π^{1/2} P D_π^{−1/2}] φ_j = D_π^{−1/2} M φ_j = λ_j D_π^{−1/2} φ_j = λ_j f_j,
and
⟨f_i, f_j⟩_π = ⟨D_π^{−1/2} φ_i, D_π^{−1/2} φ_j⟩_π = Σ_x π(x) [π(x)^{−1/2} φ_i(x)] [π(x)^{−1/2} φ_j(x)] = ⟨φ_i, φ_j⟩.
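The proof's construction is directly computable. A sketch, assuming numpy and a made-up reversible chain (lazy walk on a 3-vertex path): symmetrize M = D_π^{1/2} P D_π^{−1/2}, diagonalize it, and map the eigenvectors back via f_j = D_π^{−1/2} φ_j.

```python
import numpy as np

# Lazy simple random walk on the path 0-1-2; reversible w.r.t. pi below.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])     # stationary, by detailed balance

Dh = np.diag(np.sqrt(pi))            # D_pi^{1/2}
Dhi = np.diag(1.0 / np.sqrt(pi))     # D_pi^{-1/2}
M = Dh @ P @ Dhi
assert np.allclose(M, M.T)           # symmetric by reversibility

lam, Phi = np.linalg.eigh(M)         # real eigenvalues, orthonormal phi_j
F = Dhi @ Phi                        # columns f_j = D_pi^{-1/2} phi_j

# Each f_j is an eigenfunction of P ...
for j in range(3):
    assert np.allclose(P @ F[:, j], lam[j] * F[:, j])
# ... and {f_j} is orthonormal in l2(V, pi): Gram matrix <f_i, f_j>_pi = I.
G = F.T @ np.diag(pi) @ F
assert np.allclose(G, np.eye(3))
```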

  11. Eigenbasis III
Lemma. For all j ≠ 1, Σ_x π(x) f_j(x) = 0.
Proof: By orthonormality, ⟨f_1, f_j⟩_π = 0. Now use the fact that f_1 ≡ 1.
Let δ_x(y) := 1{x = y}.
Lemma. For all x, y, Σ_{j=1}^n f_j(x) f_j(y) = π(x)^{−1} δ_x(y).
Proof: Using the notation of the theorem, the matrix Φ whose columns are the φ_j's is orthogonal, so ΦΦ′ = I. That is, Σ_{j=1}^n φ_j(x) φ_j(y) = δ_x(y), or Σ_{j=1}^n π(x)^{1/2} π(y)^{1/2} f_j(x) f_j(y) = δ_x(y). Rearranging gives the result.
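Both lemmas can be verified numerically. A sketch, assuming numpy and the same made-up lazy path walk as before:

```python
import numpy as np

# Lazy walk on the path 0-1-2, reversible w.r.t. pi.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
Dh, Dhi = np.diag(np.sqrt(pi)), np.diag(1.0 / np.sqrt(pi))
lam, Phi = np.linalg.eigh(Dh @ P @ Dhi)
F = Dhi @ Phi        # columns f_j; eigh orders eigenvalues ascending,
                     # so the eigenvalue-1 eigenfunction is the LAST column

# First lemma: sum_x pi(x) f_j(x) = 0 for every j except the constant one.
for j in range(2):
    assert abs(pi @ F[:, j]) < 1e-10

# Second lemma: sum_j f_j(x) f_j(y) = pi(x)^{-1} delta_x(y),
# i.e. F F^T = diag(1/pi).
assert np.allclose(F @ F.T, np.diag(1.0 / pi))
```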

  12. Eigenbasis IV
Lemma. Let g ∈ ℓ²(V, π). Then g = Σ_{j=1}^n ⟨g, f_j⟩_π f_j.
Proof: By the previous lemma, for all x,
Σ_{j=1}^n ⟨g, f_j⟩_π f_j(x) = Σ_{j=1}^n Σ_y π(y) g(y) f_j(y) f_j(x) = Σ_y π(y) g(y) [π(x)^{−1} δ_x(y)] = g(x).
Lemma. Let g ∈ ℓ²(V, π). Then ‖g‖²_π = Σ_{j=1}^n ⟨g, f_j⟩²_π.
Proof: By the previous lemma,
‖g‖²_π = ⟨Σ_{i=1}^n ⟨g, f_i⟩_π f_i, Σ_{j=1}^n ⟨g, f_j⟩_π f_j⟩_π = Σ_{i,j=1}^n ⟨g, f_i⟩_π ⟨g, f_j⟩_π ⟨f_i, f_j⟩_π = Σ_{j=1}^n ⟨g, f_j⟩²_π,
where the last equality uses orthonormality.
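The expansion and Parseval identities can be checked on an arbitrary function. A sketch, assuming numpy and the same made-up lazy path walk:

```python
import numpy as np

# Lazy walk on the path 0-1-2, reversible w.r.t. pi.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
Dh, Dhi = np.diag(np.sqrt(pi)), np.diag(1.0 / np.sqrt(pi))
lam, Phi = np.linalg.eigh(Dh @ P @ Dhi)
F = Dhi @ Phi                        # columns f_j

g = np.array([1.0, -2.0, 3.0])       # arbitrary test function
coef = F.T @ (pi * g)                # coef[j] = <g, f_j>_pi

# Expansion lemma: g = sum_j <g, f_j>_pi f_j.
assert np.allclose(F @ coef, g)
# Parseval: ||g||_pi^2 = sum_j <g, f_j>_pi^2.
assert np.isclose((coef ** 2).sum(), pi @ g ** 2)
```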

  13. Eigenvalues I
Let P be finite, irreducible and reversible.
Lemma. Any eigenvalue λ of P satisfies |λ| ≤ 1.
Proof: Pf = λf implies |λ| ‖f‖_∞ = ‖Pf‖_∞ = max_x |Σ_y P(x, y) f(y)| ≤ ‖f‖_∞.
We order the eigenvalues 1 ≥ λ_1 ≥ · · · ≥ λ_n ≥ −1. In fact:
Lemma. We have λ_1 = 1 and λ_2 < 1. Also we can take f_1 ≡ 1.
Proof: Because P is stochastic, the all-one vector is a right eigenvector with eigenvalue 1. Any eigenfunction with eigenvalue 1 is P-harmonic. By Corollary 3.22, for a finite, irreducible chain the only harmonic functions are the constant functions. So the eigenspace corresponding to 1 is one-dimensional. Since all eigenvalues are real, we must have λ_2 < 1.
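These spectral facts are easy to observe numerically. A sketch, assuming numpy and the same made-up lazy path walk, checking |λ_j| ≤ 1, λ_1 = 1 with constant eigenfunction, and λ_2 < 1:

```python
import numpy as np

# Lazy walk on the path 0-1-2, reversible w.r.t. pi.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
Dh, Dhi = np.diag(np.sqrt(pi)), np.diag(1.0 / np.sqrt(pi))
lam, Phi = np.linalg.eigh(Dh @ P @ Dhi)   # ascending order
lam = lam[::-1]                           # reorder: lam[0] = lambda_1 >= ...

assert np.all(np.abs(lam) <= 1 + 1e-12)   # all eigenvalues in [-1, 1]
assert np.isclose(lam[0], 1.0)            # lambda_1 = 1
assert lam[1] < 1.0                       # lambda_2 < 1 (spectral gap)

# Eigenfunction for lambda_1 (last column of Phi before reordering):
f1 = (Dhi @ Phi)[:, -1]
assert np.allclose(f1, f1[0])             # constant, i.e. f_1 = 1 up to scaling
```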
