numerical analysis and random matrix theory

Numerical analysis and random matrix theory Tom Trogdon - PowerPoint PPT Presentation



  1. Numerical analysis and random matrix theory Tom Trogdon ttrogdon@math.uci.edu UC Irvine

  2. Acknowledgements This is joint work with: • Percy Deift • Govind Menon • Sheehan Olver • Raj Rao

  3. Numerical analysis and random matrix theory
• Using techniques from numerical analysis to analyze random matrices: Trotter (1984), Silverstein (1985), Edelman (1989), Dumitriu and Edelman (2002)
• Computing distributions from random matrix theory: Bornemann (2008), Witte, Bornemann and Forrester (2012), Olver and T (2014)
• Generating samples from random matrix distributions: Mezzadri (2006), Edelman and Rao (2005), Menon and Li (2014), Olver, Rao and T (2015)
• Using random matrices to analyze algorithms, statistically: Spielman and Teng (2009), Borgwardt (2012), Smale (1983, 1985), Deift and T (2016, 2017), Menon and T (2016), Feldheim, Paquette, and Zeitouni (2014)
• Randomized linear algebra: Liberty, Woolfe, Martinsson, Rokhlin, and Tygert (2007)

  4. Statistical analysis of algorithms
• Smoothed analysis: Spielman and Teng (2009)
• Simplex algorithm: Borgwardt (2012), Smale (1985)
• Universality: Pfrang, Deift and Menon (2014)

  5. Universality in numerical computation w/ Percy Deift, Govind Menon and others

  6. Universality: A toy example
Consider the numerical solution of the system
$$(I - H)x = b, \qquad H = H^*, \qquad \|H\|_2 < 1.$$
The ensemble. Assume $H = U\,\mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)\,U^*$ where $(\lambda_j)_{1 \le j \le N}$ are iid with
$$\mathbb{P}(\lambda_j \in B) = \int_{B \cap [-1,1]} p(x)\, dx.$$
The method. A naïve iterative scheme is given by
$$x_n = H x_{n-1} + b, \qquad x_0 = 0.$$
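The ensemble and the iteration are easy to simulate. Below is a minimal NumPy sketch; the uniform choice of $p$ on $[-1,1]$ and the QR-based sampling of $U$ are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def sample_H(N, rng):
    """Sample H = U diag(lambda_1, ..., lambda_N) U* with iid eigenvalues.

    Here the lambda_j are uniform on [-1, 1] (one admissible choice of p),
    and U is a Haar orthogonal matrix obtained via QR of a Gaussian matrix.
    """
    lam = rng.uniform(-1.0, 1.0, size=N)
    Q, R = np.linalg.qr(rng.standard_normal((N, N)))
    Q = Q * np.sign(np.diag(R))  # sign fix so Q is Haar distributed
    return Q @ np.diag(lam) @ Q.T

def iterate(H, b, n_steps):
    """Run the naive scheme x_n = H x_{n-1} + b starting from x_0 = 0."""
    x = np.zeros_like(b)
    for _ in range(n_steps):
        x = H @ x + b
    return x

rng = np.random.default_rng(0)
N = 20
H = sample_H(N, rng)
b = rng.standard_normal(N)
x_exact = np.linalg.solve(np.eye(N) - H, b)
```

Since $\|H\|_2 < 1$, the iterates converge geometrically to $(I-H)^{-1}b$, at a rate governed by $\max\{|\lambda_{\max}|, |\lambda_{\min}|\}$, which is why the extreme eigenvalues control the halting time.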

  7. Universality: A toy example
The halting time. To keep things simple, define $T = T(H, \varepsilon)$ to be the smallest integer $T$ such that
$$\Big\| (I - H)^{-1} - \sum_{k=0}^{T-1} H^k \Big\|_2 < \varepsilon.$$
The random variable $T$ is the halting time.
The question. What is the distribution of $T$? Is it universal as $N \to \infty$?

  8. Define $\lambda_{\min} = \min_j \lambda_j$, $\lambda_{\max} = \max_j \lambda_j$. Then for fixed $t \ge 0$, as $N \to \infty$,
$$\mathbb{P}\Big( 1 - \tfrac{t}{N} \le \lambda_j \le 1 \Big) = p(1)\,\tfrac{t}{N} + o(N^{-1}), \qquad \mathbb{P}\Big( -1 \le \lambda_j \le -1 + \tfrac{t}{N} \Big) = p(-1)\,\tfrac{t}{N} + o(N^{-1}).$$
Assume $p(1) \ne 0$. Then as $N \to \infty$,
$$p(1)\, N (1 - \lambda_{\max}) \xrightarrow{\text{dist.}} \exp(1), \qquad N(1 + \lambda_{\min}) \xrightarrow{\text{dist.}} \exp(p(-1)).$$
Because $H$ is Hermitian,
$$\Big\| (I - H)^{-1} - \sum_{k=0}^{n-1} H^k \Big\|_2 = \max\left\{ \frac{|\lambda_{\max}|^n}{|1 - \lambda_{\max}|},\; \frac{|\lambda_{\min}|^n}{|1 - \lambda_{\min}|} \right\}.$$
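The first edge limit is easy to see in simulation. For concreteness, take the $\lambda_j$ uniform on $[-1,1]$ with normalized density, so $p(1) = 1/2$ (this concrete choice is an assumption for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1000, 2000
p_at_1 = 0.5  # value at x = 1 of the normalized uniform density on [-1, 1]

# p(1) * N * (1 - lambda_max) should look like a standard exponential
lam_max = rng.uniform(-1.0, 1.0, size=(trials, N)).max(axis=1)
scaled = p_at_1 * N * (1.0 - lam_max)
```

The sample mean and variance of `scaled` should both be close to 1, as for an Exp(1) random variable.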

  9. For $\varepsilon > 0$, define $t = t(H, \varepsilon)$ by
$$\max\left\{ \frac{|\lambda_{\max}|^{t}}{|1 - \lambda_{\max}|},\; \frac{|\lambda_{\min}|^{t}}{|1 - \lambda_{\min}|} \right\} = \varepsilon.$$
Then $T(H, \varepsilon) = \max\{0, \lceil t(H, \varepsilon) \rceil\}$ and
$$t(H, \varepsilon) = \max\left\{ \frac{\log\big(\varepsilon^{-1} |1 - \lambda_{\max}|^{-1}\big)}{\log |\lambda_{\max}|^{-1}},\; \frac{\log\big(\varepsilon^{-1} |1 - \lambda_{\min}|^{-1}\big)}{\log |\lambda_{\min}|^{-1}} \right\}.$$
Define
$$t_{\max} = \frac{\log\big(\varepsilon^{-1} |1 - \lambda_{\max}|^{-1}\big)}{\log |\lambda_{\max}|^{-1}}, \qquad t_{\min} = \frac{\log\big(\varepsilon^{-1} |1 - \lambda_{\min}|^{-1}\big)}{\log |\lambda_{\min}|^{-1}}.$$
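The closed form for $T$ can be checked against a direct computation of the smallest admissible integer. A small sketch (the eigenvalue pairs in the usage below are arbitrary test values):

```python
import numpy as np

def halting_time(lam, eps):
    """T(H, eps) = max{0, ceil(t)} from the closed form above;
    only the extreme eigenvalues of H enter."""
    lmax, lmin = lam.max(), lam.min()
    t_max = np.log(1.0 / (eps * abs(1 - lmax))) / np.log(1.0 / abs(lmax))
    t_min = np.log(1.0 / (eps * abs(1 - lmin))) / np.log(1.0 / abs(lmin))
    return max(0, int(np.ceil(max(t_max, t_min))))

def halting_time_direct(lam, eps):
    """Smallest n with max{|lmax|^n/|1-lmax|, |lmin|^n/|1-lmin|} < eps."""
    lmax, lmin = lam.max(), lam.min()
    n = 0
    while max(abs(lmax)**n / abs(1 - lmax),
              abs(lmin)**n / abs(1 - lmin)) >= eps:
        n += 1
    return n
```

For example, `halting_time(np.array([0.5, -0.3]), 1e-6)` agrees with the direct count.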

  10. “Universality” Two estimates are needed as $N \to \infty$:
$$\frac{t_{\max}}{N \log[N\varepsilon^{-1}]} \xrightarrow{\text{dist.}} \frac{1}{\xi}, \quad \xi \sim \exp(p(1)), \qquad t_{\min} = O(N).$$
Because $t_{\min}$ is asymptotically smaller than $t_{\max}$, it does not affect the limit:
$$\frac{T(H, \varepsilon)}{N \log[N\varepsilon^{-1}]} \xrightarrow{\text{dist.}} \frac{1}{\xi}, \quad \xi \sim \exp(p(1)).$$

  11. This establishes universality for $T(H, \varepsilon)$ in this simple case. It required three main components:
• A formula to estimate error.
• Universality for key statistics ($\lambda_{\max}$).
• Estimates for the rest ($\lambda_{\min}$).

  12. Significant work goes into determining the scaling. If the scaling is not known but universality is to be investigated, a normalization must be inferred from the data. Given a collection of samples of the random variable $T = T(H, \varepsilon)$, define the empirical fluctuations
$$\tau(H, \varepsilon) = \frac{T(H, \varepsilon) - \langle T \rangle}{\sigma_T},$$
where $\langle T \rangle$ is the sample median and $\sigma_T$ is the median absolute deviation.
[Figure: histograms of the empirical fluctuations for two choices of $p(x)$: $p(x) = 1$ and $p(x) = e^{-x}/(e - e^{-1})$.]
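In code, the data-driven normalization reads (a minimal sketch):

```python
import numpy as np

def empirical_fluctuations(samples):
    """tau = (T - <T>) / sigma_T, with <T> the sample median and
    sigma_T the median absolute deviation (MAD)."""
    samples = np.asarray(samples, dtype=float)
    med = np.median(samples)
    mad = np.median(np.abs(samples - med))
    return (samples - med) / mad
```

The median and MAD are used rather than the mean and standard deviation, so a few extreme halting times do not distort the normalization.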

  13. The (inverse) power method
The ensembles. A sample covariance matrix (SCM) is an $N \times N$ real symmetric ($\beta = 1$) or complex Hermitian ($\beta = 2$) matrix
$$H = X^* X / M, \qquad X = (X_{ij})_{1 \le i \le M,\, 1 \le j \le N} \text{ iid}, \qquad M \sim N/d, \quad 0 < d < 1,$$
$$\mathbb{E} X_{ij} = 0, \qquad \mathbb{E}|X_{ij}|^2 = 1.$$
For $\beta = 2$, $\mathbb{E} X_{ij}^2 = 0$ is imposed. We require uniform exponential tails.
The method. The power method is given by ($x_0$ specified)
$$\mu_k = \frac{x_0^* H^{2k-1} x_0}{x_0^* H^{2k-2} x_0} \to \lambda_N = \lambda_{\max}, \qquad k \to \infty.$$
The halting time. The halting time is given by
$$T(H, x_0, \varepsilon) = \min\{ n : |\mu_n - \mu_{n-1}| < \varepsilon^2 \}.$$
The question. What is the distribution of $T$? Is it universal as $N \to \infty$?
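Because $H$ is Hermitian, $x_0^* H^{2k-1} x_0 / x_0^* H^{2k-2} x_0$ is the Rayleigh quotient of $w = H^{k-1} x_0$, so the iteration can be run with renormalized iterates to avoid overflow. A minimal NumPy sketch; the SCM parameters and tolerance below are illustrative assumptions:

```python
import numpy as np

def power_method_halting(H, x0, eps, max_iter=100_000):
    """Run the power method and return (T, mu_T).

    mu_n = x0* H^(2n-1) x0 / x0* H^(2n-2) x0 is computed as the Rayleigh
    quotient of w = H^(n-1) x0, renormalizing w each step for stability.
    Halts at the first n with |mu_n - mu_(n-1)| < eps**2.
    """
    w = x0 / np.linalg.norm(x0)
    mu_prev = None
    for n in range(1, max_iter + 1):
        Hw = H @ w
        mu = (w.conj() @ Hw).real / (w.conj() @ w).real
        if mu_prev is not None and abs(mu - mu_prev) < eps**2:
            return n, mu
        mu_prev = mu
        w = Hw / np.linalg.norm(Hw)
    return max_iter, mu

# Illustrative SCM, beta = 1 (real entries), d = N/M = 1/4
rng = np.random.default_rng(3)
N, M = 50, 200
X = rng.standard_normal((M, N))
H = X.T @ X / M
x0 = rng.standard_normal(N)
T, mu = power_method_halting(H, x0, 1e-4)
```

At the halting time, `mu` is an approximation of $\lambda_N = \lambda_{\max}$ whose accuracy is controlled by the top gap $\lambda_N - \lambda_{N-1}$.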

  14. Observing universality
Using 4 complex SCMs, the fluctuations
$$\tau(H, \varepsilon) = \frac{T(H, \varepsilon) - \langle T \rangle}{\sigma_T}$$
appear universal.
[Figure: overlaid histograms of the empirical fluctuations for the four ensembles.]

  15. A formula to estimate the error
Let $H = U \Lambda U^*$ and
$$U^* x_0 = [\beta_1, \beta_2, \dots, \beta_N]^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N), \qquad 0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N.$$
To estimate the halting time for the power method, we must find the large $N$ and large $k$ asymptotics of
$$\mu_k = \frac{\displaystyle\sum_{j=1}^{N} |\beta_j|^2 \lambda_j^{2k-1}}{\displaystyle\sum_{j=1}^{N} |\beta_j|^2 \lambda_j^{2k-2}} = \lambda_N \cdot \frac{1 + \displaystyle\sum_{j=1}^{N-1} \Big|\frac{\beta_j}{\beta_N}\Big|^2 \Big(\frac{\lambda_j}{\lambda_N}\Big)^{2k-1}}{1 + \displaystyle\sum_{j=1}^{N-1} \Big|\frac{\beta_j}{\beta_N}\Big|^2 \Big(\frac{\lambda_j}{\lambda_N}\Big)^{2k-2}}.$$
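The algebraic identity behind this formula (divide numerator and denominator by $|\beta_N|^2 \lambda_N^{2k-1}$ and $|\beta_N|^2 \lambda_N^{2k-2}$ respectively) is easy to verify numerically; the small sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, k = 8, 32, 5
X = rng.standard_normal((M, N))
H = X.T @ X / M                      # small real SCM, positive semidefinite
x0 = rng.standard_normal(N)

lam, U = np.linalg.eigh(H)           # eigenvalues in ascending order
beta = U.T @ x0                      # coefficients of x0 in the eigenbasis

# mu_k via the spectral expansion
mu_expand = (beta**2 * lam**(2*k - 1)).sum() / (beta**2 * lam**(2*k - 2)).sum()

# mu_k directly: x0* H^(2k-1) x0 / x0* H^(2k-2) x0
P = np.linalg.matrix_power
mu_direct = (x0 @ P(H, 2*k - 1) @ x0) / (x0 @ P(H, 2*k - 2) @ x0)
```

Both expressions agree to machine precision; the expansion makes explicit that the large-$k$ behavior is governed by the ratios $\lambda_j / \lambda_N$ and the coefficients $\beta_j / \beta_N$.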

  16. A historical interlude
For eigenvalues:
• The seminal work of Geman (1980) showed that the largest eigenvalue of an SCM converges a.s.
• Silverstein (1985) established that the smallest eigenvalue converges a.s. to $\lambda_-$ for iid standard normal random variables.
• Johnstone (2001), Johansson (2000) and Forrester (1993) gave the first results on the fluctuations of the largest and smallest eigenvalues for (real or complex) standard normal distributions.
• Universality was obtained by Ben Arous and Péché (2005) (see also Tao and Vu (2012)).
• We reference Pillai and Yin (2014) and Bloemendal, Knowles, Yau and Yin (2016) for the most comprehensive results.
For eigenvectors:
• The limits of the eigenvectors have also been considered in various ways; see Silverstein (1986); Bai, Miao and Pan (2007).
• We reference Bloemendal, Knowles, Yau and Yin (2016) for the generality needed to prove our theorems. We require exponential tails, which is stronger than the assumptions in Yin (1986) and Geman (1980).

  17. Universality for key statistics
For SCMs:
Theorem. $N^{1/2}(|\beta_N|, |\beta_{N-1}|, |\beta_{N-2}|)$ converges jointly in distribution to $(|X_1|, |X_2|, |X_3|)$, where $\{X_1, X_2, X_3\}$ are iid real ($\beta = 1$) or complex ($\beta = 2$) standard normal random variables. Additionally,
$$N^{2/3} \lambda_+^{-2/3} d^{1/2} \big( \lambda_+ - \lambda_N,\; \lambda_+ - \lambda_{N-1},\; \lambda_+ - \lambda_{N-2} \big)$$
converges jointly in distribution to random variables $(\Lambda_{1,\beta}, \Lambda_{2,\beta}, \Lambda_{3,\beta})$, which are the smallest three eigenvalues of the so-called stochastic Airy operator.
This follows from Ramírez, Rider and Virág (2011); Pillai and Yin (2014); Bloemendal, Knowles, Yau and Yin (2016).

  18. Estimates for the rest
Theorem (Pillai and Yin (2014)). For any $s > 0$,
$$\lim_{N \to \infty} \mathbb{P}\Big( |\lambda_n - \gamma_n| \le N^{-2/3+s} \big(\max\{n, N - n + 1\}\big)^{-1/3} \text{ for all } n \Big) = 1,$$
where $\gamma_n$ denotes the classical (deterministic) location of the $n$th eigenvalue.
Theorem (Bloemendal, Knowles, Yau and Yin (2016)). For any $s > 0$,
$$\lim_{N \to \infty} \mathbb{P}\Big( |\beta_n| \le N^{-1/2+s} \text{ for all } n \Big) = 1.$$
The first theorem is known as rigidity; it was first shown for Wigner ensembles by Erdős, Yau and Yin (2012) (see also Bourgade, Erdős and Yau (2014)). The second theorem is known as delocalization.

  19. Universality for the (inverse) power method
The distribution function $F^{\mathrm{gap}}_\beta(t)$, supported on $t \ge 0$ for $\beta = 1, 2$, is defined by
$$F^{\mathrm{gap}}_\beta(t) := \lim_{N \to \infty} \mathbb{P}\left( \frac{1}{2^{-7/6} N^{2/3} \lambda_+^{-2/3} d^{-1/2} (\lambda_N - \lambda_{N-1})} \le t \right).$$
Theorem (Deift and T). Let $H$ be a real ($\beta = 1$) or complex ($\beta = 2$) SCM and let $x_0$ be a random unit vector independent of $H$. Assuming $\varepsilon \le N^{-5/3 - \sigma}$,
$$\lim_{N \to \infty} \mathbb{P}\left( \frac{T(H, x_0, \varepsilon)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} \big(\log \varepsilon^{-1} - \tfrac{2}{3} \log N\big)} \le t \right) = F^{\mathrm{gap}}_\beta(t).$$

  20. A note on the proof
In the proof we give the following approximation:
$$\frac{T(H, x_0, \varepsilon)}{N^{2/3}} - \frac{\log \varepsilon^{-1} + \log\Big|\frac{\beta_{N-1}}{\beta_N}\Big| + \log\Big(1 - \frac{\lambda_{N-1}}{\lambda_N}\Big) - 2\log 2}{N^{2/3} \log \frac{\lambda_N}{\lambda_{N-1}}} \xrightarrow{\text{prob.}} 0.$$
From this we obtain
$$\frac{T(H, x_0, \varepsilon)}{N^{2/3} \log N} - \frac{\log \varepsilon^{-1} - \frac{2}{3} \log N}{N^{2/3} \log N \; \lambda_+^{-1} |\lambda_N - \lambda_{N-1}|} \xrightarrow{\text{prob.}} 0.$$
Convergence is at an (at best) logarithmic rate.

  21. A demonstration
[Figure: histograms of the rescaled halting time for $\beta = 1$ and $\beta = 2$ (frequency vs. rescaled halting time).]
Thanks to Folkmar Bornemann (TUM)!
$$\lim_{N \to \infty} \mathbb{P}\left( \frac{T(H, x_0, \varepsilon)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} \big(\log \varepsilon^{-1} - \tfrac{2}{3} \log N + \gamma\big)} \le t \right) = F^{\mathrm{gap}}_\beta(t).$$

  22. Other theorems Similar techniques give universality theorems for: • The QR eigenvalue algorithm. • The Toda algorithm. See Deift and T (2016) and Deift and T (2017).
