Numerical analysis and random matrix theory - Tom Trogdon - PowerPoint PPT Presentation

SLIDE 1

Numerical analysis and random matrix theory

Tom Trogdon ttrogdon@math.uci.edu UC Irvine

SLIDE 2

Acknowledgements

This is joint work with:

  • Percy Deift
  • Govind Menon
  • Sheehan Olver
  • Raj Rao
SLIDE 3

Numerical analysis and random matrix theory

  • Using techniques from numerical analysis to analyze random matrices

Trotter (1984), Silverstein (1985), Edelman (1989), Dumitriu and Edelman (2002)

  • Computing distributions from random matrix theory

Bornemann (2008), Witte, Bornemann and Forrester (2012), Olver and T (2014)

  • Generating samples from random matrix distributions

Mezzadri (2006), Edelman and Rao (2005), Menon and Li (2014), Olver, Rao and T (2015)

  • Using random matrices to analyze algorithms, statistically

Spielman and Teng (2009), Borgwardt (2012), Smale (1983, 1985), Deift and T (2016, 2017), Menon and T (2016), Feldheim, Paquette, and Zeitouni (2014)

  • Randomized linear algebra

Liberty, Woolfe, Martinsson, Rokhlin, and Tygert (2007)

SLIDE 4

Statistical analysis of algorithms

  • Smoothed analysis: Spielman and Teng (2009)
  • Simplex algorithm: Borgwardt (2012), Smale (1985)
  • Universality: Pfrang, Deift and Menon (2014)
SLIDE 5

Universality in numerical computation

w/ Percy Deift, Govind Menon and others

SLIDE 6

Universality: A toy example

Consider the numerical solution of the system $(I - H)x = b$, $H = H^*$, $\|H\|_2 < 1$.

The ensemble. Assume $H = U \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N) U^*$ where $(\lambda_j)_{1 \le j \le N}$ are iid with

$$\mathbb{P}(\lambda_j \in B) = \int_{B \cap [-1,1]} p(x)\,dx.$$

The method. A naïve iterative scheme is given by $x_n = H x_{n-1} + b$, $x_0 = 0$.
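As a concrete illustration, the ensemble and the iteration can be sketched as follows (a minimal sketch: the uniform density, the matrix size, and the successive-difference stopping rule used in place of the operator-norm criterion on the next slide are all illustrative choices, not the talk's setup):

```python
import numpy as np

def sample_H(N, rng):
    """H = U diag(lambda) U* with iid uniform eigenvalues on [-1, 1]
    (an illustrative choice of p) and U a Haar-distributed unitary,
    built by the QR recipe of Mezzadri (2006)."""
    lam = rng.uniform(-1.0, 1.0, N)
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    Q = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))  # phase fix for Haar measure
    return (Q * lam) @ Q.conj().T

def naive_solve(H, b, eps, max_iter=10**6):
    """Iterate x_n = H x_{n-1} + b, x_0 = 0; stop when successive iterates
    differ by less than eps (a proxy for the operator-norm criterion)."""
    x = np.zeros(len(b), dtype=complex)
    for n in range(1, max_iter + 1):
        x_new = H @ x + b
        if np.linalg.norm(x_new - x) < eps:
            return x_new, n
        x = x_new
    return x, max_iter

rng = np.random.default_rng(0)
N = 50
H = sample_H(N, rng)
b = rng.standard_normal(N).astype(complex)
x, n = naive_solve(H, b, 1e-10)  # x approximates (I - H)^{-1} b
```

Since all eigenvalues lie in $(-1,1)$, the iteration contracts and $x_n \to (I - H)^{-1} b$; the number of steps $n$ is the halting time studied on the next slides.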

SLIDE 7

Universality: A toy example

The halting time. To keep things simple, define $T = T(H,\varepsilon)$ to be the smallest integer $T$ such that

$$\left\| (I - H)^{-1} - \sum_{k=0}^{T-1} H^k \right\|_2 < \varepsilon.$$

The random variable $T$ is the halting time.

The question. What is the distribution of $T$? Is it universal as $N \to \infty$?
SLIDE 8

Define $\lambda_{\min} = \min_j \lambda_j$, $\lambda_{\max} = \max_j \lambda_j$. Then for fixed $t \ge 0$, as $N \to \infty$,

$$\mathbb{P}\left(1 - \tfrac{t}{N} \le \lambda_j \le 1\right) = p(1)\tfrac{t}{N} + o(N^{-1}), \qquad \mathbb{P}\left(-1 \le \lambda_j \le -1 + \tfrac{t}{N}\right) = p(-1)\tfrac{t}{N} + o(N^{-1}).$$

Assume $p(1) \ne 0$. Then as $N \to \infty$,

$$p(1) N (1 - \lambda_{\max}) \xrightarrow{\text{dist.}} \exp(1), \qquad N(1 + \lambda_{\min}) \xrightarrow{\text{dist.}} \exp(p(-1)).$$

Because $H$ is Hermitian,

$$\left\| (I - H)^{-1} - \sum_{k=0}^{n-1} H^k \right\|_2 = \max\left\{ \frac{|\lambda_{\max}|^n}{|1 - \lambda_{\max}|},\ \frac{|\lambda_{\min}|^n}{|1 - \lambda_{\min}|} \right\}.$$
SLIDE 9

For $\varepsilon > 0$, define $t = t(H,\varepsilon)$ by

$$\max\left\{ \frac{|\lambda_{\max}|^t}{|1 - \lambda_{\max}|},\ \frac{|\lambda_{\min}|^t}{|1 - \lambda_{\min}|} \right\} = \varepsilon.$$

Then $T(H,\varepsilon) = \max\{0, \lceil t(H,\varepsilon) \rceil\}$ and

$$t(H,\varepsilon) = \max\left\{ \frac{\log \varepsilon^{-1} |1 - \lambda_{\max}|^{-1}}{\log |\lambda_{\max}|^{-1}},\ \frac{\log \varepsilon^{-1} |1 - \lambda_{\min}|^{-1}}{\log |\lambda_{\min}|^{-1}} \right\}.$$

Define

$$t_{\max} = \frac{\log \varepsilon^{-1} |1 - \lambda_{\max}|^{-1}}{\log |\lambda_{\max}|^{-1}}, \qquad t_{\min} = \frac{\log \varepsilon^{-1} |1 - \lambda_{\min}|^{-1}}{\log |\lambda_{\min}|^{-1}}.$$
SLIDE 10

“Universality”

Two estimates are needed as $N \to \infty$:

$$\frac{t_{\max}}{p(1)\, N \log[N \varepsilon^{-1}]} \xrightarrow{\text{dist.}} \frac{1}{\xi}, \quad \xi \sim \exp(1), \qquad t_{\min} = O(N).$$

Because $t_{\min}$ is asymptotically smaller than $t_{\max}$, it does not affect the limit:

$$\frac{T(H,\varepsilon)}{p(1)\, N \log[N \varepsilon^{-1}]} \xrightarrow{\text{dist.}} \frac{1}{\xi}, \quad \xi \sim \exp(1).$$
SLIDE 11

This establishes universality for $T(H,\varepsilon)$ in this simple case. It required three main components:

  • A formula to estimate the error.
  • Universality for the key statistics ($\lambda_{\max}$).
  • Estimates for the rest ($\lambda_{\min}$).
SLIDE 12
[Plot: the two densities $p(x)$ on $[-1,1]$.]

Two choices for $p(x)$:

  • $p(x) = 1$
  • $p(x) = e^{-x}/(e - e^{-1})$

Significant work goes into determining the scaling. If the scaling is not known, but universality is to be investigated, a normalization must be inferred from the data. Given a collection of samples of the random variable $T = T(H,\varepsilon)$, define the empirical fluctuations

$$\tau(H,\varepsilon) = \frac{T(H,\varepsilon) - \langle T \rangle}{\sigma_T},$$

where $\langle T \rangle$ is the sample median and $\sigma_T$ is the median absolute deviation.
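This normalization is straightforward to compute from data; a minimal sketch (the Poisson samples below are a hypothetical stand-in for actual halting-time data):

```python
import numpy as np

def fluctuations(samples):
    """Empirical fluctuations tau = (T - <T>) / sigma_T, where <T> is the
    sample median and sigma_T the median absolute deviation, as on the slide."""
    T = np.asarray(samples, dtype=float)
    med = np.median(T)
    mad = np.median(np.abs(T - med))
    return (T - med) / mad

# Toy check on synthetic integer "halting times" (not from an actual SCM run):
rng = np.random.default_rng(1)
tau = fluctuations(rng.poisson(100, size=2000))
```

By construction the rescaled samples have median 0 and median absolute deviation 1, so histograms of $\tau$ from different ensembles can be overlaid without knowing the true scaling.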

SLIDE 13

The (inverse) power method

The question. What is the distribution of $T$? Is it universal as $N \to \infty$?

The method. The power method is given by ($x_0$ specified)

$$\mu_k = \frac{x_0^* H^{2k-1} x_0}{x_0^* H^{2k-2} x_0} \to \lambda_N = \lambda_{\max}, \quad k \to \infty.$$

The halting time. The halting time is given by $T(H, x_0, \varepsilon) = \min\{ n : |\mu_n - \mu_{n-1}| < \varepsilon^2 \}$.

The ensembles. A sample covariance matrix (SCM) is an $N \times N$ real symmetric ($\beta = 1$) or complex Hermitian ($\beta = 2$) matrix

$$H = X^* X / M, \quad X = (X_{ij})_{1 \le i \le M,\ 1 \le j \le N} \text{ iid}, \quad M \sim N/d, \quad 0 < d < 1,$$

with $\mathbb{E} X_{ij} = 0$, $\mathbb{E}|X_{ij}|^2 = 1$. For $\beta = 2$, $\mathbb{E} X_{ij}^2 = 0$ is imposed. We require uniform exponential tails.
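Since $H$ is Hermitian, $x_0^* H^{2k-1} x_0 / x_0^* H^{2k-2} x_0$ is the Rayleigh quotient of $u = H^{k-1} x_0$, so $\mu_k$ can be computed along the power iteration. A sketch (the sizes, seed, and tolerance are illustrative; the $\varepsilon^2$ threshold follows the slide):

```python
import numpy as np

def power_method_halting(H, x0, eps, max_iter=10**6):
    """T(H, x0, eps) = min{ n : |mu_n - mu_{n-1}| < eps^2 }, where
    mu_k = (x0* H^(2k-1) x0) / (x0* H^(2k-2) x0), computed as the
    Rayleigh quotient of u = H^(k-1) x0 for Hermitian H."""
    u = x0 / np.linalg.norm(x0)
    mu_prev = None
    for n in range(1, max_iter + 1):
        mu = np.vdot(u, H @ u).real / np.vdot(u, u).real
        if mu_prev is not None and abs(mu - mu_prev) < eps**2:
            return n, mu
        mu_prev = mu
        u = H @ u
        u /= np.linalg.norm(u)   # rescaling leaves the quotient unchanged
    return max_iter, mu

# A complex (beta = 2) sample covariance matrix, H = X*X/M.
rng = np.random.default_rng(2)
N, d = 100, 0.5
M = int(N / d)
X = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
H = X.conj().T @ X / M
x0 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
T, mu = power_method_halting(H, x0, eps=1e-4)   # mu approximates lambda_max
```

The halting time $T$ is the quantity whose fluctuations are studied on the following slides.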

SLIDE 14

Observing universality

[Plot: histograms of the empirical fluctuations for four ensembles.]

Using 4 complex SCMs, the fluctuations

$$\tau(H,\varepsilon) = \frac{T(H,\varepsilon) - \langle T \rangle}{\sigma_T}$$

appear universal.
SLIDE 15

A formula to estimate the error

Let $H = U \Lambda U^*$ and $U^* x_0 = [\beta_1, \beta_2, \ldots, \beta_N]^T$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$. To estimate the halting time for the power method, we must find the large $N$ and large $k$ asymptotics for

$$\mu_k = \frac{\sum_{j=1}^{N} |\beta_j|^2 \lambda_j^{2k-1}}{\sum_{j=1}^{N} |\beta_j|^2 \lambda_j^{2k-2}} = \lambda_N \left( \frac{1 + \sum_{j=1}^{N-1} \left|\frac{\beta_j}{\beta_N}\right|^2 \left(\frac{\lambda_j}{\lambda_N}\right)^{2k-1}}{1 + \sum_{j=1}^{N-1} \left|\frac{\beta_j}{\beta_N}\right|^2 \left(\frac{\lambda_j}{\lambda_N}\right)^{2k-2}} \right).$$
SLIDE 16

A historical interlude

For eigenvalues:

  • The seminal work of Geman (1980) showed that the largest eigenvalue of an SCM converges a.s.
  • Silverstein (1985) established that the smallest eigenvalue converges a.s. to $\lambda_-$ for iid standard normal random variables.
  • Johnstone (2001), Johansson (2000), and Forrester (1993) gave the first results on the fluctuations of the largest and smallest eigenvalues for (real or complex) standard normal distributions.
  • Universality was obtained by Ben Arous and Péché (2005) (see also Tao and Vu (2012)).
  • We reference Pillai and Yin (2014) and Bloemendal, Knowles, Yau and Yin (2016) for the most comprehensive results.

For eigenvectors:

  • The limits of the eigenvectors have also been considered in various ways; see Silverstein (1986); Bai, Miao and Pan (2007).
  • We reference Bloemendal, Knowles, Yau and Yin (2016) for the generality needed to prove our theorems. We require exponential tails, which is stronger than the assumptions in Yin (1986); Geman (1980).
SLIDE 17

Universality for key statistics

Theorem.

For SCMs, $N^{1/2}(|\beta_N|, |\beta_{N-1}|, |\beta_{N-2}|)$ converge jointly in distribution to $(|X_1|, |X_2|, |X_3|)$, where $\{X_1, X_2, X_3\}$ are iid real ($\beta = 1$) or complex ($\beta = 2$) standard normal random variables. Additionally,

$$N^{2/3} \lambda_+^{-2/3} d^{1/2} \left( \lambda_+ - \lambda_N,\ \lambda_+ - \lambda_{N-1},\ \lambda_+ - \lambda_{N-2} \right)$$

converge jointly in distribution to random variables $(\Lambda_{1,\beta}, \Lambda_{2,\beta}, \Lambda_{3,\beta})$, which are the smallest three eigenvalues of the so-called stochastic Airy operator. This follows from Ramírez, Rider and Virág (2011); Pillai and Yin (2014); Bloemendal, Knowles, Yau and Yin (2016).
SLIDE 18

Estimates for the rest

The first theorem is known as rigidity; it was first shown for Wigner ensembles by Erdős, Yau and Yin (2012) (see also Bourgade, Erdős and Yau (2014)). The second theorem is known as delocalization.

Theorem (Pillai and Yin (2014)).

For any $s > 0$,

$$\lim_{N \to \infty} \mathbb{P}\left( |\lambda_n - \gamma_n| \le N^{-2/3+s} \left( \max\{n, N - n + 1\} \right)^{-1/3} \text{ for all } n \right) = 1.$$

Theorem (Bloemendal, Knowles, Yau and Yin (2016)).

For any $s > 0$,

$$\lim_{N \to \infty} \mathbb{P}\left( |\beta_n| \le N^{-1/2+s} \text{ for all } n \right) = 1.$$
SLIDE 19

Universality for the (inverse) power method

The distribution function $F^{\mathrm{gap}}_\beta(t)$, supported on $t \ge 0$ for $\beta = 1, 2$, is defined by

$$F^{\mathrm{gap}}_\beta(t) := \lim_{N \to \infty} \mathbb{P}\left( \frac{1}{2^{-7/6} N^{2/3} \lambda_+^{-2/3} d^{-1/2} (\lambda_N - \lambda_{N-1})} \le t \right).$$

Theorem (Deift and T).

Let $H$ be a real ($\beta = 1$) or complex ($\beta = 2$) SCM and let $x_0$ be a random unit vector independent of $H$. Assuming $\varepsilon \le N^{-5/3-\sigma}$,

$$\lim_{N \to \infty} \mathbb{P}\left( \frac{T(H, x_0, \varepsilon)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} \left( \log \varepsilon^{-1} - \tfrac{2}{3} \log N \right)} \le t \right) = F^{\mathrm{gap}}_\beta(t).$$
SLIDE 20

A note on the proof

In the proof we give the following approximation:

$$\frac{T(H, x_0, \varepsilon)}{N^{2/3}} - \frac{\log \varepsilon^{-1} + \log\left(1 - \frac{\lambda_{N-1}}{\lambda_N}\right) + \frac{1}{2} \log 2N + \log\left|\frac{\beta_{N-1}}{\beta_N}\right|}{N^{2/3} \log\left( \lambda_N \lambda_{N-1}^{-1} \right)} \xrightarrow{\text{prob.}} 0.$$

From this we obtain:

$$\frac{T(H, x_0, \varepsilon)}{N^{2/3} \log N} - \frac{\log \varepsilon^{-1} - \frac{2}{3} \log N}{N^{2/3} \log N \cdot \log\left( \lambda_N \lambda_{N-1}^{-1} \right)} \xrightarrow{\text{prob.}} 0.$$

Convergence is at an (at best) logarithmic rate.
SLIDE 21

A demonstration

[Plots: histograms of the rescaled halting time versus frequency, for $\beta = 1$ and $\beta = 2$.]

$$\lim_{N \to \infty} \mathbb{P}\left( \frac{T(H, x_0, \varepsilon)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} \left( \log \varepsilon^{-1} - \tfrac{2}{3} \log N + \gamma \right)} \le t \right) = F^{\mathrm{gap}}_\beta(t).$$

Thanks to Folkmar Bornemann (TUM).
SLIDE 22

Other theorems

Similar techniques give universality theorems for:

  • The QR eigenvalue algorithm.
  • The Toda algorithm.

See Deift and T (2016) and Deift and T (2017).

SLIDE 23

Other algorithms

w/ Percy Deift, Govind Menon and Sheehan Olver

[Plots: rescaled halting time histograms for the QR algorithm with “simple halting” and with “realistic halting” ($\beta = 1$ and $\beta = 2$), and halting time fluctuations for GOE, GUE, CoshUE, QUE, BE with Dim(s) = 150.]
SLIDE 24

Other algorithms

w/ Percy Deift, Govind Menon and Sheehan Olver

[Plots: halting time fluctuation histograms for the conjugate gradient algorithm (matrix sizes 100 and 500) and the GMRES algorithm (matrix sizes 100 and 500).]
SLIDE 25

Other algorithms

w/ Levent Sagun and Yann LeCun

[Plots: halting time fluctuation histograms showing universal scaling ($N = 400$; Gaussian, Bernoulli, Uniform) and universality in a fully connected network trained with SGD (fully connected MNIST, fully connected random, MNIST on convnet, MNIST on norm condition).]

Gradient descent for spin glasses. Stochastic gradient descent in machine learning. Google search times.
SLIDE 26

Smoothed analysis for the conjugate gradient algorithm

w/ Govind Menon

SLIDE 27

The halting time. The halting time ($x_0 = 0$) is given by $T(H,\varepsilon) = \min\{ k : \|H x_k - b\|_2 < \varepsilon \}$.

The problem. For a deterministic matrix $A > 0$, estimate the moments of $T(A + \sigma^2 H, \varepsilon)$.

The ensemble. An LUE matrix is an $N \times N$ complex Hermitian matrix $H = X^* X / M$, $X = (X_{ij})_{1 \le i \le M,\ 1 \le j \le N}$ iid standard complex normals. Take $M \sim N/d$, $d = 1 - N^{-\alpha}$, $1/2 \le \alpha < 1$.

The method. The conjugate gradient algorithm is an iterative method to solve $Hx = b$: $0 = x_0 \to x_1 \to \cdots \to x_k \to \cdots \to x_N = x$.
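A minimal sketch of the halting time under this residual criterion (standard CG recurrences; the Hermitian matrix below uses $M = 2N$ rather than the critical scaling $d = 1 - N^{-\alpha}$, purely to keep the illustration fast and well conditioned):

```python
import numpy as np

def cg_halting_time(H, b, eps, max_iter=None):
    """Conjugate gradient for Hx = b with x0 = 0; returns the halting time
    T = min{ k : ||H x_k - b||_2 < eps } together with the iterate x_T."""
    max_iter = 2 * len(b) if max_iter is None else max_iter
    x = np.zeros_like(b)
    r = b.copy()                       # residual b - Hx at x0 = 0
    p = r.copy()
    rs = np.vdot(r, r).real
    for k in range(1, max_iter + 1):
        Hp = H @ p
        alpha = rs / np.vdot(p, Hp).real
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < eps:
            return k, x
        p = r + (rs_new / rs) * p      # conjugate search direction update
        rs = rs_new
    return max_iter, x

rng = np.random.default_rng(3)
N, M = 60, 120
X = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
H = X.conj().T @ X / M                 # Wishart-type matrix, positive definite a.s.
b = rng.standard_normal(N) + 0j
T, x = cg_halting_time(H, b, eps=1e-10)
```

In exact arithmetic CG terminates in at most $N$ steps; the interest here is in how much earlier the residual criterion halts for random $H$.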

SLIDE 28

The scaling

Here we have chosen $d = 1 - N^{-\alpha}$. The rationale for this is the following:

  • If $d < 1$ is fixed, then $X$ has a fixed aspect ratio (well conditioned: concentrated and bounded condition number).
  • If $d = 1$, then $X$ has a heavy-tailed condition number (ill conditioned: unbounded and unconcentrated).

The case where $M = N + \alpha$, $\alpha > 0$, $N \to \infty$ then $\alpha \to \infty$, was studied by Borodin & Forrester (2002). This describes the transition from the ill-conditioned to the well-conditioned case (from heavy-tailed to sub-exponential distributions).
SLIDE 29

The scaling

If $d = 1 - (4c)^{-1/2} N^{-1/2}$ then (see Deift, Menon & T (2017))

$$\lim_{N \to \infty} \mathbb{P}\left( \frac{\lambda_{\max}/\lambda_{\min} - 4Nc^{-1}}{4 c^{-4/3} N^{2/3}} \le t \right) = F_2(t) \quad \Leftarrow \text{Tracy–Widom dist.}$$

The condition number grows with $N$ but the (limit) distribution has sub-exponential tails, giving a finite $N$ matrix that exhibits the Borodin–Forrester transition.
SLIDE 30

The performance of the conjugate gradient algorithm

[Plots: halting time histograms for the conjugate gradient algorithm with $M = N + 2\sqrt{N}$, $M = N$, and $M = 2N$.]
SLIDE 31

Smoothed analysis for the CG algorithm

Theorem (Menon and T (2016)).

As $N \to \infty$, for $1/2 \le \alpha < 1$,

$$\sup_{\|A\| \le 1,\ A \ge 0} \mathbb{E}\left[ T^j(A + \sigma^2 H, \varepsilon) \right] = O\left( N^{\alpha j} \left( 1 + \frac{1}{\sigma^2} \right)^j \log^j\left[ N^\alpha \left( 1 + \frac{1}{\sigma^2} \right) \varepsilon^{-1} \right] \right).$$

In particular,

$$\sup_{\|A\| \le 1,\ A \ge 0} \mathbb{E}\left[ T(A + \sigma^2 H, \varepsilon) \right] = O\left( N^\alpha \left( 1 + \frac{1}{\sigma^2} \right) \log\left[ N^\alpha \left( 1 + \frac{1}{\sigma^2} \right) \varepsilon^{-1} \right] \right),$$

and the right-hand side is less than $N$ (for sufficiently large $N$). The proof uses tail estimates on the condition number derived from the Riemann–Hilbert analysis in Deift, Menon, and T (2015).
SLIDE 32

Smoothed analysis for the CG algorithm

Let $x_k$ be the approximation of $x$, $Hx = b$, found by CG. It is known (see Greenbaum, Pták & Strakoš (1996)) that any non-increasing sequence for

$$\frac{\|x_k - x\|_H}{\|x_0 - x\|_H}, \quad k = 0, 1, \ldots, N-1,$$

is attainable by choosing $H$ appropriately. In particular, there exists a pathological case where

$$\frac{\|x_k - x\|_H}{\|x_0 - x\|_H} = 1, \quad k = 0, 1, \ldots, N-1,$$

and $x_N = x$. The probability of such a pathological case decays faster than $N^{-k}$ for any $k > 0$.
SLIDE 33

Detailed estimates from random matrix theory can provide a clear mechanism for the probabilistic analysis of numerical algorithms… …and proofs of universality!

SLIDE 34

References

[Bai et al., 2007] Bai, Z. D., Miao, B. Q., and Pan, G. M. (2007). On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab., 35(4):1532–1572.

[Ben Arous and Péché, 2005] Ben Arous, G. and Péché, S. (2005). Universality of local eigenvalue statistics for some sample covariance matrices. Commun. Pure Appl. Math., 58(10):1316–1357.

[Bloemendal et al., 2016] Bloemendal, A., Knowles, A., Yau, H.-T., and Yin, J. (2016). On the principal components of sample covariance matrices. Probab. Theory Relat. Fields, 164(1-2):459–552.

[Borgwardt, 1987] Borgwardt, K. H. (1987). The simplex method: A probabilistic analysis. Springer–Verlag, Berlin, Heidelberg.

[Bornemann, 2010] Bornemann, F. (2010). On the numerical evaluation of Fredholm determinants. Math. Comput., 70(4):871–915.

[Deift and Trogdon, 2016] Deift, P. and Trogdon, T. (2016). Universality for the Toda algorithm to compute the largest eigenvalue of a random matrix.

[Deift and Trogdon, 2017] Deift, P. and Trogdon, T. (2017). Universality for eigenvalue algorithms on sample covariance matrices.

[Deift et al., 2016] Deift, P. A., Menon, G., and Trogdon, T. (2016). On the condition number of the critically-scaled Laguerre Unitary Ensemble. Discret. Contin. Dyn. Syst., 36(8):4287–4347.

[Dumitriu and Edelman, 2002] Dumitriu, I. and Edelman, A. (2002). Matrix models for beta ensembles. J. Math. Phys., 43(11):5830.

[Edelman, 1988] Edelman, A. (1988). Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4):543–560.

[Edelman and Rao, 2005] Edelman, A. and Rao, N. R. (2005). Random matrix theory. Acta Numer., 14:233–297.

[Forrester, 1993] Forrester, P. J. (1993). The spectrum edge of random matrix ensembles. Nucl. Phys. B, 402(3):709–728.

[Geman, 1980] Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab., 8(2):252–261.

[Hough et al., 2006] Hough, J. B., Krishnapur, M., Peres, Y., and Virág, B. (2006). Determinantal processes and independence. Probab. Surv., 3:206–229.

[Johansson, 2000] Johansson, K. (2000). Shape fluctuations and random matrices. Commun. Math. Phys., 209(2):437–476.

[Johnstone, 2001] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Stat., 29(2):295–327.

[Li and Menon, 2013] Li, X. H. and Menon, G. (2013). Numerical solution of Dyson Brownian motion and a sampling scheme for invariant matrix ensembles. J. Stat. Phys., 153(5):801–812.

[Menon and Trogdon, 2016] Menon, G. and Trogdon, T. (2016). Smoothed analysis for the conjugate gradient algorithm. Symmetry, Integr. Geom. Methods Appl., pages 1–19.

[Mezzadri, 2006] Mezzadri, F. (2006). How to generate random matrices from the classical compact groups. Not. AMS, 54:592–604.

[Olver et al., 2015] Olver, S., Rao, N. R., and Trogdon, T. (2015). Sampling unitary ensembles. Random Matrices Theory Appl., 4(1):1550002.

[Olver and Trogdon, 2013] Olver, S. and Trogdon, T. (2013). Numerical solution of Riemann–Hilbert problems: Random matrix theory and orthogonal polynomials. Constr. Approx., 39(1):101–149.

[Pfrang et al., 2014] Pfrang, C. W., Deift, P., and Menon, G. (2014). How long does it take to compute the eigenvalues of a random symmetric matrix? Random matrix theory, Interact. Part. Syst. Integr. Syst., MSRI Publ., 65:411–442.

[Pillai and Yin, 2014] Pillai, N. S. and Yin, J. (2014). Universality of covariance matrices. Ann. Appl. Probab., 24(3):935–1001.

[Scardicchio et al., 2009] Scardicchio, A., Zachary, C. E., and Torquato, S. (2009). Statistical properties of determinantal point processes in high-dimensional Euclidean spaces. Phys. Rev. E, 79(4):041108.

[Silverstein, 1986] Silverstein, J. (1986). Eigenvalues and eigenvectors of large dimensional sample covariance matrices. Contemp. Math., 50.

[Silverstein, 1985] Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab., 13(4):1364–1368.

[Smale, 1983] Smale, S. (1983). On the average number of steps of the simplex method of linear programming. Math. Program., 27(3):241–262.

[Smale, 1985] Smale, S. (1985). On the efficiency of algorithms of analysis. Bull. Am. Math. Soc., 13(2):87–121.
SLIDE 35

Sampling unitary ensembles

w/ Sheehan Olver and Raj Rao

SLIDE 36

Sampling unitary ensembles

The density on the eigenvalues is

$$\frac{1}{\hat Z_N} \prod_{i<j} |\lambda_j - \lambda_i|^2 e^{-N \sum_i V(\lambda_i)}\,d\lambda = \frac{1}{\hat Z_N} e^{-2N H_N(\lambda)}\,d\lambda.$$

If $V(x)$ is not quadratic (say, $V(x) = x^4$), then the entries of the matrix are not independent.

The problem: Generate a sample from the $N \times N$ Hermitian random matrix with distribution

$$\frac{1}{Z_N} e^{-N \mathrm{tr}\, V(H)}\,dH.$$
SLIDE 37

Sampling unitary ensembles

One approach: Numerically solve the Dyson Brownian motion SDEs:

$$d\lambda = -\nabla H_N(\lambda)\,dt + \frac{1}{\sqrt{N}}\,dW.$$

See Menon and Li (2014).

Our approach: Exploit the determinantal structure:

$$\frac{1}{\hat Z_N} \prod_{i<j} |\lambda_j - \lambda_i|^2 e^{-N \sum_i V(\lambda_i)}\,d\lambda = \frac{1}{N!} \det\big( K_N(\lambda_i, \lambda_j) \big)\,d\lambda,$$

$$K_N(\lambda_i, \lambda_j) = \sum_n p_n(\lambda_i) p_n(\lambda_j)\, e^{-\frac{1}{2} N V(\lambda_i) - \frac{1}{2} N V(\lambda_j)}.$$
SLIDE 38

Assume $\varphi_n(x) = p_n(x) e^{-\frac{1}{2} V(x)}$ can be evaluated for $n = 0, 1, \ldots, N-1$.

Find the “first” eigenvalue: Use inverse transform sampling to sample the density

$$\rho_1(x) := \frac{1}{N} K_N(x, x) \to \lambda_1.$$

Find and sample the new density:

$$\rho_2(x) \propto \det \begin{bmatrix} K_N(x, x) & K_N(x, \lambda_1) \\ K_N(\lambda_1, x) & K_N(\lambda_1, \lambda_1) \end{bmatrix}, \qquad \rho_2(x) \to \lambda_2.$$

Find and sample the new density:

$$\rho_3(x) \propto \det \begin{bmatrix} K_N(x, x) & K_N(x, \lambda_1) & K_N(x, \lambda_2) \\ K_N(\lambda_1, x) & K_N(\lambda_1, \lambda_1) & K_N(\lambda_1, \lambda_2) \\ K_N(\lambda_2, x) & K_N(\lambda_2, \lambda_1) & K_N(\lambda_2, \lambda_2) \end{bmatrix}, \qquad \rho_3(x) \to \lambda_3.$$
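The inverse transform step can be sketched generically: build the CDF of a density by quadrature on a grid, then invert it at uniform variates. The semicircle density below is only a stand-in (a limiting eigenvalue shape, not $\rho_1$ itself); the actual algorithm would evaluate $\rho_1(x) = K_N(x,x)/N$ from the polynomials $p_n$.

```python
import numpy as np

def inverse_transform_sample(density, grid, u):
    """Sample via inverse transform: integrate `density` on `grid` with the
    trapezoid rule to get the CDF, then invert it at uniforms u in (0, 1)."""
    pdf = density(grid)
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))])
    cdf /= cdf[-1]                     # normalize (truncation/quadrature error)
    return np.interp(u, cdf, grid)     # invert the monotone CDF

# Stand-in density: the semicircle on [-1, 1]; the real algorithm would use
# rho_1(x) = K_N(x, x)/N built from the orthogonal polynomials p_n.
semicircle = lambda x: (2 / np.pi) * np.sqrt(np.maximum(1 - x**2, 0.0))

rng = np.random.default_rng(4)
grid = np.linspace(-1, 1, 4001)
samples = inverse_transform_sample(semicircle, grid, rng.uniform(size=50000))
```

The conditional densities $\rho_2, \rho_3, \ldots$ would then be sampled the same way, after updating the kernel as on the next slide.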

SLIDE 39

A stable algorithm

Given $\varphi^{(1)}(x) := (\varphi_n(x))_{0 \le n \le N-1}$ and the sample

$$\frac{1}{N} K_N(x, x) = \frac{1}{N} \varphi^{(1)}(x)^T \varphi^{(1)}(x) \to \lambda_1,$$

let $U$ be an $N \times (N-1)$ orthogonal matrix whose columns form an orthonormal basis for the orthogonal complement of $\{\varphi^{(1)}(\lambda_1)\}$:

$$\varphi^{(1)}(\lambda_1)^T U = 0, \qquad U^T U = I, \qquad \rho_2(x) = \frac{1}{N-1} \varphi^{(2)}(x)^T \varphi^{(2)}(x), \qquad \varphi^{(2)}(x) = U^T \varphi^{(1)}(x).$$

For $N$ large, this is not a stable procedure. By implementing the work of Hough, Krishnapur, Peres and Virág (2006) and Scardicchio, Zachary and Torquato (2009), the conditional densities can be computed using stable methods from linear algebra.
SLIDE 40
[Plot: sampled eigenvalue density.]

Sampling with $V(x) = x^8$ and $N = 5$. Note that $H = U \Lambda U^*$.