SLIDE 1

Concentration Inequalities for Random Matrices

M. Ledoux

Institut de Mathématiques de Toulouse, France

SLIDE 2

exponential tail inequalities
classical theme in probability and statistics

SLIDE 3

exponential tail inequalities
classical theme in probability and statistics
quantify the asymptotic statements

SLIDE 4

exponential tail inequalities
classical theme in probability and statistics
quantify the asymptotic statements
central limit theorems
large deviation principles

SLIDE 5

classical exponential inequalities
sum of independent random variables
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$

SLIDE 6

classical exponential inequalities
sum of independent random variables
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$0 \le X_i \le 1$ independent
$P\big( S_n \ge E(S_n) + t \big) \le e^{-t^2/2}, \quad t \ge 0$
Hoeffding's inequality

SLIDE 7

classical exponential inequalities
sum of independent random variables
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$0 \le X_i \le 1$ independent
$P\big( S_n \ge E(S_n) + t \big) \le e^{-t^2/2}, \quad t \ge 0$
Hoeffding's inequality
same as for $X_i$ standard Gaussian
central limit theorem
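
A minimal numerical sketch of the Hoeffding bound above. The uniform distribution on [0, 1], the sample sizes and the function name `hoeffding_check` are assumptions chosen for illustration; they do not come from the slides.

```python
import numpy as np

def hoeffding_check(n=200, trials=20000, t=1.0, seed=0):
    """Compare the empirical upper tail of S_n with the bound exp(-t^2/2)."""
    rng = np.random.default_rng(seed)
    # X_i uniform on [0, 1]: an assumed example of independent variables in [0, 1].
    x = rng.uniform(0.0, 1.0, size=(trials, n))
    s_n = x.sum(axis=1) / np.sqrt(n)      # S_n = (X_1 + ... + X_n) / sqrt(n)
    mean_s_n = 0.5 * np.sqrt(n)           # exact E(S_n) for uniform X_i
    empirical = float(np.mean(s_n >= mean_s_n + t))
    bound = float(np.exp(-t**2 / 2))
    return empirical, bound

empirical, bound = hoeffding_check()
print(f"empirical tail = {empirical:.5f}, bound e^(-t^2/2) = {bound:.5f}")
```

The empirical frequency should sit well below the bound, since the inequality holds for every distribution supported in [0, 1] and is not tight for a specific one.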

SLIDE 8

measure concentration ideas

SLIDE 9

measure concentration ideas asymptotic geometric analysis

V. Milman (1970)

SLIDE 10

measure concentration ideas asymptotic geometric analysis

V. Milman (1970)

$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ Lipschitz

SLIDE 11

measure concentration ideas asymptotic geometric analysis

V. Milman (1970)

$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ Lipschitz
Gaussian sample

SLIDE 12

measure concentration ideas asymptotic geometric analysis

V. Milman (1970)

$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ Lipschitz
Gaussian sample
independent random variables

SLIDE 13

concentration inequalities
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ 1-Lipschitz

SLIDE 14

concentration inequalities
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ 1-Lipschitz
$X_1, \ldots, X_n$ independent standard Gaussian
$P\big( F(X) \ge E(F(X)) + t \big) \le e^{-t^2/2}, \quad t \ge 0$

SLIDE 15

concentration inequalities
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ 1-Lipschitz
$X_1, \ldots, X_n$ independent standard Gaussian
$P\big( F(X) \ge E(F(X)) + t \big) \le e^{-t^2/2}, \quad t \ge 0$
$0 \le X_i \le 1$ independent, $F$ 1-Lipschitz

SLIDE 16

concentration inequalities
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ 1-Lipschitz
$X_1, \ldots, X_n$ independent standard Gaussian
$P\big( F(X) \ge E(F(X)) + t \big) \le e^{-t^2/2}, \quad t \ge 0$
$0 \le X_i \le 1$ independent, $F$ 1-Lipschitz and convex

SLIDE 17

concentration inequalities
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ 1-Lipschitz
$X_1, \ldots, X_n$ independent standard Gaussian
$P\big( F(X) \ge E(F(X)) + t \big) \le e^{-t^2/2}, \quad t \ge 0$
$0 \le X_i \le 1$ independent, $F$ 1-Lipschitz and convex
$P\big( F(X) \ge E(F(X)) + t \big) \le 2\, e^{-t^2/4}, \quad t \ge 0$

SLIDE 18

concentration inequalities
$S_n = \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n)$
$F(X) = F(X_1, \ldots, X_n)$, $F : \mathbb{R}^n \to \mathbb{R}$ 1-Lipschitz
$X_1, \ldots, X_n$ independent standard Gaussian
$P\big( F(X) \ge E(F(X)) + t \big) \le e^{-t^2/2}, \quad t \ge 0$
$0 \le X_i \le 1$ independent, $F$ 1-Lipschitz and convex
$P\big( F(X) \ge E(F(X)) + t \big) \le 2\, e^{-t^2/4}, \quad t \ge 0$

M. Talagrand (1995)
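
The Gaussian bound on this slide can be probed numerically with one concrete 1-Lipschitz function. The sketch below uses the Euclidean norm as $F$; that choice, the dimensions and the use of a sample mean as a stand-in for $E(F(X))$ are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gaussian_lipschitz_tail(n=50, trials=50000, t=1.5, seed=0):
    """Empirical tail of F(X) = |X| (Euclidean norm) versus exp(-t^2/2)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, n))   # rows are standard Gaussian samples in R^n
    f = np.linalg.norm(x, axis=1)          # the Euclidean norm is 1-Lipschitz
    # Assumption: the sample mean replaces the exact expectation E(F(X)).
    emp_tail = float(np.mean(f >= f.mean() + t))
    gauss_bound = float(np.exp(-t**2 / 2))
    return emp_tail, gauss_bound

emp_tail, gauss_bound = gaussian_lipschitz_tail()
print(f"empirical tail = {emp_tail:.5f}, bound e^(-t^2/2) = {gauss_bound:.5f}")
```

The bound does not depend on the dimension $n$, which is the point of the slide: rerunning with a different `n` should leave the comparison qualitatively unchanged.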

SLIDE 19

empirical processes
$X_1, \ldots, X_n$ independent with values in $(S, \mathcal{S})$
$\mathcal{F}$ collection of functions $f : S \to [0, 1]$

SLIDE 20

empirical processes
$X_1, \ldots, X_n$ independent with values in $(S, \mathcal{S})$
$\mathcal{F}$ collection of functions $f : S \to [0, 1]$
$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$

SLIDE 21

empirical processes
$X_1, \ldots, X_n$ independent with values in $(S, \mathcal{S})$
$\mathcal{F}$ collection of functions $f : S \to [0, 1]$
$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$Z$ Lipschitz and convex

SLIDE 22

empirical processes
$X_1, \ldots, X_n$ independent with values in $(S, \mathcal{S})$
$\mathcal{F}$ collection of functions $f : S \to [0, 1]$
$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$Z$ Lipschitz and convex
concentration inequalities on $P\big( Z - E(Z) \ge t \big), \quad t \ge 0$

SLIDE 23

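$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$|f| \le 1$, $E(f(X_i)) = 0$, $f \in \mathcal{F}$

SLIDE 24

$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$|f| \le 1$, $E(f(X_i)) = 0$, $f \in \mathcal{F}$
$P\big( |Z - M| \ge t \big) \le C \exp\Big( -\frac{t}{C} \log\Big( 1 + \frac{t}{\sigma^2 + M} \Big) \Big), \quad t \ge 0$
$C > 0$ numerical constant, $M$ mean or median of $Z$
$\sigma^2 = \sup_{f \in \mathcal{F}} \sum_{i=1}^n E(f^2(X_i))$

SLIDE 25

$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$|f| \le 1$, $E(f(X_i)) = 0$, $f \in \mathcal{F}$
$P\big( |Z - M| \ge t \big) \le C \exp\Big( -\frac{t}{C} \log\Big( 1 + \frac{t}{\sigma^2 + M} \Big) \Big), \quad t \ge 0$
$C > 0$ numerical constant, $M$ mean or median of $Z$
$\sigma^2 = \sup_{f \in \mathcal{F}} \sum_{i=1}^n E(f^2(X_i))$

M. Talagrand (1996)

SLIDE 26

$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$|f| \le 1$, $E(f(X_i)) = 0$, $f \in \mathcal{F}$
$P\big( |Z - M| \ge t \big) \le C \exp\Big( -\frac{t}{C} \log\Big( 1 + \frac{t}{\sigma^2 + M} \Big) \Big), \quad t \ge 0$
$C > 0$ numerical constant, $M$ mean or median of $Z$
$\sigma^2 = \sup_{f \in \mathcal{F}} \sum_{i=1}^n E(f^2(X_i))$

M. Talagrand (1996)
P. Massart (2000)
S. Boucheron, G. Lugosi, P. Massart (2005)

SLIDE 27

$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$
$|f| \le 1$, $E(f(X_i)) = 0$, $f \in \mathcal{F}$
$P\big( |Z - M| \ge t \big) \le C \exp\Big( -\frac{t}{C} \log\Big( 1 + \frac{t}{\sigma^2 + M} \Big) \Big), \quad t \ge 0$
$C > 0$ numerical constant, $M$ mean or median of $Z$
$\sigma^2 = \sup_{f \in \mathcal{F}} \sum_{i=1}^n E(f^2(X_i))$

M. Talagrand (1996)
P. Massart (2000)
S. Boucheron, G. Lugosi, P. Massart (2005)

P.-M. Samson (2000) (dependence)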

SLIDE 28

concentration inequalities numerous applications

SLIDE 29

concentration inequalities numerous applications

geometric functional analysis
discrete and combinatorial probability
empirical processes
statistical mechanics
random matrix theory

SLIDE 31

recent studies of random matrix and random growth models

SLIDE 32

recent studies of random matrix and random growth models new asymptotics

SLIDE 33

recent studies of random matrix and random growth models
new asymptotics
common, non-central, rate $(\text{mean})^{1/3}$

SLIDE 34

recent studies of random matrix and random growth models
new asymptotics
common, non-central, rate $(\text{mean})^{1/3}$
universal limiting Tracy-Widom distribution

SLIDE 35

recent studies of random matrix and random growth models
new asymptotics
common, non-central, rate $(\text{mean})^{1/3}$
universal limiting Tracy-Widom distribution
random matrices, longest increasing subsequence, random growth models, last passage percolation...

SLIDE 36

sample covariance matrices multivariate statistical inference principal component analysis

SLIDE 37

sample covariance matrices
multivariate statistical inference
principal component analysis
population $(Y_1, \ldots, Y_N)$
$Y_j$ vectors (column) in $\mathbb{R}^M$ (characters)

SLIDE 38

sample covariance matrices
multivariate statistical inference
principal component analysis
population $(Y_1, \ldots, Y_N)$
$Y_j$ vectors (column) in $\mathbb{R}^M$ (characters)
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix

SLIDE 39

sample covariance matrices
multivariate statistical inference
principal component analysis
population $(Y_1, \ldots, Y_N)$
$Y_j$ vectors (column) in $\mathbb{R}^M$ (characters)
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
sample covariance matrix $Y Y^t$ ($M \times M$)

SLIDE 40

sample covariance matrices
multivariate statistical inference
principal component analysis
population $(Y_1, \ldots, Y_N)$
$Y_j$ vectors (column) in $\mathbb{R}^M$ (characters)
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
sample covariance matrix $Y Y^t$ ($M \times M$)
(independent) Gaussian $Y_j$: Wishart matrix models

SLIDE 41

is $Y Y^t$ a good approximation of the population covariance matrix $E(Y Y^t)$?

SLIDE 42

is $Y Y^t$ a good approximation of the population covariance matrix $E(Y Y^t)$?
$M$ finite: $\frac{1}{N} Y Y^t \to E(Y Y^t)$, $N \to \infty$

SLIDE 43

is $Y Y^t$ a good approximation of the population covariance matrix $E(Y Y^t)$?
$M$ finite: $\frac{1}{N} Y Y^t \to E(Y Y^t)$, $N \to \infty$
$M$ infinite?

SLIDE 44

is $Y Y^t$ a good approximation of the population covariance matrix $E(Y Y^t)$?
$M$ finite: $\frac{1}{N} Y Y^t \to E(Y Y^t)$, $N \to \infty$
$M$ infinite?
$M = M(N) \to \infty$, $N \to \infty$

SLIDE 45

is $Y Y^t$ a good approximation of the population covariance matrix $E(Y Y^t)$?
$M$ finite: $\frac{1}{N} Y Y^t \to E(Y Y^t)$, $N \to \infty$
$M$ infinite?
$M = M(N) \to \infty$, $N \to \infty$
$\frac{M}{N} \sim \rho \in (0, \infty)$, $N \to \infty$

SLIDE 46

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix

SLIDE 47

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$
$Y_{ij}$ independent identically distributed (real or complex)
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

SLIDE 48

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$
$Y_{ij}$ independent identically distributed (real or complex)
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

Wishart model: $Y_j$ standard Gaussian in $\mathbb{R}^M$

SLIDE 49

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$
$Y_{ij}$ independent identically distributed (real or complex)
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

Wishart model: $Y_j$ standard Gaussian in $\mathbb{R}^M$
numerous extensions

SLIDE 50

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$ iid
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

SLIDE 51

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$ iid
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

center of interest: eigenvalues $0 \le \lambda_1^N \le \cdots \le \lambda_M^N$ of $Y Y^t$ ($M \times M$ non-negative symmetric matrix)

SLIDE 52

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$ iid
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

center of interest: eigenvalues $0 \le \lambda_1^N \le \cdots \le \lambda_M^N$ of $Y Y^t$ ($M \times M$ non-negative symmetric matrix)
$\sqrt{\lambda_k^N}$ singular values of $Y$

SLIDE 53

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$ iid
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

center of interest: eigenvalues $0 \le \lambda_1^N \le \cdots \le \lambda_M^N$ of $Y Y^t$ ($M \times M$ non-negative symmetric matrix)
$\sqrt{\lambda_k^N}$ singular values of $Y$
$\hat\lambda_k^N = \lambda_k^N / N$ eigenvalues of $\frac{1}{N} Y Y^t$

SLIDE 54

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$ iid
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

center of interest: eigenvalues $0 \le \lambda_1^N \le \cdots \le \lambda_M^N$ of $Y Y^t$ ($M \times M$ non-negative symmetric matrix)
$\sqrt{\lambda_k^N}$ singular values of $Y$
$\hat\lambda_k^N = \lambda_k^N / N$ eigenvalues of $\frac{1}{N} Y Y^t$
spectral measure $\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N}$

SLIDE 55

sample covariance matrices
$Y = (Y_1, \ldots, Y_N)$ $M \times N$ matrix
$Y = (Y_{ij})_{1 \le i \le M,\, 1 \le j \le N}$ iid
$E(Y_{ij}) = 0$, $E(Y_{ij}^2) = 1$

center of interest: eigenvalues $0 \le \lambda_1^N \le \cdots \le \lambda_M^N$ of $Y Y^t$ ($M \times M$ non-negative symmetric matrix)
$\sqrt{\lambda_k^N}$ singular values of $Y$
$\hat\lambda_k^N = \lambda_k^N / N$ eigenvalues of $\frac{1}{N} Y Y^t$
spectral measure $\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N}$
asymptotics $M = M(N) \sim \rho N$, $N \to \infty$

SLIDE 56

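Marchenko-Pastur theorem (1967)
asymptotic behavior of the spectral measure ($\hat\lambda_k^N = \lambda_k^N / N$)

SLIDE 57

Marchenko-Pastur theorem (1967)
asymptotic behavior of the spectral measure ($\hat\lambda_k^N = \lambda_k^N / N$)
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ Marchenko-Pastur distribution

SLIDE 58

Marchenko-Pastur theorem (1967)
asymptotic behavior of the spectral measure ($\hat\lambda_k^N = \lambda_k^N / N$)
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ Marchenko-Pastur distribution
$d\nu(x) = \big( 1 - \frac{1}{\rho} \big)_+ \delta_0 + \frac{1}{\rho\, 2\pi x} \sqrt{(b - x)(x - a)}\; 1_{[a,b]}\, dx$

SLIDE 59

Marchenko-Pastur theorem (1967)
asymptotic behavior of the spectral measure ($\hat\lambda_k^N = \lambda_k^N / N$)
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ Marchenko-Pastur distribution
$d\nu(x) = \big( 1 - \frac{1}{\rho} \big)_+ \delta_0 + \frac{1}{\rho\, 2\pi x} \sqrt{(b - x)(x - a)}\; 1_{[a,b]}\, dx$
$a = a(\rho) = (1 - \sqrt{\rho})^2$, $b = b(\rho) = (1 + \sqrt{\rho})^2$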
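
As a sanity check on the Marchenko-Pastur density above, the following sketch evaluates its absolutely continuous part on a grid and verifies numerically that, for $\rho \le 1$ (no atom at 0), it has total mass 1 and mean 1. The function names `mp_edges` and `mp_density`, the value $\rho = 0.5$ and the grid size are illustrative assumptions.

```python
import numpy as np

def mp_edges(rho):
    """Edges a(rho) = (1 - sqrt(rho))^2 and b(rho) = (1 + sqrt(rho))^2."""
    return (1 - np.sqrt(rho))**2, (1 + np.sqrt(rho))**2

def mp_density(x, rho):
    """Continuous part of the Marchenko-Pastur density, zero outside (a, b)."""
    a, b = mp_edges(rho)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2 * np.pi * rho * xi)
    return dens

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

rho = 0.5
a, b = mp_edges(rho)
grid = np.linspace(a, b, 200001)
mass = trapezoid(mp_density(grid, rho), grid)
mean = trapezoid(grid * mp_density(grid, rho), grid)
print(f"a = {a:.4f}, b = {b:.4f}, mass = {mass:.6f}, mean = {mean:.6f}")
```

For $\rho > 1$ the continuous part only carries mass $1/\rho$, the remainder being the atom $(1 - 1/\rho)\,\delta_0$ from the formula.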

SLIDE 61

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

global regime

SLIDE 62

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

global regime
large deviation asymptotics of the spectral measure

SLIDE 63

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

global regime
large deviation asymptotics of the spectral measure
fluctuations of the spectral measure

SLIDE 64

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

global regime
large deviation asymptotics of the spectral measure
fluctuations of the spectral measure
$\sum_{k=1}^M \big[ f(\hat\lambda_k^N) - \int_{\mathbb{R}} f \, d\nu \big] \to G$ Gaussian variable
$f : \mathbb{R} \to \mathbb{R}$ smooth

SLIDE 65

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

local regime

SLIDE 66

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

local regime
behavior of the individual eigenvalues

SLIDE 67

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

local regime
behavior of the individual eigenvalues
spacings (bulk behavior)

SLIDE 68

Marchenko-Pastur theorem
$\frac{1}{M} \sum_{k=1}^M \delta_{\hat\lambda_k^N} \to \nu$ on $(a(\rho), b(\rho))$, $M \sim \rho N$

local regime
behavior of the individual eigenvalues
spacings (bulk behavior)
extremal eigenvalues (edge behavior)

SLIDE 69

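extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$

SLIDE 70

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N$

SLIDE 71

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$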
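
The convergence of the normalized largest eigenvalue to $b(\rho)$ can be observed on simulated Gaussian matrices. This is a small sketch under assumed parameters ($\rho = 0.5$, three values of $N$, one sample each); the helper name `largest_normalized_eigenvalue` is hypothetical.

```python
import numpy as np

def largest_normalized_eigenvalue(M, N, rng):
    """Largest eigenvalue of (1/N) Y Y^t for an M x N standard Gaussian Y."""
    Y = rng.standard_normal((M, N))
    s_max = np.linalg.norm(Y, 2)   # spectral norm = largest singular value of Y
    return s_max**2 / N            # largest eigenvalue of Y Y^t, divided by N

rng = np.random.default_rng(0)
rho = 0.5
b_rho = (1 + np.sqrt(rho))**2
gaps = {}
for N in (100, 400, 1600):
    M = int(rho * N)
    gaps[N] = largest_normalized_eigenvalue(M, N, rng) - b_rho
    print(f"N = {N:5d}, M = {M:4d}, lambda_hat_max - b(rho) = {gaps[N]:+.4f}")
```

A single sample per size is enough here because the fluctuations around $b(\rho)$ shrink quickly with $N$.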

SLIDE 74

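extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$

SLIDE 75

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$
complex or real Gaussian (Wishart matrices)

SLIDE 76

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$
complex or real Gaussian (Wishart matrices)
$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$

SLIDE 77

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$
complex or real Gaussian (Wishart matrices)
$M^{2/3} N^{-1} \big[ \lambda_M^N - b(\rho) N \big] \to C(\rho)\, F_{TW}$

SLIDE 78

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$
complex or real Gaussian (Wishart matrices)
$M^{2/3} N^{-1} \big[ \lambda_M^N - b(\rho) N \big] \to C(\rho)\, F_{TW}$

$F_{TW}$: C. Tracy, H. Widom (1994) distribution

SLIDE 79

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$
complex or real Gaussian (Wishart matrices)
$M^{2/3} N^{-1} \big[ \lambda_M^N - b(\rho) N \big] \to C(\rho)\, F_{TW}$

$F_{TW}$: C. Tracy, H. Widom (1994) distribution
K. Johansson (2000), I. Johnstone (2001)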

SLIDE 80

$F_{TW}$: C. Tracy, H. Widom (1994) distribution

(complex) $F_{TW}(s) = \exp\Big( -\int_s^\infty (x - s)\, u(x)^2 \, dx \Big), \quad s \in \mathbb{R}$
$u'' = 2u^3 + xu$ Painlevé II equation

SLIDE 81

$F_{TW}$: C. Tracy, H. Widom (1994) distribution

(complex) $F_{TW}(s) = \exp\Big( -\int_s^\infty (x - s)\, u(x)^2 \, dx \Big), \quad s \in \mathbb{R}$
$u'' = 2u^3 + xu$ Painlevé II equation
density (figure)

SLIDE 82

mean $\simeq -1.77$
$F_{TW}(s) \sim e^{-|s|^3/12}$ as $s \to -\infty$
$1 - F_{TW}(s) \sim e^{-4 s^{3/2}/3}$ as $s \to +\infty$
density (figure; similar for real case)

SLIDE 83

extremal eigenvalues
largest eigenvalue $\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N$
$\hat\lambda_M^N = \lambda_M^N / N \to b(\rho) = (1 + \sqrt{\rho})^2$, $M \sim \rho N$
fluctuations around $b(\rho)$
complex or real Gaussian (Wishart matrices)
$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$

$F_{TW}$: C. Tracy, H. Widom (1994) distribution
K. Johansson (2000), I. Johnstone (2001)

SLIDE 84

Gaussian (Wishart matrices)

SLIDE 85

Gaussian (Wishart matrices) completely solvable models

SLIDE 86

Gaussian (Wishart matrices)
completely solvable models
determinantal structure

orthogonal polynomial analysis

SLIDE 87

Gaussian (Wishart matrices)
completely solvable models
determinantal structure

orthogonal polynomial analysis

asymptotics of Laguerre orthogonal polynomials

SLIDE 88

Gaussian (Wishart matrices)
completely solvable models
determinantal structure

orthogonal polynomial analysis

asymptotics of Laguerre orthogonal polynomials

C. Tracy, H. Widom (1994)
K. Johansson (2000), I. Johnstone (2001)

SLIDE 89

extension to non-Gaussian matrices

SLIDE 90

extension to non-Gaussian matrices

A. Soshnikov (2001-02)

moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$

SLIDE 91

extension to non-Gaussian matrices

A. Soshnikov (2001-02)

moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$

L. Erdős, H.-T. Yau (2009-12) (and collaborators)

local Marchenko-Pastur law

T. Tao, V. Vu (2010-11)

Lindeberg comparison method

SLIDE 92

extension to non-Gaussian matrices

A. Soshnikov (2001-02)

moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$

L. Erdős, H.-T. Yau (2009-12) (and collaborators)

local Marchenko-Pastur law

T. Tao, V. Vu (2010-11)

Lindeberg comparison method
symmetric matrices

SLIDE 93

(brief) survey of recent approaches to non-asymptotic exponential inequalities

SLIDE 94

(brief) survey of recent approaches to non-asymptotic exponential inequalities quantify the limit theorems

SLIDE 95

(brief) survey of recent approaches to non-asymptotic exponential inequalities quantify the limit theorems spectral measure

SLIDE 96

(brief) survey of recent approaches to non-asymptotic exponential inequalities
quantify the limit theorems
spectral measure
extremal eigenvalues
catch the new rate $(\text{mean})^{1/3}$

SLIDE 97

(brief) survey of recent approaches to non-asymptotic exponential inequalities
quantify the limit theorems
spectral measure
extremal eigenvalues
catch the new rate $(\text{mean})^{1/3}$
from the Gaussian case to non-Gaussian models

SLIDE 98

two main questions and objectives

SLIDE 99

two main questions and objectives
tail inequalities for the spectral measure
$P\big( \sum_{k=1}^M f(\hat\lambda_k^N) \ge t \big)$

SLIDE 102

two main questions and objectives
tail inequalities for the spectral measure
$P\big( \sum_{k=1}^M f(\hat\lambda_k^N) \ge t \big)$
tail inequalities for the extremal eigenvalues
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big)$

SLIDE 105

two main questions and objectives
tail inequalities for the spectral measure
$P\big( \sum_{k=1}^M f(\hat\lambda_k^N) \ge t \big)$
tail inequalities for the extremal eigenvalues
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big)$

Wishart matrices

more general covariance matrices

SLIDE 106

measure concentration tool

SLIDE 107

measure concentration tool
$F = F(Y Y^t) = F(Y_{ij})$

SLIDE 108

measure concentration tool
$F = F(Y Y^t) = F(Y_{ij})$
satisfactory for the global regime

SLIDE 109

measure concentration tool
$F = F(Y Y^t) = F(Y_{ij})$
satisfactory for the global regime
less satisfactory for the local regime

SLIDE 110

measure concentration tool
$F = F(Y Y^t) = F(Y_{ij})$
satisfactory for the global regime
less satisfactory for the local regime
specific functionals
eigenvalue counting function
extreme eigenvalues

SLIDE 112

tail inequalities for the spectral measure

A. Guionnet, O. Zeitouni (2000)

measure concentration tool

SLIDE 113

tail inequalities for the spectral measure

A. Guionnet, O. Zeitouni (2000)

measure concentration tool
$f : \mathbb{R} \to \mathbb{R}$ smooth (Lipschitz)

SLIDE 114

tail inequalities for the spectral measure

A. Guionnet, O. Zeitouni (2000)

measure concentration tool
$f : \mathbb{R} \to \mathbb{R}$ smooth (Lipschitz)
$X = (X_{ij})_{1 \le i,j \le M}$ $M \times M$ symmetric matrix
eigenvalues $\lambda_1 \le \cdots \le \lambda_M$

SLIDE 115

tail inequalities for the spectral measure

A. Guionnet, O. Zeitouni (2000)

measure concentration tool
$f : \mathbb{R} \to \mathbb{R}$ smooth (Lipschitz)
$X = (X_{ij})_{1 \le i,j \le M}$ $M \times M$ symmetric matrix
eigenvalues $\lambda_1 \le \cdots \le \lambda_M$
$F : X \mapsto \mathrm{Tr}\, f(X) = \sum_{k=1}^M f(\lambda_k)$
Lipschitz with respect to the Euclidean structure on $M \times M$ matrices

SLIDE 116

tail inequalities for the spectral measure

A. Guionnet, O. Zeitouni (2000)

measure concentration tool
$f : \mathbb{R} \to \mathbb{R}$ smooth (Lipschitz)
$X = (X_{ij})_{1 \le i,j \le M}$ $M \times M$ symmetric matrix
eigenvalues $\lambda_1 \le \cdots \le \lambda_M$
$F : X \mapsto \mathrm{Tr}\, f(X) = \sum_{k=1}^M f(\lambda_k)$
Lipschitz with respect to the Euclidean structure on $M \times M$ matrices
convex if $f$ is convex
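
The Lipschitz property of $X \mapsto \mathrm{Tr}\, f(X)$ can be checked numerically. The sketch below relies on the Hoffman-Wielandt inequality (the ordered eigenvalues of two symmetric matrices differ in $\ell^2$ by at most the Frobenius norm of the difference) combined with Cauchy-Schwarz, which gives the constant $\sqrt{M}$ for a 1-Lipschitz $f$. That constant, the choice $f(x) = |x|$ and the random test matrices are our own illustrative assumptions, not statements from the slides.

```python
import numpy as np

def trace_f(X, f):
    """Tr f(X) = sum of f over the eigenvalues of the symmetric matrix X."""
    return float(f(np.linalg.eigvalsh(X)).sum())

def random_symmetric(M, rng):
    A = rng.standard_normal((M, M))
    return (A + A.T) / 2

rng = np.random.default_rng(0)
M = 30
f = np.abs                      # a 1-Lipschitz (and convex) test function
ratios = []
for _ in range(200):
    X = random_symmetric(M, rng)
    X_pert = X + 0.1 * random_symmetric(M, rng)
    lhs = abs(trace_f(X, f) - trace_f(X_pert, f))
    # Bound: sqrt(M) times the Euclidean (Frobenius) distance between X and X_pert.
    rhs = np.sqrt(M) * np.linalg.norm(X - X_pert, "fro")
    ratios.append(lhs / rhs)
max_ratio = max(ratios)
print(f"largest observed ratio |Tr f(X) - Tr f(X')| / (sqrt(M) |X - X'|) = {max_ratio:.4f}")
```

The observed ratio stays below 1, as the inequality predicts for any pair of symmetric matrices.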

SLIDE 118

tail inequalities for the spectral measure

SLIDE 119

tail inequalities for the spectral measure
Gaussian entries $Y_{ij}$
$f : \mathbb{R} \to \mathbb{R}$ such that $f(x^2)$ 1-Lipschitz

SLIDE 120

tail inequalities for the spectral measure
Gaussian entries $Y_{ij}$
$f : \mathbb{R} \to \mathbb{R}$ such that $f(x^2)$ 1-Lipschitz
$P\Big( \sum_{k=1}^M \big[ f(\hat\lambda_k^N) - E(f(\hat\lambda_k^N)) \big] \ge t \Big) \le C(\rho)\, e^{-t^2/C(\rho)}, \quad t \ge 0$

SLIDE 121

tail inequalities for the spectral measure
Gaussian entries $Y_{ij}$
$f : \mathbb{R} \to \mathbb{R}$ such that $f(x^2)$ 1-Lipschitz
$P\Big( \sum_{k=1}^M \big[ f(\hat\lambda_k^N) - E(f(\hat\lambda_k^N)) \big] \ge t \Big) \le C(\rho)\, e^{-t^2/C(\rho)}, \quad t \ge 0$
compactly supported entries $Y_{ij}$
$f : \mathbb{R} \to \mathbb{R}$ such that $f(x^2)$ 1-Lipschitz and convex

SLIDE 123

non-Lipschitz functions f

SLIDE 124

non-Lipschitz functions $f$
typically $f = 1_I$, $I \subset \mathbb{R}$ interval
$\sum_{k=1}^M f(\hat\lambda_k^N) = \#\{ \hat\lambda_k^N \in I \} = N_I$ counting function

SLIDE 125

non-Lipschitz functions $f$
typically $f = 1_I$, $I \subset \mathbb{R}$ interval
$\sum_{k=1}^M f(\hat\lambda_k^N) = \#\{ \hat\lambda_k^N \in I \} = N_I$ counting function
Wishart matrices (determinantal structure)

SLIDE 126

non-Lipschitz functions $f$
typically $f = 1_I$, $I \subset \mathbb{R}$ interval
$\sum_{k=1}^M f(\hat\lambda_k^N) = \#\{ \hat\lambda_k^N \in I \} = N_I$ counting function
Wishart matrices (determinantal structure)
$I$ interval in $(a, b)$
$\frac{1}{\sqrt{\log M}} \big[ N_I - E(N_I) \big] \to G$ Gaussian variable

SLIDE 127

non-Lipschitz functions $f$
typically $f = 1_I$, $I \subset \mathbb{R}$ interval
$\sum_{k=1}^M f(\hat\lambda_k^N) = \#\{ \hat\lambda_k^N \in I \} = N_I$ counting function
Wishart matrices (determinantal structure)
$I$ interval in $(a, b)$
$\frac{1}{\sqrt{\log M}} \big[ N_I - E(N_I) \big] \to G$ Gaussian variable
exponential tail inequalities
$P\big( N_I - E(N_I) \ge t \big) \le C\, e^{-c t \log(1 + t/\log M)}, \quad t \ge 0$

SLIDE 128

non-Lipschitz functions $f$
typically $f = 1_I$, $I \subset \mathbb{R}$ interval
$\sum_{k=1}^M f(\hat\lambda_k^N) = \#\{ \hat\lambda_k^N \in I \} = N_I$ counting function
Wishart matrices (determinantal structure)
$I$ interval in $(a, b)$
$\frac{1}{\sqrt{\log M}} \big[ N_I - E(N_I) \big] \to G$ Gaussian variable
exponential tail inequalities
$P\big( N_I - E(N_I) \ge t \big) \le C\, e^{-c t \log(1 + t/\log M)}, \quad t \ge 0$
$\mathrm{Var}(N_I) = O(\log M)$
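
The rigidity expressed by $\mathrm{Var}(N_I) = O(\log M)$ can be seen in a small simulation: the counting function of a real Wishart matrix fluctuates far less than a sum of $M$ independent indicators would. The sizes $M = 100$, $N = 200$, the interval $I = [1, 2]$ (inside $(a, b)$ for $\rho = 0.5$) and the helper name `counting_function` are assumptions for illustration.

```python
import numpy as np

def counting_function(M, N, interval, rng):
    """N_I: number of eigenvalues of (1/N) Y Y^t falling in the interval I."""
    Y = rng.standard_normal((M, N))
    lam_hat = np.linalg.eigvalsh(Y @ Y.T) / N
    lo, hi = interval
    return int(np.sum((lam_hat >= lo) & (lam_hat <= hi)))

rng = np.random.default_rng(0)
M, N = 100, 200
counts = np.array([counting_function(M, N, (1.0, 2.0), rng) for _ in range(200)])
print(f"mean N_I = {counts.mean():.2f}, Var N_I = {counts.var():.3f}, log M = {np.log(M):.3f}")
```

The empirical variance is only an estimate from 200 samples, so it illustrates the order of magnitude rather than the exact constant.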

SLIDE 129

non-Gaussian covariance matrices comparison with Wishart model

SLIDE 130

non-Gaussian covariance matrices
comparison with Wishart model
partial results
localization results

L. Erdős, H.-T. Yau (2009-12)

Lindeberg comparison method

T. Tao, V. Vu (2010-11)

SLIDE 131

non-Gaussian covariance matrices
comparison with Wishart model
partial results
localization results

L. Erdős, H.-T. Yau (2009-12)

Lindeberg comparison method

T. Tao, V. Vu (2010-11)

$\mathrm{Var}(N_I) = O(\log M)$
S. Dallaporta, V. Vu (2011)

SLIDE 132

non-Gaussian covariance matrices
comparison with Wishart model
partial results
localization results

L. Erdős, H.-T. Yau (2009-12)

Lindeberg comparison method

T. Tao, V. Vu (2010-11)

$\mathrm{Var}(N_I) = O(\log M)$
S. Dallaporta, V. Vu (2011)

$P\big( N_I - E(N_I) \ge t \big) \le C\, e^{-c t^{\delta}}, \quad t \ge C \log M, \quad 0 < \delta \le 1$

T. Tao, V. Vu (2012)

SLIDE 136

tail inequalities for the extremal eigenvalues
fluctuations of the largest eigenvalue
$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$, $M \sim \rho N$

SLIDE 139

tail inequalities for the extremal eigenvalues
fluctuations of the largest eigenvalue
$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$, $M \sim \rho N$
finite $M$ inequalities

SLIDE 140

tail inequalities for the extremal eigenvalues
fluctuations of the largest eigenvalue
$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$, $M \sim \rho N$
finite $M$ inequalities
at the $(\text{mean})^{1/3}$ rate
reflecting the tails of $F_{TW}$

SLIDE 141

tail inequalities for the extremal eigenvalues
fluctuations of the largest eigenvalue
$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$, $M \sim \rho N$
finite $M$ inequalities
at the $(\text{mean})^{1/3}$ rate
reflecting the tails of $F_{TW}$
bounds on $\mathrm{Var}(\hat\lambda_M^N)$

SLIDE 142

measure concentration tool

SLIDE 143

measure concentration tool
(Gaussian) Wishart matrix $Y Y^t$

SLIDE 144

measure concentration tool
(Gaussian) Wishart matrix $Y Y^t$
$\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N = \sup_{|v|=1} |Y v|^2$
$s_M^N = \sqrt{\hat\lambda_M^N}$ Lipschitz function of the Gaussian entries $Y_{ij}$

SLIDE 145

measure concentration tool
(Gaussian) Wishart matrix $Y Y^t$
$\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N = \sup_{|v|=1} |Y v|^2$
$s_M^N = \sqrt{\hat\lambda_M^N}$ Lipschitz function of the Gaussian entries $Y_{ij}$
Gaussian concentration
$P\big( s_M^N \ge E(s_M^N) + t \big) \le e^{-M t^2/C}, \quad t \ge 0$

SLIDE 146

measure concentration tool
(Gaussian) Wishart matrix $Y Y^t$
$\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N = \sup_{|v|=1} |Y v|^2$
$s_M^N = \sqrt{\hat\lambda_M^N}$ Lipschitz function of the Gaussian entries $Y_{ij}$
Gaussian concentration
$P\big( s_M^N \ge E(s_M^N) + t \big) \le e^{-M t^2/C}, \quad t \ge 0$
$E(s_M^N) \sim \sqrt{b(\rho)}$

SLIDE 147

measure concentration tool
(Gaussian) Wishart matrix $Y Y^t$
$\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N = \sup_{|v|=1} |Y v|^2$
$s_M^N = \sqrt{\hat\lambda_M^N}$ Lipschitz function of the Gaussian entries $Y_{ij}$
Gaussian concentration
$P\big( s_M^N \ge E(s_M^N) + t \big) \le e^{-M t^2/C}, \quad t \ge 0$
$E(s_M^N) \sim \sqrt{b(\rho)}$

correct large deviation bounds ($t \ge 1$)

SLIDE 148

measure concentration tool
(Gaussian) Wishart matrix $Y Y^t$
$\lambda_M^N = \max_{1 \le k \le M} \lambda_k^N = \sup_{|v|=1} |Y v|^2$
$s_M^N = \sqrt{\hat\lambda_M^N}$ Lipschitz function of the Gaussian entries $Y_{ij}$
Gaussian concentration
$P\big( s_M^N \ge E(s_M^N) + t \big) \le e^{-M t^2/C}, \quad t \ge 0$
$E(s_M^N) \sim \sqrt{b(\rho)}$

does not fit the small deviation regime $t = s\, M^{-2/3}$

SLIDE 149

extreme eigenvalues alternate tools

SLIDE 150

extreme eigenvalues
alternate tools
Riemann-Hilbert analysis (Wishart matrices)
tri-diagonal representations (Wishart and β-ensembles)
moment methods (Wishart and non-Gaussian matrices)

SLIDE 152

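$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$
$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$

SLIDE 153

$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$
$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
tri-diagonal representation

B. Rider, M. L. (2010)

SLIDE 154

$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$
$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
tri-diagonal representation

B. Rider, M. L. (2010)

$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$

SLIDE 155

$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$
$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
tri-diagonal representation

B. Rider, M. L. (2010)

$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
$P\big( \hat\lambda_M^N \le b(\rho) - \varepsilon \big) \le C\, e^{-M^2 \varepsilon^3/C}, \quad 0 < \varepsilon \le b(\rho)$

SLIDE 156

$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
$P\big( \hat\lambda_M^N \le b(\rho) - \varepsilon \big) \le C\, e^{-M^2 \varepsilon^3/C}, \quad 0 < \varepsilon \le b(\rho)$

SLIDE 157

$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
$P\big( \hat\lambda_M^N \le b(\rho) - \varepsilon \big) \le C\, e^{-M^2 \varepsilon^3/C}, \quad 0 < \varepsilon \le b(\rho)$
fit the Tracy-Widom asymptotics ($\varepsilon = s\, M^{-2/3}$)

SLIDE 158

$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
$P\big( \hat\lambda_M^N \le b(\rho) - \varepsilon \big) \le C\, e^{-M^2 \varepsilon^3/C}, \quad 0 < \varepsilon \le b(\rho)$
fit the Tracy-Widom asymptotics ($\varepsilon = s\, M^{-2/3}$)
$1 - F_{TW}(s) \sim e^{-s^{3/2}/C}$ ($s \to +\infty$)
$F_{TW}(s) \sim e^{-|s|^3/C}$ ($s \to -\infty$)

SLIDE 159

$P\big( \hat\lambda_M^N \le b(\rho) + s\, M^{-2/3} \big) \to F_{TW}(C s)$
bounds for Wishart matrices
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
$P\big( \hat\lambda_M^N \le b(\rho) - \varepsilon \big) \le C\, e^{-M^2 \varepsilon^3/C}, \quad 0 < \varepsilon \le b(\rho)$
fit the Tracy-Widom asymptotics ($\varepsilon = s\, M^{-2/3}$)
$1 - F_{TW}(s) \sim e^{-s^{3/2}/C}$ ($s \to +\infty$)
$F_{TW}(s) \sim e^{-|s|^3/C}$ ($s \to -\infty$)
$\mathrm{Var}(\hat\lambda_M^N) = O\big( \frac{1}{M^{4/3}} \big)$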

SLIDE 160

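$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$
$b(\rho) = (1 + \sqrt{\rho})^2$
$\hat\lambda_M^N = \lambda_M^N / N$, $M = M(N) \sim \rho N$
$\frac{(\sqrt{MN})^{1/3}}{(\sqrt{M} + \sqrt{N})^{4/3}} \big[ \lambda_M^N - (\sqrt{M} + \sqrt{N})^2 \big] \to F_{TW}$

SLIDE 161

$M^{2/3} \big[ \hat\lambda_M^N - b(\rho) \big] \to C(\rho)\, F_{TW}$
$b(\rho) = (1 + \sqrt{\rho})^2$
$\hat\lambda_M^N = \lambda_M^N / N$, $M = M(N) \sim \rho N$
$\frac{(\sqrt{MN})^{1/3}}{(\sqrt{M} + \sqrt{N})^{4/3}} \big[ \lambda_M^N - (\sqrt{M} + \sqrt{N})^2 \big] \to F_{TW}$
$N + 1 \ge M$, $0 < \varepsilon \le 1$
$P\big( \lambda_M^N \ge (\sqrt{M} + \sqrt{N})^2 (1 + \varepsilon) \big) \le C \exp\Big( -\sqrt{MN}\, \varepsilon^{3/2} \Big( \frac{1}{\sqrt{\varepsilon}} \wedge \Big( \frac{M}{N} \Big)^{1/4} \Big) / C \Big)$
$P\big( \lambda_M^N \le (\sqrt{M} + \sqrt{N})^2 (1 - \varepsilon) \big) \le C \exp\Big( -MN\, \varepsilon^{3} \Big( \frac{1}{\varepsilon} \wedge \Big( \frac{M}{N} \Big)^{1/2} \Big) / C \Big)$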
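
The two normalizations on this slide (the $M^{2/3}$ form for $\hat\lambda_M^N$ and the explicit $(\sqrt{M}, \sqrt{N})$ form for $\lambda_M^N$) can be related by a short computation. The sketch below assumes $M = \rho N$ exactly; the function names are hypothetical. It checks that the centering divided by $N$ equals $b(\rho)$ and that the ratio between the two scalings depends only on $\rho$, which is consistent with a constant $C(\rho)$.

```python
import math

def tw_centering_and_scaling(M, N):
    """Centering (sqrt M + sqrt N)^2 and scaling (sqrt(MN))^(1/3) / (sqrt M + sqrt N)^(4/3)."""
    root_sum = math.sqrt(M) + math.sqrt(N)
    center = root_sum**2
    scale = math.sqrt(M * N)**(1 / 3) / root_sum**(4 / 3)
    return center, scale

def scaling_ratio(rho):
    """Closed form of scale * N / M^(2/3) when M = rho * N (independent of N)."""
    return rho**(-0.5) / (1 + math.sqrt(rho))**(4 / 3)

M, N = 500, 1000
rho = M / N
center, scale = tw_centering_and_scaling(M, N)
ratio = scale * N / M**(2 / 3)
print(f"center / N = {center / N:.6f}, b(rho) = {(1 + math.sqrt(rho))**2:.6f}")
print(f"scale * N / M^(2/3) = {ratio:.6f}, closed form = {scaling_ratio(rho):.6f}")
```

Since $\text{scale} \cdot (\lambda_M^N - N b(\rho)) = (\text{scale} \cdot N / M^{2/3}) \cdot M^{2/3} (\hat\lambda_M^N - b(\rho))$, a ratio that does not depend on $N$ is exactly what makes the two statements compatible.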

SLIDE 162

bi and tri-diagonal representation

SLIDE 163

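bi and tri-diagonal representation
$B = \begin{pmatrix}
\chi_N & 0 & \cdots & \cdots & 0 \\
\tilde\chi_{M-1} & \chi_{N-1} & 0 & \cdots & \vdots \\
0 & \tilde\chi_{M-2} & \chi_{N-2} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
\vdots & \cdots & \tilde\chi_2 & \chi_{N-M+2} & 0 \\
0 & \cdots & \cdots & \tilde\chi_1 & \chi_{N-M+1}
\end{pmatrix}$
$\chi_N, \ldots, \chi_{N-M+1}, \tilde\chi_{M-1}, \ldots, \tilde\chi_1$ independent chi-variables

SLIDE 164

bi and tri-diagonal representation
$B = \begin{pmatrix}
\chi_N & 0 & \cdots & \cdots & 0 \\
\tilde\chi_{M-1} & \chi_{N-1} & 0 & \cdots & \vdots \\
0 & \tilde\chi_{M-2} & \chi_{N-2} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
\vdots & \cdots & \tilde\chi_2 & \chi_{N-M+2} & 0 \\
0 & \cdots & \cdots & \tilde\chi_1 & \chi_{N-M+1}
\end{pmatrix}$
$\chi_N, \ldots, \chi_{N-M+1}, \tilde\chi_{M-1}, \ldots, \tilde\chi_1$ independent chi-variables
$B B^t$ same spectrum as $Y Y^t$ (Wishart)

SLIDE 165

bi and tri-diagonal representation
$B = \begin{pmatrix}
\chi_N & 0 & \cdots & \cdots & 0 \\
\tilde\chi_{M-1} & \chi_{N-1} & 0 & \cdots & \vdots \\
0 & \tilde\chi_{M-2} & \chi_{N-2} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
\vdots & \cdots & \tilde\chi_2 & \chi_{N-M+2} & 0 \\
0 & \cdots & \cdots & \tilde\chi_1 & \chi_{N-M+1}
\end{pmatrix}$
$\chi_N, \ldots, \chi_{N-M+1}, \tilde\chi_{M-1}, \ldots, \tilde\chi_1$ independent chi-variables
$B B^t$ same spectrum as $Y Y^t$ (Wishart)

H. Trotter (1984), A. Edelman, I. Dumitriu (2002)

SLIDE 166

bi and tri-diagonal representation
$B = \begin{pmatrix}
\chi_N & 0 & \cdots & \cdots & 0 \\
\tilde\chi_{M-1} & \chi_{N-1} & 0 & \cdots & \vdots \\
0 & \tilde\chi_{M-2} & \chi_{N-2} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
\vdots & \cdots & \tilde\chi_2 & \chi_{N-M+2} & 0 \\
0 & \cdots & \cdots & \tilde\chi_1 & \chi_{N-M+1}
\end{pmatrix}$
$\chi_N, \ldots, \chi_{N-M+1}, \tilde\chi_{M-1}, \ldots, \tilde\chi_1$ independent chi-variables
$B B^t$ same spectrum as $Y Y^t$ (Wishart)

H. Trotter (1984), A. Edelman, I. Dumitriu (2002)

extension to β-ensembles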
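
A minimal sketch of the bidiagonal model for the real Wishart case, assuming the layout read off the matrix above: diagonal chi-variables with $N, N-1, \ldots, N-M+1$ degrees of freedom and sub-diagonal chi-variables with $M-1, \ldots, 1$ degrees of freedom. The helper names, the sizes and the Monte Carlo comparison of mean largest eigenvalues are illustrative choices, not part of the slides.

```python
import numpy as np

def bidiagonal_B(M, N, rng):
    """Lower bidiagonal M x M matrix with independent chi entries (real case)."""
    diag_df = np.arange(N, N - M, -1)    # N, N-1, ..., N-M+1
    sub_df = np.arange(M - 1, 0, -1)     # M-1, ..., 1
    diag = np.sqrt(rng.chisquare(diag_df))   # chi = square root of chi-square
    sub = np.sqrt(rng.chisquare(sub_df))
    return np.diag(diag) + np.diag(sub, k=-1)

def mean_largest_eigenvalue(sampler, trials):
    """Monte Carlo mean of the largest eigenvalue of A A^t for A drawn from sampler()."""
    return float(np.mean([np.linalg.eigvalsh(A @ A.T)[-1]
                          for A in (sampler() for _ in range(trials))]))

rng = np.random.default_rng(0)
M, N, trials = 20, 40, 400
mean_bidiag = mean_largest_eigenvalue(lambda: bidiagonal_B(M, N, rng), trials)
mean_wishart = mean_largest_eigenvalue(lambda: rng.standard_normal((M, N)), trials)
print(f"mean largest eigenvalue: bidiagonal = {mean_bidiag:.2f}, Wishart = {mean_wishart:.2f}")
```

The two means should agree up to Monte Carlo error. The bidiagonal sampler needs only $2M - 1$ random numbers per matrix instead of $MN$, which is one practical appeal of the representation.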

SLIDE 167

bounds for non-Gaussian entries
moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$
O. Feldheim, S. Sodin (2010)

SLIDE 168

bounds for non-Gaussian entries
moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$
O. Feldheim, S. Sodin (2010)

largest eigenvalue (symmetric, subGaussian entries)
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$

SLIDE 169

bounds for non-Gaussian entries
moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$
O. Feldheim, S. Sodin (2010)

largest eigenvalue (symmetric, subGaussian entries)
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
below the mean?

SLIDE 170

bounds for non-Gaussian entries
moment method $E\big( \mathrm{Tr}((Y Y^t)^p) \big)$
O. Feldheim, S. Sodin (2010)

largest eigenvalue (symmetric, subGaussian entries)
$P\big( \hat\lambda_M^N \ge b(\rho) + \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
below the mean?
necessary for variance bounds

SLIDE 171

variance level
$\mathrm{Var}(\hat\lambda_M^N) = O\big( \frac{1}{M^{4/3}} \big)$

S. Dallaporta (2012)

SLIDE 172

variance level
$\mathrm{Var}(\hat\lambda_M^N) = O\big( \frac{1}{M^{4/3}} \big)$

S. Dallaporta (2012)

comparison with Wishart model
localization results: L. Erdős, H.-T. Yau (2009-12)
Lindeberg comparison method: T. Tao, V. Vu (2010-11)

SLIDE 173

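smallest eigenvalue
soft edge: $M = M(N) \sim \rho N$, $\rho < 1$
$a(\rho) = (1 - \sqrt{\rho})^2$

SLIDE 174

smallest eigenvalue
soft edge: $M = M(N) \sim \rho N$, $\rho < 1$
$a(\rho) = (1 - \sqrt{\rho})^2$
$P\big( \hat\lambda_1^N \le a(\rho) - \varepsilon \big) \le C\, e^{-M \varepsilon^{3/2}/C}, \quad 0 < \varepsilon \le 1$
$P\big( \hat\lambda_1^N \ge a(\rho) + \varepsilon \big) \le C\, e^{-M^2 \varepsilon^{3}/C}, \quad 0 < \varepsilon \le a(\rho)$
Wishart matrices

B. Rider, M. L. (2010)

SLIDE 175

smallest eigenvalue
hard edge: $M = N$, $\rho = 1$
$a(\rho) = (1 - \sqrt{\rho})^2 = 0$

SLIDE 176

smallest eigenvalue
hard edge: $M = N$, $\rho = 1$
$a(\rho) = (1 - \sqrt{\rho})^2 = 0$
$P\big( \hat\lambda_1^N \le \frac{\varepsilon}{N^2} \big) \le C \sqrt{\varepsilon} + C\, e^{-cN}$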

large families of covariance matrices

M. Rudelson, R. Vershynin (2008-10)
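
The hard-edge bound can be compared with a simulation for square real Gaussian matrices. In the sketch below, the matrix size $N = 50$, the number of trials and the values of $\varepsilon$ are assumptions made for illustration, and `hard_edge_probability` is a hypothetical helper. The slide's result covers much larger families of matrices than the Gaussian case simulated here.

```python
import numpy as np

def hard_edge_probability(eps, N=50, trials=2000, seed=0):
    """Empirical P(lambda_hat_1 <= eps / N^2) for N x N standard Gaussian Y."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((trials, N, N))
    # Singular values come back in decreasing order, so the last one is the smallest.
    s_min = np.linalg.svd(Y, compute_uv=False)[:, -1]
    lam_hat_min = s_min**2 / N            # smallest eigenvalue of (1/N) Y Y^t
    return float(np.mean(lam_hat_min <= eps / N**2))

hard_edge = {eps: hard_edge_probability(eps) for eps in (0.01, 0.04, 0.16)}
for eps, p in hard_edge.items():
    print(f"eps = {eps:.2f}: empirical = {p:.4f}, sqrt(eps) = {np.sqrt(eps):.4f}")
```

In this Gaussian experiment the empirical probabilities track $\sqrt{\varepsilon}$ for small $\varepsilon$, matching the $C \sqrt{\varepsilon}$ term; the exponentially small term $C e^{-cN}$ is not visible at this scale.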