
SLIDE 1

Self-bounding functions and concentration of variance

Andreas Maurer Advances in stochastic inequalities and their applications, BIRS 2009

SLIDE 2

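Notation and definitions

$\Omega := \prod_{k=1}^{n} \Omega_k$ is some product space with product probability $\mu = \otimes_{k=1}^{n} \mu_k$.

For $x \in \Omega$ and $y \in \Omega_k$ write $x_{y,k} := (x_1, \dots, x_{k-1}, y, x_{k+1}, \dots, x_n)$.

$f : \Omega \to \mathbb{R}$ is some generic function, bounded below. For $1 \le k \le n$ define functions $\inf_k f,\ Df : \Omega \to \mathbb{R}$ by

$$\inf_k f(x) := \inf_{y \in \Omega_k} f(x_{y,k}), \qquad Df(x) := \sum_{k=1}^{n} \bigl(f(x) - \inf_k f(x)\bigr)^2 .$$

$Df$ is a local measure of the sensitivity of $f$ to modifications of individual arguments.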
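The two definitions can be made concrete on a small finite product space. The sketch below is only an illustration: the helper names `inf_k` and `D`, the binary coordinate spaces, and the choice of f as the number of ones are assumptions made for this example and do not come from the talk.

```python
def inf_k(f, x, k, omega_k):
    """Infimum over y in omega_k of f(x with coordinate k replaced by y)."""
    return min(f(x[:k] + (y,) + x[k + 1:]) for y in omega_k)


def D(f, x, omegas):
    """Df(x) = sum over k of (f(x) - inf_k f(x))^2."""
    fx = f(x)
    return sum((fx - inf_k(f, x, k, omegas[k])) ** 2 for k in range(len(x)))


# Hypothetical example: every coordinate space is {0, 1} and f counts the ones.
omegas = [(0, 1)] * 4
f = sum
x = (1, 0, 1, 1)

# Each coordinate equal to 1 can lower f by exactly 1, each 0 by nothing,
# so Df(x) equals the number of ones in x.
print(D(f, x, omegas))  # → 3
```

For this particular f one gets Df = f at every point, so it satisfies the condition Df ≤ af with a = 1.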

SLIDE 3

Theorem 1 (Boucheron, Lugosi, Massart (2003), Maurer (2006))

$$\Pr\{f - E[f] \ge t\} \le \exp\!\left(\frac{-t^2}{2\,\|Df\|_\infty}\right).$$

If also $\forall k$, $f - \inf_k f \le 1$ a.s., then

$$\Pr\{E[f] - f \ge t\} \le \exp\!\left(\frac{-t^2}{2\,\|Df\|_\infty + 2t/3}\right).$$

Applies to convex Lipschitz functions, eigenvalues of random symmetric matrices, shortest TSPs...
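As a sanity check on how Theorem 1 is used, here is a worked instance for a sum of bounded variables. The choice $f(x) = \sum_k x_k$ on $[0,1]^n$ is an assumption made for illustration and is not an example from the talk.

```latex
% Illustrative choice (not from the slides): f(x) = x_1 + ... + x_n on [0,1]^n.
\begin{align*}
\inf_k f(x) &= f(x) - x_k, \\
Df(x) &= \sum_{k=1}^{n} x_k^2 \le n, \\
\Pr\{f - E[f] \ge t\} &\le \exp\!\left(\frac{-t^2}{2n}\right), \\
\Pr\{E[f] - f \ge t\} &\le \exp\!\left(\frac{-t^2}{2n + 2t/3}\right)
  \quad\text{since } f - \inf_k f = x_k \le 1 .
\end{align*}
```

The upper-tail bound has the same sub-Gaussian shape as Hoeffding's inequality for this f, with a weaker constant in the exponent.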

SLIDE 4

Theorem 2 (Boucheron, Lugosi, Massart (2003), Maurer (2006))

Suppose $Df \le a f$ a.s., with $a > 0$. Then

$$\Pr\{f - E[f] \ge t\} \le \exp\!\left(\frac{-t^2}{2aE[f] + at}\right).$$

If also $\forall k$, $f - \inf_k f \le 1$ a.s. and $a \ge 1$, then

$$\Pr\{E[f] - f \ge t\} \le \exp\!\left(\frac{-t^2}{2aE[f]}\right).$$

This talk is about applications of this result.
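To get a feel for the size of the two bounds in Theorem 2, they can be evaluated numerically. This is a minimal sketch: the function names `upper_tail` and `lower_tail` and the numbers in the example are assumptions for illustration only.

```python
import math


def upper_tail(t, a, mean_f):
    """Bound on Pr{f - E[f] >= t} when Df <= a*f: exp(-t^2 / (2 a E[f] + a t))."""
    return math.exp(-t * t / (2 * a * mean_f + a * t))


def lower_tail(t, a, mean_f):
    """Bound on Pr{E[f] - f >= t}: exp(-t^2 / (2 a E[f])).

    Only valid under the extra conditions f - inf_k f <= 1 and a >= 1.
    """
    if a < 1:
        raise ValueError("the lower-tail bound requires a >= 1")
    return math.exp(-t * t / (2 * a * mean_f))


# Hypothetical numbers: a = 1, E[f] = 50, deviation t = 10.
print(upper_tail(10, 1, 50), lower_tail(10, 1, 50))
```

The lower-tail bound is purely sub-Gaussian, while the upper-tail bound carries the extra `a t` term in the denominator and is therefore slightly weaker at the same deviation.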

SLIDE 5

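Application 1

Amendment to Theorem 1, idea from Boucheron, Lugosi, Massart (2009). If $f \ge 0$ and $f^2 - \inf_k f^2 \le 1$, then

$$\Pr\{E[f] - f \ge t\} \le \exp\!\left(\frac{-t^2}{8\,\|Df\|_\infty}\right).$$

Proof:

$$D(f^2) = \sum_k \bigl(f^2 - \inf_k f^2\bigr)^2 = \sum_k \bigl(f - \inf_k f\bigr)^2 \bigl(f + \inf_k f\bigr)^2 \le (Df)\,(2f)^2 \le 4\,\|Df\|_\infty\, f^2,$$

so by Theorem 2 applied to $f^2$

$$\Pr\{E[f] - f \ge t\} \le \Pr\bigl\{E[f^2] - f^2 \ge E[f]\,t\bigr\} \le \exp\!\left(\frac{-t^2}{8\,\|Df\|_\infty}\right).$$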
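The last line of the proof compresses a comparison between deviations of $f$ and deviations of $f^2$. One way to fill it in is sketched below. It measures the deviation against $m := \sqrt{E[f^2]}$ rather than $E[f]$, which is a choice made for this sketch and is not taken from the slide, and it assumes $4\|Df\|_\infty \ge 1$ so that the lower-tail part of Theorem 2 applies to $f^2$ with $a = 4\|Df\|_\infty$.

```latex
% Sketch only. m := sqrt(E[f^2]) >= E[f] by Jensen; m is notation of this sketch.
% For t > m the event {E[f] - f >= t} is empty because f >= 0, so take 0 < t <= m.
\begin{align*}
E[f] - f \ge t
  &\;\Longrightarrow\; 0 \le f \le m - t \\
  &\;\Longrightarrow\; f^2 \le m^2 - 2mt + t^2 \\
  &\;\Longrightarrow\; E[f^2] - f^2 \ge t\,(2m - t) \ge t\,m .
\end{align*}
% Theorem 2 for f^2 with a = 4 ||Df||_inf, using E[f^2] = m^2:
\[
\Pr\{E[f] - f \ge t\}
  \le \Pr\{E[f^2] - f^2 \ge t\,m\}
  \le \exp\!\left(\frac{-t^2 m^2}{2 \cdot 4\,\|Df\|_\infty\, m^2}\right)
  = \exp\!\left(\frac{-t^2}{8\,\|Df\|_\infty}\right).
\]
```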

SLIDE 6

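Application 2 (with Massi Pontil for COLT09):

$X, X_1, \dots, X_n$ iid r.v. with values in $[0, 1]$. Want to give bounds on $EX$ in terms of $\mathbf{X} = (X_1, \dots, X_n)$ with high confidence $1 - \delta$. Below, $\bar{X}$ is the sample mean and $V$ is the variance of $X$.

Hoeffding:

$$\Pr\left\{ EX - \bar{X} \le \sqrt{\frac{\ln 1/\delta}{2n}} \right\} \ge 1 - \delta .$$

Bernstein/Bennett:

$$\Pr\left\{ EX - \bar{X} \le \sqrt{V}\,\sqrt{\frac{2 \ln 1/\delta}{n}} + \frac{\ln 1/\delta}{3n} \right\} \ge 1 - \delta .$$

To use Bernstein without other information we need a bound on the standard deviation $\sqrt{V}$ in terms of the sample.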
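The difference between the Hoeffding and Bernstein/Bennett bounds is easiest to see numerically. The sketch below evaluates both confidence radii; the function names and the sample size, confidence level and variances in the example are assumptions chosen for illustration.

```python
import math


def hoeffding_radius(n, delta):
    """sqrt(ln(1/delta) / (2n)): bound on EX minus the sample mean, variables in [0, 1]."""
    return math.sqrt(math.log(1 / delta) / (2 * n))


def bernstein_radius(n, delta, variance):
    """sqrt(2 V ln(1/delta) / n) + ln(1/delta) / (3n), with V the true variance."""
    log_term = math.log(1 / delta)
    return math.sqrt(2 * variance * log_term / n) + log_term / (3 * n)


# Hypothetical setting: n = 1000 samples, confidence 95%.
n, delta = 1000, 0.05
for v in (0.25, 0.01):
    print(v, hoeffding_radius(n, delta), bernstein_radius(n, delta, v))
```

At the largest possible variance V = 1/4 the first Bernstein term coincides with the Hoeffding radius, so Bernstein is slightly worse there; for small V it is much tighter, which is why a data-dependent bound on the standard deviation is worth having.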

SLIDE 7

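Estimators for variance and standard deviation

For the variance use the sample variance $\hat{V}$:

$$\hat{V}(x) = \frac{1}{2n(n-1)} \sum_{i,j} (x_i - x_j)^2 \quad \text{for } x \in [0, 1]^n .$$

For the standard deviation we use $\sqrt{\hat{V}}$.

Then we can show this: $f := n\hat{V}$ satisfies

$$f - \inf_k f \le 1 \quad\text{and}\quad Df \le \frac{n}{n-1}\, f,$$

and Theorem 2 gives the lower tail bounds

$$\Pr\bigl\{V - \hat{V} > t\bigr\} \le \exp\!\left(\frac{-(n-1)\,t^2}{2V}\right) \quad\text{and}\quad \Pr\Bigl\{\sqrt{V} - \sqrt{\hat{V}} > t\Bigr\} \le \exp\!\left(\frac{-(n-1)\,t^2}{2}\right).$$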
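The two properties claimed for f = n·V̂ can be checked numerically on a random sample. This is a sketch under one stated assumption: since f is a convex quadratic in each coordinate, the infimum over coordinate k is taken at the mean of the remaining coordinates, which lies in [0, 1]. All helper names are hypothetical.

```python
import random


def sample_variance(xs):
    """V_hat(x) = 1 / (2n(n-1)) * sum over all pairs (i, j) of (x_i - x_j)^2."""
    n = len(xs)
    return sum((a - b) ** 2 for a in xs for b in xs) / (2 * n * (n - 1))


def f(xs):
    """f = n * V_hat, the function to which Theorem 2 is applied."""
    return len(xs) * sample_variance(xs)


def inf_k_f(xs, k):
    """inf over y in [0, 1] of f with x_k replaced by y.

    Assumption of this sketch: the minimiser is the mean of the other
    coordinates, because f is a convex quadratic in x_k.
    """
    others = xs[:k] + xs[k + 1:]
    y = sum(others) / len(others)
    return f(xs[:k] + [y] + xs[k + 1:])


def D_f(xs):
    """Df(x) = sum over k of (f(x) - inf_k f(x))^2."""
    fx = f(xs)
    return sum((fx - inf_k_f(xs, k)) ** 2 for k in range(len(xs)))


random.seed(0)
xs = [random.random() for _ in range(20)]
n = len(xs)
gaps = [f(xs) - inf_k_f(xs, k) for k in range(n)]

# Both checks should print True if the claimed properties hold on this sample.
print(max(gaps) <= 1.0, D_f(xs) <= n / (n - 1) * f(xs))
```

The pairwise formula agrees with the usual unbiased estimator with denominator n − 1, so a library routine for the sample variance can be used in its place.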

SLIDE 8

Other methods to get such bounds

Audibert, Munos, Szepesvári (2007): Apply Bernstein-like bounds to $X_i$, $-X_i$ and $(X_i - EX)^2$ respectively, combine to get

$$\Pr\Bigl\{\sqrt{V} - \sqrt{\hat{V}_{\mathrm{emp}}} > t\Bigr\} \le 3 \exp\!\left(\frac{-n\,t^2}{3.24}\right),$$

where $\hat{V}_{\mathrm{emp}} = (n-1)\hat{V}/n$ (= variance of the empirical distribution).

Alternative: $\hat{V}$ is a U-statistic with kernel $q(x, x') = (x - x')^2/2$. Hoeffding's version of Bennett's inequality for U-statistics leads to

$$\Pr\Bigl\{\sqrt{V} - \sqrt{\hat{V}} > t\Bigr\} \le \exp\!\left(\frac{-(n-1)\,t^2}{2.62}\right).$$

SLIDE 9

Empirical Bernstein bounds

Substitution of the above in Bernstein's inequality gives the empirical version:

$$\Pr\left\{ EX - \bar{X} \le \sqrt{\hat{V}}\,\sqrt{\frac{2 \ln 2/\delta}{n}} + \frac{7 \ln 2/\delta}{3(n-1)} \right\} \ge 1 - \delta .$$

Applications: Multi-armed bandit problem (Audibert, Munos, Szepesvári, 2007), stopping algorithms (Mnih, Szepesvári, Audibert, 2008), sample variance penalization (Pontil, Maurer, 2009).
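The empirical Bernstein bound needs only the sample, so it is straightforward to compute. The function below is a minimal sketch of the upper confidence bound on EX for data in [0, 1]; the name `empirical_bernstein_upper` and the example data are assumptions for illustration.

```python
import math
import statistics


def empirical_bernstein_upper(xs, delta):
    """Upper bound on EX that holds with probability at least 1 - delta.

    Computes mean + sqrt(2 V_hat ln(2/delta) / n) + 7 ln(2/delta) / (3 (n - 1))
    for observations xs assumed to lie in [0, 1].
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(xs)
    v_hat = statistics.variance(xs)  # unbiased sample variance, denominator n - 1
    log_term = math.log(2 / delta)
    return mean + math.sqrt(2 * v_hat * log_term / n) + 7 * log_term / (3 * (n - 1))


# Hypothetical low-variance sample: values clustered near 0.5.
xs = [0.5 + 0.01 * ((i % 3) - 1) for i in range(300)]
print(empirical_bernstein_upper(xs, 0.05))
```

When the sample variance is small, the bound is dominated by the 7 ln(2/δ) / (3(n − 1)) term, which decays like 1/n rather than 1/√n.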

SLIDE 10

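Application 3 (Largest eigenvalue of the Gramian):

$X = (X_1, \dots, X_n)$ indep. r.v. distributed in the unit ball $B$ of a Hilbert space $H$.

$G(x)_{ij} = \langle x_i, x_j \rangle$, $f(x) = \lambda_{\max}(x)$ = largest eigenvalue of $G(x)$.

By Weyl's monotonicity $\inf_k f(x) = f(x_{0,k})$.

Also $\exists u \in \mathbb{R}^n$, $\|u\|_{\mathbb{R}^n} = 1$, such that

$$f(x) - f(x_{0,k}) \le \Bigl\|\sum_i u_i x_i\Bigr\|^2 - \Bigl\|\sum_{i \ne k} u_i x_i\Bigr\|^2 = \Bigl\langle u_k x_k,\ \sum_i u_i x_i + \sum_{i \ne k} u_i x_i \Bigr\rangle \le 2\,|u_k|\,\Bigl\|\sum_i u_i x_i\Bigr\| = 2\,|u_k|\,\sqrt{f(x)} .$$

Conclusion 1: $f - \inf_k f \le 1$.

Conclusion 2: Square and sum over $k$ to get $Df \le 4f$.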
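Both conclusions can be checked numerically in a small case. The sketch below takes H = R², which is an assumption made so that the largest eigenvalue has a closed form: the n × n Gramian shares its nonzero eigenvalues with the 2 × 2 matrix of second moments. The helper names are hypothetical.

```python
import math
import random


def lambda_max(points):
    """Largest eigenvalue of the Gramian of points in R^2.

    The n x n Gramian X X^T has the same nonzero eigenvalues as the
    2 x 2 matrix X^T X = [[a, b], [b, c]], whose top eigenvalue is explicit.
    """
    a = sum(p[0] * p[0] for p in points)
    b = sum(p[0] * p[1] for p in points)
    c = sum(p[1] * p[1] for p in points)
    return (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)


def random_point_in_disc(rng):
    """Rejection-sample a point from the unit disc."""
    while True:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if p[0] ** 2 + p[1] ** 2 <= 1:
            return p


rng = random.Random(0)
xs = [random_point_in_disc(rng) for _ in range(30)]
fx = lambda_max(xs)

# inf_k f(x) = f(x_{0,k}): replace the k-th point by the zero vector.
gaps = [fx - lambda_max(xs[:k] + [(0.0, 0.0)] + xs[k + 1:]) for k in range(len(xs))]
Df = sum(g * g for g in gaps)

# Both checks should print True: f - inf_k f <= 1 and Df <= 4 f.
print(max(gaps) <= 1.0, Df <= 4 * fx)
```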

SLIDE 11

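Application 3 (Largest eigenvalue of the Gramian):

$X = (X_1, \dots, X_n)$ indep. r.v. distributed in the unit ball $B$ of a Hilbert space $H$.

$G(x)_{ij} = \langle x_i, x_j \rangle$, $f(x) = \lambda_{\max}(x)$ = largest eigenvalue of $G(x)$.

From Theorem 2 we get

$$\Pr\{\lambda_{\max} - E\lambda_{\max} > t\} \le \exp\!\left(\frac{-t^2}{8E\lambda_{\max} + 4t}\right),$$

$$\Pr\{E\lambda_{\max} - \lambda_{\max} > t\} \le \exp\!\left(\frac{-t^2}{8E\lambda_{\max}}\right).$$

For the largest singular value $\sigma_{\max}$ of the matrix $X$ we get

$$\Pr\{\pm(\sigma_{\max} - E\sigma_{\max}) > t\} \le e^{-t^2/8} .$$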

SLIDE 12

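Another result related to self-bounded functions:

Theorem 3. Suppose $f, g : \Omega \to \mathbb{R}$, $0 \le f \le g$, $Df \le ag$, $Dg \le ag$ and $a \ge 1$. Then

$$\Pr\{f - Ef > t\} \le \exp\!\left(\frac{-t^2}{4aEg + 3at/2}\right).$$

If also $f - \inf_k f \le 1$, then

$$\Pr\{Ef - f > t\} \le \exp\!\left(\frac{-t^2}{4aEg + at}\right).$$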

SLIDE 13

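Application 4 (any eigenvalue of the Gramian)

$X = (X_1, \dots, X_n)$ indep. r.v. distributed in the unit ball $B$ of a Hilbert space $H$.

$G(X)_{ij} = \langle X_i, X_j \rangle$, now let $\lambda_d(X)$ be any eigenvalue of $G(X)$.

Set $f := \lambda_d/2$ and $g := \lambda_{\max}/2$. We can show

$$0 \le f \le g, \quad f - \inf_k f \le 1, \quad Df \le 2g \quad\text{and}\quad Dg \le 2g .$$

Applying Theorem 3 gives

$$\Pr\{\lambda_d - E\lambda_d > t\} \le \exp\!\left(\frac{-t^2}{16E\lambda_{\max} + 6t}\right),$$

$$\Pr\{E\lambda_d - \lambda_d > t\} \le \exp\!\left(\frac{-t^2}{16E\lambda_{\max} + 4t}\right).$$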
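The constants 16, 6 and 4 come from rescaling the deviation when Theorem 3 is applied to the halved functions. The arithmetic is written out below, with $\lambda_d$ denoting the chosen eigenvalue and $\lambda_{\max}$ the largest one; the only inputs are $f = \lambda_d/2$, $g = \lambda_{\max}/2$ and $a = 2$ from the slide.

```latex
% Theorem 3 with f = \lambda_d / 2, g = \lambda_{\max} / 2, a = 2, at deviation t/2.
\begin{align*}
\Pr\{\lambda_d - E\lambda_d > t\}
  &= \Pr\{f - Ef > t/2\} \\
  &\le \exp\!\left(\frac{-(t/2)^2}{4 \cdot 2 \cdot E\lambda_{\max}/2 + 3 \cdot 2 \cdot (t/2)/2}\right)
   = \exp\!\left(\frac{-t^2/4}{4E\lambda_{\max} + 3t/2}\right)
   = \exp\!\left(\frac{-t^2}{16E\lambda_{\max} + 6t}\right), \\[4pt]
\Pr\{E\lambda_d - \lambda_d > t\}
  &= \Pr\{Ef - f > t/2\} \\
  &\le \exp\!\left(\frac{-(t/2)^2}{4 \cdot 2 \cdot E\lambda_{\max}/2 + 2 \cdot (t/2)}\right)
   = \exp\!\left(\frac{-t^2/4}{4E\lambda_{\max} + t}\right)
   = \exp\!\left(\frac{-t^2}{16E\lambda_{\max} + 4t}\right).
\end{align*}
```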

SLIDE 14

References

[1] J. Y. Audibert, R. Munos, C. Szepesvári. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 2008.

[2] S. Boucheron, G. Lugosi, P. Massart. Concentration inequalities using the entropy method. Annals of Probability 31 (2003) 1583-1614.

[3] M. Ledoux. The Concentration of Measure Phenomenon. AMS Surveys and Monographs 89 (2001).

[4] A. Maurer. Concentration inequalities for functions of independent variables. Random Structures & Algorithms 29 (2006) 121-138.

SLIDE 15

[5] V. Mnih, C. Szepesvári, J. Y. Audibert. Empirical Bernstein Stopping. ICML 2008.