Graphon Estimation: Minimax Rates and Posterior Contraction. Chao Gao (PowerPoint presentation)



SLIDE 1

Graphon Estimation: Minimax Rates and Posterior Contraction

Chao Gao Yale University

@Leiden, March 2015

SLIDE 2

Stochastic Block Model

z : {1, 2, ..., n} → {1, 2, ..., k}

A_ij ~ Bernoulli(θ_ij)

θ_ij = Q_{z(i)z(j)}

Goal: recover θ_ij
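The generative model above is easy to simulate; a minimal sketch (the choice of Q and the random labels are illustrative):

```python
import numpy as np

def sample_sbm(n, k, Q, rng):
    """Sample an adjacency matrix A from a stochastic block model."""
    z = rng.integers(0, k, size=n)        # labels z : {1,...,n} -> {1,...,k}
    theta = Q[np.ix_(z, z)]               # theta_ij = Q_{z(i) z(j)}
    A = (rng.random((n, n)) < theta).astype(int)   # A_ij ~ Bernoulli(theta_ij)
    return A, theta, z

rng = np.random.default_rng(0)
Q = np.array([[0.8, 0.1],                 # within/between-block probabilities
              [0.1, 0.7]])
A, theta, z = sample_sbm(50, 2, Q, rng)
```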

SLIDE 3

Biclustering (Hartigan, 1972)

z1 : {1, 2, ..., n} → {1, 2, ..., k},  z2 : {1, 2, ..., m} → {1, 2, ..., l}

E(A_ij) = θ_ij = Q_{z1(i)z2(j)}

Goal: recover θ_ij

SLIDE 4

Nonparametric Regression

y_i = f(x_i) + ε_i,  x_i ∈ D,  ε_i ~ N(0, 1)

Common assumption: f is smooth on D.

Goal: recover f from both x and y

SLIDE 5

A More Challenging Problem

y_i = f(x_i) + ε_i,  x_i ∈ D,  ε_i ~ N(0, 1)

Common assumption: f is smooth on D.

Goal: recover f from only y

SLIDE 6

SLIDE 7
  • 1D Problem
  • 2D Problem
  • Minimax Rate for Stochastic Block Model
  • Minimax Rate for Graphon Estimation
  • Adaptive Bayes Estimation
SLIDE 8

1D Problem

y_i = f(x_i) + ε_i,  x_i = i/n,  i = 1, 2, ..., n

F = { f : f(x) = q1 for x ∈ (0, 1/2],  f(x) = q2 for x ∈ (1/2, 1] }

inf_{f̂} sup_{f∈F} E[ (1/n) Σ_{i=1}^n (f̂(x_i) − f(x_i))² ] ≍ 1/n.

SLIDE 9

1D Problem

y_i = f(x_i) + ε_i,  x_i = i/n,  i = 1, 2, ..., n

Without observing x, the problem is equivalent to

y_i = θ_i + ε_i,  Θ = { θ : half of the θ_i equal q1, half equal q2 },

inf_{θ̂} sup_{θ∈Θ} E[ (1/n) Σ_{i=1}^n (θ̂_i − θ_i)² ] ≍ 1.
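A quick simulation illustrates why the rate is constant without the design: even an oracle that knows q1 and q2 can only classify each coordinate from its own noisy observation, so the per-coordinate error does not shrink with n (the levels q1, q2 and the nearest-level classifier below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
q1, q2 = 0.0, 1.0                         # the two (illustrative) levels

def mse_without_design(n):
    # half the theta_i equal q1, half equal q2, in an unknown order
    theta = rng.permutation(np.r_[np.full(n // 2, q1), np.full(n // 2, q2)])
    y = theta + rng.normal(size=n)        # y_i = theta_i + eps_i
    # even knowing q1 and q2, each coordinate can only be classified from y_i
    theta_hat = np.where(np.abs(y - q1) < np.abs(y - q2), q1, q2)
    return np.mean((theta_hat - theta) ** 2)

# the per-coordinate error stays bounded away from zero as n grows
print([round(mse_without_design(n), 3) for n in (100, 1000, 10000)])
```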

SLIDE 10

2D Problem

y_ij = f(ξ_i, ξ_j) + ε_ij,  ξ_i = i/n,  i, j = 1, 2, ..., n

F collects f such that

f(x, y) = q1 on [0, 1/2) × [0, 1/2),  q2 on [0, 1/2) × [1/2, 1],
          q3 on [1/2, 1] × [0, 1/2),  q4 on [1/2, 1] × [1/2, 1].

SLIDE 11

2D Problem

With the design known,

inf_{f̂} sup_{f∈F} E[ (1/n²) Σ_{1≤i,j≤n} (f̂(ξ_i, ξ_j) − f(ξ_i, ξ_j))² ] ≍ 1/n².

How about without knowing the design?

inf_{f̂} sup_{f∈F} E[ (1/n²) Σ_{1≤i,j≤n} (f̂(ξ_i, ξ_j) − f(ξ_i, ξ_j))² ] ≍ 1/n.

SLIDE 12

2D Problem

Let θ_ij = f(ξ_i, ξ_j). Does θ_ij have any structure?

{θ_i1, θ_i2, ..., θ_in} are from the same row for each i.
{θ_1j, θ_2j, ..., θ_nj} are from the same column for each j.

SLIDE 13

2D Problem

y_ij = f(ξ_ij) + ε_ij,  ξ_ij ∈ [0, 1]²,  i, j = 1, 2, ..., n

Without knowing the design,

inf_{f̂} sup_{f∈F} E[ (1/n²) Σ_{1≤i,j≤n} (f̂(ξ_ij) − f(ξ_ij))² ] ≍ 1.

SLIDE 14

Stochastic Block Model

A_ij ~ Bernoulli(θ_ij)

Θ_2 = { θ : θ_ij = Q_{z(i)z(j)}, with z : [n] → [2] }

inf_{θ̂} sup_{θ∈Θ_2} E[ (1/n²) Σ_{1≤i,j≤n} (θ̂_ij − θ_ij)² ] ≍ 1/n.

SLIDE 15

Stochastic Block Model

A_ij ~ Bernoulli(θ_ij)

Θ_k = { θ : θ_ij = Q_{z(i)z(j)}, with z : [n] → [k] }

Theorem 1.1. Under the stochastic block model, we have

inf_{θ̂} sup_{θ∈Θ_k} E[ (1/n²) Σ_{i,j∈[n]} (θ̂_ij − θ_ij)² ] ≍ k²/n² + (log k)/n,  for any 1 ≤ k ≤ n.

SLIDE 16

Stochastic Block Model

Let k ≍ n^δ, for δ ∈ [0, 1]. Then

k²/n² + (log k)/n ≍  n^{−2}         if δ = 0, k = 1,
                     n^{−1}         if δ = 0, k > 1,
                     n^{−1} log n   if δ ∈ (0, 1/2],
                     n^{−2(1−δ)}    if δ ∈ (1/2, 1].
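The crossover between the two terms of the rate can be checked numerically (the sample size n and the grid of δ values are illustrative):

```python
import numpy as np

n = 10**6                                 # illustrative sample size
for delta in (0.0, 0.25, 0.5, 0.75, 1.0):
    k = max(1, round(n**delta))
    nonpar, cluster = k**2 / n**2, np.log(k) / n   # the two terms of the rate
    dom = "k^2/n^2" if nonpar >= cluster else "log k / n"
    print(f"delta={delta:4.2f}  k={k:>7}  rate={nonpar + cluster:.3e}  dominant: {dom}")
```

The dominant term switches at δ = 1/2, matching the case split above.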

SLIDE 17

Graphon Estimation

Theorem (Aldous–Hoover). A random array {A_ij} is jointly exchangeable, in the sense that {A_ij} =_d {A_σ(i)σ(j)} for every permutation σ, if and only if it can be represented as follows: there is a random function F : [0, 1]³ → R such that

A_ij =_d F(ξ_i, ξ_j, ξ_ij),

where {ξ_i} and {ξ_ij} are i.i.d. Unif[0, 1].

SLIDE 18

Graphon Estimation

When the graph is undirected and has no self-loops,

A_ij | ξ_i, ξ_j ~ Bernoulli(θ_ij),  θ_ij = f(ξ_i, ξ_j),

ξ_i ~ Unif(0, 1) i.i.d.

Goal: recover f.
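This sampling scheme translates directly into code; a sketch with an illustrative smooth graphon f:

```python
import numpy as np

def sample_graphon_graph(n, f, rng):
    """Sample an undirected graph with no self-loops from graphon f."""
    xi = rng.uniform(size=n)              # latent xi_i, i.i.d. Unif(0,1)
    theta = f(xi[:, None], xi[None, :])   # theta_ij = f(xi_i, xi_j)
    A = np.triu((rng.random((n, n)) < theta).astype(int), 1)
    return A + A.T, theta, xi             # symmetrize; diagonal stays zero

f = lambda x, y: (x + y) / 2              # an illustrative smooth graphon
A, theta, xi = sample_graphon_graph(100, f, np.random.default_rng(2))
```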

SLIDE 19

Graphon Estimation

A_ij | ξ_i, ξ_j ~ Bernoulli(θ_ij),  θ_ij = f(ξ_i, ξ_j),  (ξ_1, ..., ξ_n) ~ P_ξ

Assumption: f ∈ F_α(M).

Theorem 1.2. Consider the Hölder class F_α(M), defined in Section 2.3. We have

inf_{θ̂} sup_{f∈F_α(M)} sup_{ξ∼P_ξ} E[ (1/n²) Σ_{i,j∈[n]} (θ̂_ij − θ_ij)² ] ≍  n^{−2α/(α+1)}   for 0 < α < 1,
                                                                               (log n)/n      for α ≥ 1.

The expectation is jointly over {A_ij} and {ξ_i}.

SLIDE 20

Graphon Estimation

Proof: balance the approximation error against the estimation error of the best k-block model,

min_k { 1/k^{2α} + k²/n² + (log k)/n }.
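Minimizing this bound over k numerically recovers the rates in Theorem 1.2: for α < 1 the optimal k scales like n^{1/(α+1)} (the values of n and α below are illustrative):

```python
import numpy as np

def best_k(n, alpha):
    # approximation error 1/k^(2 alpha) plus SBM error k^2/n^2 + log(k)/n
    ks = np.arange(1, n + 1, dtype=float)
    vals = ks**(-2 * alpha) + ks**2 / n**2 + np.log(ks) / n
    return int(ks[np.argmin(vals)])

n = 10**4
for alpha in (0.25, 0.5, 0.75):
    print(alpha, best_k(n, alpha), round(n**(1 / (alpha + 1))))
```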

SLIDE 21

Lower Bound Proof

When 1 < k ≤ O(1), the minimax rate is 1/n. It is sufficient to prove the lower bound for k = 2.

SLIDE 22

Lower Bound Proof

Proposition (Fano). Let (Θ, ρ) be a metric space and {P_θ : θ ∈ Θ} a collection of probability measures. For any T ⊂ Θ, denote by M(ε, T, ρ) the ε-packing number of T w.r.t. ρ. Define the KL diameter of T by d_KL(T) = sup_{θ,θ′∈T} D(P_θ ∥ P_θ′). Then

inf_{θ̂} sup_{θ∈Θ} E_θ[ ρ²(θ̂(X), θ) ] ≥ sup_{ε>0} (ε²/4) ( 1 − (d_KL(T) + log 2) / log M(ε, T, ρ) ).

SLIDE 23

Lower Bound Proof

  • Construct a subset
  • Upper bound the KL diameter
  • Lower bound the packing number

SLIDE 24

Lower Bound Proof

T = { θ ∈ [0, 1]^{n×n} : θ_ij = 1/2 for (i, j) ∈ (S × S) ∪ (S^c × S^c),
      θ_ij = 1/2 + c/√n for (i, j) ∈ (S × S^c) ∪ (S^c × S), with some S ∈ 𝒮 }.

[Figure: 2 × 2 block picture of θ, with value 1/2 on the diagonal blocks S × S and S^c × S^c, and value 1/2 + c/√n on the off-diagonal blocks.]

SLIDE 25

Lower Bound Proof

For θ = θ(S) and θ′ = θ(S′) in T,

ρ²(θ, θ′) = (1/n²) Σ_{1≤i,j≤n} (θ_ij − θ′_ij)² = (2c²/n) · (|I_S − I_{S′}|/n) · ((n − |I_S − I_{S′}|)/n).

SLIDE 26

Lower Bound Proof

Upper bound the KL diameter: since T ⊂ Θ_2,

sup_{θ,θ′∈T} D(P_θ ∥ P_θ′) ≤ sup_{θ,θ′∈T} 8‖θ − θ′‖² ≤ 8c²n.

SLIDE 27

Lower Bound Proof

Lower bound the packing number: regard |I_S − I_{S′}| as the Hamming distance between the indicator vectors of S and S′. Pick subsets S_1, ..., S_N such that

(1/4) n ≤ |I_{S_a} − I_{S_b}| ≤ (3/4) n  for all a ≠ b,

so that

ρ²(θ, θ′) = (2c²/n) · (|I_S − I_{S′}|/n) · ((n − |I_S − I_{S′}|)/n) ≥ c²/(8n) =: ε².

Then M(ε, T, ρ) ≥ N ≥ exp(c₁n).
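A random choice of subsets achieves this separation with high probability, which is the standard Varshamov-Gilbert-type argument; a quick check (n and N are illustrative):

```python
import numpy as np

# Random subsets of [n] are, with high probability, pairwise separated in
# Hamming distance: a Varshamov-Gilbert-style packing (sizes illustrative).
n, N = 200, 30
rng = np.random.default_rng(4)
subsets = rng.integers(0, 2, size=(N, n))     # indicator vectors of S_1..S_N
dists = [int(np.sum(subsets[a] != subsets[b]))
         for a in range(N) for b in range(a + 1, N)]
assert all(n / 4 <= d <= 3 * n / 4 for d in dists)
print(f"{len(dists)} pairs, distances in [{min(dists)}, {max(dists)}]")
```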

SLIDE 28

Lower Bound Proof

Combining the three steps with Fano's inequality,

inf_{θ̂} sup_{θ∈Θ_2} E[ (1/n²) Σ_{1≤i,j≤n} (θ̂_ij − θ_ij)² ] ≥ (c²/(32n)) ( 1 − (8c²n + log 2)/(c₁n) ) ≳ 1/n.

SLIDE 29

Upper Bound

Oracle solution: when the clustering z is known, the obvious estimator

θ̂_ij = (1/(|z⁻¹(a)||z⁻¹(b)|)) Σ_{(i′,j′)∈z⁻¹(a)×z⁻¹(b)} A_{i′j′},  for (i, j) ∈ z⁻¹(a) × z⁻¹(b),

achieves the rate ‖θ̂ − θ‖²_F ≤ O_P(k²).

SLIDE 30

Upper Bound

An equivalent form (least squares): fixing the known z, solve

min_θ ‖A − θ‖²_F  s.t. θ_ij = Q_{z(i)z(j)} for some Q = Qᵀ ∈ [0, 1]^{k×k}.

A natural estimator: solve

min_θ ‖A − θ‖²_F  s.t. θ_ij = Q_{z(i)z(j)} for some Q = Qᵀ ∈ [0, 1]^{k×k} and some z : {1, 2, ..., n} → {1, 2, ..., k}.

It achieves ‖θ̂ − θ‖²_F ≤ O_P(k² + n log k).
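The natural estimator is defined by an exhaustive search over all assignments z, not by an efficient algorithm; a brute-force sketch for a tiny graph (block averaging gives the optimal fit for each fixed z, and the symmetry constraint on Q is relaxed here for simplicity):

```python
import itertools
import numpy as np

def least_squares_sbm(A, k):
    """Exhaustive least squares over all z : [n] -> [k] (tiny n only)."""
    n = A.shape[0]
    best_loss, best_theta = np.inf, None
    for z in itertools.product(range(k), repeat=n):
        z = np.array(z)
        theta = np.zeros_like(A, dtype=float)
        for a in range(k):
            for b in range(k):
                rows, cols = np.where(z == a)[0], np.where(z == b)[0]
                if len(rows) and len(cols):
                    theta[np.ix_(rows, cols)] = A[np.ix_(rows, cols)].mean()
        loss = np.sum((A - theta) ** 2)
        if loss < best_loss:
            best_loss, best_theta = loss, theta
    return best_theta

# noiseless sanity check: a true block matrix is fit exactly
z0 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Q = np.array([[0.9, 0.2], [0.2, 0.6]])
A = Q[np.ix_(z0, z0)]
assert np.allclose(least_squares_sbm(A, 2), A)
```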
SLIDE 31

Bayes Estimation

  1. Sample k ~ π, with π(k) ∝ exp(−D(k² + n log k)).
  2. Sample z uniformly from {z : [n] → [k]}.
  3. Sample Q ~ f.
  4. Let θ_ij = Q_{z(i)z(j)}.

What should the prior f on Q be?
SLIDE 32

Bayes Estimation

  1. Sample k ~ π, with π(k) ∝ exp(−D(k² + n log k)).
  2. Sample z uniformly from {z : [n] → [k]}.
  3. Sample Q ~ f.
  4. Let θ_ij = Q_{z(i)z(j)}.

A candidate prior on Q:

f(Q) = (1/2) (λ_k/√π)^{k²} (Γ(k²/2)/Γ(k²)) e^{−λ_k‖Q‖}.

SLIDE 33

Bayes Estimation

  1. Sample k ~ π.
  2. Sample z uniformly from {z : [n] → [k]}.
  3. Sample Q ~ f.
  4. Let θ_ij = Q_{z(i)z(j)}.

f(Q) = (1/2) (λ_k/√π)^{k²} (Γ(k²/2)/Γ(k²)) e^{−λ_k‖Q‖},

π(k) ∝ (Γ(k²)/Γ(k²/2)) exp(−D(k² + n log k)).
SLIDE 34

Bayes Estimation

  1. Sample k ~ π, with π(k) ∝ exp(−D(k² + n log k)).
  2. Sample z uniformly from {z : [n] → [k]}.
  3. Sample Q ~ f.
  4. Let θ_ij = Q_{z(i)z(j)}.

f(Q) = (1/2) (λ_k/√π)^{k²} e^{−λ_k‖Q‖}.
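Sampling from this hierarchy is straightforward: π(k) is a discrete distribution, z is uniform, and a density proportional to e^{−λ_k‖Q‖} is sampled as a uniform direction times a Gamma(k², 1/λ_k) radius. A sketch that omits the truncation of Q to symmetric matrices with entries in [0, 1] (the values of D, β, and the cap on k are illustrative):

```python
import numpy as np

def sample_prior(n, beta=1.0, D=1.0, kmax=10, rng=None):
    """One draw (k, z, Q, theta) from the hierarchical prior (a sketch)."""
    rng = rng or np.random.default_rng()
    # pi(k) ∝ exp(-D (k^2 + n log k)): strongly favors small k a priori
    ks = np.arange(1, kmax + 1)
    logw = -D * (ks**2 + n * np.log(ks))
    w = np.exp(logw - logw.max())
    k = int(rng.choice(ks, p=w / w.sum()))
    z = rng.integers(0, k, n)             # z uniform over maps [n] -> [k]
    lam = beta * n / k                    # lambda_k = beta * n / k
    d = k * k
    # density ∝ exp(-lam ||Q||): uniform direction times Gamma(d, 1/lam) radius
    g = rng.normal(size=d)
    Q = (rng.gamma(d, 1.0 / lam) * g / np.linalg.norm(g)).reshape(k, k)
    return k, z, Q, Q[np.ix_(z, z)]       # theta_ij = Q_{z(i) z(j)}

k, z, Q, theta = sample_prior(50, rng=np.random.default_rng(6))
```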

SLIDE 35

Bayes Estimation

Theorem 1.3. Consider λ_k = βn/k for some constant β > 0. Then

E_{θ*} Π( (1/n²) Σ_{i,j} (θ_ij − θ*_ij)² > M (k²/n² + (log k)/n) | A ) ≤ exp(−C₀(k² + n log k)),

for some constants M, C₀ > 0.

SLIDE 36

Reference

Gao, Chao, Yu Lu, and Harrison H. Zhou. "Rate-optimal Graphon Estimation." arXiv preprint arXiv:1410.5837 (2014).

SLIDE 37

Thank you