SLIDE 1

Information Recovery from Pairwise Measurements

A Shannon-Theoretic Approach

Yuxin Chen†, Changho Suh∗, Andrea Goldsmith†

Stanford University† KAIST∗

Page 1

SLIDE 2

Recovering data from correlation measurements

  • A large collection of data instances
  • In many applications, it is
    – difficult/infeasible to measure each variable directly
    – feasible to measure pairwise correlations

Page 2

SLIDE 3

Motivating application: multi-image alignment

  • Structure from motion: estimate 3D structures from 2D image sequences
  • Key step: joint alignment

– input: (noisy) estimates of relative camera poses
– goal: jointly recover all camera poses

Page 3

SLIDE 4

Motivating application: graph clustering

  • Real-world networks exhibit community structures
  • input: pairwise similarities between members
  • goal: uncover hidden clusters

Page 4


SLIDE 8

This talk: recovery from pairwise difference measurements

  • Goal: recover a collection of variables {xi}
  • Can only measure several pairwise difference xi − xj (broadly defined)
  • Examples:

— joint alignment

– xi: (angle θi, position zi)
– relative rotation/translation: (θi − θj, zi − zj)

— graph partition

– xi: membership (which partition it belongs to)
– cluster agreement: xi − xj = 1, if i, j ∈ same partition; 0, else.

— pairwise maps, parity reads, ...

Page 5
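The examples above can be made concrete with a toy simulation; `pairwise_differences` is a hypothetical helper, and the mod-M convention (difference 0 for same-cluster pairs) is a relabeling of the slide's agreement variable:

```python
def pairwise_differences(x, M, edges):
    """Toy model of noiseless pairwise difference measurements.

    x: hidden variables in {0, ..., M-1}
    edges: (i, j) pairs where a measurement is taken
    Returns {(i, j): (x[i] - x[j]) mod M} -- the "truth" before noise.
    """
    return {(i, j): (x[i] - x[j]) % M for (i, j) in edges}

# Graph partition as a special case: M = 2, x_i = cluster membership.
# (x_i - x_j) mod 2 == 0 exactly when i and j are in the same cluster,
# which is the slide's agreement variable up to relabeling.
x = [0, 0, 1, 1]                 # two clusters: {0, 1} and {2, 3}
edges = [(0, 1), (1, 2), (2, 3)]
meas = pairwise_differences(x, M=2, edges=edges)
# meas == {(0, 1): 0, (1, 2): 1, (2, 3): 0}
```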


SLIDE 11

A fundamental-limit perspective?

  • A flurry of activity in recovery algorithm design

— convex programs, combinatorial methods, spectral methods

  • What are the fundamental recovery limits?

— minimum sample complexity? how noisy can the measurements be?

  • So far mostly studied in a model-specific manner
  • Seek a more unified framework

Page 6

SLIDE 12

Problem setup: a Shannon-theoretic framework

[figure: information network on vertices x1, · · · , x7]

  • Information network
  • n vertices
  • discrete inputs xi ∈ {0, · · · , M − 1}; alphabet size M could scale with n

Page 7

SLIDE 13

[figure: measurement graph G, with measurements y12, y26, y67, y27, y13, y34, y35, y15, y24 of x1 − x2, x1 − x3, x1 − x5, · · ·]

  • Pairwise difference measurements
  • truth: xi − xj
  • measurements: yij (arbitrary alphabet)
    ∗ can be corrupted by noise, distortion, ...
  • Graphical representation: observe yij ⇐⇒ (i, j) ∈ G

SLIDE 14

[figure: pairs (x1, x2), (x6, x7), (x1, x5), (x2, x7) each passed through the channel p(yij | xi − xj), producing y12, y67, y15, y27]

  • Channel-decoding perspective
  • each measurement is modeled by an i.i.d. channel
  • transition prob. P(yij | xi − xj)
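One minimal way to simulate this channel view; the `observe` helper and its uniform-corruption noise are illustrative assumptions (the framework allows arbitrary transition probabilities P(yij | xi − xj)):

```python
import random

def observe(x, edges, M, eps, rng):
    """Pass each true difference through an i.i.d. noisy channel:
    with prob. 1 - eps output the truth, otherwise a uniform symbol."""
    y = {}
    for (i, j) in edges:
        truth = (x[i] - x[j]) % M
        y[(i, j)] = truth if rng.random() > eps else rng.randrange(M)
    return y

rng = random.Random(0)
x = [0, 1, 2, 0]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
y = observe(x, edges, M=3, eps=0.0, rng=rng)   # noiseless channel
# With eps = 0, every measurement equals the true difference mod M.
```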

SLIDE 15

  • Goal: recover {xi} exactly (up to global offset)
  • Unified decoding framework
  • captures similarities among various applications

SLIDE 16

What factors dictate hardness of recovery?

[figure: x1, x2 linked by the channel; x1 − x2 = 1 ⇒ y12 ∼ P1, x1 − x2 = 2 ⇒ y12 ∼ P2, where Pl := P(yij | xi − xj = l)]

  • Channel distance/resolution, captured e.g. by
    – KL(Pl ‖ Pk)
    – Hellinger(Pl ‖ Pk)
    – ...

Page 8

SLIDE 17

  • Minimum channel distance/resolution:
    – KLmin := min_{l≠k} KL(Pl ‖ Pk)
    – Hellingermin := min_{l≠k} Hellinger(Pl ‖ Pk)
    – ...
  • Uncoded input
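These channel-distance quantities are easy to compute for a toy discrete channel. A sketch, assuming the normalization Hellinger(P ‖ Q) := Σi (√pi − √qi)², under which KL ≈ 2 · Hellinger for nearby distributions as the deck uses later:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (lists of probs)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_sq(p, q):
    """Squared Hellinger distance, normalized as sum_i (sqrt(p_i)-sqrt(q_i))^2."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

# P_l := P(y | x_i - x_j = l) for a toy 3-symbol channel
P = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

kl_min = min(kl(P[l], P[k]) for l in range(3) for k in range(3) if l != k)
hell_min = min(hellinger_sq(P[l], P[k]) for l in range(3) for k in range(3) if l != k)
```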

SLIDE 18

What factors dictate hardness of recovery?

[figure: measurement graph G]

  • Graph connectivity
    – impossible to recover isolated vertices
    – over-sparse connectivity is fragile
    – sufficient connectivity removes fragility!

Page 9
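The connectivity requirement can be checked mechanically. A minimal sketch (BFS reachability only; it tests the necessary connectivity condition from the slide, not the full sufficiency condition):

```python
from collections import defaultdict, deque

def is_connected(n, edges):
    """Necessary condition: the measurement graph must be connected.
    An isolated or disconnected vertex can never be recovered relative
    to the rest, since no pairwise difference ties it to the others."""
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = {0}
    q = deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return len(seen) == n

# Vertex 3 is isolated, so recovery is impossible:
# is_connected(4, [(0, 1), (1, 2), (0, 2)]) -> False
```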

SLIDE 21

Agenda

Page 10


SLIDE 25

Main result: Erdős–Rényi random graph

Erdős–Rényi graph G(n, pobs): each edge (i, j) is present independently w.p. pobs

[figure: sampled graphs for pobs = 1 and pobs = 0.3]

  • ML decoding works if

Hellingermin > (2 log n + 4 log M) / (pobs · n)

  • Converse: no method works if

KLmin < log n / (pobs · n)

  • Both bounds are non-asymptotic!

Page 11
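The two bounds can be evaluated numerically; a sketch, with `ml_achievable` and `converse_fails` as hypothetical helper names encoding the displayed conditions:

```python
import math

def ml_achievable(hell_min, n, M, p_obs):
    """Achievability (sufficient for ML decoding):
    Hellinger_min > (2 log n + 4 log M) / (p_obs * n)."""
    return hell_min > (2 * math.log(n) + 4 * math.log(M)) / (p_obs * n)

def converse_fails(kl_min, n, p_obs):
    """Converse: no method works if KL_min < log n / (p_obs * n)."""
    return kl_min < math.log(n) / (p_obs * n)

n, M, p_obs = 10_000, 2, 0.3
# threshold = (2 ln n + 4 ln 2) / (0.3 * 10000) ~= 0.0071
print(ml_achievable(0.05, n, M, p_obs))   # True: well above threshold
```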


SLIDE 28

Main result: Erdős–Rényi random graph

  • Recovery conditions
    – ML works if Hellingermin > (2 log n + 4 log M) / (pobs · n)
    – Impossible if Hellingermin < log n / (2 pobs · n)
  • In the hard regime where dPl/dPk ≈ 1:

KLmin ≈ 2 · Hellingermin

Page 12


SLIDE 30

Main result: Erdős–Rényi random graph

  • Fundamental recovery condition (assuming M ≲ poly(n)):

Hellingermin ≳ log n / (pobs · n) ⇐⇒ avg-degree × Hellingermin ≳ log n

  • In the hard regime where dPl/dPk ≈ 1: KLmin ≈ 2 · Hellingermin

Page 12
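The hard-regime approximation can be sanity-checked numerically, again assuming the normalization Hellinger(P ‖ Q) := Σi (√pi − √qi)²:

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_sq(p, q):
    # Normalization sum_i (sqrt(p_i) - sqrt(q_i))^2, under which
    # KL ~= 2 * Hellinger for nearby distributions (assumed convention).
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

# Hard regime: dP_l / dP_k ~= 1, i.e. the two channel output
# distributions are nearly indistinguishable.
delta = 1e-3
P1 = [0.5, 0.5]
P2 = [0.5 + delta, 0.5 - delta]
ratio = kl(P1, P2) / hellinger_sq(P1, P2)
# ratio -> 2 as delta -> 0
```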

SLIDE 31

Intuition

Fundamental recovery condition (Erdős–Rényi graphs): avg-degree × Hellingermin ≳ log n

[figure: the n × n matrix [xi − xj]1≤i,j≤n]

Page 13

SLIDE 32

  • hypotheses:
    – H0: x = [0, 0, · · · , 0]
    – H1: x = [1, 0, · · · , 0]
  • H0 and H1 differ only at the highlighted region of [xi − xj] (≈ avg-degree pieces of info)

SLIDE 33

  • hypotheses:
    – H0: x = [0, 0, · · · , 0]
    – H2: x = [0, 1, · · · , 0]
  • H0 and H2 differ only at the highlighted region (≈ avg-degree pieces of info)

SLIDE 34

  • hypotheses:
    – H0: x = [0, 0, · · · , 0]
    – Hn: x = [0, 0, · · · , 1]
  • n minimally-separated hypotheses ⇒ needs at least log n bits
  • the consequence of uncoded inputs

Page 13


SLIDE 36

Minimal sample complexity

Fundamental recovery condition (Erdős–Rényi graphs): avg-degree × Hellingermin ≳ log n

  • Sample complexity: n · avg-degree

Min sample complexity ≍ (n log n) / Hellingermin

Page 14
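The order-wise formula is easy to encode, with all constants suppressed (so only ratios between settings are meaningful):

```python
import math

def min_sample_complexity(n, hell_min):
    """Order-wise minimal number of pairwise measurements from the slide:
    n * avg-degree ~ n log n / Hellinger_min (constants suppressed)."""
    return n * math.log(n) / hell_min

# Noisier channels (smaller Hellinger_min) demand proportionally more samples:
m_clean = min_sample_complexity(10_000, 0.5)
m_noisy = min_sample_complexity(10_000, 0.05)
# m_noisy / m_clean == 10
```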

SLIDE 37

How general is this limit?

Fundamental recovery condition (Erdős–Rényi graphs): avg-degree × Hellingermin ≳ log n

  • Can we go beyond Erdős–Rényi graphs?

Page 15

SLIDE 39

Main results: homogeneous graphs

[figure: random geometric graph; (generalized) ring]

  • Homogeneous graphs:
    – min-degree ≍ max-degree ≍ mincut
    – balanced cut-set distributions

Fundamental recovery condition (various homogeneous graphs): avg-degree × Hellingermin ≳ log n

Page 16

SLIDE 40

  • The recovery limits depend almost only on graph sparsity


SLIDE 42

Main results: general graphs

[figure: example graph with mincut = 5]

  • Information across the minimum cut set: ≍ mincut · Hellingermin

Page 17

SLIDE 43

  • Recovery conditions
    – ML works if mincut · Hellingermin ≳ τcut + log n + log M
    – Impossible if mincut · Hellingermin ≲ τcut + (mincut / max-degree) · log n


SLIDE 45

Cut-homogeneity exponent

  • τcut captures¹
    – the growth rate of the cut-set distribution
    – the ratio mincut / avg-degree
  • In general: τcut ≳ 1
  • For homogeneous graphs: τcut ≲ log n

¹ τcut := max_k (1/k) · log N(k · mincut), where N(K) := |{cut : cut-size ≤ K}|

Page 18
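For a tiny graph, τcut can be brute-forced directly from the (reconstructed) footnote definition; `cut_sizes` is an illustrative helper, and counting cuts as nontrivial vertex bipartitions is an assumption:

```python
import itertools
import math

def cut_sizes(n, edges):
    """Enumerate all nontrivial vertex bipartitions (S, S^c) of a tiny
    graph and return each cut's size (number of crossing edges)."""
    sizes = []
    for r in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), r):
            S = set(S)
            if r == n - r and 0 not in S:
                continue  # avoid double-counting complementary halves
            sizes.append(sum((i in S) != (j in S) for i, j in edges))
    return sizes

# Brute-force tau_cut on a 5-cycle, following the footnote
# tau_cut := max_k (1/k) * log N(k * mincut), N(K) := #{cuts of size <= K}.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
sizes = cut_sizes(n, edges)
mincut = min(sizes)
kmax = max(sizes) // mincut
tau = max(math.log(sum(s <= k * mincut for s in sizes)) / k
          for k in range(1, kmax + 1))
```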

SLIDE 46

Summary of main results

  • General graphs: mincut × Hellingermin ≳ (1 ∼ log n); gap ≲ log n
  • Homogeneous graphs: avg-deg × Hellingermin ≳ log n; gap ≍ 1
  • Erdős–Rényi graphs: avg-deg × Hellingermin ≳ log n; gap ≤ 4(1 + 2 log M / log n)

Page 19


SLIDE 48

Concrete application: stochastic block model

[figure: adjacency matrix]

  • Stochastic block model:
    – 2 clusters
    – edge densities: within-cluster p = α log n / n; across-cluster q = β log n / n (q < p)
  • Our theory: feasible if √α − √β > √2; impossible if √α − √β < 1/2
  • Fundamental limit (Abbe et al. and Mossel et al.): √α − √β > √2

Page 20
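The threshold comparison is a one-liner; `sbm_exact_recovery` is a hypothetical helper encoding the stated limit:

```python
import math

def sbm_exact_recovery(alpha, beta):
    """Exact-recovery limit for the two-cluster SBM with
    p = alpha * log(n)/n, q = beta * log(n)/n (Abbe et al., Mossel et al.):
    feasible iff sqrt(alpha) - sqrt(beta) > sqrt(2)."""
    return math.sqrt(alpha) - math.sqrt(beta) > math.sqrt(2)

print(sbm_exact_recovery(9.0, 1.0))   # sqrt(9) - sqrt(1) = 2 > sqrt(2): True
print(sbm_exact_recovery(4.0, 1.0))   # 2 - 1 = 1 < sqrt(2): False
```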

SLIDE 49

Concluding remarks

  • A unified framework to determine recovery limits
  • Interplay between IT and graph theory
  • Tighten the pre-constants?

arXiv: http://arxiv.org/abs/1504.01369

Page 21