Recovering a Hidden Hamiltonian Cycle via Linear Programming


slide-1
SLIDE 1

Recovering a Hidden Hamiltonian Cycle via Linear Programming

Yihong Wu

Department of Statistics and Data Science, Yale University

Joint work with Vivek Bagaria (Stanford), Jian Ding (Penn), David Tse (Stanford) and Jiaming Xu (Purdue → Duke)

Workshop on Local Algorithms, MIT, June 13, 2018

slide-4
SLIDE 4

Mathematical problem: Hidden Hamiltonian cycle model

  • Observe: a weighted undirected complete graph on n vertices with weighted adjacency matrix W
  • Latent: a Hamiltonian cycle C∗
  • Edge weights are independent: $W_e \sim P$ if $e \in C^*$ and $W_e \sim Q$ if $e \notin C^*$
  • Goal: observe W, recover C∗ with high probability

Remarks:

  • P and Q may depend on the graph size n
  • For this talk, Q = N(0, 1) and P = N(µ, 1), so that $W = \mu \cdot (\text{adjacency matrix of } C^*) + \text{noise}$, where the first term is the “signal”
  • Hidden Hamiltonian cycle planted in an Erdős-Rényi graph [Broder-Frieze-Shamir ’94]

Yihong Wu (Yale) Recovery Threshold for TSP LP 2
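
The Gaussian model on this slide can be simulated in a few lines. The sketch below is only an illustration: the function name `planted_cycle_instance`, the seed handling and the choice n = 50 are assumptions made here, not part of the talk.

```python
import math
import random


def planted_cycle_instance(n, mu, seed=0):
    """Sample (W, cycle_edges) with W_e ~ N(mu, 1) on the cycle and N(0, 1) off it.

    Hypothetical helper for illustration; requires n >= 3.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random cyclic order of the vertices = hidden cycle C*
    cycle_edges = {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mean = mu if frozenset((i, j)) in cycle_edges else 0.0
            W[i][j] = W[j][i] = rng.gauss(mean, 1.0)  # symmetric weights
    return W, cycle_edges


n = 50
mu = 2.5 * math.sqrt(math.log(n))  # an arbitrary signal level for the demo
W, cycle_edges = planted_cycle_instance(n, mu)
```

The recovery task is then to reconstruct `cycle_edges` from `W` alone.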

slide-5
SLIDE 5

Link information in Chicago datasets

1 Reconstitute chromatin in vitro upon naked DNA
2 Produce cross-links by fixing chromatin with formaldehyde

Chicago datasets generate cross-links among contigs [Putnam et al. ’16]

On average, more cross-links exist between adjacent contigs

slide-7
SLIDE 7

Ordering DNA contigs with Chicago cross-links

DNA Scaffolding

Reduces to the traveling salesman problem (TSP): find a path (tour) that visits every contig exactly once with the maximum number of cross-links

slide-9
SLIDE 9

Key challenges for DNA scaffolding with Chicago data

  • Computational: TSP is NP-hard in the worst case
  • Statistical: spurious cross-links between contigs that are far apart

Key questions:

  • How to efficiently order hundreds of thousands of contigs?
  • How much noise can be tolerated for accurate DNA scaffolding?


slide-13
SLIDE 13

Mathematical model for DNA scaffolding

[Figure, two panels with axes running to 200 and color bars: Chicago dataset [Putnam et al. ’16]; simulated Poisson data]

slide-15
SLIDE 15

What is known information-theoretically

Maximum likelihood estimator reduces to TSP

  • $X_{\mathrm{TSP}} = \arg\max_X \langle L, X \rangle$ s.t. X is the adjacency matrix of some Hamiltonian cycle, where L is the log likelihood ratio matrix, $L_{ij} = \log \frac{dP}{dQ}(W_{ij})$. For Gaussian or Poisson, simply take L = W.

Theorem (Sharp threshold)

  • If $\mu^2 < 4 \log n$, exact recovery is information-theoretically impossible
  • If $\mu^2 > 4 \log n$, MLE succeeds in exact recovery
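
The remark that one can "simply take L = W" follows from a short calculation in the Gaussian case P = N(µ, 1), Q = N(0, 1). The derivation below is a sketch added for the reader; it assumes µ > 0 and uses only the fact that every Hamiltonian cycle has n edges, so its symmetric adjacency matrix has entries summing to 2n.

```latex
\begin{align*}
L_{ij} = \log\frac{dP}{dQ}(W_{ij})
  &= -\frac{(W_{ij}-\mu)^2}{2} + \frac{W_{ij}^2}{2}
   = \mu W_{ij} - \frac{\mu^2}{2}, \\
\langle L, X\rangle
  &= \mu \langle W, X\rangle - \frac{\mu^2}{2}\sum_{i \ne j} X_{ij}
   = \mu \langle W, X\rangle - \mu^2 n .
\end{align*}
```

The second term does not depend on X, so maximizing $\langle L, X \rangle$ over Hamiltonian cycles is the same as maximizing $\langle W, X \rangle$. The Poisson log likelihood ratio is likewise affine in $W_{ij}$, so the same argument applies when its slope is positive.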

slide-19
SLIDE 19

What is known algorithmically

  • Spectral methods fail miserably:

◮ $\mu \gg n^{2.5}$ (spectral gap of cycle is too small)

  • Thresholding:

◮ $\mu > \sqrt{8 \log n}$

  • Greedy merging [Motahari-Bresler-Tse ’13]:

◮ $\mu > \sqrt{6 \log n}$

  • This talk: linear programming achieves the sharp threshold

$\mu^2 / \log n > 4$: LP succeeds
$\mu^2 / \log n < 4$: Everything fails

slide-21
SLIDE 21

In general

Thresholds are determined by the Rényi divergence of order ρ > 0 from P to Q:

$D_\rho(P \| Q) = \frac{1}{\rho - 1} \log \int (dP)^\rho (dQ)^{1-\rho}.$

  • LP works when $D_{1/2}(P \| Q) - \log n \to \infty$ (optimal under mild assumptions)
  • Thresholding works when $D_{1/2}(P \| Q) - 2 \log n \to \infty$
  • Greedy works when $D_{1/3}(Q \| P) - \log n \to \infty$
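
As a consistency check, the general conditions can be specialized to the Gaussian case P = N(µ, 1), Q = N(0, 1). The computation below is a sketch added here, using the standard definition of the Rényi divergence quoted on the slide.

```latex
\begin{align*}
\int (dP)^\rho (dQ)^{1-\rho}
  &= \int \frac{1}{\sqrt{2\pi}}
     \exp\!\left(-\frac{\rho (w-\mu)^2 + (1-\rho) w^2}{2}\right) dw
   = \exp\!\left(-\frac{\rho(1-\rho)\mu^2}{2}\right), \\
D_\rho(P \| Q) &= D_\rho(Q \| P) = \frac{\rho \mu^2}{2},
\qquad D_{1/2} = \frac{\mu^2}{4},
\qquad D_{1/3} = \frac{\mu^2}{6}.
\end{align*}
```

Substituting, the three conditions become $\mu^2 - 4 \log n \to \infty$ (LP), $\mu^2 - 8 \log n \to \infty$ (thresholding) and $\mu^2 - 6 \log n \to \infty$ (greedy), which match the constants 4, 8 and 6 quoted for the Gaussian model.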

slide-22
SLIDE 22

Convex relaxations of TSP

slide-24
SLIDE 24

Integer Linear Programming reformulation of TSP

  • $X_{\mathrm{TSP}} = \arg\max_X \langle W, X \rangle$ s.t.

$\sum_j X_{ij} = 2, \ \forall i$
$X_{ij} \in \{0, 1\}$
$\sum_{i \in I, j \notin I} X_{ij} \ge 2, \ \forall\, \emptyset \ne I \subset [n]$

  • The last constraint: subtour elimination
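
To see why the subtour elimination constraint is needed, note that a 0/1 solution satisfying only the degree constraints is a disjoint union of cycles. The sketch below, with the hypothetical helper `violated_subtour`, finds a vertex set I whose cut is empty (and hence violates the constraint) when the solution is not a single tour.

```python
def violated_subtour(n, edges):
    """Given a 0/1 solution with all degrees 2, return a set I with an empty cut, or None.

    Hypothetical helper for illustration: explores the component containing vertex 0.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    assert all(len(adj[v]) == 2 for v in adj), "degree constraint violated"
    seen, stack = {0}, [0]
    while stack:  # depth-first search from vertex 0
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return None if len(seen) == n else seen


two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
hexagon = [(i, (i + 1) % 6) for i in range(6)]
```

Two disjoint triangles satisfy every degree constraint, yet no edge leaves I = {0, 1, 2}, so the cut sum is 0 rather than at least 2.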

slide-26
SLIDE 26

Subtour LP

  • $X_{\mathrm{SUB}} = \arg\max_X \langle W, X \rangle$ s.t.

$\sum_j X_{ij} = 2, \ \forall i$
$X_{ij} \in [0, 1]$
$\sum_{i \in I, j \notin I} X_{ij} \ge 2, \ \forall\, \emptyset \ne I \subset [n]$

  • Replacing the integrality constraint with a box constraint: SUBTOUR LP relaxation [Dantzig-Fulkerson-Johnson ’54, Held-Karp ’70]
  • Exponentially many linear constraints, nevertheless solvable using interior point method

slide-29
SLIDE 29

F2F LP

  • $X_{\mathrm{F2F}} = \arg\max_X \langle W, X \rangle$ s.t.

$\sum_j X_{ij} = 2, \ \forall i$
$X_{ij} \in [0, 1]$

  • Further dropping subtour elimination constraints $\Longrightarrow$ Fractional 2-factor (F2F) LP
  • Extensively studied in worst case [Boyd-Carr ’99, Schalekamp-Williamson-van Zuylen ’14]

◮ The integrality gap $\frac{\mathrm{2F}}{\mathrm{F2F}} \le \frac{4}{3}$ for metric TSP (min formulation)

  • What is the integrality gap whp in our random instance?

slide-31
SLIDE 31

Optimality of Fractional 2-Factor LP

Theorem

If $\mu^2 - 4 \log n \to \infty$, then $X_{\mathrm{F2F}} = X^*$ with high probability.

Remarks

  • The integrality gap is 1 whp!
  • Achieving the IT-limit $\mu^2 = 4 \log n$

slide-32
SLIDE 32

Belief propagation

Max-Product Belief Propagation

$m_{i \to j}(t) = w_{ij} - \operatorname{2nd\,max}_{\ell \ne j} \{ m_{\ell \to i}(t-1) \}$
$m_{i \to j}(0) = w_{ij}$

After T iterations, for each vertex i, keep the two largest incoming messages $m_{\ell \to i}(T)$ and delete the rest.

  • BP is exact provided the solution is integral [Bayati-Borgs-Chayes-Zecchina ’11]
  • It can be shown that $T = O(n^2 \log n)$ whp
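
The message update on this slide translates almost directly into code. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the instance is tiny with a very strong signal, and the iteration count T = 5 is chosen for the demo rather than taken from the bound on the slide.

```python
import random


def sample_instance(n, mu, seed=0):
    """Hypothetical helper: Gaussian weights with a planted Hamiltonian cycle."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    cycle = {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mean = mu if frozenset((i, j)) in cycle else 0.0
            W[i][j] = W[j][i] = rng.gauss(mean, 1.0)
    return W, cycle


def max_product_bp(W, T):
    """Run T rounds of m_{i->j}(t) = w_ij - 2nd max_{l != j} m_{l->i}(t-1); needs n >= 4."""
    n = len(W)
    m = [row[:] for row in W]  # m[i][j] is the message i -> j, initialised to w_ij
    for _ in range(T):
        new = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                incoming = sorted((m[l][i] for l in range(n) if l not in (i, j)),
                                  reverse=True)
                new[i][j] = W[i][j] - incoming[1]  # subtract the second largest
        m = new
    kept = set()
    for i in range(n):  # keep the two largest incoming messages at each vertex
        senders = sorted((l for l in range(n) if l != i),
                         key=lambda l: m[l][i], reverse=True)
        kept.update(frozenset((l, i)) for l in senders[:2])
    return kept


W, cycle = sample_instance(n=12, mu=20.0, seed=3)
estimate = max_product_bp(W, T=5)
```

Each round costs on the order of n³ log n operations in this naive form; a practical implementation would cache the two largest incoming messages per vertex.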

slide-36
SLIDE 36

SDP relaxations for TSP

Add more constraints to F2F LP

  • SDP1 [Cvetković et al ’99]: PSD constraint based on the second largest eigenvalue of the cycle

$X \preceq \frac{2}{n} J + 2 \cos\left(\frac{2\pi}{n}\right) \left( I - \frac{1}{n} J \right)$

◮ provably weaker than Subtour LP [Goemans-Rendl ’00]

  • SDP2 [Zhao et al ’98]: Quadratic Assignment Problem

$\langle W, X \rangle = \langle W, \Pi X_0 \Pi^\top \rangle = \langle W \otimes X_0, \mathrm{vec}(\Pi)\, \mathrm{vec}(\Pi)^\top \rangle$, where $X_0$ is a fixed cycle and $\mathrm{vec}(\Pi)\, \mathrm{vec}(\Pi)^\top$ is relaxed

◮ decision variable: $n^2 \times n^2$ matrix
◮ provably stronger than SDP1 [de Klerk et al ’08]

slide-37
SLIDE 37

Different relaxations

[Diagram relating the relaxations: TSP, Subtour LP, SDP 2, SDP 1, F2F LP]

F2F LP succeeds $\Longrightarrow$ all other relaxations succeed.
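
The implication on this slide is a one-line argument about nested feasible sets. The sketch below writes $\mathcal{P}_{\mathrm{F2F}}$ for the F2F polytope and $\mathcal{P}'$ for the feasible set of any tighter relaxation (expressed in the variable X); both symbols are notation introduced here, and the argument assumes $X^*$ is the unique F2F maximizer.

```latex
\begin{align*}
X^* \in \mathcal{P}' \subseteq \mathcal{P}_{\mathrm{F2F}}
\quad\text{and}\quad
\langle W, X \rangle < \langle W, X^* \rangle
\ \ \forall X \in \mathcal{P}_{\mathrm{F2F}} \setminus \{X^*\}
\;\Longrightarrow\;
\arg\max_{X \in \mathcal{P}'} \langle W, X \rangle = \{X^*\}.
\end{align*}
```

Adding constraints that the true cycle satisfies can only remove competitors, so any relaxation built on top of the F2F constraints inherits its success.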

slide-38
SLIDE 38

Theoretical analysis of convex relaxation

slide-43
SLIDE 43

Primal approach vs Dual approach: high level

  • Dual argument:

◮ Construct a dual witness that certifies the ground truth whp (KKT conditions)
◮ Successful in proving that SDP relaxations attain the sharp threshold for graph partitions: community detection, densest subgraph, etc. [Abbe-Bandeira-Hall ’14, Hajek-W-Xu ’14, ’15, Bandeira ’15, Perry-Wein ’15]
◮ Limitations: construction is ad hoc

  • Primal argument:

◮ No feasible solution other than the ground truth has a better objective value whp
◮ Key: for LP, can restrict to extremal points (vertices of the feasible polytope)

slide-46
SLIDE 46

Dual approach

  • KKT conditions (Farkas’ lemma):

$X_{\mathrm{F2F}} = X^* \iff \exists u \in \mathbb{R}^n$ (dual certificate):
$u_i + u_j \le W_{ij}$, for $i \sim j$ in $C^*$
$u_i + u_j \ge W_{ij}$, for $i \not\sim j$ in $C^*$

  • One feasible choice of dual: $u_i = \frac{1}{2} \min\{ W_{ij} : j \sim i \}$
  • This certificate shows correctness if $\mu^2 > 6 \log n$ (same as greedy merging)
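
The explicit dual choice on this slide is easy to test numerically. The sketch below reads the second condition as applying to pairs that are not adjacent in C∗, and checks both inequalities for $u_i = \frac{1}{2} \min\{W_{ij} : j \sim i\}$. The helper names are hypothetical and the instance sizes are arbitrary.

```python
import random


def sample_instance(n, mu, seed=0):
    """Hypothetical helper: Gaussian weights with a planted Hamiltonian cycle."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    cycle = {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mean = mu if frozenset((i, j)) in cycle else 0.0
            W[i][j] = W[j][i] = rng.gauss(mean, 1.0)
    return W, cycle


def dual_certificate_holds(W, cycle):
    """Check u_i + u_j <= W_ij on cycle edges and >= W_ij off the cycle."""
    n = len(W)
    nbrs = {i: [] for i in range(n)}
    for e in cycle:
        i, j = tuple(e)
        nbrs[i].append(j)
        nbrs[j].append(i)
    # u_i is half the lighter of the two cycle edges at vertex i
    u = [0.5 * min(W[i][j] for j in nbrs[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            on_cycle = frozenset((i, j)) in cycle
            if on_cycle and u[i] + u[j] > W[i][j]:
                return False
            if not on_cycle and u[i] + u[j] < W[i][j]:
                return False
    return True


W_strong, cycle_strong = sample_instance(n=30, mu=20.0, seed=2)
W_null, cycle_null = sample_instance(n=30, mu=0.0, seed=2)
```

The cycle-edge inequality holds by construction, since each of $u_i$ and $u_j$ is at most $W_{ij}/2$; only the off-cycle inequality depends on the signal strength.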

slide-47
SLIDE 47

Synthetic data experiment


slide-52
SLIDE 52

Primal approach

  • Show whp for all extremal points $X \ne X^*$: $\langle W, X \rangle < \langle W, X^* \rangle$
  • F2F polytope: $\left\{ X \in [0,1]^{n \times n} : \sum_{j=1}^n X_{ij} = 2 \right\}$
  • The proof heavily exploits the characterization of extremal points

◮ F2F polytope is not integral: fractional vertices exist
◮ Characterization [Balinski ’65]: for any vertex X of the F2F polytope

  • Half integrality: $X_{ij} \in \{0, 1/2, 1\}$
  • 1/2’s form disjoint odd cycles connected by paths of 1’s.
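
A small concrete instance may help. The example below is a textbook-style illustration chosen here, not one taken from the slides: two triangles with value 1/2 on every edge, joined by three edges of value 1. It has the shape described by the characterization, and the code checks the degree constraints exactly using rational arithmetic.

```python
from fractions import Fraction

half, one = Fraction(1, 2), Fraction(1)

# Edge values of a half-integral point on 6 vertices (all other edges are 0).
X = {}
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    X[frozenset((a, b))] = half  # two disjoint odd cycles of 1/2's
for a, b in [(0, 3), (1, 4), (2, 5)]:
    X[frozenset((a, b))] = one   # paths of 1's (here single edges) joining them


def degree(X, v):
    """Sum of the edge values at vertex v, i.e. the row sum of X."""
    return sum(val for e, val in X.items() if v in e)


degrees = [degree(X, v) for v in range(6)]
```

Every vertex sees 1/2 + 1/2 + 1 = 2, so the point is feasible for the F2F LP even though it is not the adjacency matrix of any 2-factor.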

slide-53
SLIDE 53

Why half integral?

Usual proofs:

  • combinatorial proof [Lovász-Plummer ’86, Schrijver ’04]
  • linear-algebraic proof

◮ F2F polytope (in adjacency vector): $\{ x \in \mathbb{R}^{\binom{[n]}{2}} : Ax = 2 \cdot \mathbf{1} \}$
◮ A is an $n \times \binom{n}{2}$ zero-one matrix: $A_{ie} = \mathbf{1}\{i \in e\}$
◮ Each column of A has exactly two 1’s

slide-55
SLIDE 55

Why half integral?

Extremal feasible solution x is of the following form: $x = (x_S, x_{S^c})$, with $x_S$ fractional and $x_{S^c}$ integral, for some $S \subset \binom{[n]}{2}$ of size n, where

  • $x_S$ is the solution to the following linear system: $A_S x_S = b'$
  • Cramer’s rule: $(x_S)_i = \frac{\det(A_S^{(i)})}{\det(A_S)}$

◮ $A_S^{(i)}$ is obtained by substituting the ith column by $b'$, hence $\det(A_S^{(i)}) \in \mathbb{Z}$.
◮ Each column of $A_S$ has two 1’s $\Longrightarrow \det(A_S) \in \{0, \pm 1, \pm 2\}$ [Balinski ’65]

slide-56
SLIDE 56

Proof of correctness for F2F LP

slide-59
SLIDE 59

Proof Outline

1 Encode the solution: for any extremal point X, represent $2(X - X^*)$ as a bicolored multigraph $G_X$, with $w(G_X) = \langle W, 2(X - X^*) \rangle$
2 Divide and conquer: decompose $G_X$ as an edge-disjoint union of graphs in some family $\mathcal{F}$, so that $w(G_X) = \sum_i w(F_i)$ with $F_i \in \mathcal{F}$
3 Counting: show that whp $w(F) < 0$ for all $F \in \mathcal{F}$

slide-60
SLIDE 60

Step 1: Bicolored multigraph representation

[Figure: X∗, the true cycle; every edge is labeled 1]

slide-63
SLIDE 63

Step 1: Bicolored multigraph representation

[Figure: X, an extremal solution with edge values 1 and 1/2, and the resulting multigraph $G_X$]

Key observation: $G_X$ is always balanced: red degree = blue degree
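
The balance property follows from the degree constraints: X and X∗ both have row sums 2, so every row of 2(X − X∗) sums to zero. The sketch below builds the multigraph for a small hand-picked example. It assumes the convention that red edges are those where X∗ exceeds X (true cycle edges) and blue edges are those where X exceeds X∗; the example point and the helper names are illustrative choices, not taken from the slides.

```python
from fractions import Fraction


def bicolored_multigraph(X, cycle_edges):
    """Encode 2 (X - X*) as {edge: (color, multiplicity)}."""
    G = {}
    for e in set(X) | set(cycle_edges):
        diff = 2 * (X.get(e, Fraction(0)) - (1 if e in cycle_edges else 0))
        if diff < 0:
            G[e] = ("red", int(-diff))   # weight removed from a true cycle edge
        elif diff > 0:
            G[e] = ("blue", int(diff))   # weight added off the true cycle
    return G


def colored_degree(G, v, color):
    """Total multiplicity of edges of the given color at vertex v."""
    return sum(m for e, (c, m) in G.items() if c == color and v in e)


# Half-integral point: two triangles of 1/2's joined by three edges of value 1.
X = {frozenset(p): Fraction(1, 2)
     for p in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]}
X.update({frozenset(p): Fraction(1) for p in [(0, 3), (1, 4), (2, 5)]})

# True cycle 0-1-2-5-4-3-0.
cycle_edges = {frozenset(p)
               for p in [(0, 1), (1, 2), (2, 5), (5, 4), (4, 3), (3, 0)]}

G = bicolored_multigraph(X, cycle_edges)
balanced = all(colored_degree(G, v, "red") == colored_degree(G, v, "blue")
               for v in range(6))
```

In this example the edge {1, 4} appears as a blue double edge, while edges shared by X and X∗ with equal value drop out of $G_X$ entirely.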

slide-66
SLIDE 66

Step 2: Edge decomposition

Theorem (Kotzig ’68)

Every connected balanced bicolored multigraph has an alternating Eulerian circuit.

Remarks

  • An Eulerian circuit may traverse a double edge twice

[Figure: “Dumbbell” structure]

slide-68
SLIDE 68

Step 2: Edge decomposition

$\mathcal{U}$: collection of graphs recursively constructed

1 Start with an even cycle in alternating colors
2 Blossoming procedure: at each step, contract an edge in any cycle and attach a flower (path of double edges followed by an alternating odd cycle)

[Figure: a graph obtained by starting with a 10-cycle and blossoming 4 times]

However, not every $G_X$ is of this form...

slide-71
SLIDE 71

  • Graph homomorphism $\varphi : H \to F$ is a vertex map that preserves edges and edge multiplicity

[Figure: a graph H on vertices 1 to 12 mapped by $\varphi$ to a graph F on vertices 1 to 11]

Lemma (Decomposition)

Every balanced bicolored multigraph G with edge multiplicity at most 2 can be decomposed as a union of elements in $\mathcal{F} = \{ F : V(F) \subset [n], \ H \to F \text{ for some } H \in \mathcal{U} \}$

[Figure: a graph on vertices 1 to 6 decomposed into two graphs, one on vertices 1, 2, 3, 4 and one on vertices 2, 3, 5, 6]

  • It remains to show $\max_{F \in \mathcal{F}} w(F) < 0$ whp

slide-72
SLIDE 72

Step 3: Counting

$\mathcal{F}_{k,\ell} = \{ F \in \mathcal{F} : E(F) \text{ consists of } k \text{ double edges and } \ell \text{ single edges} \}$

Lemma (Counting isomorphism classes)

The number of distinct $H \in \mathcal{U}$ with k double edges and ℓ single edges is at most $C^{k+\ell}$ for a universal constant C.

Lemma (Counting homomorphisms)

For each $H \in \mathcal{U}$, there exists $0 \le r \le \ell/2$ such that

  • Number of labelings for double edges: $\le (Cn)^{k/2 + r/2}$
  • Number of labelings for single edges conditioned on double edges: $\le (Cn)^{\ell/2 - r}$

slide-74
SLIDE 74

Step 4: Probabilistic arguments

$\mathcal{F}_{k,\ell} = \{ F \in \mathcal{F} : E(F) \text{ consists of } k \text{ double edges and } \ell \text{ single edges} \}$

Lemma

For any $k \ge 0$ and $\ell \ge 3$, with probability at least $1 - n^{-\Theta(k+\ell)}$,

$\max_{F \in \mathcal{F}_{k,\ell}} \left( w(F) - \mathbb{E}[w(F)] \right) \le (1 + \epsilon)(2k + \ell) \sqrt{\log n}$

Remarks

  • Total: 2k + ℓ edges, half red half blue. Weights on red edges ∼ N(µ, 1). Weights on blue edges ∼ N(0, 1). $w(F) \sim N(-(k + \ell/2)\mu, \ 4k + \ell)$
  • Proof: Counting $\mathcal{F}_{k,\ell}$ and large deviation bounds
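
It is worth seeing how this lemma yields the constant 4 in the threshold. The short derivation below is a sketch added here; it reads the deviation bound in the lemma as $(1+\epsilon)(2k+\ell)\sqrt{\log n}$, which is an interpretation of the garbled slide text.

```latex
\begin{align*}
\mathbb{E}[w(F)] &= -\left(k + \tfrac{\ell}{2}\right)\mu = -\frac{(2k+\ell)\,\mu}{2}, \\
w(F) &\le -\frac{(2k+\ell)\,\mu}{2} + (1+\epsilon)(2k+\ell)\sqrt{\log n}
      = (2k+\ell)\left( (1+\epsilon)\sqrt{\log n} - \frac{\mu}{2} \right).
\end{align*}
```

The right-hand side is negative exactly when $\mu^2 > 4(1+\epsilon)^2 \log n$, so under this reading every $F \in \mathcal{F}_{k,\ell}$ has negative weight as soon as µ² exceeds 4 log n by a constant factor arbitrarily close to 1.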

slide-75
SLIDE 75

Real-data experiment

  • 1000 DNA contigs of size 100 kbps
  • 0.45 million Chicago cross-links
  • Subsample each cross-link with probability p


slide-76
SLIDE 76

Homo sapiens [Putnam et al ’16, Genome Research]

slide-77
SLIDE 77

Aedes aegypti (Zika mosquito) [Dudchenko et al ’16, Science]

slide-82
SLIDE 82

Conclusion and remarks

[Threshold axis for $\mu^2 / \log n$: 4 = IT limit / F2F, 6 = greedy, 8 = thresholding]

Future work

  • More realistic models

◮ 2-NN graph: IT limit becomes $\sqrt{2 \log n}$, not achieved by LP.
◮ small-world graphs

  • Smarter rounding algorithm in practice
  • Reduction from/to Hamiltonian cycle and path more elegantly

References

  • Vivek Bagaria, Jian Ding, David Tse, Yihong Wu & Jiaming Xu (2018). Hidden Hamiltonian Cycle Recovery via Linear Programming, https://arxiv.org/abs/1804.05436