SLIDE 1

Recovering a Hidden Hamiltonian Cycle via Linear Programming

Yihong Wu

Department of Statistics and Data Science, Yale University

Joint work with Vivek Bagaria (Stanford), Jian Ding (Penn), David Tse (Stanford) and Jiaming Xu (Purdue → Duke)

Workshop on Local Algorithms, MIT, June 13, 2018

SLIDE 4

Mathematical problem: Hidden Hamiltonian cycle model

Observe: a weighted undirected complete graph on n vertices with weighted adjacency matrix W

Latent: a Hamiltonian cycle C∗

Edge weight:

$W_e \overset{\text{ind.}}{\sim} \begin{cases} P & e \in C^* \\ Q & e \notin C^* \end{cases}$

Goal: observe W, recover C∗ with high probability

Remarks:

P, Q depend on the graph size n
For this talk, Q = N(0, 1) and P = N(µ, 1), so that

$W = \underbrace{\mu \cdot \text{adj matrix of } C^*}_{\text{“signal”}} + \text{noise}$

Hidden Hamiltonian cycle planted in Erdős-Rényi graph [Broder-Frieze-Shamir ’94]
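To make the Gaussian model concrete, here is a minimal simulation sketch. The function name `sample_hidden_cycle`, the `seed` parameter, and the choice of a uniformly random vertex ordering for the hidden cycle are illustrative assumptions, not part of the talk.

```python
import random

def sample_hidden_cycle(n, mu, seed=0):
    """Return (W, cycle_edges) for the Gaussian hidden Hamiltonian cycle model.

    W is a symmetric n x n list-of-lists with zero diagonal; cycle_edges is the
    set of frozenset({i, j}) edges of the hidden cycle. Assumes n >= 3.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # assumption: hidden cycle follows a uniformly random ordering
    cycle_edges = {frozenset((order[k], order[(k + 1) % n])) for k in range(n)}
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # N(mu, 1) on cycle edges, N(0, 1) on all other edges
            mean = mu if frozenset((i, j)) in cycle_edges else 0.0
            W[i][j] = W[j][i] = rng.gauss(mean, 1.0)
    return W, cycle_edges

W, C_star = sample_hidden_cycle(n=8, mu=5.0)
```

Each vertex lies on exactly two edges of `C_star`, which is the degree constraint that the relaxations later in the talk retain.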

Yihong Wu (Yale) Recovery Threshold for TSP LP 2

SLIDE 5

Link information in Chicago datasets

1 Reconstitute chromatin in vitro upon naked DNA
2 Produce cross-links by fixing chromatin with formaldehyde

Chicago datasets generate cross-links among contigs [Putnam et al. ’16]

On average more cross-links exist between adjacent contigs

SLIDE 7

Ordering DNA contigs with Chicago cross-links

DNA Scaffolding

Reduces to traveling salesman problem (TSP)

Find a path (tour) that visits every contig exactly once with the maximum number of cross-links

SLIDE 9

Key challenges for DNA scaffolding with Chicago data

Computational: TSP is NP-hard in the worst case
Statistical: spurious cross-links between contigs that are far apart

Key questions:

How to efficiently order hundreds of thousands of contigs?
How much noise can be tolerated for accurate DNA scaffolding?

SLIDE 13

Mathematical model for DNA scaffolding

[Figure: two matrix plots with color scales. Panel 1: Chicago dataset [Putnam et al. ’16]. Panel 2: Simulated Poisson data.]

SLIDE 15

What is known information-theoretically

Maximum likelihood estimator reduces to TSP

$X_{\mathrm{TSP}} = \arg\max_X \langle L, X \rangle$ s.t. X is the adjacency matrix of some Hamiltonian cycle

where L is the log likelihood ratio matrix $L_{ij} = \log \frac{dP}{dQ}(W_{ij})$. For Gaussian or Poisson, simply take L = W.

Theorem (Sharp threshold)

If $\mu^2 < 4 \log n$, exact recovery is information-theoretically impossible
If $\mu^2 > 4 \log n$, MLE succeeds in exact recovery
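For very small n the MLE can be evaluated by exhaustive search, which is useful as a ground-truth check for the relaxations. The following is a sketch only: `brute_force_mle` is a hypothetical helper, it takes L = W as in the Gaussian case, and its running time grows factorially in n. It sums each edge once, which gives the same maximizer as ⟨W, X⟩ (where every edge is counted twice).

```python
from itertools import permutations

def brute_force_mle(W):
    """Return (tour, score) maximizing total edge weight over all Hamiltonian cycles.

    Assumes n >= 3. Vertex 0 is fixed as the start to avoid counting rotations.
    """
    n = len(W)
    best_score, best_tour = float("-inf"), None
    for perm in permutations(range(1, n)):
        if perm[0] > perm[-1]:
            continue  # skip the reversed copy of each tour
        tour = (0,) + perm
        score = sum(W[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if score > best_score:
            best_score, best_tour = score, tour
    return best_tour, best_score

# Noise-free toy instance: weight 1 on the cycle 0-2-4-1-3-0, weight 0 elsewhere.
hidden = [0, 2, 4, 1, 3]
toy_W = [[0.0] * 5 for _ in range(5)]
for k in range(5):
    a, b = hidden[k], hidden[(k + 1) % 5]
    toy_W[a][b] = toy_W[b][a] = 1.0
toy_tour, toy_score = brute_force_mle(toy_W)
```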

SLIDE 19

What is known algorithmically

Spectral methods fail miserably:

◮ $\mu \gg n^{2.5}$ (spectral gap of cycle is too small)

Thresholding:

◮ $\mu > \sqrt{8 \log n}$

Greedy merging [Motahari-Bresler-Tse ’13]:

◮ $\mu > \sqrt{6 \log n}$

This talk: linear programming achieves sharp threshold

$\mu^2 / \log n > 4$: LP succeeds
$\mu^2 / \log n < 4$: Everything fails
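The slides do not spell out the thresholding rule. One reading consistent with the stated $\sqrt{8 \log n}$ threshold is to keep, for every vertex, its two heaviest incident edges; the sketch below implements that reading and should be taken as an assumption rather than the method used in the talk. The name `top_two_neighbors` is hypothetical.

```python
def top_two_neighbors(W):
    """For each vertex keep its two heaviest incident edges; return the union as a set."""
    n = len(W)
    kept = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: W[i][j], reverse=True)
        for j in others[:2]:
            kept.add(frozenset((i, j)))
    return kept

# Noise-free toy instance: weight 10 on the cycle 0-1-2-3-4-5-0, weight 0 elsewhere.
n = 6
cycle = {frozenset((k, (k + 1) % n)) for k in range(n)}
toy_W = [[10.0 if frozenset((i, j)) in cycle else 0.0 for j in range(n)]
         for i in range(n)]
recovered = top_two_neighbors(toy_W)
```

Under this reading the rule fails as soon as a single noise edge at some vertex beats one of its two true edges, which is a union bound over roughly n² pairs rather than over n vertices; that is one way to see why it needs a larger µ than the LP.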

SLIDE 21

In general

Thresholds are determined by Rényi divergence of order ρ > 0 from P to Q:

$D_\rho(P \| Q) \triangleq \frac{1}{\rho - 1} \log \int (dP)^{\rho} (dQ)^{1-\rho}.$

LP works when

$D_{1/2}(P \| Q) - \log n \to \infty$

Optimal under mild assumptions

Thresholding works when

$D_{1/2}(P \| Q) - 2 \log n \to \infty$

Greedy works when

$D_{1/3}(Q \| P) - \log n \to \infty$
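As a consistency check, the general conditions can be specialized to the Gaussian case P = N(µ, 1), Q = N(0, 1) used in the rest of the talk. The computation below is a standard completion of the square, added here for illustration:

```latex
\begin{align*}
\rho (x-\mu)^2 + (1-\rho) x^2 &= (x - \rho\mu)^2 + \rho(1-\rho)\mu^2, \\
\int (dP)^{\rho} (dQ)^{1-\rho}
  &= \int \frac{1}{\sqrt{2\pi}}
     \exp\!\Big(-\tfrac{1}{2}\big[(x-\rho\mu)^2 + \rho(1-\rho)\mu^2\big]\Big)\,dx
   = \exp\!\Big(-\tfrac{\rho(1-\rho)\mu^2}{2}\Big), \\
D_\rho(P \| Q) &= \frac{1}{\rho - 1}\cdot\Big(-\frac{\rho(1-\rho)\mu^2}{2}\Big)
   = \frac{\rho\,\mu^2}{2}.
\end{align*}
```

By the symmetry of a mean shift, $D_\rho(Q \| P) = \rho\mu^2/2$ as well. Hence $D_{1/2}(P\|Q) = \mu^2/4$ and $D_{1/3}(Q\|P) = \mu^2/6$, and the three conditions read $\mu^2 - 4\log n \to \infty$ (LP), $\mu^2 - 8\log n \to \infty$ (thresholding) and $\mu^2 - 6\log n \to \infty$ (greedy), matching the constants 4, 8 and 6 quoted for the Gaussian model.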

SLIDE 22

Convex relaxations of TSP

SLIDE 24

Integer Linear Programming reformulation of TSP

$X_{\mathrm{TSP}} = \arg\max_X \langle W, X \rangle$

s.t. $\sum_j X_{ij} = 2, \ \forall i$

$X_{ij} \in \{0, 1\}$

$\sum_{i \in I,\, j \notin I} X_{ij} \ge 2, \ \forall\, \emptyset \ne I \subset [n]$

The last constraint: subtour elimination

SLIDE 26

Subtour LP

$X_{\mathrm{SUB}} = \arg\max_X \langle W, X \rangle$

s.t. $\sum_j X_{ij} = 2, \ \forall i$

$X_{ij} \in [0, 1]$

$\sum_{i \in I,\, j \notin I} X_{ij} \ge 2, \ \forall\, \emptyset \ne I \subset [n]$

Replacing the integrality constraint with box constraint: SUBTOUR LP relaxation [Dantzig-Fulkerson-Johnson ’54, Held-Karp ’70]

Exponentially many linear constraints, nevertheless solvable using interior point method
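To see what the subtour elimination constraints rule out, the sketch below checks the degree constraints and then searches all vertex subsets for a violated cut. It is a brute-force illustration for tiny n only (exponential in n); the helper names `adjacency`, `degree_ok` and `violated_cut` are hypothetical.

```python
from itertools import combinations

def adjacency(n, edges):
    """Symmetric 0/1 matrix with ones on the given undirected edges."""
    X = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        X[i][j] = X[j][i] = 1.0
    return X

def degree_ok(X, tol=1e-9):
    """Check that every row sums to 2 (diagonal excluded)."""
    n = len(X)
    return all(abs(sum(X[i][j] for j in range(n) if j != i) - 2) <= tol
               for i in range(n))

def violated_cut(X, tol=1e-9):
    """Return a nonempty proper subset I with cut weight below 2, or None."""
    n = len(X)
    for size in range(1, n):
        for I in combinations(range(n), size):
            inside = set(I)
            cut = sum(X[i][j] for i in inside for j in range(n) if j not in inside)
            if cut < 2 - tol:
                return inside
    return None

two_triangles = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])
six_cycle = adjacency(6, [(k, (k + 1) % 6) for k in range(6)])
```

Two disjoint triangles satisfy every degree constraint but have no edge crossing the cut between them, so they are excluded by subtour elimination, whereas the 6-cycle passes both checks.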

SLIDE 29

F2F LP

$X_{\mathrm{F2F}} = \arg\max_X \langle W, X \rangle$

s.t. $\sum_j X_{ij} = 2, \ \forall i$

$X_{ij} \in [0, 1]$

Further dropping subtour elimination constraints ⇒ Fractional 2-factor (F2F) LP

Extensively studied in worst case [Boyd-Carr ’99, Schalekamp-Williamson-van Zuylen ’14]

◮ The integrality gap $\frac{\mathrm{2F}}{\mathrm{F2F}} \le \frac{4}{3}$ for metric TSP (min formulation)

What is the integrality gap whp in our random instance?

SLIDE 31

Optimality of Fractional 2-Factor LP

Theorem

If $\mu^2 - 4 \log n \to \infty$, then $X_{\mathrm{F2F}} = X^*$ with high probability.

Remarks

The integrality gap is 1 whp!
Achieving the IT-limit $\mu^2 = 4 \log n$

SLIDE 32

Belief propagation

Max-Product Belief Propagation

$m_{i \to j}(t) = w_{ij} - \operatorname{2nd\,max}_{\ell \ne j} \{ m_{\ell \to i}(t-1) \}$

$m_{i \to j}(0) = w_{ij}$

After T iterations, for each vertex i, keep the two largest incoming messages $m_{\ell \to i}(T)$ and delete the rest.

BP is exact provided the solution is integral [Bayati-Borgs-Chayes-Zecchina ’11]

It can be shown that $T = O(n^2 \log n)$ whp
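The message update above translates almost line by line into code. The following is a plain synchronous implementation written for illustration, assuming n ≥ 4 and that the second maximum ranges over senders ℓ different from both i and j; the function names are hypothetical and no attempt is made at efficiency.

```python
def second_max(values):
    """Second largest element of a list with at least two entries."""
    return sorted(values, reverse=True)[1]

def max_product_bp(W, T):
    """Run T rounds of the max-product update and return the kept edge set."""
    n = len(W)
    # m[i][j] holds the message from i to j, initialised to the edge weight
    m = [[W[i][j] for j in range(n)] for i in range(n)]
    for _ in range(T):
        new = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                incoming = [m[l][i] for l in range(n) if l != i and l != j]
                new[i][j] = W[i][j] - second_max(incoming)
        m = new
    kept = set()
    for i in range(n):
        # keep the two senders with the largest incoming messages to i
        senders = sorted((l for l in range(n) if l != i),
                         key=lambda l: m[l][i], reverse=True)
        for l in senders[:2]:
            kept.add(frozenset((i, l)))
    return kept

# Noise-free toy instance: weight 10 on the cycle 0-1-2-3-4-5-0, weight 0 elsewhere.
n = 6
cycle = {frozenset((k, (k + 1) % n)) for k in range(n)}
toy_W = [[10.0 if frozenset((i, j)) in cycle else 0.0 for j in range(n)]
         for i in range(n)]
bp_edges = max_product_bp(toy_W, T=3)
```

On this toy instance the messages along cycle edges stay strictly above all others in every round, so the final selection returns the hidden cycle.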

SLIDE 36

SDP relaxations for TSP

Add more constraints to F2F LP

SDP1 [Cvetković et al ’99]: PSD constraint based on second largest eigenvalue of cycle

$X \preceq \frac{2}{n} J + 2 \cos\frac{2\pi}{n} \left( I - \frac{1}{n} J \right)$

◮ provably weaker than Subtour LP [Goemans-Rendl ’00]

SDP2 [Zhao et al ’98]: Quadratic Assignment Problem

$\langle W, X \rangle = \langle W, \Pi \underbrace{X_0}_{\text{fixed cycle}} \Pi^\top \rangle = \langle W \otimes X_0, \underbrace{\mathrm{vec}(\Pi)\,\mathrm{vec}(\Pi)^\top}_{\text{relax}} \rangle$

◮ decision variable: $n^2 \times n^2$ matrix
◮ provably stronger than SDP1 [de Klerk et al ’08]

SLIDE 37

Different relaxations

[Diagram relating the relaxations: TSP, Subtour LP, SDP 2, SDP 1, F2F LP]

F2F LP succeeds ⇒ all other relaxations succeed.

SLIDE 38

Theoretical analysis of convex relaxation

SLIDE 43

Primal approach vs Dual approach: high level

Dual argument:

◮ Construct dual witness that certifies the ground truth whp (KKT conditions)
◮ Successful in proving SDP relaxation attaining sharp threshold for graph partitions: community detection, densest subgraph, etc [Abbe-Bandeira-Hall ’14, Hajek-W-Xu ’14,’15, Bandeira ’15, Perry-Wein ’15]
◮ Limitations: construction is ad hoc

Primal argument:

◮ No feasible solution other than the ground truth has a better objective value whp
◮ Key: for LP, can restrict to extremal points (vertices of the feasible polytope)

SLIDE 46

Dual approach

KKT conditions (Farkas’ lemma):

$X_{\mathrm{F2F}} = X^* \iff \exists\, u \in \mathbb{R}^n$ (dual certificate):

$u_i + u_j \le W_{ij}$, for $i \sim j$ in $C^*$
$u_i + u_j \ge W_{ij}$, for $i \nsim j$ in $C^*$

One feasible choice of dual:

$u_i = \frac{1}{2} \min\{W_{ij} : j \sim i\}$

This certificate shows correctness if $\mu^2 > 6 \log n$ (same as greedy merging)
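The certificate u_i = ½ min{W_ij : j ∼ i} always satisfies the first family of inequalities, so only the non-cycle edges need checking. The sketch below verifies both families directly for a given weight matrix and cycle; `dual_certificate_holds` is a hypothetical helper and the toy weights are chosen by hand.

```python
def dual_certificate_holds(W, cycle_edges):
    """Check u_i + u_j <= W_ij on cycle edges and >= W_ij on non-cycle edges,
    with u_i set to half the smaller weight of the two cycle edges at i."""
    n = len(W)
    nbrs = {i: [] for i in range(n)}
    for e in cycle_edges:
        i, j = tuple(e)
        nbrs[i].append(j)
        nbrs[j].append(i)
    u = [0.5 * min(W[i][j] for j in nbrs[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = u[i] + u[j]
            if frozenset((i, j)) in cycle_edges:
                if s > W[i][j]:
                    return False
            elif s < W[i][j]:
                return False
    return True

n = 5
cycle = {frozenset((k, (k + 1) % n)) for k in range(n)}
# Cycle edges have weight 4, all other edges weight 1: the certificate works.
easy_W = [[4.0 if frozenset((i, j)) in cycle else 1.0 for j in range(n)]
          for i in range(n)]
# Raise one non-cycle edge above u_0 + u_2 = 4: the certificate fails.
hard_W = [row[:] for row in easy_W]
hard_W[0][2] = hard_W[2][0] = 5.0
```

A failure of this particular certificate does not mean the LP fails; it only means this choice of u does not prove success, which is why it reaches 6 log n rather than 4 log n.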

SLIDE 47

Synthetic data experiment


SLIDE 52

Primal approach

Show whp for all extremal points $X \ne X^*$:

$\langle W, X \rangle < \langle W, X^* \rangle$

F2F polytope:

$\left\{ X \in [0, 1]^{n \times n} : \sum_{j=1}^{n} X_{ij} = 2 \right\}$

The proof heavily exploits the characterization of extremal points

◮ F2F polytope is not integral: fractional vertices exist
◮ Characterization [Balinski ’65]: for any vertex X of F2F polytope

Half integrality: $X_{ij} \in \{0, 1/2, 1\}$

1/2’s form disjoint odd cycles connected by paths of 1’s.

SLIDE 53

Why half integral?

Usual proofs:

combinatorial proof [Lovász-Plummer ’86, Schrijver ’04]
linear-algebraic proof

◮ F2F polytope (in adjacency vector): $\{x \in \mathbb{R}^{\binom{[n]}{2}} : Ax = 2 \cdot \mathbf{1}\}$
◮ A is $n \times \binom{n}{2}$ zero-one matrix: $A_{ie} = \mathbf{1}\{i \in e\}$
◮ Each column of A has exactly two 1’s

SLIDE 55

Why half integral?

Extremal feasible solution x is of the following form

$x = (\underbrace{x_S}_{\text{fractional}}, \underbrace{x_{S^c}}_{\text{integral}})$

for some $S \subset \binom{[n]}{2}$ of size n, where $x_S$ is the solution to the following linear system:

$A_S x_S = b'$

Cramer’s rule:

$(x_S)_i = \frac{\det(A_S^{(i)})}{\det(A_S)}$

◮ $A_S^{(i)}$ is obtained by substituting the ith column by b′, hence $\det(A_S^{(i)}) \in \mathbb{Z}$.
◮ Each column of $A_S$ has two 1’s ⇒ $\det(A_S) \in \{0, \pm 1, \pm 2\}$ [Balinski ’65]
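The determinant claim can be checked by hand on the smallest interesting case, the vertex-edge incidence matrix of a cycle. The sketch below computes exact determinants with rational arithmetic; `det` and `cycle_incidence` are hypothetical helpers, and cycles are only one example of a square submatrix with two 1’s per column.

```python
from fractions import Fraction

def det(rows):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign        # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return sign * result

def cycle_incidence(k):
    """Rows are vertices 0..k-1, columns are edges {e, e+1 mod k}."""
    return [[1 if v in (e, (e + 1) % k) else 0 for e in range(k)]
            for v in range(k)]

triangle_det = det(cycle_incidence(3))
square_det = det(cycle_incidence(4))
pentagon_det = det(cycle_incidence(5))
```

Odd cycles give determinant ±2, so Cramer’s rule produces entries in multiples of 1/2; even cycles give a singular matrix and therefore cannot carry the fractional part of a vertex. This is consistent with the 1/2’s sitting on odd cycles.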

SLIDE 56

Proof of correctness for F2F LP

SLIDE 59

Proof Outline

1 Encode the solution: for any extremal point X, represent 2(X − X∗) as a bicolored multigraph $G_X$

$w(G_X) = \langle W, 2(X - X^*) \rangle$

2 Divide and conquer: decompose $G_X$ as edge-disjoint union of graphs in some family $\mathcal{F}$

$w(G_X) = \sum_i w(F_i), \quad F_i \in \mathcal{F}$

3 Counting: Show that whp $w(F) < 0$ for all $F \in \mathcal{F}$

SLIDE 60

Step 1: Bicolored multigraph representation

[Figure: X∗, the true cycle, with every edge labeled 1]

SLIDE 63

Step 1: Bicolored multigraph representation

[Figure: X, an extremal solution with edge labels 1 and 1/2 ⇒ $G_X$]

Key observation: $G_X$ is always balanced: red degree = blue degree
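The balance property follows from the degree constraints: every row of both X and X∗ sums to 2, so every row of 2(X − X∗) sums to 0. The sketch below builds the two color classes and checks this on a small half-integral example. It assumes that red marks negative entries (true-cycle edges missing from X) and blue marks positive entries, which is consistent with the later remark that red edges carry N(µ, 1) weights; the helper names and the example are illustrative.

```python
from fractions import Fraction

def bicolored_multigraph(X, X_star):
    """Split 2(X - X_star) into red (negative) and blue (positive) edge multiplicities."""
    n = len(X)
    red, blue = {}, {}
    for i in range(n):
        for j in range(i + 1, n):
            d = 2 * (Fraction(X[i][j]) - Fraction(X_star[i][j]))
            if d < 0:
                red[(i, j)] = int(-d)
            elif d > 0:
                blue[(i, j)] = int(d)
    return red, blue

def colored_degree(edges, v):
    """Degree of v counted with multiplicity."""
    return sum(mult for (i, j), mult in edges.items() if v in (i, j))

def symmetric(n, weighted_edges):
    M = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), w in weighted_edges.items():
        M[i][j] = M[j][i] = Fraction(w)
    return M

half = Fraction(1, 2)
# True cycle 0-1-2-3-4-5-0.
X_star = symmetric(6, {(k, (k + 1) % 6): 1 for k in range(6)})
# Two triangles of 1/2's, {0,1,2} and {3,4,5}, joined by three edges of weight 1.
X = symmetric(6, {(0, 1): half, (1, 2): half, (0, 2): half,
                  (3, 4): half, (4, 5): half, (3, 5): half,
                  (0, 3): 1, (1, 4): 1, (2, 5): 1})
red, blue = bicolored_multigraph(X, X_star)
```

In this example the true edge {2, 3} is absent from X and becomes a red double edge, while the edge {0, 2} with value 1/2 becomes a blue single edge.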

SLIDE 64

[Figure: graph with edge labels 1 and 1/2, followed by ⇓]

SLIDE 66

Step 2: Edge decomposition

Theorem (Kotzig ’68)

Every connected balanced bicolored multigraph has an alternating Eulerian circuit.

Remarks

An Eulerian circuit may traverse a double edge twice
“Dumbbell” structure

SLIDE 68

Step 2: Edge decomposition

U: collection of graphs recursively constructed

1 Start with an even cycle in alternating colors
2 Blossoming procedure: At each step, contract an edge in any cycle and attach a flower (path of double edges followed by an alternating odd cycle)

[Figure: graph obtained by starting with a 10-cycle and blossoming 4 times]

However, not every $G_X$ is of this form...

SLIDE 71

Graph homomorphism φ : H → F is a vertex map that preserves edges and edge multiplicity

[Figure: H on vertices 1 to 12, mapped by φ to F on vertices 1 to 11]

Lemma (Decomposition)

Every balanced bicolored multigraph G with edge multiplicity at most 2 can be decomposed as a union of elements in

$\mathcal{F} = \{F : V(F) \subset [n],\ H \to F \text{ for some } H \in \mathcal{U}\}$

[Figure: a graph on vertices 1 to 6 decomposed into a graph on vertices 1, 2, 3, 4 and a graph on vertices 2, 3, 5, 6]

It remains to show $\max_{F \in \mathcal{F}} w(F) < 0$ whp

SLIDE 72

Step 3: Counting

$\mathcal{F}_{k,\ell} = \{F \in \mathcal{F} : E(F) \text{ consists of } k \text{ double edges and } \ell \text{ single edges}\}$

Lemma (Counting isomorphism classes)

The number of distinct $H \in \mathcal{U}$ with k double edges and ℓ single edges is at most $C^{k+\ell}$ for universal constant C.

Lemma (Counting homomorphisms)

For each $H \in \mathcal{U}$, there exists $0 \le r \le \ell/2$ such that

Number of labelings for double edges: $\le (Cn)^{k/2 + r/2}$
Number of labelings for single edges conditioned on double edges: $\le (Cn)^{\ell/2 - r}$

SLIDE 74

Step 4: Probabilistic arguments

$\mathcal{F}_{k,\ell} = \{F \in \mathcal{F} : E(F) \text{ consists of } k \text{ double edges and } \ell \text{ single edges}\}$

Lemma

For any k ≥ 0 and ℓ ≥ 3, with probability at least $1 - n^{-\Theta(k+\ell)}$,

$\max_{F \in \mathcal{F}_{k,\ell}} \left( w(F) - \mathbb{E}[w(F)] \right) \le (1 + \epsilon)(2k + \ell)\sqrt{\log n}$

Remarks

Total: 2k + ℓ edges, half red half blue. Weights on red edges ∼ N(µ, 1). Weights on blue edges ∼ N(0, 1).

$w(F) \sim N(-(k + \ell/2)\mu,\ 4k + \ell)$

Proof: Counting $\mathcal{F}_{k,\ell}$ and large deviation bounds
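The step from the concentration lemma to the sharp threshold is short. The derivation below is a sketch under the assumption that the deviation bound reads $(1+\epsilon)(2k+\ell)\sqrt{\log n}$, which is the form consistent with the threshold $\mu^2 > 4\log n$ stated earlier:

```latex
\begin{align*}
\mathbb{E}[w(F)] &= -\Big(k + \frac{\ell}{2}\Big)\mu = -\frac{(2k+\ell)\,\mu}{2}, \\
w(F) &\le \mathbb{E}[w(F)] + (1+\epsilon)(2k+\ell)\sqrt{\log n}
      = (2k+\ell)\Big[(1+\epsilon)\sqrt{\log n} - \frac{\mu}{2}\Big], \\
w(F) < 0 &\iff \mu > 2(1+\epsilon)\sqrt{\log n}
          \iff \mu^2 > 4(1+\epsilon)^2 \log n .
\end{align*}
```

The factor $2k+\ell$ cancels, so the same condition on µ handles every pair (k, ℓ) at once, and a union bound over (k, ℓ) uses the failure probabilities $n^{-\Theta(k+\ell)}$.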

SLIDE 75

Real-data experiment

1000 DNA contigs of size 100 kbps
0.45 million Chicago cross-links
Subsample each cross-link with probability p


SLIDE 76

Homo sapiens [Putnam et al. ’16, Genome Research]

SLIDE 77

Aedes aegypti (Zika mosquito) [Dudchenko et al ’16, Science]

SLIDE 82

Conclusion and remarks

[Number line for $\mu^2 / \log n$: 4 (IT limit / F2F), 6 (greedy), 8 (thresholding)]

Future work

More realistic models

◮ 2-NN graph: IT limit becomes $\sqrt{2 \log n}$, not achieved by LP.
◮ small-world graphs

Smarter rounding algorithm in practice
Reduction from/to Hamiltonian cycle and path more elegantly

References

Vivek Bagaria, Jian Ding, David Tse, Yihong Wu & Jiaming Xu (2018). Hidden Hamiltonian Cycle Recovery via Linear Programming, https://arxiv.org/abs/1804.05436