Sparse (0, 1) array and perfect phylogeny (Yanzhen Xiong)



SLIDE 1

Sparse (0, 1) array and perfect phylogeny

Yanzhen Xiong

Shanghai Jiao Tong University

Joint work with Yaokun Wu

Yichang, August 23, 2019

SLIDE 2

Outline

1. Sparse (0, 1) array
2. Perfect phylogeny
3. Problems

SLIDE 3

For any positive integer $n$, let $[n]$ denote the set $\{1, \dots, n\}$. Given positive integers $a_1, \dots, a_n$, a map $M \in \{0, 1\}^{[a_1] \times \cdots \times [a_n]}$ is called an $n$-dimensional (0, 1) array (or tensor) of size $a_1 \times \cdots \times a_n$. For any nonempty subsets $S_i \subseteq [a_i]$, $i \in [n]$, the restriction of $M$ to $S_1 \times \cdots \times S_n$ is a subarray of $M$.

SLIDE 4

Taking Boolean sum to get the projection

Let $M$ be an $n$-dim’l (0, 1) array of size $a_1 \times \cdots \times a_n$. For $\{i_1, \dots, i_k\} \in \binom{[n]}{k}$, let $M_{i_1,\dots,i_k}$ be the $k$-dim’l (0, 1) array of size $a_{i_1} \times \cdots \times a_{i_k}$ such that $M_{i_1,\dots,i_k}(t_{i_1}, \dots, t_{i_k}) = 0$ if and only if

$$\sum_{\substack{t_j \in [a_j] \\ j \notin \{i_1, \dots, i_k\}}} M(t_1, \dots, t_n) = 0.$$

We call $M_{i_1,\dots,i_k}$ the $k$-dim’l projection of $M$ to $\{i_1, \dots, i_k\}$.
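A minimal sketch of the projection just defined, assuming the array is stored as the set of index tuples of its 1-entries (a representation chosen here for illustration, with 0-based indices). The function name `project` and the sample array are hypothetical and do not come from the slides.

```python
def project(ones, axes):
    """Boolean-sum projection of a (0, 1) array onto the given axes.

    `ones` is the set of index tuples where the array equals 1 and `axes`
    lists the coordinates to keep (0-based). A projected cell is 1 exactly
    when at least one 1-entry of the array lies above it.
    """
    return {tuple(t[i] for i in axes) for t in ones}


# A 2 x 2 x 2 example with three 1-entries (illustrative data only).
M = {(0, 0, 0), (0, 1, 1), (1, 1, 1)}
print(sorted(project(M, (0, 1))))  # 2-dim'l projection to the first two axes
print(sorted(project(M, (1,))))    # 1-dim'l projection to the second axis
```

Storing only the 1-entries keeps the projection a one-line set comprehension, which is convenient for sparse arrays.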

SLIDE 5

Taking Boolean sum to get the projection

[Figure: a 3-dim’l (0, 1) array drawn on the axes x, y, z, together with its 2-dim’l projection to {x, y}, its 2-dim’l projection to {y, z}, and its 1-dim’l projection to {y}.]

SLIDE 12

Sparsity, k-dimensional margins

Let $M$ be an $n$-dim’l (0, 1) array of size $a_1 \times \cdots \times a_n$. We say that $M$ has the sparse property $Q_n$ if, for all nonempty subsets $S_1 \subseteq [a_1], \dots, S_n \subseteq [a_n]$, it holds that

$$\sum_{(t_1,\dots,t_n) \in S_1 \times \cdots \times S_n} M(t_1, \dots, t_n) \le \sum_{j=1}^{n} (|S_j| - 1) + 1.$$

For a positive integer $k \le n$, we say that $M$ has the sparse property $Q_k$ if every $k$-dim’l projection of $M$ has the sparse property $Q_k$.
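The definition can be checked directly on small arrays. The following is a brute-force sketch only: it enumerates every subarray and therefore takes exponential time. The set-of-1-entries representation, the 0-based indices, the function names, and the sample array are all assumptions made for illustration.

```python
from itertools import combinations, product


def nonempty_subsets(size):
    """All nonempty subsets of range(size), as tuples."""
    for r in range(1, size + 1):
        yield from combinations(range(size), r)


def has_Qn(ones, shape):
    """Check the defining inequality on every subarray S_1 x ... x S_n."""
    for sides in product(*(nonempty_subsets(a) for a in shape)):
        side_sets = [set(s) for s in sides]
        count = sum(all(t[j] in side_sets[j] for j in range(len(shape))) for t in ones)
        if count > sum(len(s) - 1 for s in sides) + 1:
            return False
    return True


def has_Qk(ones, shape, k):
    """Q_k: every k-dimensional projection satisfies the inequality."""
    for axes in combinations(range(len(shape)), k):
        proj = {tuple(t[i] for i in axes) for t in ones}
        if not has_Qn(proj, tuple(shape[i] for i in axes)):
            return False
    return True


# Illustrative 2 x 2 x 2 array: 1-entries where the coordinate sum is even.
M = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(has_Qk(M, (2, 2, 2), 3), has_Qk(M, (2, 2, 2), 2))
```

For this sample array the full array has four 1-entries against a bound of four, while its projection to the first two axes is the all-ones 2 × 2 matrix, which exceeds the bound of three.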

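SLIDE 13

$Q_2$: For every subarray of size $a \times b$, $\sharp 1 \le (a - 1) + (b - 1) + 1$, where $\sharp 1$ counts the 1-entries of the subarray.

[Figure: three example matrices: one with four 1-entries (not $Q_2$), one with six 1-entries (not $Q_2$), and one with seven 1-entries (sparse, $Q_2$).]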


SLIDE 18

$Q_3$: For every subarray of size $a \times b \times c$, $\sharp 1 \le (a - 1) + (b - 1) + (c - 1) + 1$.

[Figure: a 3-dim’l array with four 1-entries, labeled "Sparse $Q_3$".]

SLIDE 19

Theorem (2019+ Wu-X)

Let $k$ and $n$ be integers with $3 \le k \le n$ and let $M$ be an $n$-dim’l (0, 1) array.

1. If $M$ has the property $Q_{k-1}$, then $M$ has the property $Q_k$.
2. It can happen that $M$ has the property $Q_k$ but does not have the property $Q_{k-1}$.

SLIDE 20

[Figure: a 3-dim’l array with four 1-entries.]

It has the property $Q_3$ but does not have the property $Q_2$.

SLIDE 21

Let $M$ be an $n$-dim’l (0, 1) array of size $a_1 \times \cdots \times a_n$. We say that $M$ has the sparse property $Q_n$ if

$$\sum_{(t_1,\dots,t_n) \in [a_1] \times \cdots \times [a_n]} M(t_1, \dots, t_n) = \sum_{j=1}^{n} (a_j - 1) + 1.$$

We say that $M$ has the sparse property $Q_k$ if every $k$-dim’l projection of $M$ has the sparse property $Q_k$.

SLIDE 22

Sparse completion

Theorem (2019+ Wu-X)

Let n be an integer with n ≥ 3 and let M be an n-dimensional (0, 1) array satisfying the property Qn. Then exactly one of the following holds:

1

M has the properties Qk and Qk for all k ∈ [n].

2

M has the property Qk for no k ∈ {2, . . . , n − 1}.

Let M and M′ be two n-dim’l (0, 1) arrays of equal size. If M ≤ M′ (entry-wise) and M′ has the sparse properties Q2 and Qn, then we call M′ a sparse completion of M.

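SLIDE 23

From array to partition system

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad \begin{pmatrix} a & \emptyset & \emptyset & \emptyset \\ b & c & \emptyset & \emptyset \\ \emptyset & d & e & f \\ \emptyset & \emptyset & \emptyset & g \end{pmatrix}$$

X = {a, b, c, d, e, f, g}

{{ a|bc|def|g, ab|cd|e|fg }}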
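As a sketch of how the two partitions can be read off the labeled array, the rows give one partition and the columns give the other. The helper name `partitions_from_labeled_matrix` is hypothetical, and empty cells are written as empty strings here instead of ∅.

```python
# Labeled array from the slide: each cell holds the elements of X placed there.
labeled = [
    ["a", "", "", ""],
    ["b", "c", "", ""],
    ["", "d", "e", "f"],
    ["", "", "", "g"],
]


def partitions_from_labeled_matrix(cells):
    """Return (row partition, column partition) as lists of sorted-letter strings."""
    rows = ["".join(sorted("".join(row))) for row in cells]
    cols = ["".join(sorted("".join(col))) for col in zip(*cells)]
    # Drop empty rows or columns, which would not contribute a part.
    return [r for r in rows if r], [c for c in cols if c]


row_partition, col_partition = partitions_from_labeled_matrix(labeled)
print("|".join(row_partition))  # a|bc|def|g
print("|".join(col_partition))  # ab|cd|e|fg
```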


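SLIDE 28

From partition system to array

X = {a, b, c, d, e, f, g}

C = {{ abc|defg, bde|acfg, ef|bd|acg }}

$M_C$: Intersection array of C. It is a 2 × 2 × 3 array with one cell for each choice of a part from each partition. Its six nonempty cells (the 1-entries of $M_C$) are:

(abc, bde, bd): b
(abc, acfg, acg): a, c
(defg, bde, ef): e
(defg, bde, bd): d
(defg, acfg, ef): f
(defg, acfg, acg): g

The remaining six cells are ∅.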
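The construction of the intersection array can be sketched as follows. This assumes, as the figure suggests, that a cell of $M_C$ is 1 exactly when the chosen parts have a common element. The function name `intersection_array` and the list-of-strings encoding of partitions are illustrative choices.

```python
from itertools import product


def intersection_array(partitions):
    """Map each index tuple to the common elements of the chosen parts.

    Only nonempty intersections are stored; these are the 1-entries of M_C.
    """
    cells = {}
    for index in product(*(range(len(p)) for p in partitions)):
        common = set.intersection(*(set(p[i]) for p, i in zip(partitions, index)))
        if common:
            cells[index] = "".join(sorted(common))
    return cells


# Partition system C from the slide, each partition given as a list of parts.
C = [["abc", "defg"], ["bde", "acfg"], ["ef", "bd", "acg"]]
for index, label in sorted(intersection_array(C).items()):
    print(index, label)
```

Every element of X lands in exactly one cell, so the nonempty cells form the common refinement of the partitions in C.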

SLIDE 29

Perfect phylogeny

We call a partition system compatible if it can be displayed on a labeled tree.

[Figure: a labeled tree whose vertices carry the labels a; b, c; d; f; g; e.]

X = {a, b, c, d, e, f, g}

{{ abc|de|fg, a|bc|dfg|e }}


SLIDE 33

Theorem (2019+ Wu-X)

Let C be a partition system of size n and let $M_C$ be the intersection array of C. Then

1. C is compatible if and only if $M_C$ has a sparse completion.
2. When n = 2, C is compatible if and only if $M_C$ has the sparse property $Q_2$.
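For n = 2 the second statement gives a direct test. A standard reformulation, not stated on the slides, is that a matrix has $Q_2$ exactly when the bipartite graph with one vertex per row and per column and one edge per 1-entry has no cycle: a cycle yields a subarray with too many 1-entries, and a forest never does. The sketch below uses a union-find structure for the cycle test; the function name and the string encoding of parts are assumptions.

```python
def is_compatible_pair(pi1, pi2):
    """Forest test on the bipartite graph whose edges are the 1-entries of M_C."""
    parent = list(range(len(pi1) + len(pi2)))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, a in enumerate(pi1):
        for j, b in enumerate(pi2):
            if set(a) & set(b):              # 1-entry at (i, j)
                ri, rj = find(i), find(len(pi1) + j)
                if ri == rj:                 # this edge closes a cycle: Q_2 fails
                    return False
                parent[ri] = rj
    return True


# The system from the "Perfect phylogeny" slide.
print(is_compatible_pair(["abc", "de", "fg"], ["a", "bc", "dfg", "e"]))
# The pair pi_1, pi_3 from the first example of Problem 2.
print(is_compatible_pair(["123", "456", "7"], ["1256", "34", "7"]))
```

The first call returns True and the second returns False, because the parts 123 and 456 both meet 1256 and 34.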

SLIDE 34

From a sparse completion to a tree

Labeled 3 × 4 × 4 array, shown as three 4 × 4 layers:

a ∅ b c      e f ∅ ∅      ∅ ∅ ∅ ∅
∅ ∅ ∅ ∅      ∅ ∅ ∅ ∅      ∅ ∅ ∅ g
∅ ∅ ∅ d      ∅ ∅ ∅ ∅      ∅ ∅ ∅ h
∅ ∅ ∅ ∅      ∅ ∅ ∅ ∅      ∅ ∅ ∅ i

[Figure: the corresponding (0, 1) array with nine 1-entries and, in the final frame, a tree with vertices labeled a, b, c, d, e, f, g, h, i.]

SLIDE 40

Algorithm PG

From a partition system to a labeled graph (PG)

Input: A partition system C = {{ π1, . . . , πn }} on X.

1. Let $Y = \pi_1 \wedge \cdots \wedge \pi_n$.
2. For each $i \in [n]$, let $T_i$ be a forest satisfying that $V(T_i) = Y$ and that a subset $Y' \subseteq Y$ is a connected component of $T_i$ if and only if $\bigcup_{A \in Y'} A$ is a state of the partition $\pi_1 \wedge \cdots \wedge \pi_{i-1} \wedge \pi_{i+1} \wedge \cdots \wedge \pi_n$.
3. Let $\bar{T}$ be the graph such that $V(\bar{T}) = Y$ and $E(\bar{T}) = \bigcup_{i \in [n]} E(T_i)$.
4. Add the minimum number of new edges to get a connected graph $T$ such that $V(T) = V(\bar{T})$ and $E(T) \supseteq E(\bar{T})$.

Output: The graph $T$ and the surjective map $\ell_T$ from $X$ to $V(T)$ sending every element in a state $A$ of $\pi_1 \wedge \cdots \wedge \pi_n$ to the vertex $A$ of $T$.
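The four steps can be sketched in Python as below. This is one possible reading, not the authors' implementation: the forest in step 2 is not unique, and the sketch simply joins each group of states by a path. The names `meet` and `algorithm_pg`, the encoding of parts as strings of single-character elements, and the use of the partition system from the "Buneman graph continued" slide as sample input are all choices made here.

```python
def meet(partitions, ground):
    """Common refinement: two elements share a state iff they agree in every partition."""
    states = {}
    for x in ground:
        key = tuple(next(i for i, part in enumerate(p) if x in part) for p in partitions)
        states.setdefault(key, set()).add(x)
    return [frozenset(s) for s in states.values()]


def algorithm_pg(partitions, ground):
    Y = sorted(meet(partitions, ground), key=sorted)           # step 1
    edges = set()
    for i in range(len(partitions)):                            # step 2
        others = partitions[:i] + partitions[i + 1:]
        for state in meet(others, ground):
            group = [A for A in Y if A <= state]
            # A path through the group is one valid choice of tree on it.
            edges.update(frozenset(e) for e in zip(group, group[1:]))
    # Step 3: `edges` already holds the union of the E(T_i).
    comp = {A: A for A in Y}                                     # step 4

    def find(A):
        while comp[A] != A:
            A = comp[A]
        return A

    for e in edges:
        A, B = tuple(e)
        comp[find(A)] = find(B)
    roots = sorted({find(A) for A in Y}, key=sorted)
    # Joining consecutive component representatives adds the fewest possible edges.
    edges.update(frozenset(e) for e in zip(roots, roots[1:]))
    label = {x: A for A in Y for x in A}                         # the map l_T
    return Y, edges, label


C = [["abcd", "ef", "ghi"], ["ae", "f", "b", "cdghi"], ["abcef", "g", "dh", "i"]]
Y, edges, label = algorithm_pg(C, "abcdefghi")
print(len(Y), len(edges))
```

On this input the nine states are singletons and the forests already connect them, so step 4 adds no edge and the output is a tree with eight edges.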

SLIDE 41

Minimum displaying tree

Theorem (2019+ Wu-X)

Let X be a finite set, and let C be a partition system on X of size n.

1. If n = 2 and C is compatible, then an X-tree T displaying C is minimum if and only if T is a possible output of PG applied to C.
2. If C has the properties Q2 and Qn, then an X-tree T displaying C is minimum if and only if T is a possible output of PG applied to C.

SLIDE 42

Buneman graph

The Buneman graph of a partition system C on X is the graph B(C) satisfying

$$V\big(B(C)\big) = \{\alpha \in (2^X)^C : \alpha(\pi) \in \pi,\ \alpha(\pi) \cap \alpha(\psi) \neq \emptyset \text{ for all } \pi, \psi \in C\}$$

and

$$E\big(B(C)\big) = \big\{\{\alpha, \beta\} : |\{\!\{ \pi \in C : \alpha(\pi) \neq \beta(\pi) \}\!\}| = 1\big\}.$$

There is a canonical labelling map $\varphi_C$ from X to $V\big(B(C)\big)$ that sends x ∈ X to $\varphi_C(x) = \alpha \in (2^X)^C$ such that α(π) is the part of π that contains x for all π ∈ C.
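A brute-force sketch of this construction follows. It reads the vertex condition as "the chosen parts pairwise intersect" and the edge condition as "the two choices differ in exactly one partition", which is the reading under which the canonical labelling lands in the vertex set. The function names are hypothetical, and the sample input is the partition system from the "Buneman graph continued" slide.

```python
from itertools import combinations, product


def buneman_graph(partitions):
    """Vertices: one part per partition, pairwise intersecting.
    Edges: two vertices that differ in exactly one partition."""
    parts = [[frozenset(p) for p in pi] for pi in partitions]
    vertices = [
        alpha for alpha in product(*parts)
        if all(a & b for a, b in combinations(alpha, 2))
    ]
    edges = [
        (alpha, beta) for alpha, beta in combinations(vertices, 2)
        if sum(a != b for a, b in zip(alpha, beta)) == 1
    ]
    return vertices, edges


def canonical_label(partitions, x):
    """phi_C(x): for each partition, the part that contains x."""
    return tuple(frozenset(p) for pi in partitions for p in pi if x in p)


C = [["abcd", "ef", "ghi"], ["ae", "f", "b", "cdghi"], ["abcef", "g", "dh", "i"]]
V, E = buneman_graph(C)
print(len(V), len(E))
```

The enumeration is exponential in the number of partitions, so it is only meant for small examples such as this one.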

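SLIDE 43

Buneman graph continued

Let C = {{ abcd|ef|ghi, ae|f|b|cdghi, abcef|g|dh|i }}.

The partition system C, as a labeled 3 × 4 × 4 array with one 4 × 4 layer for each part of abcd|ef|ghi (columns indexed by the parts of ae|f|b|cdghi, rows by the parts of abcef|g|dh|i):

a ∅ b c      e f ∅ ∅      ∅ ∅ ∅ ∅
∅ ∅ ∅ ∅      ∅ ∅ ∅ ∅      ∅ ∅ ∅ g
∅ ∅ ∅ d      ∅ ∅ ∅ ∅      ∅ ∅ ∅ h
∅ ∅ ∅ ∅      ∅ ∅ ∅ ∅      ∅ ∅ ∅ i

[Figure: the Buneman graph B(C), with vertices labeled a, b, c, d, e, f, g, h, i.]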


SLIDE 45

Theorem (2019+ Wu-X)

Let X be a finite set, and let C be a partition system on X of size n having the properties Q2 and Qn.

1. The Buneman graph B(C) is a connected graph.
2. The set of all possible outputs of PG applied to C consists of $(T, \varphi_C)$ where T runs through the set of all spanning trees of B(C).


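SLIDE 53

Problem 1: Check the property Qk

Check the property $Q_2$ in polynomial time

[Figure: two example matrices, animated over several frames. In the first, which has six 1-entries, 1-entries are deleted until four remain and no more can be deleted; it does not have the property $Q_2$. In the second, which has seven 1-entries, 1-entries are deleted frame by frame until none remain; it has the property $Q_2$.]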
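The slides do not spell out the algorithm, but one polynomial-time procedure consistent with the frames of this slide is to repeatedly delete every 1-entry that is alone in its row or in its column. The sketch below assumes that reading; the function name and the two sample matrices are illustrative. The procedure is correct because such entries are leaf edges of the row-column bipartite graph, and stripping leaves empties a graph exactly when it has no cycle.

```python
def has_Q2_by_peeling(ones):
    """Repeatedly delete 1-entries that are alone in their row or column.

    `ones` is a set of (row, column) positions. The array has Q_2 exactly
    when every 1-entry is eventually deleted.
    """
    remaining = set(ones)
    while remaining:
        row_count, col_count = {}, {}
        for r, c in remaining:
            row_count[r] = row_count.get(r, 0) + 1
            col_count[c] = col_count.get(c, 0) + 1
        lonely = {(r, c) for r, c in remaining
                  if row_count[r] == 1 or col_count[c] == 1}
        if not lonely:
            return False  # every remaining row and column holds >= 2 ones: a cycle
        remaining -= lonely
    return True


# 4 x 4 matrix from the slide "From array to partition system" (seven 1-entries).
sparse = {(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (2, 3), (3, 3)}
# Illustrative matrix with six 1-entries that contains an all-ones 2 x 2 block.
dense = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (2, 2)}
print(has_Q2_by_peeling(sparse), has_Q2_by_peeling(dense))
```

With s one-entries, each round costs O(s) and there are at most s rounds, so the sketch runs in O(s²) time.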

SLIDE 54

Check the property Qk, continued

What is the complexity of checking $Q_k$ for integers $k \ge 3$?

Is there a fast algorithm to check whether or not a given (0, 1) array has a sparse completion?

SLIDE 55

Let $M$ be an $n$-dim’l (0, 1) array of size $a_1 \times \cdots \times a_n$. Denote by $s$ the number of 1-entries of $M$ and let $r := \max\{a_1, \dots, a_n\}$. Is there an algorithm that determines whether $M$ has a sparse completion and runs in time better than $O(r^{n+1} n^{n+1} + s n^2)$? If yes, it would be a better algorithm for solving the perfect phylogeny problem.
SLIDE 56

26/34

Problem 2: Submodularity

Let X = [7] and let π1 = 123|456|7, π2 = 12346|5|7, π3 = 1256|34|7. π1 ∧ π2 = 123|46|5|7 π1 ∧ π3 = 12|3|4|56|7 π2 ∧ π3 = 126|34|5|7 π1 ∧ π2 ∧ π3 = 12|3|4|5|6|7 Mπ1,π3 =   1 1 1 1 1   NO Q2 9 = 6 + 3 = |π1 ∧ π2 ∧ π3| + |π2| > |π1 ∧ π2| + |π2 ∧ π3| = 4 + 4 = 8. NO submodularity

26 / 34

slide-57
SLIDE 57

27/34

Submodularity, continued

Let X = [7] and let π1 = 123|45|67, π2 = 12|7|3456, π3 = 13|2|56|4|7. π1 ∧ π2 = 12|3|45|6|7 π1 ∧ π3 = 13|2|4|5|6|7 π2 ∧ π3 = 1|2|3|4|56|7 π1 ∧ π2 ∧ π3 = 1|2|3|4|5|6|7 Q2 10 = 7 + 3 = |π1 ∧ π2 ∧ π3| + |π2| ≤ |π1 ∧ π2| + |π2 ∧ π3| = 5 + 6 = 11. Q2 ⇒ Submodularity (Proved by Yinfeng Zhu). Can we characterize the submodular function f?

27 / 34

slide-58
SLIDE 58

28/34

Problem 3: From tree to partition system

a b c d e f

        a b c d e f a 1 b 1 1 1 c 1 d 1 e 1 1 1 f 1        

28 / 34

slide-59
SLIDE 59

28/34

Problem 3: From tree to partition system

a b c d e f

        a b c d e f a ab b ba bc be c cb d de e eb ed ef f fe        

28 / 34
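One way to read a partition system off the labeled matrix, by analogy with the earlier slide "From array to partition system", is to take the rows as one partition and the columns as the other. The sketch below assumes that reading; the slides do not state it explicitly, and the helper name `group_by` is hypothetical.

```python
# Tree edges read off the adjacency matrix on the slide.
tree_edges = [("a", "b"), ("b", "c"), ("b", "e"), ("d", "e"), ("e", "f")]

# One element per ordered pair (u, v) of adjacent vertices, e.g. "ab" and "ba".
X = sorted(u + v for a, b in tree_edges for u, v in ((a, b), (b, a)))


def group_by(position):
    """Partition X by the letter at the given position (0 = row, 1 = column)."""
    parts = {}
    for x in X:
        parts.setdefault(x[position], []).append(x)
    return [parts[k] for k in sorted(parts)]


rows, cols = group_by(0), group_by(1)
print(" | ".join(" ".join(p) for p in rows))
print(" | ".join(" ".join(p) for p in cols))
```

Under this reading the ground set has ten elements, two for each edge of the tree, and each partition has one part per vertex.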

SLIDE 60

From tree to partition system, Contd.

[Figure: the labeled matrix of the previous slide, together with a tree diagram whose vertices carry the labels ab, eb, cb, ef, ed, fe, de, be, bc, ba.]

SLIDE 61

From tree to partition system, Contd.

What is the relationship between the topology of these displaying trees and the original tree?


SLIDE 62

Problem 4: Other sparsity measures

We may consider partition systems that can be displayed on tree-like graphs, such as split networks.

SLIDE 63

Split network

Let X = {x, y, z, p, q}. C = {{ xy|q|zp, xq|y|zp, x|yz|pq }} is a partition system on X. Note that C has the sparse property $Q_2$, but there is no sparse completion for it. Indeed, C does not have a perfect phylogeny, but it can be displayed on the split network depicted below.

[Figure: a split network with the labels z, p, y, q, x.]

SLIDE 64

Other partition systems

Let C be a family of fuzzy partitions, Tverberg partitions, or partitions from a linear lattice. How can we measure the tree-likeness of C? We can also consider the sparsity properties of k-compatible or weakly compatible partition systems. We expect that more general natural sparsity measures can be discovered in this way.

SLIDE 65

Thank you very much!
