SLIDE 1

Constructing parsimonious hybridization networks using a SAT-solver

Vladimir Ulyantsev and Mikhail Melnik, presented by Alexey Sergushichev

AlCoB 2015, Mexico

SLIDE 2

Phylogenetic tree

  • Binary tree with set of taxa as leaves
  • Can be defined for a particular gene

SLIDE 3

Hybridization network

  • Directed acyclic graph with a single root
  • Reticulation nodes: in-degree=2, out-degree=1
  • Regular nodes: in-degree=1, out-degree=2
  • Leaves: taxa
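As a rough illustration of these degree rules, the Python sketch below classifies the nodes of a directed graph given as an edge list. The function name `classify_nodes`, the edge-list representation, and the treatment of the root as a node with in-degree 0 and out-degree 2 are assumptions made for this example; none of them comes from the slides or from PhyloSAT. Acyclicity is not checked.

```python
from collections import defaultdict

def classify_nodes(edges):
    """Classify nodes by (in-degree, out-degree).

    `edges` is a list of (parent, child) pairs. Returns a dict mapping each
    node to 'root', 'regular', 'reticulation' or 'leaf'. Raises ValueError
    if a node fits none of these roles or if there is not exactly one root.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
        nodes.update((parent, child))

    role_by_degrees = {
        (0, 2): "root",          # assumption: the root has no parent
        (1, 2): "regular",
        (2, 1): "reticulation",
        (1, 0): "leaf",
    }
    roles = {}
    for v in nodes:
        degrees = (indeg[v], outdeg[v])
        if degrees not in role_by_degrees:
            raise ValueError(f"node {v!r} has invalid degrees {degrees}")
        roles[v] = role_by_degrees[degrees]
    if sum(1 for r in roles.values() if r == "root") != 1:
        raise ValueError("network must have exactly one root")
    return roles

# Toy network on taxa a, b, c with a single reticulation node h.
EXAMPLE_EDGES = [
    ("root", "x"), ("root", "y"),
    ("x", "a"), ("x", "h"),
    ("y", "h"), ("y", "c"),
    ("h", "b"),
]
roles = classify_nodes(EXAMPLE_EDGES)
```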

SLIDE 4

Displaying a tree

  • Select a direction at each reticulation node
  • Collapse simple paths
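The two steps above can be sketched in Python as follows. This is an illustration under assumptions, not the presenters' code: the network is an edge list, "selecting a direction" is modelled as keeping exactly one incoming edge per reticulation node, and the result is a nested `frozenset` structure so that child order does not matter. The names `displayed_tree`, `chosen_parent` and `NETWORK` are hypothetical.

```python
def displayed_tree(edges, chosen_parent):
    """Return the tree displayed by a network for one choice of parents.

    `edges` is a list of (parent, child) pairs. `chosen_parent` maps each
    reticulation node to the single parent whose edge is kept.
    A leaf is returned as its label; an internal node is a frozenset of
    the structures of its children.
    """
    parents = {p for p, _ in edges}
    children_all = {c for _, c in edges}
    leaves = children_all - parents
    (root,) = parents - children_all   # fails unless there is one root

    # Step 1: select a direction at each reticulation node.
    kept = [(p, c) for p, c in edges if chosen_parent.get(c, p) == p]
    children = {}
    for p, c in kept:
        children.setdefault(p, []).append(c)

    # Step 2: collapse simple paths while building the result.
    def build(v):
        if v in leaves:
            return v
        subtrees = [t for t in (build(c) for c in children.get(v, []))
                    if t is not None]
        if not subtrees:
            return None              # dead end left after dropping edges
        if len(subtrees) == 1:
            return subtrees[0]       # node on a simple path: skip it
        return frozenset(subtrees)

    return build(root)

# Toy network on taxa a, b, c; h is the only reticulation node.
NETWORK = [
    ("root", "x"), ("root", "y"),
    ("x", "a"), ("x", "h"),
    ("y", "h"), ("y", "c"),
    ("h", "b"),
]
tree_via_x = displayed_tree(NETWORK, {"h": "x"})   # ((a, b), c)
tree_via_y = displayed_tree(NETWORK, {"h": "y"})   # (a, (b, c))
```

With one reticulation node this toy network displays two different trees, one per choice of parent for `h`.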

SLIDE 5

Hybridization network problem

SLIDE 6

Most parsimonious network

  • Find a hybridization network for a set of phylogenetic trees T1, T2, ..., Tt with the minimal number of reticulation nodes
  • Is NP-complete even for t=2

SLIDE 7

Existing solutions

For two trees:

  • CASS (heuristic)
  • MURPAR (heuristic)

For multiple trees:

  • PIRNCH (heuristic)
  • PIRNC (exact)

SLIDE 8

Reduction to SAT

  • Fix hybridization number k
  • Make a Boolean formula f so that f ∈ SAT iff there is a hybridization network with hybridization number k
  • Check satisfiability with a SAT-solver
  • Find the minimal k with a satisfiable formula
  • Restore the network
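The overall loop can be sketched in Python as below. Several things here are assumptions for illustration only: the search tries k = 0, 1, 2, ... in increasing order (the slides do not say how the minimal k is searched for), `build_formula` is a hypothetical callback standing in for the real encoding, and a brute-force checker replaces a real SAT solver. The toy encoder is a small pigeonhole formula, not the hybridization network encoding.

```python
from itertools import combinations, product

def brute_force_sat(clauses, num_vars):
    """Exponential-time stand-in for a SAT solver.

    Literals are DIMACS-style integers: i means variable i is true,
    -i means it is false. Returns a model {variable: bool} or None.
    """
    for values in product([False, True], repeat=num_vars):
        model = dict(enumerate(values, start=1))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return model
    return None

def minimal_k(build_formula, max_k, solve=brute_force_sat):
    """Return (k, model) for the smallest k in 0..max_k with a satisfiable
    formula, or None if no such k exists."""
    for k in range(max_k + 1):
        clauses, num_vars = build_formula(k)
        model = solve(clauses, num_vars)
        if model is not None:
            return k, model          # the model is what gets decoded
    return None

def toy_formula(k):
    """Hypothetical encoder: place 3 items into k slots, one item per slot.
    Satisfiable exactly when k >= 3."""
    def var(item, slot):
        return item * k + slot + 1
    clauses = [[var(i, s) for s in range(k)] for i in range(3)]
    for s in range(k):
        for i, j in combinations(range(3), 2):
            clauses.append([-var(i, s), -var(j, s)])
    return clauses, 3 * k

result = minimal_k(toy_formula, max_k=5)
```

In a real pipeline the returned model would be decoded back into a network, which corresponds to the "restore the network" step.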

SLIDE 9

SAT

  • Boolean formula f in CNF: f(v1, v2, ...) = (v1 ∨ ¬v2 ∨ ...) ∧ ... ∧ (...)
  • SAT asks whether values for v1, v2, ... exist that make f true
  • f can be seen as a conjunction of multiple constraints
  • Constraints can be of the form v1 ∧ ¬v2 ∧ ... → v3
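An implication-style constraint fits into CNF because (a ∧ b ∧ ... → c) is logically equivalent to the single clause (¬a ∨ ¬b ∨ ... ∨ c). The Python sketch below shows that conversion with DIMACS-style integer literals (i for a variable, -i for its negation); the helper names are hypothetical.

```python
def implication_to_clause(premises, conclusion):
    """Turn (p1 ∧ p2 ∧ ... → q) into the clause (¬p1 ∨ ¬p2 ∨ ... ∨ q).

    Premises and conclusion are integer literals; negating a literal is
    just flipping its sign.
    """
    return [-p for p in premises] + [conclusion]

def clause_holds(clause, model):
    """True if at least one literal of the clause is satisfied by `model`,
    a dict {variable: bool}."""
    return any(model[abs(lit)] == (lit > 0) for lit in clause)

# v1 ∧ ¬v2 → v3, with v1, v2, v3 numbered 1, 2, 3.
clause = implication_to_clause([1, -2], 3)

# The clause is violated only when both premises hold and v3 is false.
violating = {1: True, 2: False, 3: False}
satisfying = {1: True, 2: False, 3: True}
```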

SLIDE 10

Network structure

  • 2n + 2k - 1 nodes
      – [1, n]: leaves (L)
      – [n+1, 2n+k-1]: regular nodes (V)
      – [2n+k, 2n+2k-1]: reticulation nodes (R)
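The numbering can be checked with a few lines of Python. The function name `node_ranges` is an assumption for this sketch; the interval bounds are the ones listed on the slide.

```python
def node_ranges(n, k):
    """Return the index ranges (leaves L, regular V, reticulation R) for
    n taxa and k reticulation nodes, using 1-based node numbers."""
    leaves = range(1, n + 1)                          # [1, n]
    regular = range(n + 1, 2 * n + k)                 # [n+1, 2n+k-1]
    reticulation = range(2 * n + k, 2 * n + 2 * k)    # [2n+k, 2n+2k-1]
    return leaves, regular, reticulation

# Example: 3 taxa and 1 reticulation node give 2*3 + 2*1 - 1 = 7 nodes.
L, V, R = node_ranges(3, 1)
total = len(L) + len(V) + len(R)
```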

SLIDE 11

Network structure variables

  • l_{v,u} and r_{v,u}: u is a left (right) child of v, for v in V
  • p_{v,u}: u is the parent of v, for v in L+V
  • p^l, p^r and c: parent-child relations for reticulation nodes
  • O(n^2) variables
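One common way to hand such indexed variables to a SAT solver is to map each (kind, v, u) triple to a positive integer. The Python sketch below does this for the l, r and p variables only. The `VarPool` class, the choice of which (v, u) pairs to allocate, and the omission of the reticulation-node variables are all assumptions of this example; the actual PhyloSAT encoding may restrict the pairs differently.

```python
class VarPool:
    """Assigns consecutive positive integers to named Boolean variables."""

    def __init__(self):
        self._ids = {}

    def var(self, kind, v, u):
        """Return the integer id of variable kind_{v,u}, allocating it on
        first use."""
        key = (kind, v, u)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

    def __len__(self):
        return len(self._ids)

def allocate_structure_vars(n, k):
    """Allocate l, r and p variables for every pair of distinct nodes that
    could be related (no pruning of impossible pairs)."""
    leaves = list(range(1, n + 1))
    regular = list(range(n + 1, 2 * n + k))
    reticulation = list(range(2 * n + k, 2 * n + 2 * k))
    all_nodes = leaves + regular + reticulation

    pool = VarPool()
    for v in regular:                       # children of regular nodes
        for u in all_nodes:
            if u != v:
                pool.var("l", v, u)
                pool.var("r", v, u)
    for v in leaves + regular:              # parents of leaves and regular nodes
        for u in regular + reticulation:
            if u != v:
                pool.var("p", v, u)
    return pool

pool = allocate_structure_vars(n=3, k=1)
```

Each variable kind is indexed by a pair of nodes, so the count grows quadratically with the number of nodes, in line with the O(n^2) figure on the slide.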

SLIDE 12

Network consistency constraints

  • Each node has only one left child, one right child, and one parent
  • u is a child of v → v is the parent of u
  • u is the parent of v → v is the left or right child of u
  • O(n^3) constraints
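Constraints of this kind can be written as clauses in a standard way. The Python sketch below uses the pairwise "exactly one" encoding and a two-literal clause for an implication. It is a generic illustration, not the clause set shown in the talk; the pairwise encoding is an assumption, chosen because it produces a quadratic number of clauses per node, which is consistent with an O(n^3) total.

```python
from itertools import combinations, product

def exactly_one(literals):
    """Clauses forcing exactly one of `literals` to be true."""
    clauses = [list(literals)]                               # at least one
    clauses += [[-a, -b] for a, b in combinations(literals, 2)]  # at most one
    return clauses

def implies(a, b):
    """Clause for (a → b), e.g. 'u is a child of v → v is the parent of u'."""
    return [-a, b]

def count_models(clauses, num_vars):
    """Count satisfying assignments by brute force (small inputs only)."""
    count = 0
    for values in product([False, True], repeat=num_vars):
        model = dict(enumerate(values, start=1))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

# Suppose variables 1, 2, 3 mean "candidate u1 / u2 / u3 is the parent of v".
parent_clauses = exactly_one([1, 2, 3])
```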

SLIDE 13

Network consistency constraints: Actual clauses

SLIDE 14

Displaying structure

  • For a tree T
  • Choice of a parent for reticulation nodes
  • Variables for correspondence between network and tree nodes
  • Collapsing non-branching paths
      – Whether particular nodes were removed or not
      – Parent relations after collapsing
  • O(tn^2) variables

SLIDE 15

Displaying consistency constraints

  • All T nodes are uniquely mapped to network nodes
  • Parent relations in the tree uniquely correspond to the network structure after selecting directions at reticulation nodes and collapsing paths
  • Parent relations in the network are consistent
  • O(tn^3) constraints

SLIDE 16

Displaying consistency constraints: Actual clauses (1)

SLIDE 17

Displaying consistency constraints: Actual clauses (2)

SLIDE 18

All clauses

SLIDE 19

Additional optimizations

  • Splitting into independent problems
  • Symmetry breaking

SLIDE 21

Experiments

  • 57 grass datasets by the Grass Phylogeny Working Group et al.
  • CryptoMiniSAT solver
  • 1000 s time limit
  • Comparison with PIRNs

SLIDE 22

Results

  • Exact solution (out of 57)
      – PhyloSAT: 36
      – PIRNC: 29
  • Non-exact
      – PhyloSAT: 48 (40 optimal)
      – PIRNCH: 43 (36 optimal)

SLIDE 23

Results for k >= 6

hybridization number (time in seconds)

SLIDE 24

Future work

  • Different SAT-solvers
  • Improving reduction
  • Using upper and lower bounds on k
  • Searching for all minimal solutions

SLIDE 25

Conclusions

  • Constructing parsimonious hybridization networks can be approached by reduction to SAT
  • This approach outperforms the known exact solver and compares well with the heuristic solver
  • Solving bigger instances is still challenging

SLIDE 26

The End

https://github.com/ctlab/PhyloSAT
Vladimir Ulyantsev (ulyntsev@rain.ifmo.ru)
