Constructing parsimonious hybridization networks using a SAT-solver


  1. Constructing parsimonious hybridization networks using a SAT-solver Vladimir Ulyantsev and Mikhail Melnik, presented by Alexey Sergushichev AlCoB 2015, Mexico

  2. Phylogenetic tree • Binary tree with the set of taxa as leaves • Can be defined for a particular gene

  3. Hybridization network • Directed acyclic graph with a single root • Reticulation nodes: in-degree = 2, out-degree = 1 • Regular nodes: in-degree = 1, out-degree = 2 • Leaves: taxa

  4. Displaying a tree • Select a direction (one incoming edge) at each reticulation node • Collapse simple paths
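
The two steps on this slide can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `display_tree`, the dictionary representation of the network, and the toy network below are all assumptions made for the example. A tree is returned as nested tuples, with children ordered by `repr` so that equal trees compare equal.

```python
def display_tree(children, root, chosen_parent):
    """Return the tree displayed by a network for one choice of parents.

    children:      node -> list of child nodes (leaves map to []).
    chosen_parent: reticulation node -> the single parent whose edge is kept.
    Both structures are hypothetical; they are not taken from the slides.
    """
    leaves = {v for v, ch in children.items() if not ch}

    def build(v):
        if v in leaves:
            return v
        kept = []
        for c in children[v]:
            # Drop the edge v -> c if c is a reticulation node that chose
            # its other parent.
            if chosen_parent.get(c, v) != v:
                continue
            sub = build(c)
            if sub is not None:
                kept.append(sub)
        if not kept:
            return None      # dead end: nothing below v survives
        if len(kept) == 1:
            return kept[0]   # collapse a simple (non-branching) path
        return tuple(sorted(kept, key=repr))

    return build(root)


# Toy network: leaves a, b, c; h is a reticulation node with parents x and y.
children = {
    "r": ["x", "y"],
    "x": ["a", "h"],
    "y": ["h", "c"],
    "h": ["b"],
    "a": [], "b": [], "c": [],
}
tree_via_x = display_tree(children, "r", {"h": "x"})
tree_via_y = display_tree(children, "r", {"h": "y"})
```

With the edge from `x` kept, leaf `b` groups with `a`; with the edge from `y` kept, it groups with `c`. The same network therefore displays two different trees.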

  5. Hybridization network problem

  6. Most parsimonious network • Find a hybridization network for a set of phylogenetic trees $T_1, T_2, \ldots, T_t$ with the minimal number of reticulation nodes • The problem is NP-complete even for t = 2

  7. Existing solutions • For two trees: – CASS (heuristic) – MURPAR (heuristic) • For multiple trees: – PIRN_CH (heuristic) – PIRN_C (exact)

  8. Reduction to SAT • Fix the hybridization number k • Build a Boolean formula f such that f ∈ SAT iff there is a hybridization network with k reticulation nodes • Check satisfiability with a SAT-solver • Find the minimal k with a satisfiable formula • Restore the network
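
The outer search over k can be sketched as follows. The slide does not say in which order the values of k are tried, so the upward linear scan here is an assumption, as are the names `find_minimal_k` and `solve_for_k`. The callback stands in for "build the formula for k and run the SAT-solver" and is expected to return a model (any object) or `None` if the formula is unsatisfiable.

```python
def find_minimal_k(solve_for_k, k_max):
    """Try k = 0, 1, ..., k_max and return (k, model) for the first
    satisfiable formula, or None if none is satisfiable.

    solve_for_k is a hypothetical callback: k -> model or None.
    """
    for k in range(k_max + 1):
        model = solve_for_k(k)
        if model is not None:
            return k, model  # the network is then restored from the model
    return None


# Stand-in oracle for demonstration: pretend networks exist only for k >= 3.
def fake_solver(k):
    return {"k": k} if k >= 3 else None


result = find_minimal_k(fake_solver, 10)
```

Because the scan goes upward and stops at the first satisfiable formula, the returned k is minimal within the range tried.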

  9. SAT • Boolean formula f in CNF: $f(v_1, v_2, \ldots) = (v_1 \lor \lnot v_2 \lor \ldots) \land \ldots \land (\ldots)$ • Question: do values for $v_1, v_2, \ldots$ exist that make f true? • Can be seen as a conjunction of multiple constraints • Constraints can be of the form $v_1 \land \lnot v_2 \land \ldots \to v_3$
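
An implication-style constraint becomes a single CNF clause by negating each premise and adding the conclusion. The sketch below uses the common DIMACS convention (variable i is the integer i, its negation is -i). The helper names are made up for this example, and the brute-force check is only a stand-in for a real SAT-solver on tiny formulas.

```python
from itertools import product


def implication_to_clause(premises, conclusion):
    """(p1 AND p2 AND ...) -> c  is equivalent to  (NOT p1 OR NOT p2 OR ... OR c)."""
    return [-lit for lit in premises] + [conclusion]


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment as a tuple of booleans, or None.

    Exponential in num_vars; for illustration only.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


# Variable 1 AND NOT variable 2 -> variable 3
clause = implication_to_clause([1, -2], 3)

# Force variable 1 true and variable 2 false; the implication then forces 3.
model = brute_force_sat([[1], [-2], clause], num_vars=3)
```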

  10. Network structure • 2n + 2k - 1 nodes – [1, n]: leaves (L) – [n + 1, 2n + k - 1]: regular nodes (V) – [2n + k, 2n + 2k - 1]: reticulation nodes (R)
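
The numbering scheme can be checked with a small helper. The function name `node_ranges` is hypothetical; the index ranges are the ones given on the slide.

```python
def node_ranges(n, k):
    """Index ranges for a network with n leaves and k reticulation nodes."""
    leaves = range(1, n + 1)                        # [1, n]
    regular = range(n + 1, 2 * n + k)               # [n+1, 2n+k-1]
    reticulation = range(2 * n + k, 2 * n + 2 * k)  # [2n+k, 2n+2k-1]
    return leaves, regular, reticulation


L, V, R = node_ranges(n=4, k=2)
total_nodes = len(L) + len(V) + len(R)
```

For n = 4 and k = 2 this gives 4 leaves, 5 regular nodes (n + k - 1) and 2 reticulation nodes, 11 nodes in total, matching 2n + 2k - 1.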

  11. Network structure variables • $l_{v,u}$ and $r_{v,u}$: u is the left (right) child of v, for v in V • $p_{v,u}$: u is the parent of v, for v in L + V • $p^l$, $p^r$ and $c$: parent-child relations for reticulation nodes • $O(n^2)$ variables

  12. Network consistency constraints • Each node has exactly one left child, one right child and one parent • u is a child of v → v is the parent of u • u is the parent of v → v is the left or right child of u • $O(n^3)$ constraints
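
A constraint such as "a node has only one left child" is commonly written as one at-least-one clause plus pairwise at-most-one clauses. The sketch below shows this standard pairwise encoding; whether the authors used exactly this encoding is not stated in the slides, and the name `exactly_one` is made up for the example. Variables are DIMACS-style integers.

```python
from itertools import combinations


def exactly_one(variables):
    """CNF clauses forcing exactly one of the given variables to be true."""
    clauses = [list(variables)]                     # at least one is true
    clauses += [[-a, -b]                            # no two are true together
                for a, b in combinations(variables, 2)]
    return clauses


# Hypothetical variables 1, 2, 3: "candidate u_i is the left child of v".
clauses = exactly_one([1, 2, 3])
```

With m candidate children per node this yields 1 + m(m - 1)/2 clauses, so applying it to every node gives a cubic number of clauses overall, which is in line with the $O(n^3)$ bound on the slide.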

  13. Network consistency constraints: Actual clauses

  14. Displaying structure • For a tree T: • Choice of a parent for reticulation nodes • Variables for correspondence between network and tree nodes • Collapsing non-branching paths – Whether particular nodes were removed or not – Parent relations after collapsing • $O(tn^2)$ variables

  15. Displaying consistency constraints • All T nodes are uniquely mapped to network nodes • Parent relations in the tree uniquely correspond to the network structure after selecting directions at reticulation nodes and collapsing paths • Parent relations in the network are consistent • $O(tn^3)$ constraints

  16. Displaying consistency constraints: Actual clauses (1)

  17. Displaying consistency constraints: Actual clauses (2)

  18. All clauses

  19. Additional optimizations • Splitting into independent problems • Symmetry breaking

  20. Experiments • 57 grass datasets by Group G.P.W. et al. • CryptoMiniSAT solver • 1000 s time limit • Comparison with PIRNs

  21. Experiments • 57 grass datasets by the Grass Phylogeny Working Group (cited as Group G.P.W. et al.) • CryptoMiniSAT solver • 1000 s time limit • Comparison with PIRNs

  22. Results • Exact solution (out of 57) – PhyloSAT: 36 – PIRN_C: 29 • Non-exact – PhyloSAT: 48 (40 optimal) – PIRN_CH: 43 (36 optimal)

  23. Results for k >= 6 • [Table: hybridization number (time in seconds)]

  24. Future work • Different SAT-solvers • Improving the reduction • Using upper and lower bounds on k • Searching for all minimal solutions

  25. Conclusions • Constructing parsimonious hybridization networks can be approached by reduction to SAT • This approach outperforms the known exact solver and compares well with the heuristic solver • Solving bigger instances is still challenging

  26. The End • https://github.com/ctlab/PhyloSAT • Vladimir Ulyantsev (ulyntsev@rain.ifmo.ru)
