


SLIDE 1

BayeHem: Bayesian Optimisation of Genome Assembly

DCSI 2018

Finlay Maguire

Beiko Lab, FCS, Dalhousie University

SLIDE 2

Table of contents

  • 1. Genome Assembly
  • 2. Bayesian Optimisation
  • 3. BayeHem
  • 4. Conclusion

SLIDE 3

Genome Assembly

SLIDE 4

2nd Generation Genome Sequencing

https://www.abmgood.com/marketing/knowledge_base/next_generation_sequencing_data_analysis.php

SLIDE 5

De Bruijn Graph Assembly

http://www.homolog.us/Tutorials/index.php?p=2.1&s=1
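The slide only names the technique, so the following is a minimal sketch of how a de Bruijn graph can be built from reads and how the choice of k changes it. The toy reads and the helper names `de_bruijn_edges` and `count_branching_nodes` are invented for illustration; real assemblers add error correction, coverage filtering and memory-efficient k-mer storage that are not shown here.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branching_nodes(graph):
    """Count nodes with more than one successor (points of ambiguity)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Toy reads, invented for illustration only.
reads = ["ATGGCGTGCA", "GGCGTGCAAT", "GTGCAATGGC"]
for k in (3, 5, 7):
    graph = de_bruijn_edges(reads, k)
    print(k, len(graph), count_branching_nodes(graph))
```

Printing the node and branch counts for several values of k gives a rough feel for the trade-off illustrated on the following k-mer slides: small k tends to tangle the graph, large k tends to fragment it.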

SLIDE 6

Effect of K-mer Size: 51-mer

https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size

SLIDE 7

Effect of K-mer Size: 61-mer

https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size

SLIDE 8

Effect of K-mer Size: 71-mer

https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size

SLIDE 9

Effect of K-mer Size: 81-mer

https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size

SLIDE 10

Effect of K-mer Size: 91-mer

https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size

SLIDE 11

Assessing Assemblies

[2]

SLIDE 12

Bayesian Optimisation

SLIDE 13

Gaussian Processes

  • Form of functional regression.
  • Powerful base for Sequential Model Based Optimisation [6].
  • Every draw, evaluated at a finite set of points, is a multivariate Gaussian random variable.

$$f \sim \mathcal{GP}(0, K), \qquad K_{ij} = k(x_i, x_j) = \exp\left(-\tfrac{1}{2}\, d(x_i/\ell,\, x_j/\ell)^2\right)$$
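As a rough illustration of the kernel on this slide, the sketch below evaluates it for 1-D inputs, taking d to be the Euclidean distance, and draws a few functions from the zero-mean prior. The function names, the input grid, the jitter term and the use of NumPy are assumptions made for this example, not part of the original slides.

```python
import numpy as np


def sq_exp_kernel(xa, xb, lengthscale=1.0):
    """k(x_i, x_j) = exp(-0.5 * d(x_i / l, x_j / l)^2) for 1-D inputs."""
    diff = xa[:, None] / lengthscale - xb[None, :] / lengthscale
    return np.exp(-0.5 * diff ** 2)


def sample_gp_prior(x, n_draws=3, lengthscale=1.0, jitter=1e-6, seed=0):
    """Draw functions f ~ GP(0, K), evaluated at the points x."""
    rng = np.random.default_rng(seed)
    # A small jitter on the diagonal keeps the Cholesky factorisation stable.
    K = sq_exp_kernel(x, x, lengthscale) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # If z is standard normal, L @ z has covariance L @ L.T = K.
    z = rng.standard_normal((len(x), n_draws))
    return (L @ z).T


x = np.linspace(-5.0, 5.0, 50)
draws = sample_gp_prior(x)
print(draws.shape)  # (3, 50): three functions evaluated at 50 points
```

Each row of `draws` is one sample from a 50-dimensional Gaussian, which is what the prior plots on the next slides show; shortening `lengthscale` makes the sampled functions wigglier.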

SLIDE 14

Gaussian Process Prior

Visualisation code modified from http://katbailey.github.io/post/gaussian-processes-for-dummies

SLIDE 15

Gaussian Process Prior

SLIDE 16

Gaussian Process Prior

SLIDE 17

Gaussian Process Prior

SLIDE 18

Gaussian Process Posterior

SLIDE 19

Gaussian Process Posterior

SLIDE 20

Gaussian Process Posterior
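The posterior slides show the prior being conditioned on observations. A minimal sketch of that step, using the standard closed-form Gaussian conditioning equations, could look like the following. The training points, the `noise` value and the function names are assumptions chosen for illustration, and NumPy is used rather than the code behind the original figures.

```python
import numpy as np


def sq_exp_kernel(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    diff = xa[:, None] / lengthscale - xb[None, :] / lengthscale
    return np.exp(-0.5 * diff ** 2)


def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = sq_exp_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test, lengthscale)
    K_ss = sq_exp_kernel(x_test, x_test, lengthscale)
    # mean = K_s^T K^-1 y ; cov = K_ss - K_s^T K^-1 K_s
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    # Clip tiny negative variances caused by floating-point round-off.
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Hypothetical observations of an unknown function (here sin, for illustration).
x_train = np.array([-4.0, -1.5, 0.0, 2.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5.0, 5.0, 101)
mean, std = gp_posterior(x_train, y_train, x_test)
```

The posterior standard deviation shrinks towards zero at the observed points and grows back towards the prior value of one far from them, which is the behaviour the acquisition function later exploits.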

SLIDE 21

Acquisition Function

Adapted from code found here: https://github.com/fmfn/BayesianOptimization

SLIDE 22

Acquisition Function

SLIDE 23

Acquisition Function

SLIDE 24

Acquisition Function

SLIDE 25

Acquisition Function

SLIDE 26

Acquisition Function

SLIDE 27

Acquisition Function

SLIDE 28

Acquisition Function
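The slides do not state which acquisition function was plotted. As one common possibility, the sketch below implements expected improvement (EI) for maximisation, given a posterior mean and standard deviation at a candidate point. The `xi` exploration margin and the candidate values are assumptions made for this example.

```python
import math


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed value (maximisation)."""
    improvement = mean - best - xi
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a high predicted mean, second term rewards uncertainty.
    return improvement * cdf + std * pdf


# Hypothetical candidates: k-mer size -> (posterior mean, posterior std).
candidates = {51: (-1.2, 0.3), 71: (-0.4, 0.5), 91: (-0.9, 0.1)}
best_so_far = -0.5
scores = {k: expected_improvement(m, s, best_so_far) for k, (m, s) in candidates.items()}
next_k = max(scores, key=scores.get)
print(next_k)
```

The next point to evaluate is the candidate with the largest acquisition value, which balances exploiting regions with a high predicted mean against exploring regions with high uncertainty.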

SLIDE 29

BayeHem

SLIDE 30

BayeHem

Pipeline (flowchart nodes, in slide order):

  • Trimmed Mycobacterium tuberculosis reads
  • Minia [1] → assembly
  • Bowtie2 [4] → SAM file
  • CGAL [5] → assembly likelihood
  • GPflowOpt [3]: evaluate acquisition function, proposed parameters, updated GP
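The loop in this flowchart can be sketched as follows, under heavy assumptions. A synthetic `toy_log_likelihood` stands in for the real Minia, Bowtie2 and CGAL steps, a hand-written NumPy GP with expected improvement stands in for GPflowOpt, and the candidate k range, lengthscale and iteration counts are all invented for illustration. It is not the BayeHem implementation.

```python
import math
import numpy as np


def toy_log_likelihood(k):
    """Hypothetical stand-in for the assemble -> align -> score pipeline."""
    return -((k - 71.0) / 20.0) ** 2


def sq_exp_kernel(xa, xb, lengthscale):
    diff = xa[:, None] / lengthscale - xb[None, :] / lengthscale
    return np.exp(-0.5 * diff ** 2)


def gp_posterior(x_train, y_train, x_test, lengthscale=15.0, noise=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP."""
    K = sq_exp_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test, lengthscale)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def expected_improvement(mean, std, best):
    """Vectorised expected improvement for maximisation."""
    erf = np.vectorize(math.erf, otypes=[float])
    ei = np.zeros_like(mean)
    ok = std > 0
    z = (mean[ok] - best) / std[ok]
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    ei[ok] = (mean[ok] - best) * cdf + std[ok] * pdf
    return ei


def optimise_k(objective, candidates, n_init=3, n_iter=8, seed=0):
    """Propose k, evaluate the objective, update the GP, repeat."""
    rng = np.random.default_rng(seed)
    tried = [int(k) for k in rng.choice(candidates, size=n_init, replace=False)]
    scores = [objective(k) for k in tried]
    for _ in range(n_iter):
        x = np.array(tried, dtype=float)
        y = np.array(scores)
        # Standardise scores so the unit-variance prior is a reasonable fit.
        y_norm = (y - y.mean()) / (y.std() + 1e-12)
        mean, std = gp_posterior(x, y_norm, candidates.astype(float))
        ei = expected_improvement(mean, std, y_norm.max())
        ei[np.isin(candidates, tried)] = -np.inf  # never re-run an evaluated k
        k_next = int(candidates[np.argmax(ei)])
        tried.append(k_next)
        scores.append(objective(k_next))
    best = int(np.argmax(scores))
    return tried[best], scores[best], tried


candidates = np.arange(21, 123, 2)  # odd k from 21 to 121, an assumed search range
best_k, best_score, tried = optimise_k(toy_log_likelihood, candidates)
print(best_k, best_score)
```

In a real run, the objective would launch the assembler with the proposed parameters, align the reads back to the assembly and return the assembly likelihood, so each evaluation is expensive. That cost is the reason for spending effort on choosing the next k carefully rather than sweeping every value.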

SLIDE 31

BayeHem Proves Very Efficient

SLIDE 32

K Likelihood Surface

SLIDE 33

Limitations and Future Work

  • Alternative GP covariance kernels.
  • Tuning the acquisition function (and its parametrisation).
  • Expanding to other parameters in assembly pipelines.
  • Potentially flawed objective function.
  • Multi-objective optimisation as a possible solution.
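The slide does not say which alternative kernels were under consideration. As one example, the Matérn 5/2 kernel advocated in [6] could replace the squared-exponential kernel. The sketch below compares the two for scalar inputs; the function names are chosen for illustration only.

```python
import math


def matern52(x_i, x_j, lengthscale=1.0):
    """Matern 5/2 kernel: (1 + s + s^2 / 3) * exp(-s), with s = sqrt(5) * r / l."""
    r = abs(x_i - x_j) / lengthscale
    s = math.sqrt(5.0) * r
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


def sq_exp(x_i, x_j, lengthscale=1.0):
    """Squared-exponential kernel, for comparison."""
    r = abs(x_i - x_j) / lengthscale
    return math.exp(-0.5 * r * r)


for distance in (0.0, 1.0, 3.0):
    print(distance, matern52(0.0, distance), sq_exp(0.0, distance))
```

The Matérn 5/2 kernel assumes functions that are only twice differentiable, so it is less smooth than the squared-exponential kernel and may suit objective surfaces that change abruptly between neighbouring parameter values.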

SLIDE 34

Conclusion

SLIDE 35

Summary

  • Proof of concept for effectiveness of BayeHem.
  • Assemblies are difficult to evaluate by a single metric.
  • Large scope for improvement and development of this approach.

SLIDE 36

Questions?

SLIDE 37

References i

  [1] R. Chikhi and G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms for Molecular Biology, 8(1):22, 2013.

SLIDE 38

References ii

  [2] M. Hunt, T. Kikuchi, M. Sanders, C. Newbold, M. Berriman, and T. D. Otto. REAPR: A universal tool for genome assembly evaluation. Genome Biology, 14(5), 2013.

  [3] N. Knudde, J. van der Herten, T. Dhaene, and I. Couckuyt. GPflowOpt: A Bayesian Optimization Library using TensorFlow. 2017.

SLIDE 39

References iii

  [4] B. Langmead and S. L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4):357–359, 2012.

  [5] A. Rahman and L. Pachter. CGAL: computing genome assembly likelihoods. Genome Biology, 14:R8, 2013.

  [6] J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian Optimization of Machine Learning Algorithms. In Advances in Neural Information Processing Systems, volume 25, pages 2951–2959, 2012.