Segment-based Multiple Sequence Alignment



SLIDE 1

Algorithmische Bioinformatik

Knut Reinert, May 2009

T. Rausch, A.-K. Emde, D. Weese, A. Döring, C. Notredame and K. Reinert

Segment-based Multiple Sequence Alignment

Knut Reinert 25.5.2009 (based on slides from Tobias Rausch)

Monday, May 25,

SLIDE 2

Tobias Rausch, September 2008

Agenda

  • Alignment Graph
  • Multiple Sequence Alignment Algorithm
  • Implementation and Results

SLIDE 3

Segment-based Alignment Graph

  • Alignment Matrix
  • Alignment Graph

SLIDE 6

Applications

  • Protein Alignment
  • Genome Comparison
  • Multi-Read Alignment

SLIDE 7

Methods

Alignment Algorithm

SLIDE 8

Components of the Algorithm

SLIDE 14

Segment-Match Generation

  • Global alignments [NW70, Got82]
  • Local alignments [SW81, WE87]
  • Variants
    – Overlap alignments
    – Banded alignments
  • Others:
    – Longest common subsequence [JV92]

S0: XMJYAUZ
S1: MZJAWXUE
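The longest common subsequence of S0 and S1 can be sketched with the textbook quadratic dynamic program. This is only an illustration, not the sparse algorithm of [JV92]; the function name is hypothetical. Each matched character can be read as a segment match of length 1.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (O(|a||b|) DP)."""
    n, m = len(a), len(b)
    # length[i][j] = LCS length of the suffixes a[i:] and b[j:]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk forward through the table, collecting matched characters.
    i = j = 0
    out = []
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(longest_common_subsequence("XMJYAUZ", "MZJAWXUE"))  # MJAU
```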

SLIDE 15

Segment-Match Generation

  • Global alignments [NW70, Got82]
  • Local alignments [SW81, WE87]
  • Variants
    – Overlap alignments
    – Banded alignments
  • Others:
    – Longest common subsequence [JV92]
    – External segment-matches
        • MUMmer, BLAST hits
        • External alignments

SLIDE 16

Collect all Segment Matches

IPPQFD / IPEYVD
YPQC / WRQK
VKP / VTP

IPPQFDFRDEYPQC--VKP
IPEYVD----WRQKGAVTP
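A minimal sketch of how the gap-free segment matches can be read off a pairwise alignment. It assumes "-" is the gap character and reports each match as a hypothetical (begin0, begin1, length) triple in ungapped sequence coordinates.

```python
def segment_matches(row0: str, row1: str, gap: str = "-"):
    """Return the maximal gap-free runs of an alignment as
    (begin0, begin1, length) triples in ungapped coordinates."""
    if len(row0) != len(row1):
        raise ValueError("alignment rows must have equal length")
    matches = []
    pos0 = pos1 = 0  # characters consumed so far in each ungapped sequence
    run_len = 0      # length of the current gap-free run
    for c0, c1 in zip(row0, row1):
        if c0 != gap and c1 != gap:
            run_len += 1
        else:
            if run_len:
                matches.append((pos0 - run_len, pos1 - run_len, run_len))
            run_len = 0
        pos0 += c0 != gap
        pos1 += c1 != gap
    if run_len:
        matches.append((pos0 - run_len, pos1 - run_len, run_len))
    return matches


row0 = "IPPQFDFRDEYPQC--VKP"
row1 = "IPEYVD----WRQKGAVTP"
seq0, seq1 = row0.replace("-", ""), row1.replace("-", "")
for b0, b1, n in segment_matches(row0, row1):
    print(seq0[b0:b0 + n], seq1[b1:b1 + n])
```

On the alignment above this yields the three segment matches listed on the slide.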

SLIDE 19

Segment Match Refinement

[HHR02, REW+08]
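The idea of refinement can be sketched as follows: overlapping segment matches are cut until any two segments are either identical or disjoint. The code below is a naive fixed-point version for illustration, not the efficient algorithm of [HHR02]. The match tuple layout (seq0, begin0, seq1, begin1, length) is an assumption, and matches of a sequence against itself are not handled.

```python
from collections import defaultdict


def refine(matches):
    """Cut gap-free matches (seq0, begin0, seq1, begin1, length) so that
    every cut position is mirrored through all matches covering it."""
    cuts = defaultdict(set)
    for s0, b0, s1, b1, n in matches:
        cuts[s0] |= {b0, b0 + n}
        cuts[s1] |= {b1, b1 + n}
    changed = True
    while changed:  # propagate cuts until nothing new appears
        changed = False
        for s0, b0, s1, b1, n in matches:
            for src, sb, dst, db in ((s0, b0, s1, b1), (s1, b1, s0, b0)):
                for c in list(cuts[src]):
                    if sb < c < sb + n:      # cut strictly inside the match
                        p = db + (c - sb)    # mirrored position on the partner
                        if p not in cuts[dst]:
                            cuts[dst].add(p)
                            changed = True
    refined = []
    for s0, b0, s1, b1, n in matches:
        offsets = sorted(c - b0 for c in cuts[s0] if b0 <= c <= b0 + n)
        for lo, hi in zip(offsets, offsets[1:]):
            refined.append((s0, b0 + lo, s1, b1 + lo, hi - lo))
    return refined


# Two matches that overlap partially on sequence 1.
print(refine([(0, 0, 1, 0, 6), (1, 3, 2, 0, 6)]))
```

In this toy input the overlap on sequence 1 forces a cut at position 3 of sequence 0 and of sequence 2, so each match is split into two pieces of length 3.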

SLIDE 22

Alignment Graph Construction

  • Only a subset of all the edges constitutes a valid alignment
  • Select the alignment edges (trace edges) of maximum weight = Maximal Trace [SK83, Kec93]

SLIDE 23

Consistency Extension

GARFIELDTHELASTFA-TCAT
GARFIELDTHE----FASTCAT
--------THE----FA-TCAT

[NHH00]

SLIDE 24

Consistency Extension

GARFIELDTHELASTFATCAT
GARFIELDTHEFAS---TCAT
--------THEFA----TCAT

SLIDE 25

Consistency Extension

GARFIELDTHE   LASTFATCAT
GARFIELDTHE   FAS---TCAT
--------THE   FA----TCAT

SLIDE 26

Consistency Extension

GARFIELDTHELASTFA   TCAT
GARFIELDTHEFAS---   TCAT
--------THEFA----   TCAT

SLIDE 27

Consistency Extension

GARFIELDTHE   LASTFA   TCAT
GARFIELDTHE   FAS---   TCAT
--------THE   FA----   TCAT

SLIDE 28

Consistency Extension

  • Increase the weight of clique edges

[NHH00]
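A small sketch of the weight update, assuming the T-Coffee style triplet rule from [NHH00]: an edge (x, y) gains min(w(x, z), w(z, y)) for every third vertex z adjacent to both. Only edges that already close a triangle are raised here. The graph representation (a dict keyed by frozenset vertex pairs) and the vertex names are hypothetical.

```python
from itertools import combinations


def consistency_extension(weights):
    """weights maps frozenset({u, v}) -> edge weight.
    Returns a new dict in which every edge that is part of a triangle
    is increased by min(w(x, z), w(z, y)) for each supporting vertex z."""
    neighbours = {}
    for edge in weights:
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    extended = dict(weights)
    for z, nbrs in neighbours.items():
        for x, y in combinations(sorted(nbrs), 2):
            key = frozenset((x, y))
            if key in weights:  # x, y, z form a 3-clique
                # The bonus is computed from the original weights only.
                bonus = min(weights[frozenset((x, z))],
                            weights[frozenset((z, y))])
                extended[key] += bonus
    return extended


w = {
    frozenset(("a", "b")): 10,
    frozenset(("a", "c")): 4,
    frozenset(("b", "c")): 6,
    frozenset(("a", "d")): 5,  # not part of any triangle
}
for edge, weight in consistency_extension(w).items():
    print(sorted(edge), weight)
```

Edges supported by a third sequence become heavier, while the isolated edge a-d keeps its weight, so consistent matches are favoured in the later trace selection.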

SLIDE 29

Consistency Extension

GARFIELDTHELASTFA-TCAT
GARFIELDTHE----FASTCAT
--------THE----FA-TCAT

SLIDE 30

Consistency Extension

GARFIELDTHE   LASTFA-   TCAT
GARFIELDTHE   ----FAS   TCAT
--------THE   ----FA-   TCAT

SLIDE 31

Distance Matrix

SLIDE 32

Guide Tree

SLIDE 33

Graph-based Progressive Alignment

  • Progressive alignment
    – Aligns strings / profiles of vertices
    – Heaviest common subsequence algorithm [JV92]
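Aligning two strings of vertices by heaviest common subsequence can be sketched with a dense dynamic program. This is an illustration only: [JV92] describes a sparse algorithm, and the weight-matrix input (weight[i][j] is the total edge weight between position i of one vertex string and position j of the other, 0 if there is no edge) is an assumption.

```python
def heaviest_common_subsequence(weight):
    """Return (total_weight, pairs), where pairs is a list of (i, j)
    index pairs, increasing in both coordinates, of maximum total weight."""
    n = len(weight)
    m = len(weight[0]) if n else 0
    # best[i][j] = optimum using the first i rows and first j columns
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1],
                             best[i - 1][j - 1] + weight[i - 1][j - 1])
    # Trace back to recover the matched positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        w = weight[i - 1][j - 1]
        if w > 0 and best[i][j] == best[i - 1][j - 1] + w:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Edges (1, 2) and (2, 1) cross, so only the heavier one can be kept.
weight = [[5, 0, 0],
          [0, 0, 3],
          [0, 4, 0]]
print(heaviest_common_subsequence(weight))  # (9, [(0, 0), (2, 1)])
```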

SLIDE 35

Configurable

SLIDE 36

Implementation and Results

SLIDE 37

Results

  • Protein Alignment
  • DNA and Genome Alignment
  • Multi-Read Alignment
  • Deep Alignment

SLIDE 38

Results

  • BAliBASE 3.0 (Protein Benchmark)

                RV11    RV12    RV20    RV30    RV40    RV50    CPU time (s)
    M-Coffee    42.74   85.86   44.78   56.1    55.8    54.69   27,730
    Our Tool    46.89   86.16   46.56   58.9    62.39   58.94   12,455

  • Alignment of 6 adenoviruses (DNA)

                 Avg. Identity   CPU time (s)
    DIALIGN-T    48%             1259
    MAFFT        62%             118
    MUSCLE       38%             673
    Our Tool     65%             328

  • More detailed results are in the paper

SLIDE 39

Part of SeqAn

  • Extendable
    – Add your own algorithm

www.seqan.de

SLIDE 40

Thank You for Your Attention!

Visit: www.seqan.de/projects/msa.html

Andreas Anne-Katrin Cedric David Knut Tobias

SLIDE 41

References

[Got82] O. Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, Dec 1982.

[HHR02] A. L. Halpern, D. H. Huson, and K. Reinert. Segment match refinement and applications. In WABI '02: Proceedings of the Second International Workshop on Algorithms in Bioinformatics, pages 126–139, London, UK, 2002. Springer-Verlag.

[JV92] G. Jacobson and K.-P. Vo. Heaviest increasing/common subsequence problems. In CPM '92: Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching, pages 52–66, London, UK, 1992. Springer-Verlag.

[Kec93] J. D. Kececioglu. The maximum weight trace problem in multiple sequence alignment. In CPM '93: Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, pages 106–119, London, UK, 1993. Springer-Verlag.

[NHH00] C. Notredame, D. G. Higgins, and J. Heringa. T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302:205–217, 2000.

[NW70] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443–453, 1970.

[REW+08] T. Rausch, A.-K. Emde, D. Weese, A. Döring, C. Notredame, and K. Reinert. Segment-based multiple sequence alignment. Bioinformatics, 24(16):i187–192, 2008.

[SK83] D. Sankoff and J. B. Kruskal. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA, 1983.

[SW81] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981.

[WE87] M. S. Waterman and M. Eggert. A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. Journal of Molecular Biology, 197(4):723–728, 1987.