A Method for Aligning RNA Secondary Structures
Jason T. L. Wang New Jersey Institute of Technology
J Liu, JTL Wang, J Hu and B Tian, BMC Bioinformatics, 2005
secondary/tertiary structure
involve two bases
– Watson-Crick: AU or CG
– Non-canonical: UG or AG
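As a small illustration of the two pair classes listed above, the sketch below sorts a pair of bases into one of them. The function name `classify_pair` and the label "not listed" are assumptions made for this example; the pair sets are copied from the slide and are not a complete catalogue of RNA pairings.

```python
# Pair classes taken from the slide; frozenset makes the order of the bases irrelevant.
WATSON_CRICK = {frozenset("AU"), frozenset("CG")}
NON_CANONICAL = {frozenset("UG"), frozenset("AG")}


def classify_pair(x, y):
    """Classify two bases using only the pairs named on the slide (hypothetical helper)."""
    pair = frozenset((x.upper(), y.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in NON_CANONICAL:
        return "non-canonical"
    return "not listed"


print(classify_pair("G", "C"))
print(classify_pair("G", "U"))
```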
[Figure: an example RNA secondary structure drawn from the 5’ end to the 3’ end]
[1] Zuker, M. (1989) Science
[Figure: (a) GOOD and (b) BAD example structures; the configuration in (b) is marked "Prohibited!"]
[2] Hofacker, I.L. (2003) NAR
[Figure: (a) GOOD and (b) BAD example structures; the hairpin in (b) is marked "Prohibited!"]
[3] Zuker, M. (1991) NAR
(a) BAD (b) GOOD (nested structure) (c) GOOD (branching)
[Figure: three example structures for cases (a), (b), and (c); one configuration is marked "Prohibited!"]
[4] Mathews, D.H. (1999) JMB
[5] Shapiro, B. (1990) CABIOS
[6] Zhang, K. (1999) CPM
[7] Ma, B. (2002) TCS
[8] Hofacker, I.L. (2002) JMB
[Figure: an example RNA secondary structure decomposed into circles, labeled circle 0 through circle 8]
[9] Liu, J. (2005) BMC Bioinformatics
(1) the same circle: e.g. each pair from G, C, G, A-U, G-C, G, A-U
(2) descendant/ancestor circles: e.g. pair (G, A-U)
(3) cousin circles: e.g. pairs (U, C), (A-U, G-C) and (U, G-C)
[Figure: the example structure shown three times, highlighting cases (1), (2), and (3)]
[Figure: a parent structure and its child structures]
[Figure: RNA 1 and RNA 2, panels (a), (b), and (c)]
(a) Same loop relationship preserved: A1 is in the same loop as A2 iff B1 is in the same loop as B2
(b) Ancestor/descendant relationship preserved: A1 is ancestor of A2 iff B1 is ancestor of B2
(c) Cousin relationship preserved: A1 is cousin of A2 iff B1 is cousin of B2
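The three preservation conditions above can be checked mechanically once each structure's loops are arranged in a tree. The following is a minimal sketch under stated assumptions, not the RSmatch implementation: each loop is identified by an integer, `parent` is a hypothetical dictionary mapping a loop to its enclosing loop (`None` for the outermost), and `aligned` lists which loop of RNA 1 is matched with which loop of RNA 2.

```python
def relation(parent, a, b):
    """Relationship of loop a to loop b in a tree given by parent links."""
    def ancestors(c):
        chain = set()
        while parent[c] is not None:
            c = parent[c]
            chain.add(c)
        return chain

    if a == b:
        return "same"
    if a in ancestors(b):
        return "ancestor"
    if b in ancestors(a):
        return "descendant"
    return "cousin"


def preserves_structure(parent1, parent2, aligned):
    """True if every pair of aligned loops has the same relationship in both RNAs.

    aligned is a list of (loop in RNA 1, loop in RNA 2) tuples.
    """
    for i, (a1, b1) in enumerate(aligned):
        for a2, b2 in aligned[i + 1:]:
            if relation(parent1, a1, a2) != relation(parent2, b1, b2):
                return False
    return True


# Hypothetical loop trees: RNA 1 branches (loops 1 and 2 are cousins),
# RNA 2 is nested (loop 1 encloses loop 2).
rna1 = {0: None, 1: 0, 2: 0}
rna2 = {0: None, 1: 0, 2: 1}
print(preserves_structure(rna1, rna2, [(0, 0), (1, 1)]))
print(preserves_structure(rna1, rna2, [(1, 1), (2, 2)]))
```

The second call fails the check because loops that are cousins in RNA 1 would be matched with an ancestor and its descendant in RNA 2.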
The preservation rules must be satisfied for a valid alignment.
A single base cannot be aligned with a base pair.
[Figure: the first RNA, the second RNA, and the alignment result]

First RNA:
..((...(((......)))((.(.....))).))..
GUACGCAGUAAGUCGAUACGCCGUAUUUCGCGGUAA

Second RNA:
..((..((......))(((.......))).))..
GUUCGAUUUCUCUAAAGAGUAGCUUUCUCGGAAA

Alignment result:
..((...(((......)))((.(..  ...))).))..
GUACGCAGUAAGUCGAUACGCCGUA--UUUCGCGGUAA
|| ||   |   || | | |  |||  |||| ||| ||
GUUCGA-UU-UCUCUA-AAGA-GUAGCUUUCUCGGAAA
..((.. (( ...... ))(( (.......))).))..
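The structures above are written in dot-bracket notation, where matching parentheses mark a base pair and dots mark single bases. As a minimal sketch of how such a string can be read, the helper below recovers the paired positions with a stack; the name `parse_dot_bracket` is an assumption for this example, and pseudoknot symbols are not handled.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) index pairs for each matched '(' and ')'."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


# First RNA from the slide.
structure = "..((...(((......)))((.(.....))).)).."
sequence = "GUACGCAGUAAGUCGAUACGCCGUAUUUCGCGGUAA"
for i, j in parse_dot_bracket(structure):
    print(i, j, sequence[i] + "-" + sequence[j])
```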
[Figure: the first structure and the second structure, the DP scoring table, and the best alignment between partial structures]
$$
g(C_a, C_b) =
\begin{cases}
1, & \text{if both } C_a \text{ and } C_b \text{ are single bases and } C_a = C_b \\
2, & \text{if both } C_a \text{ and } C_b \text{ are base pairs and } C_a = C_b
\end{cases}
$$

$$
f(R_1, R_2) = \sum_i g(C_a^i, C_b^i)
$$
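The scoring function above can be sketched in a few lines. This is an illustration under stated assumptions rather than the scoring used in RSmatch itself: a single base is written as a one-letter string such as "A", a base pair as a hyphenated string such as "G-C", and every case the slide does not define (mismatches, or a single base against a base pair) is assumed to score 0.

```python
def g(ca, cb):
    """Score two aligned components: 1 for identical single bases, 2 for identical base pairs."""
    ca_is_pair = "-" in ca
    cb_is_pair = "-" in cb
    if ca_is_pair != cb_is_pair or ca != cb:
        return 0  # assumption: the slide gives no score for these cases
    return 2 if ca_is_pair else 1


def f(aligned_components):
    """Sum g over all aligned component pairs of two structures R1 and R2."""
    return sum(g(ca, cb) for ca, cb in aligned_components)


# Hypothetical alignment: one matching base, one matching base pair, one mismatch.
print(f([("A", "A"), ("G-C", "G-C"), ("U", "C")]))
```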
[Figure: aligning the first structure AUACAUGUUC, ..(.....). with the second structure UCAUACAGGUUA, ....(.....). Three candidate cases are shown: (A); (B), with a gap appended to the second sequence (UCAUACAGGUUA-); and (C), with a gap before the last base of the second sequence (UCAUACAGGUU-A).]
[Figure: the two example structures in three configurations, labeled "first score" and "second score"; the scores are marked "?"]
[Figure: aligning the hairpin ACAUGUU, (.....) of the first structure with the whole second structure UCAUACAGGUUA, ....(.....). One case pads the hairpin with gaps: A-----CAUGU--U.]
[Figure: aligning AUACAUGUU, ..(.....) with UCAUACAGGUUA, ....(.....). in three cases (A), (B), and (C); the gapped rows shown are AU----ACAUGUU- and UCAUACAGGUUA-------.]
[Figure: aligning AUACAUGUU, ..(.....) with UCAUACAGGUU, ....(.....); the unknown score is broken into cases labeled (A), (B), (C), (b1), and (b2).]
[Figure: aligning the two hairpins ACAUGUU, (.....) and ACAGGUU, (.....) in three cases (A), (B), and (C); one gapped row shown is A-CAUGU-U.]
[Figure: aligning the hairpin ACAUGUU, (.....) with UCAUACAGGUU, ....(.....), and the structure AUACAUGUU, ..(.....) with the hairpin ACAGGUU, (.....); the gapped rows shown are A-----CAUGU-U and ACAUGUU-------.]
[Figure: aligning AUACAUGUU, ..(.....) with UCAUACAGGUU, ....(.....) in four cases (A), (B), (C), and (D); the gapped rows shown are AU----ACAUGUU, UCAU--ACAGGUU, UCAUACAGGUU-------, and AUA-CAUGUU-------.]
$[\ldots]^2 + 4 N_s N_p + 4 N_p^2 = O(N^2)$
$[\ldots]^2$ score calculations are contributed by two single bases
[...] pair
$[\ldots]^2$ score calculations are contributed by two base pairs
[Figure: a motif structure and a subject structure]
– M: A, C;
– Y: C, T/U;
– H: not G;
– R: A, G;
– W: A, T/U;
[Figure: motif structures (a) HSL3 and (b) IRE]
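To show how degenerate base codes like those listed above can be used when comparing a motif with a subject sequence, here is a minimal sketch. It follows the IUPAC nucleotide code, in which M is A or C; the table covers only the codes named on the slide plus the four plain bases, and the helper names `normalize` and `motif_matches` are assumptions for this example. Structure is ignored here; only the sequence letters are compared.

```python
# IUPAC codes named on the slide, written in the RNA alphabet (T is treated as U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "M": "AC", "Y": "CU", "H": "ACU", "R": "AG", "W": "AU",
}


def normalize(seq):
    """Upper-case a sequence and map DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def motif_matches(motif, subject):
    """True if each subject base is allowed by the motif code at the same position."""
    motif, subject = normalize(motif), normalize(subject)
    if len(motif) != len(subject):
        return False
    # A code outside the table above raises KeyError in this sketch.
    return all(base in IUPAC[code] for code, base in zip(motif, subject))


print(motif_matches("WYR", "AUG"))
print(motif_matches("H", "G"))
```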
[10] Pesole, G. (2000) Bioinformatics
[Figure: two sequence diagrams on an axis from 50 to 200, each marking the ORF; one also shows a poly(A) run (AAAAAAAAAAAAA)]
ORF: Open Reading Frame
RSmatch, but not by PatSearch.
[11] Klein, R.J. (2003) BMC Bioinformatics
[12] Holmes, I. (2002) PSB
[Flowchart: boxes labeled "search small database", "seed alignment", "profile", "expand seed alignment", "pairwise match", and "expand best alignment"; the decision "score (best alignment) < δ OR non-expandable" has YES and NO branches]