RNA structure alignment by a unit-vector approach, Emidio Capriotti



SLIDE 1

Emidio Capriotti Marc A. Marti-Renom

http://sgu.bioinfo.cipf.es

Structural Genomics Unit, Bioinformatics Department, Prince Felipe Research Center (CIPF), Valencia, Spain

RNA structure alignment by a unit-vector approach

ECCB08

Cagliari (Italy) 22-26 September 2008

SLIDE 2

RNA structure


http://www.pdb.org

[Figure: RNA structures in the PDB; series: All, X-ray, NMR]

The PDB database contains ~1,500 RNA structures.

SLIDE 3

RNA structure datasets


Dataset                               Count
RNA structures*                       1,101
RNA chains                            2,179
Non-redundant RNA chains**              708
RNA chains (20 ≤ length ≤ 310)          277
SCOR set***                              60
High-resolution RNA set****              51

*    From PDB, November 06.
**   Non-redundant at 95% sequence identity.
***  SCOR functions with at least two chains.
**** Resolution below 4.0 Å and with no missing backbone atoms.

[Figure labels: NR95, HR, SCOR]

SLIDE 4

Dataset distribution


[Figure: dataset distribution by chain length (n); annotations: tRNA; 20 of >1,000 n; 407 of <20 n]

SLIDE 5

Unit Vector


[Figure: unit vectors between consecutive positions i, i+1, i+2, i+3]

Ortiz et al. Proteins 2002

Matrix shown on the slide:

10   7   5
 7  10   4
 5   4  10
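As a rough illustration of the unit-vector idea, the sketch below turns a trace of backbone atom coordinates into unit vectors between consecutive atoms and groups them into windows of k consecutive vectors. The function names (`unit_vectors`, `windows`) and the toy coordinates are assumptions made for this example; it is not the SARA or MAMMOTH implementation, and it does not include the scoring step.

```python
import math

def unit_vectors(coords):
    """Return the unit vector pointing from atom i to atom i+1 for each i."""
    vectors = []
    for (x1, y1, z1), (x2, y2, z2) in zip(coords, coords[1:]):
        dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            raise ValueError("consecutive atoms share the same coordinates")
        vectors.append((dx / norm, dy / norm, dz / norm))
    return vectors

def windows(vectors, k):
    """Group k consecutive unit vectors, one window per start position."""
    return [vectors[i:i + k] for i in range(len(vectors) - k + 1)]

# Hypothetical coordinates (in Å) of four consecutive backbone atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
uv = unit_vectors(coords)      # three unit vectors
groups = windows(uv, 2)        # two windows of k = 2 vectors each
```

Because every vector is normalized, the representation depends only on the local direction of the backbone, not on the spacing between atoms.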

SLIDE 6

Atom selection


The backbone atom that best represents the RNA structure was selected by evaluating the distribution of distances between consecutive atoms in structures from the NR95 set.

SLIDE 7

Background distribution


From a dataset of 300 random RNA structures, we produced ~45,000 pairwise alignments, which yield an empirical score distribution. From this distribution we can evaluate the μ and σ needed to calculate the p-value P(s ≥ x).

[Figure: empirical versus analytic distribution]

Karlin and Altschul PNAS 1990

$P(s \ge x) = 1 - \exp\left(-e^{-\lambda (x - \mu)}\right)$
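The extreme-value formula above can be evaluated directly. The sketch below is a minimal version in which `mu` and `lam` are passed in as the parameters of the formula; the function name and the example values are assumptions for illustration. The slide does not spell out how λ is obtained from σ; for a Gumbel distribution the standard deviation equals π/(λ√6), which is one possible route.

```python
import math

def evd_pvalue(x, mu, lam):
    """Evaluate P(s >= x) = 1 - exp(-exp(-lam * (x - mu)))."""
    t = -lam * (x - mu)
    if t > 700.0:
        # exp(t) would overflow; the probability is 1 to machine precision.
        return 1.0
    # 1 - exp(-u) written as -expm1(-u) to keep precision for tiny p-values.
    return -math.expm1(-math.exp(t))

# Hypothetical parameters, for illustration only.
p_at_mu = evd_pvalue(10.0, mu=10.0, lam=0.5)
p_high = evd_pvalue(40.0, mu=10.0, lam=0.5)
```

At x = μ the expression reduces to 1 − e⁻¹, and the p-value decreases monotonically as the score x grows.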

SLIDE 8

Random RNA

[Figure: μ and σ versus N (length of the shorter RNA structure), with power-law fits]

$\mu = 763 \cdot N^{-0.896}$
$\sigma = 180 \cdot N^{-1.010}$

Murray et al. PNAS 2003

The RNA backbone can be described by six torsion angles (α, β, γ, δ, ε, ζ) per nucleotide. The RNA backbone is rotameric, and only 42 conformations have been described from a set of high-resolution structures.

We divided the resulting structural alignments (~45,000) into 30 bins according to the length of the shorter of the two random structures (N). For each bin, the μ and σ values are evaluated by fitting the data to an extreme value distribution (EVD). The relations between N and the μ and σ values are extrapolated by fitting them to a power-law function (r ≈ 0.99).
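A power-law fit of this kind can be sketched as a straight-line regression in log-log space. In the example below, `fit_power_law` and `mu_sigma` are hypothetical helper names; the constants 763, −0.896, 180 and −1.010 are the fitted values shown on the slide, and the per-bin EVD fitting step is not reproduced here.

```python
import math

def fit_power_law(ns, values):
    """Least-squares fit of value = a * N**b in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def mu_sigma(n):
    """Background mu and sigma for a shorter-structure length n (slide constants)."""
    return 763.0 * n ** -0.896, 180.0 * n ** -1.010

# Synthetic check: points generated from the slide's mu(N) curve are recovered.
lengths = [20, 50, 100, 200, 300]
mus = [mu_sigma(n)[0] for n in lengths]
a, b = fit_power_law(lengths, mus)
```

With real data, the inputs to `fit_power_law` would be the bin lengths and the per-bin μ (or σ) estimates rather than synthetic points.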

SLIDE 9

Optimization


The accuracy of the SARA method depends on a large number of parameters:

  • C3ʼ and P backbone atoms for the unit vector evaluation,
  • k, the number of consecutive unit vectors, spanning from 3 to 9,
  • gap opening values from -9 to 0 and gap extension values from -0.8 to 0,
  • secondary structure information.

                          Gap opening   Gap extension   k
No secondary structure       -7.0           -0.6        3
Secondary structure          -8.0           -0.2        7

SLIDE 10

PSI distribution


[Figure: PSI distribution from an all-against-all comparison of structures in the NR95 set; tRNA region labeled]

SLIDE 11

Statistical significance


all-against-all comparison of structures in the NR95 set

SLIDE 12

Comparison with ARTS


ARTS

>1q96 Chain:A
--------------------gugcucaguaugaga-----aga-accgcacc--------
>1un6 Chain:E
ccggccacaccuacggggccugguuaguaccugggaaaccugggaauaccaggugccggc

Percentage of structure identity (PSI): 76.9%
Percentage of sequence identity: 20.0%
Percentage of SSE identity: 79.2%
RMSD: 1.66 Å

SARA

>1q96 Chain:A
-------------------ggugcucaguaugag--------aagaaccgcacc-------
>1un6 Chain:E
gccggccacaccuacggggccugguuaguaccugggaaaccugggaauaccaggugccggc

Percentage of structure identity (PSI): 92.6%
Percentage of sequence identity: 48.0%
Percentage of SSE identity: 100.0%
RMSD: 1.78 Å

PSI: % of structure identity
PSS: % of secondary structure identity
Cut-off distance: 4.0 Å

[Figure: all-against-all comparison of structures in the HR set]
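To make the PSI measure and its 4.0 Å cut-off concrete, the sketch below counts aligned atom pairs that lie within the cut-off after superposition. The slide does not state the normalization, so dividing by the length of the shorter structure is an assumption of this example, as are the function name `psi`, the use of "less than or equal" for the cut-off, and the toy coordinates.

```python
import math

def psi(aligned_pairs, len_a, len_b, cutoff=4.0):
    """Percentage of structure identity for a superposed alignment.

    aligned_pairs holds ((x, y, z), (x, y, z)) coordinates of aligned atoms
    after superposition. Normalizing by the shorter structure is an
    assumption of this sketch, not something stated on the slide.
    """
    close = sum(1 for p, q in aligned_pairs if math.dist(p, q) <= cutoff)
    return 100.0 * close / min(len_a, len_b)

# Hypothetical superposed pairs at distances of 1.0, 3.9 and 5.0 Å.
pairs = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((5.0, 0.0, 0.0), (5.0, 3.9, 0.0)),
    ((9.0, 0.0, 0.0), (9.0, 0.0, 5.0)),
]
value = psi(pairs, len_a=4, len_b=10)
```

Here two of the three pairs fall within 4.0 Å, and the shorter structure has four nucleotides, so `value` is 50.0.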

SLIDE 13

Function assignment


[Figure panels: rank of deepest SCOR function; rank of related SCOR function]

All-against-all comparison of structures in the SCOR set.

SLIDE 14

SARA server


http://sgu.bioinfo.cipf.es/services/SARA/

SLIDE 15

All against all alignments

A set of 829 RNA chain structures from the PDB (Jan 08) was selected to study the relationship between sequence and structure similarity.

[Figure panels: N versus PSI and N versus %ID; legend: -LNE>5, -LNE≤5]

SLIDE 16

Sequence similarity distribution


Using the subset of alignments with -LNE≤5, we evaluate the background distribution of the percentage of sequence identity (%ID).

[Figure panels: N versus %ID, and μ and σ versus N; legend: -LNE>5, -LNE≤5]

$\mu = 271.4 \cdot N^{-0.8862}$
$\sigma = 114.7 \cdot N^{-0.8591}$

SLIDE 17

RNA sequence and structure


The plot shows that tertiary structure is more conserved than sequence.

[Figure: PSI versus %ID, with fitted curve]

$y = -0.013x^2 + 2.24x + 6.34$ (r = 0.84)
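The fitted curve can be evaluated at a few points to see what it implies. The slide does not say which axis is x and which is y; the sketch below assumes x is the percentage of sequence identity (%ID) and y is the expected PSI, an assignment suggested by the fact that the curve gives roughly 100 at x = 100. The function name is hypothetical.

```python
def fitted_curve(x):
    """Quadratic fit from the slide: y = -0.013 x^2 + 2.24 x + 6.34."""
    return -0.013 * x * x + 2.24 * x + 6.34

# Assumed reading: x is %ID, y is PSI.
for pct_id in (0, 25, 50, 100):
    print(pct_id, round(fitted_curve(pct_id), 1))
```

Under that reading, the curve reaches about 86 at 50% sequence identity, which is in line with the statement that tertiary structure is more conserved than sequence.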

SLIDE 18

Conclusions and future directions


  • The SARA method is a good alternative to other RNA structure alignment methods.
  • The statistics obtained from alignments between randomly generated structures allowed us to select high-quality alignments.
  • The subset of alignments with log(p-value) ≤ 5 has been used to evaluate the minimum level of sequence identity that corresponds to the conservation of the 3D structure.
  • The RNA tertiary structure is more conserved than sequence.
  • Develop new strategies to represent RNA secondary structure to improve the quality of the alignments.
  • A set of high-quality alignments will be selected to derive the rules for the prediction of new RNA structures relying on sequence-structure alignment information.

SLIDE 19

Acknowledgments

FUNDING
  • Prince Felipe Research Center
  • Marie Curie Reintegration Grant
  • STREP EU Grant
  • Generalitat Valenciana
  • MEC-BIO

Structural Genomics Unit (CIPF)
  • Marc A. Marti-Renom
  • Davide Bau
  • Emidio Capriotti

http://sgu.bioinfo.cipf.es

MAMMOTH ALGORITHM
  • Angel Ortiz

ARTS PROGRAM
  • Oranit Dror
  • Ruth Nussinov
  • Haim J. Wolfson

ECCB08 Travel Fellowship granted by BIOSAPIENS Network of Excellence

Ángel Ramirez Ortiz, June 30th 1966 - May 5th 2008