Part 2: Comparative Analysis of RNAs. S. Will, 18.417, Fall 2011

→ Part 2 Comparative Analysis of RNAs S.Will, 18.417, Fall 2011

→ Example
Given: a set of related RNA sequences

>AF008220
GGAGGAUUAGCUCAGCUGGGAGAGCAUCUGCCUUACAAGCAGAGGGUCGGCGGUUCGAGCCCGUCAUCCUCCA
>M68929
GCGGAUAUAACUUAGGGGUUAAAGUUGCAGAUUGUGGCUCUGAAAACACGGGUUCGAAUCCCGUUAUUCGCC
>X02172
GCCUUUAUAGCUUAGUGGUAAAGCGAUAAACUGAAGAUUUAUUUACAUGUAGUUCGAUUCUCAUUAAGGGCA
>Z11880
GCCUUCCUAGCUCAGUGGUAGAGCGCACGGCUUUUAACCGUGUGGUCGUGGGUUCGAUCCCCACGGAAGGCG
>D10744
GGAAAAUUGAUCAUCGGCAAGAUAAGUUAUUUACUAAAUAAUAGGAUUUAAUAACCUGGUGAGUUCGAAUCUCACAUUUUCCG

Wanted: learn about their evolutionary relation

AF008220  GGAGGAUU-AGCUCAGCUGGGAGAGCAUCUGCCUUACAAGC---------AGAGGGUCGGCGGUUCGAGCCCGUCAUCCUCCA
M68929    GCGGAUAU-AACUUAGGGGUUAAAGUUGCAGAUUGUGGCUC---------UGAAAA-CACGGGUUCGAAUCCCGUUAUUCGCC
X02172    GCCUUUAU-AGCUUAG-UGGUAAAGCGAUAAACUGAAGAUU---------UAUUUACAUGUAGUUCGAUUCUCAUUAAGGGCA
Z11880    GCCUUCCU-AGCUCAG-UGGUAGAGCGCACGGCUUUUAACC---------GUGUGGUCGUGGGUUCGAUCCCCACGGAAGGCG
D10744    GGAAAAUUGAUCAUCGGCAAGAUAAGUUAUUUACUAAAUAAUAGGAUUUAAUAACCUGGUGAGUUCGAAUCUCACAUUUUCCG
consensus (((((((...((((........))))((((((.......)).........))))....(((((.......)))))))))))).

Remarks
• Usually, we only know the sequences of RNAs. Why?
• Important for evolution: sequence AND structure. Why?

→ Comparative RNA Analysis
[Figure: two RNA sequences A and B with their consensus and consensus structure; adapted from [Gardner & Giegerich BMC 2004]]

Remarks
• Here, Comparative RNA Analysis refers to this problem: given a set of RNA sequences, how do we match them (alignment), and what is their common structure (consensus structure)?
• In general there are multiple sequences; here we consider only the pairwise case.

→ Comparative RNA Analysis: Plan A
[Figure: single sequences A and B → ALIGN → alignment of A and B → FOLD → consensus structure]

Remarks
• This is the first and simplest approach. We will see two further plans.
• ALIGN: sequence alignment
• FOLD: we will generalize structure prediction for single sequences to alignments

→ Sequence Alignment, a slightly new definition
Example
In: A = ACGTAA, B = ACCCT
Out:
AC-GTAA
ACCCT--
Each column is a "match/mismatch", an "insertion", or a "deletion".

Definition (Alignment (as set of alignment edges))
An alignment of two (RNA) sequences $A$ and $B$, $n = |A|$, $m = |B|$, is a set $\mathcal{A}$ of alignment edges, where
1. for $1 \le i \le n$ and $1 \le j \le m$, an alignment edge is either a matching edge $(i, j)$ or a gap edge $(i, -)$ or $(-, j)$;
2. matching edges do not conflict: $\forall (i, j), (i', j') \in \mathcal{A}: i < i' \implies j < j'$;
3. "degree is 1":
• $\forall i: (i, -) \in \mathcal{A} \lor \exists! j: (i, j) \in \mathcal{A}$
• $\forall j: (-, j) \in \mathcal{A} \lor \exists! i: (i, j) \in \mathcal{A}$

→ Sequence Alignment, a slightly new definition (continued)
Remark
The new definition (alignment as a set of alignment edges) is equivalent to the previous one via alignment strings. For A = ACGTAA and B = ACCCT:
AC-GTAA
ACCCT--
$\equiv \{(1, 1), (2, 2), (-, 3), (3, 4), (4, 5), (5, -), (6, -)\}$
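The equivalence between alignment strings and edge sets can be made concrete with a few lines of code. The following is a minimal sketch, not part of the lecture: the function name `alignment_edges` is hypothetical, positions are 1-based as in the definition, and a gap partner is represented by `None` instead of "−".

```python
def alignment_edges(row_a, row_b, gap="-"):
    """Convert two gapped alignment strings into a list of alignment edges.

    Positions are 1-based; the gap side of a gap edge is None.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    edges = []
    i = j = 0  # number of characters of A and B consumed so far
    for x, y in zip(row_a, row_b):
        if x == gap and y == gap:
            raise ValueError("a column must not consist of two gaps")
        if x != gap:
            i += 1
        if y != gap:
            j += 1
        edges.append((i if x != gap else None, j if y != gap else None))
    return edges


# The example from the slide: A = ACGTAA, B = ACCCT
print(alignment_edges("AC-GTAA", "ACCCT--"))
# [(1, 1), (2, 2), (None, 3), (3, 4), (4, 5), (5, None), (6, None)]
```

Because the columns are read left to right, the matching edges come out in increasing order of both i and j, so condition 2 (no conflicts) holds by construction.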

→ Recall: The Best Sequence Alignment
Idea: define the best alignment as the alignment with minimal edit distance.

Definition (Sequence Alignment Problem)
Given two (RNA) sequences $A$ and $B$, find the alignment $\mathcal{A}$ of $A$ and $B$ with minimal edit distance
\[
\operatorname{dist}_{A,B}(\mathcal{A}) = \sum_{(i,j) \in \mathcal{A}} d(i, j),
\quad\text{where}\quad
d(i, j) =
\begin{cases}
\gamma & i = - \text{ or } j = - \\
w_m & A_i \neq B_j \\
0 & A_i = B_j.
\end{cases}
\]
• Idea: how can we transform A into B? Find a sequence of edit operations (match/mismatch, insertion, deletion) with minimal weight.
• $d(i, j)$ weights the edit operation from position i to position j.
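As a small illustration of the distance definition, the sketch below sums d over the columns of a given alignment. The function name `alignment_distance` and the default costs γ = 1 and w_m = 1 are assumptions chosen for the example; the lecture does not fix numeric values.

```python
def alignment_distance(row_a, row_b, gamma=1, w_m=1, gap="-"):
    """Edit distance of one given alignment: the sum of d over its columns.

    gamma is the gap cost, w_m the mismatch cost (both assumed values).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            total += gamma   # gap edge (i, -) or (-, j)
        elif x != y:
            total += w_m     # mismatch
        # a match contributes 0
    return total


# Three gap columns and one mismatch (G/C): 3 * gamma + w_m
print(alignment_distance("AC-GTAA", "ACCCT--"))  # 4
```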

→ Recall: Needleman-Wunsch Algorithm
Idea: minimize the edit distance by dynamic programming (DP). Get the best alignment by traceback.

Definition (Needleman-Wunsch Matrix)
Define the matrix $D = (D_{ij})_{0 \le i \le n,\, 0 \le j \le m}$ by
\[
D_{ij} := \min \{ \operatorname{dist}_{A,B}(\mathcal{A}) \mid \mathcal{A} \text{ alignment of } A_1, \dots, A_i \text{ and } B_1, \dots, B_j \}.
\]
For $1 \le i \le n$, $1 \le j \le m$:
Init: $D_{00} = 0$, $D_{i0} = i\gamma$, $D_{0j} = j\gamma$
Recurse:
\[
D_{ij} = \min
\begin{cases}
D_{i-1,j-1} + d(i, j) \\
D_{i-1,j} + d(i, -) \\
D_{i,j-1} + d(-, j)
\end{cases}
\]
Remarks:
• recursively compute the edit distances of prefix alignments
• obtain the alignment by traceback
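The initialization, recursion, and traceback can be sketched as follows. This is an illustrative implementation under assumed unit costs (γ = 1, w_m = 1); the function name `needleman_wunsch` and the tie-breaking order in the traceback are choices made for this example, so other optimal alignments may exist.

```python
def needleman_wunsch(a, b, gamma=1, w_m=1):
    """Return (distance, row_a, row_b) for one optimal global alignment."""
    n, m = len(a), len(b)

    def d(i, j):
        # mismatch cost between A_i and B_j (1-based positions)
        return 0 if a[i - 1] == b[j - 1] else w_m

    # Init: D[0][0] = 0, D[i][0] = i * gamma, D[0][j] = j * gamma
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gamma
    for j in range(1, m + 1):
        D[0][j] = j * gamma

    # Recurse over prefix alignments
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + d(i, j),  # match/mismatch
                          D[i - 1][j] + gamma,        # A_i aligned to a gap
                          D[i][j - 1] + gamma)        # B_j aligned to a gap

    # Traceback from (n, m) to (0, 0)
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + d(i, j):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gamma:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


dist, row_a, row_b = needleman_wunsch("ACGTAA", "ACCCT")
print(dist)   # 4 with unit costs
print(row_a)
print(row_b)
```

The matrix has (n+1)(m+1) entries and each is computed in constant time, so the sketch runs in O(nm) time and space.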

→ Recall: From Pairwise to Multiple
Problem: given a set of $K$ RNA sequences, find the best multiple alignment.

Definition (Multiple Alignment)
Define a multiple alignment $\mathcal{A}$ of $K$ (RNA) sequences $S_1, \dots, S_K$ as a matrix of entries $a_{\ell i} \in \{A, C, G, U, -\}$ ($1 \le \ell \le K$, $1 \le i \le m$), such that
• for each $\ell$: deleting every occurrence of $-$ from $a_{\ell 1} \dots a_{\ell m}$ yields $S_\ell$;
• for each $i$: $a_{1i} \dots a_{Ki} \neq - \cdots -$ (no column consists only of gaps).
Call $m$ the length of $\mathcal{A}$.

Recall: Progressive Alignment
• compute pairwise alignments all-vs-all
• construct a guide tree
• progressively construct the multiple alignment following the guide tree

→ You are here
[Figure: Plan A: single sequences A and B → ALIGN → alignment → FOLD → consensus structure]

Example: $S_1$ = CGAUACG, $S_2$ = CGAAUACG, $S_3$ = CCGAUUCGG
C-GA-UAC-G
C-GAAUAC-G
CCGA-UUCGG

Next: fold the alignment
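The two conditions of the multiple-alignment definition can be checked mechanically on this example. The sketch below is illustrative only; the function name `is_multiple_alignment` is hypothetical, and rows are given as plain strings with "-" as the gap symbol.

```python
def is_multiple_alignment(rows, seqs, gap="-"):
    """Check that rows form a multiple alignment of seqs.

    Condition 1: removing all gaps from row l yields sequence S_l.
    Condition 2: no column consists only of gaps.
    """
    if not rows or len(rows) != len(seqs):
        return False
    m = len(rows[0])  # alignment length
    if any(len(row) != m for row in rows):
        return False
    if any(row.replace(gap, "") != seq for row, seq in zip(rows, seqs)):
        return False
    return all(any(row[i] != gap for row in rows) for i in range(m))


seqs = ["CGAUACG", "CGAAUACG", "CCGAUUCGG"]
rows = ["C-GA-UAC-G",
        "C-GAAUAC-G",
        "CCGA-UUCGG"]
print(is_multiple_alignment(rows, seqs))  # True
```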

→ How to fold an alignment
The Idea of RNAalifold
Given: a $K$-way multiple alignment of length $m$.
Goal: predict the (non-crossing) consensus structure of the alignment.
A consensus structure is a (non-crossing) RNA structure of length $m$.
An optimal consensus structure minimizes a combination of
• the sum of free energies over all $K$ RNA sequences and
• a conservation score (= evidence for base pairing).

Remarks
• Think of the alignment as a sequence of alignment columns. Folding this sequence is analogous to folding a single RNA sequence. The consensus structure is a structure of the alignment.
• Thus, the decomposition is the same as in Zuker's algorithm, but with modified scoring: sum the loop energies over all sequences and add the conservation score.
• The conservation score $\gamma(i, j)$ for each base pair $(i, j)$ rewards mutations that preserve base pairing and penalizes non-complementarity.
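To give a feeling for what "evidence for base pairing" between two alignment columns means, the sketch below counts how many sequences can pair at a column pair and how many distinct pair types occur. This is explicitly not the RNAalifold conservation score γ(i, j), whose definition is not given on this slide; the function name `pair_support`, the 0-based column indices, and the set of allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for illustration.

```python
# Pairs treated as complementary in this sketch (assumption):
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_support(rows, i, j):
    """Return (number of rows that can pair at columns i, j;
    number of distinct pair types seen). Columns are 0-based."""
    pairs = [row[i] + row[j] for row in rows if row[i] + row[j] in CANONICAL]
    return len(pairs), len(set(pairs))


rows = ["C-GA-UAC-G",
        "C-GAAUAC-G",
        "CCGA-UUCGG"]
print(pair_support(rows, 0, 9))  # (3, 1): all three rows form a C-G pair
print(pair_support(rows, 3, 6))  # (1, 1): only the third row (A-U) can pair
```

Column pairs where many rows pair, ideally with several different pair types, are the ones a conservation score would favor; column pairs where most rows are non-complementary would be penalized.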

→ RNAalifold: Example

AF008220  GGAGGAUU-AGCUCAGCUGGGAGAGCAUCUGCCUUACAAGC---------AGAGGGUCGGCGGUUCGAGCCCGUCAUCCUCCA
M68929    GCGGAUAU-AACUUAGGGGUUAAAGUUGCAGAUUGUGGCUC---------UGAAAA-CACGGGUUCGAAUCCCGUUAUUCGCC
X02172    GCCUUUAU-AGCUUAG-UGGUAAAGCGAUAAACUGAAGAUU---------UAUUUACAUGUAGUUCGAUUCUCAUUAAGGGCA
Z11880    GCCUUCCU-AGCUCAG-UGGUAGAGCGCACGGCUUUUAACC---------GUGUGGUCGUGGGUUCGAUCCCCACGGAAGGCG
D10744    GGAAAAUUGAUCAUCGGCAAGAUAAGUUAUUUACUAAAUAAUAGGAUUUAAUAACCUGGUGAGUUCGAAUCUCACAUUUUCCG
alifold   (((((((...((((........))))((((((.......)).........))))....(((((.......)))))))))))). (-49.58 = -17.46 + -32.12)

→ RNAalifold Recursions
\[
W_{ij} = \min
\begin{cases}
W_{i,j-1} \\
\min_{i \le k < j-m} W_{i,k-1} + V_{kj}
\end{cases}
\]
\[
V_{ij} = \beta\gamma(i, j) + \min
\begin{cases}
\sum_{1 \le \ell \le K} \mathrm{eH}(i, j, S_\ell) \\
\min_{i < i' < j' < j} V_{i'j'} + \sum_{1 \le \ell \le K} \mathrm{eSBI}(i, j, i', j', S_\ell) \\
\min_{i < k < j} WM_{i+1,k} + WM_{k+1,j-1} + aK
\end{cases}
\]
\[
WM_{ij} = \min
\begin{cases}
WM_{i,j-1} + cK,\quad WM_{i+1,j} + cK,\quad V_{ij} + bK \\
\min_{i < k < j} WM_{ik} + WM_{k+1,j}
\end{cases}
\]
Remarks
• $\mathrm{eH}(i, j, S_\ell)$ and $\mathrm{eSBI}(i, j, i', j', S_\ell)$ yield the energy contributions for the respective sequence $S_\ell$.
• In the bound $k < j - m$ of the $W$ recursion, $m$ is the minimum loop length from the single-sequence recursions, not the alignment length.
• RNAalifold implements an unambiguous variant of these recursions for computing the partition function and base pair probabilities for the consensus structure.
• $\beta$ weights the conservation score against the sum of free energies. For $\gamma$ see the next slide.
