Assembly Assembly Computational Challenge: assemble individual short - PowerPoint PPT Presentation

Assembly

Assembly Computational Challenge: assemble individual short fragments (reads) into a single genomic sequence (“superstring”)

Shortest common superstring
Problem: Given a set of strings, find a shortest string that contains all of them
Input: Strings $s_1, s_2, \ldots, s_n$
Output: A string $s$ that contains all strings $s_1, s_2, \ldots, s_n$ as substrings, such that the length of $s$ is minimized
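The problem statement above can be made concrete with a small exhaustive search. This is only a sketch for a handful of short strings, not a practical assembler: it tries every ordering and merges neighbours at their longest overlap. The function names and the example reads are illustrative assumptions, not part of the slides.

```python
from itertools import permutations

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def shortest_common_superstring(strings):
    """Exhaustive search over all orderings; factorial time, tiny inputs only."""
    # Drop duplicates and strings already contained in another string.
    unique = sorted(set(strings))
    reads = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not reads:
        return ""
    best = None
    for order in permutations(reads):
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]  # glue at the longest overlap
        if best is None or len(merged) < len(best):
            best = merged
    return best

# Hypothetical reads, chosen only for illustration.
reads = ["ATG", "TGC", "GCA"]
superstring = shortest_common_superstring(reads)
```

Because the search is factorial in the number of strings, real assemblers rely on graph formulations instead.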

Shortest common superstring

Overlap Graph

De Bruijn Graph

Overlap graph vs De Bruijn graph
[Figure: De Bruijn graph on the nodes AT, TG, GG, GC, CG, GT, CA. The path visits every EDGE once.]
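A De Bruijn graph can be sketched in a few lines: every k-mer becomes one edge from its (k-1)-letter prefix to its (k-1)-letter suffix. The 3-mers below are an assumption chosen so that the resulting nodes match the two-letter labels that survive from the figure; the slide's actual input is not recoverable.

```python
from collections import defaultdict

def de_bruijn_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])  # one edge per k-mer
    return dict(graph)

# Hypothetical 3-mers, taken from the string "ATGGCGTGCA".
kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
graph = de_bruijn_graph(kmers)

# Every node that appears as a source or a target of some edge.
nodes = set(graph) | {w for targets in graph.values() for w in targets}
```

In this representation a read set is reconstructed by a path that uses every edge exactly once, in contrast to the overlap graph, where every node must be visited once.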

Some definitions

Eulerian walk/path zero or

Assume all nodes are balanced.
a. Start with an arbitrary vertex v and form an arbitrary cycle with unused edges until a dead end is reached. Since the graph is Eulerian, this dead end is necessarily the starting point, i.e., vertex v.

b. If the cycle from (a) is not an Eulerian cycle, it must contain a vertex w that has untraversed edges. Perform step (a) again, using vertex w as the starting point. Once again, we will end up at the starting vertex w.

c. Combine the cycles from (a) and (b) into a single cycle and iterate step (b).
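Steps (a) to (c) can be sketched with a stack, which splices the sub-cycles together implicitly instead of merging them by hand. The adjacency-list format and the small example graph are assumptions made for illustration; the sketch also assumes that every vertex is balanced and that all edges are reachable from the start vertex.

```python
def eulerian_cycle(graph, start):
    """Return an Eulerian cycle as a list of vertices.

    graph maps each vertex to a list of successor vertices.
    Assumes every vertex is balanced and all edges are reachable from start.
    """
    remaining = {v: list(targets) for v, targets in graph.items()}  # unused edges
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())  # dead end: v has no unused edges left
    return cycle[::-1]

# Hypothetical balanced graph with five edges.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["B"]}
cycle = eulerian_cycle(graph, "A")
```

Each edge is pushed and popped once, so the running time is linear in the number of edges.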

Eulerian path
• A vertex v is "semibalanced" if |in-degree(v) - out-degree(v)| = 1
• If a graph has an Eulerian path starting from s and ending at t, then all its vertices are balanced with the possible exception of s and t
• Add an edge between the two semibalanced vertices, directed from t back to s: now all vertices should be balanced (assuming there was an Eulerian path to begin with). Find the Eulerian cycle and remove the edge you added. What remains is the Eulerian path you wanted.
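The reduction from path to cycle can be sketched as follows. The helper names and the example graph (a De Bruijn graph over hypothetical 3-mers) are assumptions; the sketch expects a non-empty, connected graph given as an adjacency list.

```python
from collections import Counter

def eulerian_cycle(graph, start):
    """Stack-based Eulerian cycle; assumes all vertices are balanced."""
    remaining = {v: list(targets) for v, targets in graph.items()}
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())
        else:
            cycle.append(stack.pop())
    return cycle[::-1]

def eulerian_path(graph):
    """Add an edge t -> s, find an Eulerian cycle, then cut the added edge."""
    balance = Counter()  # out-degree minus in-degree
    for v, targets in graph.items():
        balance[v] += len(targets)
        for w in targets:
            balance[w] -= 1
    starts = [v for v, b in balance.items() if b == 1]
    ends = [v for v, b in balance.items() if b == -1]
    if not starts and not ends:
        return eulerian_cycle(graph, next(iter(graph)))  # already a cycle
    if len(starts) != 1 or len(ends) != 1 or any(abs(b) > 1 for b in balance.values()):
        raise ValueError("graph has no Eulerian path")
    s, t = starts[0], ends[0]
    augmented = {v: list(targets) for v, targets in graph.items()}
    augmented.setdefault(t, []).append(s)  # the extra edge t -> s
    cycle = eulerian_cycle(augmented, s)
    # Locate one use of the edge t -> s and rotate the cycle so it is dropped.
    i = next(j for j in range(len(cycle) - 1) if cycle[j] == t and cycle[j + 1] == s)
    return cycle[i + 1:] + cycle[1:i + 1]

# Hypothetical De Bruijn graph; AT and CA are the semibalanced vertices.
graph = {"AT": ["TG"], "TG": ["GG", "GC"], "GG": ["GC"],
         "GC": ["CG", "CA"], "CG": ["GT"], "GT": ["TG"]}
path = eulerian_path(graph)
assembled = path[0] + "".join(node[-1] for node in path[1:])
```

Spelling the path by appending the last letter of each node gives a string that contains every 3-mer behind the graph; several Eulerian paths may exist, so the assembled string is not necessarily unique.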

Complexity?

Hidden Markov Models

Markov Model (Finite State Machine with Probabilities)
Modeling a sequence of weather observations

Hidden Markov Models Assume the states in the machine are not observed and we can observe some output at certain states.

Hidden Markov Models
Assume the states in the machine are not observed and we can observe some output at certain states.
Hidden states: Sunny, Rainy
Observations: Clean, Walk, Shop

Generate a sequence from an HMM
Hidden (temperature): HHHHHHCCCCCCCHHHHHH
Observed (number of ice creams): 3323332111111233332
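Generating such a sequence can be sketched as alternating two random draws: emit an observation from the current hidden state, then move to the next hidden state. All probabilities below are invented for illustration (the slide does not give them), as are the function and variable names.

```python
import random

def sample_hmm(initial, transition, emission, length, rng):
    """Draw `length` hidden states and one observation per state."""
    states, observations = [], []
    state = rng.choices(list(initial), weights=list(initial.values()))[0]
    for _ in range(length):
        states.append(state)
        emit = emission[state]
        observations.append(rng.choices(list(emit), weights=list(emit.values()))[0])
        nxt = transition[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return states, observations

# Hypothetical parameters: H = hot, C = cold; observations = ice creams eaten.
initial = {"H": 0.5, "C": 0.5}
transition = {"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}}
emission = {"H": {1: 0.1, 2: 0.3, 3: 0.6}, "C": {1: 0.6, 2: 0.3, 3: 0.1}}

rng = random.Random(0)  # fixed seed so the run is repeatable
hidden, observed = sample_hmm(initial, transition, emission, 19, rng)
print("".join(hidden))
print("".join(str(o) for o in observed))
```

With sticky transitions like these, the hidden sequence tends to form long runs of H and C, and the observations are high in H runs and low in C runs, which is the pattern in the example above.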

Hidden Markov Models: Applications
Speech recognition
Action recognition

Motif Finding
Problem: Find frequent motifs of length L in a sequence dataset
ATCGCGCGGCGCGGAATCGDTATCGCGCGCC CAGGTAAGT GCGCGCG CAGGTAAGG TATTATGCGAGACGATGTGCTATT GTAGGCTGATGTGGGGGG AAGGTAAGT CGAGGAGTGCATG CTAGGGAAACCGCGCGCGCGCGAT AAGGTGAGT GGGAAAG
Assumption: the motifs are very similar to each other but look very different from the rest of the sequences

Motif: a first approximation
Assumption 1: lengths of motifs are fixed to L
Assumption 2: states at different positions of the sequence are independently distributed
$$p_i(A) = \frac{N_i(A)}{N_i(A) + N_i(T) + N_i(G) + N_i(C)}$$
$$p(x) = \prod_{i=1}^{L} p_i(x(i))$$
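The two formulas can be checked on the four length-9 motifs that appear in the example dataset (CAGGTAAGT, CAGGTAAGG, AAGGTAAGT, AAGGTGAGT). The sketch below uses plain relative frequencies with no pseudocounts, so a letter never seen at a position gets probability zero; the function names are illustrative.

```python
from collections import Counter

ALPHABET = "ACGT"

def position_frequencies(motifs):
    """p_i(a) = N_i(a) / (N_i(A) + N_i(T) + N_i(G) + N_i(C)) for each position i."""
    table = []
    for i in range(len(motifs[0])):
        counts = Counter(m[i] for m in motifs)
        total = sum(counts[a] for a in ALPHABET)
        table.append({a: counts[a] / total for a in ALPHABET})
    return table

def motif_probability(x, table):
    """p(x) = product over positions i of p_i(x(i))."""
    p = 1.0
    for i, letter in enumerate(x):
        p *= table[i][letter]
    return p

motifs = ["CAGGTAAGT", "CAGGTAAGG", "AAGGTAAGT", "AAGGTGAGT"]
table = position_frequencies(motifs)
p = motif_probability("CAGGTAAGT", table)  # 0.5 * 0.75 * 0.75 = 0.28125
```

Only positions 1, 6 and 9 vary across the four motifs, so they are the only factors below 1 in the product.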

Motif: (Hidden) Markov models
Assumption 1: lengths of motifs are fixed to L
Assumption 2: future letters depend only on the present letter
$$p_i(A \mid G) = \frac{N_{i-1,i}(G, A)}{N_{i-1}(G)}$$
$$p(x) = p_1(x(1)) \prod_{i=2}^{L} p_i(x(i) \mid x(i-1))$$
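A minimal sketch of the position-specific first-order Markov model, again estimated from the four motifs in the example dataset. Counts are used directly, without smoothing, and unseen letter pairs are given probability zero; both choices and the function names are assumptions.

```python
from collections import Counter

def fit_markov_motif(motifs):
    """Estimate p_1(a) and p_i(a | g) = N_{i-1,i}(g, a) / N_{i-1}(g)."""
    first = Counter(m[0] for m in motifs)
    p1 = {a: n / len(motifs) for a, n in first.items()}
    transitions = []  # transitions[i - 1] holds p_{i+1}(a | g) with 0-based i
    for i in range(1, len(motifs[0])):
        pair = Counter((m[i - 1], m[i]) for m in motifs)
        prev = Counter(m[i - 1] for m in motifs)
        transitions.append({(g, a): n / prev[g] for (g, a), n in pair.items()})
    return p1, transitions

def markov_probability(x, p1, transitions):
    """p(x) = p_1(x(1)) * product for i = 2..L of p_i(x(i) | x(i-1))."""
    p = p1.get(x[0], 0.0)
    for i in range(1, len(x)):
        p *= transitions[i - 1].get((x[i - 1], x[i]), 0.0)
    return p

motifs = ["CAGGTAAGT", "CAGGTAAGG", "AAGGTAAGT", "AAGGTGAGT"]
p1, transitions = fit_markov_motif(motifs)
p = markov_probability("CAGGTAAGT", p1, transitions)
```

On this small dataset the result for CAGGTAAGT happens to equal the value from the independent-position model, because each varying position follows a conserved one.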

Motif Finding
Problem: We don’t know the exact locations of motifs in the sequence dataset
ATCGCGCGGCGCGGAATCGDTATCGCGCGCC CAGGTAAGT GCGCGCG CAGGTAAGG TATTATGCGAGACGATGTGCTATT GTAGGCTGATGTGGGGGG AAGGTAAGT CGAGGAGTGCATG CTAGGGAAACCGCGCGCGCGCGAT AAGGTGAGT GGGAAAG
Assumption: the motifs are very similar to each other but look very different from the rest of the sequences

Hidden state space: null, start, end

Hidden Markov Model (HMM)
[Figure: state-transition diagram over the states null, start and end, with edge probabilities 0.9, 0.99, 0.02, 0.08, 0.95, 0.01 and 0.05.]

How to build HMMs?

Computational problems in HMMs

Hidden Markov Models

Hidden Markov Model
Hidden: q(i-1), q(i), q(i+1)
Observed: o(i-1), o(i), o(i+1)

Conditional Probability of Observations Example:

Joint and marginal probabilities Joint: Marginal:

How to compute the probability of observations

Forward algorithm
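The slide's own formulas did not survive extraction, so the following is a sketch of the standard textbook forward recursion rather than the deck's exact notation. It computes the probability of an observation sequence by summing over all hidden paths in time proportional to the sequence length. The parameter values and names are hypothetical.

```python
def forward(observations, states, initial, transition, emission):
    """Return P(observations) by the forward recursion.

    alpha[s] is the joint probability of the observations seen so far
    and of being in hidden state s at the current position.
    """
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for o in observations[1:]:
        alpha = {
            s: emission[s][o] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical parameters: H = hot, C = cold; observations = ice creams eaten.
states = ["H", "C"]
initial = {"H": 0.5, "C": 0.5}
transition = {"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}}
emission = {"H": {1: 0.1, 2: 0.3, 3: 0.6}, "C": {1: 0.6, 2: 0.3, 3: 0.1}}

likelihood = forward([3, 3, 1, 1], states, initial, transition, emission)
```

For long sequences the products underflow, so practical implementations rescale alpha at each step or work in log space.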

Decoding: finding the most probable states Similar to the forward algorithm, we can define the following value:

Viterbi algorithm
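As with the forward algorithm, the slide's formulas are missing, so this is a sketch of the standard Viterbi recursion: the sum over previous states is replaced by a maximum, and back-pointers record which previous state achieved it. The parameter values and names are hypothetical.

```python
def viterbi(observations, states, initial, transition, emission):
    """Return the most probable hidden path and its joint probability."""
    # delta[s]: probability of the best path that ends in state s so far.
    delta = {s: initial[s] * emission[s][observations[0]] for s in states}
    backpointers = []
    for o in observations[1:]:
        pointers, new_delta = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[r] * transition[r][s])
            pointers[s] = best_prev
            new_delta[s] = delta[best_prev] * transition[best_prev][s] * emission[s][o]
        backpointers.append(pointers)
        delta = new_delta
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], delta[last]

# Hypothetical parameters: H = hot, C = cold; observations = ice creams eaten.
states = ["H", "C"]
initial = {"H": 0.5, "C": 0.5}
transition = {"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}}
emission = {"H": {1: 0.1, 2: 0.3, 3: 0.6}, "C": {1: 0.6, 2: 0.3, 3: 0.1}}

path, probability = viterbi([3, 3, 1, 1], states, initial, transition, emission)
# path is ["H", "H", "C", "C"]
```

The structure mirrors the forward algorithm, which is why the two share the same running time; a log-space version avoids underflow on long sequences.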
