Shotgun Assembly of Labelled Graphs

Charles Bordenave (3), Uri Feige (3), Elchanan Mossel (1, 2, 3), Nathan Ross (1), Nike Sun (2)
1. Shotgun Assembly of Labelled Graphs (arxiv.org/abs/1504.07682)
2. Shotgun Assembly of Random Regular Graphs (arxiv.org/abs/1512.08473)
3. Shotgun Assembly of Random Jigsaw Puzzles, in progress.
Simons Conference on Random Graph Processes
Elchanan Mossel Shotgun Assembly of Labelled Graphs

Graph Shotgun Problem
Can one reconstruct a graph from a collection of subgraphs?
Reconstruction Conjecture (Kelly, Harary, 50s): Any two graphs on 3 or more vertices that have the same multi-set of vertex-deleted subgraphs are isomorphic.
Figure: From Topology and Combinatorics Blog by Max F. Pitz
Mossel-Ross-15: What if the graphs are random or have random labels (easier)? And what if only the local neighborhood of each vertex is given (harder)?

DNA Shotgun Sequencing
Figure: From “Whole genome shotgun sequencing versus Hierarchical shotgun sequencing” by Commins, Toft, and Fares (2009).

Q1: Deterministic
Sequence of letters (A, C, G, T or other) of length N. All “reads” of length r are given.
Example: N = 14, r = 3: ATGGGCACTGAGCC
Reads: {ATG, TGG, GGG, GGC, GCA, CAC, ACT, CTG, TGA, GAG, AGC, GCC}
Combinatorial Question: When does this multi-set uniquely determine the sequence?
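The read multi-set in the example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `reads` is an illustrative choice, not something from the talk, and it assumes error-free reads taken at every position.

```python
from collections import Counter

def reads(seq, r):
    """Multi-set of all length-r substrings ("reads") of seq."""
    return Counter(seq[i:i + r] for i in range(len(seq) - r + 1))

example = reads("ATGGGCACTGAGCC", 3)
print(sorted(example))        # the 12 reads listed on the slide, in sorted order
print(sum(example.values()))  # N - r + 1 = 12 reads in total
```

A `Counter` is used rather than a `set` because the question is about the multi-set: a read that occurs twice must be counted twice.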

Q1: Deterministic
Ans (Ukkonen-Pevzner): Identifiability is possible if and only if none of the following blocking patterns appear:
Rotation: x α y β x ⇔ y β x α y
Triple repeat: ··· x α x β x ··· ⇔ ··· x β x α x ···
Interleaved repeat: ··· x α y ··· x β y ··· ⇔ ··· x β y ··· x α y ···
[x, y are (r − 1)-tuples and α, β are non-equal strings]
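To see why an interleaved repeat blocks identifiability, one can swap α and β and compare read multi-sets. The sketch below does this for the sequence ATGGGCACTGAGCC with r = 3; the decomposition into x = TG, α = G, y = GC, β = A is my own reading of that sequence, and the helper names are hypothetical.

```python
from collections import Counter

def reads(seq, r):
    """Multi-set of all length-r substrings of seq."""
    return Counter(seq[i:i + r] for i in range(len(seq) - r + 1))

def swap_interleaved(prefix, x, alpha, y, middle, beta, suffix):
    """Build ...x alpha y...x beta y... and the same word with alpha, beta exchanged."""
    original = prefix + x + alpha + y + middle + x + beta + y + suffix
    swapped = prefix + x + beta + y + middle + x + alpha + y + suffix
    return original, swapped

# x and y are (r - 1)-tuples with r = 3; alpha and beta differ.
original, swapped = swap_interleaved("A", "TG", "G", "GC", "AC", "A", "C")
print(original, swapped)
print(reads(original, 3) == reads(swapped, 3))  # same reads, different sequences
```

Because x and y have length r − 1, every read that straddles a swap boundary is fully determined by x or y plus the adjacent letters, so exchanging α and β only reorders the reads.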

Q1: Deterministic
Proof is based on creating a de Bruijn graph.
Figure: From “DNA Physical Mapping and Alternating Eulerian Cycles in Colored Graphs” by Pevzner (1996), Fig. 7, which shows the words ATGGGCACTGAGCC and ATGAGCACTGGGCC. All words with given q-gram composition correspond to Eulerian paths in directed graph D. D* is the bicolored undirected graph obtained from D. Order exchanges in D* correspond to Ukkonen's transpositions.
Identifiability is possible if and only if there is a unique Eulerian path (though not circuit).
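The Eulerian-path criterion can be checked by brute force on small inputs. The following sketch enumerates every sequence consistent with a read multi-set by extending one letter at a time, which amounts to walking Eulerian paths in the de Bruijn graph whose edges are the reads. It is an exponential-time illustration under the assumption r ≥ 2, not the argument used in the proof, and the function names are hypothetical.

```python
from collections import Counter

def reads(seq, r):
    """Multi-set of all length-r substrings of seq."""
    return Counter(seq[i:i + r] for i in range(len(seq) - r + 1))

def reconstructions(read_counts, r):
    """All sequences whose multi-set of r-reads equals read_counts (r >= 2)."""
    remaining = Counter(read_counts)  # copy, so the caller's multi-set is untouched
    total = sum(remaining.values())
    results = set()

    def extend(seq, used):
        if used == total:  # every read (edge) used exactly once
            results.add(seq)
            return
        overlap = seq[-(r - 1):]  # current node: the last r - 1 letters
        for read in list(remaining):
            if remaining[read] > 0 and read[:r - 1] == overlap:
                remaining[read] -= 1
                extend(seq + read[-1], used + 1)
                remaining[read] += 1

    for first in list(remaining):  # try each read as the starting edge
        remaining[first] -= 1
        extend(first, 1)
        remaining[first] += 1
    return results

found = reconstructions(reads("ATGGGCACTGAGCC", 3), 3)
print(sorted(found))  # ['ATGAGCACTGGGCC', 'ATGGGCACTGAGCC']
```

The two results are exactly the two words shown in Pevzner's figure, so the example sequence is not identifiable from its reads of length 3.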

Setup Q2: Randomized
Random sequence, entries independent and uniform on q letters. What is the probability of identifiability?
Criteria on the growth of $r = r_N$ as $N \to \infty$ such that the chance that the sequence is identifiable tends to zero or one?
Ukkonen-Pevzner is useful: understand the probability of the appearance of the blocking patterns.
If $r / \log(N) > 2 / \log(q)$ eventually, then the probability of identifiability tends to one.
If $r / \log(N) < 2 / \log(q)$ eventually, then the probability of identifiability tends to zero.
Dyer-Frieze-Suen-94, ...
Still an active area of research, e.g. reads with errors: Ganguly-M-Racz-16.
What about other graphs?
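For a feel of the scale, the critical read length $2 \log N / \log q$ can be evaluated numerically. This is a small sketch; `critical_read_length` is a hypothetical helper name, and the threshold is an asymptotic statement, so the value for a finite N is only indicative.

```python
import math

def critical_read_length(N, q):
    """Value of r at which r / log(N) equals 2 / log(q)."""
    return 2 * math.log(N) / math.log(q)

# A sequence of a million letters over the 4-letter DNA alphabet.
r_star = critical_read_length(10**6, 4)
print(round(r_star, 1))  # roughly 19.9
```

The base of the logarithm cancels in the ratio, so natural logarithms are used here only for convenience.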

Graph Shotgun Sequencing
Paninski et al. (2013): How to reconstruct a neural network from subnetworks?
Figure: wiki commons

Random Puzzle Problem
Figure: wiki commons
Math Question: For an $n \times n$ puzzle with q types of random jigs, how large should $q(n)$ be so that the puzzle can be assembled uniquely?

A general setup
1. G is a (fixed or random) graph,
2. Possibly with random labeling of the vertices,
3. For each vertex v, given a rooted neighborhood $N_r(v)$ of “radius” r.

Random jigsaw Puzzle
Puzzle = $[n] \times [n]$ grid with uniform q-coloring of the edges of the grid.
Piece = vertex along with 4 adjacent colored half edges.
Given: $n^2$ pieces. Goal: Recover the puzzle.
Assume pieces at the edges also have 4 colors (harder).
For presentation purposes: colored edges. Real puzzle: colored half edges and a compatibility involution ι.
Figure: A puzzle with n = 3, q = 4 and the involution ι.
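A random puzzle in the simplified "colored edges" model can be generated as follows. This is a minimal sketch under my own encoding assumptions: each piece is a tuple `(top, right, bottom, left)` of colors in `range(q)`, boundary edges are colored too, and the compatibility involution is ignored. The name `random_puzzle` and the `seed` parameter are hypothetical.

```python
import random

def random_puzzle(n, q, seed=None):
    """Return the planted n x n grid of pieces, each a (top, right, bottom, left) tuple."""
    rng = random.Random(seed)
    # side[i][j]: color of the edge on the left of cell (i, j); j = n is the right border.
    side = [[rng.randrange(q) for _ in range(n + 1)] for _ in range(n)]
    # above[i][j]: color of the edge on top of cell (i, j); i = n is the bottom border.
    above = [[rng.randrange(q) for _ in range(n)] for _ in range(n + 1)]
    return [
        [(above[i][j], side[i][j + 1], above[i + 1][j], side[i][j]) for j in range(n)]
        for i in range(n)
    ]

grid = random_puzzle(3, 4, seed=0)
pieces = [piece for row in grid for piece in row]  # the n^2 pieces handed to the solver
random.shuffle(pieces)                             # their positions are not given
```

Each interior edge color appears on exactly two pieces (once per side), which is what makes the planted arrangement feasible.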

The unique Assembly Question
A feasible assembly is a permutation of the pieces such that any two adjacent half-edges have the same color.
A puzzle has unique vertex assembly (UVA) if (up to rotations) it has only one feasible assembly.
A puzzle has unique edge assembly (UEA) if for every feasible assembly, every edge has the same color as in the planted solution (up to rotations).
Question: How large should q be to ensure unique edge/vertex assembly with high probability ($\to 1$ as $n \to \infty$)?
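Feasibility of a proposed arrangement is easy to test directly. The sketch below assumes pieces are encoded as `(top, right, bottom, left)` color tuples and, as a simplification, does not allow pieces to be rotated; both the encoding and the name `is_feasible` are illustrative assumptions.

```python
def is_feasible(grid):
    """True if every pair of adjacent half-edges in the square grid has matching colors."""
    n = len(grid)
    for i in range(n):
        for j in range(n):
            top, right, bottom, left = grid[i][j]
            if j + 1 < n and right != grid[i][j + 1][3]:   # right side vs. neighbor's left
                return False
            if i + 1 < n and bottom != grid[i + 1][j][0]:  # bottom side vs. neighbor's top
                return False
    return True

# A hand-made 2 x 2 puzzle with very few colors (boundary edges all have color 0).
A, B = (0, 1, 2, 0), (0, 0, 2, 1)
C, D = (2, 1, 0, 0), (2, 0, 0, 1)
planted = [[A, B], [C, D]]
print(is_feasible(planted))           # True
print(is_feasible([[D, B], [C, A]]))  # False: A and D exchanged
print(is_feasible([[B, A], [C, D]]))  # True: a second feasible arrangement
```

The last line shows, on a toy scale, how a small number of colors lets a different arrangement pass the same check, which is the failure mode that the question about q is meant to rule out.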

Bounds on puzzle assembly
From M-Ross:
$q \ll n \implies P(\mathrm{UVA}) \to 0$.
$q \ll n^{2/3} \implies P(\mathrm{UEA}) \to 0$.
$q \gg n^2 \implies P(\mathrm{UVA}) \to 1$. Intuition: use unique colors.
Theorem (Bordenave-Feige-M): For all $\varepsilon > 0$, if $q \ge n^{1+\varepsilon}$ then $P(\mathrm{UVA}) \to 1$.
Open Problem 1: Zoom in on threshold?
Open Problem 2: Threshold for UEA.

Assembly algorithm
We use a simple assembly algorithm:
A feasible k-neighborhood of piece v is a map f from $[-k, k]^2$ to pieces such that f(0) = v and, if $x \sim y \in [-k, k]^2$, then the corresponding half-edges in f(x) and f(y) have the same color.
Algorithm: find all feasible k-neighborhoods for each vertex v. Declare piece u to be a neighbor of v if it is a neighbor of v in each k-neighborhood.
We take $k = O(1/\varepsilon)$.
How to analyze?
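The algorithm can be sketched as a backtracking search. Everything below is an illustrative assumption rather than the authors' implementation: pieces are `(top, right, bottom, left)` tuples indexed by position in a list, rotations are not allowed, a piece may be reused within a neighborhood (f is just a map), and a piece with no feasible k-neighborhood (for example one on the border) simply gets no declared neighbors.

```python
from itertools import count

# Direction offset (dy, dx) -> (my side index, the neighbor's facing side index).
DIRS = {(-1, 0): (0, 2), (0, 1): (1, 3), (1, 0): (2, 0), (0, -1): (3, 1)}

def feasible_neighborhoods(pieces, v, k):
    """All maps from offsets in [-k, k]^2 to piece indices, with (0, 0) -> v, whose adjacent sides match."""
    cells = sorted(((dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1)),
                   key=lambda c: abs(c[0]) + abs(c[1]))  # fill outward from the center
    found = []

    def fits(assign, cell, p):
        for (dy, dx), (mine, theirs) in DIRS.items():
            other = assign.get((cell[0] + dy, cell[1] + dx))
            if other is not None and pieces[p][mine] != pieces[other][theirs]:
                return False
        return True

    def fill(i, assign):
        if i == len(cells):
            found.append(dict(assign))
            return
        for p in range(len(pieces)):
            if fits(assign, cells[i], p):
                assign[cells[i]] = p
                fill(i + 1, assign)
                del assign[cells[i]]

    fill(1, {(0, 0): v})  # cells[0] is the center, already assigned to v
    return found

def certain_neighbors(pieces, v, k):
    """Directions in which the same piece is adjacent to v in every feasible k-neighborhood."""
    found = feasible_neighborhoods(pieces, v, k)
    result = {}
    for d in DIRS:
        candidates = {f[d] for f in found}
        if len(candidates) == 1:
            result[d] = candidates.pop()
    return result

def distinct_color_puzzle(n):
    """Toy puzzle in which every edge has its own color; piece (i, j) has index i * n + j."""
    fresh = count()
    side = [[next(fresh) for _ in range(n + 1)] for _ in range(n)]
    above = [[next(fresh) for _ in range(n)] for _ in range(n + 1)]
    return [(above[i][j], side[i][j + 1], above[i + 1][j], side[i][j])
            for i in range(n) for j in range(n)]

toy = distinct_color_puzzle(3)
print(certain_neighbors(toy, 4, 1))  # center piece: up 1, right 5, down 7, left 3
```

The search is exponential in the worst case; it is meant only to make the definition of a feasible k-neighborhood concrete on tiny inputs.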

Analysis 1
Note: it is impossible to hope to recover the k-neighborhood exactly, e.g. corners are often wrong.
Fix $f : [-k, k]^2 \to [n]^2$ with f(0) = v. What is the probability that f is feasible?
If f(x) = v + x then probability 1.
If f is random then probability $q^{-8k^2(1+o(1))}$.
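The exponent $8k^2$ can be recovered by counting adjacent pairs in the window. The following is a worked sketch under the assumption that a "random" f sends the cells to distinct, pairwise non-adjacent pieces, so that each adjacency constraint holds independently with probability $1/q$.

```latex
% Adjacent pairs in the (2k+1) x (2k+1) window [-k,k]^2:
% (2k+1) rows with 2k horizontal pairs each, plus the same count vertically.
\#\{x \sim y\} = 2 \cdot 2k \cdot (2k+1) = 8k^2 + 4k,
\qquad
P[f \text{ feasible}] = q^{-(8k^2 + 4k)} = q^{-8k^2 (1 + o(1))}
\quad (k \to \infty).
```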

Analysis 2
Define a tile of f to be a connected component of $f([-k, k]^2)$.
Let $v \in T_0, T_1, \ldots, T_r$ be the tiles of f. Then:
$P[f \text{ feasible}] = q^{-\gamma}$, where $\gamma = \frac{1}{2}\left(\sum_i |\partial T_i| - 8k\right)$
Isoperimetric lemma: If f separates v from its neighbors then:
$n^2 \, n^{2r} \, q^{-\gamma} = n^2 \, n^{2r} \, n^{-\gamma(1+\varepsilon)} \ll 1$
E.g.: many small tiles, each contributing at least 2 to γ.
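Collecting exponents makes the requirement on γ explicit. This is a sketch with r, γ and ε treated as fixed and $q = n^{1+\varepsilon}$; reading $n^2$ as the number of choices of v and $n^{2r}$ as the number of placements of the other r tiles is my interpretation of the union bound, not something stated on the slide.

```latex
n^{2} \, n^{2r} \, q^{-\gamma}
  = n^{\,2 + 2r - \gamma(1+\varepsilon)},
\qquad
n^{\,2 + 2r - \gamma(1+\varepsilon)} \to 0
  \iff \gamma(1+\varepsilon) > 2 + 2r .
```

So the lemma has to guarantee that γ grows fast enough with the number of tiles r, which is where the lower bound of 2 per small tile enters.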
