Optical Mapping Data: Data Generation and Algorithms

Sample Preparation (Fragments) → Sequencing (Reads) → Assembly (Contigs) → Analysis

What is an Optical Map?
Optical maps are ordered, genome-wide, high-resolution restriction maps.
GGCTT CCGA CCACCACAA CCGA ATTATGAAGGATA CCGA A
6,19,35
- Much longer than reads. For example, the average map size for goat covers 360,000 bp
- Now commercially available

Microfluidic device
Isolated DNA
DNA is elongated and cleaved on the optical mapping surface.
Epifluorescence microscope with CCD camera

Figure: fragment sizes 6 3 3 9 4

Figure: fragment sizes 3 9 4 6 3 6 3 9 4; genome-wide optical map

“There is [..] a critical need for the continued development and public release of software tools for processing optical mapping data ..” -GigaScience 2014

Sample Preparation → Sequencing → Assembly (contigs) → Analysis
Optical Map Data → Genome-wide optical map
Goal: a tool to align a contig to a segment of an optical map

Challenges
• Previous approaches use dynamic programming
• Burrows-Wheeler Transform (BWT) would improve time efficiency
• Challenges in applying BWT: (1) sizing error and (2) alphabet size
Figure (sizing error): actual optical map values 6 9 4 3; optical map obtained from experiment 9.5 6 5 4; sizing error 0.5 2 1 1

Challenges
• Alphabet size: unique fragment sizes > 16,000

Sample Preparation → Sequencing → Assembly (contigs) → Analysis
Optical Map Data → Genome-wide optical map
Twin: alignment of contigs to optical map

Figure: Contig 2, Contig 4, Contig 1, Contig 3, Contig 5

Twin Algorithm
1. In silico digest contigs into optical maps.
TTT CCGA CCACTTTT CCGA ATTATGA CCGA A
4,13,24
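An in silico digest can be sketched as a scan for the enzyme's recognition site. The snippet below is a minimal illustration, not Twin's code: the function name `in_silico_digest`, the toy sequence, and the choice to place each cut at the first base of the site are all assumptions made for the example (real enzymes cut at an enzyme-specific offset inside the site).

```python
def in_silico_digest(sequence, site):
    """Return (cut_positions, fragment_lengths) for one recognition site.

    Assumption: every cut is placed at the first base of the site,
    using 0-based coordinates.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    # Fragment lengths are the distances between consecutive boundaries.
    bounds = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return cuts, fragments


# Hypothetical toy contig, not the one drawn on the slide.
cuts, fragments = in_silico_digest("AACCGATTTCCGAGG", "CCGA")
print(cuts)       # 0-based start of each recognition site
print(fragments)  # fragment lengths, the form an optical map takes
```

The fragment-length list is the representation that the later indexing steps operate on.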

Twin Algorithm
2. Build FM-index* and auxiliary data structures on the genome-wide optical map.
* a data structure that allows compression of the input text while still permitting fast substring queries

BWT and FM-index
A suffix array (SA) of string S is an array of the suffixes of S sorted into alphabetical order.
S = acaaacg$ ($ marks the end of S and sorts after every letter in this example)
Suffixes of S    Sorted suffixes (SA)
1 acaaacg$       3 aaacg$
2 caaacg$        4 aacg$
3 aaacg$         1 acaaacg$
4 aacg$          5 acg$
5 acg$           2 caaacg$
6 cg$            6 cg$
7 g$             7 g$
8 $              8 $

BWT and FM-index
The suffix array clusters all the occurrences of every pattern together into a contiguous range!
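The clustering property can be checked directly on the slide's string. This is a small sketch under two assumptions: the suffix array is built by naive sorting (fine for a toy string, far too slow for a genome), and the character `~` stands in for the slide's end marker because it compares greater than lowercase letters in ASCII. The helper names are made up for the example.

```python
def suffix_array(text):
    """Naive suffix array: 0-based start positions in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrence_range(text, sa, pattern):
    """Half-open range of suffix-array rows whose suffix starts with pattern."""
    hits = [row for row, start in enumerate(sa)
            if text.startswith(pattern, start)]
    if not hits:
        return None
    # The rows are always adjacent, which is the property the slide states.
    assert hits == list(range(hits[0], hits[-1] + 1))
    return hits[0], hits[-1] + 1


TEXT = "acaaacg~"  # '~' plays the role of the end marker
SA = suffix_array(TEXT)
print([i + 1 for i in SA])                 # 1-based, as on the slide
print(occurrence_range(TEXT, SA, "ac"))    # rows holding suffixes that start with "ac"
```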

BWT and FM-index
The Burrows-Wheeler Transform (BWT) is a permutation of the string such that BWT[i] = S[SA[i] - 1].
Extract the last column of the sorted rotations of S = acaaacg$ ($ marks the end of S):
SA  Rotation   BWT
3   aaacg$ac   c
4   aacg$aca   a
1   acaaacg$   $
5   acg$acaa   a
2   caaacg$a   a
6   cg$acaaa   a
7   g$acaaac   c
8   $acaaacg   g

BWT and FM-index
The Burrows-Wheeler Transform (BWT) is a permutation of the string such that BWT[i] = S[SA[i] - 1].
S = acaaacg$ ($ marks the end of S)
i  SA  Rotation   BWT  rank
1  3   aaacg$ac   c    0
2  4   aacg$aca   a    0
3  1   acaaacg$   $    0
4  5   acg$acaa   a    1
5  2   caaacg$a   a    2
6  6   cg$acaaa   a    3
7  7   g$acaaac   c    1
8  8   $acaaacg   g    0
rank_K(i): return the number of K's in BWT[1, i - 1], the rows above row i

BWT and FM-index
Example: rank_a[5] = 2, since row 5 of the BWT holds an a and two a's appear in the rows above it.

BWT and FM-index
FM-index is the compressed version of the BWT and rank.
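The BWT and rank columns can be reproduced with a few lines. This is an uncompressed sketch for illustration only (a real FM-index stores these structures in compressed form); the function name is hypothetical and `~` is used as the end marker so that it sorts after the letters, as in the slide's example.

```python
def bwt_and_rank(text):
    """Return the BWT of text and, per row, the count of earlier equal symbols."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] = S[SA[i] - 1]; index -1 wraps around to the end marker.
    bwt = [text[i - 1] for i in sa]
    seen, rank = {}, []
    for ch in bwt:
        rank.append(seen.get(ch, 0))   # occurrences of ch in the rows above
        seen[ch] = seen.get(ch, 0) + 1
    return bwt, rank


bwt, rank = bwt_and_rank("acaaacg~")
print("".join(bwt))
print(rank)
```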

Twin Algorithm
3. Using the FM-index we find all alignments between the optical map and the in silico digested contigs.
- Modified FM-index Backward Search Algorithm

FM-Index Backward Search
A recursive algorithm for finding substrings using rank and BWT
Figure: search steps labelled rank[a], rank[a], rank[c]
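A minimal sketch of exact backward search is given here, written as a loop rather than a recursion. It assumes an uncompressed BWT, a rank computed by counting (linear time, for clarity only), and `~` as an end marker that sorts last; `build_fm`, `rank`, and `backward_search` are illustrative names, not an existing library API.

```python
def build_fm(text):
    """Suffix array, BWT, and C array (number of symbols smaller than each symbol)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    c, total = {}, 0
    for ch in sorted(set(text)):
        c[ch] = total
        total += text.count(ch)
    return sa, bwt, c


def rank(bwt, ch, i):
    """Number of occurrences of ch in bwt[0:i]."""
    return bwt[:i].count(ch)


def backward_search(bwt, c, pattern):
    """Half-open suffix-array range of the suffixes that start with pattern."""
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):       # consume the pattern right to left
        if ch not in c:
            return 0, 0
        lo = c[ch] + rank(bwt, ch, lo)
        hi = c[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return 0, 0                # empty range: no occurrence
    return lo, hi


sa, bwt, c = build_fm("acaaacg~")
lo, hi = backward_search(bwt, c, "aac")
print(sorted(sa[lo:hi]))   # 0-based text positions where "aac" occurs
```

Each loop iteration narrows the range using two rank queries, so the number of steps depends on the pattern length rather than on the text length.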

Modified FM-Index Backward Search
• Sizing error and alphabet size are challenges to overcome
• We cannot afford a brute force enumeration of the alphabet at each step in the backward search
• Novelty for optical maps: Wavelet Tree

Wavelet Tree
A Wavelet Tree converts a string into a balanced binary tree of bit vectors, where a 0 replaces half of the symbols and a 1 replaces the other half. This definition is applied recursively.

Wavelet Tree
{A,C,G,T} is encoded as {0,0,1,1}
ACGTATATAGGAAGA
001101010110010

Wavelet Tree
{A,C} is encoded as {0,1}
Root:    ACGTATATAGGAAGA
         001101010110010
Child 0: ACAAAAAA
         01000000
No ambiguity!

Wavelet Tree
{G,T} is encoded as {0,1}
Root:    ACGTATATAGGAAGA
         001101010110010
Child 0: ACAAAAAA
         01000000
Child 1: GTTTGGG
         0111000
Which symbols in {A, G} exist in the input string?
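The construction on these slides can be sketched recursively. The code below is an illustration with plain Python strings as bit vectors (a practical wavelet tree would use compact bit vectors with rank support); the dictionary layout and the function name are assumptions, and the lower half of the sorted alphabet is mapped to 0 as in the example.

```python
def build_wavelet_tree(seq, alphabet=None):
    """Recursively split the alphabet in half and record one bit per symbol."""
    if alphabet is None:
        alphabet = sorted(set(seq))
    if len(alphabet) == 1:
        return {"symbols": alphabet, "length": len(seq)}   # leaf: no bits needed
    mid = (len(alphabet) + 1) // 2
    low = set(alphabet[:mid])                               # symbols encoded as 0
    bits = "".join("0" if s in low else "1" for s in seq)
    return {
        "symbols": alphabet,
        "bits": bits,
        "left": build_wavelet_tree([s for s in seq if s in low], alphabet[:mid]),
        "right": build_wavelet_tree([s for s in seq if s not in low], alphabet[mid:]),
    }


tree = build_wavelet_tree("ACGTATATAGGAAGA")
print(tree["bits"])           # root: {A,C} -> 0, {G,T} -> 1
print(tree["left"]["bits"])   # child 0: A -> 0, C -> 1
print(tree["right"]["bits"])  # child 1: G -> 0, T -> 1
```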

Modified FM-Index Backward Search To match x we need to find all the substrings within the range x +/- y, for tolerance y.

Modified FM-Index Backward Search
To match 9 we need to find all the substrings within the range [6, 12], for tolerance 3.
Genome-wide optical map: 2,11,10,23,53,3,5,10,14,9,11
Root bit vector:         0,1,0,1,1,0,0,0,1,0,1

Modified FM-Index Backward Search
To match 9 we need to find all the substrings within the range [6, 12], for tolerance 3.
Wavelet tree (each node: symbols | bits):
Root: 2,11,10,23,53,3,5,10,14,9,11 | 0,1,0,1,1,0,0,0,1,0,1
  0: 2,10,3,5,10,9 | 0,1,0,0,1,1
    0: 2,3,5 | 0,0,1
      0: 2,3 | 0,1
      1: 5
    1: 10,9,10 | 1,0,1
  1: 11,23,53,14,11 | 0,1,1,0,0
    0: 11,14,11 | 0,1,0
    1: 23,53 | 0,1
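The range question on this slide ("which fragment sizes in [6, 12] occur?") maps onto a standard wavelet-tree traversal that prunes subtrees whose symbols fall outside the range. The sketch below is an assumption-laden illustration: the `Node` class and `symbols_in_range` are hypothetical names, bit vectors are plain lists with a prefix-sum array for rank, and the alphabet is split so that the lower (larger, if odd) half goes left, which matches the tree drawn above.

```python
class Node:
    """Wavelet tree node over a sorted alphabet of fragment sizes."""

    def __init__(self, seq, alphabet):
        self.alphabet = alphabet
        if len(alphabet) > 1:
            mid = (len(alphabet) + 1) // 2
            low = set(alphabet[:mid])
            bits = [0 if s in low else 1 for s in seq]
            self.ones = [0]                      # ones[i] = number of 1 bits in bits[0:i]
            for b in bits:
                self.ones.append(self.ones[-1] + b)
            self.left = Node([s for s in seq if s in low], alphabet[:mid])
            self.right = Node([s for s in seq if s not in low], alphabet[mid:])


def symbols_in_range(node, i, j, lo, hi):
    """(symbol, count) pairs with lo <= symbol <= hi among positions [i, j)."""
    if i >= j or node.alphabet[-1] < lo or node.alphabet[0] > hi:
        return []                                # empty interval or disjoint alphabet
    if len(node.alphabet) == 1:
        return [(node.alphabet[0], j - i)]
    ones_i, ones_j = node.ones[i], node.ones[j]
    return (symbols_in_range(node.left, i - ones_i, j - ones_j, lo, hi)
            + symbols_in_range(node.right, ones_i, ones_j, lo, hi))


sizes = [2, 11, 10, 23, 53, 3, 5, 10, 14, 9, 11]
root = Node(sizes, sorted(set(sizes)))
print(symbols_in_range(root, 0, len(sizes), 6, 12))   # matching 9 with tolerance 3
```

Only branches that can contain a symbol in the range are visited, which is what avoids enumerating the whole alphabet.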

Modified FM-Index Backward Search
A recursive algorithm for finding substrings using rank and BWT
Figure: search steps labelled rank[a], rank[a], rank[c], with a Wavelet Tree Query
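One way to picture the modified search is a backward search that branches over every fragment size within the tolerance. The sketch below is not Twin's implementation: the candidate symbols are found by scanning the current BWT slice, which is a deliberately simple stand-in for the wavelet tree query, rank is computed by counting, and `float("inf")` serves as an end marker larger than any fragment size. All names are hypothetical.

```python
def build_index(sizes):
    """FM-index pieces over a list of fragment sizes (uncompressed sketch)."""
    text = list(sizes) + [float("inf")]           # end marker sorts last
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    c, total = {}, 0
    for sym in sorted(set(text)):
        c[sym] = total
        total += text.count(sym)
    return sa, bwt, c


def approx_search(sa, bwt, c, query, tol):
    """Start positions where every query fragment is within tol of the map fragment."""
    results = []

    def extend(k, lo, hi):
        if k < 0:                                  # whole query consumed
            results.extend(sa[lo:hi])
            return
        x = query[k]
        # Stand-in for the wavelet tree query: distinct symbols in range.
        candidates = sorted({s for s in bwt[lo:hi] if abs(s - x) <= tol})
        for sym in candidates:
            new_lo = c[sym] + bwt[:lo].count(sym)
            new_hi = c[sym] + bwt[:hi].count(sym)
            if new_lo < new_hi:
                extend(k - 1, new_lo, new_hi)

    extend(len(query) - 1, 0, len(bwt))
    return sorted(results)


optical_map = [2, 11, 10, 23, 53, 3, 5, 10, 14, 9, 11]
sa, bwt, c = build_index(optical_map)
print(approx_search(sa, bwt, c, [9, 12], 3))      # 0-based start positions
```

With tolerance 0 the procedure reduces to the exact backward search.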

Twin Algorithm
4. Output the alignments in PSL format.
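PSL is a tab-separated format with 21 fields per alignment. The sketch below only shows the field order and how one line is assembled; the record values, the contig and map names, and the choice of coordinates are hypothetical, and how Twin actually populates each field is not stated on the slides.

```python
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def psl_line(record):
    """Join one alignment record into a single tab-separated PSL line."""
    return "\t".join(str(record[name]) for name in PSL_FIELDS)


# Hypothetical alignment of a 3-fragment contig to a map (illustrative values).
example = {
    "matches": 3, "misMatches": 0, "repMatches": 0, "nCount": 0,
    "qNumInsert": 0, "qBaseInsert": 0, "tNumInsert": 0, "tBaseInsert": 0,
    "strand": "+", "qName": "contig_1", "qSize": 3, "qStart": 0, "qEnd": 3,
    "tName": "optical_map", "tSize": 11, "tStart": 7, "tEnd": 10,
    "blockCount": 1, "blockSizes": "3,", "qStarts": "0,", "tStarts": "7,",
}
line = psl_line(example)
print(line)
```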

TWIN Test Datasets

TWIN Results

TWIN: Optical Map Aligner
Twin is the first alignment method that is capable of handling large genome sizes
• It is the only index-based tool and is orders of magnitude faster than existing approaches (patent pending)
• Pine tree (20 Gb) would take ~84 machine years with SOMA but a couple of hours with Twin

CORRECTING ERRORS IN GENOMES

Mis-assembly in Genomes
Mis-assembly: a significantly large insertion, deletion, inversion, or rearrangement that is the result of decisions made by the assembly program
Correct assembly: A R R B
Rearrangement: B A R R
Deletion: A R B
Insertion: A R R R B

Extensive vs. Local Mis-assemblies
Extensive mis-assembly: 1 kbp or larger in size, and regions align to different strands or different chromosomes.
Local mis-assembly: smaller in size, and on the same strand and same chromosome.

De Bruijn Graph of a Genome
Example Genome: ABCDEFGHICDEFGKL
Figure: De Bruijn graph with nodes ABC, BCD, CDE, DEF, EFG, FGH, GHI, HIC, ICD, FGK, GKL; labels 1, 2, 3
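The graph for the example genome can be rebuilt in a few lines. This sketch assumes the node-centric convention that the node list suggests (each 3-mer is a node, with an edge between k-mers that are adjacent in the genome); the function name and the choice k = 3 are inferred from the example rather than stated.

```python
from collections import defaultdict


def de_bruijn(genome, k):
    """Return the k-mers in genome order and the successor set of each k-mer."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    edges = defaultdict(set)
    for a, b in zip(kmers, kmers[1:]):
        edges[a].add(b)            # consecutive k-mers overlap by k - 1 symbols
    return kmers, edges


kmers, edges = de_bruijn("ABCDEFGHICDEFGKL", 3)
print(sorted(set(kmers)))          # the distinct nodes
print(sorted(edges["EFG"]))        # the repeat CDEFG makes EFG branch
```

The repeated segment CDEFG is what creates the cycle: CDE is entered from both BCD and ICD, and EFG leaves towards both FGH and FGK.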
