
Computational Pan-Genomics with - PowerPoint PPT Presentation

Computational Pan-Genomics with Elastic-Degenerate Strings (a case study of my research) NADIA PISANTI (this Department)


  1. Computational Pan-Genomics with Elastic-Degenerate Strings (a case study of my research)
     NADIA PISANTI (this Department)
     5/12/2019 PhD Day

  2. The pan-Genome
     Some definitions of pan-genome:
     - ... describes the full complement of genes [...] which can have large variation in gene content among closely related strains [Wikipedia]
     - a collection of genomic sequences to be analyzed jointly or to be used as a reference [The Computational Pan-Genomics Consortium, 2016]
     Traditionally, a reference genome is:
     - a genome of a single selected individual, or
     - a consensus drawn from a population, or
     - a "functional" genome, or
     - a maximal genome capturing all ever-detected sequences
     - ...

  3. ED-strings
     Elastic Degenerate string as a natural representation of a pan-genome.
     It corresponds to the Variant Call Format (.vcf) standard [e.g. data from the 1000 Genomes project].
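
To make the data structure concrete, here is a minimal sketch of an ED string in Python. The representation (a list of sets of strings), the sample data, and the helper names `ed_length` and `ed_total_size` are illustrative assumptions, not taken from the slides. The empty string is counted as size 1 here, which is a common accounting convention but is also an assumption.

```python
# An ED string modeled as a list of segments; each segment is a set of
# alternative strings. "" is the empty string (e.g. a deletion variant).
# The sample data below is made up for illustration.
ed_text = [
    {"CAGT"},         # conserved region: a single option
    {"A", "C", ""},   # two alternative letters, or nothing at all
    {"GG"},           # conserved region
    {"T", "TAT"},     # alternatives of different lengths ("elastic")
]


def ed_length(ed):
    """Length n of the ED string: the number of segments."""
    return len(ed)


def ed_total_size(ed):
    """Total size N: sum of the lengths of all strings in all segments.

    Assumption: an empty string contributes 1 to the total.
    """
    return sum(max(len(s), 1) for segment in ed for s in segment)


print(ed_length(ed_text), ed_total_size(ed_text))
```

Each segment with more than one option plays the role of a variant site, while single-option segments are the parts shared by all sequences.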

  4. Reference Pan-Genome
     - Cheaper sequencing: re-sequencing became a common task.
     - In genome analysis workflows, downstream of re-sequencing there is the task of mapping reads (a string) on a reference genome (a longer string).
     It's PATTERN MATCHING: the read is P, the reference genome is T.

  5. EDSM problem
     ELASTIC DEGENERATE STRING MATCHING (EDSM)
     - Input: a string P of length m, an ED string T of length n and total size N
     - Output: all positions in T where at least one occurrence of P ends
     P = CGGGTATA
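
The problem statement above can be illustrated with a naive matcher. This is only a sketch under stated assumptions: the ED string is a list of collections of strings, positions are reported as 0-based segment indices, the pattern is non-empty, and the function name `edsm_naive` and the sample text are hypothetical. It is not the algorithm of any of the cited papers and makes no attempt to reach their time bounds.

```python
def edsm_naive(pattern, ed_text):
    """Return the 0-based indices of the segments of ed_text in which
    at least one occurrence of pattern ends (pattern must be non-empty)."""
    m = len(pattern)
    active = set()  # lengths j (0 < j < m) of pattern prefixes matched so far
    ends = []
    for i, segment in enumerate(ed_text):
        new_active = set()
        found = False
        for s in segment:
            # (a) an occurrence lying entirely inside one option
            if pattern in s:
                found = True
            # (b) extend prefixes carried over from earlier segments
            for j in active:
                rest = pattern[j:]
                if len(s) < len(rest):
                    # the option is consumed entirely; the empty string
                    # passes the prefix through unchanged
                    if rest.startswith(s):
                        new_active.add(j + len(s))
                elif s.startswith(rest):
                    found = True
            # (c) start new prefixes: a suffix of s equal to a proper
            #     prefix of the pattern
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    new_active.add(k)
        if found:
            ends.append(i)
        active = new_active
    return ends


# Made-up ED text; the pattern is the one shown on the slide.
sample_text = [["TCG"], ["GG", "A"], ["T"], ["ATA", "C"], ["G"]]
print(edsm_naive("CGGGTATA", sample_text))
```

In the sample, the pattern is spelled by choosing "TCG", "GG", "T", "ATA", so the only occurrence ends in the segment with index 3.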

  6. Lower bounds & upper bounds [ICALP 2019]
     In [CPM 2017] we solved EDSM in O(N + n*m^2) time.
     In [CPM 2018] they solve it in O(N + n*m^1.5 * √(log m)) time.
     Can EDSM be improved further?
     In [ICALP 2019] we solve EDSM in O(N + n*m^1.381) time ... with an algebraic method!
     We show one can't do better with combinatorial methods.

  7. Pattern Matching on ED-string with errors [SPIRE 2017]
     Reads carry sequencing errors: how can we represent them?
     Hamming Distance: Given two strings X and Y on the same alphabet and having the same length, the Hamming Distance d_H(X,Y) between X and Y is the number of positions in which they differ.
     X = CGGGTATA
     Y = CAGGCATA
     d_H(X,Y) = 2
     Edit Distance: Given two strings X and Y on the same alphabet, the edit Distance d_E(X,Y) is the minimum number of substitutions, insertions, or deletions of a letter needed to transform X into Y (or vice versa, as d_E(X,Y) = d_E(Y,X)).
     X = CGGGTAT--A
     Y = CCGG--ATTA
     d_E(X,Y) = 3
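
Both distances can be checked on the slide's examples with a short sketch. The function names are illustrative, and the edit distance uses the textbook dynamic programme rather than anything specific to the cited paper. For the edit-distance example, the strings are taken from the alignment with the gap symbols removed, which gives Y = CCGGATTA.

```python
def hamming_distance(x, y):
    """Number of positions in which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(x, y):
    """Minimum number of substitutions, insertions and deletions that
    transform x into y (row-by-row dynamic programming)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from x
                curr[j - 1] + 1,         # insert b into x
                prev[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(hamming_distance("CGGGTATA", "CAGGCATA"))  # the slide's d_H example
print(edit_distance("CGGGTATA", "CCGGATTA"))     # the slide's d_E example
```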

  8. Degenerate Strings Comparison
     STRING COMPARISON among (E)D-strings is a basic tool for many other problems:
     Are two degenerate strings the same? Or similar? Or share sub-(E)D-strings? Motifs?
     Is one (E)D-string a substring of another (E)D-string? A Reverse? A Palindrome?

  9. Degenerate Strings Comparison: our result [WABI 2018]
     - A definition of a match among D-strings (a step into formal languages and automata problems)
     - A linear (O(N+M)) algorithm to tell whether two D-strings X (of size N) and Y (of size M) do match ("accidentally" solving an open formal languages and automata problem)
     - An application of such D-strings comparison to the design of two algorithms to decompose a D-string into palindromes (a proof-of-concept on real RNA data)
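
The slide does not spell out the definition of a match, so the following brute-force sketch rests on an assumption: two D-strings are taken to match when they can spell at least one common plain string. The list-of-lists representation and the function names are also hypothetical. The check enumerates every spelled string and is therefore exponential; it is not the linear algorithm mentioned above, only a way to experiment with small inputs.

```python
from itertools import product


def spelled_strings(d_string):
    """All plain strings obtained by picking one option per position."""
    return {"".join(choice) for choice in product(*d_string)}


def d_strings_match(x, y):
    """True if x and y spell at least one common string.

    Assumption: this is the intended notion of 'match'.
    """
    return not spelled_strings(x).isdisjoint(spelled_strings(y))


# Made-up example data: both x and y can spell "ACGA".
x = [["AC", "GT"], ["G"], ["T", "A"]]
y = [["A", "G"], ["CG", "TT"], ["A"]]
z = [["T"], ["CG"], ["A"]]
print(d_strings_match(x, y), d_strings_match(x, z))
```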

  10. References
     - The Computational Pan-Genomics Consortium: Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics 19(1): 118-135 (2018)
     - R. Grossi, C.S. Iliopoulos, C. Liu, N. Pisanti, S.P. Pissis, A. Retha, G. Rosone, F. Vayani, L. Versari: On-Line Pattern Matching on Similar Texts. CPM 2017: 9:1-9:14
     - G. Bernardini, N. Pisanti, S.P. Pissis, G. Rosone: Pattern Matching on Elastic-Degenerate Text with Errors. SPIRE 2017: 74-90 [extended version in press in Theoretical Computer Science journal]
     - M. Alzamel, L.A.K. Ayad, G. Bernardini, R. Grossi, C.S. Iliopoulos, N. Pisanti, S.P. Pissis, G. Rosone: Degenerate String Comparison and Applications. WABI 2018: 21:1-21:14
     - G. Bernardini, P. Gawrychowski, N. Pisanti, S.P. Pissis, G. Rosone: Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication. ICALP 2019: 21:1-21:15
