Space-efficient construction of succinct de Bruijn graphs
Felipe A. Louza, University of São Paulo, Brazil
Joint work with Lavinia Egidi and Giovanni Manzini.
LSD/LAW, London, 6-7 Feb. 2019

Outline
1. Introduction
2. BOSS construction
3. Merging dBGs
4. Space-efficient BOSS construction
5. References

Felipe A. Louza (USP), Space-efficient construction of dBGs

de Bruijn graphs (dBGs)

Definitions:
◮ Given a collection of strings S, a de Bruijn graph of order k is a directed graph containing:
  ◮ a node v for every unique k-mer v[1] ... v[k] in S;
  ◮ an edge (u, v) with label v[k] if there is a (k+1)-mer u[1] ... u[k] v[k] in S.

Example (k = 3):
◮ S = { TACACT, TACTCA, GACTCG }

[Figure: the de Bruijn graph of S, with nodes TAC, ACA, CAC, ACT, CTC, TCA, GAC, TCG and edges labeled A, C, G or T.]
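The definition above can be illustrated with a minimal Python sketch. It is a naive in-memory construction written only for this example, not the method presented in the talk; the function name `de_bruijn_graph` and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(strings, k):
    """Return {node: set of (successor, edge label)} for the order-k dBG."""
    graph = defaultdict(set)
    for s in strings:
        # Every distinct k-mer of s is a node (created on first access).
        for i in range(len(s) - k + 1):
            graph[s[i:i + k]]
        # Every (k+1)-mer u[1..k] v[k] yields an edge (u, v) labeled v[k].
        for i in range(len(s) - k):
            u, v = s[i:i + k], s[i + 1:i + k + 1]
            graph[u].add((v, v[-1]))
    return dict(graph)


S = ["TACACT", "TACTCA", "GACTCG"]
G = de_bruijn_graph(S, 3)
for node in sorted(G):
    print(node, sorted(G[node]))
```

For the example collection, TAC gets two outgoing edges (to ACA and ACT), while TCA and TCG have none.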

Succinct representation of dBGs

BOSS∗:
◮ Bowe et al. [WABI 2012] introduced a succinct representation for dBGs that uses O(|E| log σ) bits.
◮ BOSS representation:
  ◮ The outgoing edges of each node v_i are encoded as a string W_i formed by the edge labels, that is, the last symbols v_j[k] of its successors v_j. In the example, W_i = AT for TAC and W_j = AG for CTC.
  ◮ The strings W_i are concatenated following the order of the reversed node labels ←v_i = v_i[k] ... v_i[1]. For instance, ←TAC ≺ ←CTC, since CAT precedes CTC.

Example:
◮ S = { TACACT, TACTCA, GACTCG }

[Figure: the dBG of S extended with the dummy nodes $$$, $$T, $TA, $$G, $GA and with $-labeled edges leaving TCA and TCG.]

∗ for the authors' initials

For convenience, we add k copies of a symbol $ at the beginning of each string s_i, so the example becomes S = { $$$TACACT, $$$TACTCA, $$$GACTCG }.

The label of every node can be recovered.

Succinct representation of dBGs

BOSS:
◮ Nodes v_i = v_i[1] ... v_i[k] are sorted by their reversed labels ←v_i = v_i[k] ... v_i[1].
◮ We mark the position of the last outgoing edge of each node (last = 1).
◮ Among the incoming edges of a node, which all carry the same label, every edge except the first is marked as negative (−) in W.

last  Nodes  W
0     $$$    G
1     $$$    T
1     ACA    C
1     TCA    $
1     $GA    C
1     $TA    C
1     CAC    T
1     GAC    T-
0     TAC    A
1     TAC    T-
0     CTC    A
1     CTC    G
1     $$G    A
1     TCG    $
1     $$T    A
1     ACT    C

[Figure: the dBG of S with the dummy nodes $$$, $$T, $TA, $$G, $GA.]
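The three rules above can be sketched in Python as follows. This is a naive, memory-hungry illustration, not the space-efficient construction the talk is about. The name `boss_arrays` is hypothetical, and giving a single $ edge to every node without outgoing edges is an assumption read off the example table.

```python
def boss_arrays(strings, k):
    """Return the BOSS rows (last, node, W) of an order-k dBG, naively."""
    out = {}  # node label -> set of outgoing edge labels
    for s in strings:
        t = "$" * k + s  # k dummy symbols in front of each string
        for i in range(len(t) - k + 1):
            labels = out.setdefault(t[i:i + k], set())
            if i + k < len(t):
                labels.add(t[i + k])

    rows, seen_targets = [], set()
    # Sort nodes by reversed label; "$" is smaller than A, C, G, T in ASCII.
    for node in sorted(out, key=lambda v: v[::-1]):
        labels = sorted(out[node]) or ["$"]  # assumption: sinks get one "$" edge
        for j, c in enumerate(labels):
            target = node[1:] + c
            # An edge is negative if an earlier row already enters its target.
            negative = c != "$" and target in seen_targets
            seen_targets.add(target)
            is_last = int(j == len(labels) - 1)
            rows.append((is_last, node, c + ("-" if negative else "")))
    return rows


ROWS = boss_arrays(["TACACT", "TACTCA", "GACTCG"], 3)
for last, node, w in ROWS:
    print(last, node, w)
```

On the example collection this reproduces the 16 rows of the table, including the two negative symbols T- for the edges GAC → ACT and TAC → ACT.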

Succinct representation of dBGs

BOSS:
◮ There is an LF-mapping between the positive symbols in W and the last symbols Nodes[k] of the node labels (taken at the rows with last = 1).
◮ Fast navigation operations: Outdegree, Outgoing, Indegree and Incoming.
◮ Small space: O(m log σ) + m + o(m) bits, with support for rank and select operations.

[Table: the last, Nodes and W columns for the example S, as on the previous slide.]

Similar to the BWT and XBW.
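As a rough illustration of how the LF-mapping supports navigation, the sketch below implements Outdegree and Outgoing on the example arrays. Rank and select are done by linear scans here, whereas a real implementation would use succinct rank/select structures. The function names, the 0-based node numbering and the `NODES` list (kept only so results are readable) are assumptions for this example.

```python
# Arrays copied from the example table; a trailing "-" marks a negative symbol.
LAST = [0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1]
W = ["G", "T", "C", "$", "C", "C", "T", "T-",
     "A", "T-", "A", "G", "A", "$", "A", "C"]
# Node labels in reversed-label order; a real BOSS structure would store
# only how many nodes end with each symbol, not the labels themselves.
NODES = ["$$$", "ACA", "TCA", "$GA", "$TA", "CAC", "GAC",
         "TAC", "CTC", "$$G", "TCG", "$$T", "ACT"]
NODE_LAST = "".join(v[-1] for v in NODES)


def node_range(v):
    """Rows [start, end] of the v-th node (0-based), via select on LAST."""
    ones = [i for i, bit in enumerate(LAST) if bit == 1]
    start = ones[v - 1] + 1 if v > 0 else 0
    return start, ones[v]


def outdegree(v):
    start, end = node_range(v)
    # A single "$" row stands for "no outgoing edge".
    return 0 if W[start] == "$" else end - start + 1


def outgoing(v, c):
    """Index of the node reached from v by the edge labeled c, or None."""
    start, end = node_range(v)
    for i in range(start, end + 1):
        if c != "$" and W[i].rstrip("-") == c:
            # Rank of positive c up to row i; a negative symbol shares its
            # target with the nearest preceding positive occurrence of c.
            r = sum(1 for x in W[:i + 1] if x == c)
            # LF-mapping: the r-th positive c maps to the r-th node ending in c.
            return NODE_LAST.index(c) + r - 1
    return None


print(outdegree(7), NODES[outgoing(7, "A")], NODES[outgoing(7, "T")])
```

Node 7 is TAC in this numbering, so the last line follows its two outgoing edges, labeled A and T.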
