Full Compressed Affix Tree Representations, L.I.R.M.M., Université de Montpellier

SLIDE 1

Full Compressed Affix Tree Representations

L.I.R.M.M. Université de Montpellier, Institut Biologie Computationnelle

SLIDE 2

Outline: Introduction · Basic Concepts · A Classification · Asynchronous Approaches · Synchronous Approaches · Results · Conclusions & Future Work

SLIDE 3

Motivation

Bidirectional Search Example: Hairpins

SLIDE 4

Section: Basic Concepts

SLIDE 5

Suffix Tree

SLIDE 6

Suffix Tree Operations

SLIDE 7

Suffix Arrays and Suffix Tree

SLIDE 8

Suffix Arrays and Suffix Tree
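A suffix tree can be simulated with arrays, as the AfA does with its SA and LCP tables: the suffix array lists suffixes in lexicographic order, and the LCP array records how long adjacent suffixes agree. A minimal sketch for a small `$`-terminated text follows; it assumes a naive sort for the suffix array and Kasai et al.'s linear-time LCP algorithm, and neither is claimed to be the construction used by the implementations compared here.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[k] = longest common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for k, pos in enumerate(sa):
        rank[pos] = k
    lcp = [0] * n
    h = 0  # lower bound on the LCP, carried over from the previous text position
    for i in range(n):
        if rank[i] == 0:
            h = 0  # the smallest suffix has no predecessor in SA order
            continue
        j = sa[rank[i] - 1]  # the suffix just before suffix i in SA order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # dropping one leading character loses at most one matched position
    return lcp


text = "banana$"
sa = suffix_array(text)
print(sa)                   # [6, 5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 0, 1, 3, 0, 0, 2]
```

In the output, the rows for `ana$` and `anana$` are adjacent with LCP 3, which matches the internal node `ana` of the suffix tree of `banana$`.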

SLIDE 9

Burrows and Wheeler Transform (BWT)
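The BWT can be read off the suffix array: row k holds the character that precedes suffix SA[k] in T, wrapping around to the sentinel for the suffix that starts at position 0. The sketch below assumes a single `$` sentinel that is smaller than every other character (true in code-point order for letters and digits) and uses a naive suffix sort.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of a text ending in a unique, smallest '$' sentinel."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' sentinel")
    # Naive suffix array: suffix start positions in lexicographic order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[k] is the character preceding suffix sa[k]; for sa[k] == 0, text[-1] wraps to '$'.
    return "".join(text[i - 1] for i in sa)


print(bwt("banana$"))  # annb$aa
```

Because the sentinel is unique and smallest, this gives the same result as taking the last column of the sorted cyclic rotations of T.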

SLIDE 10

BWT: backward search

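backwardSearch(c, [i, j]):
    i′ ← C[c] + Occ(c, i − 1) + 1
    j′ ← C[c] + Occ(c, j)

Here [i, j] is the (1-based) suffix-array interval of the current pattern P, [i′, j′] is the interval of cP (empty if i′ > j′), C[c] is the number of characters in T smaller than c, and Occ(c, k) is the number of occurrences of c in BWT[1..k].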
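Applying the two update rules once per pattern character, from right to left and starting from the full interval [1, n], locates the pattern. The sketch below keeps the slide's 1-based convention; the names `backward_search`, `C` and `occ` are illustrative, and `occ` counts naively in the BWT string, whereas an FM-index would answer it with a rank structure such as a wavelet tree.

```python
def backward_search(bwt: str, pattern: str):
    """Return the 1-based SA interval (i, j) of `pattern`, or None if it does not occur."""
    # C[c]: number of characters in the text that are smaller than c.
    C, total = {}, 0
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)

    def occ(c: str, k: int) -> int:
        # Occurrences of c in BWT[1..k] (1-based, inclusive); naive count for illustration.
        return bwt[:k].count(c)

    i, j = 1, len(bwt)  # the empty pattern matches every row
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return None
        i = C[c] + occ(c, i - 1) + 1
        j = C[c] + occ(c, j)
        if i > j:
            return None
    return i, j


print(backward_search("annb$aa", "ana"))  # (3, 4): rows of ana$ and anana$ in banana$
```

The number of occurrences of the pattern is j − i + 1, here 2.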

SLIDE 11

Affix Tree

◮ Combines the suffix tree of T with the suffix tree of T^r (the reverse of T)
◮ Introduced by Stoye (2000) and Maaß (2003)
◮ Problem: the structures are complex and use about 45n bytes
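The connection between the two trees rests on a simple fact: a string x occurs in T exactly as many times as its reverse x^r occurs in T^r, so every substring represented in one tree (as a node or as a position on an edge) has a counterpart in the other. The sketch below checks this on suffix-array intervals; the helper `sa_interval` is hypothetical and uses a naive scan.

```python
def sa_interval(text: str, pattern: str):
    """Half-open range [lo, hi) of suffix-array rows whose suffixes start with `pattern`."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    rows = [k for k, pos in enumerate(sa) if text.startswith(pattern, pos)]
    return (rows[0], rows[-1] + 1) if rows else None  # matching rows are contiguous


s = "abracadabra"
forward, reverse = s + "$", s[::-1] + "$"  # T and T^r, each with its own sentinel
for x in ("abra", "cad", "ra"):
    f, r = sa_interval(forward, x), sa_interval(reverse, x[::-1])
    print(x, f, r, f[1] - f[0] == r[1] - r[0])  # same interval size in both arrays
```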

SLIDE 12

Section: A Classification

SLIDE 13

Asynchronous vs Synchronous

◮ Forward Structure (FOS) and the Backward Structure (BAS)

SLIDE 14

Section: Asynchronous Approaches

SLIDE 15

Affix Array (AfA)

◮ Proposed by Strothmann (2007)
◮ The suffix trees are stored as suffix arrays together with extra data
◮ Connections between the trees are also stored (affix links)
◮ Does not support all tree operations
◮ Total: around 18–22n bytes

SLIDE 16

Compressed Affix Tree (ACAT)

◮ Built on compressed suffix tree (CST) data structures
◮ Supports all tree operations
◮ Connections between the trees are also stored (affix links)

SLIDE 17

Affix Link

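ALink(v) = Child(ALink(SLink(v)), c)

where v represents the string cα, SLink(v) represents α, and the Child step in the reverse tree spells α^r followed by c, that is, (cα)^r.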
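The recursion can be traced on suffix-array intervals instead of tree nodes: if v spells x = c·y, the interval of x^r = y^r·c in SA(T^r) is the part of the interval of y^r whose suffixes continue with c. The sketch below assumes this interval representation and naive scans; `child` and `alink` are hypothetical names, and this illustrates the formula rather than the data layout of ACAT.

```python
def suffix_array(text: str) -> list[int]:
    return sorted(range(len(text)), key=lambda i: text[i:])  # naive, for illustration


def child(text, sa, interval, depth, c):
    """Rows of `interval` (all sharing a prefix of length `depth`) whose next character is c."""
    lo, hi = interval
    rows = [k for k in range(lo, hi) if sa[k] + depth < len(text) and text[sa[k] + depth] == c]
    return (rows[0], rows[-1] + 1) if rows else None  # contiguous because rows are sorted


def alink(x, rev_text, rev_sa):
    """Interval of reverse(x) in SA(T^r), computed as ALink(v) = Child(ALink(SLink(v)), c)."""
    if x == "":
        return (0, len(rev_sa))  # the root is linked to the root
    c, y = x[0], x[1:]  # v spells c·y, so SLink(v) spells y
    target = alink(y, rev_text, rev_sa)  # ALink(SLink(v)): interval of reverse(y)
    if target is None:
        return None
    return child(rev_text, rev_sa, target, len(y), c)  # reverse(y)·c == reverse(c·y)


s = "abracadabra"
rev_text = s[::-1] + "$"
rev_sa = suffix_array(rev_text)
print(alink("cad", rev_text, rev_sa))
```

Unlike tree nodes, intervals also cover strings whose locus lies in the middle of an edge of the reverse tree, which keeps the sketch short.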

SLIDE 18

Affix Link

ALink(v) = Child(ALink(SLink(v)), c)

SLIDE 19

Affix Link

ALink(v) = Child(ALink(SLink(v)), c)

SLIDE 20

Sampled Affix Link

SLIDE 21

Compressed Affix Tree Sampled (ACATS)

◮ Built on compressed suffix tree (CST) data structures
◮ Affix links are stored only for a sample of the nodes
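A minimal sketch of sampling, assuming a hypothetical rule that ALink is stored only for strings whose length (string depth) is a multiple of k; the rule ACATS actually uses is not given on the slide. Every other link is recovered by applying SLink until a sampled string is reached, at most k − 1 times, and then applying Child once per dropped character, following ALink(v) = Child(ALink(SLink(v)), c). Nodes are represented by suffix-array intervals, and all scans are naive.

```python
def suffix_array(text: str) -> list[int]:
    return sorted(range(len(text)), key=lambda i: text[i:])  # naive, for illustration


def naive_interval(text, sa, pattern):
    rows = [k for k, pos in enumerate(sa) if text.startswith(pattern, pos)]
    return (rows[0], rows[-1] + 1) if rows else None


def child(text, sa, interval, depth, c):
    """Rows of `interval` (sharing a prefix of length `depth`) whose next character is c."""
    lo, hi = interval
    rows = [k for k in range(lo, hi) if sa[k] + depth < len(text) and text[sa[k] + depth] == c]
    return (rows[0], rows[-1] + 1) if rows else None


class SampledALinks:
    """Hypothetical policy: store ALink only for substrings whose length is a multiple of k."""

    def __init__(self, s: str, k: int):
        self.k = k
        self.rev_text = s[::-1] + "$"
        self.rev_sa = suffix_array(self.rev_text)
        self.samples = {}  # substring -> interval of its reverse in SA(T^r)
        for i in range(len(s) + 1):
            for j in range(i, len(s) + 1):
                x = s[i:j]
                if len(x) % k == 0:
                    self.samples[x] = naive_interval(self.rev_text, self.rev_sa, x[::-1])

    def alink(self, x: str):
        dropped, y = [], x
        while len(y) % self.k != 0:  # follow SLink at most k - 1 times
            dropped.append(y[0])
            y = y[1:]
        interval = self.samples.get(y)  # None if y does not occur in s
        depth = len(y)
        for c in reversed(dropped):  # undo the suffix links with Child steps
            if interval is None:
                return None
            interval = child(self.rev_text, self.rev_sa, interval, depth, c)
            depth += 1
        return interval


idx = SampledALinks("abracadabra", k=3)
print(idx.alink("cadab"))
```

In this sketch a larger k stores fewer links but costs up to k − 1 extra Child steps per query.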

SLIDE 22

Compressed Affix Tree Non-Sampled

◮ Extreme case of ACATS: no affix links are stored
◮ Albrecht and Heun (2012): optimal computation of affix links using binary search
◮ Gog et al. (2014): a faster solution (ACATN)
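When no affix links are stored, the counterpart of a node has to be located on demand. The sketch below does this in the most direct way, by binary-searching SA(T^r) for the reversed label; it is a generic suffix-array search with hypothetical names, not a reproduction of Albrecht and Heun's algorithm nor of the rminq/rmaxq technique of ACATN.

```python
def suffix_array(text: str) -> list[int]:
    return sorted(range(len(text)), key=lambda i: text[i:])  # naive, for illustration


def sa_range(text: str, sa: list[int], p: str):
    """Half-open range [lo, hi) of SA rows whose suffix starts with p, via two binary searches."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first row whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first row whose length-m prefix is > p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return (start, lo) if start < lo else None


def alink_by_binary_search(x: str, rev_text: str, rev_sa: list[int]):
    """Counterpart of the node spelling x: the SA(T^r) interval of reverse(x)."""
    return sa_range(rev_text, rev_sa, x[::-1])


s = "abracadabra"
rev_text = s[::-1] + "$"
rev_sa = suffix_array(rev_text)
print(alink_by_binary_search("abra", rev_text, rev_sa))
```

The search works because truncating sorted suffixes to their first m characters keeps them in non-decreasing order, so each query costs O(m log n) character comparisons.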

SLIDE 23

ACATN

SLIDE 24

ACATN

SLIDE 25

RACATN

SLIDE 26

Section: Synchronous Approaches

SLIDE 27

Bidirectional Wavelet Tree (BidWT)

◮ Proposed by Schnattinger et al. (2010–2012) and Lam et al. (2009)
◮ Uses a backward index for the input text T and for T^r
◮ Easy transition between the two data structures
◮ Reduces space by a factor of 23 compared with the Affix Array
◮ Main operation: extend the current match by one character (to the left or to the right)
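One-character extension keeps two intervals in sync: the interval of P in SA(T) and the interval of P^r in SA(T^r). Extending to the left is ordinary backward search on BWT(T); the reverse interval keeps the new size and is shifted by the number of characters smaller than c in the current BWT(T) range. Extending to the right is the mirror image using BWT(T^r). The sketch below is a toy version with 0-based half-open intervals and naive counting in place of wavelet-tree queries; the class and method names are hypothetical.

```python
from collections import Counter


def bwt_of(text: str) -> str:
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    return "".join(text[i - 1] for i in sa)


class BidirectionalIndex:
    """Toy bidirectional BWT over s: BWTs of T = s + '$' and of T^r = reverse(s) + '$'."""

    def __init__(self, s: str):
        self.n = len(s) + 1
        self.bwt = bwt_of(s + "$")
        self.bwt_rev = bwt_of(s[::-1] + "$")
        counts = Counter(self.bwt)  # T and T^r contain the same characters
        self.C, total = {}, 0
        for c in sorted(counts):
            self.C[c] = total  # number of characters smaller than c
            total += counts[c]

    def root(self):
        """Intervals of the empty pattern in SA(T) and SA(T^r)."""
        return (0, self.n), (0, self.n)

    def _extend(self, bwt, c, main, other):
        if c not in self.C:
            return None
        lo, hi = main
        new_lo = self.C[c] + bwt[:lo].count(c)  # backward search step on `bwt`
        new_hi = self.C[c] + bwt[:hi].count(c)
        if new_lo >= new_hi:
            return None
        # Occurrences preceded (in `bwt` order) by a smaller character sort first in the other array.
        smaller = sum(1 for b in bwt[lo:hi] if b < c)
        other_lo = other[0] + smaller
        return (new_lo, new_hi), (other_lo, other_lo + (new_hi - new_lo))

    def extend_left(self, c, fwd, rev):
        """From the intervals of P, return those of c·P (or None)."""
        return self._extend(self.bwt, c, fwd, rev)

    def extend_right(self, c, fwd, rev):
        """From the intervals of P, return those of P·c (or None)."""
        res = self._extend(self.bwt_rev, c, rev, fwd)
        return None if res is None else (res[1], res[0])


idx = BidirectionalIndex("mississippi")
fwd, rev = idx.root()
fwd, rev = idx.extend_right("s", fwd, rev)  # "s"
fwd, rev = idx.extend_left("i", fwd, rev)   # "is"
fwd, rev = idx.extend_right("s", fwd, rev)  # "iss"
print(fwd, rev)
```

Both directions need only rank-style counts over a BWT range, which is what makes a pair of wavelet trees sufficient for this operation.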

SLIDE 28

Bidirectional Wavelet Tree

SLIDE 29

Bidirectional Wavelet Tree

SLIDE 30

Bidirectional Wavelet Tree

SLIDE 31

SCAT

SLIDE 32

SCAT

SLIDE 33

Summary

Approach  Category      Full Tree Operations  Description                             Space
AfA       Asynchronous  No                    Strothmann’s Affix Array                2 · (SA + LCP + child tables + ALink)
ACAT      Asynchronous  Yes                   Asynchronous Affix Tree implementation  2 · (CST + ALink)
ACATS     Asynchronous  Yes                   Asynchronous Affix Tree implementation  2 · (CST + sampled ALink)
ACATN     Asynchronous  Yes                   Gog et al.’s Affix Tree                 2 · (CST + rminq + rmaxq)
RACATN    Asynchronous  Yes                   Reduced version of ACATN                2 · (CST + rminq)
BidWT     Synchronous   No                    Bidirectional BWT                       2 · (FM-Index)
SCAT      Synchronous   Yes                   Synchronous Affix Tree implementation   2 · (CST)

Table: Compressed Affix Tree approaches studied in this work.

SLIDE 34

Section: Results

SLIDE 35

Construction

[Figure: panels DNA-50MB and ENGLISH-50MB; construction time in milliseconds and space in bytes per character for AfA, ACAT, ACATS, ACATN, RACATN, BidWT, SCAT]

SLIDE 36

Forward-Backward

[Figure: panels DNA-50MB and ENGLISH-50MB; time in microseconds and space in bytes per character for AfA, ACAT, ACATS, ACATN, RACATN, BidWT, SCAT]

SLIDE 37

Suffix-Children

[Figure: panels DNA-50MB and ENGLISH-50MB; time in microseconds and space in bytes per character for AfA, ACAT, ACATS, ACATN, RACATN, BidWT, SCAT]

SLIDE 38

SLink

[Figure: panels DNA-50MB and ENGLISH-50MB; time in microseconds and space in bytes per character for ACAT, ACATS, ACATN, RACATN, SCAT]

SLIDE 39

Section: Conclusions & Future Work

SLIDE 40

Conclusions & Future Work

◮ Asynchronous and synchronous classification
◮ Benchmark for the compressed affix tree approaches
◮ Create a public library containing all the tools
◮ Still missing: pattern search with errors