SLIDE 1

Cache-Oblivious String Dictionaries

Gerth Stølting Brodal

University of Aarhus

Joint work with Rolf Fagerberg

SLIDE 2

Outline of Talk

  • Cache-oblivious model
  • Basic cache-oblivious techniques
  • Cache-oblivious string algorithms
  • Cache-oblivious string dictionaries

– Cache-oblivious tries and blind tries

SLIDE 3

Hierarchical Memory Models

SLIDE 4

Hierarchical Memory

SLIDE 5

I/O Model

Aggarwal and Vitter 1988

$N$ = problem size, $M$ = memory size, $B$ = I/O block size

  • One I/O moves $B$ consecutive records from/to disk

  • Complexity measure = number of I/Os
SLIDE 7

Ideal Cache Model — no parameters!?

Frigo, Leiserson, Prokop, Ramachandran 1999

  • Program with only one memory
  • Analyze in the I/O model for arbitrary $B$ and $M$
  • Optimal off-line cache replacement strategy

Advantages

  • Optimal on an arbitrary level implies optimal on all levels
  • Portability: $B$ and $M$ are not hard-wired into the algorithm
  • Handles dynamically changing $B$ and $M$

SLIDE 8

Cache-Oblivious Preliminaries

SLIDE 10

Cache-Oblivious Scanning

Scanning an array of $N$ elements stored in blocks of size $B$ uses $O(N/B)$ I/Os

Corollary: Cache-oblivious selection requires $O(N/B)$ I/Os

Hoare 1961 / Blum et al. 1973
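The scanning bound can be checked with a small counting sketch. The function names below are illustrative and not from the talk; the code only counts how many blocks a run of consecutive elements overlaps when the scan does not know where block boundaries lie.

```python
def blocks_touched(start: int, n: int, block_size: int) -> int:
    """Distinct blocks covered by the n consecutive elements start .. start+n-1."""
    if n == 0:
        return 0
    first_block = start // block_size
    last_block = (start + n - 1) // block_size
    return last_block - first_block + 1


def scan_bound(n: int, block_size: int) -> int:
    """Worst case over all alignments: ceil(n / B) + 1 blocks."""
    return -(-n // block_size) + 1


if __name__ == "__main__":
    print(blocks_touched(0, 100, 8))   # aligned start: 13 blocks
    print(blocks_touched(7, 100, 8))   # misaligned start: 14 blocks
```

Misalignment costs at most one extra block, so a scan needs no knowledge of $B$ to stay within $O(N/B)$ I/Os.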

SLIDE 11

Cache-Aware B-trees

SLIDE 12

Static Cache-Oblivious B-Tree

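Recursive layout of binary tree (van Emde Boas layout)

[Figure: a binary tree is cut at its middle level into a top tree and several bottom trees; the top tree is stored first, followed by each bottom tree, and every piece is laid out recursively in the same way]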
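A minimal sketch of the recursive layout, assuming a complete binary tree whose nodes are numbered in heap order (root 1, children of v are 2v and 2v+1). The function name and the choice to give the top tree the rounded-down half of the height are assumptions; conventions differ on which half gets the extra level.

```python
def veb_order(root: int = 1, height: int = 4) -> list[int]:
    """Memory order of a complete binary tree under the van Emde Boas layout."""
    if height == 1:
        return [root]
    top_h = height // 2            # height of the top tree
    bottom_h = height - top_h      # height of every bottom tree
    order = veb_order(root, top_h)
    # Roots of the bottom trees are the descendants of `root` at depth top_h.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order += veb_order(r, bottom_h)
    return order


if __name__ == "__main__":
    print(veb_order(1, 4))
```

For height 4 the top tree (1, 2, 3) is stored first, followed by the four bottom trees rooted at 4, 5, 6 and 7.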

SLIDE 17

Static Cache-Oblivious B-Tree

  • Each green tree has height between $(\log_2 B)/2$ and $\log_2 B$
  • Searches visit between $\log_B N$ and $2\log_B N$ green trees, i.e. perform at most $4\log_B N$ I/Os (misalignment)

SLIDE 18

Summary Cache-Oblivious Tools

Scanning: $O(N/B)$

B-tree searching: $O(\log_B N)$

Sorting: $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$

  • requires a tall cache assumption $M \geq B^{1+\varepsilon}$

Frigo, Leiserson, Prokop, Ramachandran 1999

Brodal and Fagerberg 2002, 2003

SLIDE 19

Cache-Oblivious String Algorithms

SLIDE 21

Knuth-Morris-Pratt String Matching

Knuth, Morris, Pratt 1977

  • Time $O(|T|)$ for a text $T$
  • Scans text left-to-right
  • Accesses the pattern (and failure function) like a stack
  • KMP is cache-oblivious and uses $O(|T|/B)$ I/Os
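The access pattern described above can be seen in a standard textbook formulation of Knuth-Morris-Pratt, sketched here; the function names are illustrative. The text index only moves forward, while the position in the pattern grows by one or falls back through the failure function.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Start positions of all occurrences of pattern in text."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    matches, k = [], 0
    for i, c in enumerate(text):            # single left-to-right scan of the text
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]                 # fall back within the pattern
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches


if __name__ == "__main__":
    print(kmp_search("aabacdacabab", "ab"))
```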

SLIDE 22

Suffix Tree/Suffix Array Construction

Farach et al. 2000

[Figure: suffix tree of the string aabacdacabab$]

  • Reduces to sorting, i.e. $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os
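To illustrate how suffix sorting can be driven by a sorting routine, here is a sketch of the classical prefix-doubling method. This is not the algorithm of Farach et al.: prefix doubling needs a logarithmic number of sorting rounds rather than a constant number, and it is shown only because it is short. The function name is an assumption.

```python
def suffix_array(s: str) -> list[int]:
    """Suffix array by prefix doubling: each round is one sort on pairs of ranks."""
    n = len(s)
    rank = [ord(c) for c in s]      # rank by first character
    sa = list(range(n))
    k = 1
    while True:
        # Order suffixes by their first 2k characters using ranks of length-k halves.
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for prev, cur in zip(sa, sa[1:]):
            new_rank[cur] = new_rank[prev] + (key(prev) != key(cur))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:   # all ranks distinct: fully sorted
            return sa
        k *= 2


if __name__ == "__main__":
    text = "aabacdacabab$"
    for i in suffix_array(text):
        print(i, text[i:])
```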

SLIDE 24

String Dictionaries

SLIDE 25

Tries vs Blind Tries

[Figure: a trie and the corresponding blind trie]

Searches take $O(|P|)$ time in internal memory for constant sized alphabets and $O(\log n + |P|)$ time for comparison based alphabets

SLIDE 26

The Trouble Starts...

– Tries cannot be stored cache-aware to support top-down searches in $O(\log_B n + |P|/B)$ I/Os (Demaine et al. 2004)

– Can construct suffix trees cache-obliviously using $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os, but cannot search in them efficiently...

+ Cache-aware string B-trees support searches in a set of strings in $O(\log_B n + |P|/B)$ I/Os (Ferragina and Grossi 1999)

SLIDE 28

String Dictionary

Queries: Search blind trie + Verify one string
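The query scheme "search blind trie + verify one string" can be sketched in internal memory as follows. This is an illustrative toy, not the cache-oblivious structure from the talk: the tuple representation, the function names, and the use of a "\0" terminator are all assumptions. A blind trie node stores only its branching position and the branching characters, so the descent may skip characters of the query and one stored string has to be compared against the query afterwards.

```python
TERMINATOR = "\0"  # assumed not to occur in any stored string or query


def lcp(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def build(keys, lo, hi):
    """Blind trie over sorted keys[lo:hi]: leaf = index, internal = (depth, children)."""
    if hi - lo == 1:
        return lo
    depth = lcp(keys[lo], keys[hi - 1])        # branching position of this node
    children, start = {}, lo
    for i in range(lo + 1, hi + 1):
        if i == hi or keys[i][depth] != keys[start][depth]:
            children[keys[start][depth]] = build(keys, start, i)
            start = i
    return (depth, children)


def leaves(node):
    if isinstance(node, int):
        yield node
    else:
        for child in node[1].values():
            yield from leaves(child)


def prefix_search(keys, trie, pattern):
    node = trie
    while not isinstance(node, int) and node[0] < len(pattern):
        node = node[1].get(pattern[node[0]])   # only branching characters are inspected
        if node is None:
            return []
    hits = list(leaves(node))
    # The descent skipped characters, so one stored string must be verified.
    if not keys[hits[0]].startswith(pattern):
        return []
    return [keys[i][:-1] for i in hits]


def make_dictionary(strings):
    keys = sorted(s + TERMINATOR for s in set(strings))   # non-empty input assumed
    return keys, build(keys, 0, len(keys))


if __name__ == "__main__":
    keys, trie = make_dictionary(["banana", "band", "bandana", "can", "candy"])
    print(prefix_search(keys, trie, "ban"))   # all three strings below the "ban" node
    print(prefix_search(keys, trie, "bax"))   # blind descent succeeds, verification fails
```

The query "bax" reaches the same node as "ban" because the trie never looks at the third character; only the verification step tells the two apart.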

SLIDE 30

Suffix Tree

Queries: Search blind trie + Verify one suffix

SLIDE 32

Tries

Queries: Search blind trie + Verify prefix of one path

SLIDE 33

Verifying a Prefix of a Path in a Tree

SLIDE 35

Verifying Paths in Giraffe Trees is Easy

Definition: A tree is a giraffe tree if all root-to-leaf paths share at least half of the nodes of the tree (long neck)

  • A prefix of length $k$ of a path in a giraffe tree using a BFS layout can be traversed in $O(k/B)$ I/Os
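The definition can be turned into a direct check. The sketch below assumes a tree given as a dictionary from node to an ordered list of children (a representation chosen here for illustration); the "neck" is the chain of nodes from the root down to the first node that is a leaf or has more than one child, since exactly those nodes lie on every root-to-leaf path.

```python
def neck_length(children: dict, root) -> int:
    """Number of nodes that lie on every root-to-leaf path."""
    count, node = 1, root
    while len(children.get(node, [])) == 1:
        node = children[node][0]
        count += 1
    return count


def tree_size(children: dict, root) -> int:
    stack, size = [root], 0
    while stack:
        node = stack.pop()
        size += 1
        stack.extend(children.get(node, []))
    return size


def is_giraffe(children: dict, root) -> bool:
    """True if the shared neck holds at least half of all nodes."""
    return 2 * neck_length(children, root) >= tree_size(children, root)


if __name__ == "__main__":
    long_neck = {0: [1], 1: [2], 2: [3], 3: [4, 5]}
    bushy = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
    print(is_giraffe(long_neck, 0), is_giraffe(bushy, 0))
```

In a BFS layout the neck occupies a contiguous prefix of memory, and the rest of the tree is no larger than the neck, which is what keeps a path traversal close to a scan.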

SLIDE 37

Giraffe Cover of a Tree

  • Uses space $O(N)$ and can be constructed greedily from left-to-right using $O(N/B)$ I/Os by an Euler traversal of $T$
  • BFS layout of each giraffe
  • A prefix of length $k$ of a path in a known giraffe can be traversed in $O(k/B)$ I/Os
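One plausible reading of the greedy left-to-right construction is sketched below; the exact rule used by the authors is an assumption here, as are the function names and the dictionary-of-children tree representation. Leaves are visited in left-to-right order, and the current giraffe (the union of the root-to-leaf paths of a run of consecutive leaves) is extended as long as its shared neck still holds at least half of its nodes. The sketch works in internal memory and does not model I/Os.

```python
def leaf_paths(children: dict, root) -> list[list]:
    """Root-to-leaf paths in left-to-right (DFS) order."""
    paths, stack = [], [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths.append(path)
        for kid in reversed(kids):      # leftmost child is popped first
            stack.append((kid, path + [kid]))
    return paths


def common_prefix_len(p: list, q: list) -> int:
    n = 0
    while n < len(p) and n < len(q) and p[n] == q[n]:
        n += 1
    return n


def giraffe_cover(children: dict, root) -> list[list[list]]:
    """Greedy cover: each group of consecutive leaf paths forms one giraffe tree."""
    paths = leaf_paths(children, root)
    cover, group, size = [], [paths[0]], len(paths[0])
    for path in paths[1:]:
        # Nodes not yet in the union are those below the LCA with the previous leaf.
        new_size = size + len(path) - common_prefix_len(group[-1], path)
        # Nodes shared by all paths of the group: common prefix of first and new path.
        neck = common_prefix_len(group[0], path)
        if 2 * neck >= new_size:
            group.append(path)
            size = new_size
        else:
            cover.append(group)
            group, size = [path], len(path)
    cover.append(group)
    return cover


if __name__ == "__main__":
    tree = {0: [1], 1: [2], 2: [3, 4], 3: [5, 6], 4: [7]}
    for g in giraffe_cover(tree, 0):
        print([p[-1] for p in g])       # leaves covered by each giraffe
```

Every root-to-leaf path ends up in exactly one giraffe, so a query that knows its giraffe can follow its path inside that giraffe alone.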

SLIDE 38

Summary so far...

String dictionary search, suffix tree search and trie search all reduce to blind trie search

Query: Blind trie search + $O(1 + |P|/B)$ I/Os

SLIDE 39

Cache-Oblivious (Blind) Tries

SLIDE 42

Cache-Oblivious (Blind) Tries

  • Partition input trie $T$ into components (generalization of heavy paths)

$T'$ = collapse components in $T$ into high degree nodes and replace them by weight balanced trees

  • Apply van Emde Boas layout to $T'$

Search: $O(\log_B n)$ I/Os (ignoring searching inside components)
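The components on this slide are described as a generalization of heavy paths. For orientation, the sketch below shows only the classical heavy-path decomposition, not the component construction of the talk; the function name and the dictionary-of-children representation are assumptions. Each node passes its path on to the child with the largest subtree, and every other child starts a new path.

```python
def heavy_paths(children: dict, root) -> list[list]:
    """Partition the nodes of a rooted tree into heavy paths."""
    # Subtree sizes: visit parents before children, then accumulate in reverse.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    size = {}
    for node in reversed(order):
        size[node] = 1 + sum(size[c] for c in children.get(node, []))

    paths, heads = [], [root]
    while heads:
        node = heads.pop()
        path = [node]
        while children.get(node):
            kids = children[node]
            heavy = max(kids, key=lambda c: size[c])     # child with largest subtree
            heads.extend(c for c in kids if c != heavy)  # light children start new paths
            node = heavy
            path.append(node)
        paths.append(path)
    return paths


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 3: [5]}
    print(heavy_paths(tree, 0))
```

Any root-to-leaf path crosses only logarithmically many heavy paths, because each step onto a light child at least halves the size of the remaining subtree.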

SLIDE 43

Decomposition into Components

[Figure: decomposition of an example trie into components]

SLIDE 44

Storing and Searching Components

  • Store each layer of a component separately
  • Make a giraffe decomposition of each layer
  • For each layer, have a blind trie of size [...] (using BFS layout) to select the right giraffe tree
  • Search in a layer: search the blind trie + search in one giraffe tree
  • Distribute the layers in the van Emde Boas layout of $T'$
  • Analysis:

– Search in the blind trie for layer $i+1$ is dominated by the matched characters in layer $i$

– Space in the van Emde Boas layout for a subtree of size [...] becomes [...]

SLIDE 45

Cache-Oblivious Tries

There exists a cache-oblivious trie supporting prefix queries in $O(\log_B n + |P|/B)$ I/Os, where $P$ is the query string and $n$ is the number of leaves in the trie. It can be constructed in $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os, where $N$ is the total number of characters in the input. The space required is $O(N)$. The structure assumes $M \geq B^{2+\delta}$.

SLIDE 46

Conclusion

  • A string dictionary (trie data structure) was presented that supports queries in $O(\log_B n + |P|/B)$ I/Os. The data structure uses $O(N)$ space and can be constructed using $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os.

  • Lookahead in the query string is crucial (both cache-aware and cache-oblivious)

  • A giraffe cover is a simple construction allowing top-down path traversals in a tree using $O(|P|/B)$ I/Os

SLIDE 47

Open problems

  • Prove a lower bound trade-off between the number of I/Os required for a query and the lookahead used

  • Implementation: compare with string B-trees, tries, ternary trees, different trie layouts, ...

SLIDE 48

The End
