Tries: Data Structures and Algorithms for CL III, WS 2019-2020


slide-1
SLIDE 1

Corina Dima corina.dima@uni-tuebingen.de

Department of General and Computational Linguistics

Data Structures and Algorithms for CL III, WS 2019-2020

Tries

slide-2
SLIDE 2

Tries | 2

Data Structures & Algorithms in Python

MICHAEL GOODRICH ROBERTO TAMASSIA MICHAEL GOLDWASSER

13.5 Tries

  • Standard Tries
  • Compressed Tries
  • Suffix Tries

slide-3
SLIDE 3

Standard Tries

slide-4
SLIDE 4

Standard Tries

  • A trie (pronounced "try") is a tree-based data structure for storing strings in order to support fast pattern matching
  • Main application: information retrieval
  • Primary query operations supported by tries: pattern matching and prefix matching
  • The approach suits applications where a series of queries is performed on a fixed text, so that the initial cost of preprocessing the text is compensated by a speedup in each subsequent query
  • Example:
      • A website that offers pattern matching in the works of Shakespeare
      • The text is large, immutable and searched often
  • Trie: a compact data structure for representing a set of strings, e.g. all the words in a text
  • Supports pattern-matching queries in time proportional to the pattern size

slide-5
SLIDE 5

Standard Tries – Formal Definition

  • Let S be a set of s strings from an alphabet Σ such that no string in S is a prefix of another string
  • A standard trie for the set of strings S is an ordered tree T such that:
      • Each node of T, except the root, is labeled with a character from Σ
      • The children of an internal node of T have distinct labels and are alphabetically ordered
      • T has s leaves, each associated with a string of S, such that the concatenation of the labels of the nodes on the path from the root to a leaf v of T yields the string of S associated with v

slide-6
SLIDE 6

Standard Tries - Example

  • Standard trie for the set of strings S = {bear, bell, bid, bull, buy, sell, stock, stop}

[Figure: the standard trie for S. The root has the children b and s; the eight leaves spell bear, bell, bid, bull, buy, sell, stock and stop.]

slide-7
SLIDE 7

Standard Tries - Properties

  • An internal node can have anywhere between 1 and |Σ| children
  • In practice the average degree of internal nodes is small
  • On larger datasets, the average degree of nodes decreases with the depth of the tree (fewer strings share a common prefix)
  • In many languages there are character combinations that are unlikely to occur
  • There is an edge connecting the root node to a child node for every character from Σ that is the first character of a string from S
  • A path connecting the root node to an internal node v at depth d corresponds to a d-character prefix X[0:d] of a string X of S
  • A trie stores each common prefix of a set of strings only once

slide-8
SLIDE 8

Standard Tries – Properties (cont’d)

  • The following properties hold for a standard trie T storing a collection S of s strings of total length n from an alphabet Σ:
      • The height of T is equal to the length of the longest string in S
      • Every internal node of T has at most |Σ| children
      • T has s leaves
      • The number of nodes of T is at most n + 1
      • Worst case: no two strings share a common, non-empty prefix, i.e. except for the root, all internal nodes have only one child
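The properties can be checked on the example set from the earlier slide. The following is a minimal sketch, assuming a nested-dict representation (each dict maps a child's label to that child's subtree); the helper names are illustrative and not taken from the slides or the textbook.

```python
def build_trie(words):
    """Build a standard trie as nested dicts: label -> subtree."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root

def height(node):
    """Length of the longest root-to-leaf path."""
    return 0 if not node else 1 + max(height(c) for c in node.values())

def count_leaves(node):
    """A node without children is a leaf."""
    return 1 if not node else sum(count_leaves(c) for c in node.values())

def count_nodes(node):
    """Count this node plus all nodes below it."""
    return 1 + sum(count_nodes(c) for c in node.values())

S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
T = build_trie(S)
n = sum(len(w) for w in S)   # total length of the strings: 31

print(height(T))             # 5, the length of "stock"
print(count_leaves(T))       # 8, one leaf per string
print(count_nodes(T), n + 1) # 22 nodes, bound n + 1 = 32
```

Because the strings share prefixes such as "be" and "sto", the node count stays well below the bound n + 1.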

slide-9
SLIDE 9

Trie Application: Map with String Keys

  • A search in a trie T for a string X is performed by tracing down from the root the path indicated by the characters of X
      • If the path can be traced and terminates in a leaf node, X is a key in the map
      • If the path cannot be traced, or it can be traced but terminates at an internal node, X is not a key in the map

[Figure: the standard trie for S = {bear, bell, bid, bull, buy, sell, stock, stop}, with each leaf marked with its key. Example searches: bear (a key: the path ends at a leaf), big (not a key: the path cannot be traced past "bi"), be (not a key: the path ends at an internal node).]
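The search rule can be sketched in a few lines. This is an illustrative implementation only: the `TrieNode` class, the `insert` helper and the integer values stored at the leaves are assumptions made for the example, not part of the slides.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # label (one character) -> TrieNode
        self.value = None    # value of the map entry, set at leaves

def insert(root, key, value):
    """Trace the path for key, creating missing nodes, and store value."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.value = value

def search(root, key):
    """Return (True, value) if key is in the map, else (False, None)."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:       # the path cannot be traced
            return False, None
    if node.children:          # the path ends at an internal node
        return False, None
    return True, node.value    # the path ends at a leaf

root = TrieNode()
words = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
for i, w in enumerate(words):
    insert(root, w, i)

print(search(root, "bear"))  # (True, 0)
print(search(root, "big"))   # (False, None)
print(search(root, "be"))    # (False, None)
```

The three queries correspond to the three outcomes: a path ending at a leaf, a path that cannot be traced, and a path ending at an internal node.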

slide-10
SLIDE 10

Trie Application: Map with String Keys (cont’d)

  • Running time for searching for a string X of length m:
      • At most m + 1 nodes of T are visited (the root plus one node per character)
      • At each node we spend at most O(|Σ|) time determining which edge to follow next, that is, finding the child node that has the next character as its label
      • O(|Σ|) is achievable even if the children are unordered, since each node has at most |Σ| children
  • Time can be improved by mapping characters to children, using at each node:
      • a secondary search table: O(log |Σ|)
      • a hash table: O(1)
      • a direct lookup table of size |Σ|, if |Σ| is small enough: O(1)
  • Typically, the search for a string of length m runs in O(m) time
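The four ways of finding the child for the next character can be compared side by side. The sketch below is illustrative only: the function names, the string placeholders standing in for child nodes, and the restriction of the direct table to the letters a to z are assumptions for the example.

```python
from bisect import bisect_left

def find_linear(children, ch):
    """children: unordered list of (label, node) pairs; O(|Σ|) per lookup."""
    for label, node in children:
        if label == ch:
            return node
    return None

def find_sorted(labels, nodes, ch):
    """labels: sorted list, nodes: parallel list; binary search, O(log |Σ|)."""
    i = bisect_left(labels, ch)
    if i != len(labels) and labels[i] == ch:
        return nodes[i]
    return None

def find_hashed(children, ch):
    """children: dict label -> node; expected O(1)."""
    return children.get(ch)

def find_direct(table, ch):
    """table: list with one slot per letter a-z; O(1)."""
    return table[ord(ch) - ord("a")]

# Children of the node "b" in the example trie: e, i, u (stored unordered).
kids = [("u", "node-bu"), ("e", "node-be"), ("i", "node-bi")]
by_label = dict(kids)
labels = sorted(by_label)
nodes = [by_label[lab] for lab in labels]
table = [None] * 26
for lab, nd in kids:
    table[ord(lab) - ord("a")] = nd

print(find_linear(kids, "i"), find_sorted(labels, nodes, "i"),
      find_hashed(by_label, "i"), find_direct(table, "i"))
```

The direct table trades space for speed: every node reserves |Σ| slots even when it has only a few children.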

slide-11
SLIDE 11

Word Matching with a Trie

[Figure: a text with its character positions (0 to 88) and the trie of its words; each leaf stores the positions at which the word starts.]

Text: see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!

Start positions stored at the leaves:

  • bear: 6
  • bell: 78
  • bid: 47, 58
  • bull: 30
  • buy: 36
  • hear: 69
  • see: 0, 24
  • sell: 12
  • stock: 17, 40, 51, 62
  • stop: 84
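Word matching with positions can be sketched as follows. This is a minimal illustration, not the implementation behind the figure: the functions `build_word_index` and `occurrences`, the use of a regular expression to split the text, and the "$" key that holds the position list are assumptions. The sketch indexes every word of the text.

```python
import re

def build_word_index(text):
    """Trie of the words in text; the key "$" holds the start positions."""
    root = {}
    for m in re.finditer(r"[a-z]+", text):
        node = root
        for ch in m.group():
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(m.start())
    return root

def occurrences(root, word):
    """Start positions of word in the text, or [] if it does not occur."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return []
    return node.get("$", [])

text = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")
index = build_word_index(text)

print(occurrences(index, "stock"))  # [17, 40, 51, 62]
print(occurrences(index, "see"))    # [0, 24]
print(occurrences(index, "big"))    # []
```

Each query costs time proportional to the length of the word, independent of the length of the text.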

slide-35
SLIDE 35

Standard Trie construction – Running Time

  • Construct a standard trie T for a set S of strings of total length n over the alphabet Σ
  • Incremental algorithm
      • Insert the strings one at a time
      • Assumption: no string of S is a prefix of another string
      • If some strings are prefixes of others, append a special character $ ∉ Σ to the end of each string
      • To insert a string X of length m into T: trace the path associated with X in T, creating new nodes when the path cannot be continued
      • Running time for each insertion is similar to search: O(m·|Σ|) worst-case performance, O(m) when using a secondary hash table at each node
  • Total expected running time: O(n), where n is the total length of the strings in S
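The incremental algorithm can be sketched with nested dicts, which play the role of the hash table at each node. The function name `insert`, its `end` parameter and the returned count of new nodes are illustrative assumptions; the terminator "$" follows the slide.

```python
def insert(root, word, end="$"):
    """Insert word + end into the trie; return the number of new nodes."""
    created = 0
    node = root
    for ch in word + end:
        if ch not in node:   # stuck: the path cannot be traced further
            node[ch] = {}
            created += 1
        node = node[ch]
    return created

root = {}
# "be" is a prefix of "bear" and "bell", so the terminator is needed.
for w in ["be", "bear", "bell"]:
    print(w, insert(root, w))   # each insertion creates 3 new nodes

print(insert(root, "be"))       # 0: the whole path already exists
```

With the terminator, "be" ends at its own leaf (the "$" child of the node for "be") even though longer strings pass through the same node.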

slide-36
SLIDE 36

Better Tries?

  • Standard tries have a potential space inefficiency
  • There can be many nodes with only one child, which is wasteful
  • Solution
  • compressed tries (aka Patricia tries)

[Figure: a standard trie in which many internal nodes have only one child.]

slide-37
SLIDE 37

Compressed Tries

slide-38
SLIDE 38

Compressed Tries

  • Similar to a standard trie, but each internal node must have at least two children
  • The rule is enforced by compressing chains of single-child nodes into individual edges
  • Consider a standard trie T; an internal node v of T is redundant if v has only one child and is not the root
  • A chain of k ≥ 2 edges (v_0, v_1)(v_1, v_2) … (v_{k−1}, v_k) is redundant if:
      • v_i is redundant for i = 1, …, k − 1
      • v_0 and v_k are not redundant
  • A standard trie can be transformed into a compressed trie by replacing each redundant chain of k ≥ 2 edges (v_0, v_1)(v_1, v_2) … (v_{k−1}, v_k) with a single edge (v_0, v_k) and relabeling v_k with the concatenation of the labels of the nodes v_1, …, v_k
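The transformation can be sketched on a nested-dict trie (label mapped to subtree). The helper names `build_trie` and `compress` are assumptions for this example; the code merges each chain of single-child nodes into one node whose label is the concatenation of the chain's labels, and it never merges the root.

```python
def build_trie(words):
    """Build a standard trie as nested dicts: label -> subtree."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root

def compress(node):
    """Return a copy of the trie in which every redundant chain is merged."""
    result = {}
    for label, child in node.items():
        # While the child is redundant (exactly one child), absorb the
        # label of its only child and continue from there.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        result[label] = compress(child)
    return result

S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = compress(build_trie(S))
print(compressed)
```

For the example set, the chains i-d, e-l-l and t-o become the single labels "id", "ell" and "to".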

slide-39
SLIDE 39

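Standard Trie → Compressed Trie Example

[Figure: the standard trie for {bear, bell, bid, bull, buy, sell, stock, stop} and the compressed trie obtained from it.]

Node labels of the compressed trie:

  root
    b
      e
        ar
        ll
      id
      u
        ll
        y
    s
      ell
      to
        ck
        p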

slide-40
SLIDE 40

Properties of Compressed Tries

  • In a compressed trie T storing a collection S of s strings from an alphabet Σ:
      • Every internal node of T has at least two children and at most |Σ| children
      • T has s leaves
      • The number of nodes in T is O(s)
  • Recall that in a standard trie the number of nodes is at most n + 1, where n is the total length of the strings
      • Worst case: no two strings share a common, non-empty prefix, i.e. except for the root, all internal nodes have only one child
  • In a compressed trie, this worst case means that every chain from the root is redundant and is therefore compressed into a single edge, leading to O(s) nodes

slide-41
SLIDE 41

Compact Representation

  • A compressed trie stores the node labels, which can be long (and redundant)
  • Does the compression of the paths really provide an advantage?
  • Yes, when the compressed trie is used as an auxiliary index structure that stores not the actual characters, but only indices into an already stored collection of strings

slide-42
SLIDE 42

Compact Representation - Example

[Figure: the collection S stored as an array of strings, with the character positions 0 to 4 marked above each string.]

  S[0] = see
  S[1] = bear
  S[2] = sell
  S[3] = stock
  S[4] = bull
  S[5] = buy
  S[6] = bid
  S[7] = hear
  S[8] = bell
  S[9] = stop

slide-43
SLIDE 43

Compact Representation – Example (cont’d)

[Figure: the standard trie for the strings of S is turned into a compressed trie by removing redundant chains and relabeling the remaining nodes. Labels of the compressed trie: b, e, ar, ll, id, u, ll, y, hear, s, e, e, ll, to, ck, p.]

slide-44
SLIDE 44

Compact Representation – Example (cont’d)

[Figure: the compressed trie with each label replaced by a triple (i, j, k), which stands for the substring S[i][j..k], both ends inclusive. The array is S[0..9] = see, bear, sell, stock, bull, buy, bid, hear, bell, stop.]

  root
    b = (1, 0, 0)
      e = (1, 1, 1)
        ar = (1, 2, 3)
        ll = (8, 2, 3)
      id = (6, 1, 2)
      u = (4, 1, 1)
        ll = (4, 2, 3)
        y = (5, 2, 2)
    hear = (7, 0, 3)
    s = (0, 0, 0)
      e = (0, 1, 1)
        e = (0, 2, 2)
        ll = (2, 2, 3)
      to = (3, 1, 2)
        ck = (3, 3, 4)
        p = (9, 3, 3)
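Decoding a triple is a single slice into the stored array. The sketch below assumes the array order shown in the example (S[0] = see through S[9] = stop); the function name `label` and the sample path are illustrative.

```python
S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]

def label(i, j, k):
    """Decode the triple (i, j, k) into the substring S[i][j..k], inclusive."""
    return S[i][j:k + 1]

# Triples on the path from the root to the leaf for "stock": s, to, ck.
path_to_stock = [(0, 0, 0), (3, 1, 2), (3, 3, 4)]
print("".join(label(*t) for t in path_to_stock))  # stock

print(label(7, 0, 3))  # hear
print(label(8, 2, 3))  # ll
```

Each node then needs only three integers, regardless of how long its label is.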

slide-45
SLIDE 45

Compact representation gains

  • Using the indexing method, the total space required for storing the trie is reduced from O(n), where n is the total length of the strings in S, to O(s), where s is the number of strings in S
  • The strings themselves must be stored in an additional structure, but we reduce the space required for the trie, which is what we need for searching
  • Searching in a compressed trie is not necessarily faster than in a standard trie
      • Every character of the searched string must still be compared with the character labels while traversing the trie
      • Labels can be multi-character

slide-46
SLIDE 46

Tree Traversal

  • To compress the nodes, one needs to traverse the trie (which is a tree)
  • A traversal visits the nodes of a tree in a systematic manner
  • Types of traversals for n-ary trees:
      • preorder traversal
      • postorder traversal

slide-47
SLIDE 47

Preorder Traversal

  • In a preorder traversal, a node is visited before its descendants

Algorithm preOrder(v)
    visit(v)
    for each child w of v
        preOrder(w)

[Figure: the compressed trie with its 17 nodes numbered in preorder: root, b, e, ar, ll, id, u, ll, y, hear, s, e, e, ll, to, ck, p.]

slide-48
SLIDE 48

Postorder Traversal

  • In a postorder traversal, a node is visited after its descendants

Algorithm postOrder(v)
    for each child w of v
        postOrder(w)
    visit(v)

[Figure: the compressed trie with its 17 nodes numbered in postorder: ar, ll, e, id, ll, y, u, b, hear, e, ll, e, ck, p, to, s, root.]
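Both traversals can be written directly from the pseudocode. In this sketch the compressed trie is assumed to be a nested dict (label mapped to subtree) whose insertion order matches the child order in the figure; the function names and the "root" label are illustrative.

```python
def preorder(label, children, visit):
    """Visit a node before its descendants."""
    visit(label)
    for child_label, grandchildren in children.items():
        preorder(child_label, grandchildren, visit)

def postorder(label, children, visit):
    """Visit a node after its descendants."""
    for child_label, grandchildren in children.items():
        postorder(child_label, grandchildren, visit)
    visit(label)

trie = {
    "b": {"e": {"ar": {}, "ll": {}}, "id": {}, "u": {"ll": {}, "y": {}}},
    "hear": {},
    "s": {"e": {"e": {}, "ll": {}}, "to": {"ck": {}, "p": {}}},
}

pre, post = [], []
preorder("root", trie, pre.append)
postorder("root", trie, post.append)
print(" ".join(pre))
print(" ".join(post))
```

The root comes first in the preorder output and last in the postorder output, matching the numbers 1 and 17 in the two figures.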

slide-49
SLIDE 49

Suffix Tries

slide-50
SLIDE 50

Suffix Tries

  • The strings in the collection S are all the suffixes of a string X
  • Called a suffix trie, suffix tree or position tree
  • Example: X = pet on carpet
  • S was supposed to be a set of s strings from an alphabet Σ such that no string in S is a prefix of another string
  • Add an artificial character $, which is not part of Σ, at the end of each suffix
  • For X of length n, we build a trie using the set of n strings X[i:n], i = 0, …, n − 1

Collection S used to build the suffix trie (quotes make the leading spaces visible):

  "pet on carpet$"
  "et on carpet$"
  "t on carpet$"
  " on carpet$"
  "on carpet$"
  "n carpet$"
  " carpet$"
  "carpet$"
  "arpet$"
  "rpet$"
  "pet$"
  "et$"
  "t$"
  "$"
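A suffix trie built this way answers substring queries, because every substring of X is a prefix of some suffix of X. The sketch below uses the simple insert-every-suffix procedure with nested dicts; the function names `build_suffix_trie` and `contains` are assumptions for the example.

```python
def build_suffix_trie(text, end="$"):
    """Insert every suffix of text + end into a nested-dict trie."""
    x = text + end
    root = {}
    for i in range(len(x)):      # one insertion per suffix x[i:]
        node = root
        for ch in x[i:]:
            node = node.setdefault(ch, {})
    return root

def contains(root, pattern):
    """True if pattern occurs in the text: it must be a prefix of a suffix."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

def count_leaves(node):
    return 1 if not node else sum(count_leaves(c) for c in node.values())

T = build_suffix_trie("pet on carpet")
print(contains(T, "on car"))  # True
print(contains(T, "cat"))     # False
print(count_leaves(T))        # 14, one leaf per suffix
```

A query takes time proportional to the pattern length, no matter how long the text is.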

slide-51
SLIDE 51

Suffix Trie Construction

[Figure sequence, slides 51 to 64: the suffix trie for "pet on carpet$" is built step by step; each slide adds the path for one more suffix, ending in a leaf labeled $.]

slide-65
SLIDE 65

Suffix Trie: Compress

[Figure sequence, slides 65 to 76: the redundant chains of the suffix trie are compressed one at a time, yielding multi-character labels such as "pet", "et", "carpet$", "arpet$", "rpet$", "n carpet$", "on carpet$" and " on carpet$".]

slide-77
SLIDE 77

Suffix Trie: Index

  • Many substrings appear over and over again, so it is impractical to store each label directly
  • Instead, store the string once in an array and label each node only with indices into that array

[Figure: the compressed suffix trie for "pet on carpet$" with its full-text labels.]

slide-78
SLIDE 78

Suffix Trie: Index (cont’d)

[Figure: the compressed suffix trie shown together with the string X = "pet on carpet$" stored in an array, with the character positions 0 to 13 (␣ marks a space).]

  position:   0  1  2  3  4  5  6  7  8  9 10 11 12 13
  character:  p  e  t  ␣  o  n  ␣  c  a  r  p  e  t  $

slide-79
SLIDE 79

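Suffix Trie: Index (cont’d)

[Figure: the compressed suffix trie with each label replaced by an index pair i:j, which stands for the substring X[i..j] of X = "pet on carpet$", both ends inclusive.]

Index pairs appearing in the figure:

  • 0:2 = "pet"
  • 1:2 = "et"
  • 12:12 = "t"
  • 3:3 = " "
  • 13:13 = "$"
  • 3:13 = " on carpet$"
  • 4:13 = "on carpet$"
  • 5:13 = "n carpet$"
  • 7:13 = "carpet$"
  • 8:13 = "arpet$"
  • 9:13 = "rpet$"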

slide-80
SLIDE 80

Suffix Tries: Properties

  • A suffix trie saves space compared to a standard trie because it uses compression
  • The total length of all the suffixes of a string X of length n is n + (n − 1) + (n − 2) + … + 2 + 1 = n(n + 1)/2
  • A standard trie uses O(n²) space to store all the suffixes of X
  • A suffix trie (which is compressed) uses only O(n) space
  • A suffix trie can be constructed in O(|Σ|·n²) time using the simple procedure described, because the total length of the suffixes is quadratic in n
  • For production use, there are linear-time, O(n), algorithms for constructing suffix tries:
      • P. Weiner. 1973. Linear Pattern Matching Algorithms.
      • E. Ukkonen. 1995. On-line Construction of Suffix Trees.
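The quadratic total length can be checked numerically for the running example. This is a small sketch with illustrative helper names; it counts the nodes of the uncompressed trie by counting distinct prefixes of the suffixes, since each trie node corresponds to exactly one such prefix (the empty prefix is the root).

```python
def suffixes(x):
    """All suffixes x[i:] for i = 0 .. len(x) - 1."""
    return [x[i:] for i in range(len(x))]

def count_standard_trie_nodes(words):
    """Nodes of the standard trie = distinct prefixes, including the empty one."""
    return len({w[:k] for w in words for k in range(len(w) + 1)})

X = "pet on carpet$"
n = len(X)                                 # 14, including the terminator
total = sum(len(s) for s in suffixes(X))   # n + (n - 1) + ... + 1

print(total, n * (n + 1) // 2)             # 105 105
print(count_standard_trie_nodes(suffixes(X)))
```

The node count of the uncompressed trie stays below the bound total + 1 only because some suffixes share prefixes, such as "pet" and "et".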

slide-81
SLIDE 81

Thank you.