An Introduction to Tries, Kevin Leckey, Monash University, 21.09.2015
Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19
Introduction CS Background
Given: Words, e.g. in binary code: $\Xi_1 = 11010\ldots$, $\Xi_2 = 00011\ldots$, $\Xi_3 = 01101\ldots$, $\Xi_4 = 00000\ldots$, $\Xi_5 = 11111\ldots$, $\Xi_6 = 11100\ldots$
Task: Storage that allows fast search and insert/delete operations
→ Use tree-like data structures such as a Trie (the name comes from information retrieval)
Constructing a Trie
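$\Xi_1 = 11010\ldots$, $\Xi_2 = 00011\ldots$, $\Xi_3 = 01101\ldots$, $\Xi_4 = 00000\ldots$, $\Xi_5 = 11111\ldots$, $\Xi_6 = 11100\ldots$
[Figure: binary trie storing the six words; leaves from left to right: $\Xi_4$, $\Xi_2$, $\Xi_3$, $\Xi_1$, $\Xi_6$, $\Xi_5$]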
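One way such a trie could be built is by inserting the words one at a time and pushing a stored word down whenever a new word reaches its leaf. The sketch below is only an illustration under stated assumptions: words are Python strings over '0'/'1', truncated to five letters, pairwise distinct and long enough to be separated; the names `Node`, `insert`, `build` and `leaf_depths` are hypothetical and not taken from the slides.

```python
class Node:
    """Trie node: an internal node has children, a leaf stores exactly one word."""

    def __init__(self):
        self.children = {}  # maps a letter '0' / '1' to a child Node
        self.word = None    # set only on leaves


def insert(root, word):
    """Insert `word` (assumed distinct from all stored words) into the trie."""
    node, depth = root, 0
    while True:
        if node.word is None and not node.children:
            node.word = word  # free slot: the word becomes a leaf here
            return
        if node.word is not None:
            # Occupied leaf: push the stored word one level down.
            old, node.word = node.word, None
            pushed = Node()
            pushed.word = old
            node.children[old[depth]] = pushed
        # Internal node: follow (or create) the branch of the next letter.
        node = node.children.setdefault(word[depth], Node())
        depth += 1


def build(words):
    """Build a trie by inserting the words in the given order."""
    root = Node()
    for w in words:
        insert(root, w)
    return root


def leaf_depths(node, depth=0):
    """Return a dict mapping each stored word to the depth of its leaf."""
    if node.word is not None:
        return {node.word: depth}
    depths = {}
    for child in node.children.values():
        depths.update(leaf_depths(child, depth + 1))
    return depths


# The six example words, truncated to five letters (an assumption of this sketch).
WORDS = ["11010", "00011", "01101", "00000", "11111", "11100"]
print(leaf_depths(build(WORDS)))
```

The shape of the resulting tree does not depend on the insertion order, only on the set of words.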
Searching
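Search for $\Xi_1 = 11010\ldots$
[Figure: the same trie; leaves from left to right: $\Xi_4$, $\Xi_2$, $\Xi_3$, $\Xi_1$, $\Xi_6$, $\Xi_5$]
Searching cost = depth of $\Xi_1$ = length of the shortest prefix of $\Xi_1$ not shared by $\Xi_2, \ldots, \Xi_6$ = 3
Worst case = height of the Trie = 4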
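The identity "depth = length of the shortest unique prefix" can be checked without building the tree at all. The following sketch assumes the words are distinct '0'/'1' strings truncated to five letters; `common_prefix_len`, `depth` and `height` are hypothetical helper names.

```python
def common_prefix_len(a, b):
    """Number of leading letters on which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def depth(word, words):
    """Length of the shortest prefix of `word` shared by no other word (needs >= 2 words)."""
    others = [w for w in words if w != word]
    return 1 + max(common_prefix_len(word, w) for w in others)


def height(words):
    """Worst-case search cost: the largest depth over all stored words."""
    return max(depth(w, words) for w in words)


WORDS = ["11010", "00011", "01101", "00000", "11111", "11100"]
print(depth("11010", WORDS), height(WORDS))
```

For the six example words this reproduces the values 3 and 4 stated on the slide.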
Introduction The Probabilistic Model
Input Model
Generate the words $\Xi_1, \Xi_2, \ldots$ to be stored → Probabilistic Model
The words $\Xi_1, \Xi_2, \ldots$ are independent and identically distributed
Each word $\Xi_i = \xi_1\xi_2\xi_3\xi_4\ldots$ consists of letters $\xi_1, \xi_2, \ldots$ that are:
- independent,
- uniform on $\{0, 1\}$: $P(\xi_j = 0) = 1/2 = P(\xi_j = 1)$
More general models allow $\xi_1, \xi_2, \ldots$ to be dependent (e.g. evolving as a Markov chain)
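A minimal sketch of how input from this model could be sampled. The slides work with infinite words; cutting them off after `length` letters is an assumption of the sketch, as are the names `random_word` and `random_words`.

```python
import random


def random_word(length, rng):
    """One word of `length` independent fair letters from {0, 1}."""
    return "".join(rng.choice("01") for _ in range(length))


def random_words(n, length, seed=0):
    """n independent, identically distributed words; `seed` makes the run repeatable."""
    rng = random.Random(seed)
    return [random_word(length, rng) for _ in range(n)]


print(random_words(3, 8))
```

If the truncation length is much larger than $2\log_2(n)$, two sampled words coincide only with very small probability, so the truncated words still determine the trie.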
Introduction The resulting random Tree
A recursive construction of the Trie
[Figure: animation inserting $\Xi_1, \ldots, \Xi_5$ one at a time; words that meet in the same node are pushed down and split by their next letter. Final leaves from left to right: $\Xi_2$, $\Xi_1$, $\Xi_4$, $\Xi_3$, $\Xi_5$]
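The recursive view can be sketched as follows: split the words by their current letter and recurse on both groups until at most one word is left. The sketch uses the six binary words from the earlier example rather than the words of the animation, assumes they are distinct strings over '0'/'1', and represents the trie as nested dictionaries; `build_recursive` is a hypothetical name.

```python
def build_recursive(words, depth=0):
    """Return None for an empty subtree, the word itself for a leaf,
    and a dict {'0': left, '1': right} for an internal node."""
    if not words:
        return None
    if len(words) == 1:
        return words[0]
    zeros = [w for w in words if w[depth] == "0"]
    ones = [w for w in words if w[depth] == "1"]
    return {
        "0": build_recursive(zeros, depth + 1),
        "1": build_recursive(ones, depth + 1),
    }


WORDS = ["11010", "00011", "01101", "00000", "11111", "11100"]
trie = build_recursive(WORDS)
print(trie["0"]["1"])       # leaf reached via the prefix 01
print(trie["1"]["1"]["0"])  # leaf reached via the prefix 110
```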
Analysis The Depth
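Consider $n$ words $\Xi_1, \ldots, \Xi_n$. What is the depth of the vertex $\Xi_1$?
Recall: Depth $D_n$ = length of the shortest unique prefix of $\Xi_1 = \xi_1\xi_2\xi_3\ldots$
$$P(D_n \le k) = P(\Xi_2, \ldots, \Xi_n \text{ do not start with } \xi_1 \ldots \xi_k) = \left(1 - \left(\tfrac{1}{2}\right)^{k}\right)^{n-1}$$
Consequence:
$$P(D_n \le \alpha \log_2(n)) = \left(1 - n^{-\alpha}\right)^{n-1} \xrightarrow{n \to \infty} \begin{cases} 1, & \text{if } \alpha > 1, \\ 0, & \text{if } \alpha < 1. \end{cases}$$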
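The threshold at $\alpha = 1$ is easy to see numerically by evaluating the closed formula. This is only a sketch; `prob_depth_at_most` and `prob_at_alpha` are hypothetical names, and $\alpha \log_2(n)$ is treated as a real number rather than rounded to an integer.

```python
def prob_depth_at_most(k, n):
    """P(D_n <= k) = (1 - 2^{-k})^{n-1} in the symmetric model."""
    return (1.0 - 0.5 ** k) ** (n - 1)


def prob_at_alpha(alpha, n):
    """P(D_n <= alpha * log2(n)) = (1 - n^{-alpha})^{n-1}."""
    return (1.0 - n ** (-alpha)) ** (n - 1)


for alpha in (0.5, 0.9, 1.1, 2.0):
    print(alpha, prob_at_alpha(alpha, 10**6))
```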
Results on $D_n$
Shown on the previous slide: $\frac{D_n}{\log_2(n)} \xrightarrow{P} 1$ $(n \to \infty)$
Considering the previous slide more carefully:
$$P(D_n - \log_2(n) < x) \approx \left(1 - \frac{2^{-x}}{n}\right)^{n-1} \xrightarrow{n \to \infty} e^{-2^{-x}}$$
(Limit is a Gumbel distribution known from extreme value theory)
Thm (Knuth ’72): $E[D_n] = \log_2(n) + \Psi(\log_2(n)) + o(1)$ with periodic function $\Psi$
Thm (Szpankowski ’86): $\mathrm{Var}(D_n) \sim \Phi(\log_2(n))$ with periodic function $\Phi$
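How close the finite-$n$ expression is to the Gumbel-type limit can be checked numerically. The sketch below compares the two formulas only; it ignores that $D_n$ is integer-valued (the reason for the "≈" on the slide). `finite_n_cdf` and `gumbel_limit` are hypothetical names.

```python
import math


def finite_n_cdf(x, n):
    """(1 - 2^{-x} / n)^{n-1}, i.e. (1 - 2^{-k})^{n-1} with k = log2(n) + x."""
    return (1.0 - 2.0 ** (-x) / n) ** (n - 1)


def gumbel_limit(x):
    """Limit exp(-2^{-x}) of the expression above as n grows."""
    return math.exp(-(2.0 ** (-x)))


for x in (-2.0, 0.0, 3.0):
    print(x, finite_n_cdf(x, 10**6), gumbel_limit(x))
```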
Analysis The Height
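Consider $n$ words $\Xi_1, \ldots, \Xi_n$. What is the height of the resulting Trie?
Def: Height $H_n = \max\{D_n(\Xi_i) : i = 1, \ldots, n\}$.
The result $P(D_n \le k) = (1 - 2^{-k})^{n-1}$ implies:
$$\begin{aligned} P(H_n > \alpha \log_2(n)) &= P(D_n(\Xi_i) > \alpha \log_2(n) \text{ for some } i \in \{1, \ldots, n\}) \\ &\le n \cdot P(D_n > \alpha \log_2(n)) \\ &\le n \cdot \left(1 - \left(1 - n^{-\alpha}\right)^{n}\right) \\ &\le n^{2-\alpha} \end{aligned}$$
Consequence: $P(H_n > \alpha \log_2(n)) \to 0$ for $\alpha > 2$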
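The last two steps of the chain can be checked numerically. The final inequality follows from Bernoulli's inequality $(1 - x)^n \ge 1 - nx$ with $x = n^{-\alpha}$. In this sketch `union_bound` and `simple_bound` are hypothetical names, and `expm1`/`log1p` are used only to avoid rounding problems when $n^{-\alpha}$ is tiny.

```python
import math


def union_bound(alpha, n):
    """n * (1 - (1 - n^{-alpha})^n), evaluated in a numerically stable way."""
    x = n ** (-alpha)
    return n * -math.expm1(n * math.log1p(-x))


def simple_bound(alpha, n):
    """The cruder bound n^{2 - alpha}."""
    return n ** (2.0 - alpha)


for n in (10, 1000, 10**5):
    print(n, union_bound(2.5, n), simple_bound(2.5, n))
```

For $\alpha = 2.5$ both columns shrink as $n$ grows, in line with the consequence stated on the slide.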
Results on $H_n$
Partly proven on the previous slide: $\frac{H_n}{2 \log_2(n)} \xrightarrow{P} 1$
Thm (Devroye ’84): $\lim_{n \to \infty} P(H_n - 2 \log_2(n) - 1 \le x) = \exp(-2^{-x})$, $x \in \mathbb{R}$
Thm (Regnier ’82): $E[H_n] \sim 2 \log_2(n)$ $(n \to \infty)$ (Flajolet, Steyaert ’82 → periodic second order term)
Summary: Typical depth: $\log_2(n)$, height: $2 \log_2(n)$.
Profile (Park, Hwang, Nicodème, Szpankowski):
[Figure: two profile plots, one for external nodes (leaves) and one for internal nodes, each with the levels $\log_2\frac{n}{\log n} + O(1)$, $\log_2 n + O(1)$ and $2 \log_2 n + O(1)$ marked]
Analysis The External Path Length
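Consider $n$ words $\Xi_1, \ldots, \Xi_n$. External Path Length:
$$L_n := \sum_{i=1}^{n} D_{n,i}, \qquad D_{n,i} = D_n(\Xi_i).$$
[Figure: the example trie; leaves from left to right: $\Xi_4$, $\Xi_2$, $\Xi_3$, $\Xi_1$, $\Xi_6$, $\Xi_5$]
Example: $L_6 = 2 + 3 + 4 \cdot 4 = 21$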
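A Recursion for $L_n$
[Figure: the example trie; leaves from left to right: $\Xi_4$, $\Xi_2$, $\Xi_3$, $\Xi_1$, $\Xi_6$, $\Xi_5$]
$K_n$ = # words starting with 0
$$L_n \stackrel{d}{=} L_{K_n} + \tilde{L}_{n-K_n} + n$$
(for $n \ge 2$; $\tilde{L}$ denotes an independent copy of $L$, and the term $n$ accounts for every word lying one level below the root)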
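For a fixed set of words the same decomposition holds deterministically, which gives a direct way to compute the external path length. The sketch assumes distinct words over '0'/'1' (truncated to five letters in the example) and uses the convention that a subtree with at most one word contributes 0; `path_length` is a hypothetical name.

```python
def path_length(words, depth=0):
    """External path length via L = L(words with 0) + L(words with 1) + n for n >= 2."""
    n = len(words)
    if n <= 1:
        return 0  # empty subtree or a single leaf: nothing below this node
    zeros = [w for w in words if w[depth] == "0"]
    ones = [w for w in words if w[depth] == "1"]
    return path_length(zeros, depth + 1) + path_length(ones, depth + 1) + n


WORDS = ["11010", "00011", "01101", "00000", "11111", "11100"]
print(path_length(WORDS))
```

For the six example words this gives 21, matching the example $L_6 = 2 + 3 + 4 \cdot 4$.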
The Contraction Method in a Nutshell
Aim: Find a limit law for $L_n$ (after rescaling properly)
$$L_n \stackrel{d}{=} L_{K_n} + \tilde{L}_{n-K_n} + n$$
1. Rescaling: $X_n = (L_n - E[L_n]) / \sqrt{\mathrm{Var}(L_n)}$
$$X_n \stackrel{d}{=} A_{n,1} X_{K_n} + A_{n,2} \tilde{X}_{n-K_n} + b_n$$
2. Find the Limits: $(A_{n,1}, A_{n,2}, b_n) \to \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0\right)$
$$X \stackrel{d}{=} \frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} \tilde{X} \qquad (1)$$
3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.
4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.
5. Convergence: Prove convergence with respect to that metric.
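That a centred normal distribution solves (1) can be verified with characteristic functions. The short check below assumes that $\tilde{X}$ is an independent copy of $X$ and that $X \sim \mathcal{N}(0, \sigma^2)$ for an arbitrary $\sigma^2 \ge 0$; it does not address uniqueness, which is what steps 4 and 5 are about.

```latex
\begin{align*}
\mathbb{E}\!\left[e^{\,it\,(X + \tilde{X})/\sqrt{2}}\right]
  &= \left(\mathbb{E}\!\left[e^{\,i\,(t/\sqrt{2})\,X}\right]\right)^{2}
     && \text{(independence, same law)} \\
  &= \left(e^{-\sigma^{2} t^{2}/4}\right)^{2}
     && \text{(characteristic function of } \mathcal{N}(0,\sigma^{2}) \text{ at } t/\sqrt{2}\text{)} \\
  &= e^{-\sigma^{2} t^{2}/2}
   = \mathbb{E}\!\left[e^{\,itX}\right].
\end{align*}
```

Since (1) holds for every $\sigma^2$, the normalisation $\mathrm{Var}(X_n) = 1$ from step 1 is what singles out $\mathcal{N}(0, 1)$ as the candidate limit.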
Results on $L_n$
Thm (Jacquet, Regnier ’88; Neininger, Rüschendorf 2004):
$$\frac{L_n - E[L_n]}{\sqrt{\mathrm{Var}(L_n)}} \xrightarrow{d} \mathcal{N}(0, 1)$$
From the analysis of $D_n$:
$$E[L_n] = E\left[\sum_{i=1}^{n} D_n(\Xi_i)\right] = n E[D_n] = n \log_2(n) + n \Psi(\log_2(n)) + o(n)$$
Thm (Kirschenhofer, Prodinger ’86): $\mathrm{Var}(L_n) = n \Psi(\log_2(n)) + O(\log^2(n))$
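The identity $E[L_n] = n E[D_n]$ suggests a quick simulation: sample $n$ words, compute $L_n / n$ and compare it with $\log_2(n)$. This is a rough sketch under assumptions not in the slides: words are truncated to 64 letters, a fixed seed is used, and depths are computed from the sorted word list, using the fact that the longest common prefix with any other word is attained at a neighbour in sorted order. `mean_depth` is a hypothetical name.

```python
import math
import os
import random


def mean_depth(n, length=64, seed=1):
    """Sample n fair-coin words (n >= 2) and return L_n / n for their trie."""
    rng = random.Random(seed)
    words = sorted(format(rng.getrandbits(length), f"0{length}b") for _ in range(n))
    total = 0
    for i, w in enumerate(words):
        # Only the neighbours in sorted order can share the longest prefix with w.
        neighbours = words[max(i - 1, 0):i] + words[i + 1:i + 2]
        total += 1 + max(len(os.path.commonprefix([w, v])) for v in neighbours)
    return total / n


n = 256
print(mean_depth(n), math.log2(n))
```

A single run gives an average depth of the form $\log_2(n)$ plus a bounded term, consistent with the expansion of $E[D_n]$; it says nothing precise about the periodic function $\Psi$.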
Summary
Trie: tree-like data structure to store words
position of a word in the tree ↔ path given by shortest unique prefix
Performance: Consider input: $n$ independent words, each word is a sequence of ’coin tosses’
- Typical search/insert time (depth): around $\log_2(n)$
- Worst search/insert time (height): around $2 \log_2(n)$
- Construction cost (path length): around $n \log_2(n)$
Input model not very realistic, what about more general input models?
The Markov Source Model
Markov Model
Generate $n$ words $\Xi_1, \ldots, \Xi_n$ such that the words $\Xi_1, \ldots, \Xi_n$ are independent and identically distributed
Each word $\Xi_k = \xi_1\xi_2\xi_3\ldots$ has letters $(\xi_j)_{j \ge 1}$ which are a Markov chain on $\{0, 1\}$, i.e. for some $\mu = (\mu_0, \mu_1)$ and $P = (p_{ij})_{i,j \in \{0,1\}}$
- $P(\xi_1 = a) = \mu_a$,
- $P(\xi_{j+1} = a \mid \xi_1, \ldots, \xi_j) = p_{\xi_j a}$
More general (Markov Model with k-dependency): distribution of $\xi_j$ depends only on the previous $k$ letters for some fixed $k$
Even more general: Dynamical Sources Model by Vallée
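A minimal sketch of sampling a word from the first-order Markov source described above. Truncating the word after `length` letters is an assumption of the sketch; `markov_word` is a hypothetical name, `mu` is the initial distribution $(\mu_0, \mu_1)$ and `P` the transition matrix $(p_{ij})$ given as nested lists.

```python
import random


def markov_word(length, mu, P, rng):
    """Sample xi_1 ... xi_length: xi_1 ~ mu, then xi_{j+1} ~ row xi_j of P."""
    letters = [0 if rng.random() < mu[0] else 1]
    while len(letters) < length:
        prev = letters[-1]
        letters.append(0 if rng.random() < P[prev][0] else 1)
    return "".join(str(x) for x in letters)


rng = random.Random(0)
sticky = [[0.9, 0.1], [0.1, 0.9]]  # large p_00 and p_11: long runs of equal letters
print([markov_word(12, (0.5, 0.5), sticky, rng) for _ in range(3)])
```

Choosing all entries of `P` and `mu` equal to 1/2 recovers the symmetric coin-toss model from the first part of the talk.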
Effect on Depth and related parameters:
Are there very ’typical’ long prefixes for the source (e.g. because $p_{aa}$ is very large)? → Depth/Height gets very large
Entropy in the Markov Source Model:
$$H = \pi_0 \left(-p_{00} \log(p_{00}) - p_{01} \log(p_{01})\right) + \pi_1 \left(-p_{10} \log(p_{10}) - p_{11} \log(p_{11})\right)$$
with stationary distribution
$$(\pi_0, \pi_1) = \left(\frac{p_{10}}{p_{10} + p_{01}}, \frac{p_{01}}{p_{10} + p_{01}}\right)$$
Depth for Markov Sources:
$$E[D_n] \sim \frac{1}{H} \log(n)$$
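The entropy formula is straightforward to evaluate. The sketch below assumes natural logarithms in both $H$ and $\log(n)$ and requires $p_{01} + p_{10} > 0$; `stationary` and `entropy` are hypothetical names. For the symmetric source it gives $H = \log 2$, so $\frac{1}{H}\log(n) = \log_2(n)$, in agreement with the first part of the talk.

```python
import math


def stationary(P):
    """Stationary distribution (pi_0, pi_1) of a two-state chain with p01 + p10 > 0."""
    p01, p10 = P[0][1], P[1][0]
    return (p10 / (p10 + p01), p01 / (p10 + p01))


def entropy(P):
    """H = sum_i pi_i * sum_j (-p_ij * log p_ij), with 0 * log 0 read as 0."""
    pi = stationary(P)
    h = 0.0
    for i in (0, 1):
        for j in (0, 1):
            if P[i][j] > 0:
                h -= pi[i] * P[i][j] * math.log(P[i][j])
    return h


fair = [[0.5, 0.5], [0.5, 0.5]]
sticky = [[0.9, 0.1], [0.1, 0.9]]
n = 10**6
print(math.log(n) / entropy(fair), math.log2(n))  # the two values agree
print(math.log(n) / entropy(sticky))              # lower entropy, larger typical depth
```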
Results for the Markov Source Model
- Depth: Jacquet, Szpankowski ’89
- Height: Szpankowski ’91
- External Pathlength: L., Neininger, Szpankowski (SODA 2013)
- Dynamical Sources: Clément, Flajolet, Vallée 2001
Some related problems:
- PATRICIA Tries and Digital Search Trees (Thesis L. → Pathlength)
- Radix-Sort and -Select (Thesis L.)
- Lempel-Ziv Parsing Scheme (data compression)