Using Stein’s method to show Poisson and normal limit laws for fringe trees
Cecilia Holmgren, Stockholm University study together with Svante Janson, Uppsala University AofA, Paris June 19, 2014
Aim of Study
◮ To show new general results, as well as provide more
direct proofs, of earlier results on fringe trees (i.e., subtrees consisting of some node and its descendants) in binary search trees and random recursive trees.
◮ To apply Stein’s method with couplings (as described
by Barbour, Holst and Janson) in the study of fringe trees.
◮ To see whether the general results on fringe trees
could lead to simple solutions of various types of problems on random trees, such as for example the asymptotic distribution of the number of protected nodes in the binary search tree.
Start by drawing a number, a so-called key, from the set {1, 2, 3, . . . , 10}, and place it in the root.
(Figure: the root contains the first key, 6.)
Draw a new number/key from the remaining numbers in the set {1, 2, 3, . . . , 10}. Compare the new key to the root’s key. If it is smaller/larger, it is associated with the left/right child.
Continue to draw new keys, starting each comparison from the root. In the example the keys are drawn in the order 6, 3, 8, 10, 5, 1, 4, 7, 9, 2.
(Figure: the growing tree after each insertion; the final tree has root 6 with children 3 and 8, and leaves 7, 2, 4, 9.)
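The insertion process just described can be sketched in code; the following is our own minimal illustration (class and function names are ours, not from the talk), reproducing the example tree for the keys drawn in the order 6, 3, 8, 10, 5, 1, 4, 7, 9, 2.

```python
# A minimal sketch (ours) of building a binary search tree by inserting
# keys one at a time, comparing from the root downwards.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # keys smaller than self.key go here
        self.right = None  # keys larger than self.key go here

def insert(root, key):
    """Insert a key, starting the comparison from the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def build_bst(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

tree = build_bst([6, 3, 8, 10, 5, 1, 4, 7, 9, 2])
print(tree.key, tree.left.key, tree.right.key)  # → 6 3 8
```

The final tree agrees with the one drawn on the slides: root 6 with children 3 and 8, and leaves 7, 2, 4 and 9.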
◮ We use Devroye’s representation of the binary search tree Tn, which assigns a random time stamp Uk to each key k, describing when the key is drawn; our proofs exploit this connection.
◮ The example tree is constructed from (1, U1), . . . , (10, U10), where the time stamps are ordered as U6 < U3 < U8 < U10 < U5 < U1 < U4 < U7 < U9 < U2 (so U6 has rank 1 and U2 has rank 10).
The unique binary search tree constructed from (1, U1), . . . , (n, Un) has two characterizing properties.
◮ It is a binary search tree with respect to the first
coordinates in the pairs.
◮ Along every path down from the root, the values Ui are increasing.
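These two properties suggest an equivalent construction, sketched below under our own naming (not code from the talk): insert the keys in order of increasing time stamp; the result is a binary search tree in the keys whose stamps increase along every path down from the root.

```python
# Sketch (ours) of Devroye's representation: build the tree from the pairs
# (k, Uk) by inserting keys in order of increasing time stamp Uk.
import random

def bst_from_stamps(pairs):
    """pairs: list of (key, U_key); insert keys by increasing stamp."""
    root = None
    for key, u in sorted(pairs, key=lambda p: p[1]):
        if root is None:
            root = {"key": key, "u": u, "left": None, "right": None}
            continue
        node = root  # standard BST insertion on the key coordinate
        while True:
            side = "left" if key < node["key"] else "right"
            if node[side] is None:
                node[side] = {"key": key, "u": u, "left": None, "right": None}
                break
            node = node[side]
    return root

def stamps_increase_down(node, bound=float("-inf")):
    """Check that the U-values increase along every path from the root."""
    if node is None:
        return True
    return (node["u"] > bound
            and stamps_increase_down(node["left"], node["u"])
            and stamps_increase_down(node["right"], node["u"]))

random.seed(1)
pairs = [(k, random.random()) for k in range(1, 11)]
tree = bst_from_stamps(pairs)
assert stamps_increase_down(tree)
```

The root is always the key with the smallest time stamp, matching the second characterizing property.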
(Figure: the example tree with pairs (key, rank of time stamp): 6,1 3,2 8,3 1,6 5,5 7,8 10,4 2,10 4,7 9,9.)
◮ The fringe tree rooted at 3 in the example is associated
with (1, 6), (2, 10), (3, 2), (4, 7), (5, 5).
◮ We use “subtree” and “fringe tree” as synonyms.
◮ Each node u in the binary search tree can thus be
associated with a subset Tn(u) of {(1, U1), . . . , (n, Un)}.
◮ Let f(T) be a function from the set of (unlabelled) binary trees to R. Set Xn = ∑_u f(Tn(u)), summing over all nodes u in Tn. Since the shape of the subtree rooted at u is determined by Tn(u), we can use Xn to count the subtrees with properties that interest us by choosing appropriate functions f. Examples:
◮ Let f(Tn(u)) = 1{Tn(u) ≈ T}. Then Xn is the number of
subtrees that are equal to T (since each permutation uniquely determines a subtree shape).
◮ Let f(Tn(u)) = 1{| Tn(u) |= k}. Then Xn is the number of
subtrees with exactly k nodes.
◮ Let f(Tn(u)) = 1{| Tn(u) |= 1}. Then Xn is the number of
leaves.
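A small illustration (ours, with the example tree hard-coded as a parent-to-children map) of how the choice of f selects the statistic of interest:

```python
# Illustration (ours) of Xn = sum of f(Tn(u)) over all nodes u:
# f = 1{|Tn(u)| = k} counts fringe subtrees with k nodes, and k = 1
# counts the leaves.  The tree is the example from the slides.

children = {6: [3, 8], 3: [1, 5], 8: [7, 10], 1: [2], 5: [4],
            10: [9], 7: [], 2: [], 4: [], 9: []}

def subtree_size(u):
    """|Tn(u)|: the node u together with all of its descendants."""
    return 1 + sum(subtree_size(c) for c in children[u])

def X(f):
    """Xn = sum over all nodes u of f applied to the fringe tree at u."""
    return sum(f(u) for u in children)

leaves = X(lambda u: 1 if subtree_size(u) == 1 else 0)
size_two = X(lambda u: 1 if subtree_size(u) == 2 else 0)
print(leaves, size_two)  # → 4 3
```

In the example tree there are four leaves (7, 2, 4, 9) and three fringe subtrees of size two (rooted at 1, 5 and 10).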
◮ Write σ(i, k) := {(i, Ui), . . . , (i + k − 1, Ui+k−1)} for k ≥ 1 and 1 ≤ i ≤ n − k + 1.
◮ We define the indicator variable Ii,k := 1{σ(i, k) is a subtree in Tn}. Defining U0 = Un+1 = 0, we see that Ii,k = 1 if and only if Ui−1 and Ui+k are both smaller than each of Ui, . . . , Ui+k−1.
◮ Note that we have two boundary cases for i = 1 and
i = n − k + 1.
◮ Devroye’s representation of the tree as (1, U1), . . . , (n, Un) is natural and useful, but the terms with i = 1 and i = n − k + 1 have to be treated specially because of boundary effects.
◮ Boundary effects can be avoided by instead using a cyclic
representation.
◮ Let U0, . . . , Un ∼ U(0, 1) be i.i.d. uniform random variables, extended periodically by Ui+ℓ·(n+1) := Ui for i ∈ {0, . . . , n} and ℓ ∈ Z.
◮ When discussing these variables, we will use the natural metric on Zn+1 defined by |i − j|n+1 := min_{ℓ∈Z} |i − j − ℓ · (n + 1)|.
Again let Ii,k = 1{Ui−1 and Ui+k are both smaller than each of Ui, . . . , Ui+k−1}, but now for all i and k, with the indices interpreted cyclically.
◮ With Devroye’s representation, the number of subtrees of size k < n in a binary search tree with n nodes equals ∑_{i=1}^{n−k+1} Ii,k.
◮ With the cyclic representation, the number of subtrees of size k < n equals ∑_{i=1}^{n+1} Ii,k; the Ii,k are identically distributed and E(Ii,k) = 2/((k + 1)(k + 2)).
Recall that Ii,k = 1{Ui−1 and Ui+k are both smaller than each of Ui, . . . , Ui+k−1}.
◮ The sum ∑_{i=1}^{n+1} Ii,k is invariant under a cyclic shift of U0, . . . , Un. If we shift the values U0, U1, U2, . . . , Un so that U0 is the smallest, we are back in Devroye’s representation, where one can assume that U0 = Un+1 = 0 (since only the order relations matter for Ii,k).
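The value E(Ii,k) = 2/((k + 1)(k + 2)) can be checked by brute force: Ii,k = 1 exactly when the two boundary stamps Ui−1 and Ui+k are the two smallest of the k + 2 stamps involved, and since only order relations matter we may enumerate permutations of ranks. A sketch (ours):

```python
# Exact check (ours) of E(Ii,k) = 2/((k+1)(k+2)): among the k+2 stamps
# Ui-1, Ui, ..., Ui+k, the indicator is 1 iff the two boundary stamps
# Ui-1 and Ui+k are the two smallest.  Enumerate all rank orderings.
from itertools import permutations
from fractions import Fraction

def prob_subtree(k):
    hits = total = 0
    for perm in permutations(range(k + 2)):  # ranks of Ui-1, Ui, ..., Ui+k
        total += 1
        boundary = {perm[0], perm[k + 1]}    # ranks of Ui-1 and Ui+k
        if boundary == {0, 1}:               # they are the two smallest
            hits += 1
    return Fraction(hits, total)

for k in (1, 2, 3, 4):
    assert prob_subtree(k) == Fraction(2, (k + 1) * (k + 2))
```

The two boundary stamps can occupy the two smallest ranks in 2 · k! of the (k + 2)! orderings, giving exactly 2/((k + 1)(k + 2)).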
The expected value and variance of Xn,k := ∑_{i=1}^{n+1} Ii,k are easy to calculate using the cyclic representation.
Let 1 ≤ k < n. For the random binary search tree Tn,
E(Xn,k) = 2(n + 1)/((k + 1)(k + 2))
and
Var(Xn,k) =
E(Xn,k) − (n + 1) · (22k² + 44k + 12)/((k + 1)(k + 2)²(2k + 1)(2k + 3)),   if k < (n − 1)/2,
E(Xn,k) + 2/n − 64/(n + 3)²,   if k = (n − 1)/2,
E(Xn,k) − (E Xn,k)² = E(Xn,k) − 4(n + 1)²/((k + 1)²(k + 2)²),   if k > (n − 1)/2.
Start with a root with label 0.
At stage 1 attach a new node with label 1 to the root.
At stage i (i = 1, . . . , 10), attach a new node with label i uniformly at random to one of the previous nodes 0, . . . , i − 1.
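The construction can be sketched as follows (our own minimal code; the tree is stored as a parent array):

```python
# A minimal sketch (ours) of the random recursive tree construction:
# node i attaches uniformly at random to one of the nodes 0, ..., i-1.
import random

def random_recursive_tree(n, rng):
    """Return parent[i] for i = 1..n (node 0 is the root)."""
    parent = {0: None}
    for i in range(1, n + 1):
        parent[i] = rng.randrange(i)  # uniform over the existing nodes
    return parent

rng = random.Random(2014)
parent = random_recursive_tree(10, rng)
assert parent[0] is None and all(parent[i] < i for i in range(1, 11))
```

The defining property is visible in the parent array: every node’s parent has a strictly smaller label, so labels increase along every path down from the root.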
◮ Let (X, A) be any measurable space. The total variation distance dTV between two probability measures µ1 and µ2 on (X, A) is defined by
dTV(µ1, µ2) := sup_{A∈A} |µ1(A) − µ2(A)|.
◮ Let L(X) denote the distribution of a random variable X.
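For discrete measures the supremum is attained at A = {x : µ1(x) > µ2(x)}, which gives dTV(µ1, µ2) = ½ ∑_x |µ1(x) − µ2(x)|; a small illustration (ours):

```python
# For discrete distributions, the supremum over events A in the definition
# of dTV is attained at A = {x : mu1(x) > mu2(x)}, giving the formula
# dTV(mu1, mu2) = (1/2) * sum over x of |mu1(x) - mu2(x)|.  (Our sketch.)

def d_tv(mu1, mu2):
    support = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(x, 0.0) - mu2.get(x, 0.0)) for x in support)

mu1 = {0: 0.5, 1: 0.5}
mu2 = {0: 0.25, 1: 0.75}
print(d_tv(mu1, mu2))  # → 0.25
```

Here the optimal event is A = {0}, where µ1(A) − µ2(A) = 0.25, matching the half-sum formula.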
Poisson Convergence
The following theorem, except for the explicit rate O(1/k), was earlier shown by Feng, Mahmoud and Panholzer, and by Fuchs, using variants of the method of moments. Here we provide a more direct proof using Stein’s method.
Theorem 1. Let k = kn where k < n. Then it holds that
dTV(L(Xn,k), Po(µn,k)) = O(1/k),
where Xn,k is the number of subtrees of size k in the random binary search tree Tn and µn,k := E(Xn,k) = 2(n + 1)/((k + 1)(k + 2)). Similarly, it holds that
dTV(L(X̂n,k), Po(µ̂n,k)) = O(1/k),
where X̂n,k is the number of subtrees of size k in the random recursive tree Λn and µ̂n,k := E(X̂n,k) = n/((k + 1)(k + 2)).
Consequently, if n → ∞ and k → ∞, then dTV(L(Xn,k), Po(µn,k)) → 0 and dTV(L(X̂n,k), Po(µ̂n,k)) → 0, where we recall that Xn,k and X̂n,k are the number of subtrees of size k in the random binary search tree Tn and in the random recursive tree Λn, respectively.
Sketch of Proof of Theorem 1
◮ For proving Poisson convergence for sums of weakly
dependent indicator variables it is often useful to find couplings.
◮ Let Γ be a finite index set and let (Iα, α ∈ Γ) be indicator random variables. We write W := ∑_{α∈Γ} Iα and λ := E(W).
◮ A coupling (W, Wα) between W and a random variable Wα is defined on the same probability space as W, with the property L(Wα) = L(W − Iα | Iα = 1). Such a coupling can be used for approximating W by a Poisson distribution Po(λ).
The coupling (W, Wα) can be constructed in the following way:
◮ We find random variables (Jβα, β ∈ Γ) defined on the same probability space as (Iα, α ∈ Γ), in such a way that for each α ∈ Γ, and jointly for all β ∈ Γ, L(Jβα) = L(Iβ | Iα = 1).
◮ Then Wα := ∑_{β≠α} Jβα is defined on the same probability space as W = ∑_{α∈Γ} Iα, and it holds that L(Wα) = L(W − Iα | Iα = 1).
For showing Poisson convergence as k → ∞ of Xn,k = ∑_{i=1}^{n+1} Ii,k (the number of subtrees of size k < n in the random binary search tree of size n), we showed that there exist appropriate couplings of the indicators Ii,k.
◮ We use that Ii,k and Ij,k are independent whenever |i − j|n+1 ≥ k + 2. This independence follows since Ii,k depends only on Ui−1, Ui, . . . , Ui+k, so that if |i − j|n+1 ≥ k + 2, then Ii,k and Ij,k depend on disjoint sets of the variables Um.
We want to show Poisson convergence as k → ∞ of Xn,k = ∑_{i=1}^{n+1} Ii,k.
Lemma 2. Let k ∈ {1, . . . , n − 1}. Then for each i ∈ {1, . . . , n + 1}, there exists a coupling ((Ij,k)j, (Z^k_ji)j) such that L(Z^k_ji) = L(Ij,k | Ii,k = 1) jointly for all j ∈ {1, . . . , n + 1}. Furthermore,
Z^k_ji = Ij,k   if |j − i|n+1 > k + 1,
Z^k_ji ≥ Ij,k   if |j − i|n+1 = k + 1,
Z^k_ji = 0 ≤ Ij,k   if 0 < |j − i|n+1 ≤ k.
Let U0 = 0. If we condition on the event that the keys {4, 5, 6} form a subtree, we only need to change the time stamps {U3, U4, U5, U6, U7} so that U3 and U7 are the two smallest of these five values. All we need to change in this example is to swap the time stamps of the keys 6 and 7, i.e., change (6, U6 = 1) to (6, U6 = 8) and (7, U7 = 8) to (7, U7 = 1).
(Figure: the original tree 6,1 3,2 8,3 1,6 5,5 7,8 10,4 2,10 4,7 9,9 and the modified tree 7,1 3,2 8,3 1,6 5,5 6,8 10,4 2,10 4,7 9,9.)
◮ Recall that (Jβα, β ∈ Γ) are random variables with L(Jβα) = L(Iβ | Iα = 1), where (Iα, α ∈ Γ) are indicator random variables.
◮ Suppose that, for each α, the set Γα := Γ \ {α} is partitioned into Γ−α and Γα \ Γ−α in such a way that Jβα ≤ Iβ if β ∈ Γ−α.
Theorem (Barbour, Holst and Janson). Let W = ∑_{α∈Γ} Iα and λ = E(W). Let Γα = Γ \ {α} and Γ−α be defined as above. Then it holds that
dTV(L(W), Po(λ)) ≤ min{1, λ⁻¹} ( ∑_{α∈Γ} (E Iα)² + ∑_{α∈Γ} ∑_{β∈Γα\Γ−α} ( E(Iα)E(Iβ) + E(IαIβ) ) ).
The Poisson approximation is good if the set Γ−α is large compared to the set Γα \ Γ−α.
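As a sanity check of how such bounds behave: for independent indicators one may take Γ−α = Γα, so that the double sum vanishes and the bound collapses to min{1, λ⁻¹} ∑α (E Iα)², the classical bound for Poisson approximation of a binomial. The sketch below (ours) compares this with the exact total variation distance:

```python
# Sanity check (ours): for W ~ Bin(n, p), i.e. independent indicators with
# E(I_alpha) = p, the coupling bound reduces to min{1, 1/lambda} * n * p^2
# with lambda = n*p.  Compare it with the exact dTV(Bin(n, p), Po(lambda)).
from math import comb, exp, factorial

n, p = 10, 0.1
lam = n * p
binom = [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
M = 60  # truncate the Poisson where the tail mass is negligible
poisson = [exp(-lam) * lam**j / factorial(j) for j in range(M + 1)]
d_tv = 0.5 * sum(abs((binom[j] if j <= n else 0.0) - poisson[j])
                 for j in range(M + 1))
bound = min(1.0, 1.0 / lam) * n * p * p  # = 0.1 here
assert d_tv <= bound
```

The exact distance here is far below the bound, which is typical: the bound is crude but of the right order n·p² when λ is of constant order.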
Let Γ := {1, . . . , n + 1} and Γi := Γ \ {i}. From Lemma 2 we see that Z^k_ji ≤ Ij,k except when |j − i|n+1 = k + 1, and thus the set Γ−i := Γ \ {i, i ± (k + 1)} is LARGE compared to the set Γi \ Γ−i.
◮ There is a bijection, called the natural correspondence, between ordered trees of size n and binary trees of size n − 1, introduced by Knuth.
◮ As noted by Fuchs, Hwang and Neininger, the natural correspondence yields a coupling between the random recursive tree of size n and the binary search tree of size n − 1.
(Figure: (a) a random recursive tree; (b) the corresponding binary search tree; (c) two subtrees in the random recursive tree; (d) the corresponding subtrees in the binary search tree.)
Theorem 3. Let Tn be a random binary search tree with n nodes, and let X^n_T be the number of subtrees equal to T in the binary search tree Tn. Let T¹, . . . , T^d be a fixed sequence of distinct binary trees, and let X̄^n_d := (X^n_{T¹}, X^n_{T²}, . . . , X^n_{T^d}). Let
µ^n_d := ( E(X^n_{T¹}), E(X^n_{T²}), . . . , E(X^n_{T^d}) ),
and let Γ = (γij)^d_{i,j=1} denote the matrix with elements γij := lim_{n→∞} (1/n) Cov(X^n_{T^i}, X^n_{T^j}) (where the γij are given explicitly). Then Γ is non-singular and
(X̄^n_d − µ^n_d)/√n →d N(0, Γ).
There is a corresponding result for the random recursive tree.
◮ By the Cramér–Wold device (Billingsley, Theorem 7.7), to show that X̄^n_d = (X^n_{T¹}, X^n_{T²}, . . . , X^n_{T^d}) converges to a multivariate normal distribution, it is enough to show that every linear combination of the components of the vector converges to a normal distribution.
◮ We use a version of Stein’s method with dependency graphs, as defined by Janson et al. (and earlier used by Devroye in the one-dimensional case), for convergence to normal distributions, to prove that X̄^n_d converges to a multivariate normal distribution.
We consider the number of so-called 2-protected nodes in binary search trees. A node is 2-protected if its shortest distance to a leaf is at least two, i.e., it is neither a leaf nor the parent of a leaf.
The following theorem was shown by Mahmoud and Ward using generating functions and recurrences.
Theorem 4. Let Xn be the number of 2-protected nodes in a random binary search tree of size n. Then
(Xn − (11/30)n)/√n →d N(0, 29/225).
◮ We provide a simple proof of this theorem, using that the number of unprotected nodes equals twice the number of leaves minus the number of cherries (fringe subtrees whose root has two leaf children).
◮ Hence, since any linear combination of the components of a random vector with a multivariate normal distribution is normal, Theorem 4 follows from Theorem 3.
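The identity "unprotected = 2 · (#leaves) − (#cherries)" can be checked directly on the example tree from the earlier slides (code and tree encoding are ours):

```python
# Check (ours) of the identity used in the proof: the unprotected nodes
# (leaves and parents of leaves) number 2 * (#leaves) - (#cherries),
# where a cherry is a node whose two children are both leaves.

children = {6: [3, 8], 3: [1, 5], 8: [7, 10], 1: [2], 5: [4],
            10: [9], 7: [], 2: [], 4: [], 9: []}

leaves = {u for u, c in children.items() if not c}
cherries = {u for u, c in children.items()
            if len(c) == 2 and all(v in leaves for v in c)}
unprotected = leaves | {u for u, c in children.items()
                        if any(v in leaves for v in c)}

assert len(unprotected) == 2 * len(leaves) - len(cherries)  # 8 == 8
print(len(children) - len(unprotected))  # number of 2-protected nodes → 2
```

The identity holds because every leaf (here the root is not a leaf) contributes its parent once, except that a cherry is counted twice by its two leaf children; in the example tree the two 2-protected nodes are 6 and 3.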
◮ We introduced a cyclic version of Devroye’s representation
to study sums of functions of fringe trees in binary search trees.
◮ Using Stein’s method we showed that the number of
subtrees of size k < n in the binary search tree and the random recursive tree (of size n) converges to a Poisson distribution as k → ∞.
◮ We studied the random number of copies of a certain fixed subtree T in the binary search tree (respectively, the random recursive tree). Using the Cramér–Wold device, we showed that a vector whose components are these random numbers for different fixed subtrees converges to a multivariate normal distribution.
◮ We introduced certain couplings related to Stein’s
method in the study of fringe trees.
◮ We showed that the natural correspondence between the random recursive tree and the binary search tree can be used to analyze fringe trees in the random recursive tree.
◮ We showed that we could translate the problem concerning the number of protected nodes in the binary search tree into a problem concerning fringe trees. Thus, the fringe tree approach led to a simple proof that the number of protected nodes in the binary search tree is asymptotically normal.