 
Using Stein’s method to show Poisson and normal limit laws for fringe trees
Cecilia Holmgren, Stockholm University; joint work with Svante Janson, Uppsala University
AofA, Paris, June 19, 2014
Aim of Study
◮ To show new general results on fringe trees (i.e., subtrees consisting of some node and all its descendants) in binary search trees and random recursive trees, and to give more direct proofs of earlier results.
◮ To apply Stein’s method with couplings (as described by Barbour, Holst and Janson) in the study of fringe trees.
◮ To see whether the general results on fringe trees lead to simple solutions of various problems on random trees, such as the asymptotic distribution of the number of protected nodes in the binary search tree.
The Binary Search Tree
Start by drawing a number, called a key, from the set $\{1, 2, 3, \dots, 10\}$ and place it in the root.
(Figure: the root holds the key 6.)
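The Binary Search Tree
Draw a new key from the remaining numbers in the set $\{1, 2, 3, \dots, 10\}$ and compare it to the root’s key. If it is smaller, it goes to the left child; if it is larger, it goes to the right child.
(Figure: root 6 with left child 3.)
The Binary Search Tree
Continue to draw new keys. Each new key starts the comparison at the root and moves left or right at every node it meets until it reaches an empty position.
(Figure: root 6 with children 3 and 8.)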
The Binary Search Tree
(Figure: the tree grows as the keys 10, 5, 1, 4, 7, 9, 2 are inserted, in this order, after 6, 3, 8.)
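The insertion procedure on these slides can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `insert` and `inorder` are hypothetical helpers, and the insertion order is the one read off from the slides.

```python
class Node:
    """A node of a binary search tree holding one key."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key by comparing from the root downwards; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def inorder(node):
    """Keys in left-root-right order; sorted for any binary search tree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

# Insertion order taken from the slides.
order = [6, 3, 8, 10, 5, 1, 4, 7, 9, 2]
root = None
for key in order:
    root = insert(root, key)

print(inorder(root))
```

The in-order traversal returns the keys in sorted order, which is the defining property of a binary search tree.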
Construction with Allocated Random Time Stamps
◮ We use Devroye’s representation of the binary search tree $T_n$: the random permutation is interpreted as assigning to each key $k$ a random time stamp $U_k$ that records when the key is inserted. We sometimes write $(k, U_k)$ to denote this connection.
◮ The example tree is constructed from $(1, U_1), \dots, (10, U_{10})$, where $1 = U_6 < U_3 < U_8 < U_{10} < U_5 < U_1 < U_4 < U_7 < U_9 < U_2 = 10$.
(Figure: the binary search tree on the keys 1, ..., 10 from the example.)
Construction with Allocated Random Time Stamps
The unique binary search tree constructed from $(1, U_1), \dots, (n, U_n)$ has two characterizing properties.
◮ It is a binary search tree with respect to the first coordinates of the pairs.
◮ Along every path down from the root, the values $U_i$ are increasing.
(Figure: the example tree with nodes labelled (key, time stamp): (6,1), (3,2), (8,3), (1,6), (5,5), (7,8), (10,4), (2,10), (4,7), (9,9).)
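The two characterizing properties suggest a direct recursive construction: the pair with the smallest time stamp becomes the root, and the pairs with smaller and larger keys form the left and right subtrees. The following sketch assumes this reading; `build`, `is_search_tree` and `stamps_increase` are hypothetical names, and the time stamps are those of the example.

```python
def build(pairs):
    """Build the tree from (key, stamp) pairs sorted by key.

    Returns a nested tuple (key, stamp, left, right), or None if empty.
    """
    if not pairs:
        return None
    # The earliest time stamp among these keys is inserted first.
    r = min(range(len(pairs)), key=lambda j: pairs[j][1])
    key, stamp = pairs[r]
    return (key, stamp, build(pairs[:r]), build(pairs[r + 1:]))

def is_search_tree(t, lo=float("-inf"), hi=float("inf")):
    """Property 1: search-tree order with respect to the keys."""
    if t is None:
        return True
    key, _, left, right = t
    return (lo < key < hi
            and is_search_tree(left, lo, key)
            and is_search_tree(right, key, hi))

def stamps_increase(t, parent_stamp=float("-inf")):
    """Property 2: time stamps increase along every path from the root."""
    if t is None:
        return True
    _, stamp, left, right = t
    return (parent_stamp < stamp
            and stamps_increase(left, stamp)
            and stamps_increase(right, stamp))

# Time stamps of the example: key -> stamp.
stamps = {6: 1, 3: 2, 8: 3, 10: 4, 5: 5, 1: 6, 4: 7, 7: 8, 9: 9, 2: 10}
tree = build(sorted(stamps.items()))
print(tree[0], is_search_tree(tree), stamps_increase(tree))
```

Representing a node as a plain tuple keeps the sketch short; a class would work equally well.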
What is a Fringe Tree?
(Figure: the example tree with nodes labelled (key, time stamp).)
A Subtree is Associated with Keys and Time Stamps
◮ The fringe tree rooted at 3 in the example is associated with $(1, 6), (2, 10), (3, 2), (4, 7), (5, 5)$.
◮ We use “subtree” and “fringe tree” as synonyms.
(Figure: the example tree with nodes labelled (key, time stamp).)
Functions of Subtrees
◮ Each node $u$ in the binary search tree can thus be associated with a subset $T_n(u)$ of $\{(1, U_1), \dots, (n, U_n)\}$.
◮ Let $f(T)$ be a function from the set of (unlabelled) binary trees to $\mathbb{R}$. Set
$$X_n = \sum_{u} f(T_n(u)),$$
summing over all nodes $u$ in $T_n$. Since the shape of the subtree rooted at $u$ is determined by $T_n(u)$, we can use $X_n$ to count the subtrees with properties that interest us by choosing an appropriate function $f$.
Examples of Functions of Subtrees
Let $f(T)$ be a function from the set of (unlabelled) binary trees to $\mathbb{R}$. Set
$$X_n = \sum_{u} f(T_n(u)).$$
Examples:
◮ Let $f(T_n(u)) = \mathbf{1}\{T_n(u) \approx T\}$. Then $X_n$ is the number of subtrees whose shape equals $T$ (since each permutation uniquely determines a subtree shape).
◮ Let $f(T_n(u)) = \mathbf{1}\{|T_n(u)| = k\}$. Then $X_n$ is the number of subtrees with exactly $k$ nodes.
◮ Let $f(T_n(u)) = \mathbf{1}\{|T_n(u)| = 1\}$. Then $X_n$ is the number of leaves.
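As an illustration, the three example functionals can be evaluated on the ten-key example tree. The sketch below is not taken from the talk; the helpers `bst_from_order`, `fringe_trees`, `size`, `shape` and `X` are hypothetical names, and trees are stored as nested tuples `(key, left, right)`.

```python
def bst_from_order(order):
    """Binary search tree as nested tuples (key, left, right)."""
    if not order:
        return None
    root = order[0]
    smaller = [x for x in order[1:] if x < root]
    larger = [x for x in order[1:] if x > root]
    return (root, bst_from_order(smaller), bst_from_order(larger))

def fringe_trees(tree):
    """Yield the fringe tree rooted at every node."""
    if tree is not None:
        yield tree
        yield from fringe_trees(tree[1])
        yield from fringe_trees(tree[2])

def size(tree):
    """Number of nodes."""
    return 0 if tree is None else 1 + size(tree[1]) + size(tree[2])

def shape(tree):
    """Forget the keys and keep only the unlabelled shape."""
    return None if tree is None else (shape(tree[1]), shape(tree[2]))

def X(tree, f):
    """The additive functional: sum of f over all fringe trees."""
    return sum(f(t) for t in fringe_trees(tree))

tree = bst_from_order([6, 3, 8, 10, 5, 1, 4, 7, 9, 2])

leaves = X(tree, lambda t: size(t) == 1)
size_two = X(tree, lambda t: size(t) == 2)
# Shape T: a root with a single left child.
T = ((None, None), None)
copies_of_T = X(tree, lambda t: shape(t) == T)
print(leaves, size_two, copies_of_T)
```

Passing `f` as a function mirrors the slide: changing the functional changes only the argument, not the summation.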
Requirements for Being A Subtree
◮ Write $\sigma(i, k) = \{(i, U_i), \dots, (i+k-1, U_{i+k-1})\}$ for $k \ge 1$ and $1 \le i \le n-k+1$.
◮ We define the indicator variable $I_{i,k} = \mathbf{1}\{\sigma(i, k) \text{ is a subtree in } T_n\}$. Defining $U_0 = U_{n+1} = 0$, we see that
$$I_{i,k} = \mathbf{1}\{U_{i-1} \text{ and } U_{i+k} \text{ are the smallest of } U_{i-1}, \dots, U_{i+k}\}.$$
◮ Note that we have two boundary cases, $i = 1$ and $i = n-k+1$.
Requirements for Being A Subtree
(Figures: the example tree with nodes labelled (key, time stamp), shown in four successive slides.)
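The criterion can be checked on the example by listing every pair $(i, k)$ with $I_{i,k} = 1$. The sketch below uses the example’s time stamps and the convention $U_0 = U_{n+1} = 0$; the function name `indicator` is a hypothetical choice.

```python
n = 10
# Time stamps from the example: U[k] is the insertion time of key k.
U = {6: 1, 3: 2, 8: 3, 10: 4, 5: 5, 1: 6, 4: 7, 7: 8, 9: 9, 2: 10}
U[0] = U[n + 1] = 0  # boundary convention

def indicator(i, k):
    """1 if U[i-1] and U[i+k] are the two smallest of U[i-1], ..., U[i+k]."""
    ends = max(U[i - 1], U[i + k])
    interior = min(U[j] for j in range(i, i + k))
    return int(ends < interior)

# All (i, k) such that the keys i, ..., i+k-1 form a fringe tree.
subtrees = [(i, k)
            for k in range(1, n + 1)
            for i in range(1, n - k + 2)
            if indicator(i, k)]
print(subtrees)
```

The list has one entry per node of the tree; for instance `(1, 5)` is the fringe tree rooted at key 3, which consists of the keys 1 to 5.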
Cyclic Representation
◮ Devroye’s representation of the tree as $(1, U_1), \dots, (n, U_n)$ is natural and useful, but the terms with $i = 1$ and $i = n-k+1$ have to be treated specially because of boundary effects.
◮ Boundary effects can be avoided by instead using a cyclic representation.
◮ Let $U_0, \dots, U_n \sim U(0, 1)$ be i.i.d. uniform random variables, and extend them periodically by $U_{i + k(n+1)} := U_i$ for $i \in \{0, \dots, n\}$, $k \in \mathbb{Z}$.
◮ When discussing these variables, we use the natural metric on $\mathbb{Z}_{n+1}$ defined by $|i - j|_{n+1} := \min_{\ell \in \mathbb{Z}} |i - j - \ell (n+1)|$.
Cyclic Representation
Again let $I_{i,k} = \mathbf{1}\{U_{i-1} \text{ and } U_{i+k} \text{ are the smallest of } U_{i-1}, \dots, U_{i+k}\}$, but now for all $i$ and $k$.
◮ With Devroye’s representation, the number of subtrees of size $k < n$ in a binary search tree with $n$ nodes is equal to $\sum_{i=1}^{n-k+1} I_{i,k}$.
◮ With the cyclic representation, the number of subtrees of size $k < n$ is equal to $\sum_{i=1}^{n+1} I_{i,k}$; the $I_{i,k}$ are identically distributed and
$$E(I_{i,k}) = \frac{2}{(k+1)(k+2)},$$
since each of the $\binom{k+2}{2}$ pairs of positions is equally likely to hold the two smallest of the $k+2$ values.
Cyclic Representation
Recall $I_{i,k} = \mathbf{1}\{U_{i-1} \text{ and } U_{i+k} \text{ are the smallest of } U_{i-1}, \dots, U_{i+k}\}$.
◮ The sum $\sum_{i=1}^{n+1} I_{i,k}$ is invariant under a cyclic shift of $U_0, \dots, U_n$. If we shift the values $U_0, U_1, U_2, \dots, U_n$ so that $U_0$ is the smallest, we are back in Devroye’s representation, where one can assume that $U_0 = U_{n+1} = 0$ (since only order relations matter for $I_{i,k}$).
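The shift argument can be tested numerically. The sketch below draws random values $U_0, \dots, U_n$, evaluates the cyclic sum, and compares it with the number of size-$k$ fringe trees in the binary search tree obtained after shifting the smallest value to position 0. The names `cyclic_count` and `subtree_sizes`, the seed, and the choices $n = 12$ and 200 trials are arbitrary assumptions made for the illustration.

```python
import random

def cyclic_count(U, k):
    """Sum of I_{i,k} over i = 1, ..., n+1, indices modulo n+1 = len(U)."""
    m = len(U)
    total = 0
    for i in range(1, m + 1):
        ends = max(U[(i - 1) % m], U[(i + k) % m])
        interior = min(U[(i + j) % m] for j in range(k))
        total += ends < interior
    return total

def subtree_sizes(stamps):
    """Sizes of all fringe trees of the BST whose j-th key has stamp stamps[j]."""
    if not stamps:
        return []
    r = stamps.index(min(stamps))  # earliest stamp is the root
    return ([len(stamps)]
            + subtree_sizes(stamps[:r])
            + subtree_sizes(stamps[r + 1:]))

rng = random.Random(2014)
n, trials = 12, 200
all_match = True
for _ in range(trials):
    U = [rng.random() for _ in range(n + 1)]
    s = U.index(min(U))
    shifted = U[s:] + U[:s]  # now shifted[0] is the smallest value
    for k in range(1, n):
        if cyclic_count(U, k) != subtree_sizes(shifted[1:]).count(k):
            all_match = False
print(all_match)
```

After the shift, position 0 plays the role of both $U_0$ and $U_{n+1}$, so every window that wraps around contains the global minimum in its interior and contributes nothing.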
The Number of Subtrees of Size k
The expected value and variance of $X_{n,k} := \sum_{i=1}^{n+1} I_{i,k}$ are easy to calculate using the cyclic representation.
Lemma 1. Let $1 \le k < n$. For the random binary search tree $T_n$,
$$E(X_{n,k}) = \frac{2(n+1)}{(k+1)(k+2)}$$
and
$$\operatorname{Var}(X_{n,k}) = \begin{cases}
E X_{n,k} - (n+1)\dfrac{22k^2 + 44k + 12}{(k+1)(k+2)^2(2k+1)(2k+3)}, & k < \dfrac{n-1}{2}, \\[2ex]
E X_{n,k} + \dfrac{2}{n} - \dfrac{64}{(n+3)^2}, & k = \dfrac{n-1}{2}, \\[2ex]
E X_{n,k} - (E X_{n,k})^2 = E X_{n,k} - \dfrac{4(n+1)^2}{(k+1)^2(k+2)^2}, & k > \dfrac{n-1}{2}.
\end{cases}$$
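For small $n$ the lemma can be checked by brute force: enumerate all $n!$ insertion orders, count the fringe trees of size $k$ in each, and compare the exact mean and variance with the formulas. The sketch below does this with exact rational arithmetic; all function names are hypothetical, and the values $n = 5, 6$ are chosen only because together they hit all three cases of the variance formula.

```python
from fractions import Fraction
from itertools import permutations

def subtree_sizes(stamps):
    """Sizes of all fringe trees of the BST whose j-th key has stamp stamps[j]."""
    if not stamps:
        return []
    r = stamps.index(min(stamps))  # earliest stamp is the root
    return ([len(stamps)]
            + subtree_sizes(stamps[:r])
            + subtree_sizes(stamps[r + 1:]))

def exact_moments(n, k):
    """Exact mean and variance of X_{n,k} over all n! permutations."""
    values = [subtree_sizes(p).count(k) for p in permutations(range(n))]
    mean = Fraction(sum(values), len(values))
    second = Fraction(sum(v * v for v in values), len(values))
    return mean, second - mean ** 2

def lemma_mean(n, k):
    return Fraction(2 * (n + 1), (k + 1) * (k + 2))

def lemma_var(n, k):
    m = lemma_mean(n, k)
    if 2 * k < n - 1:
        return m - Fraction((n + 1) * (22 * k * k + 44 * k + 12),
                            (k + 1) * (k + 2) ** 2 * (2 * k + 1) * (2 * k + 3))
    if 2 * k == n - 1:
        return m + Fraction(2, n) - Fraction(64, (n + 3) ** 2)
    return m - m * m

checks = {(n, k): exact_moments(n, k) == (lemma_mean(n, k), lemma_var(n, k))
          for n in (5, 6) for k in range(1, n)}
print(all(checks.values()))
```

The third case reflects that for $k > (n-1)/2$ at most one fringe tree of size $k$ fits into the tree, so $X_{n,k}$ is an indicator.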
The Random Recursive Tree
Start with a root with label 0.
The Random Recursive Tree
At stage 1, attach a new node with label 1 to the root.
The Random Recursive Tree
At stage $i$ ($i = 1, \dots, 10$), attach a new node with label $i$ to a node chosen uniformly at random among the previous nodes $0, \dots, i-1$.
The Random Recursive Tree
(Figure: the tree after each of the stages 3, ..., 10; the final tree has the nodes 0, 1, ..., 10.)
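The growth rule translates directly into a short simulation. In this sketch the tree is stored as a parent array, and the fringe tree sizes are computed in one backward pass, which works because every parent has a smaller label than its child. The function names and the seed are arbitrary assumptions.

```python
import random

def random_recursive_tree(n, rng):
    """Return parent[0..n]: parent[0] is None, parent[i] is uniform on 0..i-1."""
    parent = [None]
    for i in range(1, n + 1):
        parent.append(rng.randrange(i))
    return parent

def fringe_sizes(parent):
    """size[v] = number of nodes in the fringe tree rooted at v."""
    size = [1] * len(parent)
    # Children have larger labels than parents, so go from high to low.
    for i in range(len(parent) - 1, 0, -1):
        size[parent[i]] += size[i]
    return size

rng = random.Random(1)
parent = random_recursive_tree(10, rng)
sizes = fringe_sizes(parent)
num_leaves = sizes.count(1)
print(parent, sizes, num_leaves)
```

Counting the entries of `sizes` equal to $k$ gives the number of fringe trees of size $k$ in this tree.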
Total Variation Distance
Definition
◮ Let $(\mathcal{X}, \mathcal{A})$ be any measurable space. The total variation distance $d_{TV}$ between two probability measures $\mu_1$ and $\mu_2$ on $\mathcal{X}$ is defined to be
$$d_{TV}(\mu_1, \mu_2) := \sup_{A \in \mathcal{A}} |\mu_1(A) - \mu_2(A)|.$$
◮ Let $\mathcal{L}(X)$ denote the distribution of a random variable $X$.
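For distributions on a finite set, the supremum in the definition is attained at $A = \{x : \mu_1(x) > \mu_2(x)\}$, which gives the standard identity $d_{TV}(\mu_1, \mu_2) = \tfrac12 \sum_x |\mu_1(x) - \mu_2(x)|$. The sketch below compares this formula with a brute-force supremum over all subsets; the two small distributions are made up for the illustration, and the function names are hypothetical.

```python
from itertools import combinations

def tv_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def tv_by_supremum(p, q):
    """Direct evaluation of sup_A |p(A) - q(A)| over all subsets A."""
    support = sorted(set(p) | set(q))
    best = 0.0
    for r in range(len(support) + 1):
        for A in combinations(support, r):
            pA = sum(p.get(x, 0.0) for x in A)
            qA = sum(q.get(x, 0.0) for x in A)
            best = max(best, abs(pA - qA))
    return best

# Two made-up distributions on {0, 1, 2}.
p = {0: 0.5, 1: 0.5}
q = {0: 0.25, 1: 0.25, 2: 0.5}
print(tv_distance(p, q), tv_by_supremum(p, q))
```

The brute-force version is exponential in the size of the support and is only meant as a check of the definition on tiny examples.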