An Optimal Ancestry Labeling Scheme, Amos Korman and Pierre Fraigniaud (presentation slides)

  1. An Optimal Ancestry Labeling Scheme. Amos Korman (speaker) and Pierre Fraigniaud, CNRS and University Paris Diderot.

  2. Outline
     - Informative labeling scheme
     - Why should we fight for constants?
     - Optimal ancestry-labeling scheme
     - Small universal posets
     - Conclusion

  3. Informative labeling scheme. Graph representations:
     - traditional: the names given to the nodes serve merely as pointers to entries in a data structure;
     - informative labeling: a mechanism for assigning short, yet informative, names to nodes (Kannan, Naor, Rudich [STOC '88]).
     General objective: assign labels to the nodes in such a way that information regarding any two nodes can be inferred directly from their labels.
     Main quality measure: label size, i.e., the number of bits used to form the labels.

  5. Example 1: adjacency in trees. Input: a tree $T$.
     1. Give distinct IDs between 1 and $n$ to the nodes.
     2. Root $T$ at an arbitrary vertex.
     Label: $L(u) = (\mathrm{ID}(u), \mathrm{ID}(\mathrm{parent}(u)))$.
     Decoding: $u$ and $v$ are adjacent $\iff u = \mathrm{parent}(v)$ or $v = \mathrm{parent}(u)$.
     Label size: $2\lceil \log_2 n \rceil$ bits.
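
A minimal Python sketch of the scheme above, for illustration only. Two choices are assumptions of this sketch, not of the slide: node IDs run over 0..n-1 (so each field fits in $\lceil \log_2 n \rceil$ bits), and the root is recorded as its own parent. The function names are hypothetical.

```python
import math
from itertools import combinations

def adjacency_labels(parent):
    """Encoder: L(u) = (ID(u), ID(parent(u))) for every node u.

    `parent` maps each node ID in 0..n-1 to its parent's ID. Assumption of this
    sketch: the root maps to itself, so every field stays in 0..n-1.
    """
    return {u: (u, p) for u, p in parent.items()}

def adjacent(lu, lv):
    """Decoder for distinct u, v: adjacent iff u = parent(v) or v = parent(u)."""
    return lv[1] == lu[0] or lu[1] == lv[0]

def label_size_bits(n):
    """Two fields of ceil(log2 n) bits each (assumes n >= 2)."""
    return 2 * math.ceil(math.log2(n))

# Tree rooted at 0 with edges 0-1, 0-2, 1-3, 1-4.
parent = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
labels = adjacency_labels(parent)
edges = {frozenset((u, p)) for u, p in parent.items() if u != p}
for u, v in combinations(parent, 2):
    # The decoder sees only the two labels, never the tree itself.
    assert adjacent(labels[u], labels[v]) == (frozenset((u, v)) in edges)
print(label_size_bits(len(parent)))  # 6
```

The self-pointing root keeps the label format uniform; the decoder is only meant for pairs of distinct nodes.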

  6. Informative labeling scheme. Let $P$ be a Boolean predicate defined on pairs of vertices of the graphs in a family $\mathcal{F}$.
     - Encoder (or marker) $M$: given $G \in \mathcal{F}$, $M(G) = L$, where $L : V(G) \to \{0,1\}^*$.
     - Decoder $D : \{0,1\}^* \times \{0,1\}^* \to \{\mathit{true}, \mathit{false}\}$.
     Requirement: for any $G \in \mathcal{F}$ and any $(u,v) \in V(G) \times V(G)$, $P(u,v) = \mathit{true} \iff D(L(u), L(v)) = \mathit{true}$.
     This can be generalized to various types of functions (distance, connectivity, etc.) or tasks (e.g., routing).
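
The definition can be exercised on a toy case. The Python sketch below uses a hypothetical connectivity scheme for forests, one of the generalizations mentioned above. The encoder gives every node the binary index of its connected component, the decoder compares the two labels, and `check_scheme` tests the requirement $P(u,v) \iff D(L(u),L(v))$ on every ordered pair. The names `encoder`, `decoder` and `check_scheme` are illustrative, not taken from the slides.

```python
from itertools import product
from math import ceil, log2

def encoder(components):
    """Marker M (hypothetical example): the forest is given as a list of its
    connected components; every node receives the binary index of its component."""
    width = max(1, ceil(log2(len(components))))
    return {v: format(i, f"0{width}b") for i, comp in enumerate(components) for v in comp}

def decoder(lu, lv):
    """Decoder D: u and v are connected iff their labels are equal."""
    return lu == lv

def check_scheme(nodes, predicate, labels, decode):
    """Correctness condition: P(u, v) == D(L(u), L(v)) for every ordered pair."""
    return all(predicate(u, v) == decode(labels[u], labels[v])
               for u, v in product(nodes, repeat=2))

components = [["a", "b", "c"], ["d"], ["e", "f"]]
component_of = {v: i for i, comp in enumerate(components) for v in comp}
labels = encoder(components)

def same_component(u, v):
    """The predicate P, computed from the whole forest."""
    return component_of[u] == component_of[v]

print(labels["a"], labels["d"], labels["e"])  # 00 01 10
print(check_scheme(list(component_of), same_component, labels, decoder))  # True
```

Here the label size is $\lceil \log_2(\text{number of components}) \rceil$ bits, which shows how the quality measure depends only on the encoder's choices.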

  9. Adjacency in trees. Definition: a graph $U$ is universal for a graph family $\mathcal{F}$ if every $G \in \mathcal{F}$ is isomorphic to an induced subgraph of $U$.
     Theorem (Kannan, Naor, Rudich [STOC '88]): there exists an adjacency labeling scheme for $\mathcal{F}$ with labels of at most $k$ bits if and only if there exists a universal graph for $\mathcal{F}$ of order at most $2^k$.
     Adjacency in trees, state of the art:
     - $2 \log n$ (Kannan, Naor, and Rudich [STOC '88]);
     - $\log n + O(\log^* n)$ (Alstrup and Rauhe [FOCS '02]), which yields a universal graph of order $n \cdot 2^{O(\log^* n)}$.
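
One direction of the theorem can be made concrete with a small Python sketch: take every possible label as a vertex, and join two distinct labels whenever the decoder says "adjacent". Any tree then maps into this graph through its own labels. The sketch reuses the $2\lceil \log_2 n \rceil$-bit tree scheme of slide 5 with IDs 0..n-1 and the root as its own parent (assumptions of this sketch); it only checks the "labels give a universal graph" direction, and the function names are hypothetical.

```python
from itertools import combinations, product

def adjacent(lu, lv):
    """Decoder of the tree adjacency scheme; labels are (ID(u), ID(parent(u)))."""
    return lv[1] == lu[0] or lu[1] == lv[0]

def universal_graph(n):
    """Vertices: every possible label (a, b) with a, b in 0..n-1.
    Edges: pairs of distinct labels that the decoder declares adjacent."""
    vertices = list(product(range(n), repeat=2))
    edges = {frozenset((x, y)) for x, y in combinations(vertices, 2) if adjacent(x, y)}
    return vertices, edges

def embeds_as_induced(parent, edges):
    """phi(u) = L(u) is injective (the first fields are distinct IDs); check that
    the tree edges are exactly the edges of U between the images."""
    labels = {u: (u, p) for u, p in parent.items()}  # the root maps to itself
    tree_edges = {frozenset((u, p)) for u, p in parent.items() if u != p}
    return all((frozenset((u, v)) in tree_edges) == (frozenset((labels[u], labels[v])) in edges)
               for u, v in combinations(parent, 2))

vertices, edges = universal_graph(4)
print(len(vertices))  # 16 = 2**k with k = 2 * ceil(log2 4) = 4 bits
path = {0: 0, 1: 0, 2: 1, 3: 2}
star = {0: 0, 1: 0, 2: 0, 3: 0}
print(embeds_as_induced(path, edges), embeds_as_induced(star, edges))  # True True
```

The order of $U$ is exactly the number of distinct labels, which is why shaving bits off the label size directly shrinks the universal graph.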

  11. Example 2: ancestry in trees. Input: a rooted tree.
      Give distinct DFS numbers between 1 and $n$ to the nodes.
      Label: $L(u) = (\mathrm{DFS}(u), \mathrm{DFS}(u_{\max}))$, where $u_{\max}$ is the node with the largest DFS number in the subtree rooted at $u$.
      Decoding: $u$ is an ancestor of $v \iff \mathrm{DFS}(v) \in [\mathrm{DFS}(u), \mathrm{DFS}(u_{\max})]$.
      Label size: $2\lceil \log_2 n \rceil$ bits.
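
A minimal Python sketch of the DFS-interval labels, assuming the tree is given as a hypothetical `children` dictionary and DFS numbers are assigned in preorder starting at 1. As in the decoding rule above, a node counts as an ancestor of itself.

```python
def ancestry_labels(children, root):
    """Encoder: L(u) = (DFS(u), DFS(u_max)), with DFS numbers 1..n in preorder."""
    labels = {}
    counter = 0

    def visit(u):
        nonlocal counter
        counter += 1
        first = counter
        for c in children.get(u, []):
            visit(c)
        # Once the subtree of u is numbered, counter equals DFS(u_max).
        labels[u] = (first, counter)

    visit(root)  # recursive for clarity; very deep trees would need an explicit stack
    return labels

def is_ancestor(lu, lv):
    """Decoder: u is an ancestor of v (u itself included) iff DFS(v) is in [DFS(u), DFS(u_max)]."""
    return lu[0] <= lv[0] <= lu[1]

children = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
L = ancestry_labels(children, "r")
print(L["a"], L["e"])                # (2, 4) (6, 6)
print(is_ancestor(L["a"], L["d"]))   # True
print(is_ancestor(L["a"], L["e"]))   # False
```

Each subtree occupies a contiguous range of DFS numbers, so ancestry reduces to a range test on two integers per label.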

  12. XML trees. Example document:
          <art>
            <book>
              <Sutter's Gold>
                <author> Blaise Cendrars </author>
                <Release> 1925 </Release>
              </Sutter's Gold>
            </book>
            <movie>
              <Citizen Kane>
                <direct> Orson Welles </direct>
                <Release> 1941 </Release>
              </Citizen Kane>
              <Once Upon a Time in the West>
                <direct> Sergio Leone </direct>
                <Release> 1968 </Release>
              </Once Upon a Time in the West>
            </movie>
          </art>
      [Figure: the corresponding tree: art; book and movie; the three titles; their author, director and Release nodes, each Release holding a date.]
      - Queries are answered using the index labels only, without accessing the actual documents.
      - A small improvement in the label size yields a significant improvement in the performance of XML search engines.

  13. State of the art: ancestry in trees.
      - $2 \log n$ (Kannan, Naor, and Rudich [STOC '88]);
      - $\frac{3}{2} \log n + O(\log \log n)$ (Abiteboul, Kaplan, and Milo [SODA '01]);
      - $\log n + O(\log n / \log \log n)$ (Thorup and Zwick [SPAA '01]);
      - $\log n + O(\sqrt{\log n})$ (Alstrup and Rauhe [SODA '02]);
      - lower bound $\log n + \Omega(\log \log n)$ (Alstrup, Bille, and Rauhe [SODA '03]);
      - $\log n + 2 \log(\mathrm{depth}) + O(1)$ (Fraigniaud and Korman [SODA '10]);
      - $\log n + O(\log \log n)$ (Fraigniaud and Korman [STOC '10]).

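  15. Interval containment: $v$ is an ancestor of $u \iff I(u) \subseteq I(v)$.
      The $2 \log n$ scheme of Kannan, Naor, and Rudich uses $n^2$ intervals. We aim at using $n \log^c n$ intervals, so that naming one interval of the family takes only $\log n + c \log \log n$ bits.
      We use intervals of the following form, for $k = 1, \dots, \log n$: $I(k,a,b) = [a \cdot x_k, (a+b) \cdot x_k]$, whose endpoints lie on the level-$k$ grid $x_k, 2x_k, \dots$ within $[1, N]$.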
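
The bit counts above are plain arithmetic: a label that names one interval out of a fixed family needs $\lceil \log_2(\text{family size}) \rceil$ bits. The Python sketch below evaluates this for $n = 2^{20}$ and, as an assumption for illustration, $c = 2$; it also builds two grid intervals $I(k,a,b)$ with a made-up step $x_k = 8$ and tests containment. All names are hypothetical.

```python
import math

def bits_to_name(num_intervals):
    """Bits needed to name one interval out of a fixed family of that size."""
    return math.ceil(math.log2(num_intervals))

def level_k_interval(a, b, x_k):
    """I(k, a, b) = [a * x_k, (a + b) * x_k], endpoints on the level-k grid."""
    return (a * x_k, (a + b) * x_k)

def contains(outer, inner):
    """Interval containment, as used for ancestry: I(u) is inside I(v)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

n = 2 ** 20
print(bits_to_name(n * n))                       # 40, i.e. 2 log n
print(bits_to_name(n * int(math.log2(n)) ** 2))  # 29, vs log n + 2 log log n ≈ 28.6
print(contains(level_k_interval(2, 5, 8), level_k_interval(3, 1, 8)))  # True
```

Going from $n^2$ to $n \log^2 n$ intervals saves almost $\log n$ bits per label, which is the whole point of restricting endpoints to coarse grids.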

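  16. Spine decomposition. [Figure: a tree with spine $v_1, \dots, v_i, \dots, v_s$ and the forests $F_1, \dots, F_i, \dots, F_s$ hanging from the spine nodes.] Nodes are classified as either heavy or apex.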
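
The slide gives no construction, so the Python sketch below assumes a heavy-path-style reading: each spine starts at an apex node (the root or a non-heavy child) and repeatedly continues to the child with the largest subtree (the first one on ties), which is marked heavy. The spine-decomposition depth is then taken as the maximum number of spines met on a root-to-node path. The function name and the input format (a `parent` dictionary with the root mapped to `None`) are hypothetical.

```python
from collections import defaultdict

def spine_decomposition(parent):
    """Heavy-path-style spine decomposition (an assumed reading of the slide).

    Returns (kind, depth): kind[v] is "heavy" or "apex", and depth is the
    maximum number of spines met on a root-to-node path.
    """
    children = defaultdict(list)
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)

    # Preorder: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Subtree sizes, computed children before parents.
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children[v])

    # Heavy child of p: the child with the largest subtree (first one on ties).
    heavy = {p: max(cs, key=size.__getitem__) for p, cs in children.items() if cs}

    kind, spines = {}, {}
    for v in order:
        p = parent[v]
        if p is not None and heavy.get(p) == v:
            kind[v], spines[v] = "heavy", spines[p]           # continues p's spine
        else:
            kind[v], spines[v] = "apex", spines.get(p, 0) + 1  # starts a new spine
    return kind, max(spines.values())

tree = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}  # complete binary tree, 7 nodes
kind, d = spine_decomposition(tree)
print(d)  # 3
```

Under this reading, an apex child has fewer than half the nodes of its parent's subtree, so the depth is at most $\lfloor \log_2 n \rfloor + 1$.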

  18. Trees with bounded spine-decomposition depth $d = O(1)$.
      $\mathcal{F}(n,d)$ = forests with at most $n$ nodes and spine-decomposition depth at most $d$. We aim at using $n d^2$ intervals for every $F \in \mathcal{F}(n,d)$.
      Induction on $k = \log n$. Difficult case: $F$ contains a tree $T$ of size larger than $2^k$, i.e., $2^k < |T| \le 2^{k+1}$.

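  19. General idea. [Figure: the intervals $I(v_1), \dots, I(v_s)$ of the spine nodes lie in a bin $J$ at level $k+1$ (sub-bins $J_1, \dots, J_s$, grid step $x_{k+1}$); the forests get level-$k$ intervals $I(F_1), \dots, I(F_s)$ of lengths $c_k|F_1|, \dots, c_k|F_s|$ (grid step $x_k$), forming $I(\bigcup_{i=1}^{s} F_i)$.]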

  20. Tuning of the parameters (1/3). [Figure: same layout as in slide 19.]
      For $1 \le i < s$, the length of $I(v_i)$ must satisfy
      $|I(v_i)| \approx c_k |F_i| + x_{k+1} + |I(v_{i+1})| \approx c_k \sum_{j=i}^{s} |F_j| + (s-i+1) \cdot x_{k+1}$.
      Since $\sum_{j} |F_j| < |T| \le 2^{k+1}$, taking the bin $J$ to be of length $|J| \approx c_k \cdot 2^{k+1} + (s+1) \cdot x_{k+1}$ suffices.

  21. Tuning of the parameters (2/3). Since $s \le d$, $|J|$ must be approximately
      $c_{k+1} 2^{k+1} \approx c_k 2^{k+1} + d \cdot x_{k+1}$.
      Choose the values of $c_k$ so that $c_{k+1} - c_k \approx \frac{d \cdot x_{k+1}}{2^{k+1}}$.
      We set $c_k \approx \sum_{j=1}^{k} \frac{1}{j^{1+\epsilon}}$, and thus $x_k \approx \frac{2^k}{d \cdot k^{1+\epsilon}}$.

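  22. Tuning of the parameters (3/3). Let $A_k \approx N / x_k$ and $B_k \approx c_k 2^k / x_k$. The level-$k$ intervals are $I(k,a,b) = [a \cdot x_k, (a+b) \cdot x_k]$ on the grid $x_k, 2x_k, \dots$ within $[1, N]$, where $1 \le a \le A_k$ and $1 \le b \le B_k$.
      Thus $N \approx c_{\log n} \cdot n = O(n)$, since $\sum_j j^{-(1+\epsilon)}$ converges for $\epsilon > 0$.
      The number of level-$k$ intervals is $O(A_k \cdot B_k) = O(N c_k 2^k / x_k^2) = O(n d^2 k^{2(1+\epsilon)} / 2^k)$, yielding a total of $O(n d^2)$ intervals, as desired, because $\sum_k k^{2(1+\epsilon)} / 2^k$ converges.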
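
The asymptotic claims of slides 21 and 22 can be sanity-checked numerically. The Python sketch below drops all constants, assumes $\epsilon = 0.5$ by default, and simply evaluates $c_k$, $x_k$, $A_k$, $B_k$ as written; `interval_budget` is a hypothetical name. If the analysis is right, $N/n$ and the total count divided by $n d^2$ should stay bounded as $n$ grows.

```python
import math

def interval_budget(n, d, eps=0.5):
    """Evaluate the parameter choices of slides 21-22 numerically, constants dropped.

    Returns (N, total), where total sums A_k * B_k over the levels k = 1..ceil(log2 n).
    """
    levels = max(1, math.ceil(math.log2(n)))
    c = [0.0] * (levels + 1)
    for k in range(1, levels + 1):
        c[k] = c[k - 1] + 1 / k ** (1 + eps)    # c_k ≈ sum_{j <= k} 1 / j^(1+eps)
    N = c[levels] * n                           # N ≈ c_{log n} * n
    total = 0.0
    for k in range(1, levels + 1):
        x_k = 2 ** k / (d * k ** (1 + eps))     # grid step at level k
        A_k = N / x_k                           # choices for the left endpoint index a
        B_k = c[k] * 2 ** k / x_k               # choices for the length index b
        total += A_k * B_k
    return N, total

for n in (2 ** 10, 2 ** 20, 2 ** 30):
    N, total = interval_budget(n, d=3)
    print(n, round(N / n, 3), round(total / (n * 3 ** 2), 1))
```

With $\epsilon = 0.5$, $c_k$ stays below $\zeta(1.5) \approx 2.61$, so both printed ratios level off; a smaller $\epsilon$ gives larger constants but the same shape.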

  23. The general case uses the folding decomposition. [Figure: (a) a tree with spine $v_1, \dots, v_s$ and forests $F_1, \dots, F_s$; (b) the folded tree.]

  24. Ancestry preservation. Use a DFS traversal of $T$ that visits apex children first; for any node $u$, let $\mathrm{DFS}(u)$ be the DFS number of $u$.
      Lemma: node $v$ is an ancestor of $u$ in $T$ if and only if at least one of the following two conditions holds:
      - C1: $v$ is an ancestor of $u$ in $T^*$;
      - C2: $\mathrm{APEX}(v)$ is an ancestor of $u$ in $T^*$ and $\mathrm{DFS}(v) < \mathrm{DFS}(u)$.
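
The slides do not spell out how $T^*$ is built, so the Python sketch below assumes one reading and brute-force checks the lemma under it. Assumptions: spines are heavy paths (largest-subtree child, first one on ties); in $T^*$ every heavy node is re-attached directly to the apex at the top of its spine, while apex nodes keep their parent; $\mathrm{APEX}(v)$ is the top of $v$'s spine; "ancestor" includes the node itself. All function names are hypothetical.

```python
import random
from collections import defaultdict

def fold(parent):
    """Assumed folding of T. Returns (apex, star_parent, dfs)."""
    children = defaultdict(list)
    root = next(v for v, p in parent.items() if p is None)
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    size = {}
    def subtree_size(v):
        size[v] = 1 + sum(subtree_size(c) for c in children[v])
        return size[v]
    subtree_size(root)
    heavy = {v: max(cs, key=size.__getitem__) for v, cs in children.items() if cs}

    # DFS of T that visits apex (light) children first and the heavy child last.
    dfs, order, stack = {}, [], [root]
    while stack:
        v = stack.pop()
        dfs[v] = len(order)
        order.append(v)
        h = heavy.get(v)
        if h is not None:
            stack.append(h)  # pushed first, hence popped last
        stack.extend(c for c in children[v] if c != h)

    apex, star_parent = {}, {}
    for v in order:  # preorder: parents are handled before their children
        p = parent[v]
        if p is not None and heavy.get(p) == v:
            apex[v] = star_parent[v] = apex[p]  # heavy node hangs from its spine's apex
        else:
            apex[v], star_parent[v] = v, p      # apex node keeps its parent
    return apex, star_parent, dfs

def is_ancestor(par, a, u):
    """Ancestor-or-self test by walking up the parent pointers from u."""
    while u is not None:
        if u == a:
            return True
        u = par[u]
    return False

def lemma_holds(parent):
    apex, star_parent, dfs = fold(parent)
    for v in parent:
        for u in parent:
            c1 = is_ancestor(star_parent, v, u)
            c2 = is_ancestor(star_parent, apex[v], u) and dfs[v] < dfs[u]
            if is_ancestor(parent, v, u) != (c1 or c2):
                return False
    return True

rng = random.Random(7)
trees = [{0: None, **{v: rng.randrange(v) for v in range(1, n)}} for n in range(1, 40)]
print(all(lemma_holds(t) for t in trees))  # True
```

Visiting the heavy child last is what makes C2 work in this sketch: every node hanging off a spine node above $v$ receives a smaller DFS number than $v$.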

  25. Ordering the intervals. Lemma: node $v$ is an ancestor of $u$ in $T$ if and only if at least one of the following two conditions holds:
      - C1: $v$ is an ancestor of $u$ in $T^*$;
      - C2': $\mathrm{APEX}(v)$ is an ancestor of $u$ in $T^*$ and $I(v) \prec I(u)$.
      [Figure: intervals $I(v_1), I(v_2)$ at level $k+1$ and $I(F_1), I(F_2)$ at level $k$.]
      Label: $\mathrm{label}(u) = (I(u), I(\mathrm{APEX}(u)))$.

  26. Compact encoding of $I(\mathrm{APEX}(v))$. It is sufficient to encode:
      - its level $k'$;
      - two shifts $b'_{\mathrm{left}}$ and $b'_{\mathrm{right}}$ in $[1, B_{k'}]$.
      [Figure: $I(k',a',b') = [a' \cdot x_{k'}, (a'+b') \cdot x_{k'}]$ on the level-$k'$ grid, drawn above $I(v)$ on the level-$k$ grid.]

  28. Graph arboricity. The arboricity of a graph is the minimum number of forests into which its edges can be partitioned.
      Corollary (Kannan, Naor, Rudich [STOC '88]): there exists an adjacency labeling scheme for the family of graphs with arboricity at most $k$ with labels of at most $(k+1) \log n$ bits.
      High-level correspondence between adjacency/arboricity for graphs and ancestry/tree-dimension for posets.
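
One way to obtain the $(k+1) \log n$ bound is to extend the tree scheme of slide 5: root every forest of the partition and let each label store the node's own ID followed by its parent in each of the $k$ forests. The Python sketch below assumes the partition into rooted forests is already given (as hypothetical `child -> parent` dictionaries) and that roots or absent nodes point to themselves; it is an illustration, not the paper's construction.

```python
import math
from itertools import combinations

def arboricity_labels(n, forests):
    """Label of u: (ID(u), parent of u in forest 1, ..., parent of u in forest k).

    Each forest is a dict child -> parent over node IDs 0..n-1 (hypothetical input
    format). A node that is a root, or absent from a forest, points to itself so
    that every field fits in ceil(log2 n) bits."""
    return {u: (u,) + tuple(f.get(u, u) for f in forests) for u in range(n)}

def adjacent(lu, lv):
    """Distinct u, v are adjacent iff one is the parent of the other in some forest."""
    return lu[0] in lv[1:] or lv[0] in lu[1:]

def label_bits(n, k):
    return (k + 1) * math.ceil(math.log2(n))

# 5-cycle 0-1-2-3-4-0 (arboricity 2): a path plus the single edge 4-0.
forests = [{1: 0, 2: 1, 3: 2, 4: 3}, {4: 0}]
labels = arboricity_labels(5, forests)
cycle = {frozenset((i, (i + 1) % 5)) for i in range(5)}
print(all(adjacent(labels[u], labels[v]) == (frozenset((u, v)) in cycle)
          for u, v in combinations(range(5), 2)))  # True
print(label_bits(5, 2))  # 9
```

Every edge lies in exactly one forest, where one endpoint is the parent of the other, so storing $k$ parent pointers per node is enough.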

  29. Partially ordered sets. A poset $(X, \le)$ satisfies:
      - reflexivity: $x \le x$;
      - antisymmetry: $(x \le y \text{ and } y \le x) \Rightarrow x = y$;
      - transitivity: $(x \le y \text{ and } y \le z) \Rightarrow x \le z$.
      $(X, \le')$ is an extension of $(X, \le)$ if $\forall x, y \in X$: $x \le y \Rightarrow x \le' y$.
      The dimension of a poset $(X, \le)$ is the smallest number of linear (i.e., total order) extensions of $(X, \le)$ whose intersection gives rise to $(X, \le)$.
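
The intersection of linear extensions is easy to compute: $x \le y$ holds exactly when $x$ comes no later than $y$ in every extension. The Python sketch below builds a poset from two linear orders this way and checks the three axioms. The example orders and all function names are illustrative assumptions.

```python
from itertools import product

def order_from_extensions(extensions):
    """Intersection of linear extensions: x <= y iff x comes no later than y in
    every extension. Each extension is a list containing every element once."""
    position = [{x: i for i, x in enumerate(ext)} for ext in extensions]
    elements = extensions[0]
    return {(x, y) for x, y in product(elements, repeat=2)
            if all(pos[x] <= pos[y] for pos in position)}

def is_poset(elements, leq):
    """Check reflexivity, antisymmetry and transitivity of a relation given as pairs."""
    reflexive = all((x, x) in leq for x in elements)
    antisymmetric = all(x == y for (x, y) in leq if (y, x) in leq)
    transitive = all((x, z) in leq for (x, y) in leq for (w, z) in leq if y == w)
    return reflexive and antisymmetric and transitive

# Two linear extensions realizing a 2-dimensional poset: a and b below c and d;
# a, b incomparable and c, d incomparable.
leq = order_from_extensions([["a", "b", "c", "d"], ["b", "a", "d", "c"]])
print(sorted(p for p in leq if p[0] != p[1]))
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
print(is_poset("abcd", leq))  # True
```

The comparison only needs each element's position in every extension, which is why low dimension naturally yields short labels.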

  30. Universal posets. A poset $(X, \le_X)$ contains a poset $(Y, \le_Y)$ as an induced suborder if there exists an injective mapping $\varphi : Y \to X$ such that for any two elements $a, b \in Y$: $a \le_Y b \iff \varphi(a) \le_X \varphi(b)$.
      Definition: a poset $(U, \le)$ is called universal for a family of posets $\mathcal{F}$ if $(U, \le)$ contains every poset in $\mathcal{F}$ as an induced suborder.
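
The induced-suborder condition can be checked directly on small cases. The Python sketch below tests a candidate mapping $\varphi$ and searches all injective maps by brute force, which is exponential and only usable on tiny posets. The host poset (subsets of {1, 2} under inclusion) and all names are illustrative assumptions.

```python
from itertools import permutations

def is_induced_suborder(Y, leq_Y, leq_X, phi):
    """phi is injective and a <=_Y b  <=>  phi(a) <=_X phi(b) for all a, b in Y."""
    if len(set(phi.values())) != len(Y):
        return False
    return all(((a, b) in leq_Y) == ((phi[a], phi[b]) in leq_X) for a in Y for b in Y)

def find_embedding(Y, leq_Y, X, leq_X):
    """Brute-force search over injective maps Y -> X (tiny posets only)."""
    for image in permutations(X, len(Y)):
        phi = dict(zip(Y, image))
        if is_induced_suborder(Y, leq_Y, leq_X, phi):
            return phi
    return None

def antichain(names):
    """Only the reflexive pairs: no two distinct elements are comparable."""
    return {(a, a) for a in names}

# Host poset X: subsets of {1, 2} ordered by inclusion.
X = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
leq_X = {(s, t) for s in X for t in X if s <= t}
print(find_embedding("pq", antichain("pq"), X, leq_X) is not None)    # True
print(find_embedding("pqr", antichain("pqr"), X, leq_X) is not None)  # False
```

A universal poset for a family must pass this test for every member, so its size is bounded below by the largest antichain and chain it has to host.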
