An Optimal Ancestry Labeling Scheme

Amos Korman (speaker) and Pierre Fraigniaud
CNRS and University Paris Diderot

Outline
◮ Informative Labeling Scheme
◮ Why should we fight for constants?
◮ Optimal ancestry-labeling scheme
◮ Small universal posets
◮ Conclusion

Informative Labeling Scheme
Graph representations:
◮ traditional: names given to the nodes serve merely as pointers to entries in a data structure
◮ informative labeling: a mechanism for assigning short, yet informative, names to nodes (Kannan, Naor, Rudich [STOC ’88])
General objective: assign labels to nodes so that information about any two nodes can be inferred directly from their labels.
Main quality measure: label size = number of bits used to form the labels.

Example 1: adjacency in trees
Input: tree T
1. Give distinct IDs to the nodes, between 1 and n
2. Root T at an arbitrary vertex
$L(u) = (\mathrm{ID}(u), \mathrm{ID}(\mathrm{parent}(u)))$
u and v are adjacent $\iff$ u = parent(v) or v = parent(u)
Label size = $2\lceil \log_2 n \rceil$ bits
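The two steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the names `label_tree`, `tree_adjacent`, and `label_bits` are hypothetical, and the convention that the root stores its own ID in the parent field is an assumption (the slide does not say what the root stores).

```python
def label_tree(parent):
    """Build L(u) = (ID(u), ID(parent(u))) for every node.

    `parent` maps each node ID in 1..n to its parent's ID.
    Assumption: the root maps to itself.
    """
    return {u: (u, p) for u, p in parent.items()}


def tree_adjacent(label_u, label_v):
    """Decide adjacency from the two labels alone."""
    id_u, par_u = label_u
    id_v, par_v = label_v
    if id_u == id_v:
        return False  # a node is not adjacent to itself
    return par_u == id_v or par_v == id_u


def label_bits(n):
    """Label size 2 * ceil(log2(n)) bits, computed with integer arithmetic."""
    return 2 * (n - 1).bit_length()


# A small tree rooted at 1: edges 1-2, 1-3, 2-4.
parent = {1: 1, 2: 1, 3: 1, 4: 2}
labels = label_tree(parent)
```

The decoder never looks at the tree itself: it receives two labels and compares ID fields, which is the defining property of an informative labeling scheme.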

Informative Labeling Scheme
Let P be a boolean predicate defined on pairs of vertices for graphs in a family $\mathcal{F}$.
Encoder (or marker) M: given $G \in \mathcal{F}$, $M(G) = L$ where $L : V(G) \to \{0,1\}^*$
Decoder D: $D : \{0,1\}^* \times \{0,1\}^* \to \{\text{true}, \text{false}\}$
For any $G \in \mathcal{F}$ and any $(u,v) \in V(G) \times V(G)$: $P(u,v) = \text{true} \iff D(L(u), L(v)) = \text{true}$
Can be generalized to various types of functions (distance, connectivity, etc.), or tasks (e.g., routing).

Adjacency in trees
Definition: A graph U is universal for a graph family $\mathcal{F}$ if every $G \in \mathcal{F}$ is isomorphic to an induced subgraph of U.
Theorem (Kannan, Naor, Rudich [STOC ’88]): There exists an adjacency labeling scheme for $\mathcal{F}$ with labels of at most k bits if and only if there exists a universal graph for $\mathcal{F}$ of order at most $2^k$.
Adjacency: state of the art
◮ $2 \log n$ (Kannan, Naor, and Rudich [STOC ’88])
◮ $\log n + O(\log^* n)$ (Alstrup and Rauhe [FOCS ’02]) ⇒ universal graph of order $n \, 2^{O(\log^* n)}$
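One direction of the theorem is constructive: take every k-bit string as a vertex and connect two strings whenever the decoder accepts them. The sketch below illustrates this under stated assumptions: `universal_graph` and `tree_decoder` are hypothetical names, labels are padded to exactly k bits, and the toy decoder handles trees on at most 4 nodes with 2-bit IDs (the root storing its own ID as parent).

```python
from itertools import combinations, product


def universal_graph(decoder, k):
    """Vertices: all k-bit strings. Edges: pairs of strings the decoder accepts."""
    vertices = ["".join(bits) for bits in product("01", repeat=k)]
    edges = {
        frozenset((a, b))
        for a, b in combinations(vertices, 2)
        if decoder(a, b)
    }
    return vertices, edges


def tree_decoder(label_a, label_b):
    """Toy decoder (assumption): label = 2-bit ID followed by 2-bit parent ID."""
    id_a, par_a = label_a[:2], label_a[2:]
    id_b, par_b = label_b[:2], label_b[2:]
    return id_a != id_b and (par_a == id_b or par_b == id_a)


vertices, edges = universal_graph(tree_decoder, k=4)

# The path 0 - 1 - 2 - 3 rooted at node 0, written as (ID, parent ID) bit strings.
path_labels = ["0000", "0100", "1001", "1110"]
```

The labels of any tree handled by the scheme pick out vertices of this graph, and the edges among them are exactly the tree's edges, so the tree appears as an induced subgraph.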

Example 2: ancestry in trees
Input: rooted tree
Give distinct DFS numbers to the nodes, between 1 and n
$L(u) = (\mathrm{DFS}(u), \mathrm{DFS}(u_{\max}))$, where $u_{\max}$ is the node with the largest DFS number in the subtree rooted at u.
u is an ancestor of v $\iff \mathrm{DFS}(v) \in [\mathrm{DFS}(u), \mathrm{DFS}(u_{\max})]$
Label size = $2\lceil \log_2 n \rceil$ bits
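A minimal Python sketch of this DFS-interval scheme follows. The function names and the `children` dictionary format are assumptions made for the illustration; the traversal is iterative so that deep trees do not hit Python's recursion limit.

```python
def label_rooted_tree(children, root):
    """Return L(u) = (DFS(u), DFS(u_max)) for every node of a rooted tree.

    `children` maps a node to the list of its children (leaves may be omitted).
    """
    labels = {}
    start = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # Every descendant has been numbered, so `counter` is DFS(u_max).
            labels[node] = (start[node], counter)
            continue
        counter += 1
        start[node] = counter
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(label_u, label_v):
    """True iff DFS(v) lies in [DFS(u), DFS(u_max)]."""
    low, high = label_u
    return low <= label_v[0] <= high


children = {"r": ["a", "b"], "a": ["c"]}
labels = label_rooted_tree(children, "r")
```

With this test a node counts as its own ancestor, since DFS(u) always lies in its own interval.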

XML trees
<art>
  <book>
    <Sutter’s Gold>
      <author> Blaise Cendrars </author>
      <Release> 1925 </Release>
    </Sutter’s Gold>
  </book>
  <movie>
    <Citizen Kane>
      <direct> Orson Welles </direct>
      <Release> 1941 </Release>
    </Citizen Kane>
    <Once Upon a Time in the West>
      <direct> Sergio Leone </direct>
      <Release> 1968 </Release>
    </Once Upon a Time in the West>
  </movie>
</art>
[Figure: the corresponding tree. The root art has children book and movie; book has child Sutter's Gold; movie has children Citizen Kane and Once Upon a Time in the West; below them are author, director, and Release nodes, with the dates as leaves.]
• Answer queries using the index labels only, without accessing the actual documents.
• A small improvement in the label size ⇒ significant improvement in the performance of XML search engines.

State of the art: ancestry in trees
◮ $2 \log n$ (Kannan, Naor, and Rudich [STOC ’88])
◮ $\frac{3}{2} \log n + O(\log \log n)$ (Abiteboul, Kaplan, and Milo [SODA ’01])
◮ $\log n + O(\log n / \log \log n)$ (Thorup and Zwick [SPAA ’01])
◮ $\log n + O(\sqrt{\log n})$ (Alstrup and Rauhe [SODA ’02])
◮ $\log n + \Omega(\log \log n)$ (Alstrup, Bille, and Rauhe [SODA ’03])
◮ $\log n + 2 \log(\mathrm{depth}) + O(1)$ (Fraigniaud and Korman [SODA ’10])
◮ $\log n + O(\log \log n)$ (Fraigniaud and Korman [STOC ’10])

Interval containment
v is an ancestor of u $\iff I(u) \subseteq I(v)$
The $2 \log n$ scheme of Kannan, Naor, and Rudich uses $n^2$ intervals. We aim at using $n \log^c n$ intervals.
We use intervals of the following form, for $k = 1, \dots, \log n$:
[Figure: the line from 1 to N with level-k marks at $x_k, 2x_k, \dots$; the interval $I(k,a,b)$ extends from $x_k a$ to $x_k (a+b)$.]
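To make the interval family concrete, the sketch below represents an interval by its triple (k, a, b) and tests containment by comparing endpoints. It is an illustration only: the granularities in `x` are made-up values, the helper names are hypothetical, and the slide does not say whether endpoints are open or closed (the comparison below works either way as long as both intervals use the same convention).

```python
def endpoints(k, a, b, x):
    """Endpoints of I(k, a, b): from x_k * a to x_k * (a + b)."""
    return x[k] * a, x[k] * (a + b)


def contains(outer, inner, x):
    """True iff the interval `inner` lies inside the interval `outer`."""
    outer_low, outer_high = endpoints(*outer, x)
    inner_low, inner_high = endpoints(*inner, x)
    return outer_low <= inner_low and inner_high <= outer_high


# Hypothetical granularities x_k for levels 1..3 (not the values from the talk).
x = {1: 2, 2: 4, 3: 8}

ancestor_interval = (3, 1, 2)    # spans 8 .. 24
descendant_interval = (1, 5, 3)  # spans 10 .. 16
```

Because an interval is stored as (k, a, b) rather than as two arbitrary endpoints, the number of distinct intervals, and hence the label length, is governed by the ranges allowed for a and b at each level.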

Spine decomposition
[Figure: a spine $v_1, \dots, v_i, \dots, v_s$, with a forest $F_i$ attached to each spine node $v_i$.]
Nodes are classified as either heavy or apex.

Trees with bounded spine-decomposition depth $d = O(1)$
$\mathcal{F}(n,d)$ = forests with $\le n$ nodes and spine-decomposition depth $\le d$.
We aim at using $nd^2$ intervals for $F \in \mathcal{F}(n,d)$.
Induction on $k = \log n$.
Difficult case: F contains a tree T of size larger than $2^k$, i.e., $2^k < |T| \le 2^{k+1}$.
[Figure: the spine $v_1, \dots, v_s$ of T with the forests $F_1, \dots, F_s$.]

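General idea
[Figure: at level k+1 (unit $x_{k+1}$), a bin J is divided into $J_1, J_2, \dots, J_s$ and carries the intervals $I(v_1), I(v_2), \dots, I(v_s)$; at level k (unit $x_k$), the intervals $I(F_1), I(F_2), \dots, I(F_s)$ have lengths $c_k|F_1|, c_k|F_2|, \dots, c_k|F_s|$ and together form $I(\bigcup_{i=1}^{s} F_i)$.]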

Tuning of the parameters (1/3)
[Figure: same as on the "General idea" slide.]
For $1 \le i < s$, the length of $I(v_i)$ must satisfy
$|I(v_i)| \approx c_k |F_i| + x_{k+1} + |I(v_{i+1})| \approx c_k \Big( \sum_{j=i}^{s} |F_j| \Big) + i \cdot x_{k+1}$.
A bin J of length $|J| \approx c_k \cdot 2^{k+1} + (s+1) \cdot x_{k+1}$ suffices.

Tuning of the parameters (2/3)
Since $s \le d$, we must have $|J|$ be approximately
$c_{k+1} 2^{k+1} \approx c_k 2^{k+1} + d \cdot x_{k+1}$
Choose the values of $c_k$ so that:
$c_{k+1} - c_k \approx \dfrac{d \cdot x_{k+1}}{2^{k+1}}$
We set
$c_k \approx \sum_{j=1}^{k} \dfrac{1}{j^{1+\epsilon}}$, and thus $x_k \approx \dfrac{2^k}{d \cdot k^{1+\epsilon}}$
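As a sanity check, the chosen values can be substituted back into the recurrence. The short derivation below is a worked verification added here, not part of the original slides; it also records why $c_k$ stays bounded, which is what later makes $N = O(n)$ possible.

```latex
\begin{align*}
c_{k+1} - c_k
  &= \sum_{j=1}^{k+1} \frac{1}{j^{1+\epsilon}} - \sum_{j=1}^{k} \frac{1}{j^{1+\epsilon}}
   = \frac{1}{(k+1)^{1+\epsilon}}, \\
\frac{d \cdot x_{k+1}}{2^{k+1}}
  &= \frac{d}{2^{k+1}} \cdot \frac{2^{k+1}}{d \,(k+1)^{1+\epsilon}}
   = \frac{1}{(k+1)^{1+\epsilon}}, \\
c_k
  &\le \sum_{j=1}^{\infty} \frac{1}{j^{1+\epsilon}} < \infty
  \quad \text{for every fixed } \epsilon > 0 .
\end{align*}
```

Both sides of the recurrence agree, and since the series converges for any fixed $\epsilon > 0$, $c_k = O(1)$ for all k.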

Tuning of the parameters (3/3)
Let $A_k \approx N / x_k$ and $B_k \approx c_k 2^k / x_k$.
[Figure: the line from 1 to N with level-k marks at $x_k, 2x_k, \dots$; the interval $I(k,a,b)$ extends from $x_k a$ to $x_k (a+b)$.]
where $1 \le a \le A_k$ and $1 \le b \le B_k$.
Thus, $N \approx c_{\log n} \cdot n = O(n)$. The number of level-k intervals is $O(A_k \cdot B_k) = O(n d^2 k^{2(1+\epsilon)} / 2^k)$, yielding a total of $O(nd^2)$ intervals, as desired.
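The count of level-k intervals follows by substituting $x_k \approx 2^k / (d \cdot k^{1+\epsilon})$ from the previous slide. The derivation below spells out the intermediate steps; it is a worked reconstruction under the slides' approximations, not text from the talk.

```latex
\begin{align*}
A_k \cdot B_k
  &\approx \frac{N}{x_k} \cdot \frac{c_k 2^k}{x_k}
   = \frac{N \, c_k \, 2^k}{x_k^{2}}
   \approx N \, c_k \, 2^k \cdot \frac{d^{2} k^{2(1+\epsilon)}}{2^{2k}}
   = \frac{N \, c_k \, d^{2} k^{2(1+\epsilon)}}{2^{k}}, \\
\sum_{k=1}^{\log n} A_k \cdot B_k
  &= O\!\left( n d^{2} \sum_{k \ge 1} \frac{k^{2(1+\epsilon)}}{2^{k}} \right)
   = O(n d^{2}),
\end{align*}
```

using $N = O(n)$, $c_k = O(1)$, and the fact that $\sum_{k \ge 1} k^{2(1+\epsilon)} / 2^k$ converges.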

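The general case: uses the folding decomposition
[Figure: (a) a tree with spine nodes $v_1, v_2, \dots, v_i, \dots, v_j, \dots, v_s$, hanging forests $F_1, \dots, F_i, \dots, F_s$, and nodes $u_1, u_2$; (b) the folded tree, with starred nodes and forests such as $v_1^* = v_1$, $v_2^*$, $F_1^*$, $F_i^*$, $F_s^*$.]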

Ancestry preservation
Consider a DFS traversal of T that visits apex children first. For any node u, let DFS(u) be the DFS number of u.
[Figure: (a) the tree T and (b) the folded tree $T^*$, as on the folding-decomposition slide.]
Lemma: Node v is an ancestor of u in T if and only if at least one of the following two conditions holds:
◮ C1: v is an ancestor of u in $T^*$;
◮ C2: APEX(v) is an ancestor of u in $T^*$ and DFS(v) < DFS(u).

Ordering the intervals
Lemma: Node v is an ancestor of u in T if and only if at least one of the following two conditions holds:
◮ C1: v is an ancestor of u in $T^*$;
◮ C2’: APEX(v) is an ancestor of u in $T^*$ and $I(v) \prec I(u)$.
[Figure: (a) spine nodes $v_1, v_2$ with forests $F_1, F_2$; (b) the intervals $I(v_1), I(v_2)$ at level k+1 and $I(F_1), I(F_2)$ at level k.]
label(u) = (I(u), I(APEX(u)))

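Compact encoding of I(APEX(v))
It is sufficient to encode:
◮ its level $k'$
◮ two shifts $b'_{\text{left}}$ and $b'_{\text{right}}$ in $[1, B_{k'}]$
[Figure: the interval $I(k', a', b')$, extending from $x_{k'} a'$ to $x_{k'} (a'+b')$ at level $k'$, drawn above the interval $I(v)$ at level k on the line from 1 to N.]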

Graph arboricity
The arboricity of a graph is the minimum number of forests into which its edges can be partitioned.
Corollary (Kannan, Naor, Rudich [STOC ’88]): There exists an adjacency labeling scheme for the family of graphs with arboricity at most k with labels of at most $(k+1) \log n$ bits.
High-level correspondence between: adjacency/arboricity for graphs and ancestry/tree-dimension for posets
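The corollary follows from the tree scheme applied once per forest: each node stores its own ID plus its parent's ID in each of the k rooted forests, giving k + 1 fields of about log n bits each. The Python sketch below illustrates this; `label_by_forests` and `graph_adjacent` are hypothetical names, and representing a missing parent by `None` is an assumption of the illustration.

```python
def label_by_forests(n, forest_parents):
    """Label = (ID(u), (parent of u in forest 1, ..., parent of u in forest k)).

    `forest_parents` is a list of k dicts, one per rooted forest, mapping a
    node to its parent; nodes that are roots of a forest are simply absent.
    """
    return {
        u: (u, tuple(parents.get(u) for parents in forest_parents))
        for u in range(1, n + 1)
    }


def graph_adjacent(label_u, label_v):
    """u and v are adjacent iff one is the other's parent in some forest."""
    id_u, parents_u = label_u
    id_v, parents_v = label_v
    return id_v in parents_u or id_u in parents_v


# 4-cycle 1-2-3-4-1 plus the chord 1-3, split into two forests.
forest_1 = {2: 1, 3: 2, 4: 3}  # edges 1-2, 2-3, 3-4
forest_2 = {4: 1, 3: 1}        # edges 1-4, 1-3
labels = label_by_forests(4, [forest_1, forest_2])
```

Every edge belongs to exactly one forest, so it is recorded in the label of its child endpoint there; the only non-edge in this example, 2-4, is correctly rejected.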

Partially ordered sets
Poset $(X, \le)$:
◮ reflexivity: $x \le x$
◮ antisymmetry: $(x \le y \text{ and } y \le x) \Rightarrow x = y$
◮ transitivity: $(x \le y \text{ and } y \le z) \Rightarrow x \le z$
$(X, \le')$ is an extension of $(X, \le)$ if: $\forall x, y \in X,\ x \le y \Rightarrow x \le' y$
The dimension of a poset $(X, \le)$ is the smallest number of linear (i.e., total order) extensions of $(X, \le)$ whose intersection gives rise to $(X, \le)$.
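The definition of dimension can be illustrated by computing the intersection of a few linear orders. The sketch below is an illustration with hypothetical names: each linear order is given as a list from smallest to largest, and the result is the set of pairs (x, y) with x below or equal to y in every order.

```python
from itertools import product


def intersection_order(elements, linear_orders):
    """Pairs (x, y) such that x comes no later than y in every linear order."""
    positions = [
        {x: index for index, x in enumerate(order)} for order in linear_orders
    ]
    return {
        (x, y)
        for x, y in product(elements, repeat=2)
        if all(pos[x] <= pos[y] for pos in positions)
    }


elements = ["a", "b", "c"]
# Two linear extensions that disagree only on the pair a, b.
relation = intersection_order(elements, [["a", "b", "c"], ["b", "a", "c"]])
```

Here the intersection is the poset in which a and b both lie below c and are incomparable to each other. A single linear order cannot produce an incomparable pair, so this poset has dimension exactly 2.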

Universal posets
A poset $(X, \le_X)$ contains a poset $(Y, \le_Y)$ as an induced suborder if there exists an injective mapping $\phi : Y \to X$ such that for any two elements $a, b \in Y$: $a \le_Y b \iff \phi(a) \le_X \phi(b)$.
Definition: A poset $(U, \le)$ is called universal for a family of posets $\mathcal{F}$ if $(U, \le)$ contains every poset in $\mathcal{F}$ as an induced suborder.
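Checking whether a given mapping witnesses an induced suborder is a direct translation of the definition. The sketch below is illustrative only: `is_induced_suborder` is a hypothetical helper, the host poset is the positive integers ordered by divisibility, and the guest poset is a three-element example chosen for the illustration.

```python
def is_induced_suborder(phi, y_elements, leq_y, leq_x):
    """True iff phi is injective and a <=_Y b  <=>  phi(a) <=_X phi(b)."""
    images = [phi[a] for a in y_elements]
    if len(set(images)) != len(images):
        return False  # phi is not injective
    return all(
        leq_y(a, b) == leq_x(phi[a], phi[b])
        for a in y_elements
        for b in y_elements
    )


# Guest poset Y: a and b are incomparable, and both lie below c.
y_elements = ["a", "b", "c"]
y_pairs = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "c"), ("b", "c")}


def leq_y(p, q):
    return (p, q) in y_pairs


def divides(x, y):
    """Host order X: x <= y iff x divides y (positive integers)."""
    return y % x == 0


good_phi = {"a": 2, "b": 3, "c": 6}
bad_phi = {"a": 2, "b": 4, "c": 8}
```

The mapping `good_phi` preserves the order in both directions, while `bad_phi` fails because 2 divides 4 even though a and b are incomparable in Y.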
