
External Memory Geometric Data Structures. Lars Arge, Duke University



  1. External Memory Geometric Data Structures. Lars Arge, Duke University. June 28, 2002, Summer School on Massive Datasets.

  2. External memory data structures: Yesterday
     • Fan-out $\Theta(B^{1/c})$ B-tree ($c \ge 1$)
       – Degree balanced tree with each node/leaf in $O(1)$ blocks
       – $O(N/B)$ space
       – $O(\log_B N + T/B)$ I/O query
       – $O(\log_B N)$ I/O update
     • Persistent B-tree
       – Update current version, query all previous versions
       – B-tree bounds with $N$ the number of operations performed
     • Buffer tree technique
       – Lazy updates/queries using buffers attached to each node
       – $O\left(\frac{1}{B}\log_{M/B}\frac{N}{B}\right)$ amortized bounds
       – E.g. used to construct structures in $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os

  3. Simplifying Assumption
     • Model
       – $N$: Elements in structure
       – $B$: Elements per block
       – $M$: Elements in main memory
       – $T$: Output size in searching problems
       (Figure: disk D, block I/O, main memory M, processor P.)
     • Assumption
       – Today (and tomorrow) assume that $M > B^2$
       – Assumption not crucial but simplifies expressions a lot, e.g.: $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right) = O\left(\frac{N}{B}\log_B N\right)$
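The intermediate steps behind this simplification are not on the slide. A short derivation, assuming only $M > B^2$ and $N \ge B$, shows why the sorting-style bound is no larger than the simpler expression:

```latex
M > B^2 \;\Rightarrow\; \frac{M}{B} > B \;\Rightarrow\; \log_B \frac{M}{B} > 1
\\[4pt]
\log_{M/B}\frac{N}{B} \;=\; \frac{\log_B (N/B)}{\log_B (M/B)} \;<\; \log_B \frac{N}{B} \;\le\; \log_B N
\\[4pt]
\text{hence}\quad \frac{N}{B}\log_{M/B}\frac{N}{B} \;=\; O\!\left(\frac{N}{B}\log_B N\right)
```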

  4. Today
     • “Dimension 1.5” problems:
       – More complicated problems: Interval stabbing and point location
       – Looking for same bounds:
         * $O(N/B)$ space
         * $O(\log_B N + T/B)$ query
         * $O(\log_B N)$ update
         * $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right) = O\left(\frac{N}{B}\log_B N\right)$ construction
     • Use of tools/techniques discussed yesterday as well as
       – Logarithmic method
       – Weight-balanced B-trees
       – Global rebuilding

  5. Interval Management
     • Problem:
       – Maintain $N$ intervals with unique endpoints dynamically such that a stabbing query with point $x$ can be answered efficiently
     • As in the (one-dimensional) B-tree case we are interested in
       – $O(N/B)$ space
       – $O(\log_B N)$ update
       – $O(\log_B N + T/B)$ query

  6. Interval Management: Static Solution
     • Sweep from left to right maintaining persistent B-tree
       – Insert interval when left endpoint is reached
       – Delete interval when right endpoint is reached
     • Query $x$ answered by reporting all intervals in B-tree at “time” $x$
       – $O(N/B)$ space
       – $O(\log_B N + T/B)$ query
       – $O\left(\frac{N}{B}\log_B N\right)$ construction using buffer technique
     • Dynamic with $O(\log_B^2 N)$ insert bound using logarithmic method

  7. Internal Memory Logarithmic Method Idea
     • Given (semi-dynamic) structure $D$ on set $V$
       – $O(\log N)$ query, $O(\log N)$ delete, $O(N \log N)$ construction
     • Logarithmic method:
       – Partition $V$ into subsets $V_0, V_1, \ldots, V_{\log N}$ with $|V_i| = 2^i$ or $|V_i| = 0$
       – Build $D_i$ on $V_i$
         * Delete: $O(\log N)$
         * Query: Query each $D_i$ ⇒ $O(\log^2 N)$
         * Insert: Find first empty $D_i$ and construct $D_i$ out of $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements in $V_0, V_1, \ldots, V_{i-1}$
       – $O(2^i \log 2^i)$ construction ⇒ $O(\log N)$ per moved element
       – Element moved $O(\log N)$ times ⇒ $O(\log^2 N)$ amortized
       (Figure: structures of sizes $2^0, 2^1, 2^2, \ldots, 2^{\log N}$.)
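The insert rule above behaves like incrementing a binary counter. A minimal in-memory sketch, assuming a sorted Python list stands in for the static structure $D_i$ and supporting only insert and membership query (the class name `LogMethodSet` and its methods are illustrative, not from the slides):

```python
import bisect


class LogMethodSet:
    """Insert-only set built from static sorted arrays (logarithmic method)."""

    def __init__(self):
        # levels[i] is either empty or a sorted list of exactly 2**i keys
        self.levels = []

    def insert(self, key):
        carry = [key]
        i = 0
        # Collect all full levels below the first empty one.
        while i < len(self.levels) and self.levels[i]:
            carry.extend(self.levels[i])
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append([])
        # "Construct" D_i from 1 + sum_{j<i} 2**j = 2**i elements.
        self.levels[i] = sorted(carry)

    def contains(self, key):
        # Query every non-empty D_i with a binary search.
        for level in self.levels:
            pos = bisect.bisect_left(level, key)
            if pos < len(level) and level[pos] == key:
                return True
        return False
```

After 13 insertions the level sizes follow the binary representation of 13, that is sizes 1, 0, 4, 8 for levels 0 to 3.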

  8. External Logarithmic Method Idea
     • Decrease number of subsets $V_i$ to $\log_B N$ to get $O(\log_B^2 N)$ query
       (Figure: structures of sizes $B^0, B^1, B^2, \ldots, B^{\log_B N}$.)
     • Problem: Since $1 + \sum_{j=0}^{i-1} B^j < B^i$ there are not enough elements in $V_0, V_1, \ldots, V_{i-1}$ to build $V_i$
     • Solution: We allow $V_i$ to contain any number of elements $\le B^i$
       – Insert: Find first $D_i$ such that $\sum_{j=0}^{i} |V_j| < B^i$ and construct new $D_i$ from elements in $V_0, V_1, \ldots, V_i$
         * We move $\sum_{j=0}^{i-1} |V_j| \ge B^{i-1}$ elements
         * If $D_i$ is constructed in $O((|V_i|/B)\log_B |V_i|) = O(B^{i-1}\log_B N)$ I/Os, every moved element is charged $O(\log_B N)$ I/Os
         * Element moved $O(\log_B N)$ times ⇒ $O(\log_B^2 N)$ amortized
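The relaxed insert rule can be simulated in memory to check that the size invariant $|V_i| \le B^i$ is maintained. This is only a sketch: plain Python lists stand in for the structures $D_i$, no I/Os are counted, and the function name `insert` and the `levels` layout are assumptions made for illustration.

```python
def insert(levels, B, x):
    """Insert x using the external logarithmic method rule.

    levels[i] holds the elements of V_i, with len(levels[i]) <= B**i.
    Returns the index i of the level that was rebuilt.
    """
    assert B >= 2, "with B < 2 no level can ever absorb the lower ones"
    total = 1  # counts the new element x
    i = 0
    while True:
        if i == len(levels):
            levels.append([])
        total += len(levels[i])
        # total <= B**i  is the same as  sum_{j<=i} |V_j| < B**i
        if total <= B ** i:
            break
        i += 1
    # Rebuild D_i from x and everything in V_0 .. V_i.
    merged = [x]
    for j in range(i + 1):
        merged.extend(levels[j])
        levels[j] = []
    levels[i] = merged
    return i
```

For example, with `B = 4` the first insertions rebuild levels 0, 1, 0, 1, 0, 2 in that order; the sixth insertion skips level 1 because it would then hold more than $B^1$ elements.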

  9. External Logarithmic Method Idea
     • Given (semi-dynamic) linear space external data structure with
       – $O(\log_B N + T/B)$ I/O query
       – $O\left(\frac{N}{B}\log_B N\right)$ I/O construction
       – ($O(\log_B N)$ I/O delete)
     ⇒ Linear space dynamic data structure with
       – $O(\log_B^2 N + T/B)$ I/O query
       – $O(\log_B^2 N)$ I/O insert amortized
       – ($O(\log_B N)$ I/O delete)
     • Dynamic interval management
       – $O(\log_B^2 N + T/B)$ I/O query
       – $O(\log_B^2 N)$ I/O insert amortized

  10. Internal Interval Tree
     • Base tree on endpoints
       – “slab” $X_v$ associated with each node $v$
     • Interval stored in highest node $v$ where it contains midpoint of $X_v$
     • Intervals $I_v$ associated with $v$ stored in
       – Left slab list sorted by left endpoint (search tree)
       – Right slab list sorted by right endpoint (search tree)
     ⇒ Linear space and $O(\log N)$ update (assuming fixed endpoint set)

  11. Internal Interval Tree
     • Query with $x$ on left side of midpoint of $X_{root}$
       – Search left slab list left-right until finding non-stabbed interval
       – Recurse in left child
     ⇒ $O(\log N + T)$ query bound
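The internal-memory interval tree and its stabbing query can be sketched as follows. This is a static illustration under several assumptions not in the slides: intervals are closed `(left, right)` tuples, sorted Python lists replace the search trees used for the slab lists, the midpoint is chosen as the median endpoint, and the names `Node` and `stab` are made up for the example.

```python
class Node:
    """Static interval tree node: stores the intervals containing its midpoint."""

    def __init__(self, intervals):
        points = sorted(p for iv in intervals for p in iv)
        self.mid = points[len(points) // 2]  # median endpoint as midpoint
        here = [iv for iv in intervals if iv[0] <= self.mid <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.mid]
        right = [iv for iv in intervals if iv[0] > self.mid]
        # Left slab list: ascending left endpoint.
        self.by_left = sorted(here)
        # Right slab list: descending right endpoint.
        self.by_right = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = Node(left) if left else None
        self.right = Node(right) if right else None

    def stab(self, x, out):
        """Append every interval containing x to out and return out."""
        if x <= self.mid:
            # Scan left slab list until the first non-stabbed interval.
            for iv in self.by_left:
                if iv[0] > x:
                    break
                out.append(iv)
            child = self.left if x < self.mid else None
        else:
            # Symmetric scan of the right slab list.
            for iv in self.by_right:
                if iv[1] < x:
                    break
                out.append(iv)
            child = self.right
        if child is not None:
            child.stab(x, out)
        return out
```

Each scan stops at the first interval that is not stabbed, so the work in a node is proportional to the number of intervals reported there plus a constant.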

  12. Externalizing Interval Tree
     • Natural idea:
       – Block tree
       – Use B-tree for slab lists
     • Number of stabbed intervals in large slab list may be small (or zero)
       – We can be forced to do I/O in each of $O(\log N)$ nodes

  13. Externalizing Interval Tree
     • Idea:
       – Decrease fan-out to $\Theta(\sqrt{B})$ ⇒ height remains $O(\log_B N)$
       – $\Theta(\sqrt{B})$ slabs define $\Theta(B)$ multislabs
       – Interval stored in two slab lists (as before) and one multislab list
       – Intervals in small multislab lists collected in underflow structure
       – Query answered in $v$ by looking at 2 slab lists and not $O(\log N)$

  14. External Interval Tree
     • Base tree: Fan-out $\Theta(\sqrt{B})$ B-tree on endpoints
       – Interval stored in highest node $v$ where it contains slab boundary
     • Each internal node $v$ contains:
       – Left slab list for each of $\Theta(\sqrt{B})$ slabs
       – Right slab list for each of $\Theta(\sqrt{B})$ slabs
       – $\Theta(B)$ multislab lists
       – Underflow structure
     • Interval in set $I_v$ of intervals associated with $v$ stored in
       – Left slab list of slab containing left endpoint
       – Right slab list of slab containing right endpoint
       – Widest multislab list it spans
     • If $< B$ intervals in multislab list they are instead stored in underflow structure (⇒ contains $\le B^2$ intervals)
     (Figure: node $v$ with $\Theta(\sqrt{B})$ slabs; label “$m$ blocks”.)
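How an interval is assigned to its two slab lists and its widest multislab inside one node can be sketched as below. The indexing convention is an assumption made for illustration: a node with sorted boundaries $b_1 < \dots < b_k$ has slabs numbered $0$ to $k$, slab $s$ covers $[b_s, b_{s+1})$, and a multislab is named by the first and last slab it covers. The helper names `place` and `num_slab_ranges` are hypothetical.

```python
import bisect


def place(interval, boundaries):
    """Return (left_slab, right_slab, multislab) for an interval stored at a node.

    multislab is a (first_slab, last_slab) pair for the widest run of slabs
    completely spanned by the interval, or None if it spans no whole slab.
    """
    lo, hi = interval
    i = bisect.bisect_right(boundaries, lo)  # slab containing the left endpoint
    j = bisect.bisect_right(boundaries, hi)  # slab containing the right endpoint
    if i == j:
        raise ValueError("interval contains no slab boundary; it belongs in a child")
    multislab = (i + 1, j - 1) if j - i >= 2 else None
    return i, j, multislab


def num_slab_ranges(num_slabs):
    """Number of contiguous ranges of one or more slabs."""
    return num_slabs * (num_slabs + 1) // 2
```

With boundaries `[10, 20, 30]`, the interval `(5, 35)` goes to the left slab list of slab 0, the right slab list of slab 3, and the multislab covering slabs 1 to 2. The quadratic count in `num_slab_ranges` is why $\Theta(\sqrt{B})$ slabs give $\Theta(B)$ multislabs.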

  15. External Interval Tree
     • Each leaf contains $O(B)$ intervals (unique endpoint assumption)
       – Stored in $O(1)$ blocks
     • Slab lists implemented using B-trees
       – $O(1 + T_v/B)$ query
       – Linear space
         * We may “waste” a block for each of the $\Theta(\sqrt{B})$ lists in a node
         * But only $\Theta\left(\frac{N}{B\sqrt{B}}\right)$ internal nodes
     • Underflow structure implemented using static structure
       – $O(\log_B B^2 + T_v/B) = O(1 + T_v/B)$ query
       – Linear space
     ⇒ Linear space

  16. External Interval Tree
     • Query with $x$
       – Search down tree for $x$ while in node $v$ reporting all intervals in $I_v$ stabbed by $x$
     • In node $v$
       – Query two slab lists
       – Report all intervals in relevant multislab lists
       – Query underflow structure
     • Analysis:
       – Visit $O(\log_B N)$ nodes
       – Query slab lists, query multislab lists, query underflow structure: $O(1 + T_v/B)$ in node $v$
       ⇒ $O(\log_B N + T/B)$

  17. External Interval Tree
     • Update (assuming fixed endpoint set, i.e. static base tree):
       – Search for relevant node $v$: $O(\log_B N)$
       – Update two slab lists: $O(\log_B N)$
       – Update multislab list or underflow structure
     • Update of underflow structure in $O(1)$ I/Os amortized
       – Maintain update block with $B$ updates
       – Check of update block adds $O(1)$ I/Os to query bound
       – Rebuild structure when $B$ updates have been collected using $O\left(\frac{B^2}{B}\log_B B^2\right) = O(B)$ I/Os (Global rebuilding)
     ⇒ Update in $O(\log_B N)$ I/Os amortized
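The amortized $O(1)$ bound for the underflow structure follows from spreading one rebuild over the $B$ buffered updates that trigger it. Spelled out (the slide gives only the rebuild cost; the division step is filled in here):

```latex
\underbrace{O\!\left(\frac{B^2}{B}\,\log_B B^2\right)}_{\text{one rebuild}}
  \;=\; O(B \cdot 2) \;=\; O(B)\ \text{I/Os}
\qquad\Rightarrow\qquad
\frac{O(B)\ \text{I/Os}}{B\ \text{updates}} \;=\; O(1)\ \text{I/Os amortized per update}
```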

  18. External Interval Tree
     • Note:
       – Insert may increase number of intervals in underflow structure for same multislab to $B$
       – Delete may decrease number of intervals in multislab to $B$
       ⇒ Need to move $B$ intervals to/from multislab/underflow structure
     • We only move
       – Intervals from multislab list when decreasing to size $B/2$
       – Intervals to multislab list when increasing to size $B$
     ⇒ $O(1)$ I/Os amortized used to move intervals
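The two different thresholds prevent intervals from bouncing back and forth when the count hovers around $B$. A small counter-only sketch of that hysteresis, assuming we track just the number of intervals for one multislab and where they currently live (the class `Multislab` and its fields are hypothetical; no actual lists or I/Os are modeled):

```python
class Multislab:
    """Tracks where the intervals of one multislab are stored."""

    def __init__(self, B):
        self.B = B
        self.count = 0
        self.in_list = False  # False: intervals live in the underflow structure
        self.moves = 0        # number of bulk moves performed so far

    def insert(self):
        self.count += 1
        # Move to a multislab list only when the size grows to B.
        if not self.in_list and self.count >= self.B:
            self.in_list = True
            self.moves += 1

    def delete(self):
        self.count -= 1
        # Move back to the underflow structure only when the size drops to B/2.
        if self.in_list and self.count <= self.B // 2:
            self.in_list = False
            self.moves += 1
```

With `B = 8`, alternating deletes and inserts around size 8 cause no further moves after the first one; a move back only happens once four consecutive deletions bring the size down to 4, so at least $B/2$ updates separate any two moves.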

  19. Removing Fixed Endpoint Assumption
     • We need to use dynamic base tree
       – Natural choice is B-tree
     • Insertion:
       – Insert new endpoints and rebalance base tree (using splits)
       – Insert interval as previously in $O(\log_B N)$ I/Os amortized
     • Split: Boundary in $v$ becomes boundary in parent($v$)
     (Figure: node $v$ split into $v'$ and $v''$.)
