External Memory Geometric Data Structures, Lars Arge, Duke University (PowerPoint presentation)



SLIDE 1

External Memory Geometric Data Structures

Lars Arge
Duke University

June 29, 2002

Summer School on Massive Datasets

SLIDE 2

Lars Arge, External memory data structures

So Far So Good

  • Yesterday we discussed “dimension 1.5” problems:
    – Interval stabbing and point location
  • We developed a number of useful tools/techniques:
    – Logarithmic method
    – Weight-balanced B-trees
    – Global rebuilding
  • On Thursday we also discussed several tools/techniques:
    – B-trees
    – Persistent B-trees
    – Construction using buffer technique

SLIDE 3

Interval Management

  • Maintain $N$ intervals with unique endpoints dynamically such that a stabbing query with point $x$ can be answered efficiently
  • Solved using external interval tree
  • We obtained the same bounds as for the 1d case
    – Space: $O(N/B)$
    – Query: $O(\log_B N + T/B)$
    – Updates: $O(\log_B N)$ I/Os

[Figure: stabbing query at point $x$]

SLIDE 4

Interval Management

  • External interval tree:
    – Fan-out $\Theta(\sqrt{B})$ weight-balanced B-tree on endpoints
    – Intervals stored in $O(B)$ secondary structures in each internal node
    – Query efficiency using filtering
    – Bootstrapping used to avoid $O(B)$ search cost in each node
        * Size $O(B^2)$ underflow structure in each node
        * Constructed using sweep and persistent B-tree
        * Dynamic using global rebuilding

[Figure: internal node $v$ with fan-out $\Theta(\sqrt{B})$; secondary structures of $m$ blocks]

SLIDE 5

3-Sided Range Searching

  • Interval management corresponds to a simple form of 2d range search
  • More general problem: Dynamic 3-sided range searching
    – Maintain set of points in plane such that given query $(q_1, q_2, q_3)$, all points $(x, y)$ with $q_1 \le x \le q_2$ and $y \ge q_3$ can be found efficiently

[Figure: interval $[x_1, x_2]$ drawn as the point $(x_1, x_2)$; stabbing query $x$ drawn as a query with corner $(x, x)$; 3-sided query $(q_1, q_2, q_3)$]
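The correspondence between interval stabbing and 3-sided range searching can be illustrated with a small brute-force sketch. It assumes the usual mapping suggested by the figure labels: an interval $[x_1, x_2]$ becomes the point $(x_1, x_2)$, and a stabbing query at $x$ becomes the 3-sided query $(-\infty, x, x)$. The function names are hypothetical, and the linear scans only stand in for the real data structures.

```python
def stab_bruteforce(intervals, x):
    """All intervals [lo, hi] that contain x, by direct scan."""
    return sorted((lo, hi) for lo, hi in intervals if lo <= x <= hi)


def three_sided_bruteforce(points, q1, q2, q3):
    """All points (px, py) with q1 <= px <= q2 and py >= q3, by direct scan."""
    return sorted((px, py) for px, py in points if q1 <= px <= q2 and py >= q3)


def stab_via_three_sided(intervals, x):
    """Answer a stabbing query through the 3-sided formulation."""
    points = list(intervals)  # interval [x1, x2] is viewed as the point (x1, x2)
    # x1 <= x is the right side of the query, x2 >= x is the bottom side.
    return three_sided_bruteforce(points, float("-inf"), x, x)
```

For example, with the intervals `[(1, 5), (2, 3), (4, 8), (6, 7)]` and `x = 4`, both formulations report `(1, 5)` and `(4, 8)`.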

SLIDE 6

3-Sided Range Searching: Static Solution

  • Construction: Sweep top-down inserting $x$ in persistent B-tree at $(x, y)$
    – $O(N/B)$ space
    – $O(\frac{N}{B} \log_B N)$ I/O construction using buffer technique
  • Query $(q_1, q_2, q_3)$: Perform range query with $[q_1, q_2]$ in B-tree at $q_3$
    – $O(\log_B N + T/B)$ I/Os
  • Dynamic using logarithmic method
    – Insert: $O(\log_B^2 N)$
    – Query: $O(\log_B^2 N + T/B)$
  • Improve to $O(\log_B N)$? Deletes?

[Figure: 3-sided query $(q_1, q_2, q_3)$]
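The logarithmic method mentioned above can be sketched in memory as follows. This is a simplified binary variant (static structures of sizes $2^i$), not the external version from the lecture; `StaticThreeSided` is a hypothetical stand-in for the static persistent B-tree solution and simply scans its points.

```python
class StaticThreeSided:
    """Stand-in for a static 3-sided structure: built once, never modified."""

    def __init__(self, points):
        self.points = sorted(points)

    def query(self, q1, q2, q3):
        return [(x, y) for x, y in self.points if q1 <= x <= q2 and y >= q3]


class LogarithmicMethod:
    """Insert-only dynamization: level i is empty or holds exactly 2**i points."""

    def __init__(self):
        self.levels = []

    def insert(self, p):
        carry = [p]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                # First empty level: rebuild one static structure from the carry.
                self.levels[i] = StaticThreeSided(carry)
                return
            # Level is full: empty it and carry its points to the next level.
            carry.extend(self.levels[i].points)
            self.levels[i] = None
            i += 1

    def query(self, q1, q2, q3):
        # A query must ask every non-empty level and combine the answers.
        out = []
        for s in self.levels:
            if s is not None:
                out.extend(s.query(q1, q2, q3))
        return sorted(out)
```

Querying every level is what adds the extra logarithmic factor to the query bound, and the repeated rebuilding is what the insert bound pays for.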

SLIDE 7

Internal Priority Search Tree

  • Base tree on x-coordinates with nodes augmented with points
  • Heap on y-coordinates
    – Decreasing y values on root-leaf path
    – $(x, y)$ on path from root to leaf holding $x$
    – If $v$ holds point then parent($v$) holds point

[Figure: example priority search tree; leaves hold the x-coordinates 1, 4, 5, 9, 13, 16, 19, 20 and nodes hold the points (16,20), (19,9), (13,3), (20,3), (5,6), (9,4), (1,2), (4,1)]

SLIDE 8

Internal Priority Search Tree

  • Linear space
  • Insert of $(x, y)$ (assuming fixed x-coordinate set):
    – Compare $y$ with y-coordinate in root
    – Smaller: Recursively insert $(x, y)$ in subtree on path to $x$
    – Bigger: Insert in root and recursively insert old point in subtree
  • $O(\log N)$ update

[Figure: example priority search tree with Insert(10,21)]

SLIDE 9

Internal Priority Search Tree

  • Query with $(q_1, q_2, q_3)$ starting at root $v$:
    – Report point in $v$ if satisfying query
    – Visit both children of $v$ if point reported
    – Always visit child(ren) of $v$ on path(s) to $q_1$ and $q_2$
  • $O(\log N + T)$ query

[Figure: example priority search tree with query $(q_1, q_2, q_3) = (4, 19, 4)$]
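The insert and query rules for the internal priority search tree can be sketched as below, assuming a fixed set of distinct x-coordinates as on the slides. The names `Node`, `insert` and `query` are hypothetical. The query is written as a pruning variant of the slide's rule: it stops at a subtree when the subtree is empty, when its x-range misses $[q_1, q_2]$, or when the point at its root lies below $q_3$ (heap order then rules out everything beneath it).

```python
class Node:
    """Base-tree node over a sorted list of distinct x-coordinates."""

    def __init__(self, xs):
        self.lo, self.hi = xs[0], xs[-1]  # x-range covered by this subtree
        self.point = None                 # point stored here, if any
        if len(xs) == 1:
            self.left = self.right = None
        else:
            mid = len(xs) // 2
            self.split = xs[mid - 1]      # x <= split belongs to the left subtree
            self.left = Node(xs[:mid])
            self.right = Node(xs[mid:])


def insert(v, p):
    """Insert p = (x, y); x must be one of the fixed x-coordinates."""
    while True:
        if v.point is None:          # first free node on the path to x
            v.point = p
            return
        if v.left is None:           # occupied leaf: x-coordinate already used
            raise ValueError("duplicate x-coordinate")
        if p[1] > v.point[1]:        # bigger y: p takes this node ...
            v.point, p = p, v.point  # ... and the old point moves down
        v = v.left if p[0] <= v.split else v.right


def query(v, q1, q2, q3, out=None):
    """Report all points with q1 <= x <= q2 and y >= q3."""
    if out is None:
        out = []
    if v is None or v.point is None or v.hi < q1 or v.lo > q2:
        return out
    x, y = v.point
    if y < q3:                       # heap order: nothing below can qualify
        return out
    if q1 <= x <= q2:
        out.append(v.point)
    query(v.left, q1, q2, q3, out)
    query(v.right, q1, q2, q3, out)
    return out
```

A short usage example, with sample points chosen for illustration:

```python
root = Node([1, 4, 5, 9, 13, 16, 19, 20])
for p in [(16, 20), (19, 9), (13, 3), (20, 3), (5, 6), (9, 4), (1, 2), (4, 1)]:
    insert(root, p)
print(sorted(query(root, 4, 19, 4)))
```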

SLIDE 10

Externalizing Priority Search Tree

  • Natural idea: Block tree
  • Problem:
    – $O(\log_B N)$ I/Os to follow paths to $q_1$ and $q_2$
    – But $O(T)$ I/Os may be used to visit other nodes (“overshooting”)
  • $O(\log_B N + T)$ query

[Figure: the example priority search tree]

SLIDE 11

Externalizing Priority Search Tree

  • Solution idea:
    – Store $B$ points in each node
        * $O(B^2)$ points stored in each supernode
        * $B$ output points can pay for “overshooting”
    – Bootstrapping:
        * Store $O(B^2)$ points in each supernode in static structure

[Figure: the example priority search tree]

SLIDE 12

External Priority Search Tree

  • Base tree: Weight-balanced B-tree on x-coordinates ($a, k = B$)
  • Points in “heap order”:
    – Root stores $B$ top points for each of the $\Theta(B)$ child slabs
    – Remaining points stored recursively
  • Points in each node stored in “$O(B^2)$-structure”
    – Persistent B-tree structure for static problem
  • Linear space

[Figure: root node with $\Theta(B)$ child slabs]

SLIDE 13

External Priority Search Tree

  • Query with $(q_1, q_2, q_3)$ starting at root $v$:
    – Query $O(B^2)$-structure and report points satisfying query
    – Visit child $v$ if
        * $v$ on path to $q_1$ or $q_2$
        * All points corresponding to $v$ satisfy query

SLIDE 14

External Priority Search Tree

  • Analysis:
    – $O(\log_B B^2 + T_v/B) = O(1 + T_v/B)$ I/Os used to visit node $v$
    – $O(\log_B N)$ nodes on path to $q_1$ or $q_2$
    – For each node $v$ not on path to $q_1$ or $q_2$ visited, $B$ points reported in parent($v$)
  • $O(\log_B N + T/B)$ query
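The charging ("filtering") argument behind this bound can be written out as follows. The symbols $V$ (the set of visited nodes) and $P$ (the visited nodes on the two search paths) are introduced here for illustration only and do not appear on the slide; $T_u$ is the number of points reported in node $u$, so the $T_u$ sum to $T$.

```latex
\begin{align*}
|P| &= O(\log_B N), \\
|V \setminus P| &\le \sum_{u \in V} \frac{T_u}{B} = \frac{T}{B}
  && \text{(each such child is paid for by $B$ points reported in its parent $u$)}, \\
\sum_{v \in V} O\!\left(1 + \frac{T_v}{B}\right)
  &= O\!\left(|V| + \frac{T}{B}\right)
   = O\!\left(\log_B N + \frac{T}{B}\right).
\end{align*}
```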

SLIDE 15

External Priority Search Tree

  • Insert $(x, y)$ (assuming fixed x-coordinate set, i.e. static base tree):
    – Find relevant node $v$:
        * Query $O(B^2)$-structure to find $B$ points in root corresponding to node $u$ on path to $x$
        * If $y$ smaller than y-coordinates of all $B$ points then recursively search in $u$
    – Insert $(x, y)$ in $O(B^2)$-structure of $v$
    – If $O(B^2)$-structure contains $> B$ points for child $u$, remove lowest point and insert recursively in $u$
  • Delete: Similarly

[Figure: node with child $u$]

SLIDE 16

External Priority Search Tree

  • Analysis:
    – Query visits $O(\log_B N)$ nodes
    – $O(B^2)$-structure queried/updated in each node
        * One query
        * One insert and one delete
  • $O(B^2)$-structure analysis:
    – Query: $O(\log_B B^2 + B/B) = O(1)$
    – Update in $O(1)$ I/Os using update block and global rebuilding
  • $O(\log_B N)$ I/Os

[Figure: node with child $u$]

SLIDE 17

Removing Fixed x-coordinate Set Assumption

  • Deletion:
    – Delete point as previously
    – Delete x-coordinate from base tree using global rebuilding
  • $O(\log_B N)$ I/Os amortized
  • Insertion:
    – Insert x-coordinate in base tree and rebalance (using splits)
    – Insert point as previously
  • Split: Boundary in $v$ becomes boundary in parent($v$)

[Figure: node $v$ split into $v'$ and $v''$]

SLIDE 18

Removing Fixed x-coordinate Set Assumption

  • Split: When $v$ splits, $B$ new points needed in parent($v$)
  • One point obtained from $v'$ ($v''$) using “bubble-up” operation:
    – Find top point $p$ in $v'$
    – Insert $p$ in $O(B^2)$-structure
    – Remove $p$ from $O(B^2)$-structure of $v'$
    – Recursively bubble-up point to $v$
  • Bubble-up in $O(\log_B w(v))$ I/Os
    – Follow one path from $v$ to leaf
    – Uses $O(1)$ I/O in each node
  • Split in $O(B \log_B w(v)) = O(w(v))$ I/Os

[Figure: split nodes $v'$ and $v''$]

SLIDE 19

Removing Fixed x-coordinate Set Assumption

  • $O(1)$ amortized split cost:
    – Cost: $O(w(v))$
    – Weight balanced base tree: $\Omega(w(v))$ inserts below $v$ between splits
  • External Priority Search Tree
    – Space: $O(N/B)$
    – Query: $O(\log_B N + T/B)$
    – Updates: $O(\log_B N)$ I/Os amortized
  • Amortization can be removed from update bound in several ways
    – Utilizing lazy rebuilding

[Figure: split nodes $v'$ and $v''$]
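The amortization can be spelled out as a short calculation. It only combines the two facts stated above with the $O(\log_B N)$ height of the base tree; an insert passes through one node per level.

```latex
\[
\frac{O(w(v)) \text{ I/Os per split of } v}
     {\Omega(w(v)) \text{ inserts below } v \text{ between splits}}
  = O(1) \text{ I/Os per insert and level},
\qquad
O(1) \cdot O(\log_B N) \text{ levels} = O(\log_B N) \text{ I/Os amortized.}
\]
```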

SLIDE 20

Summary: 3-Sided Range Searching

  • 3-sided range searching
    – Maintain set of points in plane such that given query $(q_1, q_2, q_3)$, all points $(x, y)$ with $q_1 \le x \le q_2$ and $y \ge q_3$ can be found efficiently
  • We obtained the same bounds as for the 1d case
    – Space: $O(N/B)$
    – Query: $O(\log_B N + T/B)$
    – Updates: $O(\log_B N)$ I/Os

[Figure: 3-sided query $(q_1, q_2, q_3)$]

SLIDE 21

Summary: 3-Sided Range Searching

  • Main problem in designing external priority search tree was the increased fanout in combination with “overshooting”
  • Same general solution techniques as in interval tree:
    – Bootstrapping:
        * Use $O(B^2)$ size structure in each internal node
        * Constructed using persistence
        * Dynamic using global rebuilding
    – Weight-balanced B-tree: Split/fuse in amortized $O(1)$
    – Filtering: Charge part of query cost to output

[Figure: 3-sided query $(q_1, q_2, q_3)$]

SLIDE 22

Two-Dimensional Range Search

  • We have now discussed structures for special cases of two-dimensional range searching
    – Space: $O(N/B)$
    – Query: $O(\log_B N + T/B)$
    – Updates: $O(\log_B N)$
  • Cannot be obtained for general 2d range searching:
    – $O(\log_B^c N)$ query requires $\Omega\left(\frac{N}{B} \cdot \frac{\log_B N}{\log_B \log_B N}\right)$ space
    – $O(N/B)$ space requires $\Omega(\sqrt{N/B})$ query

[Figure: stabbing query $q$; 3-sided query $(q_1, q_2, q_3)$; 4-sided query $(q_1, q_2, q_3, q_4)$]

SLIDE 23

External Range Tree

  • Base tree: Fan-out $\Theta(\log_B N)$ weight balanced tree on x-coordinates
  • $O\left(\frac{\log_B N}{\log_B \log_B N}\right)$ height
  • Points below each node stored in 4 linear space secondary structures:
    – “Right” priority search tree
    – “Left” priority search tree
    – B-tree on y-coordinates
    – Interval tree
  • $O\left(\frac{N}{B} \cdot \frac{\log_B N}{\log_B \log_B N}\right)$ space

[Figure: node with $\Theta(\log_B N)$ slabs]

SLIDE 24

External Range Tree

  • Secondary interval tree structure:
    – Connect points in each slab in y-order
    – Project obtained segments onto y-axis
    – Intervals stored in interval tree
        * Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node

[Figure: node with $\Theta(\log_B N)$ slabs]

SLIDE 25

External Range Tree

  • Query with $(q_1, q_2, q_3, q_4)$ answered in top node with $q_1$ and $q_2$ in different slabs $v_1$ and $v_2$
  • Points in slab $v_1$
    – Found with 3-sided query in $v_1$ using right priority search tree
  • Points in slab $v_2$
    – Found with 3-sided query in $v_2$ using left priority search tree
  • Points in slabs between $v_1$ and $v_2$
    – Answer stabbing query with $q_3$ using interval tree, giving the first point above $q_3$ in each of the $O(\log_B N)$ slabs
    – Find points using y-coordinate B-tree in $O(\log_B N)$ slabs

[Figure: node with $\Theta(\log_B N)$ slabs; the query spans slabs $v_1$ to $v_2$]

SLIDE 26

External Range Tree

  • Query analysis:
    – $O(\log_B N)$ I/Os to find relevant node
    – $O(\log_B N + T/B)$ I/Os to answer two 3-sided queries
    – $O\left(\log_B N + \frac{\log_B N}{B}\right) = O(\log_B N)$ I/Os to query interval tree
    – $O(\log_B N + T/B)$ I/Os to traverse B-trees
  • $O(\log_B N + T/B)$ I/Os

[Figure: node with $\Theta(\log_B N)$ slabs; the query spans slabs $v_1$ to $v_2$]

SLIDE 27

External Range Tree

  • Insert:
    – Insert x-coordinate in weight-balanced B-tree
        * Split of $v$ can be performed in $O(w(v) \log_B w(v))$ I/Os
        * Hence $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/Os
    – Update secondary structures in all $O\left(\frac{\log_B N}{\log_B \log_B N}\right)$ nodes on one root-leaf path
        * Update priority search trees
        * Update interval tree
        * Update B-tree
        * Hence $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/Os
  • Delete:
    – Similar and using global rebuilding

[Figure: node with $\Theta(\log_B N)$ slabs; slabs $v_1$ and $v_2$]

SLIDE 28

Summary: External Range Tree

  • 2d range searching in $O\left(\frac{N}{B} \cdot \frac{\log_B N}{\log_B \log_B N}\right)$ space
    – $O(\log_B N + T/B)$ I/O query
    – $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/O update
  • Optimal among $O(\log_B N + T/B)$ query structures

[Figure: 4-sided query $(q_1, q_2, q_3, q_4)$]

SLIDE 29

kdB-tree

  • kd-tree:
    – Recursive subdivision of point-set into two halves using vertical/horizontal line
    – Horizontal line on even levels, vertical on uneven levels
    – One point in each leaf
  • Linear space and logarithmic height

SLIDE 30

kdB-tree

  • Query:
    – Recursively visit nodes corresponding to regions intersected by query
    – Report points in trees/nodes completely contained in query
  • Analysis:
    – Number of regions intersecting a horizontal line satisfies the recurrence $Q(N) = 2 + 2Q(N/4)$, so $Q(N) = O(\sqrt{N})$
    – Query intersects $4 \cdot O(\sqrt{N}) + T = O(\sqrt{N} + T)$ regions
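A minimal in-memory sketch of the kd-tree and its range query is given below. It follows the slide's convention of splitting by a horizontal line (on y) at even depths and by a vertical line (on x) at odd depths. The dictionary layout and function names are hypothetical, and for brevity the query always descends to the leaves instead of reporting whole subtrees that lie completely inside the query.

```python
def build(points, depth=0):
    """Build a kd-tree with at most one point per leaf."""
    if len(points) <= 1:
        return {"leaf": list(points)}
    axis = 1 if depth % 2 == 0 else 0      # even depth: horizontal line (split on y)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],           # left coords <= split <= right coords
        "left": build(pts[:mid], depth + 1),
        "right": build(pts[mid:], depth + 1),
    }


def range_query(node, q1, q2, q3, q4, out=None):
    """Report points with q1 <= x <= q2 and q3 <= y <= q4."""
    if out is None:
        out = []
    if "leaf" in node:
        out.extend(p for p in node["leaf"]
                   if q1 <= p[0] <= q2 and q3 <= p[1] <= q4)
        return out
    lo, hi = (q1, q2) if node["axis"] == 0 else (q3, q4)
    if lo <= node["split"]:                # query reaches into the left half
        range_query(node["left"], q1, q2, q3, q4, out)
    if hi >= node["split"]:                # query reaches into the right half
        range_query(node["right"], q1, q2, q3, q4, out)
    return out
```

For instance, `sorted(range_query(build(pts), 3, 8, 2, 7))` returns the points of `pts` inside the rectangle $[3, 8] \times [2, 7]$.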

SLIDE 31

kdB-tree

  • KdB-tree:
    – Blocking of kd-tree but with $B$ points in each leaf
  • Query as before
    – Analysis as before except that each region now contains $B$ points
  • $O(\sqrt{N/B} + T/B)$ I/O query
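The bound follows by unrolling the kd-tree recurrence over the $N/B$ leaf blocks. The sketch below assumes, for illustration only, that the number of leaves is a power of 4 and that the base case is $Q(1) = 1$; $Q(n)$ counts the regions intersected by one axis-parallel line in a tree with $n$ leaves.

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4), \qquad Q(1) = 1, \\
Q(4^k) &= 2^k\,Q(1) + 2\,(2^k - 1) = 3 \cdot 2^k - 2 = O\!\left(\sqrt{4^k}\right), \\
Q(N/B) &= O\!\left(\sqrt{N/B}\right)
  \quad\Longrightarrow\quad
  4 \cdot O\!\left(\sqrt{N/B}\right) + O\!\left(\frac{T}{B}\right)
  = O\!\left(\sqrt{N/B} + \frac{T}{B}\right) \text{ I/Os.}
\end{align*}
```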

SLIDE 32

kdB-tree

  • kdB-tree can be constructed in $O(\frac{N}{B} \log_B N)$ I/Os
    – somewhat complicated
  • Dynamic using logarithmic method:
    – $O(\sqrt{N/B} + T/B)$ I/O query
    – $O(\log_B^2 N)$ I/O update
    – $O(N/B)$ space

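SLIDE 33

O-Tree Structure

  • O-tree:
    – B-tree on $\Theta\left(\frac{\sqrt{N/B}}{\log_B N}\right)$ vertical slabs
    – B-tree on $\Theta\left(\frac{\sqrt{N/B}}{\log_B N}\right)$ horizontal slabs in each vertical slab
    – kdB-tree on $\Theta\left(N \Big/ \left(\frac{\sqrt{N/B}}{\log_B N}\right)^2\right) = \Theta(B \log_B^2 N)$ points in each leaf

[Figure: O-tree grid with $\sqrt{N/B} / \log_B N$ vertical slabs, $\sqrt{N/B} / \log_B N$ horizontal slabs in each vertical slab, and $B \log_B^2 N$ points in each cell]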
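A quick numeric check of these parameters may help. The sketch below sets every hidden $\Theta$-constant to 1, which is purely an assumption for illustration, and the function name is hypothetical. It confirms that dividing $N$ points over $(\sqrt{N/B} / \log_B N)^2$ cells leaves $B \log_B^2 N$ points per cell.

```python
import math


def otree_parameters(n, b):
    """Return (slabs, cells, points_per_cell) with all Theta-constants set to 1."""
    log_b_n = math.log(n) / math.log(b)
    slabs = math.sqrt(n / b) / log_b_n   # vertical slabs; also horizontal slabs per vertical slab
    cells = slabs ** 2                   # one kdB-tree per cell
    points_per_cell = n / cells          # equals b * log_b_n ** 2 up to rounding
    return slabs, cells, points_per_cell
```

With $N = 2^{30}$ and $B = 2^{10}$ we have $\log_B N = 3$, so there are about $1024 / 3$ slabs in each direction and about $9B$ points per kdB-tree.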

SLIDE 34

O-Tree Query

  • Perform range search with $q_1$ and $q_2$ in vertical B-tree
    – Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query
    – Perform range search with $q_3$ and $q_4$ in horizontal B-trees with x-interval spanned by query
        * Query all kdB-trees with range intersected by query

[Figure: O-tree grid with $\sqrt{N/B} / \log_B N$ slabs in each direction and $B \log_B^2 N$ points in each cell]

SLIDE 35

O-Tree Query Analysis

  • Vertical B-tree query: $O\left(\log_B\left(\frac{\sqrt{N/B}}{\log_B N}\right)\right) = O(\sqrt{N/B})$
  • Query of all kdB-trees in leaves of two horizontal B-trees: $O\left(\frac{\sqrt{N/B}}{\log_B N} \cdot \sqrt{\frac{B \log_B^2 N}{B}} + \frac{T}{B}\right) = O(\sqrt{N/B} + T/B)$
  • Query $O\left(\frac{\sqrt{N/B}}{\log_B N}\right)$ horizontal B-trees: $O\left(\frac{\sqrt{N/B}}{\log_B N} \cdot \log_B\left(\frac{\sqrt{N/B}}{\log_B N}\right)\right) = O(\sqrt{N/B})$
  • Query $O\left(2 \cdot \frac{\sqrt{N/B}}{\log_B N}\right)$ kdB-trees not completely in query: $O\left(2 \cdot \frac{\sqrt{N/B}}{\log_B N} \cdot \sqrt{\frac{B \log_B^2 N}{B}} + \frac{T}{B}\right) = O(\sqrt{N/B} + T/B)$
  • Query in kdB-trees completely contained in query: $O(T/B)$
  • $O(\sqrt{N/B} + T/B)$ I/Os

SLIDE 36

O-Tree Update

  • Insert:
    – Search in vertical B-tree: $O(\log_B N)$ I/Os
    – Search in horizontal B-tree: $O(\log_B N)$ I/Os
    – Insert in kdB-tree: $O(\log_B^2(B \log_B^2 N)) = O(\log_B N)$ I/Os
  • Use global rebuilding when structures grow too big/small
    – B-trees no longer contain $\Theta\left(\frac{\sqrt{N/B}}{\log_B N}\right)$ elements
    – kdB-trees no longer contain $\Theta(B \log_B^2 N)$ elements
  • $O(\log_B N)$ I/Os
  • Deletes can be handled in $O(\log_B N)$ I/Os similarly

SLIDE 37

Summary: O-Tree

  • 2d range searching in linear space
    – $O(\sqrt{N/B} + T/B)$ I/O query
    – $O(\log_B N)$ I/O update
  • Optimal among structures using linear space
  • Can be extended to work in $d$ dimensions with optimal query bound $O\left(\left(\frac{N}{B}\right)^{1 - 1/d} + \frac{T}{B}\right)$

[Figure: 4-sided query $(q_1, q_2, q_3, q_4)$]

SLIDE 38

Summary: 3 and 4-Sided Range Search

  • 3-sided 2d range searching: External priority search tree
    – $O(\log_B N + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
  • General (4-sided) 2d range searching:
    – External range tree: $O(\log_B N + T/B)$ query, $O\left(\frac{N}{B} \cdot \frac{\log_B N}{\log_B \log_B N}\right)$ space, $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ update
    – O-tree: $O(\sqrt{N/B} + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update

[Figure: 3-sided query $(q_1, q_2, q_3)$; 4-sided query $(q_1, q_2, q_3, q_4)$]

SLIDE 39

Techniques (one final time)

  • Tools:
    – B-trees
    – Persistent B-trees
    – Buffer trees
    – Logarithmic method
    – Weight-balanced B-trees
    – Global rebuilding
  • Techniques:
    – Bootstrapping
    – Filtering

[Figure: 3-sided query $(q_1, q_2, q_3)$; 4-sided query $(q_1, q_2, q_3, q_4)$; stabbing query with corner $(x, x)$]

SLIDE 40

Other results

  • Many other results for e.g.
    – Higher dimensional range searching
    – Range counting
    – Halfspace (and other special cases) of range searching
    – Structures for moving objects
    – Proximity queries
  • Many heuristic structures in database community
  • Implementation efforts:
    – LEDA-SM (MPI)
    – TPIE (Duke)

SLIDE 41

THE END