Multi-Attribute Range Queries on Read-Only DHT Verdi March, Yong - - PowerPoint PPT Presentation



SLIDE 1

11 September 2006 ICCCN2006 1

Multi-Attribute Range Queries on Read-Only DHT

Verdi March, Yong Meng Teo

Department of Computer Science National University of Singapore

Email: [ verdimar,teoym]@comp.nus.edu.sg

SLIDE 2

Outline

  • Introduction to R-DHT
  • Problem Statement
  • Related Works
  • Midas

Indexing Range-Query Optimizations

  • Analysis
  • Conclusion

SLIDE 3

Introduction

  • Goal: provide a lookup service in large distributed systems with minimal dependency on third-party infrastructure
      Effective: result guarantee (minimize false negatives)
      Efficient: short, bounded lookup path length; scalable in the number of nodes
  • DHT: distributed implementation of the hash-table abstraction, i.e. ‹key, value›, get(key), and put(key, value)
      Distributed file system (CFS, PAST)
      Multicast (Scribe)
      RSS distribution (Corona, FeedTree)
      Grid resource discovery (DGRID, MAAN, Self-Organizing Condor, RIC, XenoSearch)

SLIDE 4

DHT Lookups

User: Look up key k.
DHT: Walk along a path in a certain direction.
User: I’ve walked 10 steps, and I haven’t seen k.
DHT: Continue for another 10 steps.
…
User: I’ve been walking for a total of 50 steps.
DHT: Look around. If k is not around, then k does not exist.

SLIDE 5

DHT Concepts

[Figure: identifier ring labeled 14, 56, 10, 54, 21, 38, 55]

  • Map keys to nodes. Keys (and values) are stored at the responsible nodes
      Node = bucket
      Locating a key is equivalent to locating the responsible node
  • Structured overlay network: topology + node ordering
      Routing to a node in a short, bounded path length
      Node maintains a small number of routing states
  • Higher result guarantee
  • Scalability
  • Data items are distributed across the overlay network, and this is controlled by the hash function. Nodes under different administrative domains (e.g. commercial organizations):
      Ownership, don’t proactively “push” data
      Self-interest to protect investment
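
The key-to-node mapping on this slide can be sketched as follows. This is a minimal illustration that assumes a Chord-style successor rule (a key is stored at the first node clockwise from it on the ring). The node IDs, the 6-bit identifier space, and the function name `responsible_node` are hypothetical and not taken from the slides.

```python
from bisect import bisect_left

def responsible_node(key, node_ids, m=6):
    """Return the node that stores `key` on a ring of 2**m identifiers.

    Assumes the Chord-style successor rule: the responsible node is the
    first node ID that is >= key, wrapping around past the largest ID.
    """
    size = 1 << m
    ring = sorted(n % size for n in node_ids)
    i = bisect_left(ring, key % size)
    return ring[i % len(ring)]

nodes = [10, 21, 38, 54]              # hypothetical node IDs
print(responsible_node(14, nodes))    # 21
print(responsible_node(56, nodes))    # wraps around to 10
```

Because locating a key reduces to locating its responsible node, a failed or uncooperative node makes every key it holds unreachable, which is the ownership concern raised in the last bullet.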

SLIDE 6

R-DHT Framework

  • A class of DHT
  • Framework to turn an existing DHT into a read-only version
  • No distribution of key-value pairs
      Each node stores only its own key-value pairs (data items)
      Keys are mapped to their original location

Hash-Table Abstraction   Conventional DHT   R-DHT
Store                    Yes                No
Lookup                   Yes                Yes

SLIDE 7

R-DHT

[Figure: hosts 3, 6 and 9 are virtualized into nodes 2|3, 9|3, 5|6, 2|9, 5|9 and 9|9, which are then organized into an R-Chord ring with segments S2, S5 and S9]

Node identifier = k|h: key k (m-bit) concatenated with host identifier h (m-bit)

Lookup is O(log N) hops:

  • similar to Chord
  • N = # hosts
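
The k|h identifier can be illustrated with plain bit operations. This is a sketch under the assumption that the key occupies the upper m bits and the host identifier the lower m bits of a 2m-bit integer; the function names are hypothetical.

```python
def node_id(key, host, m=16):
    """Concatenate an m-bit key and an m-bit host ID into a 2m-bit node ID (k|h)."""
    limit = 1 << m
    if not (0 <= key < limit and 0 <= host < limit):
        raise ValueError("key and host must both fit in m bits")
    return (key << m) | host

def split_node_id(nid, m=16):
    """Recover (key, host) from a 2m-bit node ID."""
    return nid >> m, nid & ((1 << m) - 1)

nid = node_id(2, 3, m=4)          # node 2|3 in an 8-bit identifier space
print(hex(nid))                   # 0x23
print(split_node_id(nid, m=4))    # (2, 3)
```

With the key in the high-order bits, all nodes that share a key are adjacent in identifier order, which is consistent with the segments S2, S5 and S9 shown on the slide.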
SLIDE 8

R-DHT Example

[Figure: administrative domain 3 (MDS) shares resource types 2 and 9. In R-DHT terminology, host 3 has the key set T3 = {2, 9} in the m-bit identifier space. Virtualization produces nodes 2|3 and 9|3, which are organized into the Chord-based R-DHT overlay in the 2m-bit identifier space.]
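
The virtualize and organize steps can be sketched as follows. This is an illustration only: it assumes m = 4, stores the key in the upper m bits of each node ID, and takes the host key sets from the figures as far as they can be read (host 3 with {2, 9}, host 6 with {5}, host 9 with {2, 5, 9}). The function names are hypothetical.

```python
def virtualize(host, keys, m=4):
    """Turn a host with key set `keys` into one 2m-bit node ID (k|h) per key."""
    return [(k << m) | host for k in sorted(keys)]

def segment(key, overlay, m=4):
    """Return all node IDs whose upper m bits equal `key`."""
    return sorted(n for n in overlay if n >> m == key)

overlay = virtualize(3, {2, 9}) + virtualize(6, {5}) + virtualize(9, {2, 5, 9})

print([hex(n) for n in virtualize(3, {2, 9})])   # ['0x23', '0x93']
print([hex(n) for n in segment(2, overlay)])     # ['0x23', '0x29']
```

A lookup for key 2 therefore ends at nodes that belong to the hosts actually sharing resource type 2, rather than at an unrelated node chosen by a hash function.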

SLIDE 9

Outline

  • Introduction to R-DHT
  • Problem Statement
  • Related Works
  • Midas

Indexing Range-Query Optimizations

  • Analysis
  • Conclusion
SLIDE 10

Multi-Attribute Resources

  • The basic lookup operation in a DHT supports only exact queries
      lookup(3) to search for resource type 3
  • Ongoing research on efficient multi-attribute range queries in DHT
      A resource type is described by d attributes, e.g. cpu and ram
      A multi-attribute range query:
      Find resources where {cpu = *, ‘1 GB’ ≤ ram ≤ ‘2 GB’}
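
The semantics of such a query can be sketched as a simple predicate. This is only an illustration of what the query means, not of how a DHT evaluates it; the resource records, the `ram_gb` field, and the use of `None` for the wildcard `*` are assumptions made for the example.

```python
def matches(resource, query):
    """Check a resource against a query of per-attribute (low, high) ranges.

    A condition of None is treated as the wildcard `*`.
    """
    for attr, cond in query.items():
        if cond is None:
            continue
        low, high = cond
        if not (low <= resource[attr] <= high):
            return False
    return True

resources = [                                  # hypothetical resource types
    {"cpu": "P3", "ram_gb": 1},
    {"cpu": "P4", "ram_gb": 2},
    {"cpu": "P4", "ram_gb": 4},
]
query = {"cpu": None, "ram_gb": (1, 2)}        # {cpu = *, 1 GB <= ram <= 2 GB}
hits = [r for r in resources if matches(r, query)]
print(hits)
```

An exact-match lookup such as lookup(3) cannot express this directly, because the query covers every resource type inside a region of the attribute space.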

SLIDE 11

Modeling Multi-Attribute Resource

  • d-attribute resource type → d-dimensional attribute space
      Dimension: attribute
      Point: resource type (≥ 1 resource instances)
  • We index resources by their type (the d attributes)

[Figure: 2-dimensional attribute space with axes cpu (P3, P4) and ram (1 GB, 2 GB)]

SLIDE 12

Proposed Scheme

  • Objective: efficient searching through multi-dimensional indexing on top of R-DHT to answer multi-attribute range queries
      Find {cpu = ‘P3’, ‘1 GB’ ≤ ram ≤ ‘2 GB’}
  • Our approach, Midas, is based on a d-to-one mapping scheme
      Multi-dimensional indexing of resource types
      Search strategy to efficiently retrieve answers

SLIDE 13

Contribution

  • Midas scheme to support multi-attribute range queries on R-DHT
  • Study of the implications of data-item distribution for the performance of multi-attribute range queries

SLIDE 14

Outline

  • Introduction to R-DHT
  • Problem Statement
  • Related Works
  • Midas

Indexing Range-Query Optimizations

  • Analysis
  • Conclusion
SLIDE 15

Related Works (1)

[Figure: taxonomy of related work on indexing a d-attribute resource type]

  • Schemes: distributed inverted index, d-to-one mapping, d-to-d mapping
  • 1-dimensional DHT: ring (Chord, Pastry), tree (Kademlia)
  • d-dimensional DHT: d-dimensional torus (CAN)

SLIDE 16

Related Works (2)

  • Distributed Inverted Index
      MAAN (Cai et al., 2004), CANDy (Bauer et al., 2004), Harren 2002, KSS (Gnawali 2002), and MLP (Shi et al., 2004)
  • d-to-d Mapping
      pSearch (Tang et al., 2003), MURK (Ganesan et al., 2004), and 2CAN (Agrawal et al., 2005)
  • d-to-one Mapping
      Squid (Schmidt et al., 2003), CONE (Agrawal et al., 2005), ZNet (Shu et al., 2005), SCRAP (Ganesan et al., 2004), and CISS (Lee et al., 2004)

SLIDE 17

Distributed Inverted Index (1)

Resource R = {cpu = ‘P3’, ram = ‘1 GB’}

  • Order-preserving hashing: h(‘P3’) = 1, h(‘1 GB’) = 30
  • Indexing: store each key to the DHT

[Figure: ring labeled 1, 30 and 56; the keys h(‘P3’) = 1 and h(‘1 GB’) = 30 are each stored on the ring]

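SLIDE 18

Distributed Inverted Index (2)

Find resource where {cpu = ‘P3’, ram = ‘1 GB’}, with h(‘P3’) = 1 and h(‘1 GB’) = 30

[Figure: three query-processing alternatives, each drawn on a ring labeled 1, 30 and 56]

  • $RS_1 = \sigma_{\mathrm{cpu} = \mathrm{P3}}$, $RS_2 = \sigma_{\mathrm{ram} = 1\,\mathrm{GB}}$, result $= RS_1 \cap RS_2$
  • $RS_1 = \sigma_{\mathrm{cpu} = \mathrm{P3}}$, $RS_2 = RS_1 \cap \sigma_{\mathrm{ram} = 1\,\mathrm{GB}}$
  • $RS = \sigma_{\mathrm{cpu} = \mathrm{P3}} \cap \sigma_{\mathrm{ram} = 1\,\mathrm{GB}}$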
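
The first alternative (fetch one result set per attribute, then intersect) can be sketched with an in-memory index. This is a single-process illustration, not a distributed implementation; the resource IDs and the functions `build_index` and `query` are hypothetical.

```python
from collections import defaultdict

def build_index(resources):
    """Build one posting set per (attribute, value) pair."""
    index = defaultdict(set)
    for rid, attrs in resources.items():
        for attr, value in attrs.items():
            index[(attr, value)].add(rid)
    return index

def query(index, **conditions):
    """Intersect the posting sets of all (attribute, value) conditions."""
    sets = [index.get((a, v), set()) for a, v in conditions.items()]
    return set.intersection(*sets) if sets else set()

resources = {                                   # hypothetical resources
    "r1": {"cpu": "P3", "ram": "1 GB"},
    "r2": {"cpu": "P3", "ram": "2 GB"},
    "r3": {"cpu": "P4", "ram": "1 GB"},
}
index = build_index(resources)
print(query(index, cpu="P3", ram="1 GB"))       # {'r1'}
```

In a DHT each posting set lives on the node responsible for the hashed attribute value, so the alternatives on the slide differ mainly in where the intersection is computed and how much data is shipped between nodes.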

SLIDE 19

d-to-d Mapping

[Figure: attribute space with axes cpu and ram; each point is a resource type]

  • Maps the d-dimensional attribute space to a d-dimensional DHT (CAN)
      With the exception of 2CAN, which maps the d-dimensional attribute space to a 2d-dimensional CAN
  • A range query is modeled as a region in the d-dimensional space
      Route a search request to any point in the query region
      Flood to the remaining points in the region

SLIDE 20

d-to-one Mapping

[Figure: points in the cpu/ram attribute space are hashed onto a ring labeled 3, 8, 10, 48 and 56, e.g. hash(P3, 1 GB) = 3 and hash(sparc, 4 GB) = 10]

  • Map a point in d-dimensional space to a one-dimensional key
  • Store keys to the DHT
  • Used for both indexing resources and query processing

SLIDE 21

Outline

  • Introduction to R-DHT
  • Problem Statement
  • Related Works
  • Midas

Indexing Range-Query Optimizations

  • Analysis
  • Conclusion
SLIDE 22

Midas Framework

  • Indexing: resource r → d-to-one mapping → key k → R-DHT mapping → R-DHT
  • Query processing: query q → d-to-one mapping → search keys {k} → R-DHT lookups → R-DHT

SLIDE 23

Space-Filling Curve

[Figure: Hilbert curve over a 2-dimensional grid, with cells numbered along the curve up to 15; for example, (3, 0) = 15]

The Hilbert SFC is an example of a d-to-one mapping function
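
A d-to-one mapping of this kind can be sketched for d = 2 with the widely used iterative Hilbert-curve conversion. This is an illustration only: the slides do not say which variant or orientation of the curve Midas uses, and the function name `hilbert_index` is hypothetical. This orientation does reproduce the slide's example (3, 0) = 15 on a 4 × 4 grid.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has a standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

print(hilbert_index(4, 3, 0))   # 15
```

Every cell receives a distinct one-dimensional key, so a d-attribute resource type can be stored in a one-dimensional DHT under a single key.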

SLIDE 24

Indexing

  • Resource r = (cpu = ‘P3’, memory = ‘1 GB’) = (3, 0)
  • d-to-one mapping: key k = 15 (m-bit)
  • Virtualization: host h creates node $n_{k,h}$ = 15|h (2m-bit)
  • Organize: node 15|h joins segment S15 of the overlay

SLIDE 25

Query Processing

[Figure: query region covering keys 1, 2, 13 and 14 on the curve; overlay segments S1, S2, S3 and S15]

Search keys = {1, 2, 13, 14}, result set = { }
lookup(1)
Search keys = {2, 13, 14}, result set = {1}
lookup(2)
Search keys = {13, 14}, result set = {1, 2}
lookup(13)
Search keys = { }, result set = {1, 2}
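
The trace above can be reproduced with a small simulation. This sketch rests on one reading of the slide: a lookup for key k reaches the first existing segment at or after k, and if that segment's key is larger than k, every pending search key below it is eliminated without a lookup of its own. The function name, the sorted-list overlay, and the lack of ring wrap-around are assumptions.

```python
from bisect import bisect_left

def range_query(search_keys, indexed_keys):
    """Return (result_keys, number_of_lookups) for an incremental search
    with key elimination over the keys that exist in the overlay."""
    indexed = sorted(indexed_keys)
    pending = sorted(search_keys)
    results, lookups = [], 0
    while pending:
        k = pending.pop(0)
        lookups += 1
        i = bisect_left(indexed, k)
        if i == len(indexed):
            break                      # no key at or after k (no wrap-around here)
        found = indexed[i]
        if found == k:
            results.append(k)
        else:
            # Keys between k and the key that was reached cannot exist.
            pending = [p for p in pending if p >= found]
    return results, lookups

print(range_query({1, 2, 13, 14}, {1, 2, 3, 15}))   # ([1, 2], 3)
```

Under this reading, key 14 is never looked up: lookup(13) lands on segment S15, which shows that neither 13 nor 14 is present.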

SLIDE 26

Outline

  • Introduction to R-DHT
  • Problem Statement
  • Related Works
  • Midas

Indexing Range-Query Optimizations

  • Analysis
  • Conclusion
SLIDE 27

Experiment Setup

  • Compare Midas on R-Chord and Chord
  • Parameters
      m = 16-bit
      d = 3–4
      K = 10,000–50,000 (keys follow a normal distribution in d-dimensional space)
      N = 25,000 (each administrative domain has 4–10 resource types)
      Query selectivity = 1% (of 2^m)

SLIDE 28

Resiliency to Node Failures (1)

  • Resiliency: ability to locate available resources when FN nodes fail simultaneously (0 ≤ F ≤ 1)
      Resources are not replicated (i.e. we are not looking at resource availability)
  • With R-Chord as the underlying infrastructure, nearly all keys are retrieved
      Even though there is no replication
  • In Chord, without replication, # keys retrieved is affected by F

SLIDE 29

Resiliency to Node Failures (2)

  • In R-DHT, a node is responsible for only one key, i.e., its own resources
  • In a conventional DHT, a node is responsible for several keys (even clusters), i.e., it indexes other nodes' resources. When a node is down, it affects resources belonging to other nodes.

[Figure: in a conventional DHT, key k is stored at node n’ while resource r stays at node n (steps 1 and 2); in R-DHT, r and k are on the same node]
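
The difference can be made concrete with a toy model. This is a sketch under stated assumptions, not the simulation used in the experiments: a key counts as retrievable only if both the host that owns the resource and the host that stores its index entry are alive, and all host names, keys, and placements are hypothetical.

```python
def retrievable(owner_of, index_host_of, failed):
    """Keys whose resource owner and index-entry host are both alive."""
    return {
        k for k, owner in owner_of.items()
        if owner not in failed and index_host_of[k] not in failed
    }

owner_of = {"k1": "A", "k2": "B", "k3": "C"}             # hypothetical keys and hosts
conventional_index = {"k1": "B", "k2": "C", "k3": "A"}   # entries hashed to other hosts
rdht_index = dict(owner_of)                              # entry stays with its owner
failed = {"B"}

print(sorted(retrievable(owner_of, conventional_index, failed)))   # ['k3']
print(sorted(retrievable(owner_of, rdht_index, failed)))           # ['k1', 'k3']
```

In the conventional placement, the failure of B also hides k1, whose resource on A is still available; when each entry stays with its owner, only B's own key is lost.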

SLIDE 30

Query Cost

  • Expected size of the result set is affected by d
      Size of the d-dimensional space is fixed (2^m)
      Increasing d causes the space to be more compact and dense
  • In Chord, cost is constant
      Query selectivity (size of query region) is constant
  • In R-Chord, cost is affected by the size of the result set

SLIDE 31

Query Cost

  • Cost = # hops visited
  • In Chord, cost is constant
      Query selectivity (size of query region) is constant
      Cost is determined by the size of the overlay network (N)
  • In R-Chord, cost is affected by the size of the result set (which, in turn, is affected by K)
  • Performance hit is due to # lookups, not the cost (i.e. path length) of an individual lookup

SLIDE 32

Query Cost (2)

  • Assume keys are uniformly distributed
  • Query selectivity is s, 0 ≤ s ≤ 1
  • Result set contains pK keys, 0 ≤ p ≤ 1
  • Conventional DHT visits sN nodes, i.e. all nodes responsible for the query region
  • R-DHT visits pK nodes, i.e. equal to # keys (# answers)

[Figure: a conventional DHT visits sN nodes for selectivity s; R-DHT visits pK nodes for result fraction p]
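
The two estimates can be compared numerically. The sketch below applies the slide's model (sN nodes for a conventional DHT, pK nodes for R-DHT) using N = 25,000, s = 1% and K = 50,000 from the experiment setup; the value p = 0.002 and the function names are assumptions chosen for illustration.

```python
def nodes_visited(s, n_nodes, p, n_keys):
    """Nodes visited by one range query under the uniform-key model."""
    return {"conventional_dht": s * n_nodes, "r_dht": p * n_keys}

def break_even_p(s, n_nodes, n_keys):
    """Result fraction p below which R-DHT visits fewer nodes (pK < sN)."""
    return s * n_nodes / n_keys

cost = nodes_visited(s=0.01, n_nodes=25_000, p=0.002, n_keys=50_000)
print(cost)                                   # about 250 vs. about 100 nodes
print(break_even_p(0.01, 25_000, 50_000))     # about 0.005
```

In this model R-DHT visits fewer nodes whenever p < sN/K, that is, when the query region is large relative to the number of answers it contains.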

SLIDE 33

Amortizing Query Cost

  • A conventional DHT separates keys and resources
  • In the end, a query still needs to contact the administrative domain that shares the resources

[Figure: in a conventional DHT, key k is stored at node n’ and resource r at node n (steps 1 and 2); in R-DHT, k and r are on the same node]

SLIDE 34

Query Performance under Churn

  • Metric: lookup resiliency under churn
  • R-Chord performs reasonably well, considering that its overlay is larger (7x) than Chord's.
  • When K is increased, R-Chord cannot effectively exploit the segment-based overlay to support redundancy of routing tables.

SLIDE 35

Outline

  • Introduction to R-DHT
  • Problem Statement
  • Related Works
  • Midas

Indexing Range-Query Optimizations

  • Analysis
  • Conclusion
SLIDE 36

Conclusion

  • Midas
      Indexing: resource type → key → R-DHT node
      Query engine: incremental search + key elimination
  • Implications of data-item distribution for the performance of query processing
      R-DHT achieves high lookup resiliency without requiring replication
      R-DHT query cost is due to the higher number of lookup operations needed
      R-DHT is more suitable for large range queries with a small result set
      To improve query performance in R-DHT, allow selective data-item distribution