11 September 2006 ICCCN2006 1
Multi-Attribute Range Queries on Read-Only DHT
Verdi March, Yong Meng Teo
Department of Computer Science
National University of Singapore
Email: [verdimar,teoym]@comp.nus.edu.sg
Outline
Introduction to
Effective: result guarantee (minimize false negatives)
Efficient
Distributed file system (CFS, PAST)
Multicast (Scribe)
RSS distribution (Corona, FeedTree)
Grid
User: lookup key k
DHT: Walk along a path in a certain direction
User: I’ve walked 10 steps, and I haven’t seen k
DHT: Continue for 10 more steps.
…
User: I’ve been walking for a total of 50 steps
DHT: Look around. If k is not around, then k does not exist
[Figure: DHT ring showing identifiers 14, 56, 10, 54, 21, 38 and 55]
Node = bucket
Locating a key is equal to
Routing
Node maintains a small number
Higher result guarantee
Scalability
Data items are distributed across the overlay network, and this is controlled by the hash function.
Nodes under different administrative domains (e.g. commercial organizations):
Ownership, don’t proactively “push” data
Self-interest to protect investment
A class of DHT
Framework to turn existing DHT into a read-only version
No distribution of key-value pairs
Each node stores only its own key-value pairs (data
Keys are mapped to their original location

Hash-Table Abstraction | Conventional DHT | R-DHT
Store                  | Yes              | No
Lookup                 | Yes              | Yes
[Figure: each host is virtualized into one node k|h per key (2|3, 9|3, 5|6, 2|9, 5|9, 9|9), and the nodes are organized into an R-Chord ring with segments S2, S5 and S9]
Node identifier k|h = key k (m-bit) followed by host identifier h (m-bit)
Lookup is O(log N) hops:
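The identifier k|h can be read as the m-bit key k followed by the m-bit host identifier h, which gives a 2m-bit node identifier. A minimal sketch of that composition, assuming the key occupies the high-order bits so that nodes sharing a key are adjacent on the ring; the function names and the choice m = 4 are illustrative and not taken from the slides:

```python
from typing import Iterable, List, Tuple


def node_id(key: int, host: int, m: int) -> int:
    """Concatenate an m-bit key and an m-bit host identifier into k|h."""
    limit = 1 << m
    if not (0 <= key < limit and 0 <= host < limit):
        raise ValueError("key and host must fit in m bits")
    return (key << m) | host  # key in the high-order bits (assumption)


def split_node_id(node: int, m: int) -> Tuple[int, int]:
    """Recover (key, host) from a 2m-bit node identifier."""
    return node >> m, node & ((1 << m) - 1)


def virtualize(host: int, keys: Iterable[int], m: int) -> List[int]:
    """Create one node per key shared by the host, e.g. keys 2 and 9 on host 3."""
    return sorted(node_id(k, host, m) for k in keys)


M = 4  # illustrative identifier width
nodes_of_host_3 = virtualize(3, [2, 9], M)
print([split_node_id(n, M) for n in nodes_of_host_3])  # [(2, 3), (9, 3)]
```

With the key as prefix, sorting node identifiers groups all nodes that share a key, which is one way to obtain per-key segments such as S2 and S9.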
R-DHT Terminologies
[Figure: administrative domain 3 (MDS) shares resource types 2 and 9. Host 3 therefore has the key set T3 = {2, 9}. Virtualization turns the host into nodes 2|3 and 9|3, which are organized into the Chord-based R-DHT overlay. Hosts and keys use an m-bit identifier space; nodes use a 2m-bit identifier space.]
Indexing Range-Query Optimizations
Basic lookup operation in DHT supports only exact queries
lookup(3) to search resource type 3
Ongoing research for efficient multi-attribute range queries
Resource type is described by d attributes, e.g. cpu and ram
A multi-attribute range query:
Find resources where {cpu = *, ‘1 GB’ ≤ ram ≤ ‘2 GB’}
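A query of this form can be represented as one closed interval per attribute, with a wildcard for unconstrained attributes. The sketch below is only an illustration: it assumes attribute values have been encoded as numbers (ram in GB, cpu as an ordinal code), and the names `Query` and `matches` are hypothetical:

```python
from typing import Dict, Optional, Tuple

# None stands for the wildcard '*'; otherwise an inclusive (low, high) interval.
Interval = Optional[Tuple[float, float]]
Query = Dict[str, Interval]


def matches(resource: Dict[str, float], query: Query) -> bool:
    """Return True if every constrained attribute falls inside its interval."""
    for attribute, interval in query.items():
        if interval is None:  # wildcard: any value is accepted
            continue
        if attribute not in resource:
            return False
        low, high = interval
        if not low <= resource[attribute] <= high:
            return False
    return True


# {cpu = *, 1 GB <= ram <= 2 GB}
query: Query = {"cpu": None, "ram": (1.0, 2.0)}
print(matches({"cpu": 3, "ram": 1.0}, query))  # True
print(matches({"cpu": 3, "ram": 4.0}, query))  # False
```

An exact-match constraint is the special case where low equals high.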
d-attribute resource type: d-dimensional attribute space
Dimension: attribute
Point
[Figure: 2-Dimensional Attribute Space, with axes cpu (P3, P4) and ram (1 GB, 2 GB)]
We index resources by their type (the d attributes)
Objective: efficient searching
Find {cpu = ‘P3’, ‘1 GB’ ≤ ram ≤ ‘2 GB’}
Our approach, Midas, is based on d-to-one mapping
Multi-dimensional indexing of resource types
Search strategy to efficiently retrieve answers
Midas scheme to support multi-attribute range queries on
Study on the implication of data-item distribution to the
[Figure: schemes that map a d-attribute resource type onto a DHT]
1-dimensional DHT (ring: Chord, Pastry; tree: Kademlia): distributed inverted index, d-to-one mapping
d-dimensional DHT (d-dimensional torus: CAN): d-to-d mapping
Distributed Inverted Index: MAAN (Cai et al., 2004), CANDy (Bauer et al., 2004),
d-to-d Mapping: pSearch (Tang et al., 2003), MURK (Ganesan et al.,
d-to-one Mapping: Squid (Schmidt et al., 2003), CONE (Agrawal et al.,
Resource R = {cpu = ‘P3’, ram = ‘1 GB’}
Order-preserving hashing: h(‘P3’) = 1, h(‘1 GB’) = 30
Indexing: store each key to the DHT
[Figure: keys 1 and 30 are each stored on a ring that shows identifiers 1, 30 and 56]
Find resource where {cpu = ‘P3’, ram = ‘1 GB’}
h(‘P3’) = 1, h(‘1 GB’) = 30
[Figure: three query strategies, each on a ring that shows identifiers 1, 30 and 56]
RS1 = σ(cpu = P3); RS2 = σ(ram = 1 GB); RS1 ∩ RS2
RS1 = σ(cpu = P3); RS2 = RS1 ∩ σ(ram = 1 GB)
RS = σ(cpu = P3) ∩ σ(ram = 1 GB)
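The first strategy (one selection per attribute, then an intersection) can be sketched in a few lines. This is an illustration only, not the code of MAAN or any other cited system: the DHT is simulated by a single dictionary, and apart from h(‘P3’) = 1 and h(‘1 GB’) = 30 every hash value and resource below is made up.

```python
from collections import defaultdict
from typing import Dict, Set

# Order-preserving hash values. Only 'P3' -> 1 and '1 GB' -> 30 come from the
# slides; the other two entries are assumptions for the example.
HASH = {"P3": 1, "P4": 2, "1 GB": 30, "2 GB": 31}

# The DHT is simulated by one dictionary: DHT key -> names of resources.
dht: Dict[int, Set[str]] = defaultdict(set)


def index(name: str, resource: Dict[str, str]) -> None:
    """Store the resource once per attribute value (d stores per resource)."""
    for value in resource.values():
        dht[HASH[value]].add(name)


def select(value: str) -> Set[str]:
    """One exact-match lookup: all resources that have this attribute value."""
    return set(dht.get(HASH[value], set()))


index("R", {"cpu": "P3", "ram": "1 GB"})
index("R2", {"cpu": "P3", "ram": "2 GB"})  # hypothetical resource
index("R3", {"cpu": "P4", "ram": "1 GB"})  # hypothetical resource

rs1 = select("P3")    # RS1: resources with cpu = P3
rs2 = select("1 GB")  # RS2: resources with ram = 1 GB
result = rs1 & rs2    # RS1 ∩ RS2
print(sorted(result))  # ['R']
```

The other two strategies differ in where the intersection is computed, not in the final result set.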
[Figure: resource types as points in an attribute space with axes cpu and ram]
Maps d-dimensional attribute space to d-dimensional DHT (CAN)
With the exception of 2CAN, which maps d-dimensional attribute space to 2d-dimensional CAN
Range query is modeled as a region in d-dimensional space
Route a search request to any point in the query region
Flood to the remaining points in the region
Map a point in d-dimensional space to a one-dimensional key
Store keys to DHT
For indexing resources and query processing
Example: hash(sparc, 4 GB) = 10, hash(P3, 1 GB) = 3
[Figure: ring showing identifiers 8, 48 and 56 and keys 3 and 10]
Indexing: resource r → (d-to-one mapping) → key k → (R-DHT mapping) → R-DHT
Query processing: query q → (d-to-one mapping) → search keys {k} → (R-DHT lookups) → R-DHT
[Figure: Hilbert curve over a two-dimensional grid with coordinates up to 3 on each axis and cells numbered up to 15]
(3, 0) = 15
The Hilbert space-filling curve (SFC) is an example of a d-to-one mapping function
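For d = 2, a Hilbert mapping can be computed with the widely used iterative conversion shown below. This is a sketch under assumptions: the slides do not say which orientation or implementation Midas uses, and `hilbert_index` is a hypothetical name. This orientation does reproduce (3, 0) = 15 on a 4 × 4 grid.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the base orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


print(hilbert_index(4, 3, 0))  # 15
keys = sorted(hilbert_index(4, x, y) for x in range(4) for y in range(4))
print(keys == list(range(16)))  # True: every cell gets a distinct key
```

Cells that are consecutive on the curve are adjacent in the grid, which is why a query region tends to map to a small number of contiguous key ranges.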
r = (cpu = ‘P3’, memory = ‘1 GB’) = (3, 0)
k = 15
Node n(k,h) = 15|h
[Figure: host h with key k = 15 (m-bit) is virtualized into node 15|h (2m-bit), which is organized into segment S15 of the overlay]
Search keys = {1, 2, 13, 14}, Result set = {}
lookup(1): Search keys = {2, 13, 14}, Result set = {1}
lookup(2): Search keys = {13, 14}, Result set = {1, 2}
lookup(13): Search keys = {}, Result set = {1, 2}
[Figure: overlay with segments S1, S2, S3 and S15]
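In the trace, key 14 is dropped after lookup(13) without a lookup of its own. One way to read this, offered purely as an assumption, is that a lookup for a missing key ends at the next existing segment (S15 here), so every search key below that segment can be discarded. A minimal single-process sketch; `range_search` and the sorted `segments` list are hypothetical stand-ins for real R-DHT lookups, and ring wrap-around is ignored:

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def range_search(search_keys: Iterable[int],
                 segments: List[int]) -> Tuple[List[int], int]:
    """Return (result set, number of lookups) for the given search keys.

    `segments` is the sorted list of keys that exist in the overlay.
    """
    pending = sorted(search_keys)
    result: List[int] = []
    lookups = 0
    while pending:
        k = pending.pop(0)
        lookups += 1
        # Simulated lookup(k): find the first existing segment at or after k.
        i = bisect_left(segments, k)
        successor = segments[i] if i < len(segments) else None
        if successor == k:
            result.append(k)
        elif successor is None:
            pending = []  # nothing exists at or beyond k
        else:
            # Keys below the successor segment cannot exist: prune them.
            pending = [x for x in pending if x >= successor]
    return result, lookups


found, cost = range_search({1, 2, 13, 14}, [1, 2, 3, 15])
print(found, cost)  # [1, 2] 3
```

Under this reading, four search keys cost three lookups, matching the three lookups shown in the trace.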
Compare Midas on R-Chord and Chord
Parameters:
m = 16-bit
d = 3–4
K = 10,000–50,000
Keys follow normal distribution in d-dimensional
N = 25,000
Each administrative domain has 4–10 resource types
Query selectivity = 1% (of 2^m)
Resiliency: ability to locate available resources when FN nodes fail simultaneously (0 ≤ F ≤ 1)
Resources are not replicated (i.e. we are not looking at resource availability)
With R-Chord as the underlying infrastructure, nearly all keys are retrieved, even though there is no replication
In Chord, without replication, # keys retrieved is affected by F
In R-DHT, a node is responsible for only one key, i.e., its own
In conventional DHT, a node is responsible for several keys
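The effect of this difference on resiliency can be illustrated without modelling routing at all. The sketch below is a simplification under stated assumptions: routing failures are ignored, a resource counts as available when its owner is alive, and all names and the example placement are hypothetical.

```python
from typing import Dict, Set


def retrievable_conventional(owner: Dict[int, str], stored_at: Dict[int, str],
                             failed: Set[str]) -> Set[int]:
    """Keys of live owners that can still be found: the storing node must be alive too."""
    return {k for k, o in owner.items()
            if o not in failed and stored_at[k] not in failed}


def retrievable_r_dht(owner: Dict[int, str], failed: Set[str]) -> Set[int]:
    """Each key stays with its owner, so a live owner implies a findable key."""
    return {k for k, o in owner.items() if o not in failed}


owner = {2: "A", 9: "A", 5: "B"}      # key -> host that owns the resource
stored_at = {2: "B", 9: "C", 5: "A"}  # key -> node chosen by the hash function
failed = {"C"}

print(sorted(retrievable_conventional(owner, stored_at, failed)))  # [2, 5]
print(sorted(retrievable_r_dht(owner, failed)))                    # [2, 5, 9]
```

In the conventional case, key 9 is lost although its owner A is still alive, because the node that stored it has failed.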
Size of d-dimensional space
Increasing d causes the
Cost = # hops visited
In Chord, cost is constant
Query selectivity (size of query region) is constant
Cost is determined by size of overlay network (N)
In R-Chord, cost is affected by size of result set (which, in turn, is affected by K)
individual lookup
Assume keys are uniformly distributed
Query selectivity is s, 0 ≤ s ≤ 1
Result set contains pK keys, 0 ≤ p ≤ 1
Conventional DHT visits sN nodes, i.e. all nodes responsible
R-DHT visits pK nodes, i.e. equal to # keys (# answers)
[Figure: conventional DHT visits sN nodes for selectivity s; R-DHT visits pK nodes for result fraction p]
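The two node counts can be compared numerically. In the sketch below, N = 25,000, K = 50,000 and s = 1% are taken from the experiment parameters; the value of p is not given in the slides and is chosen here only for illustration, as are the function names.

```python
def conventional_dht_nodes(s: float, n: int) -> float:
    """Nodes visited by a conventional DHT: sN."""
    return s * n


def r_dht_nodes(p: float, k: int) -> float:
    """Nodes visited by R-DHT: pK, one per key in the result set."""
    return p * k


N, K = 25_000, 50_000
s = 0.01  # query selectivity
p = 0.01  # fraction of keys in the result set (hypothetical value)

conventional = conventional_dht_nodes(s, N)
read_only = r_dht_nodes(p, K)
print(round(conventional), round(read_only))  # 250 500
```

With these numbers R-DHT visits more nodes. The comparison reverses once pK falls below sN, that is, when the result set is small relative to the query region.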
Conventional DHT separates keys and resources
At the end, still need to contact the administrative
[Figure: Conventional DHT compared with R-DHT]
Metrics: lookup resiliency under churn
R-Chord performs reasonably well, considering that its
When K is increased, R-Chord cannot effectively exploit
Indexing
Query engine
R-DHT achieves high lookup resiliency without requiring
R-DHT query cost is due to a higher number of lookup
R-DHT is more suitable for large range queries with small
To improve query performance in R-DHT, allow selective data-