Data-Centric Query in Sensor Networks II

SLIDE 1

Data-Centric Query in Sensor Networks II

Jie Gao

Computer Science Department Stony Brook University

SLIDE 2

Papers

  • Rik Sarkar, Xianjin Zhu, Jie Gao, Double Rulings for Information Brokerage in Sensor Networks, MobiCom'06.

  • Qing Fang, Jie Gao, Leonidas J. Guibas, Landmark-Based Information Storage and Retrieval in Sensor Networks, INFOCOM'06.

SLIDE 3

Problem: Information Brokerage

  • Information producers and information consumers need to find each other.
– Example: tourists in a park are looking for animals, but the sensors that have animals in range do not know where the tourists are.

  • Challenges
– Content-based search
– Spatial/temporal separation
– Limited network resources

  • Easy solution: flooding
– Inefficient

SLIDE 4

Geographic Hash Tables

  • Data-centric hashing.
– The node at the hashed location serves as the rendezvous point.
– This enables brokerage.

  • Pros
– Simple; works without flooding.

  • Cons
– Nodes near the hashed location become a bottleneck.
– Not distance-sensitive: a nearby producer and consumer may hash to a faraway node.
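A minimal sketch of the data-centric hashing step, assuming a rectangular field of known size and known node positions. The names `hash_to_location` and `home_node` are hypothetical, and SHA-256 is an arbitrary choice of hash. GHT itself reaches the home node with geographic routing, whereas this sketch simply scans all node positions.

```python
import hashlib
import math

def hash_to_location(data_type, width, height):
    """Deterministically map a data-type name to a point in a width x height field."""
    digest = hashlib.sha256(data_type.encode("utf-8")).digest()
    # Use two 32-bit chunks of the digest as fractions of the field size.
    fx = int.from_bytes(digest[0:4], "big") / 2**32
    fy = int.from_bytes(digest[4:8], "big") / 2**32
    return fx * width, fy * height

def home_node(positions, location):
    """Return the id of the node closest to the hashed location (the rendezvous)."""
    return min(positions, key=lambda node: math.dist(positions[node], location))

# Producers and consumers of "elephant" compute the same rendezvous independently.
positions = {0: (10.0, 10.0), 1: (50.0, 80.0), 2: (90.0, 30.0)}
loc = hash_to_location("elephant", 100.0, 100.0)
print(home_node(positions, loc))
```

Because the location depends only on the data type, every producer of that type sends to the same node, which is exactly the bottleneck listed under Cons.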

SLIDE 5

Another approach: Double Rulings

  • Hash data to a 1-d curve instead of a 0-d point.

  • Motivations for generalization
– Data delivery uses multi-hop routing, so information can be left along the route at no extra cost.
– More flexible data retrieval: it is easier to encounter a 1-d curve than a 0-d point.
SLIDE 6

Simple Double Ruling

  • Rectilinear double ruling
– The producer stores data on horizontal lines.
– The consumer searches along vertical lines.
– Correctness: every horizontal line intersects every vertical line.
– Distance-sensitive: q finds p in time O(d), where d = |pq|.

References: [Liu Huang Zhang 04], rumor routing [Braginsky-Estrin 02], quorum-based routing [Stojmenovic 99].
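A small grid sketch of the rectilinear scheme, assuming unit-spaced nodes on a `width` x `height` grid. The producer replicates along its entire row; the consumer does not know which way that row lies, so it probes its own column in both directions with a doubling radius. The helper names and the hop accounting are hypothetical.

```python
def replicate_row(producer, width):
    """Producer at (px, py) leaves a copy on every node of its horizontal line."""
    px, py = producer
    return {(x, py) for x in range(width)}

def search_column(consumer, stored, height):
    """Doubling search along the consumer's vertical line.

    Probes up to distance r upward, returns to q, then r downward, returns,
    doubling r each round. Returns (meeting point, hops walked)."""
    qx, qy = consumer
    hops, r = 0, 1
    while True:
        for direction in (+1, -1):
            for step in range(r + 1):
                y = qy + direction * step
                if not 0 <= y < height:
                    break  # ran off the grid in this direction
                if (qx, y) in stored:
                    return (qx, y), hops + step
            hops += 2 * r  # out r hops and back (counted in full even if clipped)
        r *= 2

stored = replicate_row((3, 12), 20)
meet, hops = search_column((10, 5), stored, 20)
print(meet, hops)
```

Here the vertical gap is 7 hops and the search walks 35 hops in total; with doubling, the cost stays within a constant factor of the vertical gap, which is at most d.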

SLIDE 7

Spherical Double Rulings Scheme

  • The producer follows a circle to the hashed location.
– Includes GHT as a special case.
– Allows a large variety of retrieval mechanisms.

  • Improves on GHT
– Load balancing for popular data types
– Distance sensitivity
– Flexible data retrieval schemes improve system robustness

SLIDE 8

Double Rulings on a Sphere

  • Stereographic projection maps the plane to a sphere.
– Circles map to circles.
– May incur distortion.

  • For a finite sensor field
– We can choose the location and size of the sphere such that the distance distortion is bounded by 1+ε.
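One way to realize the 1+ε bound, sketched under assumptions the slides do not spell out: a unit sphere centred at the origin, projection from the north pole onto the plane z = 0, and a sensor field that is recentred and shrunk until it fits in a disk of radius √ε around the origin (shrinking the field is equivalent to enlarging the sphere). The map stretches lengths near (x, y) by 2/(1 + x² + y²), so inside that disk the stretch varies by a factor of at most 1+ε. Function names are hypothetical.

```python
import math

def inverse_stereographic(x, y):
    """Map a plane point to the unit sphere (projection from the north pole)."""
    s = x * x + y * y
    return (2 * x / (1 + s), 2 * y / (1 + s), (s - 1) / (1 + s))

def stretch(x, y):
    """Conformal factor: local lengths near (x, y) are multiplied by this."""
    return 2.0 / (1.0 + x * x + y * y)

def fit_field(points, eps):
    """Recentre the field at its centroid and scale it into a disk of radius sqrt(eps)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    rmax = max(math.hypot(x - cx, y - cy) for x, y in points)
    k = math.sqrt(eps) / rmax if rmax > 0 else 1.0
    return [((x - cx) * k, (y - cy) * k) for x, y in points]

field = [(0.0, 0.0), (200.0, 0.0), (0.0, 150.0), (200.0, 150.0), (90.0, 60.0)]
eps = 0.05
scaled = fit_field(field, eps)
factors = [stretch(x, y) for x, y in scaled]
print(max(factors) / min(factors))  # at most 1 + eps
```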

SLIDE 9

Spherical Double Rulings

  • Any two great circles intersect.
– Use great circles in place of vertical/horizontal lines.
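The correctness argument rests on a fact that is easy to check numerically. If a great circle is described by the unit normal of its plane through the sphere's centre, two distinct great circles meet exactly at ±(n₁ × n₂)/|n₁ × n₂|. A minimal sketch; the normal-vector representation and helper names are assumptions of this example.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def great_circle_intersections(n1, n2):
    """Both intersection points of the great circles with plane normals n1 and n2."""
    c = cross(n1, n2)
    norm = math.sqrt(dot(c, c))
    if norm < 1e-12:
        raise ValueError("normals are parallel: the two great circles coincide")
    p = tuple(x / norm for x in c)
    return p, tuple(-x for x in p)  # the two points are antipodal

equator, meridian = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(great_circle_intersections(equator, meridian))
```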

SLIDE 10

Spherical Double Rulings

  • One major difference from rectilinear double rulings:
– There are infinitely many great circles through a point, which gives a lot more flexibility.

SLIDE 11

Data Replication

  • Data-centric hash function $h(T_i) = h_i$.
  • Producer $p$ replicates data along the great circle $C(p, h_i)$.
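A sketch of how a producer could enumerate waypoints on $C(p, h_i)$, assuming both points are unit vectors on a sphere centred at the origin (so the antipode of $h_i$ is simply $-h_i$). The circle is parametrised as $\cos t\,p + \sin t\,u$, where $u$ is the unit component of $h_i$ orthogonal to $p$. `replication_circle` is a hypothetical helper, and a real network would still need to map each waypoint to a nearby sensor.

```python
import math

def replication_circle(p, h, samples):
    """Evenly spaced points on the great circle through unit vectors p and h."""
    d = sum(a * b for a, b in zip(p, h))
    w = [b - d * a for a, b in zip(p, h)]  # part of h orthogonal to p
    wn = math.sqrt(sum(x * x for x in w))
    if wn < 1e-12:
        raise ValueError("p coincides with h or its antipode; the circle is not unique")
    u = [x / wn for x in w]
    pts = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        pts.append(tuple(math.cos(t) * a + math.sin(t) * b for a, b in zip(p, u)))
    return pts

# With p on the x-axis and h at the north pole, the quarter points are p, h, -p, -h.
pts = replication_circle((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 4)
```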

SLIDE 12

Data Replication

  • Different producers with the same data type hash to different great circles, all passing through $h_i$ and its antipodal point $\bar{h}_i$.
– This allows aggregation.

SLIDE 13

Replication Curve Examples

[Figure: replication curves of Producer 1 and Producer 2, passing through the hashed node and its antipode, compared with the GHT paths]

SLIDE 14

Data Retrieval

  • Flexible retrieval rules
  • 1. GHT Style Retrieval
  • 2. Distance Sensitive Retrieval
  • 3. Aggregated Data Retrieval
  • 4. Full Power Data Retrieval
SLIDE 15

1. GHT Style Retrieval

  • GHT still works.
  • Consumer q wants data $T_i$.

Consumer q goes to the hashed node $h_i$ or its antipodal point, whichever is closer.
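Since the antipode of a unit vector $h_i$ is $-h_i$, deciding which rendezvous is closer reduces to a sign test: $q$ is within $\pi/2$ of $h_i$ exactly when $q \cdot h_i \ge 0$. A minimal sketch with hypothetical function names, assuming unit vectors on a sphere centred at the origin:

```python
import math

def ght_style_target(q, h):
    """Return whichever of h and its antipode -h is closer to q (both unit vectors)."""
    if sum(a * b for a, b in zip(q, h)) >= 0:
        return h
    return tuple(-x for x in h)

def angle(a, b):
    """Great-circle distance on the unit sphere."""
    d = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, d)))

q = (0.0, 0.6, -0.8)
h = (0.0, 0.0, 1.0)
target = ght_style_target(q, h)  # the antipode (0, 0, -1) here
print(angle(q, target) <= math.pi / 2)  # always True: the walk is at most a quarter circle
```

So GHT-style retrieval on the sphere never travels more than a quarter of a great circle, half of the worst case for a single rendezvous point.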

SLIDE 16

2. Distance Sensitive Retrieval

  • Distance sensitive: if the producer is at distance d from q, the consumer should find the data with cost O(d).
– Consumes fewer network resources.
– Users are likely to be more interested in their immediate vicinity.
– Lower delay, which is important in emergency response.

SLIDE 17

2. Distance Sensitive Retrieval

  • Rotate the sphere so that the hashed node is at the north pole.
– Replication along the longitude curve.
– Retrieval along the latitude curve.

If $|pq| = d$, i.e., q is at distance d from p, then the distance from q along its latitude curve to the producer's replication curve is at most $d \cdot \pi/2$.
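The $d \cdot \pi/2$ bound can be spot-checked numerically. The sketch below rests on several assumptions: a unit sphere, the hashed node at the north pole, and a consumer that walks along its latitude circle in the better direction until it meets the producer's replication great circle, which crosses every latitude at the producer's longitude and at that longitude plus $\pi$. Function names and the random test points are illustrative only.

```python
import math
import random

def colat_lon(v):
    """Colatitude and longitude of a unit vector, with the hashed node at (0, 0, 1)."""
    x, y, z = v
    return math.acos(max(-1.0, min(1.0, z))), math.atan2(y, x)

def geodesic(a, b):
    """Great-circle distance, via atan2 for accuracy at small angles."""
    c = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
    return math.atan2(math.sqrt(sum(x * x for x in c)), sum(x * y for x, y in zip(a, b)))

def latitude_walk(q, p):
    """Arc length along q's latitude circle to the nearer crossing of p's replication circle."""
    theta_q, lon_q = colat_lon(q)
    _, lon_p = colat_lon(p)
    diff = abs(lon_q - lon_p) % math.pi  # crossings repeat every pi in longitude
    delta = min(diff, math.pi - diff)     # angular gap to the nearer crossing, <= pi/2
    return math.sin(theta_q) * delta      # a latitude circle has radius sin(colatitude)

def random_unit(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

rng = random.Random(7)
worst = 0.0
for _ in range(10000):
    p, q = random_unit(rng), random_unit(rng)
    worst = max(worst, latitude_walk(q, p) / geodesic(p, q))
print(worst)  # at most pi / 2
```

The bound holds because the walk length is $\sin\theta\,\Delta$ with $\Delta \le \pi/2$, while q's distance to the replication circle is $\arcsin(\sin\theta \sin\Delta) \ge \sin\theta \cdot 2\Delta/\pi$, and that distance is at most d.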

SLIDE 18

2. Distance Sensitive Retrieval

  • Distance sensitive: if the producer is at distance d from q, the consumer should find the data with cost O(d).

  • Wrong direction?
– Handled using a doubling technique.
– A random choice of direction works well in practice (we use this in simulations).

Consumer q follows the circle at a fixed distance from the hashed location.

SLIDE 19

2. Distance Sensitive Retrieval

[Figure: producer, hashed node, antipode, consumer, and the consumer's retrieval curve]

SLIDE 20

3. Aggregated Data Retrieval

  • Consumer wants data of several data types $\{T_i\}$.
– E.g., monkey and elephant detections.
– Correctness: any closed curve that separates $h_i$ from its antipodal point intersects the producer curve.
– Many such retrieval curves: more freedom for consumers and better load balancing.

For each data type $T_i$, follow a closed curve that separates $h_i$ and its antipodal point.
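A great circle is one convenient closed curve for this rule. A great circle with plane normal $n$ puts $h_i$ and $-h_i$ on opposite sides exactly when $n \cdot h_i \neq 0$, so one great circle through the consumer can serve every requested type at once as long as it avoids all the $h_i$. The sketch below, with hypothetical names and unit vectors assumed, tries a few great circles through $q$ until one qualifies.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieval_circle(q, hashes, tol=1e-6):
    """Normal of a great circle through q that separates every h from -h."""
    candidates = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 2.0, 3.0), (3.0, -1.0, 2.0)]
    for r in candidates:
        c = cross(q, r)  # orthogonal to q, so the circle passes through q
        norm = math.sqrt(dot(c, c))
        if norm < tol:
            continue  # r is parallel to q
        n = tuple(x / norm for x in c)
        if all(abs(dot(n, h)) > tol for h in hashes):
            return n  # no h lies on the circle, so each h and -h are split
    raise ValueError("no candidate worked; try other directions")

s = 1 / math.sqrt(2)
q = (1.0, 0.0, 0.0)
hashes = [(0.0, 0.0, 1.0), (0.0, s, s)]  # hashed nodes of, say, monkey and elephant
n = retrieval_circle(q, hashes)
```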

SLIDE 21

3. Aggregated Data Retrieval

[Figure: producer, antipode, hashed node, consumer, and the consumer's retrieval curve]

SLIDE 22

4. Full Power Data Retrieval

  • Consumer wants all the data in the network.
– Correctness: any two great circles intersect.
– Many such great circles!

Follow a great circle and retrieve all data.

SLIDE 23

4. Full Power Data Retrieval

[Figure: producer, antipode, hashed node, consumer, and the consumer's great-circle retrieval curve]

SLIDE 24

Local Data Recovery upon Node Failures

  • When a group of nodes is destroyed, all the data on those nodes are available on the boundary of the destroyed region.

SLIDE 25

Local Data Recovery upon Node Failures

[Figure: surviving data replicas on the boundary of the destroyed region]

SLIDE 26

Implementation

  • How do we forward data on a virtual curve?
– Use "geographic greedy forwarding on a curve".

  • The question of density
– Is it always possible to forward?
– Simulation: a suitable 2-hop neighbor exists with high probability for networks with average degree ≥ 5.

Badri Nath and D. Niculescu. Routing on a curve. SIGCOMM Comput. Commun. Rev., 2003.
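A toy version of greedy forwarding along a curve, assuming the curve is a circle in the plane (under stereographic projection, a great circle's planar image is a circle or a line). Each hop picks, among radio neighbours that stay within `width` of the curve, the one that advances furthest counterclockwise without overshooting the target. This is a simplified illustration, not the exact rule of Nath and Niculescu; all names and parameters are hypothetical. When no neighbour qualifies, greedy forwarding is stuck, which is the density question raised on this slide.

```python
import math

def greedy_curve_forward(nodes, start, center, radius, target_angle, radio_range, width):
    """Forward counterclockwise along a circle from nodes[start] to target_angle.

    Returns (path of node indices, reached flag)."""
    a0 = math.atan2(nodes[start][1] - center[1], nodes[start][0] - center[0])

    def progress(i):
        # Counterclockwise angle travelled from the start node, in [0, 2*pi).
        a = math.atan2(nodes[i][1] - center[1], nodes[i][0] - center[0])
        return (a - a0) % (2 * math.pi)

    goal = (target_angle - a0) % (2 * math.pi)
    path, cur = [start], start
    while progress(cur) < goal - 1e-9:
        candidates = [
            j for j, pt in enumerate(nodes)
            if j != cur
            and math.dist(pt, nodes[cur]) <= radio_range        # one radio hop
            and abs(math.dist(pt, center) - radius) <= width     # stays near the curve
            and progress(cur) < progress(j) <= goal + 1e-9       # advances, no overshoot
        ]
        if not candidates:
            return path, False  # greedy forwarding is stuck
        cur = max(candidates, key=progress)
        path.append(cur)
    return path, True

ring = [(10 * math.cos(math.radians(10 * k)), 10 * math.sin(math.radians(10 * k))) for k in range(36)]
path, reached = greedy_curve_forward(ring, 0, (0.0, 0.0), 10.0, math.radians(90), 3.0, 0.5)
print(path, reached)
```

Removing a single node from the ring leaves a 20-degree gap whose chord exceeds the radio range, so the same call stops early and reports failure.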

SLIDE 27

Simulation: Distance Sensitivity

[Figure: distance sensitivity of queries for GHT, spherical double ruling, and the GLIDER scheme]

4200 nodes with an average degree of 8 per node.

SLIDE 28

Simulation: Storage/Retrieval Tradeoff

  • Nodes on the replication curve can store the data or a pointer to the actual data.

[Figure: as the replication interval grows, storage cost decreases and consumer cost increases]
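A back-of-the-envelope model of that tradeoff, assuming a closed replication curve of `curve_hops` nodes with a replica on every `interval`-th node. Storage falls roughly as 1/interval, while a consumer that crosses the curve at an arbitrary node may be up to ⌊interval/2⌋ hops from the nearest replica (reachable at that cost only if it knows which side is nearer). The function is hypothetical and ignores the pointer-versus-data choice.

```python
import math

def replication_tradeoff(curve_hops, interval):
    """Replica count and worst-case distance from a crossing point to the nearest replica."""
    replicas = math.ceil(curve_hops / interval)  # one replica every `interval` nodes
    worst_extra_hops = interval // 2             # the farthest node sits mid-gap
    return replicas, worst_extra_hops

for interval in (1, 2, 4, 8, 16):
    print(interval, replication_tradeoff(120, interval))
```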

SLIDE 29

Simulation: Storage/Retrieval Tradeoff

More storage, lower retrieval cost.

[Figure panel: replication only on the hashed node and antipode]

SLIDE 30

Simulation: Load Balancing

[Figure: load distribution, measured as the number of messages through a node, for double ruling and GHT, with 500 consumers querying a popular data item]

SLIDE 31

Discussion

  • Data collection by mobile data mules.
– They can physically move along any retrieval curve.

  • Advanced hashing schemes.
– E.g., similar data types are placed nearby.

  • Networks with holes.
– Require special care.

SLIDE 32

When sensors are not regular…

  • Double rulings on an irregular shape.
– Shape parameterization

  • Integrate double rulings with other approaches.

SLIDE 33

Two-level brokerage structure

  • Recall GLIDER: landmark-based routing.
  • Partition the sensor field into tiles.
– GHT on the tiles.
– Double rulings inside each tile.

SLIDE 34

Combinatorial Delaunay graph

  • Select landmarks.
  • Landmarks flood the network.
  • Every node remembers its closest landmark, which yields the landmark Voronoi diagram.
  • Construct the Combinatorial Delaunay Triangulation (CDT) on the landmarks.
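These four steps can be simulated on a connectivity graph. The sketch assumes hop count as the distance and joins two landmarks in the CDT when their Voronoi cells contain adjacent nodes; ties between equidistant landmarks are broken by BFS order here, which the slides do not specify. Function names are hypothetical.

```python
from collections import deque

def landmark_voronoi(adj, landmarks):
    """Simultaneous flooding (multi-source BFS): each node records its closest landmark."""
    owner, hops = {}, {}
    queue = deque()
    for lm in sorted(landmarks):
        owner[lm], hops[lm] = lm, 0
        queue.append(lm)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:  # first flood to arrive is (one of) the closest
                owner[v], hops[v] = owner[u], hops[u] + 1
                queue.append(v)
    return owner, hops

def combinatorial_delaunay(adj, owner):
    """Connect two landmarks whenever some network edge joins their Voronoi cells."""
    edges = set()
    for u, nbrs in adj.items():
        for v in nbrs:
            a, b = owner[u], owner[v]
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return edges

# A path network 0-1-2-3-4-5-6 with landmarks 0, 3 and 6.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
owner, hops = landmark_voronoi(adj, {0, 3, 6})
print(sorted(combinatorial_delaunay(adj, owner)))  # [(0, 3), (3, 6)]
```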

SLIDE 35

Double-ruling and Geometry

  • In general, double ruling requires the geometry of the sensor layout.
  • Previous work uses geographic information.
  • The CDT captures the spatial adjacency of each landmark with respect to its neighboring landmarks, hence enabling double ruling at a local scale.

SLIDE 36

At the CDT level: GHT on tiles

  • Hashing on coarse data types for structured data storage.
  • Both producers and consumers of the same content type follow the shortest path tree to the hashed tile (the root of the tree). Consumers return once the data are retrieved; otherwise they move on towards the hashed tile.

[Figure: fine-grained types (giraffes, elephants, …) fall under the coarse type "Large-sized Animals"; hashing at the coarse data type level (DHT) sends them to the same tile, where they are stored]
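A sketch of the tile-level step, assuming a hypothetical table from fine-grained to coarse content types, SHA-256 as the hash, and a BFS tree on the CDT (hop counts between tiles) as the shortest path tree. Giraffe and elephant reports share a coarse type, so they hash to the same tile, and every producer or consumer climbs the same tree toward it.

```python
import hashlib
from collections import deque

# Hypothetical coarse typing; the slide's example groups giraffes and elephants.
COARSE_TYPE = {"giraffe": "large-sized animals", "elephant": "large-sized animals"}

def hashed_tile(fine_type, tiles):
    """GHT on tiles: hash the coarse type to one tile (landmark) of the CDT."""
    digest = hashlib.sha256(COARSE_TYPE[fine_type].encode("utf-8")).digest()
    return tiles[int.from_bytes(digest[:8], "big") % len(tiles)]

def shortest_path_tree(cdt_adj, root):
    """BFS tree on the CDT rooted at the hashed tile; maps tile -> parent tile."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in cdt_adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def route_to_root(tile, parent):
    """Sequence of tiles a producer or consumer traverses toward the hashed tile."""
    path = [tile]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

tiles = ["a", "b", "c", "d", "e"]
cdt_adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c", "e"], "e": ["d"]}
root = hashed_tile("giraffe", tiles)
parent = shortest_path_tree(cdt_adj, root)
print(root, route_to_root("e", parent))
```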

SLIDE 37

Within Each Tile: Double-ruling in Transit Tiles

  • Routes are formed by following shortest paths to guides.
  • The two sets of curves always meet.

[Figure: an example by simulation, with a consumer, a producer, and tiles u, v, x, y. Guides v, x, y are landmarks selected according to a set of rules based on hashing and the CDT.]

SLIDE 38

Double-ruling in Hashed Tile

  • Producers and consumers are guaranteed to meet by following the two sets of curves.
  • The consumers may not need to reach the hashed tile to fetch the data, as the data are available at some transit tiles.

[Figure: producer, consumer, and tiles u, v, x, y]

SLIDE 39

Reducing Consumer Query Cost: Simpler Retrieval Route

[Figure, Case I: the producer and consumer meet in tile u (tiles u, v, x, y)]

[Figure, Case II: they will meet in tile x. But this shouldn't happen if we follow shortest paths on the CDT.]

SLIDE 40

Load Balancing within Each Tile

  • Data from different producers are hashed along different routes, depending on their entry points to the tile.
  • By passing through the tile towards the common next tile (given by the shortest path on the CDT), a consumer fetches all data of the same content type.

[Figure: tiles u, v, x, y with producers Giraffe1, Giraffe2, and elephant1, and a consumer]

SLIDE 41

Reducing Producer Cost: en route Data Aggregation

  • Producers of the same content type share the shortest path tree (on the CDT) rooted at the hashed tile.
  • Data of the same type can be aggregated
– inside the tile, if two producers share one;
– inside the tile of their common ancestor.
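Where two producers' data can first be merged is the first tile their root paths share, i.e. their lowest common ancestor in the shortest path tree. A minimal sketch, assuming the tree is given as a tile-to-parent map with the hashed tile as root; the tile names are made up.

```python
def root_path(tile, parent):
    """Tiles from `tile` up to the hashed tile (the root, whose parent is None)."""
    path = [tile]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def merge_tile(tile_a, tile_b, parent):
    """First tile on both root paths: where the two data streams can be aggregated."""
    on_a = set(root_path(tile_a, parent))
    for tile in root_path(tile_b, parent):
        if tile in on_a:
            return tile
    raise ValueError("tiles are not in the same tree")

# Hashed tile R; A and B are its children; C and D are under A; E is under B.
parent = {"R": None, "A": "R", "B": "R", "C": "A", "D": "A", "E": "B"}
print(merge_tile("C", "D", parent))  # A
print(merge_tile("C", "E", parent))  # R
```

If both producers sit in the same tile, `merge_tile` returns that tile, which matches the first aggregation case on the slide.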

SLIDE 42

Transmission Cost Comparison with GHT by Simulations

  • For the producers, the extra cost on the finger trees can be offset by the savings of multiple producers sharing common paths.
  • Alleviates the locality-insensitivity problem that GHT has for consumers.
SLIDE 43

Locality Awareness Comparison with GHT by Simulations: Transmission Cost by Individual Node

  • Scenario: one producer; all nodes query for the producer's data; one big hole in the network connectivity graph.
  • Note that the y-scale in figure 1 is twice that in figure 2.
  • The total load is much lower than with GHT.
  • The load is also more balanced than with GHT.

  • 1. GHT
  • 2. Landmark-based
SLIDE 44

Summary

  • Double rulings: hash on a curve.
  • Make sure the consumer curve hits the producer curve.
  • Major advantages: improved location-sensitivity, load balancing, and data robustness.
  • In practice, we can use combinations of GHT and double rulings.

SLIDE 45

Presentation on 11/7

  • [Avin04] Chen Avin, Carlos Brito, Efficient and Robust Query Processing in Dynamic Environments Using Random Walk Techniques, IPSN'04.
  • [Silberstein06] Adam Silberstein, Kamesh Munagala, Jun Yang, Energy Efficient Monitoring of Extreme Values in Sensor Networks, SIGMOD'06.