SLIDE 1

Data-Centric Query in Sensor Networks

Jie Gao
Computer Science Department, Stony Brook University

SLIDE 2

Papers

  • [Intanagonwiwat00] Chalermek Intanagonwiwat, Ramesh Govindan and Deborah Estrin, Directed diffusion: A scalable and robust communication paradigm for sensor networks, MobiCom '00. The first paper on data-centric routing in sensor networks. Data discovery relies on flooding the network.

  • [Ratnasamy02] Sylvia Ratnasamy, Li Yin, Fang Yu, Deborah Estrin, Ramesh Govindan, Brad Karp, Scott Shenker, GHT: A Geographic Hash Table for Data-Centric Storage, First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA), 2002. Hash data to geographical locations, for storage and retrieval.

  • [Braginsky02] David Braginsky, Deborah Estrin, Rumor routing algorithm for sensor networks, 1st ACM Workshop on Wireless Sensor Networks, 2002.

  • [Sarkar06] Rik Sarkar, Xianjin Zhu, Jie Gao, Double Rulings for Information Brokerage in Sensor Networks, MobiCom '06. Hash data to circles.

SLIDE 3

Scenario I: tourists and animals

  • A sensor network in a zoo.
  • A tourist asks: where is the elephant?
  • So which sensor has the data about the elephant?

SLIDE 4

Scenario II: location service

  • A missing part of routing with geographical or virtual coordinates: how does the source know the location (or virtual coordinates) of the destination?
  • Location service: a brokerage service that answers queries such as: where is the node with ID 23?
  • Geographical routing:
  • The source asks for the location of the destination;
  • The source routes by using geographical routing.
  • Notice: a chicken-and-egg problem.
SLIDE 5

Data-centric

  • Traditional networks: routing is based on network ID (e.g., IP addresses).
  • Sensor networks: communication abstractions are based on data rather than node network addresses.
  • Data-centric routing
– Route to the node with the data the user wants.
  • Data-centric storage
– Store/sort the data by data type (elephant).

SLIDE 6

Abstraction of data-centric routing

  • Information producer/consumer problem.
  • Information producer.
– Can be anywhere in the network.
– Dynamic, mobile.
– Multiple producers generating data about the same data type.
  • Users = information consumer.
– Can be anywhere in the network.
– Concurrent multiple consumers.

SLIDE 7

Challenges

  • Information producers/consumers have no idea about each other.
  • Yet we want them to find each other quickly.
  • Main approaches:
  • Push-based: producers do most of the work.
  • Pull-based: consumers actively search.
  • Push-pull: both producers/consumers search to find each other.

SLIDE 8

This class

  • Directed diffusion
– Pull-based
  • Geographical hash table
  • Rumor routing
  • Double rulings
– Push-pull
– In-network storage

SLIDE 9

Directed diffusion

  • Data is named by attribute-value pairs.
  • A query is represented by an interest.
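The attribute-value naming can be sketched in a few lines of Python; the attribute names below are illustrative, not the exact syntax used in the directed diffusion paper:

```python
# Sketch of attribute-value naming (attribute names are made up,
# not the directed diffusion paper's exact syntax).

def matches(interest, record):
    """A data record satisfies an interest if it agrees on every
    attribute the interest names."""
    return all(record.get(attr) == val for attr, val in interest.items())

interest = {"type": "four-legged animal", "region": "zoo-east"}
reading = {"type": "four-legged animal", "region": "zoo-east",
           "location": (24, 7)}

print(matches(interest, reading))   # True: the reading answers the interest
```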
SLIDE 10

Interest dissemination

  • A sensing task is disseminated in the network as an interest for named data.
  • The interest is refreshed for robustness.

SLIDE 11

Gradient establishment

  • Each node caches a gradient for the interest, which specifies the data rate and duration.

SLIDE 12

Data transmission

  • Data is transmitted back to the sink.
  • Multi-path can be adopted.
  • Good paths (low delay, more reliable ones) are reinforced.

SLIDE 13

Pros and Cons

  • The first scheme for data-centric routing.
  • Pull-based approach.
  • OK for streaming data types – the cost of flooding is amortized.
  • Flooding is expensive for infrequent queries, or queries that only involve a small set of nodes.

SLIDE 14

Distributed hash table (DHT)

  • For Bob and Alice to find each other.
  • “Lost and found”.
  • Basic idea: data-dependent rendezvous.
  • Use a content-based hash function: h(elephant) = sensor #10.
  • All the sensors with elephant info send to #10.
  • All the tourists interested in elephants go to #10 to fetch the information.
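The data-dependent rendezvous can be sketched as a content-based hash from a data name to a sensor id; the use of SHA-1 here is an arbitrary choice for the sketch:

```python
import hashlib

def rendezvous_node(data_name, num_nodes):
    """Map a data name to a sensor id; every node evaluates the same
    function, so producers and consumers agree without coordination."""
    digest = hashlib.sha1(data_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes

# Producers with elephant data and tourists asking about elephants
# independently compute the same rendezvous node.
print(rendezvous_node("elephant", 100) == rendezvous_node("elephant", 100))  # True
```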

SLIDE 15

Distributed hash table (DHT)

  • Originally proposed for peer-to-peer routing on the Internet.
– E.g., Chord, Pastry, Tapestry, etc.
  • A data object is given a key.
  • Each node saves a set of keys.
  • A routing algorithm allows any node to locate the node with an arbitrary key.
SLIDE 16

Geographical hash table (GHT)

  • Assume nodes know their locations and do geo-routing.
  • The content-based hash function outputs a geographical location: h(elephant) = (14, 22).
  • Use geographical routing for information producers/consumers to route to the rendezvous.

[Figure: h(elephant) maps to a point in the sensor field.]

SLIDE 17

Geographical hash table (GHT)

  • The content-based hash function: h(elephant) = a geographical location (14, 22).
  • Use geographical routing for information producers/consumers to route to the rendezvous.
  • Two questions:
  • What if there is no sensor at location (14, 22)?
  • What if geographical routing gets stuck?
SLIDE 18

Geographical hash table (GHT)

  • We route to location L = (14, 22), and geographical routing finds out there is no node at (14, 22) by touring along the perimeter of a face and getting back to where it started.

Home node: the one that is geographically closest to L.
Home perimeter: the perimeter that geographical routing tours around.
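In GHT the home node is discovered in-network by perimeter routing; as a centralized sketch, the selection criterion itself is just a nearest-node computation (the node coordinates below are made up):

```python
import math

def home_node(positions, L):
    """Return the id of the node geographically closest to the hashed
    location L, which typically has no sensor exactly on it."""
    return min(positions, key=lambda i: math.dist(positions[i], L))

positions = {1: (10, 20), 2: (15, 21), 3: (30, 5)}
print(home_node(positions, (14, 22)))  # node 2, about 1.4 units away
```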

SLIDE 19

Geographical hash table (GHT)

  • We replicate elephant information on all the nodes on the perimeter.
  • The query follows the same home perimeter and retrieves the message.

Home node: the one that is geographically closest to L.
Home perimeter: the perimeter that geographical routing tours around.

SLIDE 20

GHT: maintenance

  • The home node periodically refreshes replication by sending a packet to the hashed location L.
  • If the timer of a replica times out, then the replica node initiates a refresh.

SLIDE 21

Hierarchical replication

  • To reduce the bottleneck at the hash nodes and improve data survivability under node failure.
  • The hash location is replicated at each level of a quad tree.

SLIDE 22

Geographical hash table (GHT)

  • Advantages:
– Simple.
– Load balancing in storage.
  • Disadvantages:
– Not locality-sensitive. The consumer may travel far to fetch data even if the producer is close.
– Fault tolerance?
– Overloads nodes on the boundary.
– Nodes with popular data become a bottleneck.

SLIDE 23

Rumor routing

  • Producer: route along a line or random walk, and leave data traces on the way.
  • Consumer: route along another line or random walk, hoping to pick up the data.

SLIDE 24

A geometric observation

  • Inside a circle, draw two random lines; what is the probability that they intersect?

Pr(intersect) = ∫₀¹ 2x(1−x) dx = 1/3.
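The 1/3 can be checked by Monte Carlo under one natural model of a random line: a chord whose two endpoints are drawn uniformly on the circle. Two such chords intersect exactly when their endpoints interleave around the circle.

```python
import random

def chords_intersect(a, b, c, d):
    """Chords (a, b) and (c, d), with endpoints given as fractions of the
    circle in [0, 1): they cross iff exactly one of c, d lies on the
    counterclockwise arc from a to b."""
    def on_arc(x, lo, hi):
        return (x - lo) % 1.0 < (hi - lo) % 1.0
    return on_arc(c, a, b) != on_arc(d, a, b)

trials = 200_000
hits = sum(chords_intersect(*(random.random() for _ in range(4)))
           for _ in range(trials))
print(hits / trials)   # close to 1/3
```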

SLIDE 25

A geometric observation

  • Inside a circle, draw k random lines; what is the probability that another random line intersects at least one of the k lines?

Pr(k) = 1 − (1 − 1/3)^k = 1 − (2/3)^k.

Pr(5) ≈ 87%. Pr(10) ≈ 98%. Pr(log n) = 1 − O(1/n).
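Treating intersections with the k stored lines as independent events, each with probability 1/3, reproduces the figures on the slide (a back-of-the-envelope approximation, since the events are not exactly independent):

```python
def pr_hit(k):
    """Probability that a random query line meets at least one of k event
    lines, assuming each pair intersects independently with prob. 1/3."""
    return 1 - (2 / 3) ** k

for k in (1, 5, 10):
    print(k, f"{pr_hit(k):.0%}")   # 1 -> 33%, 5 -> 87%, 10 -> 98%
```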

SLIDE 26

Algorithm Basics

  • All nodes maintain a neighbor list.
  • Nodes also maintain an event table.
– When a node observes an event, the event is added with distance 0.
  • Agents
– Packets that carry local event info across the network.
– Aggregate events as they go.
  • Agents do a random walk: among the 1-hop neighbors, find one that was not visited recently.
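An agent's walk can be sketched as below; the window of recently visited nodes is a made-up parameter standing in for the "not visited recently" rule:

```python
import random

def agent_walk(graph, start, ttl, window=4):
    """Random-walk an agent for ttl hops: prefer a 1-hop neighbor not
    visited recently, falling back to any neighbor if all are recent."""
    path, node = [start], start
    for _ in range(ttl):
        fresh = [n for n in graph[node] if n not in path[-window:]]
        node = random.choice(fresh or graph[node])
        path.append(node)
    return path

# Tiny example topology (adjacency lists).
g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(agent_walk(g, start=0, ttl=6))   # a 7-node walk through the graph
```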

SLIDE 27

Examples

SLIDE 28

Simulation results

  • N = 3000–5000 nodes, placed randomly in a 200-by-200 field; the communication radius is 5, and the diameter of the network is roughly 40.
  • A: # agents, La = agent TTL, Lq = query TTL.
  • A large TTL is needed for both agents and queries.

SLIDE 29

Some thoughts about simulation results

  • A random walk is not necessarily straight.
  • Random walk on a graph: move to a neighbor with probability 1/d, where d is the degree.
  • Hitting time H(i, j): expected number of steps to reach j if we start from node i.
  • Suppose the source is i and the sink is j; then the total number of hops of the two random walks before they intersect is approximately H(i, j).

SLIDE 30

Some thoughts about simulation results

  • For a general graph the hitting time is Θ(n³).
  • For a complete graph the hitting time is O(n).
  • The maximum hitting time between any two nodes is at least half of the expected number of steps before a random walk visits half of the nodes.
  • So there are two nodes such that a random walk between them visits Ω(n) nodes.

Random walks on graphs: a survey, by Lovász.

SLIDE 31

Rumor routing

  • The producer curve and consumer curve intersect with some probability.
  • Random walks can be expensive.
  • Idea: design the producer curve and consumer curve such that they always intersect.

SLIDE 32

Double Rulings: extend GHT and rumor routing

  • Hash data to a 1-d curve, instead of a 0-d point.
  • Motivations for generalization
– Data delivery uses multi-hop routing
  • Leave information along the route at no extra cost
– More flexible data retrieval
  • Easier to encounter a 1-d curve than a 0-d point
SLIDE 33

Rectilinear Double Ruling

– Producer stores data on horizontal lines
– Consumer searches along vertical lines
– Correctness: every horizontal line intersects every vertical line
– Distance sensitive: q finds p in time O(d), where d = |pq|.
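A toy grid makes the correctness argument concrete: a producer writes along its row, a consumer reads along its column, and the two always meet in exactly one cell.

```python
n = 8
# grid[r][c]: the data stored at the node in row r, column c.
grid = [[[] for _ in range(n)] for _ in range(n)]

def producer_store(row, col, data):
    """Producer at (row, col) replicates its data across its whole row."""
    for c in range(n):
        grid[row][c].append((data, (row, col)))

def consumer_search(col):
    """Consumer walks its whole column; every row crosses every column,
    so anything stored anywhere is found."""
    return [item for r in range(n) for item in grid[r][col]]

producer_store(2, 5, "elephant sighting")
print(consumer_search(0))   # [('elephant sighting', (2, 5))]
```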

SLIDE 34

Spherical Double Rulings Scheme

  • The producer follows a circle to the hashed location
– Includes GHT as a sub-case
– Allows a large variety of retrieval mechanisms
  • Improves on GHT
– Load balancing for popular data types
– Distance sensitivity
– Flexible data retrieval schemes improve system robustness

SLIDE 35

Double Rulings on a Sphere

  • Stereographic projection maps a plane to a sphere
– Circles map to circles
– May incur distortion
  • For a finite sensor field
– Can choose the location and size of the sphere such that distance distortion is bounded by 1+ε.
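The "circles map to circles" property is easy to verify numerically for the inverse stereographic projection (projecting from the north pole of the unit sphere), at least for circles centered at the projection point:

```python
import math

def plane_to_sphere(x, y):
    """Inverse stereographic projection of plane point (x, y) onto the
    unit sphere, projecting from the north pole (0, 0, 1)."""
    s = x * x + y * y
    return (2 * x / (1 + s), 2 * y / (1 + s), (s - 1) / (1 + s))

# A plane circle of radius 2 about the origin lands on a single latitude:
pts = [plane_to_sphere(2 * math.cos(t), 2 * math.sin(t)) for t in (0.0, 1.0, 2.5)]
print(all(abs(z - pts[0][2]) < 1e-12 for _, _, z in pts))        # True: same latitude
print(all(abs(x*x + y*y + z*z - 1) < 1e-12 for x, y, z in pts))  # True: on the sphere
```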

SLIDE 36

Spherical Double Rulings

  • Any two great circles intersect
– Use great circles in place of vertical/horizontal lines

SLIDE 37

Spherical Double Rulings

  • One major difference from rectilinear double rulings:
– Infinitely many great circles pass through a point
– A lot more flexibility

SLIDE 38

Data Replication

  • Data-centric hash function h(Ti) = hi.
  • Producer p replicates data along the great circle C(p, hi).
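The replication curve C(p, hi) can be sampled directly: rotate p within the plane spanned by p and hi (unit vectors, assumed neither equal nor antipodal, so the great circle is unique).

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def great_circle(p, h, steps=360):
    """Sample the great circle through unit vectors p and h
    (assumed neither equal nor antipodal)."""
    w = unit(cross(cross(p, h), p))   # in-plane direction orthogonal to p
    return [tuple(math.cos(t) * pc + math.sin(t) * wc for pc, wc in zip(p, w))
            for t in [2 * math.pi * i / steps for i in range(steps)]]

p, h = unit((1.0, 0.2, 0.1)), unit((0.0, 1.0, 0.3))
curve = great_circle(p, h)
normal = cross(p, h)
# Every sample lies in the plane through the origin spanned by p and h:
print(max(abs(sum(a * b for a, b in zip(q, normal))) for q in curve) < 1e-9)  # True
```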

SLIDE 39

Data Replication

  • Different producers with the same data type hash to different great circles, all passing through the hashed point h and its antipodal point.
– Allows aggregation.

SLIDE 40

Replication Curve Examples

[Figure: replication curves of Producer 1 and Producer 2 through the hashed node and its antipode, with the corresponding GHT paths.]

SLIDE 41

Data Retrieval

  • Flexible retrieval rules
  • 1. GHT Style Retrieval
  • 2. Distance Sensitive Retrieval
  • 3. Aggregated Data Retrieval
  • 4. Full Power Data Retrieval
SLIDE 42

1. GHT Style Retrieval

  • GHT still works
  • Consumer q wants data Ti
  • The consumer goes to the hashed node h or its antipodal point, whichever is closer.

SLIDE 43

2. Distance Sensitive Retrieval

  • Distance sensitive: if the producer is at distance d from q, the consumer should find the data with cost O(d).
– Consumes less network resources
– Users are likely to be more interested in their immediate vicinity.
– Lower delay --- important in emergency response.

SLIDE 44

2. Distance Sensitive Retrieval

  • Rotate the sphere so that the hashed node is at the north pole.
  • Replication follows the longitude curve; retrieval follows the latitude curve.
  • If q is at distance d = |pq| from p, the distance from q along the latitude curve is at most d·π/2.

SLIDE 45

2. Distance Sensitive Retrieval

  • Distance sensitive: if the producer is at distance d from q, the consumer should find the data with cost O(d).
  • Consumer q follows the circle at fixed distance to the hashed location.
  • Wrong direction?
– Handled using a doubling technique
– A random choice of direction works well in practice (we use this in simulations).

SLIDE 46

2. Distance Sensitive Retrieval

[Figure: producer, consumer, hashed node, antipode, and the retrieval curve.]

SLIDE 47

3. Aggregated Data Retrieval

  • The consumer wants data of several data types {Ti}
– E.g., monkey & elephant detections.
  • Follow a closed curve that separates hi and its antipodal point, for each data type Ti.
– Correctness: any closed cycle that separates hi from its antipodal point intersects the producer curve.
– Many such retrieval curves! More freedom for consumers and better load balancing.

SLIDE 48

3. Aggregate Data Retrieval

[Figure: producer, consumer, hashed node, antipode, and the retrieval curve.]

SLIDE 49

4. Full Power Data Retrieval

  • The consumer wants all the data in the network.
  • Follow a great circle, retrieve all data.
– Correctness: any two great circles intersect
– Many such great circles!

SLIDE 50

4. Full Power Data Retrieval

[Figure: producer, consumer, hashed node, antipode, and a great-circle retrieval curve.]

SLIDE 51

Local Data Recovery upon Node Failures

  • When a group of nodes is destroyed, all the data on those nodes is available on the boundary of the destroyed region.

SLIDE 52

Local Data Recovery upon Node Failures

[Figure: surviving data replicas on the boundary of the destroyed region.]

SLIDE 53

Difference of spherical vs. rectilinear double rulings?

  • All lines are great circles that pass through the point at infinity.
  • The point at infinity is the hash location!

SLIDE 54

Implementation

  • How to forward data on a virtual curve?
– Use “Geographic Greedy Forwarding on a Curve”
  • The question of density
– Is it always possible to forward?
– Simulation: a suitable 2-hop neighbor exists with high probability, for networks with average degree 5.
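One greedy step of forwarding along a virtual curve might look like the sketch below (the idea only; per the slide, the actual scheme also consults 2-hop neighbors):

```python
import math

def step_toward(pos, neighbors, waypoint):
    """Greedy step toward the next waypoint on the virtual curve: pick the
    neighbor closest to the waypoint, but only if it is strictly closer
    than we are; otherwise report that greedy forwarding is stuck."""
    best = min(neighbors, key=lambda n: math.dist(n, waypoint), default=None)
    if best is not None and math.dist(best, waypoint) < math.dist(pos, waypoint):
        return best
    return None   # stuck: no neighbor makes progress toward the waypoint

print(step_toward((0, 0), [(1, 0), (0, 1), (-1, 0)], waypoint=(3, 0)))  # (1, 0)
```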

SLIDE 55

Simulation: Distance Sensitivity

[Figure: distance sensitivity of queries for GHT, the GLIDER-based scheme, and spherical double ruling; 4200 nodes with average degree 8 per node.]

GLIDER-based scheme: Q. Fang et al., Landmark-based information storage and retrieval in sensor networks, INFOCOM 2006.

SLIDE 56

Simulation: Storage/Retrieval Tradeoff

  • Nodes on the replication curve can store the data or a pointer to the actual data.
  • A larger replication interval means decreasing storage cost and increasing consumer cost.

SLIDE 57

Simulation: Storage/Retrieval Tradeoff

  • More storage, lower retrieval cost.
  • One extreme: replication only on the hashed node and antipode.

SLIDE 58

Simulation: Load Balancing

[Figure: load distribution (number of messages through a node) for 500 consumers querying a popular data item, double ruling vs. GHT.]

SLIDE 59

Discussion on double rulings

  • Design for irregularly shaped sensor fields.
  • Landmark-based double rulings
– Example: 2 landmarks; red curve: d1+d2 = const; blue curve: |d1−d2| = const
  • Use sensor data
– Gradient
– Iso-contours

SLIDE 60

Discussion

  • Data collection by mobile data mules.
– Physically move along any retrieval curve.
  • Advanced hashing schemes.
– E.g., similar data types are placed nearby.