
SLIDE 1

Making Sense of Suppressions and Failures in Sensor Data: A Bayesian Approach

Adam Silberstein (Yahoo! Research)
Jun Yang, Kamesh Munagala (Duke CS)
Gavino Puggioni, Alan Gelfand (Duke ISDS)

Silberstein, VLDB 2007, September 27, 2007

SLIDE 2

Introduction

  • What is a sensor network?

– A collection of nodes
– Node components

  • Sensors (e.g. temperature)
  • Radio (wireless) communication
  • Battery power

Example nodes (pictured): Crossbow Mica2, WiSARD

SLIDE 3

Duke Forest Deployment

SLIDE 4

Getting All the Data

  • Scientists often want ALL the data!

– No aggregates (e.g. mean)

  • Continuous reporting

– Repeatedly transmit readings to root

  • Explicitly construct central DB and use traditional processing techniques
  • Radio costs too high!

– Transmitting one bit over the radio costs roughly 1000 times more than executing one machine instruction
– Push processing into the network with suppression

SLIDE 5

Outline

  • 1. Suppression
  • 2. Failure!
  • 3. Coping using redundancy
  • 4. BaySail

– Inference of missing readings, parameters

SLIDE 6

Suppression

  • Push-based communication

– Only report deviations from a model

  • Value-based Temporal Suppression

– Model: $\mathrm{temp}_t = \mathrm{temp}_{t-1}$

  • In practice, include error tolerance

if (curr_temp != last_sent_temp) {
    transmit(curr_temp);  /* report only on change */
    last_sent_temp = curr_temp;
}
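The error-tolerance variant mentioned above can be sketched as follows. This is a minimal illustration, not the deck's TinyOS code: the function name `suppress`, the choice to always transmit the first reading, and the use of `None` for a suppressed round are assumptions. The sample readings and ε = 0.3 are taken from the worked example on the Suppression-Aware Inference slide.

```python
from typing import Iterable, List, Optional


def suppress(readings: Iterable[float], eps: float) -> List[Optional[float]]:
    """Return what the node transmits each round: the reading, or None if suppressed."""
    sent: List[Optional[float]] = []
    last_sent: Optional[float] = None
    for reading in readings:
        # Transmit the first reading, then only changes larger than eps
        # relative to the last value that was actually transmitted.
        if last_sent is None or abs(reading - last_sent) > eps:
            sent.append(reading)
            last_sent = reading
        else:
            sent.append(None)
    return sent


reports = suppress([2.5, 3.5, 3.7, 2.7], eps=0.3)
print(reports)  # [2.5, 3.5, None, 2.7]
```

Note that the comparison is against the last transmitted value, not the previous reading, so slow drift eventually triggers a report.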

SLIDE 7

The Catch for Suppression

  • What about reports generated, but lost to failure?
  • For non-reported values, the base station cannot distinguish failures from suppressions

[Figure: Environment (y1, y2, y3, y4) → suppression in the sensor network → transmitted (y1, ?2, y3, y4) → base station (y1, ?2, y3, ?4)]

May be a spatio-temporal suppression scheme with intra-node communication

SLIDE 8

Coping With Failure

  • Focus on simple temporal suppression
  • Learn ALL missing values

Two Coping Strategies

System-level acks + re-transmissions

  • Sender re-sends until receiver returns acknowledgement
  • Minimize chance report not received

Application-level redundancy

  • Augment existing reports
  • Minimize impact of missing report

SLIDE 9

Redundancy

  • Temporal Suppression with error tolerance

– Report only if reading changes beyond ε since last reported

  • 5 report types
  • Increasing payload, increasing info

Name          Payload Addition
Standard      Node reading
Counter       Incrementing report number
Timestamp     Last n report times
Timestamp D   Last n report times + direction bits
History       Last n report times + readings
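To show how the Timestamp payload helps the base station, here is a small sketch that labels each round from the reports that did arrive. The `TimestampReport` class, its field names, and `classify_rounds` are hypothetical names introduced for illustration. The sketch assumes each report lists the most recent report times before it, and it treats any round it cannot explain as unknown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TimestampReport:
    time: int                     # round in which the report was sent
    value: float                  # node reading (the "Standard" payload)
    prev_times: Tuple[int, ...]   # last n report times before `time`


def classify_rounds(received: List[TimestampReport], rounds: int) -> Dict[int, str]:
    """Label each round 1..rounds as received, failed, suppressed, or unknown."""
    got = {r.time for r in received}
    status: Dict[int, str] = {}
    for t in range(1, rounds + 1):
        if t in got:
            status[t] = "received"
            continue
        status[t] = "unknown"
        for r in received:
            if not r.prev_times or r.time <= t:
                continue
            if t in r.prev_times:
                status[t] = "failed"        # a report was sent at t but never arrived
                break
            if t > min(r.prev_times):
                status[t] = "suppressed"    # inside the window, yet not a report time
                break
    return status


received = [
    TimestampReport(time=1, value=2.5, prev_times=()),
    TimestampReport(time=4, value=2.7, prev_times=(2,)),
]
print(classify_rounds(received, rounds=4))
# {1: 'received', 2: 'failed', 3: 'suppressed', 4: 'received'}
```

With n = 1 the report at round 4 names round 2 as the previous report time, so round 2 must have failed and round 3 must have been suppressed.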

SLIDE 10

TinyOS Implementation

  • Application-level Redundancy

– Simple to implement

  • 40-50 lines of additional code to a tutorial example
  • Lower-level redundancy

– Activate “acks” in MAC-layer code
– Re-transmissions in application code

  • Failure Rates

– Tied to distance, clearance, battery, etc.
– Independent over time
– 30% failure rate with maximum 2 re-transmissions gives <3% effective failure rate
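The effective failure rate quoted above follows from the independence assumption: a report is lost only if the original send and every re-transmission fail. A minimal check, where the function name is an illustrative assumption:

```python
def effective_failure_rate(p_fail: float, max_retransmissions: int) -> float:
    """Probability that the original send and every re-transmission all fail."""
    attempts = 1 + max_retransmissions
    return p_fail ** attempts


rate = effective_failure_rate(0.30, 2)
print(f"{rate:.3f}")  # 0.027
```

Three independent attempts at a 30% failure rate give 0.3^3 = 0.027, which is the "<3%" on the slide.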


SLIDE 11

Suppression-Aware Inference

  • Redundancy + knowledge of suppression scheme ⇒ hard constraints on missing data

– Temporal suppression with ε = 0.3, prediction = last reported
– Actual: (x1, x2, x3, x4) = (2.5, 3.5, 3.7, 2.7)
– Base station receives: (2.5, nothing, nothing, 2.7)
– With Timestamp (r=1)

  • (2.5, failed, suppressed, 2.7)
  • $|x_2 - 2.5| > 0.3$; $|x_3 - x_2| \le 0.3$; $|2.7 - x_2| > 0.3$

– With Timestamp+Direction Bit (r=1)

  • (2.5, failed & increased, suppressed, 2.7 & decreased)
  • $x_2 - 2.5 > 0.3$; $-0.3 \le x_3 - x_2 \le 0.3$; $x_2 - 2.7 > 0.3$

– With Count

  • One suppression and one failure in x2 and x3; not sure which
  • A very hairy constraint!

  • Posterior: $p(X_{mis}, \Theta \mid X_{obs})$, with $X_{mis}$ subject to constraints
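The hard constraints in this example can be written as simple predicates on a candidate pair (x2, x3). The sketch below is illustrative only: the function names are assumptions, and the default values are the slide's x1 = 2.5, x4 = 2.7, ε = 0.3.

```python
def timestamp_ok(x2: float, x3: float, x1: float = 2.5, x4: float = 2.7,
                 eps: float = 0.3) -> bool:
    """Timestamp (r=1): x2 and x4 were reports, x3 was suppressed."""
    return abs(x2 - x1) > eps and abs(x3 - x2) <= eps and abs(x4 - x2) > eps


def timestamp_direction_ok(x2: float, x3: float, x1: float = 2.5, x4: float = 2.7,
                           eps: float = 0.3) -> bool:
    """Timestamp + direction bit: x2 rose from x1, and x4 fell from x2."""
    return x2 - x1 > eps and abs(x3 - x2) <= eps and x2 - x4 > eps


print(timestamp_ok(3.5, 3.7), timestamp_direction_ok(3.5, 3.7))  # True True
print(timestamp_ok(2.0, 2.1), timestamp_direction_ok(2.0, 2.1))  # True False
```

The second call shows what the direction bit buys: (2.0, 2.1) satisfies the Timestamp constraints but is ruled out once the base station knows x2 increased.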

SLIDE 12

Using Redundancy

Bayesian, model-based: AR(1) with uncertain parameter

[Figure: three panels showing the inferred region for (x2, x3)]

  • BayBase: just data, no knowledge of suppression
  • BaySail: knowledge of suppression & Timestamps

– $x_2 \notin [2.2, 3.0]$; $x_3 \in [x_2 - 0.3, x_2 + 0.3]$

  • BaySail: knowledge of suppression & Timestamps + Direction Bits

– $x_2 > 3.0$; $x_3 \in [x_2 - 0.3, x_2 + 0.3]$

SLIDE 13

BaySail Key Features

  • 1. Estimates missing readings/parameters
  • 2. Bayesian approach provides posterior distributions, not just single point estimates
  • 3. Missing data not generically missing

– Constrain possible settings using suppression scheme and redundancy

  • 4. Computing posteriors is hard

– Gibbs sampling iteratively generates samples of reading time series and of each parameter

  • 5. Combine simple, low-cost in-network reporting with efficient out-of-network inference
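To make the Gibbs step concrete, the following is a heavily simplified sketch and not BaySail itself. It assumes a zero-intercept AR(1) process with φ and σ held fixed (BaySail also samples the parameters), and it samples only two missing readings (x2, x3) between known endpoints under Timestamp + direction-bit constraints. The function names and the inverse-CDF truncated-normal draw are illustrative choices.

```python
import random
from statistics import NormalDist


def truncated_normal(mu: float, sd: float, lo: float, hi: float,
                     rng: random.Random) -> float:
    """Draw from N(mu, sd^2) restricted to [lo, hi] by inverting the CDF."""
    dist = NormalDist(mu, sd)
    u = rng.uniform(dist.cdf(lo), dist.cdf(hi))
    u = min(max(u, 1e-12), 1.0 - 1e-12)      # keep inv_cdf away from 0 and 1
    return min(max(dist.inv_cdf(u), lo), hi)


def gibbs_missing(x1=2.5, x4=2.7, eps=0.3, phi=0.9, sigma=1.0,
                  n_samples=2000, burn_in=200, seed=0):
    """Sample (x2, x3) given x1, x4 under x_t = phi * x_{t-1} + noise.

    Constraints (Timestamp + direction bits):
    x2 - x1 > eps, |x3 - x2| <= eps, x2 - x4 > eps.
    """
    rng = random.Random(seed)
    cond_sd = sigma / (1.0 + phi * phi) ** 0.5   # sd of an interior AR(1) full conditional
    x2_floor = max(x1, x4) + eps                  # both direction constraints on x2
    x2, x3 = x2_floor + eps, x2_floor + eps       # feasible starting point
    samples = []
    for i in range(n_samples + burn_in):
        # x2 | x1, x3: normal full conditional, truncated to its feasible interval.
        mu2 = phi * (x1 + x3) / (1.0 + phi * phi)
        x2 = truncated_normal(mu2, cond_sd, max(x2_floor, x3 - eps), x3 + eps, rng)
        # x3 | x2, x4: same form, truncated to the suppression band around x2.
        mu3 = phi * (x2 + x4) / (1.0 + phi * phi)
        x3 = truncated_normal(mu3, cond_sd, x2 - eps, x2 + eps, rng)
        if i >= burn_in:
            samples.append((x2, x3))
    return samples


samples = gibbs_missing()
print(len(samples))  # 2000
```

Each retained pair is a draw from the constrained posterior over the missing readings, so summaries such as means or high density regions can be computed directly from `samples`.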

SLIDE 14

BaySail Experimental Example

  • Simple model of soil moisture

– $y_{s,t} = c_t + \phi\, y_{s,t-1} + \epsilon_{s,t}$

  • $c_t$ is a series of known precipitations
  • $\phi \in (0,1)$ controls how fast moisture escapes soil
  • $\mathrm{Cov}(Y_{s,t}, Y_{s',t'}) = \sigma^2 \, \dfrac{\phi^{|t - t'|}}{1 - \phi^2} \, \exp(-\tau \lVert s - s' \rVert)$
  • $\tau$ controls strength of spatial correlation over distance
  • Prior: $1/\sigma^2 \sim \mathrm{Gamma}$, $\phi \sim U(0,1)$, $\tau \sim \mathrm{Gamma}$
  • Joint Posterior: $p(Y_{mis}, \phi, \sigma^2, \tau \mid Y_{obs})$ subject to constraints
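The covariance structure of this model can be evaluated directly. The sketch below is illustrative: the function name, the 2-D site coordinates, and the value τ = 0.5 are assumptions, while σ² = 1.0 and φ = 0.9 are the values used in the experiments later in the deck.

```python
import math


def soil_moisture_cov(s, s_prime, t, t_prime, sigma2, phi, tau):
    """sigma2 * phi**|t - t'| / (1 - phi**2) * exp(-tau * ||s - s'||)."""
    temporal = phi ** abs(t - t_prime) / (1.0 - phi ** 2)
    spatial = math.exp(-tau * math.dist(s, s_prime))
    return sigma2 * temporal * spatial


# Same site and time: the stationary variance sigma2 / (1 - phi^2).
var = soil_moisture_cov((0, 0), (0, 0), 5, 5, sigma2=1.0, phi=0.9, tau=0.5)
# One unit apart in space and two rounds apart in time: a smaller covariance.
cov = soil_moisture_cov((0, 0), (1, 0), 5, 7, sigma2=1.0, phi=0.9, tau=0.5)
print(var > cov > 0.0)  # True
```

The covariance factors into a temporal AR(1) term and a spatial exponential-decay term, so correlation falls off with both elapsed rounds and distance.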

SLIDE 15

Why the Direction Bit?

  • TS gives OR constraints: |x2-x1| > ε

– Inefficient rejection sampling

  • TS+D gives linear constraint: x1 – x2 > ε

– Allows for more efficient sampling [Rodriguez-Yam et al. 04]


>100x improvement… the major reason for the direction bit!
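The difference between the two constraint shapes can be spelled out with the slide's own symbols. This is a reading of the slide rather than a derivation taken from the authors: without the direction bit the feasible set for x2 is a union of two disjoint rays, while a direction bit (here, "decreased") leaves a single linear inequality.

```latex
% TS: the sign of the change is unknown, so the feasible set is a union of two rays.
|x_2 - x_1| > \epsilon
  \iff (x_2 - x_1 > \epsilon) \;\lor\; (x_1 - x_2 > \epsilon)
  \iff x_2 \in (-\infty,\, x_1 - \epsilon) \cup (x_1 + \epsilon,\, \infty)

% TS+D with a "decreased" bit: one linear inequality, a single ray.
x_1 - x_2 > \epsilon \iff x_2 \in (-\infty,\, x_1 - \epsilon)
```

A set defined only by linear inequalities is convex, so each full conditional in a Gibbs sweep is a normal truncated to one interval and can be drawn directly instead of by rejection.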

SLIDE 16

3 Missing Values Cluster

  • BayBase: Conditioning on model and endpoints
  • BaySail: Conditioning on model, endpoints, and that missing values are suppressions

SLIDE 17

Metrics

  • Compare posterior mean to actual?

– Mean misleading for bimodal distributions

  • High density regions (hdr)

– Given percentage x, return the minimal-length range(s) of values such that x% of the sample's probability density is contained in the range(s)
– Ensure hdr covers actual reading x% of time

[Figure: example density with 50% and 90% hdr ranges marked (r1, r2, r3, r4)]
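One simple way to approximate a high density region from posterior samples is to histogram them and keep the densest bins until the requested mass is covered. This is an illustrative approximation and not necessarily how the authors compute hdr; the function name `hdr` and the default of 50 bins are assumptions.

```python
from typing import List, Sequence, Tuple


def hdr(samples: Sequence[float], mass: float, bins: int = 50) -> List[Tuple[float, float]]:
    """Approximate hdr: the densest histogram bins that together hold `mass` of the samples."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0          # avoid zero width if all samples are equal
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    # Take bins from densest to sparsest until the requested mass is covered.
    chosen, covered = set(), 0
    for i in sorted(range(bins), key=lambda i: counts[i], reverse=True):
        if covered >= mass * len(samples):
            break
        chosen.add(i)
        covered += counts[i]
    # Merge adjacent chosen bins into contiguous ranges.
    ranges: List[Tuple[float, float]] = []
    prev = None
    for i in sorted(chosen):
        left, right = lo + i * width, lo + (i + 1) * width
        if prev is not None and i == prev + 1:
            ranges[-1] = (ranges[-1][0], right)
        else:
            ranges.append((left, right))
        prev = i
    return ranges


# Two well-separated clusters: the region splits instead of spanning the gap.
bimodal = [k * 0.01 for k in range(-100, 101)] + [10 + k * 0.01 for k in range(-100, 101)]
regions = hdr(bimodal, 0.9)
print(any(a <= 5.0 <= b for a, b in regions))  # False
```

For a bimodal sample the result is a set of ranges around the modes, which is why the slide prefers hdr over the posterior mean (the mean here would sit near 5, where there is no probability mass).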

SLIDE 18

Cost vs. HDR Interval

  • Parameters induce 60% suppression rate

– $\sigma^2 = 1.0$, $\phi = 0.9$, $\epsilon = 1.0$

  • Failure rate 30%
  • 3 Schemes

– Samp(τ)

  • Fixed reporting every τ rounds

– Supp/TD(r)

  • Timestamp + direction for last r reports

– Supp/Ack(r)

  • Maximum r re-transmission attempts
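The link between these parameters and the suppression rate can be explored with a small simulation. This sketch makes simplifying assumptions that the slide does not state: a single node, no precipitation term, no spatial correlation, and no failures. The rate it produces is therefore only indicative and need not match the 60% quoted above.

```python
import random


def suppression_rate(sigma2: float = 1.0, phi: float = 0.9, eps: float = 1.0,
                     rounds: int = 100_000, seed: int = 0) -> float:
    """Fraction of rounds in which a zero-intercept AR(1) reading is suppressed."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    y = 0.0
    last_sent = y
    suppressed = 0
    for _ in range(rounds):
        y = phi * y + rng.gauss(0.0, sd)
        if abs(y - last_sent) <= eps:
            suppressed += 1      # within tolerance of the last report: stay silent
        else:
            last_sent = y        # report and reset the reference value
    return suppressed / rounds


print(f"{suppression_rate():.2f}")
```

Raising ε or lowering σ² increases the suppressed fraction, which is the cost lever that the Supp/TD(r) and Supp/Ack(r) schemes build on.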

SLIDE 19

Readings Interval

BaySail demonstrates significant improvement

80% hdr

SLIDE 20

Phi Interval

Choice has little effect for process parameter

80% hdr

SLIDE 21

Spatial Inference


[Figure: 3x3 Grid of nodes numbered 1 to 9]

SLIDE 22

Conclusion

  • Suppression is a viable technique only when made robust to failure
  • BaySail combines low-cost in-network redundancy with efficient out-of-network statistical inference

– Generates posterior distributions on raw missing values and process parameters

  • Future Challenges

– Sophisticated spatio-temporal schemes

  • Failure on in-network constraints
  • Failure of model parameter transmission

– Storing query results