Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, Samuel Madden



slide-1
SLIDE 1

Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, Samuel Madden

slide-2
SLIDE 2

+Example Application: Locating Marmots

with Lewis Girod & UCLA Blumstein Lab

  • Gothic, CO deployment August 2007
  • Voxnet Platform
  • 2x PXA255, 64MB RAM, 8GB Flash, 802.11b, Mica2 supervisor, Li+ battery, charge controller
  • Sensors: 4x 48 kHz audio, 3-axis accel, GPS, internal temp

slide-3
SLIDE 3

+We target sensing applications

  • Animal localization
  • Pothole detection
  • Computer vision
  • Pipeline leak detection
  • EEG seizure detection
  • Speaker identification

slide-4
SLIDE 4

+Heterogeneous Platforms

  • Low power sensors: weak CPU/radio
  • Smartphones: medium CPU, strong radio
  • Router: weak CPU, strong radio
  • Linux microserver

Platforms and languages: JavaME, Symbian, Brew, iPhone SDK, Android, TinyOS, Java, C++, Python, Contiki

Mix and Match!

slide-5
SLIDE 5

+Contributions

[Diagram: dataflow graph from sensor source(s) to results, crossing a network boundary]

slide-6
SLIDE 6

+Contributions

[Diagram: dataflow graph from sensor source(s) to results]

slide-7
SLIDE 7

+Contributions

[Diagram: dataflow graph from sensor source(s) to results; each side is labeled "Compile & Load"]

  • First broadly portable sensornet programming
  • Partitioning algorithm
  • Optimize CPU/radio tradeoff even if app doesn’t “fit”

slide-8
SLIDE 8

+Architecture

  • Dataflow graph: operators containing code in portable intermediate language
  • Sample data (for profiling)
  • Wishbone: Partitioner, Backend CodeGen
  • Backends: ANSI C, NesC/TinyOS, JavaME

slide-9
SLIDE 9

+Targeting TinyOS

  • 16 bit microcontroller
  • 10K RAM
  • No mem. protection
  • No threads

Task granularity, messaging model

WaveScope:

iterate x in S { f(); for(i=…) { … } g(); }

TinyOS:

Tasks: f() | for () {…} | g(), interleaved with msg1, msg2, msg3

Profile-directed Cooperative Multitasking

[Diagram: timeline from tstart to tend; Execute!]

Same goal as Protothreads
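A minimal Python sketch of the idea behind cooperative multitasking on a threadless platform. It is not Wishbone's TinyOS code generator. The operator body is written as a generator, and each `yield` marks a point where the body is split into a separate task so that queued messages can be serviced in between. The names `operator_body` and `run_cooperatively` and the hand-placed split points are hypothetical; on the slide, profile data decides where to split.

```python
from collections import deque

def operator_body(x, log):
    """One work function split into three tasks: f(), the loop, g()."""
    log.append(f"f({x})")
    yield                      # split point after f()
    for i in range(2):
        log.append(f"loop {i}")
    yield                      # split point after the loop
    log.append(f"g({x})")

def run_cooperatively(body, messages, log):
    """Run one task slice at a time, sending one pending message between slices."""
    pending = deque(messages)
    for _ in body:             # each iteration runs the body up to its next yield
        if pending:
            log.append(f"send {pending.popleft()}")
    while pending:             # flush anything still queued after the last slice
        log.append(f"send {pending.popleft()}")
    return log

log = []
trace = run_cooperatively(operator_body(7, log), ["msg1", "msg2", "msg3"], log)
```

With finer split points the radio is serviced more often, at the price of more task-switch overhead.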

slide-10
SLIDE 10

+Profiling Streams and Operators

  • Every sensor source is paired with sample data
      • Includes timing info
  • Measure rates, execution times
  • Separately: profile network channel in deployment environment
      • per-node send rate

audioStream = IFPROF(readFile(“foo8kHz”, readSensor()))

[Diagram annotations: 3 ms, 20 Kbps, 27 Kbps]
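A rough Python sketch of what profiling an operator against sample data could look like: replay a recorded trace through the operator, then report mean execution time per input and output bandwidth. The function `profile_operator`, the `threshold` operator, and the trace values are hypothetical stand-ins; they are not Wishbone's profiler or its `IFPROF` construct.

```python
import time

def profile_operator(op, samples, sample_rate_hz, bytes_per_output):
    """Run `op` over recorded samples; return mean time per call and output bandwidth."""
    outputs = 0
    start = time.perf_counter()
    for s in samples:
        if op(s) is not None:   # None means the operator emitted nothing
            outputs += 1
    elapsed = time.perf_counter() - start
    duration_s = len(samples) / sample_rate_hz      # stream time covered by the trace
    return {
        "mean_exec_s": elapsed / len(samples),
        "out_bytes_per_s": outputs * bytes_per_output / duration_s,
    }

# Hypothetical operator: forwards only samples above a threshold.
def threshold(sample):
    return sample if abs(sample) > 100 else None

samples = [0, 150, -30, 400, 90, -120, 10, 101]    # stand-in for a recorded trace
stats = profile_operator(threshold, samples, sample_rate_hz=8000, bytes_per_output=2)
```

The two numbers per operator (execution time and output rate) are the inputs a partitioner needs; the network channel is measured separately, as the slide notes.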

slide-11
SLIDE 11

+State, Replication, and Pinning

Pinning Constraints

  • All stateless ops: unpinned
  • Stateful replicated ops: unpinned
  • Stateful global ops: pinned to server – don’t distribute!
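The three pinning rules can be written as a small lookup. This Python sketch assumes each operator is described by two hypothetical flags, `stateful` and `replicated`; the operator names are made up for illustration.

```python
def pinning(stateful: bool, replicated: bool) -> str:
    """Return 'unpinned' or 'server' following the three rules on the slide."""
    if not stateful:
        return "unpinned"          # rule 1: all stateless ops
    if replicated:
        return "unpinned"          # rule 2: stateful replicated ops
    return "server"                # rule 3: stateful global ops stay on the server

# Hypothetical operators: (stateful, replicated)
ops = {
    "fft": (False, False),
    "per_node_filter": (True, True),
    "global_aggregate": (True, False),
}
constraints = {name: pinning(*flags) for name, flags in ops.items()}
```

Only the operators marked `unpinned` are free variables for the partitioner.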

slide-12
SLIDE 12

+Problem Scenario

Embedded Node | Network Boundary | Server / Base Station

Problem Inputs

  • profile data: net, cpu
  • network channel capacity

[Diagram: dataflow graph annotated with network and CPU costs (3, 19, 4, 11, 12, 23, 7), cut by the network boundary]

NP-Hard

slide-13
SLIDE 13

+Partitioning Algorithm: Integer linear program formulation

  • Introduce variables $f_u \in \{0,1\}$, where 0 = server, 1 = sensor
  • Introduce variables $g_{uv} \in \{0,1\}$, where 1 = cut edge
  • Enforce resource bounds
      • $cpu < C$, where $cpu = \sum_{u} f_u \cdot compute_u$
      • $net < N$, where $net = \sum_{(u,v) \in Edges} g_{uv} \cdot data_{uv}$
  • Minimize objective function: $\min(\alpha \cdot cpu + net)$

3 Parameters: $C$, $N$, $\alpha$

Proxy for Energy

Tricky bit (see paper): Relating f and g while staying linear
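To make the formulation concrete, here is a small Python sketch that evaluates the same objective and bounds by exhaustive search instead of an ILP solver. That is only workable for tiny graphs and is not how Wishbone solves the problem. The function name `best_partition`, the `pinned` argument, and the example costs are assumptions made for illustration.

```python
from itertools import product

def best_partition(compute, edges, pinned, C, N, alpha):
    """Exhaustively minimize alpha*cpu + net subject to cpu < C and net < N.

    compute: {op: CPU cost if placed on the sensor}
    edges:   {(u, v): data rate if the edge is cut}
    pinned:  {op: required side}, 1 = sensor, 0 = server
    Returns (objective, f) with f[op] in {0, 1}, or None if nothing is feasible.
    """
    ops = list(compute)
    best = None
    for bits in product((0, 1), repeat=len(ops)):
        f = dict(zip(ops, bits))
        if any(f[op] != side for op, side in pinned.items()):
            continue
        cpu = sum(f[u] * compute[u] for u in ops)
        # g_uv = 1 exactly when u and v land on different sides.
        net = sum(d for (u, v), d in edges.items() if f[u] != f[v])
        if cpu < C and net < N:
            obj = alpha * cpu + net
            if best is None or obj < best[0]:
                best = (obj, f)
    return best

compute = {"source": 1, "filt": 4, "fft": 10, "sink": 0}
edges = {("source", "filt"): 20, ("filt", "fft"): 8, ("fft", "sink"): 2}
pinned = {"source": 1, "sink": 0}
result = best_partition(compute, edges, pinned, C=10, N=15, alpha=1)
none_fit = best_partition(compute, edges, pinned, C=2, N=15, alpha=1)
```

In this example the only feasible placement puts `source` and `filt` on the sensor: shipping raw data exceeds the network bound, and adding `fft` exceeds the CPU bound.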

slide-14
SLIDE 14

+Evaluation: Two Applications

  • EEG-based seizure onset detection: 1400 operators
  • Human speech detection/identification

Speech pipeline: source → preemph → hamming → prefilt → FFT → filtBank → logs → cepstrals

slide-15
SLIDE 15

+Observation: Relative cost varies by platform

[Bar chart: fraction of total CPU cost per operator (source, preemph, hamming, prefilt, FFT, filtBank, logs, cepstrals) for Mote, N80, and PC]

Wishbone’s profiling visualizations (via graphviz) for four platforms

slide-16
SLIDE 16

+Visualizing Profile Data: Bandwidth vs. Compute

[Plot: per operator (source, preemph, hamming, prefilt, FFT, filtBank, logs, cepstrals), cumulative CPU cost (red) as execution time of operator in microseconds on a log scale, and bandwidth of cut in KBytes/sec on the right-hand scale]

  • Reasonable cutpoints
  • Processing reduces data quantity

slide-17
SLIDE 17

+Optimal partitions across platforms

EEG Application (1 of 22 channels)

[Plot: number of operators in optimal node partition vs. input data rate as a multiple of 8 kHz, for TmoteSky/TinyOS and NokiaN80/Java]

Each line represents 2100 partitioner runs

slide-18
SLIDE 18

+Speaker Detection: CPU performance across partitions/platforms

[Plot, Speaker Detection Application: handled input rate as multiple of 8 kHz (log scale) vs. cutpoint / number of operators in node partition (source/1, filtbank/7, logs/8, cepstral/9), for TinyOS, JavaME, iPhone, and VoxNet]

Putting the pieces together:

  • CPU & net bounds → optimal partition (if exists)
  • Partition → est. throughput
  • Binary search over rates (aka CPU bounds) → max possible throughput

Example: picks cutpoint after filtBank for speaker detection
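The binary search over rates can be sketched in Python as follows. This is a simplified stand-in: it replaces the ILP partitioner with a scan over the cutpoints of a linear pipeline, and it assumes CPU load and bandwidth both scale linearly with the input rate, so feasibility is monotone in the rate. The names `feasible` and `max_rate` and all the numbers are hypothetical.

```python
def feasible(rate, cpu_cost, bandwidth, C, N):
    """True if some cutpoint k fits both budgets at this input-rate multiple.

    cpu_cost[i]:  CPU load of pipeline operator i at rate 1
    bandwidth[k]: data rate on the edge crossed when the first k operators run on the node
    """
    for k in range(len(bandwidth)):
        cpu = rate * sum(cpu_cost[:k])
        net = rate * bandwidth[k]
        if cpu < C and net < N:
            return True
    return False

def max_rate(cpu_cost, bandwidth, C, N, hi=1024.0, tol=1e-6):
    """Approximate the largest feasible rate in [0, hi] by bisection."""
    lo = 0.0
    if feasible(hi, cpu_cost, bandwidth, C, N):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid, cpu_cost, bandwidth, C, N):
            lo = mid           # mid works: the answer is at least mid
        else:
            hi = mid           # mid fails: the answer is below mid
    return lo

cpu_cost = [1, 4, 10]          # three operators
bandwidth = [20, 8, 2, 1]      # four possible cutpoints, k = 0..3
rate = max_rate(cpu_cost, bandwidth, C=10, N=16)
```

Here the search settles just below 2.0, the rate at which the cutpoints after the first and second operators hit their bandwidth or CPU limit.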

slide-19
SLIDE 19

+Groundtruth: Testbed deployment, 20 motes

How many detections can we actually get out of the network?

[Plot: detections per second vs. cutpoint (source, hamming, FFT, filtBank, logs, cepstral), for 1 TMote + Basestation and for a 20 TMote Network]

Compute/Bandwidth Tension (1 mote + basestation)

[Plot: percent vs. cutpoint (source, hamming, FFT, filtBank, logs, cepstral), showing percent input events received, percent network msgs successful, and goodput (product); the best empirical cutpoint is marked]

slide-20
SLIDE 20

+Related Work

  • Graph partitioning for scientific codes
      • balanced, heuristic – e.g. Zoltan
  • Task scheduling, commonly list scheduling
  • Dynamic: Map-reduce, Condor, etc.
  • Sensor network context: Tenet and Vango
      • Linear pipeline of operators
      • Manual partition
      • Run TinyOS code on both server and sensor

slide-21
SLIDE 21

+CONCLUSION

slide-22
SLIDE 22

+Partitioning: Algorithm Runtime

  • Graph preprocessing step
      • Merge vertices until all edge weights are monotonically decreasing.
      • Eliminates the majority of edges
  • Even without preprocessing, 8000 runs, partitioning the 1400-node EEG dataflow graph with different CPU budgets, took under 10 seconds 95% of the time.
  • But there is a long tail… luckily ILP solvers produce approximate solutions as well!

[Plot: seconds (log scale, 0.1 to 1000), time to discover optimal vs. time to prove optimal]
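One way to read the preprocessing step, sketched in Python for the special case of a linear pipeline (the slide applies it to general graphs). The sketch assumes operator CPU costs are non-negative. An edge is kept as a candidate cutpoint only if it carries strictly less data than every earlier edge; otherwise an earlier cut is at least as good on both bandwidth and node CPU, so the operators around it can be merged. The function name `candidate_cuts` and the edge weights are hypothetical.

```python
def candidate_cuts(bandwidth):
    """Return indices of pipeline edges worth considering as cutpoints.

    bandwidth[k] is the data rate on the edge after the first k operators.
    An edge is kept only if it carries strictly less data than every earlier
    edge; otherwise cutting earlier costs no more bandwidth and no more node CPU.
    """
    kept = []
    lowest = float("inf")
    for k, bw in enumerate(bandwidth):
        if bw < lowest:
            kept.append(k)
            lowest = bw
    return kept

edges = [16, 16, 24, 8, 8, 3]
cuts = candidate_cuts(edges)            # operators between kept edges are merged
kept_weights = [edges[k] for k in cuts]
```

After the filter the surviving edge weights are strictly decreasing, and half of the edges in this example are gone before the solver runs.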

slide-23
SLIDE 23

+Motivating Example

[Diagram: the same small dataflow graph (weights 1, 2, 5, 4, 1) partitioned under three CPU budgets]

  • budget = 2: bandwidth = 8
  • budget = 3: bandwidth = 6
  • budget = 4: bandwidth = 5

Unstable optimal partition. Flips between horizontal and vertical partition.