Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, Samuel Madden
+Example Application: Locating Marmots
with Lewis Girod & UCLA Blumstein Lab
- Gothic, CO deployment, August 2007
- VoxNet platform
- 2x PXA255, 64MB RAM, 8GB Flash, 802.11b, Mica2 supervisor, Li+ battery, charge controller
- Sensors: 4x 48 kHz audio, 3-axis accelerometer, GPS, internal temperature
+We target sensing applications
- Animal localization
- Pothole detection
- Computer vision
- Pipeline leak detection
- EEG seizure detection
- Speaker identification
+Heterogeneous Platforms
- Low-power sensors: weak CPU/radio
- Smartphones: medium CPU, strong radio
- Router: weak CPU, strong radio
- Linux microserver
Languages and OSes: JavaME, Symbian, Brew, iPhone SDK, Android, TinyOS, Java, C++, Python, Contiki
Mix and match!
+Contributions
[Diagram: Sensor source(s) and Results on either side of the Network Boundary, with Compile & Load on both sides]
- First broadly portable sensornet programming
- Partitioning algorithm
- Optimize CPU/radio tradeoff even if app doesn't "fit"
+Architecture
Dataflow graph:
- Operators containing code in portable intermediate language
Wishbone (diagram labels): Sample data (for profiling), Partitioner, Backend CodeGen
Backends: ANSI C, NesC/TinyOS, JavaME
+Targeting TinyOS
TinyOS:
- 16-bit microcontroller
- 10K RAM
- No memory protection
- No threads
- Task granularity, messaging model
WaveScope:
iterate x in S { f(); for(i=…) { … } g(); }
Profile-directed Cooperative Multitasking:
- Operator body split into tasks: f() | for () {…} | g()
- [Diagram: timeline from tstart to tend labeled "Execute!", showing Tasks alongside msg1, msg2, msg3]
- Same goal as Protothreads
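A minimal sketch of the idea behind profile-directed task splitting, assuming each operator body has already been divided into consecutive code segments with profiled execution times. The function name `split_into_tasks`, the millisecond budget, and the greedy grouping are illustrative assumptions, not the compiler's actual algorithm.

```python
def split_into_tasks(segment_times_ms, budget_ms):
    """Greedily group consecutive code segments into tasks.

    segment_times_ms: profiled execution time of each segment, in order.
    budget_ms: hypothetical upper bound on how long one task should run
               before yielding back to the scheduler.
    Returns a list of tasks, each a list of segment indices.
    A single segment longer than the budget still becomes its own task.
    """
    tasks, current, used = [], [], 0.0
    for index, cost in enumerate(segment_times_ms):
        # Close the current task if adding this segment would exceed the budget.
        if current and used + cost > budget_ms:
            tasks.append(current)
            current, used = [], 0.0
        current.append(index)
        used += cost
    if current:
        tasks.append(current)
    return tasks


# Segments 0..4 stand in for pieces of f(), the loop body, and g().
example_tasks = split_into_tasks([1.0, 0.5, 2.0, 2.0, 0.5], budget_ms=3.0)
```

Between tasks the scheduler is free to service radio messages, which is the point of cooperative multitasking on a platform without threads.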
+Profiling Streams and Operators
- Every sensor source is paired with sample data
- Sample data includes timing info
- Measure rates, execution times
- Separately: profile network channel in deployment environment (per-node send rate)
audioStream = IFPROF(readFile("foo8kHz"), readSensor())
[Diagram annotations: 3 ms, 20 Kbps, 27 Kbps]
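The measurement step can be sketched as follows. This is an illustration only: `profile_operator`, its arguments, and the byte-string payloads are assumptions made for the example, not Wishbone's profiler interface.

```python
import time


def profile_operator(op, payloads, trace_seconds):
    """Run one operator over recorded sample data and summarize its cost.

    op: a function from bytes to bytes (a stand-in for a stream operator).
    payloads: recorded input elements from the sample data.
    trace_seconds: how much real time the sample data covers, taken from
                   the timing info stored with the samples.
    """
    busy_seconds = 0.0
    output_bytes = 0
    for payload in payloads:
        start = time.perf_counter()
        result = op(payload)
        busy_seconds += time.perf_counter() - start
        output_bytes += len(result)
    return {
        # Average CPU time per input element on the profiling machine.
        "mean_exec_seconds": busy_seconds / len(payloads),
        # Data rate on the operator's output edge.
        "output_bytes_per_second": output_bytes / trace_seconds,
    }


# A toy operator that keeps the first half of each input element.
stats = profile_operator(lambda b: b[: len(b) // 2], [bytes(100)] * 4, trace_seconds=2.0)
```

The output rate is what a cut on that edge would cost in bandwidth; the execution time is what placing the operator on the node would cost in CPU.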
+State, Replication, and Pinning
Pinning Constraints
- All stateless ops: unpinned
- Stateful replicated ops: unpinned
- Stateful global ops: pinned to server (don't distribute!)
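The three rules above can be written as a small decision function. The `Op` record and its `stateful` and `replicated` fields are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    stateful: bool
    replicated: bool = False  # True if every node keeps its own copy of the state


def pinning(op):
    """Return "unpinned" or "server" following the slide's three rules."""
    if not op.stateful:
        return "unpinned"   # stateless ops may run on either side
    if op.replicated:
        return "unpinned"   # per-node state can move with the operator
    return "server"         # global state must stay in one place


placements = [pinning(Op("fft", stateful=False)),
              pinning(Op("per_node_filter", stateful=True, replicated=True)),
              pinning(Op("global_aggregate", stateful=True))]
```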
+Problem Scenario
[Diagram: dataflow graph split between Embedded Node and Server / Base Station at the Network Boundary, annotated with Network and CPU costs]
Problem Inputs
- Profile data: net, CPU
- Network channel capacity
NP-hard
+Partitioning Algorithm: Integer linear program formulation
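- Introduce variables $f_u \in \{0,1\}$, where 0 = server, 1 = sensor
- Introduce variables $g_{uv} \in \{0,1\}$, where 1 = cut edge
- Enforce resource bounds:
  $\mathit{cpu} < C$, where $\mathit{cpu} = \sum_{u} f_u \cdot \mathit{compute}_u$
  $\mathit{net} < N$, where $\mathit{net} = \sum_{(u,v) \in \mathit{Edges}} g_{uv} \cdot \mathit{data}_{uv}$
- Minimize objective function (proxy for energy): $\min(\alpha \cdot \mathit{cpu} + \mathit{net})$
- 3 parameters: $C$, $N$, $\alpha$
- Tricky bit (see paper): relating $f$ and $g$ while staying linear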
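To make the formulation concrete, here is a brute-force sketch that enumerates every assignment instead of calling an ILP solver, so it is only usable on tiny graphs. The function name, the `pinned` argument, and the example numbers are assumptions for illustration. The sketch derives the cut indicator directly from the two endpoint placements; one standard way to express that linearly is the pair of constraints $g_{uv} \ge f_u - f_v$ and $g_{uv} \ge f_v - f_u$, which may or may not match the paper's encoding.

```python
from itertools import product


def best_partition(compute, data, pinned, C, N, alpha):
    """Exhaustively minimize alpha*cpu + net subject to cpu < C and net < N.

    compute: {operator: CPU cost if placed on the sensor}
    data:    {(u, v): bandwidth of edge u -> v}
    pinned:  {operator: 0 or 1} for operators whose side is fixed
             (0 = server, 1 = sensor, as on the slide)
    Returns (cost, assignment) or None if no feasible partition exists.
    """
    ops = sorted(compute)
    best = None
    for bits in product((0, 1), repeat=len(ops)):
        f = dict(zip(ops, bits))
        if any(f[u] != side for u, side in pinned.items()):
            continue
        cpu = sum(f[u] * compute[u] for u in ops)
        # An edge is cut (g_uv = 1) exactly when its endpoints differ.
        net = sum(d for (u, v), d in data.items() if f[u] != f[v])
        if cpu >= C or net >= N:
            continue
        cost = alpha * cpu + net
        if best is None or cost < best[0]:
            best = (cost, f)
    return best


# Made-up four-operator pipeline; the numbers are not from the talk.
compute = {"source": 1, "fft": 5, "filt": 3, "sink": 0}
data = {("source", "fft"): 8, ("fft", "filt"): 6, ("filt", "sink"): 2}
pinned = {"source": 1, "sink": 0}
cost, placement = best_partition(compute, data, pinned, C=10, N=10, alpha=0.5)
```

With these numbers the cheapest feasible choice keeps everything except the sink on the sensor. Tightening the CPU bound to `C=8` rules that out and pushes the cut back to just after the source.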
+Evaluation: Two Applications
- EEG-based seizure onset detection: 1400 operators
- Human speech detection/identification
  Operators: source, preemph, hamming, prefilt, FFT, filtBank, logs, cepstrals
+Observation: Relative cost varies by platform
[Plot: fraction of total CPU cost per operator (source, preemph, hamming, prefilt, FFT, filtBank, logs, cepstrals) on Mote, N80, and PC]
Wishbone’s profiling visualizations (via graphviz) for four platforms
+Visualizing Profile Data: Bandwidth vs. Compute
[Plot: per operator (source, preemph, hamming, prefilt, FFT, filtBank, logs, cepstrals), execution time of operator in microseconds (log scale) and bandwidth of cut in KBytes/sec (right-hand scale)]
- Cumulative CPU cost (red)
- Reasonable cutpoints
- Processing reduces data quantity
+Optimal partitions across platforms
EEG Application (1 of 22 channels)
[Plot: number of operators in optimal node partition vs. input data rate as a multiple of 8 kHz, for TmoteSky/TinyOS and NokiaN80/Java]
Each line represents 2100 partitioner runs
+Speaker Detection: CPU performance across partitions/platforms
[Plot: Speaker Detection Application; handled input rate as multiple of 8 kHz (log scale) vs. cutpoint / number of operators in node partition (source/1, filtbank/7, logs/8, cepstral/9), for TinyOS, JavaME, iPhone, VoxNet]
Putting the pieces together:
- CPU & net bounds -> optimal partition (if exists)
- Partition -> est. throughput
- Binary search over rates (aka CPU bounds) -> max possible throughput
Example: picks cutpoint after filtBank for speaker detection
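The rate search can be sketched as an ordinary bisection over a feasibility test. Everything here is an assumption made for illustration: the predicate `has_partition` stands in for a call to the partitioner, feasibility is taken to be monotone in the rate, and the toy cost model (CPU and bandwidth both scale linearly with the input rate) is not from the talk.

```python
def max_sustainable_rate(has_partition, lo, hi, tol=1e-3):
    """Largest rate in [lo, hi] for which a feasible partition exists.

    has_partition(rate) -> bool is assumed monotone: once a rate is
    infeasible, every higher rate is infeasible too.
    """
    if not has_partition(lo):
        return None
    if has_partition(hi):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_partition(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Toy pipeline: each candidate cut is (CPU per unit rate, bandwidth per unit rate).
CUTS = [(1, 8), (6, 6), (9, 2)]
CPU_BOUND, NET_BOUND = 18, 12


def toy_has_partition(rate):
    return any(c * rate <= CPU_BOUND and b * rate <= NET_BOUND for c, b in CUTS)


best_rate = max_sustainable_rate(toy_has_partition, lo=0.1, hi=100.0)
```

In this toy model the first cut is bandwidth-limited and the last is CPU-limited, so the search settles near a rate of 2, where the middle and last cuts both hit a bound.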
+Groundtruth: Testbed deployment, 20 motes
How many detections can we actually get out of the network?
[Plot: detections per second by cutpoint (source, hamming, FFT, filtBank, logs, cepstral), for 1 TMote + Basestation and for a 20 TMote network]
Compute/Bandwidth Tension (1 mote + basestation)
[Plot: percent by cutpoint (source, hamming, FFT, filtBank, logs, cepstral); series: percent input events received, percent network msgs successful, goodput (product); best empirical cutpoint marked]
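The goodput series in the plot is the product of the other two. A short sketch of that calculation follows; the cutpoint names A to D and all fractions are invented for illustration and are not measurements from the testbed.

```python
def goodput(fraction_inputs_processed, fraction_msgs_delivered):
    """Fraction of input events that are both processed and delivered."""
    return fraction_inputs_processed * fraction_msgs_delivered


# Hypothetical measurements: early cuts overload the radio,
# late cuts overload the CPU.
measurements = {
    "A": (1.0, 0.1),
    "B": (0.9, 0.4),
    "C": (0.8, 0.9),
    "D": (0.2, 1.0),
}
best_cutpoint = max(measurements, key=lambda name: goodput(*measurements[name]))
```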
+Related Work
- Graph partitioning for scientific codes: balanced, heuristic (e.g. Zoltan)
- Task scheduling, commonly list scheduling
- Dynamic: Map-reduce, Condor, etc.
- Sensor network context: Tenet and Vango (linear pipeline of operators; manual partition; run TinyOS code on both server and sensor)
+CONCLUSION
+Partitioning: Algorithm Runtime
Graph preprocessing step:
- Merge vertices until all edge weights are monotonically decreasing.
- Eliminates the majority of edges.
Even without preprocessing, 8000 runs partitioning the 1400-node EEG dataflow graph with different CPU budgets took under 10 seconds 95% of the time. But there is a long tail… luckily, ILP solvers produce approximate solutions as well!
[Plot: seconds (log scale); series: time to discover optimal, time to prove optimal]
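For intuition, the merge step can be sketched for the special case of a linear pipeline. This is one reading of the slide's rule, not the talk's general-graph procedure, and the function name and example numbers are hypothetical. In a pipeline, cutting at an edge that carries at least as much data as an earlier edge costs more CPU on the node and saves no bandwidth, so that edge can be removed by merging its two endpoints.

```python
def merge_dominated_edges(cpu_costs, edge_bandwidths):
    """Collapse a linear pipeline so that edge bandwidths strictly decrease.

    cpu_costs[i] is the cost of operator i; edge_bandwidths[i] is the data
    rate on the edge between operator i and operator i + 1.
    Returns (merged_cpu_costs, kept_edge_bandwidths).
    """
    if len(cpu_costs) != len(edge_bandwidths) + 1:
        raise ValueError("a pipeline of n operators has n - 1 edges")
    merged_cpu = [cpu_costs[0]]
    kept_edges = []
    for i, bandwidth in enumerate(edge_bandwidths):
        if not kept_edges or bandwidth < kept_edges[-1]:
            # Useful cutpoint: keep the edge and start a new merged vertex.
            kept_edges.append(bandwidth)
            merged_cpu.append(cpu_costs[i + 1])
        else:
            # Dominated cutpoint: fold the downstream operator into the
            # current merged vertex and drop the edge.
            merged_cpu[-1] += cpu_costs[i + 1]
    return merged_cpu, kept_edges


merged = merge_dominated_edges([1, 2, 3, 4, 5], [10, 12, 6, 6])
```

Here five operators and four edges shrink to three merged vertices and two candidate cutpoints.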
+Motivating Example
[Diagram: the same small dataflow graph (weights 1, 2, 5, 4, 1) partitioned under three CPU budgets]
- budget = 2: bandwidth = 8
- budget = 3: bandwidth = 6
- budget = 4: bandwidth = 5
Unstable optimal partition: it flips between horizontal and vertical partitions.