Enabling Signal Processing over Stream Data
Milos Nikolic*, University of Oxford
Badrish Chandramouli, Microsoft Research
Jonathan Goldstein, Microsoft Research
*Work performed during internship at MSR
ID   Time      Value
0    0:42:19   67
1    0:42:22   80
2    0:42:22   85
0    0:42:23   69
2    0:42:24   85

Remove noise
Interpolate missing data
Find periodicity
Discard invalid data
Correlate live data w/ history

[Figure: query plan mixing relational operators (σ, ⋈, M = Group-by ID, U = Union) with DSP operators]
Which tools to use to build such apps?
              Data processing expert                 Digital signal processing expert
Engines:      stream engines, DBMS, MPP systems      MATLAB, R
Data model:   (tempo)-relational                     array
Language:     declarative (SQL, LINQ, functional)    imperative (array languages, C)
Scenarios:    real-time, offline, progressive        mostly offline, real-time
Equally-spaced samples stored in array
FFT ➞ user-defined function ➞ IFFT
Per device
[Figure: input samples x[n] (x0, x1, x2) combined by additions into output samples y[n] (y0, y1, y2)]
[Figure: the same diagram, labeled R and STREAM PROCESSING SYSTEM]
(relational with time)
(e.g., grouped computation)
Trill: Fast Streaming Analytics Engine [VLDB 2014 paper]
DSP Library
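Tempo-Relational Model vs. Relational Model
[Figure: INPUT events e1 to e5 laid out over logical time; snapshots taken at t1 to t4 are each a relation in the Relational Model. The query Q = COUNT(*) is evaluated on the snapshots, and the OUTPUT is the resulting sequence of counts over logical time.]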
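The snapshot semantics above can be illustrated with a small sketch. It is plain Python, not Trill code, and it assumes each event is a half-open interval `[start, end)` of logical time; the function name `snapshot_counts` and the sample intervals are hypothetical and are not the events shown in the slide figure.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), end exclusive; an assumed event encoding


def snapshot_counts(events: List[Interval]) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) spans where COUNT(*) over the snapshot is constant and non-zero."""
    # Each event adds 1 to the count at its start and removes 1 at its end.
    deltas: Dict[int, int] = {}
    for start, end in events:
        deltas[start] = deltas.get(start, 0) + 1
        deltas[end] = deltas.get(end, 0) - 1

    # Only times where the count actually changes delimit snapshots.
    times = sorted(t for t, d in deltas.items() if d != 0)

    result = []
    active = 0
    for t, nxt in zip(times, times[1:]):
        active += deltas[t]
        if active > 0:
            result.append((t, nxt, active))
    return result


if __name__ == "__main__":
    # Three overlapping events (hypothetical data).
    print(snapshot_counts([(1, 4), (2, 6), (5, 8)]))
    # [(1, 2, 1), (2, 4, 2), (4, 5, 1), (5, 6, 2), (6, 8, 1)]
```

Every output span is itself an event with a lifetime and a payload (the count), which is why the result of the query is again a stream.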
struct SensorReading { long SensorId; long Time; double Value; }

var str = Network.ToStream(e => e.Time);
var query = str.Where(e => e.Value < 100)
               .Select(e => e.Value);
query.Subscribe(e => Console.Write(e)); // write results to console
[Figure: input events e1 to e5 over time (a stream) and the aggregated events, i.e. the event counts, over time (a signal)]
STREAMABLE ➞ SIGNALSTREAMABLE
var signal = stream.Where(e => e.Value < 100).Count();
STREAMS SIGNALS
Type-safe operations: left.Join(right, (l, r) => l + r)
[Figure: input events over time (ticks at 30, 60, 90, 120, 150, 180, 210), some misaligned with the ticks and some missing; output events at the ticks, with the missing ones interpolated]
var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60)); // 60 = interpolation window
STREAMS SIGNALS UNIFORM
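A minimal sketch of what uniform sampling with linear interpolation could look like, in plain Python rather than the TrillDSP API. The parameter meanings are assumptions read off the slide: `period` and `offset` define the sampling grid, and `window` is the largest gap between two neighboring input events across which a value is still interpolated. The function name `sample_linear` and the sample data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Point = Tuple[int, float]  # (time, value)


def sample_linear(points: List[Point], period: int, offset: int, window: int) -> List[Point]:
    """Resample time-sorted points (distinct times) onto the grid offset + k * period."""
    if not points:
        return []
    times = [t for t, _ in points]

    # First grid time that is not before the first input event (ceiling division).
    k = -(-(times[0] - offset) // period)
    t = offset + k * period

    out: List[Point] = []
    while t <= times[-1]:
        i = bisect_left(times, t)
        if times[i] == t:
            # An input event is already aligned with the grid.
            out.append((t, float(points[i][1])))
        else:
            # Interpolate between the neighbors, but only across small gaps.
            (t0, v0), (t1, v1) = points[i - 1], points[i]
            if t1 - t0 <= window:
                out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += period
    return out


if __name__ == "__main__":
    readings = [(30, 10.0), (58, 20.0), (62, 30.0), (150, 50.0), (180, 60.0)]
    print(sample_linear(readings, period=30, offset=0, window=60))
    # [(30, 10.0), (60, 25.0), (150, 50.0), (180, 60.0)]
```

In this example the sample at 60 is interpolated from the misaligned events at 58 and 62, while the grid points 90 and 120 produce no output because the surrounding gap (62 to 150) exceeds the assumed interpolation window of 60.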
var s = uniformSignal.Window(5,3).FFT()…
Window = 5 samples, Hop = 3 samples
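To make the window and hop parameters concrete, here is a small plain-Python sketch of hopping windows over an array of uniform samples. It is not TrillDSP code; the name `hopping_windows` is hypothetical, and dropping a trailing partial window is an assumption of this sketch.

```python
from typing import List, Sequence


def hopping_windows(samples: Sequence[float], window: int, hop: int) -> List[Sequence[float]]:
    """Return every full window of `window` samples, advancing by `hop` samples each time."""
    last_start = len(samples) - window
    return [samples[i:i + window] for i in range(0, last_start + 1, hop)]


if __name__ == "__main__":
    for w in hopping_windows(list(range(11)), window=5, hop=3):
        print(w)
    # [0, 1, 2, 3, 4]
    # [3, 4, 5, 6, 7]
    # [6, 7, 8, 9, 10]
```

With a window of 5 and a hop of 3, consecutive windows share 2 samples, which is the overlap that later stages have to account for.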
var query = uniformSignal
    .Window(512, 256,
        w => w.FFT().Select(a => f(a)).IFFT(),
        a => a.Sum());
[Figure: Uniform signal ➞ WIN ➞ FFT ➞ f ➞ IFFT ➞ UNWIN (AGG) ➞ Uniform signal]
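The window ➞ FFT ➞ f ➞ IFFT ➞ unwindow pipeline can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the TrillDSP implementation: a naive O(n²) DFT stands in for the FFT to keep the example dependency-free, overlapping outputs are aggregated by summation (mirroring `a => a.Sum()`), no tapering window function is applied, and samples not covered by a full window stay at zero. The names `dft` and `windowed_transform` are hypothetical.

```python
import cmath
from typing import Callable, List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (stand-in for an FFT); inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def windowed_transform(
    samples: Sequence[float],
    window: int,
    hop: int,
    f: Callable[[complex], complex],
) -> List[float]:
    """Apply f to each window's spectrum, then sum the overlapping results (unwindow)."""
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - window + 1, hop):
        spectrum = dft(samples[start:start + window])
        restored = dft([f(a) for a in spectrum], inverse=True)
        # Unwindow: overlapping positions are aggregated by summation.
        for i, v in enumerate(restored):
            out[start + i] += v.real
    return out


if __name__ == "__main__":
    # Identity f: positions covered by two windows come out doubled.
    result = windowed_transform([1.0] * 8, window=4, hop=2, f=lambda a: a)
    print([round(v, 6) for v in result])
    # [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```

The doubled values in the middle show why the aggregation step matters: with a hop of half the window, each interior sample is reconstructed from two windows, so a real pipeline would typically pair the sum with a suitable window function or normalization.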
copying with hopping windows
computation
[Figure: OLD and NEW samples within a window as it advances by one hop, feeding FFT ➞ f ➞ IFFT]
corresponding to a distinct grouping key
var q = signal
    .Map(s => s.Select(e => e.Value), e => e.SensorId)
    .Reduce(s => s.Window(512, 256,
        w => w.FFT().Select(a => f(a)).IFFT(),
        a => a.Sum()));
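The grouped query above follows a map/reduce shape: project each event to a value, partition by a grouping key, and run the same signal computation independently per group. The following plain-Python sketch shows only that shape. It is not how Trill executes grouped queries; the name `map_reduce_by_key` and the sample events are hypothetical, and a simple pairwise sum stands in for the windowed FFT pipeline.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

E = TypeVar("E")
R = TypeVar("R")


def map_reduce_by_key(
    events: Iterable[E],
    value: Callable[[E], float],
    key: Callable[[E], Hashable],
    reduce_fn: Callable[[List[float]], R],
) -> Dict[Hashable, R]:
    """Group projected values by key (keeping arrival order), then reduce each group."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for e in events:
        groups[key(e)].append(value(e))
    return {k: reduce_fn(vs) for k, vs in groups.items()}


def pairwise_sums(values: List[float]) -> List[float]:
    """Stand-in per-group computation: sums over windows of 2 samples with hop 1."""
    return [values[i] + values[i + 1] for i in range(len(values) - 1)]


if __name__ == "__main__":
    # (SensorId, Value) pairs interleaved across two sensors (hypothetical data).
    readings: List[Tuple[int, float]] = [(0, 1.0), (1, 10.0), (0, 2.0), (1, 20.0), (0, 3.0)]
    print(map_reduce_by_key(readings, value=lambda e: e[1], key=lambda e: e[0],
                            reduce_fn=pairwise_sums))
    # {0: [3.0, 5.0], 1: [30.0]}
```

This batch version materializes every group before reducing; a streaming engine would instead keep per-key window state and update it as events arrive.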
[Figure: RUNNING TIME (secs) vs. WINDOW SIZE (128, 256, 512, 1024, 2048) for TrillDSP, WaveScope, MATLAB, and R]
Window ➞ FFT ➞ Unwindow
Pre-loaded datasets in memory
Pure DSP task
Comparable to best DSP tools
[Figure: NORMALIZED TIME TO TRILLDSP ON 16 CORES vs. HOP SIZE (230, 179, 128, 76, 25) for TrillDSP (1 core), MATLAB, SparkR (16 cores), and SciDB-R (16 cores)]
Per sensor: Windowed FFT ➞ Function ➞ Inverse FFT ➞ Unwindow
Pre-loaded datasets in memory
Up to 2 orders of magnitude (OOM) faster than others
Performance benefits from:
group-aware DSP windowing

Up to 2 OOM faster than systems integrated w/ R