SLIDE 1

Enabling Signal Processing over Stream Data

Milos Nikolic*, University of Oxford; Badrish Chandramouli, Microsoft Research; Jonathan Goldstein, Microsoft Research

*Work performed during internship at MSR

SLIDE 2

Signals in Streams

  • Lots of “signals” in stream data
    • Internet-of-things devices, app telemetry (e.g., ad clicks)
  • IoT workflows combine relational & signal logic
    • Example: a real-time app

[Figure: example real-time app. Operators shown: group-by ID (M), union (U), selection (σ), and DSP. Tasks: remove noise, interpolate missing data, find periodicity, discard invalid data, correlate live data with history.]

Sample input stream:

ID  Time     Value
0   0:42:19  67
1   0:42:22  80
2   0:42:22  85
0   0:42:23  69
2   0:42:24  85


Which tools should we use to build such apps?

SLIDE 3

Data processing expert
  • Engines: stream engines, DBMS, MPP systems
  • Data model: (tempo-)relational
  • Language: declarative (SQL, LINQ, functional)
  • Scenarios: real-time, offline, progressive

Digital signal processing expert
  • Engines: MATLAB, R
  • Data model: array
  • Language: imperative (array languages, C)
  • Scenarios: mostly offline, real-time


How can we reconcile the two worlds? Our solution:

  • high performance (up to 2 orders of magnitude (OOM) faster)
  • one query language
  • abstractions familiar to both worlds
SLIDE 4

Typical DSP Workflow

Equally spaced samples stored in an array

  • 1. Window
    • window size & hop size
  • 2. Per window: pipeline of DSP operations
    • array to array
    • Example: spectral analysis, FFT ➞ user-defined function ➞ IFFT
  • 3. Unwindow
    • sum overlapping segments

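[Figure: per device, the input x[n] is cut into overlapping windows x0, x1, x2; the processed windows y0, y1, y2 are summed (+) where they overlap to form the output y[n].]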
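To illustrate the three steps (window, per-window pipeline, unwindow), here is a minimal C# sketch over plain arrays. It is not TrillDSP's API: the class OverlapAdd and its methods are hypothetical, trailing samples that do not fill a complete window are simply dropped, and the per-window function is assumed to return an array of the same length as its input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of window -> per-window function -> unwindow (overlap-add).
public static class OverlapAdd
{
    // Step 1: cut x into windows of `size` samples, one starting every `hop` samples.
    // Trailing samples that do not fill a complete window are ignored.
    public static List<double[]> Window(double[] x, int size, int hop)
    {
        if (size <= 0 || hop <= 0) throw new ArgumentException("size and hop must be positive");
        var windows = new List<double[]>();
        for (int start = 0; start + size <= x.Length; start += hop)
        {
            var w = new double[size];
            Array.Copy(x, start, w, 0, size);
            windows.Add(w);
        }
        return windows;
    }

    // Step 3: put each processed window back at its original offset and sum overlapping samples.
    public static double[] Unwindow(List<double[]> windows, int size, int hop, int length)
    {
        var y = new double[length];
        for (int k = 0; k < windows.Count; k++)
        {
            if (windows[k].Length != size) throw new ArgumentException("window length changed");
            for (int i = 0; i < size; i++)
                y[k * hop + i] += windows[k][i];
        }
        return y;
    }

    // Step 2 in between: apply an array-to-array function (e.g., FFT -> f -> IFFT) to every window.
    public static double[] Run(double[] x, int size, int hop, Func<double[], double[]> perWindow)
    {
        var processed = new List<double[]>();
        foreach (var w in Window(x, size, hop)) processed.Add(perWindow(w));
        return Unwindow(processed, size, hop, x.Length);
    }
}
```

With hop equal to size the windows tumble and an identity function reproduces the input. With hop smaller than size, samples covered by several windows are summed, so an identity function on {1, 2, 3, 4, 5, 6} with size 4 and hop 2 yields {1, 2, 6, 8, 5, 6}; in practice the per-window function or a synthesis window is chosen to compensate for the overlap.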

SLIDE 5

Loose Systems Integration

Stream Processing Engine + R

  • Stream engine for relational queries
    • Per-group computation, windowing, joins, etc.
  • R for highly optimized DSP operations
  • Problem: impedance mismatch
    • High communication overhead (up to 95%)
    • Impractical for real-time analysis
    • Disparate query languages

[Figure: the windowed DSP workflow (windows x0, x1, x2 processed into y0, y1, y2 and summed) split between a stream processing system and R.]

SLIDE 6
  • Performance
    • 2-4 OOM faster than today’s stream processing engines (SPEs)
  • Query model
    • Based on the temporal query model (relational with time)
    • Real-time, offline, progressive queries
  • Language integration
    • Built as a .NET library
    • Works with arbitrary C# data types
  • Unified query model
    • Non-uniform & uniform signals
    • Type-safe mix of stream & signal operators
  • Array-based extensibility framework
    • DSP operator writer sees arrays
    • Supports incremental computation
  • “Walled garden” on top of Trill
    • No changes in data model
    • Inherits Trill’s efficient processing capability (e.g., grouped computation)

[Figure: TrillDSP architecture, a DSP library layered on top of Trill, a fast streaming analytics engine (VLDB 2014 paper).]

SLIDE 7

Tempo-Relational Model

  • Uniformly represents offline and online datasets as stream data

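[Figure: input events e1 to e5 with lifetimes on a logical time axis. Evaluating the tempo-relational query Q = COUNT(*) corresponds to evaluating the relational query Q on each snapshot (t1, t2, t3, t4) of the input; the output is a sequence of count events with values 1, 1, 1, 1, 2, 1, 2.]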
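To make the snapshot semantics of Q = COUNT(*) concrete, here is a naive C# sketch. The class SnapshotCount is hypothetical, not Trill's implementation: each event is assumed to be alive over a half-open interval [Start, End), and COUNT(*) is evaluated from scratch on every snapshot between consecutive event endpoints, whereas a streaming engine would maintain the count incrementally.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating snapshot semantics; not part of Trill.
public static class SnapshotCount
{
    // Each input event is alive over the half-open interval [Start, End).
    // Returns one (Start, End, Count) entry per snapshot in which at least one event is alive.
    public static List<(long Start, long End, int Count)> Count(IEnumerable<(long Start, long End)> events)
    {
        var evs = events.ToList();
        // Snapshot boundaries are the distinct start and end times of all events.
        var points = evs.SelectMany(e => new[] { e.Start, e.End })
                        .Distinct()
                        .OrderBy(p => p)
                        .ToList();
        var result = new List<(long Start, long End, int Count)>();
        for (int i = 0; i + 1 < points.Count; i++)
        {
            long t = points[i];
            // No endpoint lies strictly inside [t, points[i + 1]), so the alive set is constant there.
            int c = evs.Where(e => e.Start <= t && t < e.End).Count();
            if (c > 0) result.Add((t, points[i + 1], c));
        }
        return result;
    }
}
```

For events [0, 4), [2, 6) and [8, 10), the sketch returns counts 1, 2, 1 over [0, 2), [2, 4), [4, 6) and 1 over [8, 10); nothing is emitted for the empty snapshot [6, 8). The output changes only where the input changes, which is the property the figure illustrates.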

SLIDE 8

Trill Example (Simplified)

  • Define the event data type in C#

struct SensorReading { public long SensorId; public long Time; public double Value; }

  • Define ingress

var str = Network.ToStream(e => e.Time);

  • Write query (in C# app)

var query = str.Where(e => e.Value < 100).Select(e => e.Value);

  • Subscribe to result

query.Subscribe(e => Console.Write(e)); // write results to console

SLIDE 9

Signal = stream without overlapping events

[Figure: input events e1 to e5 (a Streamable) are aggregated into count events 1, 1, 1, 1, 2, 1, 2 that do not overlap in time (a SignalStreamable).]

var signal = stream.Where(e => e.Value < 100).Count();

  • Transition to the signal domain
    • E.g., the result of an aggregate query
  • Using stream operators to build signal operators
    • E.g., adding two signals as a temporal join of two streams (type-safe operations):

left.Join(right, (l, r) => l + r);
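The temporal join that adds two signals can be sketched over explicit interval lists. In this hypothetical C# helper (not Trill's Join), a signal is a list of non-overlapping [Start, End) intervals carrying values; the output has a value only where both inputs are defined, and because neither input overlaps itself, the output does not overlap either, so it is again a signal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pointwise addition of two piecewise-constant signals via an interval join.
public static class SignalAdd
{
    // Each signal is a list of non-overlapping half-open intervals [Start, End) carrying a value.
    public static List<(long Start, long End, double Value)> Add(
        List<(long Start, long End, double Value)> left,
        List<(long Start, long End, double Value)> right)
    {
        var result = new List<(long Start, long End, double Value)>();
        foreach (var l in left)
        {
            foreach (var r in right)
            {
                // Temporal join condition: the two lifetimes intersect.
                long start = Math.Max(l.Start, r.Start);
                long end = Math.Min(l.End, r.End);
                if (start < end)
                    result.Add((start, end, l.Value + r.Value)); // result selector: l + r
            }
        }
        result.Sort((a, b) => a.Start.CompareTo(b.Start));
        return result;
    }
}
```

For left = {[0, 4): 1, [4, 8): 2} and right = {[2, 6): 10}, the result is {[2, 4): 11, [4, 6): 12}. The nested loop keeps the sketch short; a streaming join would instead keep only the currently alive intervals of each side.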

SLIDE 10

Uniformly sampled signals

  • Sampling with interpolation

[Figure: input events on a time axis marked 30, 60, 90, 120, 150, 180, 210 are misaligned with these points or missing; the output has one event at each point, with missing values interpolated.]

var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60));

The argument 60 is the interpolation window.
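A minimal C# sketch of sampling with linear interpolation follows. The exact semantics of Sample and Linear are not spelled out here, so the hypothetical UniformSampler assumes: grid points are offset + k × period (matching the 30-unit spacing in the figure), an input that falls exactly on a grid point is used as is, and a grid point is interpolated only if its neighbours on both sides lie within `window` time units; otherwise it is skipped as missing data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: resample irregular points onto a uniform grid with linear interpolation.
public static class UniformSampler
{
    // Grid points are offset + k * period within [first input time, last input time].
    // Assumption: a grid point is emitted only if its neighbours on both sides are within `window`.
    public static List<(long Time, double Value)> Sample(
        IEnumerable<(long Time, double Value)> points, long period, long offset, long window)
    {
        if (period <= 0) throw new ArgumentOutOfRangeException(nameof(period));
        var sorted = points.OrderBy(p => p.Time).ToList();
        var result = new List<(long Time, double Value)>();
        if (sorted.Count == 0) return result;

        long first = sorted[0].Time, last = sorted[sorted.Count - 1].Time;
        long t = offset + CeilDiv(first - offset, period) * period; // first grid point >= first
        int j = 0; // index of the latest input point with Time <= t
        for (; t <= last; t += period)
        {
            while (j + 1 < sorted.Count && sorted[j + 1].Time <= t) j++;
            var prev = sorted[j];
            if (prev.Time == t) { result.Add((t, prev.Value)); continue; } // aligned sample
            var next = sorted[j + 1]; // exists because prev.Time < t <= last
            if (t - prev.Time > window || next.Time - t > window) continue; // gap too large
            double frac = (double)(t - prev.Time) / (next.Time - prev.Time);
            result.Add((t, prev.Value + frac * (next.Value - prev.Value)));
        }
        return result;
    }

    // Ceiling of a / b for b > 0.
    private static long CeilDiv(long a, long b) => a >= 0 ? (a + b - 1) / b : -(-a / b);
}
```

For inputs (10, 1.0), (40, 4.0), (95, 9.5), (250, 25.0) with period 30, offset 0 and window 60, the sketch emits values of about 3, 6 and 9 at times 30, 60 and 90, and skips 120 through 240 because the gap before 250 exceeds the window.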

SLIDE 11

Bringing Array Abstractions to DSP Users

  • Initial idea: Window & Unwindow sample operators
    • Window() creates a stream of arrays
    • Unwindow() projects arrays back in time
  • Performance problems
    • Creates dependencies between window semantics and system performance
    • No data sharing across overlapping arrays
  • Unclear language semantics
    • E.g., a stream of arrays: is it a signal or not?

var s = uniformSignal.Window(5, 3).FFT()…

Window = 5 samples, hop = 3 samples

SLIDE 12
Windowing Operator for DSP Users

  • Expose arrays only inside the windowing operator

var query = uniformSignal.Window(512, 256, w => w.FFT().Select(a => f(a)).IFFT(), a => a.Sum());

[Figure: uniform signal ➞ WIN ➞ FFT ➞ f ➞ IFFT ➞ UNWIN AGG ➞ uniform signal]

  • DSP pipeline & arrays instantiated only once ➞ better data management

SLIDE 13

User-Defined Operator Framework

  • DSP experts write array-to-array operators
    • Matches their expectations
    • Allows optimized array-based logic (e.g., SIMD)
  • Incremental DSP operators
    • Framework uses circular arrays to avoid data copying with hopping windows
    • New & old data available for incremental computation

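[Figure: a circular array holding one window; each hop replaces OLD samples with NEW ones, and the window feeds the FFT ➞ f ➞ IFFT pipeline.]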
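The circular-array idea can be sketched as follows. This hypothetical CircularWindow (not TrillDSP's framework) keeps exactly one window of samples; each hop overwrites the oldest slots in place instead of copying the window, and a running sum stands in for an incremental DSP operator that consumes only the new samples and the evicted old ones.

```csharp
using System;

// Hypothetical circular buffer holding one window; a running sum stands in for an incremental operator.
public sealed class CircularWindow
{
    private readonly double[] _buf;
    private int _head;  // index of the oldest sample
    private int _count; // number of valid samples (<= window size)

    public CircularWindow(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        _buf = new double[size];
    }

    public int Count => _count;

    // Sum of the samples currently in the window, maintained incrementally.
    public double RunningSum { get; private set; }

    // Appends one hop of new samples; once the window is full, each new sample
    // overwrites the oldest one in place, so no window-sized copy is made.
    public void Push(double[] hop)
    {
        foreach (var x in hop)
        {
            if (_count == _buf.Length)
            {
                RunningSum -= _buf[_head];         // old data leaves the window
                _buf[_head] = x;                   // reuse its slot for the new sample
                _head = (_head + 1) % _buf.Length; // next-oldest sample becomes the head
            }
            else
            {
                _buf[(_head + _count) % _buf.Length] = x;
                _count++;
            }
            RunningSum += x;                       // new data enters the window
        }
    }

    // Materializes the window, oldest to newest (e.g., to hand it to a non-incremental FFT).
    public double[] ToArray()
    {
        var a = new double[_count];
        for (int i = 0; i < _count; i++) a[i] = _buf[(_head + i) % _buf.Length];
        return a;
    }
}
```

With a window of 4, pushing {1, 2, 3, 4} and then a hop of {5, 6} leaves the window {3, 4, 5, 6} with a running sum of 18, computed by subtracting 1 and 2 and adding 5 and 6. Over long streams a floating-point running sum can drift, so real incremental operators may need periodic recomputation.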

SLIDE 14

Grouped Computation

  • Group-aware operators
    • Online processing of intertwined signals
    • One state per group
    • E.g., the interpolator keeps a history of samples for each group
  • Streaming MapReduce in Trill
    • Parallel execution on each sub-stream corresponding to a distinct grouping key

var q = signal.Map(s => s.Select(e => e.Value), e => e.SensorId).Reduce(s => s.Window(512, 256, w => w.FFT().Select(a => f(a)).IFFT(), a => a.Sum()));
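Group-aware state can be sketched with a dictionary keyed by the grouping key. The hypothetical GroupedTumblingWindow below (not Trill's Map/Reduce) consumes readings from many sensors interleaved in one stream, keeps one buffer per sensor, and fires a sensor's per-window function only when that sensor's own window is full. Tumbling windows are used only to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical group-aware operator: one tumbling-window buffer per grouping key.
public sealed class GroupedTumblingWindow
{
    private readonly int _size;
    private readonly Func<double[], double> _perWindow;
    private readonly Dictionary<long, List<double>> _state = new Dictionary<long, List<double>>();

    public GroupedTumblingWindow(int size, Func<double[], double> perWindow)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        _size = size;
        _perWindow = perWindow ?? throw new ArgumentNullException(nameof(perWindow));
    }

    // Consumes one reading of the interleaved stream. Returns (key, result) when this key's
    // window becomes full, or null otherwise; the state of other keys is untouched.
    public (long Key, double Result)? OnNext(long key, double value)
    {
        if (!_state.TryGetValue(key, out var buffer))
        {
            buffer = new List<double>(_size);
            _state[key] = buffer; // first reading for this key: create its state
        }
        buffer.Add(value);
        if (buffer.Count < _size) return null;

        double result = _perWindow(buffer.ToArray());
        buffer.Clear(); // tumbling window: the next window starts empty
        return (key, result);
    }
}
```

Feeding (0, 1.0), (1, 10.0), (0, 2.0), (1, 20.0) into a window of 2 that sums its samples produces (0, 3.0) on the third reading and (1, 30.0) on the fourth: the two sensors' samples never mix even though they arrive interleaved.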

SLIDE 15

Performance: FFT with tumbling window

[Chart: running time (secs) of Window ➞ FFT ➞ Unwindow vs. window size (128 to 2048) for TrillDSP, WaveScope, MATLAB, and R.]

  • Pre-loaded datasets in memory; pure DSP task
  • TrillDSP uses the FFTW library

Comparable to the best DSP tools

SLIDE 16

Performance: Grouping + DSP

[Chart: time normalized to TrillDSP on 16 cores vs. hop size (4 to 256) for the per-sensor task windowed FFT ➞ function ➞ inverse FFT ➞ unwindow; systems: TrillDSP (1 core), MATLAB, SparkR (16 cores), SciDB-R (16 cores).]

  • Pre-loaded datasets in memory
  • 100 groups in stream

Up to 2 OOM faster than the others. Performance benefits from:

  • Efficient group processing, group-aware DSP windowing
  • Using circular arrays to manage overlapping windows
  • TrillDSP uses the FFTW library

SLIDE 17

Conclusion

  • Apps mix relational & signal logic
    • Per device: find periodicity in signals, interpolate missing data, recover noisy data
    • Different data models: relational vs. array
  • Existing query processors integrated with R
    • Impedance mismatch ➞ high performance overhead ➞ not suitable for real-time
  • TrillDSP = relational processing + signal processing
    • Unified query model for relational and signal data, for both real-time and offline
    • Gives users the view they are comfortable with
    • Avoids impedance mismatch between components
    • Up to 2 OOM faster than systems integrated with R