Summary Extraction on Data Streams in Embedded Systems

SLIDE 1

Summary Extraction on Data Streams in Embedded Systems

Sebastian Buschjäger and Katharina Morik

TU Dortmund University - Computer Science - Artificial Intelligence Group

September 18, 2017

SLIDE 2

So... IoT hype?!

2016

Ericsson Maritime ICT connects over 350 cargo vessels on one freighter

SLIDE 3

So... IoT hype?!

2016

Daimler Trucks has deployed 400000 trucks with 400 sensors each

SLIDE 4

So... IoT hype?!

2016

Virgin Atlantic announces fleet of fully connected Boeing 787 machines and cargo

SLIDE 6

IoT means large autonomous systems

Common intuition:
- There will be more devices
- We will get more data
- Systems will become more autonomous

Question: What to do if something unexpected happens?

SLIDE 8

Goal: Monitor systems

Clear: Nobody can monitor all the sensor data on the fly.
But: To detect unexpected behavior we need to monitor all data.
Idea: Compute summaries on the fly while sensor data is generated.

SLIDE 10

Goal: Monitor systems

Then:
- A human expert can inspect the summaries
- Perform operations on the summary, etc.

Constraint: The approach should handle different data types and be theoretically sound.

SLIDE 12

Data summarization: Some theory

Intuition: Use a set function f to measure the expressiveness of a summary S.

Goal: $\max_{S \subseteq V,\ |S| \le k} f(S)$

Gain: Let $f : 2^V \to \mathbb{R}$, $e \in V$ and $S \subseteq V$. Then $\Delta_f(e \mid S) = f(S \cup \{e\}) - f(S)$.
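The gain definition can be illustrated with a small sketch. The coverage function and the `COVERS` table below are hypothetical toy choices made only for illustration; they are not the function used later in the talk.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

SetFunction = Callable[[FrozenSet[str]], float]


def marginal_gain(f: SetFunction, e: str, S: Iterable[str]) -> float:
    """Return Delta_f(e | S) = f(S u {e}) - f(S)."""
    S = frozenset(S)
    return f(S | {e}) - f(S)


# Hypothetical toy data: each item "covers" a set of system states.
COVERS: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {1, 2},
}


def coverage(S: FrozenSet[str]) -> float:
    """Toy submodular set function: number of distinct states covered by S."""
    covered: Set[int] = set()
    for item in S:
        covered |= COVERS[item]
    return float(len(covered))


print(marginal_gain(coverage, "b", set()))   # 2.0: "b" alone covers two states
print(marginal_gain(coverage, "b", {"a"}))   # 1.0: state 3 is already covered by "a"
```

The gain of the same item shrinks as the summary grows, which is the diminishing-returns behavior that submodularity formalizes.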

SLIDE 15

Summarization: Sieve-Streaming

Sieve-Streaming (Badanidiyuru et al. 2014):
- Items e arrive one at a time
- Immediately decide whether e should be added to the summary

Idea: Introduce a novelty threshold v. Add e if $\Delta_f(e \mid S) > v$.

Challenge: What is the "optimal" v?

SLIDE 17

Summarization: Sieve-Streaming

Idea: Manage multiple summaries i = 1, 2, 3, ... with multiple thresholds $v_i$ → "sieve" out unimportant elements.

By submodularity: $v_i \in [m, km]$ with $m = \max_{e \in V} f(\{e\})$. Then the solution is a $\left(\frac{1}{2} - \varepsilon\right)$-approximation.

Note: This is independent of f.
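A minimal sketch of the multi-sieve idea follows. Two details are assumptions not spelled out on the slides: the thresholds are placed on a geometric grid $(1+\varepsilon)^i$ inside $[m, km]$, and an item is accepted if its gain is at least $(v/2 - f(S))/(k - |S|)$, which is how the rule is understood here from Badanidiyuru et al. 2014. Both should be checked against the paper. The function `distinct_count` is a hypothetical toy utility.

```python
import math
from typing import Callable, Dict, Iterable, List

SetFunction = Callable[[List[int]], float]


def threshold_grid(m: float, k: int, eps: float) -> List[float]:
    """Thresholds (1 + eps)^i that fall into [m, k * m] (assumed grid, m > 0)."""
    base = math.log(1.0 + eps)
    first = math.ceil(math.log(m) / base)
    last = math.floor(math.log(k * m) / base)
    return [(1.0 + eps) ** i for i in range(first, last + 1)]


def sieve_streaming(stream: Iterable[int], f: SetFunction,
                    k: int, m: float, eps: float) -> List[int]:
    """Keep one candidate summary per threshold and return the best one."""
    sieves: Dict[float, List[int]] = {v: [] for v in threshold_grid(m, k, eps)}
    for e in stream:
        for v, S in sieves.items():
            if len(S) >= k:
                continue  # this sieve is full
            gain = f(S + [e]) - f(S)
            # Assumed acceptance rule, see the lead-in above.
            if gain >= (v / 2.0 - f(S)) / (k - len(S)):
                S.append(e)
    return max(sieves.values(), key=f, default=[])


def distinct_count(S: List[int]) -> float:
    """Hypothetical toy utility: number of distinct values in S (submodular)."""
    return float(len(set(S)))


summary = sieve_streaming([1, 1, 2, 2, 3, 3, 4], distinct_count, k=3, m=1.0, eps=0.1)
print(summary, distinct_count(summary))
```

Each item is seen once and every sieve decides immediately, so memory is bounded by the number of thresholds times k.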

SLIDE 19

Submodular maximization: The right function

Question: Which submodular function f captures summarization?

Informative Vector Machine (Herbrich et al. 2003 / Seeger 2004):

$f(S) = \frac{1}{2} \log\det\left(I + \sigma^{-2} \Sigma_S\right)$

where $I$ is the k × k identity matrix and $\Sigma_S$ is the kernel matrix $K = [k(e_i, e_j)]_{i,j}$ of the items in S.
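The IVM objective can be evaluated with a few lines of standard-library Python. This is a sketch only: the helper names are hypothetical, the kernel matrix is passed in precomputed, and the log-determinant is taken through a Cholesky factorization, which assumes the matrix $I + \sigma^{-2}\Sigma_S$ is symmetric positive definite (true for a valid kernel).

```python
import math
from typing import List

Matrix = List[List[float]]


def log_det_spd(A: Matrix) -> float:
    """log det(A) for a symmetric positive definite A via Cholesky, A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    # det(A) = prod(L_ii)^2, hence log det(A) = 2 * sum(log L_ii)
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def ivm_value(kernel_matrix: Matrix, sigma: float = 1.0) -> float:
    """f(S) = 1/2 * log det(I + sigma^-2 * Sigma_S)."""
    n = len(kernel_matrix)
    A = [[(1.0 if i == j else 0.0) + kernel_matrix[i][j] / sigma ** 2
          for j in range(n)] for i in range(n)]
    return 0.5 * log_det_spd(A)


redundant = [[1.0, 1.0], [1.0, 1.0]]  # two identical items
diverse = [[1.0, 0.0], [0.0, 1.0]]    # two completely dissimilar items
print(ivm_value(redundant))  # smaller value
print(ivm_value(diverse))    # larger value
```

A summary of dissimilar items scores higher than one of near-duplicates, which is why this function rewards diverse summaries.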

SLIDE 22

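IVM for data summarization

Since we know f, reduce the interval!

Note: Assume $k(e_i, e_i) = 1$.

Least-expressive summary: All off-diagonal elements are 1, so $\Sigma_S = \mathbf{1}\mathbf{1}^T$:

$f(S) = \frac{1}{2} \log\det\left(I + \sigma^{-2}\Sigma_S\right) = \frac{1}{2} \log\det\left(I + \sigma^{-2}\mathbf{1}\mathbf{1}^T\right) = \frac{1}{2} \log\left(1 + \sigma^{-2}\mathbf{1}^T\mathbf{1}\right) = \frac{1}{2} \log\left(1 + \sigma^{-2}k\right)$

The third equality uses the matrix determinant lemma, $\det(I + uv^T) = 1 + v^T u$.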

SLIDE 25

IVM for data summarization

Since we know f, reduce the interval!

Note: Assume $k(e_i, e_i) = 1$.

Most-expressive summary: All off-diagonal elements are 0, so $\Sigma_S = I$:

$f(S) = \frac{1}{2} \log\det\left(I + \sigma^{-2}\Sigma_S\right) = \frac{1}{2} \log\det\left(I(1 + \sigma^{-2})\right) = \frac{1}{2} \log\left((1 + \sigma^{-2})^k \det(I)\right) = \frac{k}{2} \log\left(1 + \sigma^{-2}\right)$

SLIDE 29

Sieve-Streaming enhancements

Result: Number of sieves reduced without performance loss.

Default: $v_i \in \left[\frac{1}{2}\log(1 + \sigma^{-2}),\ \frac{k}{2}\log(1 + \sigma^{-2})\right]$

Reduced: $v_i \in \left[\frac{1}{2}\log(1 + k\sigma^{-2}),\ \frac{k}{2}\log(1 + \sigma^{-2})\right]$

More improvements: Reopen sieves once full
- Sieves with a small threshold will quickly be full
- Save the summary, and reopen the sieve with a larger threshold
⇒ Increase utility value with the same number of sieves
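To get a feeling for how much the reduced interval saves, the number of thresholds in each interval can be counted. The sketch assumes that thresholds sit on a geometric grid $(1+\varepsilon)^i$, as in Sieve-Streaming; the grid and the function names are assumptions for illustration, and the parameter values are taken from the experiment settings (k = 24, σ = 1, ε = 0.1).

```python
import math
from typing import Tuple


def count_thresholds(lo: float, hi: float, eps: float) -> int:
    """Number of integers i with lo <= (1 + eps)^i <= hi (assumed grid)."""
    base = math.log(1.0 + eps)
    first = math.ceil(math.log(lo) / base)
    last = math.floor(math.log(hi) / base)
    return max(0, last - first + 1)


def sieve_counts(k: int, sigma: float, eps: float) -> Tuple[int, int]:
    """Return (default, reduced) sieve counts for the two intervals."""
    inv_var = sigma ** -2
    upper = k / 2.0 * math.log(1.0 + inv_var)
    default_lower = 0.5 * math.log(1.0 + inv_var)
    reduced_lower = 0.5 * math.log(1.0 + k * inv_var)
    return (count_thresholds(default_lower, upper, eps),
            count_thresholds(reduced_lower, upper, eps))


default_n, reduced_n = sieve_counts(k=24, sigma=1.0, eps=0.1)
print(default_n, reduced_n)
```

Because the upper end is unchanged and only the lower end moves up, every dropped sieve is one with a small threshold.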

SLIDE 32

Experiments: Questions

Question 1: Are summaries with IVM really expressive?
→ Summaries should contain the "hidden" states of the data
→ Extract a summary of a classification task
→ Then each class represents one "hidden" state

Question 2: How well do the enhancements perform compared to the default?

SLIDE 33

Experiments: Data

Synthetic data: GMM with 4 dimensions and 4 classes.
Use K = 10, ..., 24, ε = 0.1, σ = 1, $k(e_i, e_j) = \exp\left(-\frac{\|e_i - e_j\|_2^2}{10}\right)$

UJIndoor Location: Predict (semantic) location, e.g. room number, based on GPS.
Use K = 80, ..., 130, ε = 0.1, σ = 1, $k(e_i, e_j) = \exp\left(-\frac{\|e_i - e_j\|_2^2}{0.005}\right)$

MNIST: Handwritten digit recognition task.
Use K = 8, ..., 16, ε = 0.1, σ = 1, $k(e_i, e_j) = \exp\left(-\frac{\|e_i - e_j\|_2^2}{784}\right)$
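The kernels on this slide can be sketched as a single function with a dataset-specific scale. Reading the kernel as $\exp(-\|e_i - e_j\|_2^2 / c)$ with c = 10, 0.005 and 784 is an assumption based on the slide layout; the function names and the sample points are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], scale: float) -> float:
    """k(x, y) = exp(-||x - y||_2^2 / scale); scale is the assumed denominator c."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / scale)


def kernel_matrix(items: Sequence[Sequence[float]], scale: float) -> List[List[float]]:
    """Kernel matrix [k(e_i, e_j)]_{i,j} for the items of a summary."""
    return [[rbf_kernel(x, y, scale) for y in items] for x in items]


# Hypothetical 4-dimensional points, using the scale of the synthetic data set.
points = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
]
for row in kernel_matrix(points, scale=10.0):
    print([round(value, 4) for value in row])
```

With this kernel every diagonal entry equals 1, which matches the assumption $k(e_i, e_i) = 1$ used when deriving the threshold interval.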

SLIDE 34

Experiments: Results

[Figure: 3 × 3 grid of plots. Rows: recall, utility and runtime. Columns: synthetic data, UJIndoorLoc and MNIST. The x-axis ticks (10 to 24, 80 to 130, 8 to 16) match the summary sizes K of the three data sets. Compared variants: Pure, Reduced, Dynamic.]

SLIDE 37

Outlook

Question 1: Are summaries with IVM really expressive?
Yes! At least we have reasonable recall.

Question 2: How well do the enhancements perform compared to vanilla?
Quite well! Computation decreased, and utility and recall increased.

Next to come:
- Better kernel functions?
- Streaming with concept drift? ⇒ Maybe "forget" items?
- Use summaries for model learning?
