SLIDE 1

Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions

Michael Geilke, Andreas Karwath, and Stefan Kramer

Johannes Gutenberg University Mainz, Germany

September 22, 2016

SLIDE 2

Smart

SLIDE 3

Smart

SLIDE 4

Smart

SLIDE 5

Smart

  • 1000 sensors
  • 5 measurements per second
  • 5 years

→ more than 2 billion measurements
→ about 2 GB of data
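As a quick check of the first bullet, the measurement count follows from multiplying the number of sensors, the sampling rate, and the elapsed seconds. This minimal sketch assumes 365-day years and ignores leap days; the helper name is illustrative.

```python
# Seconds in a 365-day year (leap days ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def measurement_count(sensors: int, rate_hz: int, years: int) -> int:
    """Total number of measurements produced by all sensors together."""
    return sensors * rate_hz * years * SECONDS_PER_YEAR


n = measurement_count(sensors=1000, rate_hz=5, years=5)
print(f"{n:.3e} measurements")  # 7.884e+11 measurements, well above 2 billion
```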

SLIDE 6

  • energy supplier
  • 1 million households

→ about 2 PB of data
→ constant update of patterns
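Scaling from one installation to the supplier's customer base is a single multiplication. The sketch assumes that the roughly 2 GB figure from the previous slide applies to each household and uses decimal (SI) units, where 1 PB = 10^6 GB; both are assumptions, not statements from the slide.

```python
GB = 10**9   # decimal (SI) gigabyte in bytes
PB = 10**15  # decimal (SI) petabyte in bytes


def fleet_storage_bytes(per_household_bytes: int, households: int) -> int:
    """Total raw storage if every household produces the same data volume."""
    return per_household_bytes * households


total = fleet_storage_bytes(2 * GB, 1_000_000)
print(total / PB)  # 2.0
```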

SLIDE 7

Smart

[Diagram labels: f, EDDO]

SLIDE 8

[Diagram labels: f, EDDO, F, Inference]

Smart

SLIDE 9

Smart

[Diagram labels: F, Query1, Query2, Query3, Knowledge]

Query: Return the probability distribution for sensors in the living room during weekdays.
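A query like this can be answered from a joint distribution by conditioning: keep only the entries that match the evidence (room and day type) and renormalize. The sketch uses a small, made-up discrete joint over hypothetical variables `room`, `day_type`, and `temperature_level`; it illustrates the inference step in general, not how EDDO stores or queries its estimate.

```python
from collections import defaultdict

# Hypothetical joint distribution P(room, day_type, temperature_level).
joint = {
    ("living room", "weekday", "low"): 0.10,
    ("living room", "weekday", "high"): 0.20,
    ("living room", "weekend", "low"): 0.05,
    ("living room", "weekend", "high"): 0.15,
    ("kitchen", "weekday", "low"): 0.25,
    ("kitchen", "weekday", "high"): 0.05,
    ("kitchen", "weekend", "low"): 0.10,
    ("kitchen", "weekend", "high"): 0.10,
}


def condition(joint, room, day_type):
    """P(temperature_level | room, day_type): filter the joint, then renormalize."""
    scores = defaultdict(float)
    for (r, d, level), p in joint.items():
        if r == room and d == day_type:
            scores[level] += p
    total = sum(scores.values())
    return {level: p / total for level, p in scores.items()}


print(condition(joint, "living room", "weekday"))
```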

SLIDE 10

$f(X_1, \dots, X_n)$

[Diagram labels: f, EDDO, F, Inference]

Weaknesses of EDDO

SLIDE 11

$f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \dots, X_{i-1})$

[Diagram labels: f, EDDO, F, Inference]

Weaknesses of EDDO
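The factorization on this slide is the chain rule of probability. For discrete variables, which is the setting EDDO is limited to according to slide 13, it can be checked numerically on a small, made-up joint table; the table and the helper names are purely illustrative.

```python
from itertools import product

# Hypothetical joint distribution over three binary variables X1, X2, X3.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}


def marginal(prefix):
    """P(X_1..X_k = prefix): sum the joint over all completions of the prefix."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)


def conditional(i, x):
    """P(X_i = x_i | X_1..X_{i-1} = x_1..x_{i-1}), with 1-based index i."""
    return marginal(x[:i]) / marginal(x[:i - 1])


def chain_rule(x):
    """f(x_1) * prod_{i=2}^{n} f(x_i | x_1, ..., x_{i-1})."""
    result = marginal(x[:1])
    for i in range(2, len(x) + 1):
        result *= conditional(i, x)
    return result


# The factorized form reproduces every entry of the joint table.
for x in product((0, 1), repeat=3):
    assert abs(chain_rule(x) - joint[x]) < 1e-12
```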

SLIDE 12

$f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \dots, X_{i-1})$

[Diagram labels: f, EDDO, F, Inference]

Weaknesses of EDDO

SLIDE 13

$f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \dots, X_{i-1})$

[Diagram labels: f, EDDO, F, Inference]

Weaknesses of EDDO

  • only for discrete random variables

SLIDE 14

Goals

A density estimator that

  • estimates joint densities from data streams
  • is able to deal with heterogeneous data, and
  • works for higher-dimensional data.

For density estimation, 100 variables already count as high-dimensional.

SLIDE 15

Main Idea

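[Diagram: $x \in X_1 \times X_2 \times \dots \times X_n$ with density $f$ is represented as $x \in \mathbb{R}^n$, which $g_L$ maps to a representative $v \in \mathbb{R}^m$ with density $g$]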

SLIDE 16

Main Idea

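[Diagram: $x \in X_1 \times X_2 \times \dots \times X_n$ with density $f$ is represented as $x \in \mathbb{R}^n$, which $g_L$ maps to a representative $v \in \mathbb{R}^m$ with density $g$]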

SLIDE 17

Online Density Estimation using Representatives (RED)

landmark instance

$(\text{temperature} = 20, \text{humidity} = 50, \dots) \in \mathbb{R}^n$

SLIDE 18

Online Density Estimation using Representatives (RED)

$x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$, $v = (v_1, v_2, v_3, v_4) \in \mathbb{R}^m$

landmark instance
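One reading of this figure, suggested by the landmark and distance vocabulary on the surrounding slides but not stated outright, is that each component $v_j$ of the representative is the distance from the instance $x$ to the $j$-th landmark. A minimal sketch under that assumption, using Euclidean distance and made-up landmark positions; `representative` is a hypothetical stand-in for $g_L$.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def representative(x, landmarks, dist=euclidean):
    """Hypothetical g_L: map x in R^n to (d(x, l_1), ..., d(x, l_m)) in R^m."""
    return tuple(dist(x, l) for l in landmarks)


# An instance with n = 2 attributes (temperature, humidity) and m = 4
# made-up landmarks, so the representative v = (v1, v2, v3, v4) lies in R^4.
landmarks = [(0.0, 0.0), (30.0, 0.0), (0.0, 100.0), (30.0, 100.0)]
x = (20.0, 50.0)
v = representative(x, landmarks)
print(v)
```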

SLIDE 19

Online Density Estimation using Representatives (RED)

$I = \{x \in \mathbb{R}^n \mid g_L(x) = v = (v_1, v_2, v_3, v_4)\}$

$g(v) = \sum_{x \in I} f(x)$

[Diagram: $\mathbb{R}^n$ mapped to $\mathbb{R}^m$]
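The relation $g(v) = \sum_{x \in I} f(x)$ says that a representative $v$ receives the total probability of every instance mapped onto it. The sketch illustrates this for a discrete stand-in of $f$ on a small grid and again assumes, hypothetically, that $g_L$ returns distances to landmarks; with a single landmark at the origin, all grid points at the same distance collapse to one representative.

```python
import math
from collections import defaultdict
from fractions import Fraction


def g_L(x, landmarks):
    """Hypothetical representative map: distances from x to each landmark.

    Distances are rounded so that equal distances give identical dict keys.
    """
    return tuple(round(math.dist(x, l), 9) for l in landmarks)


def pushforward(f, landmarks):
    """g(v) = sum of f(x) over I = {x : g_L(x) = v}, for a discrete f."""
    g = defaultdict(Fraction)
    for x, p in f.items():
        g[g_L(x, landmarks)] += p
    return dict(g)


grid = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
f = {x: Fraction(1, 9) for x in grid}   # uniform distribution on a 3x3 grid
g = pushforward(f, landmarks=[(0, 0)])  # a single landmark at the origin
for v, p in sorted(g.items()):
    print(v, p)
```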

SLIDE 20

Online Density Estimation using Representatives (RED)

landmark representative instance

$(\text{temperature} = 10, \text{humidity} = 80, \dots) \in \mathbb{R}^n$

SLIDE 21

Online Density Estimation using Representatives (RED)

landmark representative instance

$x \in \mathbb{R}^n$, $g_L(x) = (v_1, v_2, v_3, v_4)$

SLIDE 22

Online Density Estimation using Representatives (RED)

landmark representative instance

Mahalanobis distance: $\sqrt{(x - v)^\top \Sigma^{-1} (x - v)}$
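The Mahalanobis distance can be evaluated without forming $\Sigma^{-1}$ explicitly: solve the linear system $\Sigma y = x - v$ and take $\sqrt{(x - v)^\top y}$. This dependency-free sketch assumes the caller supplies a symmetric positive-definite covariance matrix; how $\Sigma$ is estimated on the stream is not specified on the slide.

```python
import math


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - s) / M[r][r]
    return y


def mahalanobis(x, v, cov):
    """sqrt((x - v)^T Sigma^{-1} (x - v)), computed via a linear solve."""
    diff = [xi - vi for xi, vi in zip(x, v)]
    y = solve(cov, diff)  # y = Sigma^{-1} (x - v)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


# With the identity covariance the result equals the Euclidean distance.
print(mahalanobis((3.0, 4.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]))  # 5.0
```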

SLIDE 23

Online Density Estimation using Representatives (RED)

landmark representative instance

$v = (v_1, v_2, v_3, v_4)$

SLIDE 24

Online Density Estimation using Representatives (RED)

landmark representative instance

$v = (v_1, v_2, v_3, v_4)$

SLIDE 25

Online Density Estimation using Representatives (RED)

landmark representative instance

$f(V_1) \cdot \prod_{i=2}^{m} f(V_i \mid V_1, \dots, V_{i-1})$

$v = (v_1, v_2, v_3, v_4)$

SLIDE 26

Online Density Estimation using Representatives (RED)

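[Diagram: $x \in X_1 \times X_2 \times \dots \times X_n$ with density $f$ is represented as $x \in \mathbb{R}^n$, which $g_L$ maps to a representative $v \in \mathbb{R}^m$ with density $g$]

$g_L(x) = g_L(y)$ but $x \neq y$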
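The point that $g_L(x) = g_L(y)$ is possible for $x \neq y$ is easy to reproduce if, as assumed in this sketch, $g_L$ returns Euclidean distances to landmarks: in $\mathbb{R}^2$, two landmarks cannot tell a point from its mirror image across the line through them. The landmark positions are made up for illustration.

```python
import math


def g_L(x, landmarks):
    """Hypothetical representative map: Euclidean distances to the landmarks."""
    return tuple(math.dist(x, l) for l in landmarks)


# Two landmarks on the horizontal axis: a point and its mirror image
# across that axis are equidistant from both landmarks.
two = [(0.0, 0.0), (4.0, 0.0)]
x, y = (1.0, 2.0), (1.0, -2.0)
print(x != y, g_L(x, two) == g_L(y, two))  # True True

# A third landmark off that axis (three non-collinear landmarks in R^2)
# gives the two points different representatives.
three = two + [(0.0, 4.0)]
print(g_L(x, three) == g_L(y, three))  # False
```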

SLIDE 27

Online Density Estimation using Representatives (RED)

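[Diagram: $x \in X_1 \times X_2 \times \dots \times X_n$ with density $f$ is represented as $x \in \mathbb{R}^n$, which $g_L$ maps to a representative $v \in \mathbb{R}^m$ with density $g$]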

$g(v) = \sum_{x \in I} f(x)$

$g_L(x) = g_L(y)$ but $x \neq y$

SLIDE 28

Choice of Landmarks

Main idea:

  • theoretical foundation
  • landmarks are orthogonal to each other
  • if L = d + 1, the estimator is consistent
  • back-translation by solving a system of linear equations
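The last bullet can be illustrated under the same distance assumption as before. Expanding $\lVert x - l_j \rVert^2 = r_j^2$ and subtracting the equation for $l_0$ removes the quadratic term $\lVert x \rVert^2$, leaving $d$ linear equations $2 (l_j - l_0)^\top x = \lVert l_j \rVert^2 - \lVert l_0 \rVert^2 - r_j^2 + r_0^2$ for $j = 1, \dots, d$. With $d + 1$ landmarks whose offsets $l_j - l_0$ are linearly independent (mutually orthogonal offsets are one such choice), the system has a unique solution. This is a sketch of that classic trilateration step, not necessarily the exact procedure used in RED; `back_translate` is a hypothetical name.

```python
import math


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("landmarks are affinely dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def back_translate(distances, landmarks):
    """Recover x in R^d from its distances r_j to d + 1 landmarks l_j.

    Uses 2 (l_j - l_0) . x = ||l_j||^2 - ||l_0||^2 - r_j^2 + r_0^2, j = 1..d.
    """
    l0, r0 = landmarks[0], distances[0]
    A, b = [], []
    for lj, rj in zip(landmarks[1:], distances[1:]):
        A.append([2.0 * (a - c) for a, c in zip(lj, l0)])
        b.append(sum(a * a for a in lj) - sum(c * c for c in l0) - rj**2 + r0**2)
    return solve(A, b)


# Landmarks whose offsets from l_0 are mutually orthogonal (d = 2, so 3 landmarks).
landmarks = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
x = (1.0, 2.0)
r = [math.dist(x, l) for l in landmarks]
print(back_translate(r, landmarks))  # close to [1.0, 2.0]
```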

SLIDE 29

Evaluation: Parameter Setting

Parameters:

  • $\theta_{C \to R} = 100$
  • Euclidean norm
  • $L \in \{2, 3, 5, 10, 20\}$
  • $M \in \{0.1, 0.5, 1.0, 2.0, 5.0, 10.0\}$

Datasets:
  • Synthetic: Gaussian mixtures
  • Real-world: Covertype, Electricity, Letter, Shuttle

SLIDE 30

Evaluation: L

SLIDE 31

Evaluation: Mahalanobis (1 Gaussian)

SLIDE 32

Evaluation: Mahalanobis (10 Gaussians)

SLIDE 33

Evaluation: Parameter Setting

  • L depends on the dimensionality of the data
  • small values of M partition the space better,
  • but at some point leave too few instances per region
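The slide does not define $M$ in detail. Purely as a generic illustration of the trade-off, assume $M$ acts like the side length of axis-aligned grid cells (a hypothetical reading): shrinking the cells refines the partition, but in 5 dimensions the average number of instances per occupied cell drops quickly toward one. The data are synthetic standard-normal samples with a fixed seed.

```python
import math
import random
from collections import Counter


def cell_counts(points, width):
    """Number of points in each axis-aligned grid cell of the given side length."""
    return Counter(tuple(math.floor(c / width) for c in p) for p in points)


rng = random.Random(0)
d, n_points = 5, 10_000
points = [tuple(rng.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n_points)]

# Halving the width refines the grid, so occupied cells can only increase.
for width in (2.0, 1.0, 0.5, 0.25):
    counts = cell_counts(points, width)
    avg = n_points / len(counts)  # average instances per occupied cell
    print(f"width={width}: {len(counts)} occupied cells, {avg:.1f} instances per cell")
```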

SLIDE 34

Evaluation: Performance

  • KDE:
      • online Kernel Density Estimator
      • for multivariate densities
      • for continuous variables
      • by Kristan et al. (2011)
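For readers unfamiliar with the baseline, the sketch shows the basic idea of a Gaussian kernel density estimator that is updated one sample at a time. It is deliberately naive: it keeps every sample, whereas the online KDE of Kristan et al. (2011) maintains a compressed mixture model, which is not reproduced here. The class name and the fixed bandwidth are assumptions for illustration.

```python
import math


class NaiveOnlineKDE:
    """Gaussian KDE with a fixed isotropic bandwidth that stores every sample.

    Illustrative only: no model compression, so memory grows with the stream.
    """

    def __init__(self, bandwidth: float):
        self.h = bandwidth
        self.samples = []

    def update(self, x):
        """Add one observation from the stream."""
        self.samples.append(tuple(x))

    def density(self, x):
        """Average of Gaussian kernels N(x; s, h^2 I) over all stored samples s."""
        d = len(x)
        norm = (2 * math.pi * self.h**2) ** (-d / 2)
        total = 0.0
        for s in self.samples:
            sq = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq / (2 * self.h**2))
        return norm * total / len(self.samples)


kde = NaiveOnlineKDE(bandwidth=0.5)
for point in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]:
    kde.update(point)
print(kde.density((0.0, 0.0)))
```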

Datasets:
  • Synthetic: Gaussian mixtures
  • Real-world: Covertype, Electricity, Letter, Shuttle

SLIDE 35

[Figure panels: electricity (9 attributes), shuttle (11 attributes), letter (17 attributes), covertype (54 attributes)]

SLIDE 36

Conclusions

  • online density estimation in higher dimensions
  • heterogeneous data streams
  • theoretical foundation
  • comparable to the state of the art

Future Work:

  • new strategies for landmark selection
  • outlier detection
  • detection of emerging trends

SLIDE 37

Thank you for your attention

SLIDE 38

Online Density Estimation using Representatives (RED)

SLIDE 39

Online Density Estimation using Representatives (RED)

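[Diagram as on slide 26: $x \in X_1 \times X_2 \times \dots \times X_n$, $x \in \mathbb{R}^n$, $g_L$, $v \in \mathbb{R}^m$, $g_L(x) = g_L(y)$ but $x \neq y$; annotated with a correction term "corr" built from integrals with lower limit $-\infty$ over $dx_{i+1}\, dx_{i+2} \cdots dx_n$ and a product over $j = i, \dots, p$ involving $x_j$ and $v_1, \dots, v_p$; the full formula is not recoverable]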

SLIDE 40

Online Density Estimation using Representatives (RED)

$I = \{x \in \mathbb{R}^n \mid g_L(x) = v = (v_1, v_2, v_3, v_4)\}$

landmark instance