Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions


  1. Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions Michael Geilke, Andreas Karwath, and Stefan Kramer Johannes Gutenberg University Mainz, Germany September 22, 2016

  2. [Figure label: Smart]

  3. [Figure label: Smart]

  4. [Figure label: Smart]

  5. [Figure label: Smart]
     • 1000 sensors
     • 5 measurements per second
     • 5 years
     • more than 2 billion measurements
     • about 2 GB of data

  6. • energy supplier
     • 1 million households
     • about 2 PB of data
     • constant update of patterns

  7. [Diagram labels: Smart, EDDO, f]

  8. [Diagram labels: Smart, EDDO, f, Inference, F]

  9. [Diagram labels: Smart, F, Query1, Query2, Query3, Knowledge]
     Query: Return the probability distribution for sensors in the living room on weekdays.

  10. Weaknesses of EDDO [Diagram labels: EDDO, f, Inference, F]
      $f(X_1, \ldots, X_n)$

  11. Weaknesses of EDDO [Diagram labels: EDDO, f, Inference, F]
      $f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \ldots, X_{i-1})$

  12. Weaknesses of EDDO [Diagram labels: EDDO, f, Inference, F]
      $f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \ldots, X_{i-1})$

  13. Weaknesses of EDDO [Diagram labels: EDDO, f, Inference, F]
      $f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \ldots, X_{i-1})$
      only for discrete random variables
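
The chain-rule factorization on the slides above can be illustrated for discrete variables with a small sketch. It is only a count-based toy that estimates each conditional by relative frequencies; the slides do not say how EDDO models the conditionals, and the class name `ChainRuleEstimator` and the example values are hypothetical.

```python
from collections import defaultdict


class ChainRuleEstimator:
    """Toy online estimate of f(X_1) * prod_{i>=2} f(X_i | X_1, ..., X_{i-1})."""

    def __init__(self, n_vars):
        self.n_vars = n_vars
        # counts[i][prefix][value]: how often X_i = value was seen after (x_1..x_{i-1}) = prefix
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(n_vars)]

    def update(self, instance):
        """Consume one instance from the stream (a tuple of discrete values)."""
        for i in range(self.n_vars):
            prefix = tuple(instance[:i])
            self.counts[i][prefix][instance[i]] += 1

    def probability(self, instance):
        """Joint probability as a product of conditional relative frequencies."""
        p = 1.0
        for i in range(self.n_vars):
            prefix = tuple(instance[:i])
            seen = self.counts[i].get(prefix)
            if not seen:
                return 0.0  # this prefix was never observed
            p *= seen.get(instance[i], 0) / sum(seen.values())
        return p


# Hypothetical stream of (heating, temperature) readings.
stream = [("on", "warm"), ("on", "warm"), ("on", "cold"), ("off", "cold")]
est = ChainRuleEstimator(n_vars=2)
for inst in stream:
    est.update(inst)

p_on_warm = est.probability(("on", "warm"))  # P(on) * P(warm | on) = 3/4 * 2/3
```

The conditioning prefixes are stored exactly here, so the table grows quickly with the number of variables; that is fine for a two-variable illustration but not for a real stream.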

  14. Goals
      A density estimator that
      • estimates joint densities from data streams,
      • is able to deal with heterogeneous data, and
      • works for higher-dimensional data.
      For density estimation, 100 variables is high dimensional.

  15. Main Idea
      [Diagram: $X_1 \times X_2 \times \cdots \times X_n \ni x$; $\mathbb{R}^n \ni x$; mapping $h_L$; $v \in \mathbb{R}^m$; densities $f$ and $g$]

  16. Main Idea
      [Diagram: $X_1 \times X_2 \times \cdots \times X_n \ni x$; $\mathbb{R}^n \ni x$; mapping $h_L$; $v \in \mathbb{R}^m$; densities $f$ and $g$]

  17. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, instance]
      instance $= (\text{temperature} = 20, \text{humidity} = 50, \ldots) \in \mathbb{R}^n$

  18. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, instance]
      $\mathbb{R}^n \ni x = (x_1, x_2, \ldots, x_n)$, $v = (v_1, v_2, v_3, v_4) \in \mathbb{R}^m$

  19. Online Density Estimation using Representatives (RED)
      $v = (v_1, v_2, v_3, v_4)$, $I = \{x \in \mathbb{R}^n : h_L(x) = v\}$
      [Diagram: relation between $f(x)$ for $x \in I$ (in $\mathbb{R}^n$) and $g(v)$ (in $\mathbb{R}^m$)]
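
A minimal sketch of the mapping from an instance to its landmark-based vector, assuming (as the landmark and distance slides suggest, but do not state outright) that each component is the distance from the instance to one landmark. The function name `h`, the landmark coordinates, and the choice of Euclidean distance are illustrative assumptions.

```python
import math


def h(landmarks, x, dist=math.dist):
    """Map an instance x to the vector of its distances to the landmarks."""
    return tuple(dist(x, lm) for lm in landmarks)


# Hypothetical landmarks in a (temperature, humidity) space; values are made up.
landmarks = [(0.0, 0.0), (20.0, 0.0), (0.0, 50.0), (20.0, 50.0)]

x = (20.0, 50.0)      # instance: temperature = 20, humidity = 50
v = h(landmarks, x)   # one distance per landmark, so len(v) == len(landmarks)
```

Passing a different `dist` callable (for example a Mahalanobis distance) changes the geometry of the mapping without changing the rest of the sketch.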

  20. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, representative, instance]
      instance $= (\text{temperature} = 10, \text{humidity} = 80, \ldots) \in \mathbb{R}^n$

  21. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, representative, instance]
      $x \in \mathbb{R}^n$, $h_L(x) = (v_1, v_2, v_3, v_4)$

  22. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, representative, instance]
      Mahalanobis distance: $\sqrt{(x - v)^{T} \Sigma^{-1} (x - v)}$
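
The Mahalanobis distance named on this slide can be computed without inverting the covariance matrix explicitly. The sketch below uses the standard definition with a square root and a small hand-written linear solver to stay dependency-free; the function names and the example covariance are assumptions for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def mahalanobis(x, v, cov):
    """sqrt((x - v)^T cov^{-1} (x - v)) for a non-singular covariance matrix cov."""
    diff = [xi - vi for xi, vi in zip(x, v)]
    z = solve(cov, diff)  # z = cov^{-1} (x - v)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


# Hypothetical covariance: variance 4 on the first axis, 1 on the second.
cov = [[4.0, 0.0], [0.0, 1.0]]
d = mahalanobis([2.0, 1.0], [0.0, 0.0], cov)  # sqrt(2**2 / 4 + 1**2 / 1)
```

With the identity matrix as `cov`, the function reduces to the Euclidean distance.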

  23. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, representative, instance]
      $v = (v_1, v_2, v_3, v_4)$

  24. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, representative, instance]
      $v = (v_1, v_2, v_3, v_4)$

  25. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, representative, instance]
      $v = (v_1, v_2, v_3, v_4)$
      $f(V_1) \cdot \prod_{i=2}^{m} f(V_i \mid V_1, \ldots, V_{i-1})$

  26. Online Density Estimation using Representatives (RED)
      $h_L(x) = h_L(y)$ but $x \neq y$
      [Diagram: $X_1 \times X_2 \times \cdots \times X_n \ni x$; $\mathbb{R}^n \ni x$; mapping $h_L$; $v \in \mathbb{R}^m$; densities $f$ and $g$]

  27. Online Density Estimation using Representatives (RED)
      $h_L(x) = h_L(y)$ but $x \neq y$
      [Diagram: $X_1 \times X_2 \times \cdots \times X_n \ni x$; $\mathbb{R}^n \ni x$; mapping $h_L$; $v \in \mathbb{R}^m$; densities $f$ and $g$; relation between $f(x)$ for $x \in I$ and $g(v)$]
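
The two slides above point out that the mapping can send two different instances to the same vector. A tiny sketch of that effect, again assuming the mapping is "distances to the landmarks" under the Euclidean norm; the landmark positions and the name `h` are hypothetical.

```python
import math


def h(landmarks, x):
    """Vector of Euclidean distances from x to each landmark."""
    return tuple(math.dist(x, lm) for lm in landmarks)


# Two landmarks on the first axis of a 2-D space.
landmarks = [(0.0, 0.0), (2.0, 0.0)]

x = (1.0, 1.0)
y = (1.0, -1.0)  # mirror image of x across the line through both landmarks

collision = h(landmarks, x) == h(landmarks, y) and x != y
```

Here `collision` is true: both points have the same distance to every landmark, so distances to these two landmarks alone cannot tell them apart.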

  28. Choice of Landmarks
      Main idea:
      • theoretical foundation
      • landmarks are orthogonal to each other
      • if $L = d + 1$, then consistent estimator
      • back translation by system of linear equations
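
The slide mentions back translation by a system of linear equations without giving the construction. One standard way to do this, assuming Euclidean distances and d + 1 landmarks in general position, is to subtract the squared-distance equation of the first landmark from the others, which leaves d linear equations in the unknown point. The sketch below follows that idea; the landmark choice (origin plus unit vectors) and all function names are assumptions, not the authors' method.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def back_translate(landmarks, v):
    """Recover x from its Euclidean distances v to d + 1 landmarks.

    For k >= 1:  2 (l_k - l_0) . x = |l_k|^2 - |l_0|^2 - v_k^2 + v_0^2
    """
    l0, v0 = landmarks[0], v[0]
    sq0 = sum(c * c for c in l0)
    a, b = [], []
    for lk, vk in zip(landmarks[1:], v[1:]):
        a.append([2.0 * (lkc - l0c) for lkc, l0c in zip(lk, l0)])
        b.append(sum(c * c for c in lk) - sq0 - vk * vk + v0 * v0)
    return solve(a, b)


# d = 3: the origin plus three mutually orthogonal unit vectors (illustrative choice).
landmarks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
x = (0.3, -1.2, 2.0)
v = [math.dist(x, lm) for lm in landmarks]  # forward mapping
x_rec = back_translate(landmarks, v)        # should reproduce x up to rounding
```

With fewer than d + 1 landmarks the linear system is underdetermined, which matches the collision case where different instances share one distance vector.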

  29. Evaluation: Parameter Setting
      Datasets:
      • Synthetic: Gaussian mixtures
      • Real-World: Covertype, Electricity, Letter, Shuttle
      Parameters:
      • $\theta_{C \to R} = 100$
      • Euclidean norm
      • $L \in \{2, 3, 5, 10, 20\}$
      • $M \in \{0.1, 0.5, 1.0, 2.0, 5.0, 10.0\}$

  30. Evaluation: $L$ [Plot]

  31. Evaluation: Mahalanobis (1 Gaussian) [Plot]

  32. Evaluation: Mahalanobis (10 Gaussians) [Plot]

  33. Evaluation: Parameter Setting
      • $L$ depends on the dimensionality of the data
      • a small $M$ partitions the space better
      • but at some point there are too few instances per region

  34. Evaluation: Performance
      Datasets:
      • Synthetic: Gaussian mixtures
      • Real-World: Covertype, Electricity, Letter, Shuttle
      oKDE:
      • online Kernel Density Estimator
      • for multivariate densities
      • for continuous variables
      • by Kristan et al. (2011)

  35. [Plot panels: electricity (9 attributes), shuttle (11 attributes), letter (17 attributes), covertype (54 attributes)]

  36. Conclusions
      • online density estimation in higher dimensions
      • heterogeneous data streams
      • theoretical foundation
      • comparable to the state of the art
      Future Work:
      • new strategies for landmark selection
      • outlier detection
      • detection of emerging trends

  37. Thank you for your attention

  38. Online Density Estimation using Representatives (RED)

  39. Online Density Estimation using Representatives (RED)
      $h_L(x) = h_L(y)$ but $x \neq y$
      [Diagram: $X_1 \times X_2 \times \cdots \times X_n \ni x$; $\mathbb{R}^n \ni x$; mapping $h_L$; $v \in \mathbb{R}^m$; densities $f$ and $g$]
      [Formula, only partially recoverable: integrals with lower limit $-\infty$ over $dx_{i+1} \, dx_{i+2} \cdots dx_n$ of $\prod_{j=i}^{p} \mathrm{corr}_j(x_j \mid v_1, \ldots, v_p)$]

  40. Online Density Estimation using Representatives (RED)
      [Figure labels: landmark, instance]
      $v = (v_1, v_2, v_3, v_4)$, $I = \{x \in \mathbb{R}^n : h_L(x) = v\}$
