
Creating Probabilistic Databases from Imprecise Time-Series Data
Saket Sathe, Hoyoung Jeung, Karl Aberer
EPFL, Switzerland
13th April, 2011

Outline

[Figure: a raw_values table (time, x, y) and a probability distribution p(R) showing Alice's position (marked "?") are turned into a prob_view table (time, room, probability). Panels for time = 1 and time = 2 show rooms 1 to 4 on x/y axes, with the 3σ area around μ taken as a reasonable boundary; the probability of room 4 is the integral of p(R) dR over room 4 ∩ 3σ area.]

raw_values
time  x    y
1     1.1  2.3
2     1.3  2.1
:     :    :

prob_view
time  room  probability
1     1     0.5
1     2     0.1
1     3     0.3
1     4     0.1
2     1     0.2
2     2     0.4
2     3     0.1
2     4     0.3

Dynamic Density Metrics
Approximating Gaussian distributions using σ-cache
Measure of Quality
Parameter setting under provable guarantees
Efficiently creating probabilistic views
Experiments

Problem Setting

[Figure: time series of values against time, with axis ticks t-H-1, t-1 and t; the window $S^H_{t-1}$ ends at value $r_{t-1}$, and the density $p_t(R_t)$ is drawn at time t around $r_t$.]

Dynamic Density Metric: Given $S^H_{t-1}$, the dynamic density metric infers time-dependent probability distributions $p_t(R_t)$ at time $t$, where $R_t$ is a random variable associated with $r_t$.

GARCH Metric

[Figure: time series of values against time, with axis ticks t-H-1, t-1 and t and the window $S^H_{t-1}$; at time t the inferred density $p_t(R_t) \sim N(\hat{r}_t, \hat{\sigma}_t^2)$ is drawn around $\hat{r}_t$, with $p_t(R_t = r_t)$ and $\hat{p}_t(R_t = r_t)$ marked at the observed value $r_t$.]

$\hat{r}_t$ is modeled using an ARMA model.
$\hat{\sigma}_t^2$ is modeled using a GARCH model.
Thus $p_t(R_t)$ is $N(\hat{r}_t, \hat{\sigma}_t^2)$. We refer to this approach as ARMA-GARCH.

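As a rough illustration of the ARMA-GARCH idea, the sketch below produces a one-step-ahead Gaussian density from a sliding window. It is only a sketch under stated assumptions: an AR(1) model around the window mean stands in for a full ARMA model, the coefficients phi, omega, alpha and beta are picked by hand rather than estimated, and the function name arma_garch_density is hypothetical.

```python
from statistics import fmean

def arma_garch_density(window, phi=0.9, omega=0.05, alpha=0.1, beta=0.8):
    """Return (r_hat, sigma2_hat) for the time step after the window.

    Assumptions (not from the slides): AR(1) around the window mean as a
    stand-in for ARMA, and hand-picked GARCH(1,1) coefficients.
    The window must contain at least two values.
    """
    mu = fmean(window)
    # Residuals a_i of the AR(1) one-step forecasts inside the window.
    residuals = [
        window[i] - (mu + phi * (window[i - 1] - mu))
        for i in range(1, len(window))
    ]
    # Start the variance recursion from the mean squared residual.
    sigma2 = fmean(a * a for a in residuals)
    # GARCH(1,1): sigma2_i = omega + alpha * a_{i-1}^2 + beta * sigma2_{i-1}
    for a in residuals:
        sigma2 = omega + alpha * a * a + beta * sigma2
    # Mean forecast for the next time step.
    r_hat = mu + phi * (window[-1] - mu)
    return r_hat, sigma2

r_hat, sigma2_hat = arma_garch_density([4.2, 5.9, 7.1, 7.9])
```

The returned pair parameterizes the Gaussian $N(\hat{r}_t, \hat{\sigma}_t^2)$; in practice both models would be fitted to the window rather than given fixed coefficients.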
Quality of Dynamic Density Metrics

Metric                      $\hat{r}_t$      $\hat{\sigma}_t^2$
ARMA-GARCH                  ARMA             GARCH
Uniform Thresholding (UT)   ARMA             u (user-specified)
Variable Thresholding (VT)  ARMA             sample variance of $S^H_{t-1}$
Kalman-GARCH                Kalman Filter    GARCH

Problem: The true density $\hat{p}_t(R_t)$ is not observable.

Quality of Dynamic Density Metrics: Indirect Method

Suppose $p_1(R_1), \ldots, p_T(R_T)$ are the inferred densities and let $z_t = P(R_t \le r_t)$. Then $z_t$ is uniformly distributed on $(0, 1)$ when $p_t(R_t) = \hat{p}_t(R_t)$ [Diebold et al.].

$$ d\{U_Z(z), Q_Z(z)\} = \sqrt{\sum_{x=0}^{1} \left( U_Z(x) - Q_Z(x) \right)^2} \qquad (1) $$

where $U_Z(z)$ is the ideal uniform cdf on $(0, 1)$ and $Q_Z(z)$ is the observed cdf of $z_t$. We call $d\{U_Z(z), Q_Z(z)\}$ the density distance.
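The indirect method can be sketched in a few lines. This is a minimal sketch, assuming Gaussian inferred densities and a sum over an evenly spaced grid on [0, 1]; the grid resolution `steps` and the function names are assumptions, not part of the slides.

```python
import math

def normal_cdf(x, mean, std):
    """Cdf of N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def density_distance(observations, means, stds, steps=100):
    """Distance between the uniform cdf and the observed cdf of z_t."""
    # z_t = P(R_t <= r_t) under the inferred density p_t(R_t).
    z = [normal_cdf(r, m, s) for r, m, s in zip(observations, means, stds)]
    total = 0.0
    for k in range(steps + 1):
        x = k / steps
        uniform_cdf = x                                  # U_Z(x)
        observed_cdf = sum(v <= x for v in z) / len(z)   # Q_Z(x)
        total += (uniform_cdf - observed_cdf) ** 2
    return math.sqrt(total)
```

A smaller distance indicates that the inferred densities are closer to the unobservable true densities; the absolute value depends on the chosen grid, so only distances computed with the same `steps` are comparable.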

Probabilistic View Generation

    CREATE VIEW prob_view AS DENSITY r OVER t
    OMEGA delta=2, n=2
    FROM raw_values
    WHERE t >= 1 AND t <= 3

Framework

[Figure: a sensor feeds raw_values; the dynamic density metrics infer $p_t(R_t)$; a user issues the probabilistic view generation query; the Ω-View builder, backed by the σ-cache, produces prob_view. The figure also shows the example value r = 10.2 at t = 2.]

raw_values and inferred densities $p_t(R_t)$
t  $r_t$  $\hat{r}_t$  $\hat{\sigma}_t$
1  4.2    4.0          0.3
2  5.9    6.0          3.2
3  7.1    7.0          2.9
4  7.9    7.7          0.2

prob_view (columns Ω and Λ on the slide)
value  range        probability
r_1    ω_1 [2:4]    0.50
r_1    ω_2 [0:2]    0.01
r_2    ω_1 [4:6]    0.23
r_2    ω_2 [2:4]    0.08
r_3    ω_1 [5:7]    0.25
r_3    ω_2 [3:5]    0.16
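To make the view generation step concrete, here is a minimal sketch that computes range probabilities directly from each Gaussian, without any caching. The range layout (n ranges of width delta on each side of the inferred mean) and the function names are assumptions made for illustration; the input triples are the (t, inferred mean, inferred standard deviation) values from the example table.

```python
import math

def normal_cdf(x, mean, std):
    """Cdf of N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def build_prob_view(densities, delta, n):
    """densities: list of (t, r_hat, sigma_hat).

    Returns rows (t, low, high, probability).
    """
    rows = []
    for t, r_hat, sigma_hat in densities:
        # Assumed layout: n ranges of width delta on each side of r_hat.
        for lam in range(-n, n):
            low = r_hat + lam * delta
            high = r_hat + (lam + 1) * delta
            prob = (normal_cdf(high, r_hat, sigma_hat)
                    - normal_cdf(low, r_hat, sigma_hat))
            rows.append((t, low, high, prob))
    return rows

view = build_prob_view(
    [(1, 4.0, 0.3), (2, 6.0, 3.2), (3, 7.0, 2.9)], delta=2, n=2
)
```

Every row costs two cdf evaluations, so the work grows with the number of time steps and with n, and a smaller delta needs more ranges to cover the same span of values.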

Problem: Large computational cost when the time interval and n are large and Δ is small (finer granularity).

Idea: Cache and reuse computation of probability values from earlier times.

Constraint-Aware Caching

Given: $p_t(R_t)$ and $p_{t'}(R_{t'})$ are Gaussian with $(\hat{r}_t, \hat{\sigma}_t^2)$ and $(\hat{r}_{t'}, \hat{\sigma}_{t'}^2)$.
Aim: Approximate values of $p_{t'}(R_{t'})$ by $p_t(R_t)$ when $t' > t$.

Distance constraint: guarantees that the maximum approximation error is upper bounded by the distance constraint when the cache is used.
Memory constraint: guarantees that the cache does not use more memory than that specified by the memory constraint.

[Figure: two Gaussians, $P(R_t; \hat{r}_t, \hat{\sigma}_t^2)$ and $P(R_{t'}; \hat{r}_{t'}, \hat{\sigma}_{t'}^2)$, each with a range of width Δ. The ranges are $[a, b]$ with $a = \hat{r}_t + \lambda\Delta$, $b = \hat{r}_t + (\lambda + 1)\Delta$ and $[a', b']$ with $a' = \hat{r}_{t'} + \lambda\Delta$, $b' = \hat{r}_{t'} + (\lambda + 1)\Delta$. Annotation: $\rho_\lambda$ remains unchanged.]
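The annotation that $\rho_\lambda$ remains unchanged can be checked numerically: because each range is defined relative to the mean, its probability depends only on the standard deviation, λ and Δ. The sketch below is illustrative; the function name and the sample values are assumptions.

```python
import math

def normal_cdf(x, mean, std):
    """Cdf of N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def range_probability(r_hat, sigma_hat, lam, delta):
    """Probability mass of N(r_hat, sigma_hat^2) on
    [r_hat + lam * delta, r_hat + (lam + 1) * delta]."""
    a = r_hat + lam * delta
    b = r_hat + (lam + 1) * delta
    return normal_cdf(b, r_hat, sigma_hat) - normal_cdf(a, r_hat, sigma_hat)

# Same standard deviation, different means: the value can be reused as is.
rho_t = range_probability(4.0, 0.5, -1, 2.0)
rho_t_prime = range_probability(7.0, 0.5, -1, 2.0)
```

Here rho_t and rho_t_prime agree up to floating-point error, so a cached value only becomes an approximation when the standard deviations differ.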

Guaranteeing Distance Constraint

We use the Hellinger distance, denoted $H[\cdot, \cdot]$, as a distance measure; $0 \le H \le 1$.

Theorem (Distance Constraint): Given a user-defined distance constraint $H'$, we guarantee that $H[p_t(R_t), p_{t'}(R_{t'})] \le H'$ if $\hat{\sigma}_{t'} \le d_s \cdot \hat{\sigma}_t$ and $\hat{\sigma}_{t'} > \hat{\sigma}_t$, where the parameter $d_s$ is chosen as any value satisfying

$$ d_s \le \frac{1 + \sqrt{1 - \left(1 - H'^2\right)^4}}{\left(1 - H'^2\right)^2}. $$

We call $d_s$ the ratio threshold.

Example: Suppose $H' = 0.2$; then $d_s \le 1.5$. Choose, say, $d_s = 1.4$; then if $\hat{\sigma}_{t'} / \hat{\sigma}_t \le d_s$, we have $H[p_t(R_t), p_{t'}(R_{t'})] \le 0.2$.
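The example can be reproduced numerically. The sketch below evaluates the bound on the ratio threshold and the Hellinger distance between two Gaussians; it assumes the two densities are compared with their means aligned (consistent with ranges being defined relative to the mean), and the function names are hypothetical.

```python
import math

def max_ratio_threshold(h_prime):
    """Largest d_s allowed by the distance constraint H'."""
    c = (1.0 - h_prime ** 2) ** 2
    return (1.0 + math.sqrt(1.0 - c ** 2)) / c

def hellinger_equal_mean(sigma_a, sigma_b):
    """Hellinger distance between two Gaussians with equal means
    (assumption: means are aligned before comparison)."""
    bc = math.sqrt(2.0 * sigma_a * sigma_b / (sigma_a ** 2 + sigma_b ** 2))
    return math.sqrt(max(0.0, 1.0 - bc))

bound = max_ratio_threshold(0.2)            # roughly 1.5
distance = hellinger_equal_mean(1.0, 1.4)   # stays below 0.2
```

With $d_s = 1.4$ the distance stays under the constraint, and at the bound itself it reaches $H' = 0.2$.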

Initializing the σ-cache

Let $\max(\hat{\sigma}_t)$ and $\min(\hat{\sigma}_t)$ be the maximum and minimum standard deviations observed in a probabilistic view generation query.
Compute $Q$ such that $\max(\hat{\sigma}_t) = d_s^Q \cdot \min(\hat{\sigma}_t)$.
$\lceil Q \rceil$ gives us the maximum number of distributions that we should cache.

[Figure: cache memory laid out as rows of n cached values of width Δ, one row per cached distribution with standard deviation $d_s^1 \cdot \min(\hat{\sigma}_t)$, $d_s^2 \cdot \min(\hat{\sigma}_t)$, ..., $d_s^Q \cdot \min(\hat{\sigma}_t)$.]

Find $d_s^q \cdot \min(\hat{\sigma}_t)$ such that $d_s^q \cdot \min(\hat{\sigma}_t) \le \hat{\sigma}_{t'} < d_s^{q+1} \cdot \min(\hat{\sigma}_t)$.
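The sizing and lookup rules above amount to two logarithms. The following is a minimal sketch with hypothetical function names; the sample values (minimum 0.2, maximum 3.2, ratio threshold 1.4) are chosen only for illustration.

```python
import math

def cache_size(sigma_min, sigma_max, d_s):
    """ceil(Q), where sigma_max = d_s**Q * sigma_min."""
    q_real = math.log(sigma_max / sigma_min) / math.log(d_s)
    return math.ceil(q_real)

def cache_index(sigma, sigma_min, d_s):
    """q such that d_s**q * sigma_min <= sigma < d_s**(q + 1) * sigma_min."""
    return math.floor(math.log(sigma / sigma_min) / math.log(d_s))

size = cache_size(0.2, 3.2, 1.4)
q = cache_index(2.9, 0.2, 1.4)
```

Because consecutive cached standard deviations differ by a factor of $d_s$, any queried standard deviation falls within a ratio of $d_s$ of the cached entry it maps to.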
