

SLIDE 1

Artificial Intelligence Group

Gaussian Model Trees for Traffic Imputation

Sebastian Buschjäger, Thomas Liebig and Katharina Morik

TU Dortmund University - Artificial Intelligence Group

February 20, 2019


SLIDE 4

Motivation: Smart Cities

Idea: Distribute small devices across the entire city to monitor specific locations.

Design requirements:

  • 1. Sensing devices should be as small and as energy-efficient as possible to minimize costs
  • 2. Sensing devices should be low-priced to minimize the initial investment
  • 3. Data should not be processed globally, to minimize communication and maximize privacy
  • 4. Prediction models should be small, yet accurate enough to run on the sensing devices
  • 5. The system should report possible sensor locations with respect to their accuracy


SLIDE 8

Traffic Imputation

Our focus here: Count the number of vehicles at a given coordinate (latitude / longitude).

Formally: An imputation problem, where we impute missing sensor values.

Popular method: Gaussian Processes

p(y | D, x) ∼ N(f(x), ·)   with   f(x) = K(x, D)ᵀ K(D)⁻¹ y

◮ Kernel matrix including noise: K(D) = [k(xᵢ, xⱼ)]ᵢ,ⱼ + σₙ² I
◮ Target vector: y = [y₁, …, y_N]ᵀ
◮ Kernel vector: K(x, D) = [k(x, x₁), …, k(x, x_N)]ᵀ

Challenges
◮ GPs do not scale well, due to the matrix inversion (runtime O(N³))
◮ GPs do not have a traffic-flow model, e.g. one built from map data
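As a rough sketch, the GP predictive mean f(x) = K(x, D)ᵀ K(D)⁻¹ y can be computed directly with NumPy. The RBF kernel, length scale, and noise level below are illustrative assumptions, not the settings used in the talk's experiments:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    # Assumed RBF (squared-exponential) kernel between two points.
    d = np.linalg.norm(a - b)
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict_mean(X, y, x_new, sigma_n=0.1, length_scale=1.0):
    N = len(X)
    # Kernel matrix including noise: K(D) = [k(x_i, x_j)] + sigma_n^2 * I
    K = np.array([[rbf(X[i], X[j], length_scale) for j in range(N)]
                  for i in range(N)])
    K += sigma_n ** 2 * np.eye(N)
    # Kernel vector K(x, D) between the query point and the training data
    k_vec = np.array([rbf(x_new, X[i], length_scale) for i in range(N)])
    # Solving the linear system avoids forming the explicit inverse, but the
    # factorization is still cubic in N -- the scaling problem on the slide.
    return k_vec @ np.linalg.solve(K, y)
```

For example, with training points at 0, 1, 2 and targets 0, 1, 2, querying x = 1 returns a value close to 1.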


SLIDE 11

State of the art GPs

Scalable GPs: A well-studied problem, with solutions utilizing subsets of data points, sparse kernels, sparse approximations, implicit and explicit block structures, …

Important for us: Each local sensing device should execute one small expert model.

Deisenroth 2015: Distributed Gaussian Processes (DGP)

Idea: Factorize the global likelihood into a product of m individual likelihoods

p(y | D) ≈ ∏ₖ₌₁ᵐ pₖ^{βₖ}(y | Dₖ)

◮ βₖ: expert weight
◮ pₖ(y | Dₖ): small GP with samples Dₖ ⊂ D

Nice
+ The pₖ(y | Dₖ) are independent from each other
+ Dₖ can potentially be small

Problematic
− All experts need to be evaluated to compute p(y | D)
− Dₖ is randomly sampled
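At prediction time, DGP-style experts are typically combined by precision-weighted averaging of the expert means. The sketch below shows this product-of-experts style aggregation; the exact βₖ weighting varies between DGP variants, so uniform weights are assumed here:

```python
import numpy as np

def combine_experts(means, variances, betas=None):
    # Precision-weighted combination of independent GP expert predictions.
    # betas are the expert weights from the slide; uniform weights assumed.
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if betas is None:
        betas = np.ones_like(means)
    precision = np.sum(betas / variances)            # combined precision
    mean = np.sum(betas * means / variances) / precision
    return mean, 1.0 / precision

# Two equally confident experts are simply averaged ...
m, v = combine_experts([1.0, 3.0], [1.0, 1.0])
# ... while a more confident expert dominates the combination.
m2, _ = combine_experts([0.0, 10.0], [0.01, 100.0])
```

Note that every expert still has to be evaluated to form the combination, which is exactly the communication cost the talk wants to avoid.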

SLIDE 12

Gaussian Model Trees: Key questions

So far: DGPs offer small expert models, which only require communication of local predictions.

But 1: Is there a better way to sample Dₖ?
But 2: Can we get away without any communication at all?


SLIDE 17

GP induction as loss minimization problem

arg min_{f ∈ H}  1/2 ‖f‖²_H + 1/(2σₙ²) Σ_{(x,y)∈D} (y − f(x))²

◮ 1/2 ‖f‖²_H: regularization, the norm of f in the RKHS H
◮ 1/(2σₙ²): noise assumption from the GP
◮ Σ (y − f(x))²: MSE of the GP model

Goal: Decompose the optimization problem into two independent problems.
◮ Let A ⊆ D denote a set of c inducing points, and let B = D \ A
◮ Assume k(xᵢ, xⱼ) ≈ 0 for xᵢ ∈ A and xⱼ ∈ B

Then we can split the optimization problem into two problems:

arg min_{f_A ∈ H, f_B ∈ H}  1/2 ‖f_A‖²_H + 1/(2σₙ²) Σ_{(x,y)∈A} (y − f_A(x))²  +  1/2 ‖f_B‖²_H + 1/(2σₙ²) Σ_{(x,y)∈B} (y − f_B(x))²

with the resulting predictors

f_A(x) = K(x, A)ᵀ K(A)⁻¹ y_A   and   f_B(x) = K(x, B)ᵀ K(B)⁻¹ y_B
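The decomposition above can be checked numerically: when the kernel between two clusters is (numerically) zero, the full GP prediction at a point near cluster A coincides with the prediction of a GP trained on A alone. A small sketch under assumed data and kernel settings:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    # Assumed RBF kernel; broadcasting-friendly for building Gram matrices.
    return np.exp(-0.5 * ((a - b) / ell) ** 2)

rng = np.random.default_rng(0)
XA = rng.uniform(0.0, 1.0, 20)     # cluster A
XB = rng.uniform(10.0, 11.0, 20)   # cluster B, far away -> k(x_a, x_b) ~ 0
X = np.concatenate([XA, XB])
y = np.sin(X)

sigma_n = 0.1
K = rbf(X[:, None], X[None, :]) + sigma_n ** 2 * np.eye(len(X))
KA = rbf(XA[:, None], XA[None, :]) + sigma_n ** 2 * np.eye(len(XA))

x_new = 0.5                         # query point near cluster A
f_full = rbf(x_new, X) @ np.linalg.solve(K, y)
f_A = rbf(x_new, XA) @ np.linalg.solve(KA, y[:len(XA)])
# f_full and f_A agree: the block for B does not influence the prediction.
```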


SLIDE 21

Subset selection (1)

Question: How to find the sets A and B?

[Figure: three points xᵢ, xⱼ, xₖ illustrating their pairwise similarities]

Observation: If the kernel is stationary, then k(xᵢ, xⱼ) ≈ 0 ⇒ k(xᵢ, xₖ) ≈ 0 whenever k(xⱼ, xₖ) ≈ 1.

Thus: Points xⱼ and xₖ that are similar to each other will have a similar dissimilarity to xᵢ.
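The observation is easy to verify for a concrete stationary kernel. A tiny illustration with an assumed RBF kernel and hand-picked points:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Stationary RBF kernel: depends only on the distance between a and b.
    return np.exp(-0.5 * ((a - b) / ell) ** 2)

x_i, x_j, x_k = 0.0, 8.0, 8.1
# x_j and x_k are nearly identical, so k(x_j, x_k) ~ 1 ...
sim_jk = rbf(x_j, x_k)
# ... and both are almost equally dissimilar to the far-away x_i.
sim_ij = rbf(x_i, x_j)
sim_ik = rbf(x_i, x_k)
```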


SLIDE 26

Subset selection (2)

Thus: It is enough to store a reference point for each set A and B.
Conclusion: We need to find reference points which are maximally dissimilar to each other.

Idea: Formulate another maximization problem

1/2 log det [k₁₁ k₁₂; k₂₁ k₂₂] = 1/2 log(k₁₁ · k₂₂ − k₁₂ · k₂₁) → max   if k₁₂ = k₂₁ ≈ 0

More formally:

arg max_{A ⊂ D, |A| = c}  1/2 log det(I + aK(A))

Still: This is a very difficult problem, since we would need to check all possible subsets A ⊂ D.

Lawrence 2003: 1/2 log det(I + aK(A)) is submodular.

Why submodularity? It offers a simple algorithm with guaranteed performance.

Nemhauser 1978: SimpleGreedy has a guaranteed performance of ≥ 1 − 1/e ≈ 63% of the optimum.
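The greedy rule behind the Nemhauser guarantee is to repeatedly add the point with the largest marginal gain. A minimal sketch of SimpleGreedy for the objective 1/2 log det(I + aK(A)); the kernel and parameter defaults are assumptions for illustration:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Assumed RBF kernel between two data points.
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ell ** 2)

def objective(X, idx, a=1.0):
    # f(A) = 1/2 * log det(I + a * K(A)) for the candidate subset idx.
    K = np.array([[rbf(X[i], X[j]) for j in idx] for i in idx])
    sign, logdet = np.linalg.slogdet(np.eye(len(idx)) + a * K)
    return 0.5 * logdet

def simple_greedy(X, c):
    selected = []
    for _ in range(c):
        remaining = [i for i in range(len(X)) if i not in selected]
        # Standard greedy rule: take the point with the largest marginal gain.
        # For submodular f, this enjoys the 1 - 1/e approximation guarantee.
        best = max(remaining, key=lambda i: objective(X, selected + [i]))
        selected.append(best)
    return selected
```

On clustered data, the greedy selection picks mutually dissimilar points, exactly the maximally dissimilar reference points the slide asks for.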


SLIDE 35

Putting it all together (1)

Overall approach: Greedy top-down algorithm
◮ Select the c 'most dissimilar' samples
◮ View each sample as a 'region'
◮ Repeat until only M points or less are present in a region, then train a full GP on those regions

[Figure: the data set D is recursively split into regions D1, D2, …, D6; a full GP is trained on these leaf data sets]

SLIDE 36

Putting it all together (2)

Algorithm 2: Gaussian Model Tree (GMT)

 1: function trainGMT(D, c, τ)
 2:   if |D| ≥ τ then
 3:     A = SimpleGreedy(D, c)
 4:     for (x, y) ∈ D do
 5:       r = arg max{k(x, e) | e ∈ A}
 6:       Dr = Dr ∪ {x}
 7:     for i = 1, …, c do
 8:       trainGMT(Di, c, τ)
 9:   else
10:     trainFullGP(D)

Parameters
◮ D: training data
◮ c: number of regions (→ number of children per inner node)
◮ τ: minimum number of data points (→ size of the experts in the end)

Note: We can parallelise over c. The expected runtime is O(logc(n) · n · c² + n · τ³).
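The trainGMT pseudocode can be sketched compactly in Python. The `simple_greedy` below is a simplified stand-in for the submodular selection (it just grows a set of mutually dissimilar points), and the leaf step collects data instead of training a full GP; both are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Assumed RBF kernel between two data points.
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ell ** 2)

def simple_greedy(X, c):
    # Simplified stand-in: start anywhere, then repeatedly add the point
    # least similar to the points already selected.
    selected = [0]
    while len(selected) < c:
        rest = [i for i in range(len(X)) if i not in selected]
        selected.append(min(rest, key=lambda i: max(rbf(X[i], X[j])
                                                    for j in selected)))
    return selected

def train_gmt(X, y, c, tau, leaves=None):
    if leaves is None:
        leaves = []
    if len(X) >= tau:
        A = simple_greedy(X, c)                  # c reference points
        buckets = {r: [] for r in range(c)}
        for i in range(len(X)):
            # Route each sample to its most similar reference point.
            r = max(range(c), key=lambda j: rbf(X[i], X[A[j]]))
            buckets[r].append(i)
        for r in range(c):
            idx = np.array(buckets[r], dtype=int)
            if len(idx) > 0:
                train_gmt(X[idx], y[idx], c, tau, leaves)
    else:
        leaves.append((X, y))  # here a full GP would be trained on (X, y)
    return leaves
```

Each recursion level only needs the reference points to route data, which is what later allows routing a query to a single small expert without any global communication.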


SLIDE 38

Experimental setup

Question 1: What is the accuracy of the proposed method?
Question 2: How much memory is required per node?

Approach: Use the traffic simulator SUMO to generate data with sufficient ground truth
◮ 24 h simulation for the city of Luxembourg
◮ 3523 simulated sensors available
◮ We simulated 131357 vehicle counts per sensor from 7:00 till 11:00

Goal: Predict the average number of vehicles per sensor node (given as its coordinates).


SLIDE 41

Results on Luxembourg data set

Error measure: Standardized mean-squared error

SMSE = 1/(var(D_Test) · |D_Test|) Σ_{(x,y)∈D_Test} (f(x) − y)²

Observation: The average prediction f(x) = 1/N Σᵢ yᵢ has an SMSE of roughly 1.

Experiments: Compare 576 different hyperparameter combinations with a 5-fold cross validation.

Method and Parameters                    | Kernel  | SMSE  | Avg. Size
Full GP, c = 1000                        | 0.5/0.5 | 0.767 | 1000
Informative Vector Machine, c = 500      | 2.0/2.0 | 0.866 | 500
Distributed GPs, c = 2800, m = 50        | 0.5/0.5 | 0.733 | 2800
Gaussian Model Trees, c = 50, τ = 1000   | 1.0/2.0 | 0.583 | 56.80

Table: Parameter configuration with smallest SMSE per algorithm.

Observation 1: GMT compares favorably to FGP and DGP.
Observation 2: GMT requires 17−58 times fewer resources per node than FGP and DGP!
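The SMSE from the slide normalizes the mean-squared error by the variance of the test targets, which is why the constant mean predictor scores roughly 1. A direct sketch:

```python
import numpy as np

def smse(y_true, y_pred):
    # SMSE = MSE / var(y): 1.0 means "no better than predicting the mean".
    return np.mean((y_pred - y_true) ** 2) / np.var(y_true)

# The constant mean predictor scores ~1, matching the observation above.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
baseline = smse(y, np.full_like(y, y.mean()))
```

Under this measure, the reported GMT score of 0.583 means GMT explains a substantially larger share of the target variance than the baselines.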

SLIDE 42

Results on Luxembourg data set (2)

Nice bonus: We can visualize the regions where GMT fails.

[Figure: map of the per-region SMSE, color scale ranging from 1 to 7]


SLIDE 45

Recap: Gaussian Model Trees

Goal: Distribute small sensor devices in the city, each with a small, local ML model
◮ View GP induction as an optimization problem
◮ Decompose the optimization problem into independent sub-problems
◮ View the decomposition as sample selection, with performance guaranteed by submodularity
◮ Build a tree-structured model by recursively partitioning the data into smaller sub-problems

So far: Very promising results on data in the context of Smart Cities

Outlook
◮ Use different kernel hyperparameters per node
◮ The Gaussian assumption is often violated → use other prediction methods in the leaf nodes
◮ Borrow ideas from Decision Trees for post- and pre-pruning

https://bitbucket.org/sbuschjaeger/ensembles/src


SLIDE 47

More experiments

Note: The full GP is still manageable with N = 3523. What about bigger data sets?

First follow-up experiment: UK traffic-imputation data from 2017
◮ Same as the Luxembourg task, but in the UK with N = 18149 sensors

Second follow-up experiment: 'Rate' an area in the city, e.g. by quality of life.
Problem: No good data is available, so we used an (arguably bad) proxy data set
◮ Predict the price of an apartment in the UK (2015) given its coordinates
◮ In total N = 64431 samples
◮ No further information is given on the apartments

SLIDE 48

Results on UK data sets

Again: Compare 576 different hyperparameter configurations with a 5-fold cross validation.

Method and Parameters        | Kernel  | SMSE  | Avg. Size
FGP, c = 500                 | 0.5/2.0 | 0.967 | 500
IVM, c = 300                 | 2.0/5.0 | 0.972 | 300
DGP, c = 1000, m = 100       | 0.5/0.5 | 0.951 | 1000
GMT, c = 300, τ = 500        | 2.0/5.0 | 0.865 | 49.69

Table: Parameter configuration with smallest SMSE per algorithm on UK traffic data.

Method and Parameters        | Kernel  | SMSE  | Avg. Size
FGP, c = 500                 | 1.0/0.5 | 0.934 | 500
IVM, c = 300                 | 0.5/2.0 | 0.947 | 300
DGP, c = 500, m = 200        | 1.0/0.5 | 0.92  | 500
GMT, c = 100, τ = 500        | 0.5/1.0 | 0.553 | 177.317

Table: Parameter configuration with smallest SMSE per algorithm on UK apartment-price data.