Ordinary Least Squares for Histogram Data based on Wasserstein - PowerPoint PPT Presentation

Ordinary Least Squares for Histogram Data based on Wasserstein Distance Rosanna Verde Antonio Irpino Dipartimento di Studi Europei e Mediterranei Seconda Università degli Studi di Napoli (ITALY) [rosanna.verde] [antonio.irpino]@unina2.it COMPSTAT 2010 - Paris - August 22-27

Outline  Histogram data  A regression model for histogram variables  Properties of the Wasserstein distance  Ordinary Least Square fitting  Tools for the interpretation  An application on real data COMPSTAT 2010 - Paris - August 22-27

Sources of histogram data  Result of summary/clustering procedures From surveys  From large databases  From sensors  0.6 0.5 Temperatures  0.5 Pollutant concentration  0.4 Network activity  0.3 0.2 0.2 0.2  Data streams 0.1 0.1 Description of time windows 0   Image analysis Color bandwidths   Confidentiality data 0.6 0.5 0.5 Summary data – non punctual 0.4  0.4 0.3 0.2 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0 COMPSTAT 2010 - Paris - August 22-27

Histogram data as a particular case of modal symbolic descriptions [ Bock and Diday (2000) ] Histogram data is a kind of symbolic representation which allows to describe an individual by means of a histogram In Bock and Diday (2000) Histogram variable is one of the three definition of modal numerical variables :  [ Histogram variable ] The description is a classic histogram where the support is partitioned into intervals. Each interval is weighted by the empirical density;  [ Empirical distribution function variable ] The description is done according to an empirical distribution function;  [ Model of distribution variable ] The description is done according to a predefined model of random variable. COMPSTAT 2010 - Paris - August 22-27

Histogram variable Let Y be a continuous variable defined on a finite support where: are the minimum and maximum values of the variable domain. The variable Y is partitioned into a set of contiguous intervals (bins) Given n observations of the variable Y, each semi-open interval, is associated with a random variable equal to It is possible to associate with an empirical distribution A histogram of Y is the representation in which each pair (for h = 1 , …,H ) is represented by a vertical bar, with base interval along the horizontal axis and the area proportional to COMPSTAT 2010 - Paris - August 22-27

A Regression model for histogram variables  In order to study the dependence structure of a histogram variable Y (dependent) to the another X (independent) we introduce a new regression approach based on the Ordinary Least Square estimation method According to the nature of the variables, we propose to compute the squared deviations (in the least squares function) by using the Wasserstein distance. COMPSTAT 2010 - Paris - August 22-27

A Regression model for histogram variables Data = Model Fit + Residual  Linear regression is a general method for estimating/describing association between a continuous outcome variable (dependent) and one or multiple predictors in one equation. Easy conceptual task with classic data But what does it means when dealing with histogram data? COMPSTAT 2010 - Paris - August 22-27

Simple linear regression Classic data Histogram data COMPSTAT 2010 - Paris - August 22-27

Regression between histograms: a proposal A solution was given by Billard and Diday (2006)  The model fit a linear regression line throught the mixture of the n bivariate distributions  Given a punctual value of X it is possible to predict the punctual value of Y COMPSTAT 2010 - Paris - August 22-27

Regression between histograms: our approach  Given a histogram description for X, we search for a li linear r trasfo sformat rmation ion of the description which allows us to predict the histogram description of Y  For example: given the temperature ature histogr ogram am observed ved in a region on during ng a m month, Is it possible to predict the e dist stribution tion of the temper perature ature of a another her month using a linear transformation of the histogram variable? A histogram by a histogram COMPSTAT 2010 - Paris - August 22-27

Wasserstein distance  We propose to use the Wasserstein-Kantorovich metric in Least Square Function. Expecially the derived L 2 - Mallow’s distance between two quantile functions 1     2      1 1 d x ,x F (t ) F (t ) dt W i j i j 0 COMPSTAT 2010 - Paris - August 22-27

An interpretative decomposition of the L 2 -Wasserstein metric 1    2      2 1 1 d ( x x , ) : F ( ) t F ( ) t dt W i j i j 0     2 2            x x 2 (1 ( x x , ) i j i j i j i j Shape Location Siz e QQ plot 100 95 90 85 80 75 70 65 If the two distribution have the same shape: 60 0 10 20 30 40     2 ( , 2 2         d x x ) : W i j i j i j Location Size   2 ( , 2     If they have the same size and shape: d x x ) : W i j i j Location COMPSTAT 2010 - Paris - August 22-27

Some simplifications and notations quantile function of the i-th Mean and variance macro-unit x i   1 of the distribution/ x t ( ) F ( ) t (histogram/distribution data) i i histogram data 1 1 1                    2 2 2 2 2 2 x x t dt ( ) and x t ( ) dt x x t ( ) dt x i i x i i i x i i i 0 0 0 Average distribution/ histogram data 1 1 n n n 1 1 1            x t ( ) x t ( ) t [0,1]; x x t dt ( ) x x t dt ( ) i i i n n n    i 1 i 1 i 1 0 0 1   x t x t dt ( ) ( ) x x i j i j 1          0 ( , x x ) x t x t dt ( ) ( ) ( , x x ) x x   i j i j i j x x i j i j x x 0 i j Correlation between pair of distribution/histogram data (x i , x j ) COMPSTAT 2010 - Paris - August 22-27

Fitting with a linear model  Given two variables Y and X regression model is here proposed to perform a linear transformation of X which better fit Y         y t ( ) x ( ) t ( ) t t [0,1] i i i ˆ y i  Considering the error as close as possible to zero:    ˆ ( ) t y t ( ) y t ( ) i i i COMPSTAT 2010 - Paris - August 22-27

The error term in the classic case  Classic case (Euclidean norm)            2 ˆ ˆ ˆ 2 2 y y y y d y , y i i i i i i E i i ˆ i y Error y i x i COMPSTAT 2010 - Paris - August 22-27

The error term of the model (our approach)  Histogram case (Wasserstein distance)         2   2   ˆ ˆ ( ) t y t ( ) y ( ) t ( ) t y t ( ) y ( ) t t [0,1 ] i i i i i i 1       2  ˆ ˆ 2 y t ( ) y ( ) t dt d y , y i i W i i 0 ˆ ( ) y t predicted i (squared) Error   , ˆ 2 d y y W i i y t ( ) observed i x t ( ) i COMPSTAT 2010 - Paris - August 22-27

Fitting a linear model: histograms  We propose to find a linear transformation of the quantile function of x i (histogram data) in order to predict the quantile function of y i i.e.:        ˆ y ( ) t f x t ( ( )) x t ( ) t [0, 1 ] i i i It is worth noting the linear transformation is unique: the parameters  and  are estimated for all the i macro-units x i and y i  A first problem: ˆ ( ) Only if  >0 a quantile function can be derived. y t i In order to overcome this problem, we propose a solution based on the decomposition of the Wasserstein distance. COMPSTAT 2010 - Paris - August 22-27

Solution to <0  The quantile function can be decomposed as:   c x ( ) t x x t ( ) where i i i   c x t ( ) x ( ) t x is the centered quantile functi n o i i i  Then, we propose the following model:         c y t ( ) x x t ( ) ( ). t i 1 i 2 i i  Using the Wasserstein distance it is possible to set up a OLS method that returns three coefficients. We demonstrate  2 is always greater or equal to zero. COMPSTAT 2010 - Paris - August 22-27

The error term: a property of the Wasserstein distance decomposition  The (squared) error can be written according the two components 1         2 ˆ ˆ 2 2 d ( y , y ) y t ( ) y ( t ) dt i W i i i i 0   2 ˆ    ˆ 2 c c y y d ( y , y ) i i W i i COMPSTAT 2010 - Paris - August 22-27

Ordinary Least Squares for Histogram Data based on Wasserstein - PowerPoint PPT Presentation

Ordinary Least Squares for Histogram Data based on Wasserstein Distance Rosanna Verde Antonio Irpino Dipartimento di Studi Europei e Mediterranei Seconda Universit degli Studi di Napoli (ITALY) [rosanna.verde] [antonio.irpino]@unina2.it

Practical Least-Squares for Computer Graphics Siggraph Course 11 Siggraph Course 11 Practical

Statistical Properties of the Regularized Least Squares Functional and a hybrid LSQR Newton method

Least Mean Squares Regression Machine Learning 1 Least Squares Method for regression

The Mathemagic of Magic Squares History of Magic Squares Mathematics and Magic Squares

ECE 516: Adaptive Digital Filters Lecture 13 (Recursive Least-Squares) Mojtaba Soltanalian 2

Statistical Geometry Processing Winter Semester 2011/2012 Least-Squares Least-Squares Fitting

9. Equality constraints and tradeoffs More least squares Example: moving average model

8. Least squares Review of linear equations Least squares Example: curve-fitting

Linear Least Squares I Steve Marschner Cornell CS 322 Cornell CS 322 Linear Least Squares I 1

Moving Least Squares Outline The Approximation Power of Moving Least- Squares D. Levin

The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization

Non linear Least Squares Lectures for PHD course on Numerical optimization Enrico Bertolazzi

Geometry of Least Squares 2 Least squares from the

. Surajit Ray Ray SAMSI, June 3 2005 - slide #1 Outline Outline Recap of (ordinary)

Squares of function spaces and function spaces on squares Miko laj Krupski University of

A fast way to compute Least Squares Teo Zhi Shen Anderson Serangoon Junior College Least

Least Squares (outline) Standard regression: Fit data with weighted sum of regressors.

Linear least squares Non-consistent systems Ax = b , b / R ( A ) This means b or a part of it

1 Least Squares Regression Suppose someone hands you a stack of N vectors, { x N } , each of

Least Squares Regression October 30, 2019 October 30, 2019 1 / 22 Finding the Best Line We

COMS 4721: Machine Learning for Data Science Lecture 2, 1/19/2017 Prof. John Paisley Department

JUST THE MATHS SLIDES NUMBER 18.4 STATISTICS 4 (The principle of least squares) by

Simple Linear Regression and Correlation Model for designed experiment: Y i = 0 + 1 x i +

Machine Learning (CSE 446): Learning as Minimizing Loss; Least Squares Sham M Kakade c 2018