

SLIDE 1

Regression modelling using I-priors

Haziq Jamil

Supervisors: Dr. Wicher Bergsma & Prof. Irini Moustaki

Social Statistics (Year 1) London School of Economics & Political Science

19 May 2015 PhD Presentation Event

SLIDE 2

Outline

1 Introduction
2 I-prior theory
3 Estimation methods
4 Examples of I-prior modelling
   ◮ Simple linear regression
   ◮ 1-dimensional smoothing
   ◮ Multilevel modelling
   ◮ Longitudinal modelling
5 Further work
   ◮ Structural Equation Models
   ◮ Models with structured error covariances
   ◮ Logistic models

Haziq Jamil (LSE) I-prior regression 19 May 2015 2 / 27

SLIDE 3


Linear regression

  • Consider a set of data points {(y_1, x_1), …, (y_n, x_n)}.
  • A model is linear if the relationship between y_i and the independent variables is linear:

◮ y_i = β₀ + β₁x_i + ε_i
◮ y_i = β₀ + β₁x_i + β₂x_i² + ε_i
◮ y_i = β₀ x_i^(β₁ + 2β₂) + ε_i ✗

◮ In other words, the equations must be linear in the parameters.

SLIDE 4


Linear regression

  • Definition (The linear regression model)

y_i = f(x_i) + ε_i,  i = 1, …, n   (1)

◮ y_i ∈ ℝ, real-valued observations
◮ x_i ∈ X, a set of characteristics for unit i
◮ f ∈ F, a vector space of functions over the set X
◮ (ε₁, …, ε_n) ∼ N(0, Ψ⁻¹)

Note: For iid observations, Ψ = ψI_n. In general, Ψ = (ψ_ij).


SLIDE 5


Linear regression

[Figure: “the big bag of lines” — the collection of candidate regression lines]


SLIDE 6


Estimation methods

How to pick the best line from the bag of stuff?

  • Many ways: least squares, maximum likelihood, Bayesian, ...
  • When dimensionality is large, these may overfit. Solutions:

◮ Dimension reduction
◮ Random effects models
◮ Regularization

...all of which require additional assumptions.

  • I-priors

An I-prior on f is a distribution π on f such that its covariance matrix is the Fisher information for f. Also assign a “best guess” for the prior mean, e.g. f₀ = 0.


SLIDE 7


Example: multiple regression

  • Consider y = α + Xβ + ε, with ε ∼ N(0, ψ⁻¹I_n), so that f = α + Xβ.
  • We know from linear regression theory that I[β] = ψXᵀX.
  • An I-prior on β is then

β ∼ N(β₀, λ²ψXᵀX),  or equivalently  β = β₀ + λXᵀw,  w ∼ N(0, ψI_n).

  • Thus, an I-prior on f is

f = α + Xβ₀ + λXXᵀw,  w ∼ N(0, ψI_n).
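The equivalence of the two representations above can be checked numerically. A minimal sketch, assuming an arbitrary synthetic design matrix and illustrative values for ψ and λ (none of these come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))      # hypothetical design matrix
psi, lam = 1.0, 0.5                  # illustrative error precision and scale

# I-prior on beta via its random-effect representation:
# beta = beta0 + lam * X' w, with w ~ N(0, psi * I_n),
# so that beta ~ N(beta0, lam^2 * psi * X'X).
beta0 = np.zeros(p)
w = rng.normal(0.0, np.sqrt(psi), size=n)
beta = beta0 + lam * X.T @ w

# Corresponding I-prior draw of f = alpha + X beta0 + lam * X X' w
alpha = 0.0
f = alpha + X @ beta0 + lam * X @ X.T @ w
```

For the same draw of w, the vector f agrees with α + Xβ, as the algebra suggests.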


SLIDE 8


I-prior theory

[Concept map: functional vector spaces — reproducing kernel Hilbert spaces and Krein spaces; kernel methods, feature maps, and the Moore-Aronszajn theorem; inner products; random functions, their means and variances; Gaussian random vectors; Fisher information]


SLIDE 9


Definitions & theorem

  • Definition (Inner product)

Let F be a vector space over ℝ. A function ⟨·,·⟩_F : F × F → ℝ is said to be an inner product on F if all of the following are satisfied:

◮ Symmetry: ⟨f, g⟩_F = ⟨g, f⟩_F
◮ Linearity: ⟨af₁ + bf₂, g⟩_F = a⟨f₁, g⟩_F + b⟨f₂, g⟩_F
◮ Non-degeneracy: ⟨f, g⟩_F = 0 for all g ∈ F ⇒ f = 0

for all f, f₁, f₂, g ∈ F and a, b ∈ ℝ. Additionally, an inner product is positive (negative) definite if ⟨f, f⟩_F ≥ 0 (≤ 0). An inner product is indefinite if it is neither positive nor negative definite.

  • Definition (Hilbert space)

A positive definite inner product space which is complete, i.e. contains the limits of all its Cauchy sequences.

  • Definition (Krein space)

An (indefinite) inner product space which generalizes Hilbert spaces by dropping the positive definiteness restriction.


SLIDE 10


Definitions & theorem

  • Definition (Kernel)

Let X be a non-empty set. A function h : X × X → ℝ is called a kernel if there exists a Hilbert space F and a map φ : X → F such that for all x, x′ ∈ X, h(x, x′) = ⟨φ(x), φ(x′)⟩_F.

  • Definition (Reproducing kernel)

Let F be a Hilbert/Krein space of functions over a non-empty set X. A function h : X × X → ℝ is called a reproducing kernel of F, and F an RKHS/RKKS, if h satisfies

◮ ∀x ∈ X, h(·, x) ∈ F
◮ ∀x ∈ X, ∀f ∈ F, ⟨f, h(·, x)⟩_F = f(x).

  • Kernel algorithms have many important uses in the machine learning literature, such as pattern recognition, kernel PCA, finding distances of means in feature space, and many more.


SLIDE 11


Definitions & theorem

  • Theorem (Gaussian I-priors) [Bergsma, 2014]

For the linear regression model (1), let F be the RKKS with kernel h : X × X → ℝ. Then, assuming it exists, the Fisher information for f is given by

I[f](x_i, x′_i) = Σ_{k=1}^n Σ_{l=1}^n ψ_{kl} h(x_i, x_k) h(x′_i, x_l).

Let π be a Gaussian distribution on f with prior mean f₀ and variance I[f]. Then π is called an I-prior for f, and a random vector f ∼ π has the random effect representation

f(x_i) = f₀(x_i) + Σ_{k=1}^n h(x_i, x_k) w_k,  (w₁, …, w_n) ∼ N(0, Ψ).
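A small numerical check of the theorem's double sum against its matrix form. This is a sketch under my own assumptions (a canonical kernel on synthetic points and iid Ψ; variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.standard_normal(n)

H = np.outer(x, x)            # kernel matrix h(x_i, x_k) = x_i x_k (canonical kernel)
psi = 2.0
Psi = psi * np.eye(n)         # iid case: Psi = psi * I_n

# Double sum: I[f](x_i, x_j) = sum_{k,l} psi_kl h(x_i, x_k) h(x_j, x_l)
I_f = np.einsum('kl,ik,jl->ij', Psi, H, H)

# In matrix form this is H Psi H, which is also Cov(f) under the
# random-effect representation f = f0 + H w, w ~ N(0, Psi).
assert np.allclose(I_f, H @ Psi @ H)
```

The covariance of the random-effect representation thus matches the Fisher information, which is the defining property of the I-prior.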



SLIDE 14


Back to the multiple regression example

  • We saw the I-prior method applied to multiple regression:

f(x_i) = f₀(x_i) + Σ_{k=1}^n h(x_i, x_k) w_k
       = (α + x_iβ₀) + λ(XXᵀ)_i w,

w := (w₁, …, w_n) ∼ N(0, ψI_n).

  • Choose a different RKHS/RKKS F and corresponding h to suit the type/characteristics of the xs in order to do regression modelling.

[Figure: “the big bag of lines” narrowed down to a bag of straight lines, a bag of smooth lines, or a bag of lines for each group, depending on the chosen kernel]

SLIDE 15


Toolbox of RKHS/RKKS

X = {x_i}   Characteristic/Uses                                      Vector space F                      Kernel h(x_i, x_k)
Nominal     1) Categorical covariates; 2) in a multilevel            Pearson                             I[x_i = x_k]/p_i − 1,
            setting, x_i = group number of unit i.                                                       where p_i = P[X = x_i]
Real        As in classical regression, x_i = real-valued            Canonical                           x_i x_k
            covariate associated with unit i.
Real        As in (1-dim) smoothing, x_i = data point                Fractional Brownian motion (FBM)    |x_i|^{2γ} + |x_k|^{2γ} − |x_i − x_k|^{2γ},
            associated with observation y_i.                                                             with γ ∈ (0, 1)

  • We can construct new RKHS/RKKS from existing ones.

◮ Example (ANOVA RKKS): for x_i = (x_{1i}, x_{2i}) with Nominal + Real characteristics,

h(x_i, x′_i) = h₁(x_{1i}, x′_{1i}) + h₂(x_{2i}, x′_{2i}) + h₁(x_{1i}, x′_{1i})h₂(x_{2i}, x′_{2i}).
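The kernels in the toolbox above can be sketched as plain functions. A minimal, assumption-laden sketch (function names and the dict-based category probabilities are mine, not from the slides):

```python
import numpy as np

def canonical(x, y):
    """Canonical (linear) kernel for real covariates: h(x, y) = x * y."""
    return x * y

def fbm(x, y, gamma=0.5):
    """Fractional Brownian motion kernel, with gamma in (0, 1)."""
    return abs(x) ** (2 * gamma) + abs(y) ** (2 * gamma) - abs(x - y) ** (2 * gamma)

def pearson(x, y, p):
    """Pearson kernel for nominal covariates; p maps each category to P[X = x]."""
    return (1.0 if x == y else 0.0) / p[x] - 1.0

def anova(h1, h2):
    """ANOVA combination: h(x, x') = h1 + h2 + h1*h2, for x = (x1, x2)."""
    def h(xi, xj):
        a, b = h1(xi[0], xj[0]), h2(xi[1], xj[1])
        return a + b + a * b
    return h
```

For example, `anova(pearson_for_groups, canonical)` would give a kernel suited to the multilevel setting described later in the talk.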


SLIDE 16


Parameters to be estimated

  • Let’s choose a prior mean of zero (or set an overall constant/intercept to be estimated).
  • For the I-prior linear model

y_i = α + Σ_{k=1}^n h_λ(x_i, x_k) w_k + ε_i,  ε_i ∼ N(0, ψ⁻¹),  w_i ∼ N(0, ψ),  i = 1, …, n,   (2)

the parameters to be estimated are θ = (α, λ, ψ)ᵀ.

  • λ is introduced to resolve the arbitrary scale of an RKHS/RKKS F over a set X. The number of λ parameters equals the number of kernels used, not the number of interactions or covariates.


SLIDE 17


EM algorithm

  • For the I-prior model in (2), treat the w_i as missing data.
  • The distributions are easy enough to obtain:

◮ y ∼ N(α1, V_y), where V_y := H_λΨH_λ + Ψ⁻¹
◮ w ∼ N(0, Ψ)
◮ (y, w) jointly normal, with Cov(y, w) = H_λΨ, i.e.

(y, w) ∼ N( (α1, 0), [ V_y  H_λΨ ; ΨH_λ  Ψ ] )

◮ w | y ∼ N( ΨH_λV_y⁻¹(y − α1), V_y⁻¹ )

where H_λ(i, j) = h_λ(x_i, x_j) and Ψ = ψI_n.

  • E-step: calculate Q(θ) = E_w[log f(y, w; θ) | y; θ_t].
  • M-step: θ_{t+1} ← arg max_θ Q(θ).


SLIDE 18


Generalised least squares estimator for α

  • Write model (2) as y = α1 + H_λw + ε, so that y ∼ N(α1, V_y).
  • Assume values for λ and ψ are known, and thus so too is V_y(λ, ψ) = H_λΨH_λ + Ψ⁻¹.
  • The GLS estimator for α is

α̂ = (1ᵀV_y⁻¹1)⁻¹(1ᵀV_y⁻¹y).

  • This turns out to be identical to the MLE.
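A quick numeric sketch of the GLS estimator on synthetic data (the data and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.standard_normal(n)
psi, lam, alpha_true = 1.0, 0.6, 2.0   # illustrative values

H = lam * np.outer(x, x)
Psi = psi * np.eye(n)
Vy = H @ Psi @ H + np.linalg.inv(Psi)

y = rng.multivariate_normal(alpha_true * np.ones(n), Vy)

ones = np.ones(n)
Vy_inv = np.linalg.inv(Vy)
alpha_hat = (ones @ Vy_inv @ y) / (ones @ Vy_inv @ ones)
```

When V_y is the identity, the same formula reduces to the ordinary sample mean, which is the familiar special case of GLS.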


SLIDE 19


Exponential family EM algorithm

  • Consider a density function belonging to the exponential family with the (canonical) form

f_X(x; θ) = exp[θ · T(x) − A(θ)]h(x).

◮ The MLE is found by solving the set of equations T(x) = A′(θ).
◮ It is also known that A′(θ) = E[T(x); θ].

  • In the EM algorithm, the “full” data is x = (y, w). The E-step involves calculating Q(θ), which for the exponential family turns out to be

Q(θ) = E_w[θ · T(y, w) − A(θ) + log h(y, w) | y; θ_t].

  • Maximising this over θ, we arrive at the first-order condition

Q′(θ) = E_w[T(y, w) | y; θ_t] − A′(θ) = 0  ⇒  E_w[T(y, w) | y; θ_t] = E[T(y, w); θ].


SLIDE 20


Full Bayesian approach

  • Assign prior distributions to the parameters, for example

◮ α ∼ N(a, b²)
◮ λ ∼ U(0, c)
◮ ψ ∼ Γ(d, e)

  • Draw from the posterior density f(θ | y) using a Metropolis-Hastings algorithm. Estimates for the parameters are the posterior means.
  • Easy to implement in R using JAGS (rjags or R2jags), but...


SLIDE 21


Example: Simple linear regression

Classical model:

y_i = β₀ + β₁x_i + ε_i,  ε_i ∼ N(0, σ²)

I-prior model:

y_i = α + Σ_{k=1}^n h_λ(x_i, x_k) w_k + ε_i,  ε_i ∼ N(0, ψ⁻¹),  w_i ∼ N(0, ψ),

where h_λ is the Canonical kernel.

MSE(classical) = 1.770   MSE(I-prior) = 1.770
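Putting the earlier pieces together, a minimal I-prior fit with the canonical kernel on synthetic straight-line data, compared with least squares. This is a sketch under my own assumptions (the data, λ, and ψ are illustrative and unrelated to the slide's example), with λ and ψ held fixed rather than estimated:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(-2.0, 2.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # synthetic straight-line data

# Classical least squares fit
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fit_ols = X @ beta_hat

# I-prior fit with the canonical kernel at fixed (illustrative) lambda, psi
lam, psi = 0.1, 1.0
H = lam * np.outer(x, x)
Psi = psi * np.eye(n)
Vy = H @ Psi @ H + np.linalg.inv(Psi)
Vy_inv = np.linalg.inv(Vy)
ones = np.ones(n)
alpha_hat = (ones @ Vy_inv @ y) / (ones @ Vy_inv @ ones)   # GLS intercept
w_hat = Psi @ H @ Vy_inv @ (y - alpha_hat)                 # posterior mean of w
fit_iprior = alpha_hat + H @ w_hat

mse_ols = np.mean((y - fit_ols) ** 2)
mse_iprior = np.mean((y - fit_iprior) ** 2)
```

On linear data the two fits are nearly identical, mirroring the matching MSEs on the slide.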

SLIDE 22


Example: 1-dimensional smoothing

Classical model:

y_i = β₀ + β₁x_i + β₂x_i² + β₃x_i³ + ε_i,  ε_i ∼ N(0, σ²)

I-prior model:

y_i = α + Σ_{k=1}^n h_{λ,γ}(x_i, x_k) w_k + ε_i,  ε_i ∼ N(0, ψ⁻¹),  w_i ∼ N(0, ψ),

where h_{λ,γ} is the FBM kernel.

MSE(classical) = 0.987   MSE(I-prior) = 0.836

SLIDE 23


Example: Multilevel modelling

Classical model:

y_ij = β₀j + β₁j x_ij + ε_ij,  (β₀j, β₁j)ᵀ ∼ N( (β₀, β₁)ᵀ, [ φ₀ φ₀₁ ; φ₀₁ φ₁ ] ),  ε_ij ∼ N(0, σ²)

I-prior model:

y_i = α + Σ_{k=1}^n h_λ(x_i, x_k) w_k + ε_i,  ε_i ∼ N(0, ψ⁻¹),  w_i ∼ N(0, ψ),

where h_λ is the ANOVA kernel.

MSE(classical) = 0.227   MSE(I-prior) = 0.226

SLIDE 24


Example: Longitudinal modelling

Classical model:

y_ij = β₀j + β₁j t_ij + β₃ x_ij + ε_ij,  (β₀j, β₁j)ᵀ ∼ N( (β₀, β₁)ᵀ, [ φ₀ φ₀₁ ; φ₀₁ φ₁ ] ),  ε_ij ∼ N(0, σ²)

I-prior model:

y_i = α + Σ_{k=1}^n h_λ(x_i, x_k) w_k + ε_i,  ε_i ∼ N(0, ψ⁻¹),  w_i ∼ N(0, ψ),

where h_λ is the ANOVA + Pearson kernel.

MSE(classical) = 0.138   MSE(I-prior) = 0.114

SLIDE 25


Further work: Structural Equation Models

  • The 1-factor model:

x_ij = μ_j + λ_j f_i + δ_ij,  f_i ∼ N(0, 1),  δ_ij ∼ N(0, θ_j)

  • Relationship to the longitudinal random intercept model:

◮ Set μ_j = μ, ∀j.
◮ Set λ_j = 1, ∀j, and estimate the variance of f_i instead.
◮ Set θ_j = θ, ∀j.

We already know how to estimate this model using I-priors.

  • Further work:

◮ Uses of this very restricted CFA model? Rasch model?
◮ Post-estimation work, e.g. obtaining factor scores.
◮ Can we estimate both the λ_j and the f_i simultaneously?

SLIDE 26


Further work: Structured error covariances

  • Sometimes the responses may be correlated in a way that the model specification cannot account for completely. Extend the model to allow for dependence between the errors, such as autocorrelation.
  • Example: AR(1) covariance matrix with equal gaps between observations:

Ψ = σ²/(1 − φ²) ×
⎡ 1        φ        φ²       ⋯  φ^{n−1} ⎤
⎢ φ        1        φ        ⋯  φ^{n−2} ⎥
⎢ φ²       φ        1        ⋯  φ^{n−3} ⎥
⎢ ⋮        ⋮        ⋮        ⋱  ⋮       ⎥
⎣ φ^{n−1}  φ^{n−2}  φ^{n−3}  ⋯  1       ⎦

  • Others: heteroskedastic errors?
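The AR(1) matrix above can be built directly from its closed form, σ²/(1 − φ²) · φ^|i − j|. A sketch (the function name is mine):

```python
import numpy as np

def ar1_cov(n, phi, sigma2=1.0):
    """AR(1) error covariance with equal spacing:
    entry (i, j) is sigma^2 / (1 - phi^2) * phi^|i - j|."""
    idx = np.arange(n)
    return sigma2 / (1.0 - phi ** 2) * phi ** np.abs(idx[:, None] - idx[None, :])

Psi = ar1_cov(4, phi=0.5)
```

The matrix is symmetric, Toeplitz, and positive definite for |φ| < 1, so it is a valid error covariance.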


SLIDE 27


Further work: Logistic models

  • Extending the I-prior methodology to GLMs, e.g. logit models:

y_i ∼ Bern(π_i),  logit π_i = α + Σ_{k=1}^n h_λ(x_i, x_k) w_k,  w_i ∼ N(0, π_i(1 − π_i)),  i = 1, …, n,

i.e. putting an I-prior on the linear predictor, and setting the Fisher information as the variance.

  • Difficulties faced:

◮ Unable to estimate this model using JAGS due to a circular dependence of the parameters.
◮ Performing ML yields a high-dimensional intractable integral. Poor results from approximation methods like Laplace and Gauss-Hermite quadrature.


SLIDE 28


Summary

  • The I-prior methodology is a modelling technique that guards against overfitting linear models when dimensionality is large relative to sample size, with advantages such as

◮ Model parsimony
◮ No additional assumptions required
◮ Simpler estimation

  • Many models have been shown to work with I-priors, such as multiple regression, smoothing models, random effects models and growth curve models.
  • Areas of research include

◮ Extension to GLMs
◮ Structural Equation Models
◮ Models with structured error covariances

SLIDE 29


End

Thank you!
