Regression modelling using I-priors




  1. Regression modelling using I-priors
     Haziq Jamil
     Supervisors: Dr. Wicher Bergsma & Prof. Irini Moustaki
     Social Statistics (Year 1), London School of Economics & Political Science
     19 May 2015, PhD Presentation Event

  2. Outline
     1 Introduction
     2 I-prior theory
     3 Estimation methods
     4 Examples of I-prior modelling: simple linear regression; 1-dimensional smoothing; multilevel modelling; longitudinal modelling
     5 Further work: structural equation models; models with structured error covariances; logistic models
     Haziq Jamil (LSE), I-prior regression, 19 May 2015

  3. Linear regression
     • Consider a set of data points {(y_1, x_1), ..., (y_n, x_n)}.
     • A model is linear if y_i depends linearly on the parameters:
         y_i = β_0 + β_1 x_i + ε_i   ✓
         y_i = β_0 + β_1 x_i + β_2 x_i² + ε_i   ✓
         y_i = β_0 x_i^(β_1 + 2β_2) + ε_i   ✗
     • In other words, the equations must be linear in the parameters, not necessarily in the covariates.
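The second ✓ model above is a useful check of this point: it is quadratic in x_i yet still linear in (β_0, β_1, β_2), so ordinary least squares applies directly. A minimal numpy sketch with simulated data (the coefficient values are ours, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 + 3.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=200)

# y is quadratic in x but linear in the parameters, so ordinary least
# squares on the design matrix [1, x, x^2] recovers the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```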

  4. Linear regression
     • Definition (The linear regression model):
           y_i = f(x_i) + ε_i,   i = 1, ..., n,   (1)
       where
           y_i ∈ ℝ, a real-valued observation;
           x_i ∈ X, a set of characteristics for unit i;
           f ∈ F, a vector space of functions over the set X;
           (ε_1, ..., ε_n) ~ N(0, Ψ⁻¹).
       Note: for iid observations, Ψ = ψI_n. In general, Ψ = (ψ_ij).
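Model (1) in the iid case (Ψ = ψI_n) is straightforward to simulate; a sketch in which f, ψ, and the covariate distribution are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, psi = 100, 4.0                  # sample size and error precision
x = rng.uniform(-2, 2, size=n)     # x_i in X = R

f = lambda t: 1.0 + 0.5 * t        # some f in the function space F
# iid case: Psi = psi * I_n, so each eps_i ~ N(0, 1/psi) independently
eps = rng.normal(scale=psi ** -0.5, size=n)
y = f(x) + eps
```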

  5. Linear regression
     [Figure: "THE BIG BAG OF LINES", the function space F pictured as a bag of candidate regression lines]

  6. Estimation methods
     How do we pick the best line from the bag?
     • There are many ways: least squares, maximum likelihood, Bayesian methods, ...
     • When the dimensionality is large, these may overfit. Possible solutions:
         ◦ dimension reduction
         ◦ random effects models
         ◦ regularization
       ...all of which require additional assumptions.
     • I-priors: an I-prior on f is a distribution π on f such that its covariance matrix is the Fisher information of f. In addition, assign a "best guess" as the prior mean, e.g. f_0 = 0.

  7. Example: multiple regression
         y = α + Xβ + ε,   ε ~ N(0, ψ⁻¹I_n),   with f = α + Xβ.
     We know from linear regression theory that I[β] = ψXᵀX. An I-prior on β is then
         β ~ N(β_0, λ²ψXᵀX),
     or equivalently
         β = β_0 + λXᵀw,   w ~ N(0, ψI_n).
     Thus, an I-prior on f is
         f = α + Xβ_0 + λXXᵀw,   w ~ N(0, ψI_n).
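The equivalence between the prior on β and its random-effect form can be checked numerically. A sketch with arbitrary λ, ψ, and a simulated X (all variable names ours): since w ~ N(0, ψI_n), the representation β = β_0 + λXᵀw has Cov(β) = λ²ψXᵀX, i.e. λ² times the Fisher information I[β] = ψXᵀX.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))        # design matrix
psi, lam = 2.0, 0.5                # error precision and scale parameter
alpha, beta0 = 0.0, np.zeros(p)    # intercept and prior mean of beta

# beta = beta0 + lam * X^T w with w ~ N(0, psi I_n)
# implies beta ~ N(beta0, lam^2 psi X^T X), the I-prior on beta.
w = rng.normal(scale=np.sqrt(psi), size=n)
beta = beta0 + lam * X.T @ w

# The induced I-prior draw of f at the data points:
f = alpha + X @ beta0 + lam * (X @ X.T) @ w
```

By construction f agrees with α + Xβ for the same draw of w, which is the equivalence the slide states.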

  8. I-prior theory builds on: inner products; functional vector spaces; kernel methods; reproducing kernels; Hilbert spaces; Krein spaces; Gaussian random vectors; Fisher information; feature maps; random functions and their means and variances; the Moore–Aronszajn theorem.

  9. Definitions & theorem
     • Definition (Inner product): Let F be a vector space over ℝ. A function ⟨·,·⟩_F : F × F → ℝ is said to be an inner product on F if, for all f, f_1, f_2, g ∈ F and a, b ∈ ℝ:
         ◦ Symmetry: ⟨f, g⟩_F = ⟨g, f⟩_F
         ◦ Linearity: ⟨af_1 + bf_2, g⟩_F = a⟨f_1, g⟩_F + b⟨f_2, g⟩_F
         ◦ Non-degeneracy: ⟨f, g⟩_F = 0 for all g ∈ F implies f = 0
       Additionally, an inner product is positive definite (negative definite) if ⟨f, f⟩_F ≥ 0 (≤ 0) for all f ∈ F. An inner product is indefinite if it is neither positive nor negative definite.
     • Definition (Hilbert space): a positive-definite inner product space which is complete, i.e. contains the limits of all its Cauchy sequences.
     • Definition (Krein space): an (indefinite) inner product space which generalizes Hilbert spaces by dropping the positive-definiteness restriction.
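These axioms are easy to verify numerically for concrete examples. A small sketch (examples ours) contrasting the positive-definite Euclidean dot product with an indefinite inner product of the kind that arises in Krein spaces:

```python
import numpy as np

rng = np.random.default_rng(1)
f, g = rng.normal(size=3), rng.normal(size=3)
a, b = 1.5, -0.7

# Euclidean dot product on R^3: a positive-definite inner product
dot = lambda u, v: float(u @ v)
assert np.isclose(dot(f, g), dot(g, f))                           # symmetry
assert np.isclose(dot(a*f + b*g, g), a*dot(f, g) + b*dot(g, g))   # linearity
assert dot(f, f) >= 0                                             # positive definite

# An indefinite inner product: <u, v> = u1*v1 - u2*v2 on R^2
# takes both signs on the diagonal, so it is neither positive
# nor negative definite (the Krein-space situation).
indef = lambda u, v: u[0]*v[0] - u[1]*v[1]
assert indef(np.array([1.0, 0.0]), np.array([1.0, 0.0])) > 0
assert indef(np.array([0.0, 1.0]), np.array([0.0, 1.0])) < 0
```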

  10. Definitions & theorem
      • Definition (Kernel): Let X be a non-empty set. A function h : X × X → ℝ is called a kernel if there exist a Hilbert space F and a map φ : X → F such that h(x, x′) = ⟨φ(x), φ(x′)⟩_F for all x, x′ ∈ X.
      • Definition (Reproducing kernel): Let F be a Hilbert/Krein space of functions over a non-empty set X. A function h : X × X → ℝ is called a reproducing kernel of F, and F an RKHS/RKKS, if h satisfies
          ◦ h(·, x) ∈ F for all x ∈ X;
          ◦ ⟨f, h(·, x)⟩_F = f(x) for all x ∈ X and f ∈ F.
      • Kernel methods have many important uses in the machine learning literature, such as pattern recognition, kernel PCA, computing distances between means in feature space, and many more.
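One consequence of the kernel definition worth keeping in mind: since h(x, x′) = ⟨φ(x), φ(x′)⟩_F, every Gram matrix H with H_ik = h(x_i, x_k) is symmetric and positive semi-definite. A quick numerical check with the canonical (linear) kernel, whose feature map is simply φ(x) = x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(20, 2))       # 20 points in R^2

# Canonical kernel h(x, x') = <x, x'>, i.e. feature map phi(x) = x
H = x @ x.T                        # Gram matrix

# A valid kernel yields a symmetric, positive semi-definite Gram matrix
assert np.allclose(H, H.T)
assert np.linalg.eigvalsh(H).min() > -1e-8
```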

  11. Definitions & theorem
      • Theorem (Gaussian I-priors) [Bergsma, 2014]: For the linear regression model (1), let F be the RKKS with kernel h : X × X → ℝ. Then, assuming it exists, the Fisher information for f is given by
            I[f](x_i, x′_i) = Σ_{k=1}^n Σ_{l=1}^n ψ_kl h(x_i, x_k) h(x′_i, x_l).
        Let π be a Gaussian distribution on f with prior mean f_0 and covariance I[f]. Then π is called an I-prior for f, and a random vector f ~ π has the random-effect representation
            f(x_i) = f_0(x_i) + Σ_{k=1}^n h(x_i, x_k) w_k,   (w_1, ..., w_n) ~ N(0, Ψ).
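In matrix form, with H_ik = h(x_i, x_k), the theorem says I[f] = HΨH at the data points, and the random-effect representation is f = f_0 + Hw with w ~ N(0, Ψ), so that Cov(f) = HΨH, i.e. the prior covariance equals the Fisher information. A sketch (kernel and Ψ chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
x = np.sort(rng.uniform(size=n))

H = np.outer(x, x)                 # canonical kernel h(x, x') = x x'
Psi = 2.0 * np.eye(n)              # iid case: Psi = psi I_n

# Fisher information of f at the data points:
# (H Psi H)_{ij} = sum_{k,l} psi_{kl} h(x_i, x_k) h(x_j, x_l)
I_f = H @ Psi @ H

# Random-effect representation: f = f0 + H w, w ~ N(0, Psi),
# whose covariance is H Psi H = I[f].
f0 = np.zeros(n)
w = rng.multivariate_normal(np.zeros(n), Psi)
f = f0 + H @ w
```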

  12. Back to the multiple regression example
      • We saw the I-prior method applied to multiple regression:
            f(x_i) = α + x_iᵀβ_0 + λ(XXᵀ)_i w,
        where f_0(x_i) = α + x_iᵀβ_0, Σ_{k=1}^n h(x_i, x_k)w_k = λ(XXᵀ)_i w, and w := (w_1, ..., w_n) ~ N(0, ψI_n).
      • Choose a different RKHS/RKKS F, with corresponding kernel h, to suit the type and characteristics of the x's in order to do regression modelling.
      [Figure: the big bag of lines, subdivided into a bag of straight lines, a bag of smooth lines, and a bag of lines for each group]

  15. Toolbox of RKHS/RKKS
      For X = {x_i}, choose the vector space F and kernel h(x_i, x_k) to match the covariate type:
      • Pearson RKHS (nominal): h(x_i, x_k) = I[x_i = x_k]/p_i − 1, where p_i = P[X = x_i]. Uses: (1) categorical covariates; (2) in a multilevel setting, x_i = group number of unit i.
      • Canonical RKHS (real): h(x_i, x_k) = ⟨x_i, x_k⟩. Uses: as in classical regression, x_i = real-valued covariate associated with unit i.
      • Fractional Brownian Motion (FBM) RKHS (real), with γ ∈ (0, 1): h(x_i, x_k) = |x_i|^{2γ} + |x_k|^{2γ} − |x_i − x_k|^{2γ}. Uses: as in (1-dim) smoothing, x_i = data point associated with observation y_i.
      • We can construct new RKHS/RKKS from existing ones.
          ◦ Example (ANOVA RKKS): for x_i = (x_{1i}, x_{2i}) with nominal + real characteristics,
                h(x_i, x′_i) = h_1(x_{1i}, x′_{1i}) + h_2(x_{2i}, x′_{2i}) + h_1(x_{1i}, x′_{1i}) h_2(x_{2i}, x′_{2i}).
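Each kernel in the toolbox is a one-liner in practice. A sketch (function names ours; formulas as on the slide, with p_i taken as the empirical proportion of the level of x_i):

```python
import numpy as np

def pearson_kernel(x):
    """Pearson kernel for a categorical covariate x (1-d array of labels):
    h(x_i, x_k) = I[x_i = x_k] / p_i - 1, with p_i the empirical P[X = x_i].
    Symmetric, since I[x_i = x_k] is nonzero only when p_i = p_k."""
    x = np.asarray(x)
    levels, counts = np.unique(x, return_counts=True)
    p = dict(zip(levels, counts / len(x)))
    pi = np.array([p[v] for v in x])
    return (x[:, None] == x[None, :]) / pi[:, None] - 1.0

def canonical_kernel(x):
    """Canonical (linear) kernel h(x_i, x_k) = <x_i, x_k>;
    a 1-d input is treated as n scalar covariates."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    return x @ x.T

def fbm_kernel(x, gamma=0.5):
    """Fractional Brownian motion kernel (as on the slide):
    h(x_i, x_k) = |x_i|^(2g) + |x_k|^(2g) - |x_i - x_k|^(2g), g in (0, 1)."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x) ** (2 * gamma)
    return a[:, None] + a[None, :] - np.abs(x[:, None] - x[None, :]) ** (2 * gamma)
```

With the empirical p_i, each row of the Pearson Gram matrix sums to zero, a handy sanity check.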

  16. Parameters to be estimated
      • Choose a prior mean of zero (or set an overall constant/intercept to be estimated).
      • For the I-prior linear model
            y_i = α + Σ_{k=1}^n h_λ(x_i, x_k) w_k + ε_i,
            ε_i ~ N(0, ψ⁻¹),   w_i ~ N(0, ψ),   i = 1, ..., n,   (2)
        the parameters to be estimated are θ = (α, λ, ψ)ᵀ.
      • λ is introduced to resolve the arbitrary scale of an RKHS/RKKS F over a set X. The number of λ parameters equals the number of kernels used, not the number of interactions or covariates.
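Under model (2), integrating out w gives y ~ N(α1, V_θ) with V_θ = ψ(λH)² + ψ⁻¹I_n, where H_ik = h(x_i, x_k), so θ = (α, λ, ψ) can be estimated by maximizing the log marginal likelihood. A sketch (function name ours); in practice one would hand the negative of this to a numerical optimizer such as scipy.optimize.minimize:

```python
import numpy as np

def iprior_log_lik(theta, y, H):
    """Log marginal likelihood of the I-prior model (2):
    after integrating out w, y ~ N(alpha 1, V) with
    V = psi * (lam * H)^2 + (1/psi) * I_n."""
    alpha, lam, psi = theta
    n = len(y)
    Hl = lam * H                               # scaled kernel matrix
    V = psi * Hl @ Hl + (1.0 / psi) * np.eye(n)
    r = y - alpha                              # residual from the intercept
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))
```

As a check, at λ = 0 the model collapses to iid noise, and the function reduces to the ordinary N(α, ψ⁻¹) log-likelihood.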
