 
Regression modelling using I-priors
Haziq Jamil
Supervisors: Dr. Wicher Bergsma & Prof. Irini Moustaki
Social Statistics (Year 1)
London School of Economics & Political Science
19 May 2015
PhD Presentation Event
Outline
1 Introduction
2 I-prior theory
3 Estimation methods
4 Examples of I-prior modelling
    Simple linear regression
    1-dimensional smoothing
    Multilevel modelling
    Longitudinal modelling
5 Further work
    Structural Equation Models
    Models with structured error covariances
    Logistic models
Haziq Jamil (LSE) I-prior regression 19 May 2015 2 / 27
Introduction I-prior theory Estimation methods Examples of I-prior modelling Further work End
Linear regression
• Consider a set of data points $\{(y_1, x_1), \ldots, (y_n, x_n)\}$.
• A model is linear if $y_i$ depends linearly on the regression coefficients; the independent variables may still enter through nonlinear terms such as $x_i^2$.
    ◮ $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ ✓
    ◮ $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$ ✓
    ◮ $y_i = \beta_0 x_i^{\beta_1 + 2\beta_2} + \epsilon_i$ ✗
    ◮ In other words, the equations must be linear in the parameters.
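As a rough illustration of "linear in the parameters", the sketch below checks numerically whether a model function respects linear combinations of coefficient vectors. The function names and test values are hypothetical, and a check at a single point can only expose non-linearity; it is not a proof of linearity.

```python
# Illustrative check of "linear in the parameters" (all names are hypothetical).

def quadratic_model(beta, x):
    """f(x) = b0 + b1*x + b2*x^2: nonlinear in x, but linear in beta."""
    b0, b1, b2 = beta
    return b0 + b1 * x + b2 * x ** 2


def power_model(beta, x):
    """f(x) = b0 * x^(b1 + 2*b2): not linear in beta."""
    b0, b1, b2 = beta
    return b0 * x ** (b1 + 2 * b2)


def is_linear_in_params(model, beta_a, beta_b, x, s=2.0, t=-3.0, tol=1e-9):
    """Test model(s*a + t*b, x) == s*model(a, x) + t*model(b, x) at one point x."""
    combo = [s * a + t * b for a, b in zip(beta_a, beta_b)]
    lhs = model(combo, x)
    rhs = s * model(beta_a, x) + t * model(beta_b, x)
    return abs(lhs - rhs) < tol


beta_a, beta_b, x = (1.0, 2.0, 0.5), (0.5, -1.0, 1.0), 1.5
print(is_linear_in_params(quadratic_model, beta_a, beta_b, x))  # True
print(is_linear_in_params(power_model, beta_a, beta_b, x))      # False
```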
Linear regression
• Definition (The linear regression model)
    $$y_i = f(x_i) + \epsilon_i, \qquad i = 1, \ldots, n \tag{1}$$
    $y_i \in \mathbb{R}$, real-valued observations
    $x_i \in \mathcal{X}$, a set of characteristics for unit $i$
    $f \in \mathcal{F}$, a vector space of functions over the set $\mathcal{X}$
    $(\epsilon_1, \ldots, \epsilon_n) \sim N(0, \Psi^{-1})$
Note: For iid observations, $\Psi = \psi I_n$. In general, $\Psi = (\psi_{ij})$.
Linear regression
[Figure: "The big bag of lines"]
Estimation methods
How to pick the best line from the bag of stuff?
• Many ways: least squares, maximum likelihood, Bayesian...
• When the dimensionality is large, these methods may overfit. Solutions:
    ◮ Dimension reduction
    ◮ Random effects models
    ◮ Regularization
  ...all require additional assumptions.
• I-priors
  An I-prior on $f$ is a distribution $\pi$ on $f$ such that its covariance matrix is the Fisher information of $f$. Also, assign a "best guess" for the prior mean, e.g. $f_0 = 0$.
Example: multiple regression
    $$y = \underbrace{\alpha + X\beta}_{f} + \epsilon, \qquad \epsilon \sim N(0, \psi^{-1} I_n)$$
We know from linear regression theory that $I[\beta] = \psi X^\top X$.
An I-prior on $\beta$ is then
    $$\beta \sim N(\beta_0, \lambda^2 \psi X^\top X).$$
Equivalently,
    $$\beta = \beta_0 + \lambda X^\top w, \qquad w \sim N(0, \psi I_n).$$
Thus, an I-prior on $f$ is
    $$f = \alpha + X\beta_0 + \lambda X X^\top w, \qquad w \sim N(0, \psi I_n).$$
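The two representations in this example can be checked against each other numerically. The sketch below, with a small made-up design matrix and made-up values of $\lambda$ and $\psi$, computes the prior covariance of $\beta$ implied by $\beta = \beta_0 + \lambda X^\top w$ and confirms that pushing it through $X$ gives the same covariance for $f$ as the direct form $\lambda X X^\top w$. The helper functions are plain-Python stand-ins, not part of any I-prior software.

```python
# Consistency check for the multiple regression I-prior (illustrative values only).

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def scale(c, A):
    return [[c * a for a in row] for row in A]


X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # hypothetical design matrix, n = 3, p = 2
lam, psi = 0.5, 4.0                       # hypothetical scale and precision
Xt = transpose(X)

# beta = beta0 + lam * X^T w with w ~ N(0, psi I)  =>  Cov(beta) = lam^2 psi X^T X
cov_beta = scale(lam ** 2 * psi, matmul(Xt, X))

# Cov(f) obtained by pushing Cov(beta) through X
cov_f_via_beta = matmul(matmul(X, cov_beta), Xt)

# Cov(f) obtained directly from f = alpha + X beta0 + lam X X^T w
XXt = matmul(X, Xt)
cov_f_direct = scale(lam ** 2 * psi, matmul(XXt, XXt))

print(cov_beta)
print(cov_f_via_beta == cov_f_direct)
```

With these values $\lambda^2 \psi = 1$, so the printed covariance of $\beta$ is simply $X^\top X$.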
[Diagram: topics surrounding "I-prior theory": inner products, functional vector spaces, kernel methods, reproducing kernels, Hilbert spaces, Gaussian random vectors, Fisher information, Krein spaces, means of random functions, feature maps, variances of random functions, Moore-Aronszajn theorem, random functions]
Definitions & theorem
• Definition (Inner products)
  Let $\mathcal{F}$ be a vector space over $\mathbb{R}$. A function $\langle \cdot, \cdot \rangle_{\mathcal{F}} : \mathcal{F} \times \mathcal{F} \to \mathbb{R}$ is said to be an inner product on $\mathcal{F}$ if all of the following are satisfied:
    ◮ Symmetry: $\langle f, g \rangle_{\mathcal{F}} = \langle g, f \rangle_{\mathcal{F}}$
    ◮ Linearity: $\langle a f_1 + b f_2, g \rangle_{\mathcal{F}} = a \langle f_1, g \rangle_{\mathcal{F}} + b \langle f_2, g \rangle_{\mathcal{F}}$
    ◮ Non-degeneracy: $\langle f, g \rangle_{\mathcal{F}} = 0$ for all $g \in \mathcal{F}$ $\Rightarrow$ $f = 0$
  for all $f, f_1, f_2, g \in \mathcal{F}$ and $a, b \in \mathbb{R}$. Additionally, an inner product is positive definite (negative definite) if $\langle f, f \rangle_{\mathcal{F}} \geq 0$ ($\leq 0$) for all $f \in \mathcal{F}$. An inner product is indefinite if it is neither positive nor negative definite.
• Definition (Hilbert space)
  A positive definite inner product space which is complete, i.e. contains the limits of all Cauchy sequences.
• Definition (Krein space)
  An (indefinite) inner product space which generalizes Hilbert spaces by dropping the positive definite restriction.
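A small concrete case may help with the indefinite definition. The sketch below uses the form $\langle f, g \rangle = f_1 g_1 - f_2 g_2$ on $\mathbb{R}^2$, chosen here purely for illustration: it is symmetric and linear, yet $\langle f, f \rangle$ takes both signs, so it is neither positive nor negative definite.

```python
# An indefinite inner product on R^2 (illustrative choice, not from the slides).

def indefinite_ip(f, g):
    """<f, g> = f1*g1 - f2*g2."""
    return f[0] * g[0] - f[1] * g[1]


f, g, h = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0)
a, b = 2.0, -3.0

# Symmetry: <f, g> == <g, f>
symmetric = indefinite_ip(f, g) == indefinite_ip(g, f)

# Linearity in the first argument: <a f + b h, g> == a <f, g> + b <h, g>
combo = (a * f[0] + b * h[0], a * f[1] + b * h[1])
linear = abs(indefinite_ip(combo, g)
             - (a * indefinite_ip(f, g) + b * indefinite_ip(h, g))) < 1e-12

# <e1, e1> is positive while <e2, e2> is negative, hence indefinite
e1, e2 = (1.0, 0.0), (0.0, 1.0)
print(symmetric, linear, indefinite_ip(e1, e1), indefinite_ip(e2, e2))
```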
Definitions & theorem
• Definition (Kernels)
  Let $\mathcal{X}$ be a non-empty set. A function $h : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a kernel if there exists a Hilbert space $\mathcal{F}$ and a map $\phi : \mathcal{X} \to \mathcal{F}$ such that for all $x, x' \in \mathcal{X}$,
    $$h(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{F}}.$$
• Definition (Reproducing kernels)
  Let $\mathcal{F}$ be a Hilbert/Krein space of functions over a non-empty set $\mathcal{X}$. A function $h : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a reproducing kernel of $\mathcal{F}$, and $\mathcal{F}$ an RKHS/RKKS, if $h$ satisfies
    ◮ $\forall x \in \mathcal{X}$, $h(\cdot, x) \in \mathcal{F}$
    ◮ $\forall x \in \mathcal{X}$, $f \in \mathcal{F}$, $\langle f, h(\cdot, x) \rangle_{\mathcal{F}} = f(x)$.
• Kernel methods are widely used in the machine learning literature, for example in pattern recognition, kernel PCA, and computing distances between means in feature space.
Definitions & theorem
• Theorem (Gaussian I-priors) [Bergsma, 2014]
  For the linear regression model (1), let $\mathcal{F}$ be the RKKS with kernel $h : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Then, assuming it exists, the Fisher information for $f$ is given by
    $$I[f](x_i, x_i') = \sum_{k=1}^{n} \sum_{l=1}^{n} \psi_{kl} \, h(x_i, x_k) \, h(x_i', x_l).$$
  Let $\pi$ be a Gaussian distribution on $f$ with prior mean $f_0$ and covariance $I[f]$. Then $\pi$ is called an I-prior for $f$, and a random vector $f \sim \pi$ has the random effect representation
    $$f(x_i) = f_0(x_i) + \sum_{k=1}^{n} h(x_i, x_k) w_k, \qquad (w_1, \ldots, w_n) \sim N(0, \Psi).$$
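Evaluated at pairs of observed points, the double sum in the theorem is the matrix product $H \Psi H^\top$, where $H$ has entries $h(x_i, x_k)$. That is also the covariance implied by the random effect representation. The sketch below checks this agreement for a made-up data set, using the canonical kernel $h(x, x') = x x'$ and an arbitrary symmetric $\Psi$; all names and values are hypothetical.

```python
# Fisher information for f at the observed points (illustrative values only).

def canonical_kernel(x, xp):
    return x * xp


def gram_matrix(kernel, xs):
    """Matrix H with entries h(x_i, x_k)."""
    return [[kernel(a, b) for b in xs] for a in xs]


def fisher_information(H, Psi):
    """I[f](x_i, x_j) = sum_k sum_l psi_kl * h(x_i, x_k) * h(x_j, x_l)."""
    n = len(H)
    return [[sum(Psi[k][l] * H[i][k] * H[j][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


xs = [1.0, 2.0, 3.0]
H = gram_matrix(canonical_kernel, xs)
Psi = [[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]]  # hypothetical precision matrix

info = fisher_information(H, Psi)
# Covariance of f = f0 + H w with w ~ N(0, Psi)
cov_f = matmul(matmul(H, Psi), transpose(H))

agree = all(abs(a - b) < 1e-9 for ra, rb in zip(info, cov_f) for a, b in zip(ra, rb))
print(agree)
```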
Back to the multiple regression example
• We saw the I-prior method applied to multiple regression:
    $$f(x_i) = \underbrace{\alpha + x_i \beta_0}_{f_0(x_i)} + \underbrace{\lambda (X X^\top)_i w}_{\sum_{k=1}^{n} h(x_i, x_k) w_k}, \qquad w := (w_1, \ldots, w_n) \sim N(0, \psi I_n).$$
• Choose different RKHS/RKKS $\mathcal{F}$ and corresponding $h$ to suit the type/characteristic of the $x$s in order to do regression modelling.
[Figure labels: "The big bag of lines"; "Bag of straight lines"; "Bag of smooth lines"; "Bag of lines for each group"]
Toolbox of RKHS/RKKS
X = {x_i} | Characteristic/Uses | Vector space F | Kernel h(x_i, x_k)
Nominal | 1) Categorical covariates; 2) in a multilevel setting, x_i = group no. of unit i | Pearson | $I[x_i = x_k] / p_i - 1$, where $p_i = P[X = x_i]$
Real | As in classical regression, x_i = real-valued covariate associated with unit i | Canonical | $x_i x_k$
Real | As in (1-dim) smoothing, x_i = data point associated with observation y_i | Fractional Brownian Motion (FBM) | $|x_i|^{2\gamma} + |x_k|^{2\gamma} - |x_i - x_k|^{2\gamma}$ with $\gamma \in (0, 1)$
• We can construct new RKHS/RKKS from existing ones.
    ◮ Example (ANOVA RKKS)
      Set of $x_i = (x_{1i}, x_{2i})$ of Nominal + Real characteristics. Then
        $$h(x_i, x_i') = h_1(x_{1i}, x_{1i}') + h_2(x_{2i}, x_{2i}') + h_1(x_{1i}, x_{1i}') \, h_2(x_{2i}, x_{2i}')$$
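The kernels in the toolbox are short enough to write out directly. The sketch below is one possible transcription, with two choices that are assumptions made here rather than anything stated on the slide: the probabilities $p_i$ are replaced by empirical group proportions, and the Real component of the ANOVA kernel uses the FBM kernel with $\gamma = 0.5$. Function names are hypothetical.

```python
# Toolbox kernels and the ANOVA construction (illustrative transcription).
from collections import Counter


def pearson_kernel(x, xp, probs):
    """h(x, x') = I[x = x'] / p(x) - 1 for nominal x."""
    return (1.0 / probs[x] if x == xp else 0.0) - 1.0


def canonical_kernel(x, xp):
    """h(x, x') = x * x' for real x."""
    return x * xp


def fbm_kernel(x, xp, gamma=0.5):
    """h(x, x') = |x|^(2g) + |x'|^(2g) - |x - x'|^(2g), with g in (0, 1)."""
    return abs(x) ** (2 * gamma) + abs(xp) ** (2 * gamma) - abs(x - xp) ** (2 * gamma)


def anova_kernel(u, up, probs, gamma=0.5):
    """h = h1 + h2 + h1*h2 with h1 Pearson (nominal part), h2 FBM (real part)."""
    h1 = pearson_kernel(u[0], up[0], probs)
    h2 = fbm_kernel(u[1], up[1], gamma)
    return h1 + h2 + h1 * h2


groups = ["a", "a", "b", "b"]
# Assumption: estimate P[X = x] by the observed proportion of each group
probs = {g: c / len(groups) for g, c in Counter(groups).items()}

print(pearson_kernel("a", "a", probs), pearson_kernel("a", "b", probs))
print(fbm_kernel(1.0, 2.0))
print(anova_kernel(("a", 1.0), ("a", 2.0), probs))
```

Swapping `fbm_kernel` for `canonical_kernel` inside `anova_kernel` would give straight lines per group instead of smooth curves per group.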
Parameters to be estimated
• Let's choose a prior mean of zero (or set an overall constant/intercept to be estimated).
• For the I-prior linear model
    $$y_i = \alpha + \sum_{k=1}^{n} h_\lambda(x_i, x_k) w_k + \epsilon_i, \qquad \epsilon_i \sim N(0, \psi^{-1}), \qquad w_i \sim N(0, \psi), \qquad i = 1, \ldots, n, \tag{2}$$
  the parameters to be estimated are $\theta = (\alpha, \lambda, \psi)^\top$.
• $\lambda$ is introduced to resolve the arbitrary scale of an RKKS/RKHS $\mathcal{F}$ over a set $\mathcal{X}$. The number of $\lambda$ parameters equals the number of kernels used, not the number of interactions or covariates.
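To make the roles of $\alpha$, $\lambda$ and $\psi$ in model (2) concrete, the sketch below simulates responses from it and computes the covariance of $y$ that the model implies, $\psi H_\lambda H_\lambda^\top + \psi^{-1} I_n$. It assumes a single kernel with $h_\lambda = \lambda h$, in line with the multiple regression example, and uses the canonical kernel; the function names, data and parameter values are hypothetical.

```python
# Simulating from the I-prior linear model (2) (illustrative sketch).
import math
import random


def canonical_kernel(x, xp):
    return x * xp


def scaled_gram(xs, lam, kernel=canonical_kernel):
    """H_lambda with entries lam * h(x_i, x_k); assumes a single scaled kernel."""
    return [[lam * kernel(a, b) for b in xs] for a in xs]


def simulate_iprior(xs, alpha, lam, psi, rng):
    """Draw y_i = alpha + sum_k h_lam(x_i, x_k) w_k + eps_i."""
    H = scaled_gram(xs, lam)
    w = [rng.gauss(0.0, math.sqrt(psi)) for _ in xs]          # Var(w_k) = psi
    eps = [rng.gauss(0.0, 1.0 / math.sqrt(psi)) for _ in xs]  # Var(eps_i) = 1/psi
    return [alpha + sum(h * wk for h, wk in zip(row, w)) + e
            for row, e in zip(H, eps)]


def marginal_cov(xs, lam, psi):
    """Cov(y) = psi * H_lam H_lam^T + (1/psi) * I, implied by model (2)."""
    H = scaled_gram(xs, lam)
    n = len(xs)
    return [[psi * sum(H[i][k] * H[j][k] for k in range(n))
             + (1.0 / psi if i == j else 0.0)
             for j in range(n)] for i in range(n)]


xs = [1.0, 2.0]
y = simulate_iprior(xs, alpha=1.0, lam=1.0, psi=2.0, rng=random.Random(0))
V = marginal_cov(xs, lam=1.0, psi=2.0)
print(len(y), V)
```

Note that $\psi$ appears as a variance for $w$ and as a precision for $\epsilon$, which is why the two terms of the covariance scale in opposite directions.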