Factor and Independent Component Analysis
Michael Gutmann
Probabilistic Modelling and Reasoning (INFR11134)
School of Informatics, University of Edinburgh
Spring semester 2018
Recap
◮ Model-based learning from data
◮ Observed data as a sample from an unknown data generating distribution
◮ Learning using parametric statistical models and Bayesian models
◮ Their relation to probabilistic graphical models
◮ Likelihood function, maximum likelihood estimation, and the mechanics of Bayesian inference
◮ Classical examples to illustrate the concepts
Michael Gutmann FA and ICA 2 / 27
Applications of factor and independent component analysis
◮ Factor analysis and independent component analysis are two classical methods for data analysis.
◮ The origins of factor analysis (FA) are attributed to a 1904 paper by psychologist Charles Spearman. It is used in fields such as
  ◮ Psychology, e.g. intelligence research
  ◮ Marketing
  ◮ A wide range of physical and biological sciences
  . . .
◮ Independent component analysis (ICA) was mainly developed in the 1990s. It can be used wherever FA can be used. Popular applications include
  ◮ Neuroscience (brain imaging, spike sorting) and theoretical neuroscience
  ◮ Telecommunications (deconvolution, blind source separation)
  ◮ Finance (finding hidden factors)
  . . .
Directed graphical model underlying FA and ICA
FA: factor analysis; ICA: independent component analysis
[Graphical model: latents h1, h2, h3, each with directed edges to the visibles v1, . . . , v5]
◮ The visibles v = (v1, . . . , vD) are independent of each other given the latents h = (h1, . . . , hH), but generally dependent under the marginal p(v).
◮ The model explains statistical dependencies between the (observed) vi through the (unobserved) hi.
◮ Different assumptions on p(v|h) and p(h) lead to different statistical models, and to data analysis methods with markedly different properties.
Program
- 1. Factor analysis
- 2. Independent component analysis
Program
- 1. Factor analysis
  Parametric model
  Ambiguities in the model (factor rotation problem)
  Learning the parameters by maximum likelihood estimation
  Probabilistic principal component analysis as special case
- 2. Independent component analysis
Parametric model for factor analysis
◮ In factor analysis (FA), all random variables are Gaussian.
◮ Importantly, the number of latents H is assumed smaller than the number of visibles D.
◮ Latents: p(h) = N(h; 0, I) (uncorrelated standard normal)
◮ The conditional p(v|h; θ) is Gaussian:
  p(v|h; θ) = N(v; Fh + c, Ψ)
  The parameters θ are
  ◮ Vector c ∈ R^D: sets the mean of v
  ◮ F = (f1, . . . , fH): D × H matrix with D > H. The columns fi are called "factors", their elements the "factor loadings".
  ◮ Ψ: diagonal covariance matrix, Ψ = diag(Ψ1, . . . , ΨD)
◮ Tuning parameter: the number of factors H
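The model above can be sketched numerically; a minimal sketch in which the parameter values F, c, Ψ are arbitrary illustrative choices, not estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 5, 2  # number of visibles and latents, with H < D

# Illustrative parameter values (not fitted to any data):
F = rng.normal(size=(D, H))                    # factor matrix, columns = factors
c = np.ones(D)                                 # mean vector of v
Psi = np.diag(rng.uniform(0.1, 0.5, size=D))   # diagonal noise covariance

# Sample h ~ N(0, I), then v | h ~ N(Fh + c, Psi)
h = rng.standard_normal(H)
eps = rng.multivariate_normal(np.zeros(D), Psi)
v = F @ h + c + eps
print(v.shape)  # (5,)
```

Drawing many such v and inspecting their sample covariance recovers FF⊤ + Ψ, the structure derived later for the likelihood.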
Parametric model for factor analysis
◮ p(v|h; θ) = N(v; Fh + c, Ψ) is equivalent to
  v = Fh + c + ε = ∑_{i=1}^{H} fi hi + c + ε,   ε ∼ N(ε; 0, Ψ)
◮ Data generation: add the H < D factors, weighted by the hi, to the constant vector c, and corrupt the "signal" Fh + c with additive Gaussian noise.
◮ Fh spans an H-dimensional subspace of R^D
Interesting structure of the data is contained in a subspace
Example for D = 2, H = 1.
[Figure: 2D scatter plot of the data in the (v1, v2) plane, together with the mean c and the single factor f; the data cluster around the line through c spanned by f.]
Interesting structure of the data is contained in a subspace
Example for D = 3, H = 2 (“pancake” in the 3D space)
[Figure: black points show the noise-free signal Fh + c, lying in a 2D plane; red points show Fh + c + ε, scattered around the plane; points below the plane not shown]
(Figures courtesy of David Barber)
Basic results that we need
◮ If x has density N(x; μx, Cx), z has density N(z; μz, Cz), and x ⊥⊥ z, then y = Ax + z has density
  N(y; Aμx + μz, A Cx A⊤ + Cz)
  (see e.g. Barber Result 8.3)
◮ An orthonormal (orthogonal) matrix R is a square matrix for which the transpose R⊤ equals the inverse R⁻¹, i.e.
  R⊤ = R⁻¹,   R⊤R = RR⊤ = I
  (see e.g. Barber Appendix A.1)
◮ Orthonormal matrices rotate points.
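These properties are easy to check numerically; a sketch (using the QR decomposition of a random matrix as just one way to obtain an orthonormal R):

```python
import numpy as np

rng = np.random.default_rng(1)

# QR decomposition of a random square matrix yields an orthonormal Q.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))

# Transpose equals inverse: R^T R = R R^T = I.
print(np.allclose(R.T @ R, np.eye(3)))         # True
print(np.allclose(R @ R.T, np.eye(3)))         # True
print(np.allclose(R.T, np.linalg.inv(R)))      # True

# Rotations preserve lengths: ||Rx|| = ||x||.
x = rng.normal(size=3)
print(np.allclose(np.linalg.norm(R @ x), np.linalg.norm(x)))  # True
```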
Factor rotation problem
◮ Using the basic results, we obtain
  v = Fh + c + ε = F(RR⊤)h + c + ε = (FR)(R⊤h) + c + ε = (FR)h̃ + c + ε
◮ Since p(h) = N(h; 0, I) and R is orthonormal, p(h̃) = N(h̃; 0, I), and the two models
  v = Fh + c + ε   and   v = (FR)h̃ + c + ε
  produce data with exactly the same distribution.
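Both models yield the same marginal distribution N(v; c, FF⊤ + Ψ), so a quick numerical check of the covariance suffices (F, Ψ, and R below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
D, H = 4, 2
F = rng.normal(size=(D, H))
Psi = np.diag(rng.uniform(0.1, 0.5, size=D))

# An arbitrary orthonormal H x H rotation R.
R, _ = np.linalg.qr(rng.normal(size=(H, H)))

# Marginal covariance of v is F F^T + Psi; it is invariant to F -> FR
# because (FR)(FR)^T = F (R R^T) F^T = F F^T.
C_F = F @ F.T + Psi
C_FR = (F @ R) @ (F @ R).T + Psi
print(np.allclose(C_F, C_FR))  # True
```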
Factor rotation problem
◮ Two estimates F̂ and F̂R explain the data equally well.
◮ Estimation of the factor matrix F is thus not unique.
◮ With the Gaussianity assumption on h, there is a rotational ambiguity in the factor analysis model.
◮ The columns of F and FR span the same subspace, so the FA model is best understood as defining a subspace of the data space.
◮ The individual columns of F (the factors) carry little meaning by themselves.
◮ There are post-processing methods that choose R after estimation of F so that the columns of F̂R have some desirable properties to aid interpretation, e.g. as many zeros as possible (sparsity).
Likelihood function
◮ We have seen that the FA model can be written as
  v = Fh + c + ε,   h ∼ N(h; 0, I),   ε ∼ N(ε; 0, Ψ),   with ε ⊥⊥ h
◮ From the basic results on multivariate Gaussians, v is Gaussian with mean and covariance
  E[v] = c,   V[v] = FF⊤ + Ψ
◮ The likelihood is therefore the likelihood of a multivariate Gaussian (see Barber Section 21.1).
◮ But due to the form of the covariance matrix of v, a closed-form solution is not possible and iterative methods are needed (see Barber Section 21.2, not examinable).
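With E[v] = c and V[v] = FF⊤ + Ψ, the log-likelihood is just the multivariate-Gaussian log-density summed over the data; a sketch (the parameters below are illustrative choices used to simulate data, not estimates):

```python
import numpy as np

rng = np.random.default_rng(3)
D, H, n = 4, 2, 500

# Illustrative "true" parameters used only to simulate data.
F = rng.normal(size=(D, H))
c = rng.normal(size=D)
Psi = np.diag(rng.uniform(0.2, 0.6, size=D))

C = F @ F.T + Psi                          # marginal covariance of v
V = rng.multivariate_normal(c, C, size=n)  # n x D data matrix

def gaussian_loglik(V, mean, cov):
    """Sum of log N(v_i; mean, cov) over the rows of V."""
    n, D = V.shape
    diff = V - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return -0.5 * (n * D * np.log(2 * np.pi) + n * logdet + quad.sum())

# The FA covariance fits its own data better than, say, the identity.
print(gaussian_loglik(V, c, C) > gaussian_loglik(V, c, np.eye(D)))
```

Maximising this over F and Ψ is the part that has no closed form and needs the iterative methods mentioned above.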
Probabilistic principal component analysis as special case
◮ In FA, the variances Ψi of the additive noise ε can be different for each dimension.
◮ Probabilistic principal component analysis (PPCA) is obtained for
  Ψi = σ²,   i.e.   Ψ = σ²I
◮ FA thus has a richer description of the additive noise than PPCA.
Comparison of FA and PPCA (Based on a slide from David Barber)
The parameters were estimated from handwritten "7s" for FA and PPCA. After learning, samples can be drawn from the model via v = F̂h + ĉ + ε, with
  ε ∼ N(ε; 0, Ψ̂) for FA,   ε ∼ N(ε; 0, σ̂²I) for PPCA.
The figures below show samples. Note how the noise variance for FA depends on the pixel, being zero for pixels on the boundary of the image.
[Figure: (a) samples from the factor analysis model; (b) samples from PPCA]
Program
- 1. Factor analysis
  Parametric model
  Ambiguities in the model (factor rotation problem)
  Learning the parameters by maximum likelihood estimation
  Probabilistic principal component analysis as special case
- 2. Independent component analysis
Program
- 1. Factor analysis
- 2. Independent component analysis
  Parametric model
  Ambiguities in the model
  Sub-Gaussian and super-Gaussian pdfs
  Learning the parameters by maximum likelihood estimation
Parametric model for independent component analysis
◮ In ICA, unlike in FA, the latents are assumed to be non-Gaussian (at most one latent may be Gaussian).
◮ The latents hi are assumed to be statistically independent:
  ph(h) = ∏_i ph(hi)
◮ The conditional p(v|h; θ) is generally Gaussian,
  p(v|h; θ) = N(v; Fh + c, Ψ),   i.e.   v = Fh + c + ε
  This is called "noisy" ICA.
◮ The number of latents H can be larger than D ("overcomplete" case) or smaller ("undercomplete" case).
◮ We here consider the widely used special case where the noise is zero and H = D.
Parametric model for independent component analysis
In ICA, the matrix F is typically denoted by A and called the "mixing" matrix. The model is
  v = Ah,   ph(h) = ∏_{i=1}^{D} ph(hi),
where the hi are typically assumed to have zero mean and unit variance.
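A minimal sketch of this generative model with zero-mean, unit-variance Laplace latents (the mixing matrix A is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
D, n = 2, 1000

# Independent, zero-mean, unit-variance Laplace latents:
# Laplace(0, b) has variance 2 b^2, so b = 1/sqrt(2) gives unit variance.
h = rng.laplace(0.0, 1.0 / np.sqrt(2), size=(D, n))

A = np.array([[2.0, 1.0],    # illustrative mixing matrix
              [1.0, 1.5]])
V = A @ h                    # noise-free ICA: each column is one v = Ah
print(V.shape)  # (2, 1000)
```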
Ambiguities
◮ Denote the columns of A by ai.
◮ From
  v = Ah = ∑_{i=1}^{D} ai hi = ∑_{k=1}^{D} a_{i_k} h_{i_k} = ∑_{i=1}^{D} (αi ai)(hi/αi)
  (for any reordering i_1, . . . , i_D of the indices and any nonzero scalars αi) it follows that the ICA model has an ambiguity regarding the ordering of the columns of A and their scaling.
◮ The unit-variance assumption on the latents fixes the scaling but not the ordering ambiguity.
◮ Note: for non-Gaussian latents, there is no rotational ambiguity.
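Both ambiguities are easy to verify numerically (A, h, the scalings αi, and the permutation below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))
h = rng.standard_normal(3)
v = A @ h

# Scaling ambiguity: multiply column i by alpha_i, divide latent i by alpha_i.
alpha = np.array([2.0, -0.5, 3.0])
print(np.allclose(v, (A * alpha) @ (h / alpha)))  # True

# Ordering ambiguity: permute columns and latents together.
perm = [2, 0, 1]
print(np.allclose(v, A[:, perm] @ h[perm]))       # True
```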
Non-Gaussian latents: variables with sub-Gaussian pdfs
◮ Sub-Gaussian pdf (for variables with mean zero): a pdf that is less peaked at zero than a Gaussian of the same variance.
◮ Example: the pdf of a uniform distribution
[Figure: scatter plots of the samples (h1, h2) and of the mixed samples (v1, v2); horizontal axes h1 and v1, vertical axes h2 and v2; not on the same scale]
(Figures 7.5 and 7.6 from Independent Component Analysis by Hyvärinen, Karhunen, and Oja)
Non-Gaussian latents: variables with super-Gaussian pdfs
◮ Super-Gaussian pdf (for variables with mean zero): a pdf that is more peaked at zero than a Gaussian of the same variance.
◮ Example: the pdf of a Laplace distribution (see Def 8.24 in Barber)
[Figure: scatter plots of the samples (h1, h2) and of the mixed samples (v1, v2); horizontal axes h1 and v1, vertical axes h2 and v2; not on the same scale]
(Figures 7.8 and 7.9 from Independent Component Analysis by Hyvärinen, Karhunen, and Oja)
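One common way to quantify the two cases is excess kurtosis: negative for sub-Gaussian, positive for super-Gaussian, zero for a Gaussian. A sketch using the two example distributions above, each standardised to unit variance:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

gauss = rng.standard_normal(n)
# Uniform on [-sqrt(3), sqrt(3)]: zero mean, unit variance, sub-Gaussian.
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), n)
# Laplace with b = 1/sqrt(2): zero mean, unit variance, super-Gaussian.
lap = rng.laplace(0.0, 1.0 / np.sqrt(2), n)

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

print(round(excess_kurtosis(gauss), 1))  # ≈ 0 (Gaussian)
print(round(excess_kurtosis(unif), 1))   # ≈ -1.2 (sub-Gaussian, uniform)
print(round(excess_kurtosis(lap), 1))    # ≈ 3 (super-Gaussian, Laplace)
```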
Distribution of the visibles
◮ The mapping h → v = Ah is deterministic and invertible. By the laws of transformation of random variables,
  p(v; A) = ph(A⁻¹v) |det A⁻¹|
  (see e.g. Barber Result 8.1)
◮ Denote the inverse of A by B, so that
  A⁻¹v = Bv = (b1 v, . . . , bD v)⊤,
  where b1, . . . , bD are the row vectors of the matrix B.
◮ Given the independence of the latents, we thus have
  p(v; A) = ph(A⁻¹v) |det A⁻¹| = ph(Bv) |det B| = [∏_{j=1}^{D} ph(bj v)] |det B|
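For a concrete ph (here a zero-mean, unit-variance Laplace) the density of v can be evaluated directly from this formula; a sketch with an illustrative mixing matrix:

```python
import numpy as np

def laplace_logpdf(x, b=1.0 / np.sqrt(2)):
    """Log-density of a zero-mean Laplace latent with unit variance."""
    return -np.abs(x) / b - np.log(2.0 * b)

A = np.array([[2.0, 1.0],   # illustrative mixing matrix
              [1.0, 1.5]])
B = np.linalg.inv(A)        # "unmixing" matrix

def log_p_v(v):
    """log p(v; A) = sum_j log p_h(b_j v) + log |det B|."""
    return laplace_logpdf(B @ v).sum() + np.log(abs(np.linalg.det(B)))

# Sanity check: for v = A h0 we have B v = h0, so the formula reduces
# to the latent log-density plus the log-determinant term.
h0 = np.array([1.0, -2.0])
lhs = log_p_v(A @ h0)
rhs = laplace_logpdf(h0).sum() + np.log(abs(np.linalg.det(B)))
print(np.isclose(lhs, rhs))  # True
```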
Likelihood function
◮ Since the mapping from A to B is invertible, we can write the likelihood function in terms of the matrix B.
◮ Given iid data D = {v1, . . . , vn}, we obtain
  L(B) = ∏_{i=1}^{n} [∏_{j=1}^{D} ph(bj vi)] |det B|
◮ The log-likelihood is
  ℓ(B) = ∑_{i=1}^{n} ∑_{j=1}^{D} log ph(bj vi) + n log |det B|
◮ It can be optimised using gradient ascent (slow) or with more powerful methods (see Barber 21.6).
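A sketch of gradient ascent on ℓ(B) for 2-D data with super-Gaussian latents, using the common score approximation d/du log ph(u) ≈ −tanh(u) (which corresponds to ph(u) ∝ 1/cosh(u)); the mixing matrix, step size, and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
D, n = 2, 5000

# Simulate noise-free ICA data with unit-variance Laplace latents.
h = rng.laplace(0.0, 1.0 / np.sqrt(2), size=(D, n))
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])
V = A @ h                                     # D x n data matrix

def logcosh(u):
    # numerically stable log cosh(u)
    return np.logaddexp(u, -u) - np.log(2.0)

def loglik(B):
    """l(B) up to an additive constant, with log p_h(u) = -log cosh(u) + const."""
    U = B @ V
    return -logcosh(U).sum() + n * np.log(abs(np.linalg.det(B)))

# Gradient ascent: dl/dB = g(BV) V^T + n (B^{-1})^T with g(u) = -tanh(u).
B = np.eye(D)
lr = 0.01 / n
for _ in range(5000):
    U = B @ V
    grad = (-np.tanh(U)) @ V.T + n * np.linalg.inv(B).T
    B = B + lr * grad

print(loglik(B) > loglik(np.eye(D)))  # the likelihood has improved
print(np.round(B @ A, 2))             # ideally close to a scaled permutation
```

The product B A approaching a scaled permutation matrix is exactly the scaling/ordering ambiguity discussed earlier: the latents are recovered only up to order and scale.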
The likelihood and the distribution of the latents
ℓ(B) = ∑_{i=1}^{n} ∑_{j=1}^{D} log ph(bj vi) + n log |det B|
◮ B, and hence the mixing matrix A, can be uniquely estimated, up to the scaling and ordering ambiguity, as long as the ph are non-Gaussian (see Barber 21.6); one Gaussian latent is allowed.
◮ The non-Gaussianity assumption on the latents thus solves the "factor rotation" problem of FA.
◮ The pdfs ph of the latents enter the (log-)likelihood.
◮ If they are not known, they have to be estimated, which is difficult.
◮ It turns out that learning whether each ph is super-Gaussian or sub-Gaussian is enough (not examinable; see Section 9.1.2 of Independent Component Analysis by Hyvärinen, Karhunen, and Oja).
Program recap
- 1. Factor analysis
  Parametric model
  Ambiguities in the model (factor rotation problem)
  Learning the parameters by maximum likelihood estimation
  Probabilistic principal component analysis as special case
- 2. Independent component analysis
  Parametric model
  Ambiguities in the model
  Sub-Gaussian and super-Gaussian pdfs
  Learning the parameters by maximum likelihood estimation