Exploratory Factor Analysis
Applied Multivariate Statistics, Spring 2012
Latent-variable models
- A large number of observed (manifest) variables should be
explained by a few unobserved (latent) underlying variables
- E.g.: Scores on several tests are influenced by “general
academic ability”
- Assumes local independence: Manifest variables are
independent given latent variables
                        Latent variables
  Manifest variables    Continuous              Categorical
  Continuous            Factor Analysis         Latent Profile Analysis
  Categorical           Item Response Theory    Latent Class Analysis
Overview
- Introductory example
- The general factor model for x and Σ
- Estimation
- Scale and rotation invariance
- Factor rotation: Varimax
- Factor scores
- Comparing PCA and FA
Introductory example: Intelligence tests
- Six intelligence tests (general, picture, blocks, maze,
reading, vocab) on 112 persons
- Sample correlation matrix
- Can the performance on, and the correlations between, the six tests
be explained by one or two variables describing some general concept of intelligence?
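As a minimal sketch, assuming the 112 persons' scores are stored in a data frame called tests (a hypothetical name) with one column per test, the sample correlation matrix can be computed in R:

    # 'tests': hypothetical data frame, one row per person (n = 112),
    # one column per intelligence test
    R <- cor(tests[, c("general", "picture", "blocks", "maze", "reading", "vocab")])
    round(R, 2)   # pairwise correlations between the six tests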
Introductory example: Intelligence tests
Model:
  $x_{1i} = \lambda_1 f_i + u_{1i}$
  $x_{2i} = \lambda_2 f_i + u_{2i}$
  ...
  $x_{6i} = \lambda_6 f_i + u_{6i}$
- f: common factor ("ability")
- $\lambda_j$: factor loadings; importance of f for $x_j$
- $u_j$: random disturbance specific to each test
- Key assumption: $u_1, \dots, u_6$ are uncorrelated; thus $x_1, \dots, x_6$ are conditionally uncorrelated given f
General Factor Model
- General model for one individual:
  $x_1 = \mu_1 + \lambda_{11} f_1 + \dots + \lambda_{1q} f_q + u_1$
  ...
  $x_p = \mu_p + \lambda_{p1} f_1 + \dots + \lambda_{pq} f_q + u_p$
- In matrix notation for one individual: $x = \mu + \Lambda f + u$
- In matrix notation for n individuals: $x_i = \mu + \Lambda f_i + u_i \quad (i = 1, \dots, n)$
- Assumptions:
  - $\mathrm{Cov}(u_j, f_s) = 0$ for all j, s
  - $E[u] = 0$, $\mathrm{Cov}(u) = \Psi$ is a diagonal matrix (diagonal elements = "uniquenesses")
- Convention:
  - $E[f] = 0$, $\mathrm{Cov}(f) =$ identity matrix (i.e. factors are scaled);
    otherwise, $\Lambda$ and $\mu$ are not well determined
- To be determined from x:
  - Number q of common factors
  - Factor loadings $\Lambda$
  - Specific variances $\Psi$
  - Factor scores f
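A minimal simulation sketch of this model in R (all sizes and parameter values below are illustrative, not taken from the slides):

    set.seed(1)
    n <- 200; p <- 6; q <- 2                       # illustrative dimensions
    Lambda <- matrix(runif(p * q, -1, 1), p, q)    # factor loadings (p x q)
    Psi    <- diag(runif(p, 0.2, 0.5))             # diagonal matrix of uniquenesses
    mu     <- rep(0, p)
    f <- matrix(rnorm(n * q), n, q)                # factor scores, Cov(f) = I
    u <- matrix(rnorm(n * p), n, p) %*% sqrt(Psi)  # disturbances, Cov(u) = Psi
    x <- sweep(f %*% t(Lambda) + u, 2, mu, "+")    # x_i = mu + Lambda f_i + u_i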
Representation in terms of covariance matrix
- Using the formulas and assumptions from the previous slide:
  $x = \mu + \Lambda f + u \;\Rightarrow\; \Sigma = \Lambda\Lambda^T + \Psi$
- Factor model = particular structure imposed on the covariance matrix
- Variances can be split up:
  $\mathrm{var}(x_j) = \sigma_j^2 = \sum_{k=1}^{q} \lambda_{jk}^2 + \psi_j$
  $\sum_{k=1}^{q} \lambda_{jk}^2$: "communality", variance due to the common factors
  $\psi_j$: "specific variance" or "uniqueness"
- "Heywood case" (= a kind of estimation error): estimated $\psi_j < 0$
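Continuing the simulation sketch from the previous slide (illustrative values only), the covariance structure and the variance split can be checked numerically:

    Sigma_model <- Lambda %*% t(Lambda) + Psi   # Sigma = Lambda Lambda^T + Psi
    max(abs(cov(x) - Sigma_model))              # small, up to sampling error
    communality <- rowSums(Lambda^2)            # variance due to the common factors
    uniqueness  <- diag(Psi)                    # psi_j; the two add up to diag(Sigma_model)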
Estimation: MLE
- Assume xi follows multivariate normal distribution
- Choose Λ, Ψ to maximize the log-likelihood:
  $\ell = \log L = -\frac{n}{2} \log|\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$, where $\Sigma = \Lambda\Lambda^T + \Psi$
- Iterative solution, difficult in practice (local maxima)
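In R, this ML fit is what the function "factanal" performs; a minimal call on a data matrix x (hypothetical here) is:

    fa <- factanal(x, factors = 2, rotation = "none")  # ML fit of a 2-factor model
    fa$loadings        # estimated Lambda
    fa$uniquenesses    # estimated diagonal of Psi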
Number of factors
- MLE approach for estimation provides test:
$H_q$: the $q$-factor model holds   vs.   the alternative that $\Sigma$ is unconstrained
- Modelling strategy:
  Start with a small value of q and increase it successively until some $H_q$ is not rejected (see the R sketch below).
- (Multiple testing problem: Significance levels are not
correct)
- Example revisited
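A sketch of this strategy in R, using the likelihood-ratio test p-value returned by "factanal" (the correlation matrix R and n.obs = 112 refer to the intelligence example; the 5% level is an assumed choice):

    # R: sample correlation matrix of the six tests (see earlier sketch)
    for (q in 1:2) {                  # with p = 6 variables, q <= 2 keeps the test's df positive
      fa <- factanal(covmat = R, n.obs = 112, factors = q)
      cat("q =", q, "  p-value =", signif(fa$PVAL, 3), "\n")
      if (fa$PVAL > 0.05) break       # stop at the first q that is not rejected
    }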
Intelligence tests revisited: Number of factors
Part of the output of the R function "factanal": the hypothesis cannot be rejected; for simplicity, we thus use two factors
Scale invariance of factor analysis
- Suppose $y_j = c_j x_j$, or in matrix notation $y = Cx$
  (C a diagonal matrix), e.g. a change of measurement units. Then
  $\mathrm{Cov}(y) = C\Sigma C^T = C(\Lambda\Lambda^T + \Psi)C^T = (C\Lambda)(C\Lambda)^T + C\Psi C^T = \hat\Lambda\hat\Lambda^T + \hat\Psi$
  I.e., loadings and uniquenesses are the same if expressed in the new units
- Thus, using cov or cor gives basically the same result
- Common practice:
  - use the correlation matrix or
  - scale the input data
  (This is done in "factanal")
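A quick empirical check of this invariance (x is the hypothetical simulated data from the earlier sketch; the rescaling constants are arbitrary):

    y <- sweep(x, 2, c(1, 10, 100, 2, 5, 0.5), "*")  # change the measurement units of each variable
    fa_x <- factanal(x, factors = 2)
    fa_y <- factanal(y, factors = 2)
    max(abs(fa_x$loadings - fa_y$loadings))          # essentially 0: factanal works on the correlation scale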
Rotational invariance of factor analysis
- Rotating the factors yields exactly the same model
- Assume $M M^T = M^T M = I$ (M orthogonal) and transform $f^* = M^T f$, $\Lambda^* = \Lambda M$
- This yields the same model:
  $x^* = \Lambda^* f^* + u = \Lambda M M^T f + u = \Lambda f + u = x$
  $\Sigma^* = \Lambda^* \Lambda^{*T} + \Psi = (\Lambda M)(\Lambda M)^T + \Psi = \Lambda\Lambda^T + \Psi = \Sigma$
- Thus, the rotated model is equivalent for explaining the
covariance matrix
- Consequence: Use rotation that makes interpretation of
loadings easy
- Most popular rotation: Varimax rotation
Each factor should have few large and many small loadings
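In R, "factanal" applies varimax by default; the rotation can also be applied explicitly to an unrotated loading matrix (fa_unrot is a hypothetical fit of the data x used above):

    fa_unrot <- factanal(x, factors = 2, rotation = "none")
    rot <- varimax(loadings(fa_unrot))   # varimax rotation of the loadings
    rot$loadings                         # few large and many small loadings per factor
    rot$rotmat                           # the orthogonal rotation matrix M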
Intelligence tests revisited: Interpreting factors
Part of the output of the R function "factanal": the two factors can be interpreted as "verbal intelligence" and "spatial reasoning". Interpretation of factors is generally debatable.
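Interpretation is usually based on the pattern of large loadings; printing with a cutoff makes this pattern easier to see (the 0.3 threshold is a common but arbitrary choice):

    fa <- factanal(x, factors = 2)                  # varimax rotation is the default
    print(fa$loadings, cutoff = 0.3, sort = TRUE)   # hide small loadings, sort variables by factor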
Estimating factor scores
- Scores are assumed to be random variables: Predict
values for each person
- Two methods:
- Bartlett (option “Bartlett” in R):
Treat f as fixed (ML estimate)
- Thompson (option “regression” in R):
Treat f as random (Bayesian estimate)
- No big difference in practice
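In "factanal" the two methods are selected via the scores argument (x hypothetical as before); in practice the two sets of predicted scores are very similar:

    fa_b <- factanal(x, factors = 2, scores = "Bartlett")    # f treated as fixed
    fa_r <- factanal(x, factors = 2, scores = "regression")  # f treated as random
    head(fa_b$scores)                          # one row of predicted factor scores per person
    cor(fa_b$scores[, 1], fa_r$scores[, 1])    # typically close to 1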
Case study: Drug use
Factor loadings of the drug-use data; factors interpreted as social drugs, hard drugs, amphetamine, hashish, smoking, inhalants (?). Significance vs. relevance: one might keep fewer than six factors if the fit of the correlation matrix is good enough.
Comparison: PC vs. FA
- PCA aims at explaining variances, FA aims at explaining
correlations
- PCA is exploratory and without assumptions
FA is based on statistical model with assumptions
- First few PCs will be same regardless of q
First few factors of FA depend on q
- FA: Orthogonal rotations of the factor loadings give equivalent models
This does not hold in PCA
- More mathematically (assume we keep only the PCs in $\Gamma_1$):
  PCA: $x = \mu + \Gamma_1 z_1 + \Gamma_2 z_2 = \mu + \Gamma_1 z_1 + e$
  FA:  $x = \mu + \Lambda f + u$
  $\mathrm{Cov}(u)$ is diagonal by assumption, $\mathrm{Cov}(e)$ is not
- Both PCA and FA are only useful if the input data is correlated!
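A side-by-side sketch of the two approaches in R (x hypothetical as before); prcomp is used for the PCA part:

    pca <- prcomp(x, scale. = TRUE)   # PCA: explains variance, no model assumptions
    fa  <- factanal(x, factors = 2)   # FA: ML fit of x = mu + Lambda f + u
    pca$rotation[, 1:2]               # first two PC directions (unchanged by how many PCs we keep)
    fa$loadings                       # factor loadings (depend on the chosen q)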
Concepts to know
- Form of the general factor model
- Representation in terms of covariance matrix
- Scale and Rotation invariance, varimax
- Interpretation of loadings
R functions to know
- Function “factanal”