HIERARCHICAL LINEAR MODELLING

Steps
1. Clarifying the research questions
2. Choosing the appropriate parameter estimator
3. Assessing the need for MLM
4. Building the level-1 model
5. Building the level-2 model
6. Multilevel effect size reporting
7. Likelihood ratio model testing

Steps in Running HLM Analysis
Three models are typically run:
1. Fully unconditional model
– No independent variables are specified
– Used to determine if there is sufficient variance among groups to justify using HLM (intraclass correlation)
2. Partially conditional model
– Predictors are added at level 1
3. Fully conditional model
– Level 2 (and 3) predictors are modeled on the intercept and/or slopes to determine their effects on the outcome measure or on relationships between predictors and outcome

Time for some Algebra!
■ You must learn some of the basic mathematical notations used in multilevel modeling.
– As we will see, the program HLM uses this notation to express the models that you estimate.
– Understanding these basic symbols and expressions will allow you to tackle more complex analyses, and to understand other researchers' more complex analyses.

A level-1 model: multiple students in one school (familiar OLS equation)
$Y_i = \beta_0 + \beta_1 X_i + r_i$
– $Y_i$ is the student's Math achievement score
– $\beta_0$ is the average achievement within the school (intercept)
– $\beta_1$ is the average effect of SES on achievement (slope)
– $X_i$ is the student's standardized SES (independent variable)
– $r_i$ is the unique effect for student i (error term)
■ The student is viewed as having the average achievement in the school, plus a deviation due to SES, plus a positive or negative deviation due to the unique circumstances of the student.

A level-1 model: multiple students in multiple schools
$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$
– $Y_{ij}$ is the achievement of student i in school j
– $\beta_{0j}$ is the average achievement within school j
– $\beta_{1j}$ is the average effect of SES on achievement for school j
– $X_{ij}$ is the standardized SES of student i in school j
– $r_{ij}$ is the unique effect for student i in school j
■ Now we are estimating the equation from before for each school. Each school can have a different average achievement (or intercept), and a different impact of SES on achievement (or slope).

Need to make some additional assumptions about the coefficients, because they vary
$r_{ij} \sim N(0, \sigma^2)$
$E(\beta_{0j}) = \gamma_{00}$, $\operatorname{Var}(\beta_{0j}) = \tau_{00}$
$E(\beta_{1j}) = \gamma_{10}$, $\operatorname{Var}(\beta_{1j}) = \tau_{11}$
$\operatorname{Cov}(\beta_{0j}, \beta_{1j}) = \tau_{01}$
■ Student-level errors are normally distributed.
■ Gammas: we expect the average achievement for school j to be equal to the average school mean across all schools, and the slope of SES for school j to equal the average of the slopes across all schools.
■ Taus: these are the variances of the intercepts and slopes, and the covariance between them.

Level-2 model: explaining the Level-1 coefficients
■ Since our intercepts and slopes vary by school, we can now model why they vary.
■ Suppose we hypothesize that levels of achievement and impact of SES are related to whether a school is public or Catholic.
■ We need equations for the intercept and slope to describe our hypothesis:
$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$ (intercept)
$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$ (slope coefficient)
– $\beta_{0j}$ is the average achievement within school j
– $\beta_{1j}$ is the average effect of SES on achievement for school j

Level-2 model (continued)

So math achievement of an individual student in school j is explained by …
– mean achievement in public schools ($\gamma_{00}$), plus the impact of a school being Catholic on mean achievement ($\gamma_{01}$, if j is Catholic)
– the effect of SES on achievement ($\gamma_{10}$), plus the impact of a school being Catholic on how SES affects achievement ($\gamma_{11}$, again, if j is Catholic)
– student- and school-specific error terms ($r_{ij}$, $u_{0j}$, $u_{1j}$)

Multilevel Regression Model
Some examples from multilevel regression modeling:
Lowest (individual) level:
■ $Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$
and at the second (group) level:
■ $b_{0j} = g_{00} + g_{01} Z_j + u_{0j}$
■ $b_{1j} = g_{10} + g_{11} Z_j + u_{1j}$
Combining:
■ $Y_{ij} = g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij} + u_{1j} X_{ij} + u_{0j} + e_{ij}$
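The combined equation follows from substituting the two group-level equations into the individual-level one. The intermediate step, written in the same b/g notation, is sketched below; the grouping into a "fixed part" and a "random part" is added here only as a reading aid.

```latex
\begin{align*}
Y_{ij} &= b_{0j} + b_{1j} X_{ij} + e_{ij} \\
       &= (g_{00} + g_{01} Z_j + u_{0j}) + (g_{10} + g_{11} Z_j + u_{1j})\, X_{ij} + e_{ij} \\
       &= \underbrace{g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij}}_{\text{fixed part}}
        + \underbrace{u_{0j} + u_{1j} X_{ij} + e_{ij}}_{\text{random part}}
\end{align*}
```

The product term $g_{11} Z_j X_{ij}$ is the cross-level interaction: it appears only because the slope $b_{1j}$ is itself modeled with the group-level predictor $Z_j$.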

Hands-on Session with HLM Software: HLM 7 Student Version

Starting HLM
• Prepare data;
• Identify variables to be included in the model;
• Develop hypothesized model; and
• Install HLM program

HSB DATA
Our data file is a subsample from the 1982 High School and Beyond Survey and is used extensively in Hierarchical Linear Models by Raudenbush and Bryk. The data file, called hsb, consists of 7185 students nested in 160 schools. The outcome variable of interest is the student-level (level 1) math achievement score (mathach). The variable ses is the socio-economic status of a student and is therefore at the student level. The variable meanses is the school mean of ses and is therefore at the school level (level 2). The variable sector is an indicator variable showing whether a school is public or Catholic and is therefore a school-level variable. There are 90 public schools (sector=0) and 70 Catholic schools (sector=1) in the sample.

Example
■ Using HSB data
■ Questions:
– Is multilevel modeling needed for mathematics achievement scores?
– Is there a relationship between SES and student-level mathematics achievement scores?
– Does the effect of SES on mathematics achievement scores vary significantly across schools?
– Is the effect of SES on MATHACH moderated by MEANSES and SECTOR?

Inform HLM of the input and Make MDM file

Inform HLM with the data and analysis command

STEPS
[Screenshot with callouts numbered 1 to 10]

Choose Variables for Level_1
Data file: HSB1.sav

Choose Variables for Level_2
Data file: HSB2.sav

Specify the Model

NULL/UNCONDITIONAL MODEL
Also known as Random Effect Model

Fully Unconditional Model
■ The fully unconditional model is run with no predictors to determine whether a significant portion of the variance in achievement is between schools, which would indicate that HLM should be used to analyze these data.

Purpose of Null Model
■ It is used as the baseline model against which the results of more elaborate models are compared,
■ It can estimate the grand mean of mathematics achievement ($\gamma_{00}$) with adjustment for clustering of students within schools and for different sample sizes across schools,
■ It can estimate variance components at the student level ($\sigma^2$) and the school level ($\tau_{00}$).

Null/Unconditional Model
■ The null model is used for two purposes:
(1) It is the basis for calculating the intra-class correlation coefficient (ICC), which is the usual test of whether multilevel modelling is needed; and
(2) It outputs the deviance statistic (-2LL) and other coefficients used as a baseline for comparing later, more complex models.

Null Model
The level-1 model:
$Y_{ij} = \beta_{0j} + r_{ij}$ (1)
where
– $Y_{ij}$ = mathematics achievement for student i in school j,
– $\beta_{0j}$ = the average mathematics achievement for school j,
– $r_{ij}$ = error term representing a unique effect associated with student i in school j, assumed to have a normal distribution with a mean of zero and a level-1 variance $\sigma^2$.

Null Model
The level-2 model:
$\beta_{0j} = \gamma_{00} + u_{0j}$ (2)
where
– $\gamma_{00}$ = the intercept, representing the grand mean or overall average of mathematics achievement,
– $u_{0j}$ = error term representing a unique effect associated with school j, assumed to have a normal distribution with a mean of zero and a level-2 variance $\tau_{00}$.

Null Model
■ Combine the two equations (mixed model):
$Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$

Null or Unconditional Model
The level-1 outcome ($Y_{ij}$) is a function of the level-1 intercept, expressed as $\beta_{0j}$ in the output, and a level-1 residual error term ($r_{ij}$). The level-1 intercept, in turn, is a function of the grand mean ($\gamma_{00}$) across level-2 units, which are schools in this example, plus a random error term ($u_{0j}$), signifying that the intercept is modelled as a random effect. Substituting the right-hand side of the level-2 equation into the level-1 equation gives the mixed-model equation for the null random intercept model.
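To make the null model concrete, the sketch below simulates data from the mixed-model equation and recovers the two variance components. It is an illustration only: it assumes a balanced design of 45 students per school (the HSB data are unbalanced), it uses simple one-way ANOVA (method-of-moments) estimators rather than the likelihood-based estimation that HLM performs, and the function names are hypothetical.

```python
import random
from statistics import mean, variance

def simulate_null_model(k, m, gamma00, tau00, sigma2, seed=0):
    """Simulate Y_ij = gamma00 + u_0j + r_ij for k schools of m students each."""
    rng = random.Random(seed)
    schools = []
    for _ in range(k):
        u0j = rng.gauss(0.0, tau00 ** 0.5)  # school-level random effect
        scores = [gamma00 + u0j + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(m)]
        schools.append(scores)
    return schools

def anova_variance_components(schools):
    """Balanced one-way ANOVA estimates of (tau00, sigma2). Not HLM's estimator."""
    m = len(schools[0])
    school_means = [mean(s) for s in schools]
    sigma2_hat = mean(variance(s) for s in schools)  # pooled within-school variance
    # Var(sample school mean) = tau00 + sigma2 / m, so remove the sampling part.
    tau00_hat = max(variance(school_means) - sigma2_hat / m, 0.0)
    return tau00_hat, sigma2_hat

# Assumed inputs: 160 schools, 45 students each, parameters near the reported estimates.
schools = simulate_null_model(k=160, m=45, gamma00=12.64, tau00=8.61, sigma2=39.15)
tau00_hat, sigma2_hat = anova_variance_components(schools)
grand_mean = mean(mean(s) for s in schools)
icc_hat = tau00_hat / (tau00_hat + sigma2_hat)
print(round(grand_mean, 2), round(tau00_hat, 2), round(sigma2_hat, 2), round(icc_hat, 3))
```

The recovered values should land near the generating parameters, which shows how the school effect $u_{0j}$ and the student residual $r_{ij}$ jointly produce the between-school and within-school variance.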

Annotated Results 1
■ Reliability in HLM ≠ ordinary reliability
■ Reliability for the intercept in HLM indicates to what extent the estimated intercepts can discriminate among schools in their average achievement.
■ Low reliability does not mean lack of precision.

Annotated Results 2
■ The reliability of the random effect of the level-1 intercept is the average reliability across the level-2 units.
■ It measures the overall reliability of the OLS estimates for each of the intercepts. The reliability estimate for this model is .901.
■ This indicates that the sample means tend to be quite reliable as indicators of the true school means.
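The reliability of a school's sample mean is commonly written as $\lambda_j = \tau_{00} / (\tau_{00} + \sigma^2 / n_j)$, and the overall figure is the average of $\lambda_j$ over schools. The sketch below evaluates that formula with the variance components reported for this model. The function names and the group sizes are assumptions made for illustration, since the per-school sample sizes are not listed here.

```python
def intercept_reliability(tau00, sigma2, n_j):
    """Reliability of one school's sample mean: tau00 / (tau00 + sigma2 / n_j)."""
    return tau00 / (tau00 + sigma2 / n_j)

def average_reliability(tau00, sigma2, group_sizes):
    """Average of the school-specific reliabilities over all schools."""
    values = [intercept_reliability(tau00, sigma2, n) for n in group_sizes]
    return sum(values) / len(values)

# Variance components from the null model; 45 is an assumed typical school size.
rel_45 = intercept_reliability(8.61, 39.15, 45)
# Hypothetical mix of school sizes, only to show that smaller schools lower the average.
rel_mixed = average_reliability(8.61, 39.15, [20, 45, 60])
print(round(rel_45, 3), round(rel_mixed, 3))
```

With about 45 students per school the formula gives roughly .91, close to the reported .901; the small gap is expected because the real schools differ in size.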

Annotated Results 3
■ In the final estimation of fixed effects, the intercept is 12.64 (SE = .24) and differs significantly from zero. This value is the estimated grand mean of mathematics achievement.
■ A 95% confidence interval for the grand mean, based on its standard error, is 12.64 ± 1.96*(0.24) = (12.17, 13.11). This interval describes the precision of the grand-mean estimate, not the variation among school means; the plausible values range for the school means is computed from the between-school variance instead.

Annotated Results 4
■ The estimated between variance, $\tau_{00}$, corresponds to the term intercept1 in the output of the final estimation of variance components, and the estimated within variance, $\sigma^2$, corresponds to the term level-1 in the same output section. For this model, $\tau_{00}$ is 8.61 and $\sigma^2$ is 39.15.

Annotated Results 5
Final estimation of Variance
At the school level, $\tau_{00}$ is the variance of the true school means, $\beta_{0j}$, around the grand mean, $\gamma_{00}$. The estimated variability in these school means is $\hat{\tau}_{00} = 8.61$. To measure the magnitude of the variation among schools in their mean achievement levels, we can calculate the plausible values range for these means. Under the normality assumption, we would expect 95% of the school means to fall within the range
$\hat{\gamma}_{00} \pm 1.96\,(\hat{\tau}_{00})^{1/2}$
which yields
$12.64 \pm 1.96\,(8.61)^{1/2} = (6.89, 18.39)$
This indicates a substantial range in average achievement levels among schools in the sample data.

Annotated Results 6
■ Statistically significant between-school variance (variance at school level) indicates that school average mathematics achievement varies significantly across schools.

Effective sample size
■ A higher ICC value indicates greater dependence among observations within schools
– Effective sample size is smaller than observed sample size
■ Effective n = mk / (1 + ICC*(m-1))
– where n = sample size, m = number of students per school and k = number of schools
■ If ICC=1, effective n is equal to the # of schools (k)
■ If ICC=0, effective n is equal to the observed n (i.e., mk)
■ In general, effective n lies between k and mk
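The effective sample size formula can be checked numerically. The sketch below is a direct transcription of the formula; the function name is hypothetical, and the example assumes 45 students in each of 160 schools (roughly the HSB average) together with an ICC of .18.

```python
def effective_sample_size(m, k, icc):
    """Effective n = m*k / (1 + ICC*(m - 1)) for k groups of m members each."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return (m * k) / (1.0 + icc * (m - 1))

# Boundary cases stated in the text.
n_if_icc_one = effective_sample_size(m=45, k=160, icc=1.0)   # equals k
n_if_icc_zero = effective_sample_size(m=45, k=160, icc=0.0)  # equals m*k

# Assumed example: balanced schools of 45 students and ICC = .18.
n_eff = effective_sample_size(m=45, k=160, icc=0.18)
print(n_if_icc_one, n_if_icc_zero, round(n_eff, 1))
```

Under these assumptions the 7200 observations carry about as much information as roughly 800 independent ones, which is why ignoring the clustering overstates precision.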

Calculating ICC
■ Based on the variance component estimates, we can compute the intra-class correlation: 8.61431/(8.61431 + 39.14831) = .18.
■ This tells us the proportion of the total variance that occurs between schools.
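The same calculation as a small helper, using the two variance estimates quoted above; the function name is hypothetical.

```python
def intraclass_correlation(tau00, sigma2):
    """ICC = between-group variance / (between-group + within-group variance)."""
    return tau00 / (tau00 + sigma2)

icc = intraclass_correlation(tau00=8.61431, sigma2=39.14831)
print(round(icc, 2))  # 0.18
```

About 18% of the variance in math achievement lies between schools, and the remaining 82% lies among students within schools.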

Calculating the Intra-class Correlation coefficient (ICC)

ADDING VARIABLE AT THE STUDENT LEVEL
Random Coefficient Model

Adding Predictors at Level_1

Notes on the Results 1
■ The model we fit was
$\text{mathach}_{ij} = \beta_{0j} + \beta_{1j}(\text{SES} - \text{meanses}) + r_{ij}$
$\beta_{0j} = \gamma_{00} + u_{0j}$
$\beta_{1j} = \gamma_{10} + u_{1j}$
■ Filling in the parameter estimates we get
$\text{mathach}_{ij} = \beta_{0j} + \beta_{1j}(\text{SES} - \text{meanses}) + r_{ij}$
$\beta_{0j} = 12.64 + u_{0j}$
$\beta_{1j} = 2.19 + u_{1j}$
$V(u_{0j}) = 8.68$, $V(u_{1j}) = .68$, $V(r_{ij}) = 36.7$
■ In a single equation our model would be written as:
$\text{mathach}_{ij} = \gamma_{00} + u_{0j} + (\gamma_{10} + u_{1j})(\text{SES} - \text{meanses}) + r_{ij}$
$= \gamma_{00} + \gamma_{10}(\text{SES} - \text{meanses}) + u_{0j} + u_{1j}(\text{SES} - \text{meanses}) + r_{ij}$
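One consequence of the single-equation form is that the variance of the outcome depends on the predictor once the slope is random. The sketch below derives this, writing $X_{ij}$ for SES − meanses and using $\tau_{00}$, $\tau_{11}$, $\tau_{01}$ for the variances and covariance of $u_{0j}$ and $u_{1j}$; it assumes, as is standard, that $r_{ij}$ is independent of the school-level effects.

```latex
\begin{align*}
\operatorname{Var}(\text{mathach}_{ij} \mid X_{ij})
  &= \operatorname{Var}(u_{0j} + u_{1j} X_{ij} + r_{ij}) \\
  &= \tau_{00} + 2\,\tau_{01} X_{ij} + \tau_{11} X_{ij}^{2} + \sigma^{2}
\end{align*}
% At X_{ij} = 0 (a student at the school's mean SES), with the estimates above:
%   8.68 + 36.70 = 45.38
```

Under these assumptions the random part is no longer homoscedastic: students far from their school's mean SES have a larger model-implied variance whenever $\tau_{11} > 0$.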

Notes on the Results 2
■ The estimate for the variance of the slope for group-centered SES is 0.68. The p-value is .003. Because the test is statistically significant, we reject the hypothesis that there is no difference in slopes among schools.
■ The 95% plausible value range for the school means is 12.64 ± 1.96*(8.68)^(1/2) = (6.87, 18.41).
■ The 95% plausible value range for the SES-achievement slope is 2.19 ± 1.96*(.68)^(1/2) = (.57, 3.81).
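Both plausible value ranges use the same recipe: the fixed-effect estimate plus or minus 1.96 times the square root of the corresponding variance component. A minimal sketch, with a hypothetical function name:

```python
def plausible_value_range(gamma, tau, z=1.96):
    """Range expected to contain about 95% of the school-specific coefficients."""
    half_width = z * tau ** 0.5
    return gamma - half_width, gamma + half_width

# Estimates reported for the random coefficient model.
mean_low, mean_high = plausible_value_range(gamma=12.64, tau=8.68)
slope_low, slope_high = plausible_value_range(gamma=2.19, tau=0.68)
print(round(mean_low, 2), round(mean_high, 2))    # 6.87 18.41
print(round(slope_low, 2), round(slope_high, 2))  # 0.57 3.81
```

Note that the width comes from the variance component (how much schools truly differ), not from the standard error of the fixed effect, so this is not a confidence interval.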

Notes on the Results 3
■ The coefficient for the constant is the predicted math achievement when all predictors are 0. Because SES is group-mean centered, this means that a student whose SES equals the school average is predicted to have a math achievement score of 12.64.

Notes on the Results 4
■ Notice that the residual variance is now 36.70, compared to the residual variance of 39.15 in the one-way ANOVA with random effects (unconditional means) model.
■ We can compute the proportion of variance explained at level 1 as (39.15 - 36.70) / 39.15 = .063. This suggests that using student-level SES as a predictor of math achievement reduced the within-school variance by 6.3%.
■ The correlation between the intercept and the slope is .019, so the two are not highly correlated.

Calculating Proportion of Variance Explained
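The proportion of variance explained compares a variance component in the baseline model with the same component after predictors are added. A minimal sketch using the level-1 variances reported for the null and random coefficient models; the function name is hypothetical.

```python
def proportion_variance_explained(var_baseline, var_conditional):
    """Proportional reduction in a variance component relative to the baseline model."""
    return (var_baseline - var_conditional) / var_baseline

# Level-1 (within-school) variance: null model vs. model with group-centered SES.
r2_level1 = proportion_variance_explained(var_baseline=39.15, var_conditional=36.70)
print(round(r2_level1, 3))  # 0.063
```

The same function applies to level-2 components, for example comparing an intercept or slope variance before and after school-level predictors are entered.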

THE INTERCEPT AND SLOPE AS THE OUTCOME MODEL
Final Model

This model is referred to as an intercepts- and slopes-as-outcomes model
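The model equations are not reproduced in the text of this slide. A plausible form, assuming MEANSES and SECTOR both enter the intercept and slope equations and SES stays group-mean centered at level 1, is sketched below; the labels $\gamma_{02}$ and $\gamma_{12}$ are chosen here for illustration.

```latex
\begin{align*}
\text{mathach}_{ij} &= \beta_{0j} + \beta_{1j}\,(\text{SES}_{ij} - \text{meanses}_j) + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + \gamma_{02}\,\text{SECTOR}_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{MEANSES}_j + \gamma_{12}\,\text{SECTOR}_j + u_{1j}
\end{align*}
```

In this form $\gamma_{01}$ and $\gamma_{02}$ describe how school mean achievement depends on MEANSES and SECTOR, while $\gamma_{11}$ and $\gamma_{12}$ are cross-level interactions describing how the within-school SES slope depends on them.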

Research Questions
■ Do MEANSES and SECTOR significantly predict the intercept?
■ Do MEANSES and SECTOR significantly predict the within-school slope?
■ How much variation in the intercepts and slopes is explained using SECTOR and MEANSES as predictors?

Interpreting the Final Results
For intercept:
■ MEANSES is positively related to school mean math achievement.
■ Catholic schools have higher mean achievement than do public schools, after controlling for the effect of MEANSES.

Interpreting the Final Results
For slope:
■ Schools with higher MEANSES have a larger SES slope than schools with low MEANSES.
■ Catholic schools have significantly weaker SES slopes on average than do public schools.

Reporting in Table

Final Model
■ The estimate for the variance of the SES slope is .15 with p-value .369; hence, we fail to reject the null hypothesis that no variation in the SES slopes remains unexplained after controlling for MEANSES and SECTOR.
■ The correlation between the level-1 intercept and the slope for SES is given as .32 in the earlier part of the output.
■ Some variation still remains unexplained even after controlling for the MEANSES and SECTOR effects.
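The slope variance reported here can be set against the slope variance of .68 from the random coefficient model to gauge how much of the between-school variation in SES slopes the two school-level predictors account for. This is a worked calculation with the figures already quoted, assuming the two variance estimates are directly comparable.

```latex
\[
\frac{\hat{\tau}_{11}(\text{random coefficient}) - \hat{\tau}_{11}(\text{final})}
     {\hat{\tau}_{11}(\text{random coefficient})}
  = \frac{0.68 - 0.15}{0.68} \approx 0.78
\]
```

On that reading, MEANSES and SECTOR together account for roughly 78% of the variation in SES slopes across schools.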

Assessing Model Fit
■ Using deviance statistics
■ Using proportion of variance explained
■ Using other indicators: AIC and BIC
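The three indicators can all be computed from the deviance (-2LL) and the number of estimated parameters. The sketch below is illustrative only: the deviance values are made up rather than taken from HLM output, the function names are hypothetical, the parameter counts assume full maximum likelihood, and using the number of level-1 observations in BIC is one convention among several.

```python
import math

# Upper 5% critical values of the chi-square distribution for 1 to 3 df.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def aic(deviance, n_params):
    """Akaike information criterion: deviance + 2 * number of parameters."""
    return deviance + 2 * n_params

def bic(deviance, n_params, n_obs):
    """Bayesian information criterion: deviance + number of parameters * ln(n)."""
    return deviance + n_params * math.log(n_obs)

def deviance_test(dev_reduced, dev_full, p_reduced, p_full):
    """Deviance difference, its df, and whether it exceeds the 5% critical value."""
    diff = dev_reduced - dev_full
    df = p_full - p_reduced
    return diff, df, diff > CHI2_CRIT_05[df]

# HYPOTHETICAL deviances, not results from the HSB analysis.
dev_null, p_null = 1000.0, 3  # gamma00, tau00, sigma2
dev_rc, p_rc = 990.0, 6       # adds gamma10, tau11, tau01

diff, df, significant = deviance_test(dev_null, dev_rc, p_null, p_rc)
print(diff, df, significant)
print(aic(dev_null, p_null), aic(dev_rc, p_rc))
print(round(bic(dev_null, p_null, 7185), 2), round(bic(dev_rc, p_rc, 7185), 2))
```

Lower AIC or BIC indicates the preferred model. With these invented numbers the deviance test and AIC favor the larger model while BIC, which penalizes parameters more heavily, favors the smaller one, so the indicators need not agree.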

Estimation Specification
■ REML (restricted maximum likelihood) versus FML (full maximum likelihood)
– REML and FML will usually produce similar results for the level-1 residual variance ($\sigma^2$), but there can be noticeable differences for the variance-covariance matrix of the random effects.
– REML is the default estimation method in HLM.
– If the number of level-2 units is large, then the difference will be small.
– If the number of level-2 units is small, then FML variance estimates will be smaller than REML estimates, leading to artificially short confidence intervals and problematic significance tests.
■ Nested models
– If the fixed effects are the same, and there are fewer random effects in the reduced model, then either REML or FML is fine.
– If one model has fewer fixed effects and possibly fewer random effects, then use FML to compare models.

SOME PRACTICAL ASPECTS OF MULTILEVEL MODELING

Questions to Answer
■ Can you use multilevel techniques to study your dependent variable?
■ Should you use multilevel techniques to study your dependent variable?
■ How will you center your level-1 and level-2 predictors?
■ Which of the level-1 coefficients will be explained at level-2? I.e., are they fixed or random?
■ How does my model perform?

Can I use HLM?
■ HLM requires a large amount of data.
■ Minimum:
– number of groups: 30, but most recommend 50+
– number of individuals within groups: 5-10, but can be as low as 1.
– average group size: 10; obviously more is better.

Should I Use HLM?
■ How much of the variance in your dependent variable is explained by group membership?
■ Intra-class correlation coefficient (ICC) = var between groups / (var between groups + var within groups)
$\text{ICC} = \tau_{00} / (\tau_{00} + \sigma^2)$
■ Remember, $\tau_{00}$ is the variance of the intercepts, or the school means, and $\sigma^2$ is the student-level variance.

Centering variables
■ Whether and how you center is a very important decision: interpretation of results depends on your choice.
■ Important because the intercept at level-1 is also a dependent variable.
■ Centering
– Refers to subtracting a mean from your independent variables.
– The transformed value for an individual measures how much they deviate (+/-) from the mean.

Centering variables
■ Suppose we center verbal SAT scores around a student mean of 500.

          Actual score   Centered score
Steve     800            300
Claire    750            250
Bill      500            0
Paul      200            -300

■ How would we interpret a regression coefficient if all variables were similarly transformed?

Centering variables
■ Why would we want to center?
– Variable may lack a natural zero point, such as SAT score.
– Stability of estimates at level-1 is affected by the location of variables.
– Location at level-2 is less important.
■ Centering in multilevel models presents a unique challenge because different centering choices have a significant impact on how the parameter estimate is interpreted.

Centering variables
■ Generally two types of centering are used in HLM for a specific variable:
– Grand mean centering: subtract the mean for the entire sample from each observation in the sample.
– Group mean centering: subtract the mean for each group from each member of the group.
■ To fully understand the implications of centering, see the discussion in Raudenbush and Bryk (2002) pp. 134-149.
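The two kinds of centering differ only in which mean is subtracted. The sketch below shows both on a tiny made-up data set of six students in two schools; the function names and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def grand_mean_center(values):
    """Subtract the mean of the entire sample from each observation."""
    grand_mean = mean(values)
    return [v - grand_mean for v in values]

def group_mean_center(values, groups):
    """Subtract each group's own mean from the members of that group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]

# Made-up SES scores: school A is low-SES on average, school B is high-SES.
ses = [-1.0, 0.0, 1.0, 1.0, 2.0, 3.0]
school = ["A", "A", "A", "B", "B", "B"]

ses_grand = grand_mean_center(ses)
ses_group = group_mean_center(ses, school)
print(ses_grand)  # [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]
print(ses_group)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

After group-mean centering the two schools look identical, because all between-school differences in SES have been removed; that information would have to re-enter the model as a school-level predictor such as MEANSES.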

Grand Mean Centering
■ Grand mean centering is scoring variables as deviations from their sample means.
■ An example would be scoring occupational status as a deviation from the mean occupational status in the entire sample, that is, scoring how high or low people are relative to the average.
■ In multivariate analyses, predictor variables that are grand-mean centered generate mathematically identical predicted values to those from the same model estimated on the original, conventionally scored variables.
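The equivalence of predicted values can be seen with one predictor. The sketch below uses generic symbols ($b_0$, $b_1$, $X$, $\bar{X}$) that are not taken from the slides and covers the simple case of a single predictor with no interaction terms.

```latex
\begin{align*}
\hat{Y} &= b_0 + b_1 X \\
        &= (b_0 + b_1 \bar{X}) + b_1 (X - \bar{X})
\end{align*}
```

Centering leaves the slope $b_1$ unchanged and only relocates the intercept to $b_0 + b_1 \bar{X}$, the predicted value for a case at the grand mean, so the fitted values are identical.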

Grand Mean Centering
■ Some writers still claim that grand mean centering reduces multicollinearity, particularly when the regression includes many interactions, and most especially when these are cross-level interactions (Bickel 2007; Preacher 2003).
■ Another advantage of grand mean centering is that it allows one to interpret the intercept as the predicted mean on the dependent variable when all the centered predictors are set to zero, that is, when they are at their grand means (Paccagnella 2006).

Grand Mean Centering
■ It is also sometimes said that grand-mean centering facilitates regression coefficient interpretation, particularly for cross-level interactions when a variable is continuous (Bickel 2007; Kenny et al. 1998; Hox 2010).
■ Hox (2010) reports that convergence tends to be achieved more frequently and analyses run faster using grand-mean centering.

Group Mean Centering
■ Group mean centering refers to scoring variables in multi-level models as deviations from the mean of their macro-level group.
■ An example would be scoring occupational status as a deviation from the mean status in the respondent's own country.
■ In group-mean centering, Nigerian clerks would be high (because they are high compared to the average Nigerian) while Swiss clerks would be low (because they are low compared to the average Swiss).

Group Mean Centering
■ Raudenbush and Bryk (2002) also posit that group-mean centering can reduce bias in random component variance estimates.
■ Paccagnella (2006) notes the benefit of group-mean centering for researchers interested in "separating the between-group and the within-group components from the total variation to investigate how groups (contexts) affect the student performances, explicitly accounting for the group structure into the model".
■ Group-mean centering (also known as within-group deviation scoring) is widely used in many disciplines, and widely recommended.
