Latent Transition Analysis Exercises

Nov 24, 2023

SLIDE 1

We are going to use Mplus to conduct some analyses on a dataset:

gus_sdq_trim2.dta.dat

This is a fictional dataset based on real data from the Growing Up in Scotland longitudinal study.

Mplus operates through input files. These instruct the software on how to read the data file, the type of variables, the type of analysis, the algorithm and estimator to be used, and the model to be estimated. The input file also tells the software which results and parameters to display in the output file, and whether to create attached files with graphics or other files with derived parameters.

The principal commands in Mplus input files are:

TITLE:
DATA:
VARIABLE:
ANALYSIS:
MODEL:
OUTPUT:
PLOT:
SAVEDATA:

Each command name is followed by a colon. Within each command it is possible to provide further options; each option ends with a semicolon. Another common rule in Mplus is that “are”, “is” or “=” can be used interchangeably when specifying options. For example, to specify the file that contains the data one can write:

DATA: File is dat.dat;

SLIDE 2

or, equivalently:

DATA: File = dat.dat;

A hyphen (-) is used to indicate a list of variables or numbers. Items in a list can be separated by spaces or commas. An exclamation mark (!) starts a comment: whatever follows the ! on that line is skipped by the software.

A SHORT INTRODUCTION TO MPLUS COMMANDS

TITLE: allows you to provide a title for the file (free text).

DATA: is used to specify where the data are (the file(s) to be used), their format, and other options if necessary. It is common to create .dat files (or .txt etc.) that contain data in person-level format (one row for each individual, with values for variables in columns). Do not include variable names in the data file columns. It is possible to specify a path to the data (e.g. C:\my files\datac1.dat). An important option in the DATA: command is LISTWISE = ON, which instructs Mplus to conduct analyses with listwise deletion of cases that have missing data on one or more of the variables of interest. From version 5 of Mplus the default is to include cases with missing values on some of the variables in the model.

VARIABLE: This command is used to name the variables in the dataset and to specify what types of variables they are (their scale), etc. Important options are:

NAMES = lists the names of the variables in the dataset. e.g.:

NAMES = a b c d male;

USEVARIABLES = lists the variables that will be used in the analyses or models. It is not necessary if you will be using all the variables in the dataset. Note that a variable listed under USEVARIABLES will be included in the models specified under MODEL: even if it is not mentioned explicitly in the MODEL: command.

CATEGORICAL, NOMINAL, COUNT = are used to define a list of dependent variables as ordered categorical (or binary), nominal (unordered), or count variables respectively. Categorical, nominal and count variables that are not treated as dependent should NOT be listed under CATEGORICAL etc. For example, the variable “male” should not be listed under CATEGORICAL since it is not going to be used as a dependent variable in the model. e.g.:

CATEGORICAL = a, b, c, d;

SLIDE 3

MISSING = specifies the value (or a character such as . or *) used to identify a missing value for one or more variables. If all the variables use the same missing value indicator (e.g. -999), write:

MISSING = all (-999);

IDVARIABLE is used to indicate the identifier for each observation or case in the dataset. This is necessary if you want to create data files after the analyses (SAVEDATA: command) and use them for further analyses: in this case, the file saved with SAVEDATA will contain IDs for the observations. e.g.:

IDVARIABLE = fullid;

CLUSTER is used to indicate a clustering variable (e.g. school, group). This is necessary for multilevel analyses or for analyses that adjust for clustering (option COMPLEX in the ANALYSIS: command).

CLASSES is used to specify the names of latent categorical variables and their number of classes (between parentheses). If we want to estimate a model with a latent variable called mastery with 2 classes we could write:

CLASSES = mastery (2);

ANALYSIS: is used to specify the type of analysis and other analysis options (e.g. the type of estimator).

TYPE invokes a specific type of analysis among a range of options (e.g. EFA = exploratory factor analysis). TYPE = BASIC invokes descriptive statistics on the variables included by USEVARIABLES. In order to run Latent Class Analysis, Latent Transition Analysis, or other mixture models (GMM, LCGA, etc.) one has to invoke TYPE = MIXTURE; The default estimator for this type of analysis is MLR (maximum likelihood with robust standard errors and chi-square). One can invoke another estimator by writing its name after ESTIMATOR = ....;

In ANALYSIS: one can also change the number of initial stage starts and final stage optimizations of the EM algorithm by using option STARTS, for example:

STARTS = 100 20;

It is also possible to change the number of initial stage iterations in the EM algorithm using option STITERATIONS, for example:

STITERATIONS = 20;

PROCESSORS can be used to devote more processors to the estimation process (default is PROCESSORS = 1).

MODEL: The model command is used to specify a model, constrain parameters, and test parameter constraints.

ON is short for “regressed on” and defines regression relationships, e.g.:

y on x;
y on male;
y on x male;

SLIDE 4

Variables within square brackets refer to variable means (for interval variables) or to variable thresholds (for categorical variables). The different thresholds of a categorical variable are labelled using $ followed by the threshold number. If variable a has three response categories, it will have 2 thresholds, indicated by:

[a$1];
[a$2];

The star sign * is used to free a parameter; if followed by a number, that number is used as a user-specified starting value. For example, to provide a starting value of -1 for the first threshold of a, one would write:

[a$1*-1];

The at sign @ fixes a parameter at a user-specified value. For example, to fix the second threshold of a to -2, one would write:

[a$2@-2];

Parentheses are used to name or to constrain a parameter. A parameter is named when letters are placed within the parentheses. E.g. to name the thresholds of variable “a” p1 and p2:

[a$1] (p1);
[a$2] (p2);

These parameters can then be constrained using MODEL CONSTRAINT:. For example, to impose an equality constraint on these two thresholds of variable “a”:

MODEL CONSTRAINT: p1 = p2;

Equality constraints can also be imposed using numbers between parentheses: parameters followed by the same number are held equal. If the first thresholds of variables a and b are constrained to be equal, write:

[a$1] (1);
[b$1] (1);

OUTPUT: Specifies the information to be reported in the output file.

SAVEDATA: Instructs the programme to create a file with estimated parameters.

FILE is used to give a name to the new file. For example:

FILE IS analysis1.dat;

SAVE is used to specify what information is saved in the file. CPROB will save posterior latent class probabilities and modal class assignment for each individual included in the analyses. If an IDVARIABLE is specified in the VARIABLE command, the file will contain the ID variable, and it will be possible to match-merge the file with other files for further analyses. E.g.:

FILE is analysis1.dat;
SAVE = CPROB;
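To show how the commands described above fit together, here is a minimal sketch of a complete input file for a 2-class latent class model. It is an illustration under stated assumptions, not one of the exercise files: the data file dat.dat, the column order in NAMES, the binary coding of indicators a to d, and the starting values are all hypothetical.

```mplus
TITLE:    Illustrative skeleton of an Mplus input file
DATA:     FILE IS dat.dat;             ! hypothetical file, no header row
VARIABLE: NAMES = fullid a b c d male; ! hypothetical column order
          USEVARIABLES = a b c d;      ! only the class indicators
          CATEGORICAL = a b c d;       ! assumed binary indicators
          MISSING = ALL (-999);        ! assumed missing value code
          IDVARIABLE = fullid;         ! kept so saved files can be merged
          CLASSES = mastery (2);       ! one latent variable, 2 classes
ANALYSIS: TYPE = MIXTURE;
          STARTS = 100 20;             ! initial starts, final optimizations
          STITERATIONS = 20;
MODEL:    %OVERALL%
          %mastery#1%
          [a$1*2];                     ! free threshold, starting value 2
          %mastery#2%
          [a$1*-1];                    ! free threshold, starting value -1
OUTPUT:   TECH1 TECH11;
SAVEDATA: FILE IS analysis1.dat;
          SAVE = CPROB;                ! posterior class probabilities
```

The MODEL command here only supplies starting values for one indicator; a plain latent class model can also be run without class-specific statements, in which case Mplus uses its defaults.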

SLIDE 5

Exercise 1 Descriptive statistics

It is always better to check the Mplus data file to make sure the software is reading it properly and that nothing went wrong while creating the file you are going to use for analyses (whether you created it using STATA, SPSS, etc.). If you use STATA, the process of creating an Mplus data file and a standard input file is facilitated by the stata2mplus command (see http://www.ats.ucla.edu/stat/stata/faq/stata2mplus.htm).

In your files you will find a BASIC.inp file. Open it with Mplus (from the main menu: File > Open). The file already contains the variable names and the missing value indicator. (NOTE that Mplus imposes a limit of 8 characters on the variable names.)

The variables of interest are:

sw4emo     SDQ Emotional symptoms, sweep 4 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw4con     SDQ Conduct problems, sweep 4 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw4hyp     SDQ Hyperactivity, sweep 4 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw4peer    SDQ Peer problems, sweep 4 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw4pros    SDQ Prosocial behaviour, sweep 4 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw5emo     SDQ Emotional symptoms, sweep 5 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw5con     SDQ Conduct problems, sweep 5 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw5hyp     SDQ Hyperactivity, sweep 5 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw5peer    SDQ Peer problems, sweep 5 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
sw5pros    SDQ Prosocial behaviour, sweep 5 (0 = Normal score; 1 = Borderline; 2 = Abnormal)
male       gender (0 = female; 1 = male)
pregsmok   maternal smoking during pregnancy (0 = No; 1 = Yes)
ZDePicSAS  z scores in the BAS picture similarities (wave 5)
ZDeNamVAS  z scores in the BAS naming vocabulary (wave 5)

TASKS:

1. Specify the variables above as the variables to be included in analyses
2. Specify which variables are categorical (at least for descriptive analyses)
3. Use ANALYSIS: TYPE = BASIC; to invoke descriptive analyses.
4. How many observations are in the dataset? _______
5. What is the proportion of males and females? _____________
6. What is the proportion of children whose mother reported smoking during pregnancy? ____

SLIDE 6

7. What are the means of the Picture Similarity and the Naming Vocabulary z scores? _____________
8. Are there missing data in any of these variables? ____________
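The first three tasks can be sketched as the fragment below. It is a hedged illustration rather than the official solution: it shows only the lines to add or change in BASIC.inp, assumes the variable names are spelled in NAMES exactly as in the list above, and leaves the DATA command and the supplied NAMES and MISSING options untouched (their content is not reproduced here).

```mplus
VARIABLE:
  ! NAMES and MISSING: keep the lines already supplied in BASIC.inp
  USEVARIABLES = sw4emo sw4con sw4hyp sw4peer sw4pros
                 sw5emo sw5con sw5hyp sw5peer sw5pros
                 male pregsmok ZDePicSAS ZDeNamVAS;
  ! listed as categorical here so that proportions are reported
  CATEGORICAL  = sw4emo sw4con sw4hyp sw4peer sw4pros
                 sw5emo sw5con sw5hyp sw5peer sw5pros
                 male pregsmok;
ANALYSIS:
  TYPE = BASIC;    ! descriptive statistics only
```

Listing male and pregsmok under CATEGORICAL is done only for this descriptive run; in the later models they are covariates and should be taken off that list.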

SLIDE 7

EXERCISE 2 Latent class analysis

We are going to define a latent class variable with 2 classes for the SDQ indicators in wave 4. Rather than using the 3 categories in the dataset, we want to create a binary variable for each of these indicators (Normal vs. Borderline/Abnormal). To do that without creating a new dataset, we can use the DEFINE: command in Mplus and specify that the cut point for the variable is score 0: this will divide the variable into two categories (category 1 = score 0; category 2 = score > 0). The command is:

DEFINE: CUT sw4emo-sw4pros (0);

It does not matter where you put this command in the input file: it can go after the VARIABLE command, after the ANALYSIS command, etc.

We can call the latent variable x using the option CLASSES in the VARIABLE command:

CLASSES = x (2);

Remember to invoke a mixture model algorithm. You need to specify:

ANALYSIS: TYPE = mixture;

Often data are clustered (e.g. pupils within schools). In this case it is possible to adjust standard errors for the clustering. To do this, specify the clustering variable using option CLUSTER in the VARIABLE command, e.g.:

CLUSTER = school;

and in the ANALYSIS command include COMPLEX after TYPE, e.g.:

ANALYSIS: TYPE = mixture complex;

We are not going to use this function for the exercises, although the data were stratified.

Make sure you include in the OUTPUT options:

TECH1 (reports arrays containing parameter specifications and starting values for all free parameters in the model)
TECH10 (reports univariate, bivariate and response pattern model fit information for the categorical dependent variables in the model)
TECH11 (reports the Vuong-Lo-Mendell-Rubin likelihood ratio test of model fit, which compares the estimated model with a model with one less class)
TECH14 (reports the parametric bootstrapped likelihood ratio test, which compares the estimated model with a model with one less class)

TASKS:

1. Using as a basis the previous “BASIC.inp” file, specify a 2-class latent variable model using the binary SDQ indicators at wave 4.

SLIDE 8

2. Check the log-likelihood values of the final stage optimisation: does it appear to be a trustworthy solution?

3. What are the proportions of individuals in the two classes based on the estimated model?
4. How can you interpret the two classes?
5. What does the parametric bootstrapped test indicate?
6. Run a model with 3 classes
7. Check the loglikelihood values of the solution: does it appear as a trustworthy solution?
8. Change the number of starts and final stage optimizations and check the loglikelihood values of the solution
9. Compare the information criteria, and other parameters of the 2- and 3-class solutions: which appears to provide a better fit?
10. Inspect the graph of estimated probabilities for conditional item responses of item category 2 in the 3-class solution
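The set-up for task 1 could look roughly like the fragment below. This is a sketch under assumptions, not the official solution: it presumes that sw4emo to sw4pros are adjacent in the NAMES list (as the DEFINE statement above already implies), leaves the supplied DATA, NAMES and MISSING lines as they are, and picks the STARTS values and the PLOT options purely for illustration.

```mplus
VARIABLE:
  ! NAMES and MISSING: keep the lines already supplied in BASIC.inp
  USEVARIABLES = sw4emo-sw4pros;
  CATEGORICAL  = sw4emo-sw4pros;
  CLASSES = x (2);              ! change to x (3) for task 6
DEFINE:
  CUT sw4emo-sw4pros (0);       ! score 0 vs. score above 0
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 100 20;              ! illustrative values; increase for task 8
OUTPUT:
  TECH1 TECH10 TECH11 TECH14;
PLOT:
  TYPE = PLOT3;
  SERIES = sw4emo-sw4pros (*);  ! item probability profile by class
```

A solution is usually regarded as trustworthy when the best loglikelihood value is replicated by several of the final stage optimizations rather than reached only once.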

SLIDE 9

EXERCISE 3 Testing measurement invariance across waves

We are going to accept that there are two classes of children in wave 4 and wave 5 (normative vs. difficulty). We can call the latent class variable in sweep 4 x, and the latent class variable in sweep 5 y. To do this, in the VARIABLE command we include this statement:

CLASSES = x(2) y(2);

To ensure that the SDQ indicators of wave 4 are regressed on x and the SDQ indicators of wave 5 are regressed on y, we have to instruct the software to estimate the thresholds (hence: conditional probabilities) of wave 4 indicators in the model part specific to latent variable x, and the thresholds of wave 5 indicators in the model part specific to latent variable y.

The MODEL: command is used to specify models, their restrictions, etc. If more than one latent variable is present, the part of the model that applies to both variables comes after the %overall% statement, while the model parts specific to latent variables x and y come after the statements:

MODEL x:
MODEL y:

Within each latent variable specific part, %x#1% is used to indicate the part of the model relative to class 1 (#1) of latent variable x; %x#2% indicates the part of the model relative to class 2 of variable x, and so on (if more than 2 classes are estimated). To make sure indicators are regressed on their respective latent variables write:

MODEL:
%overall%
MODEL x:
%x#1%
[sw4emo$1];
[sw4con$1];
[sw4hyp$1];
[sw4peer$1];
[sw4pros$1];
%x#2%
[sw4emo$1];

SLIDE 10

[sw4con$1];
[sw4hyp$1];
[sw4peer$1];
[sw4pros$1];
MODEL y:
%y#1%
[sw5emo$1];
[sw5con$1];
[sw5hyp$1];
[sw5peer$1];
[sw5pros$1];
%y#2%
[sw5emo$1];
[sw5con$1];
[sw5hyp$1];
[sw5peer$1];
[sw5pros$1];

Note that we have made all indicators binary, therefore we need only one threshold. Had the indicators retained their 3 categories, we would have had to indicate two thresholds within each class. E.g.:

%x#1%
[sw4emo$1];
[sw4emo$2];
....

It is possible to assign user-selected starting values for the thresholds by using the star sign (*). This can help order the latent classes in a specific way, which is useful when constraints are to be imposed on specific classes of a latent variable. For example, if we want class 1 of latent variable x to be the “normative” class, we assign to the thresholds of indicators in this class a value of 3 (indicating that individuals in this class have a high probability of being in category 1 of the indicator, which is the normal score range). We can instead assign a threshold below 0 to indicators in class 2 of x (indicating a lower probability of being in category 1 of the indicators, the “normal range” category). E.g.:

%x#1%
[sw4emo$1*3];
[sw4emo$2*3];
....
%x#2%
[sw4emo$1*-1];
[sw4emo$2*-1];
...
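As a worked illustration of why a large positive starting value points a class towards the normal range, the relation between a threshold and a response probability can be written out for a binary indicator. This assumes the logit link that is used for categorical indicators in mixture models estimated by maximum likelihood; the symbols u (indicator) and \tau (its threshold in a given class) are notation introduced here.

```latex
\begin{aligned}
P(u = \text{category 1} \mid \text{class}) &= \frac{1}{1 + e^{-\tau}} \\[4pt]
\tau = 3:\quad  & \frac{1}{1 + e^{-3}} \approx 0.953 \\[4pt]
\tau = -1:\quad & \frac{1}{1 + e^{1}} \approx 0.269
\end{aligned}
```

A starting threshold of 3 therefore corresponds to about a 95% chance of a normal-range score, while a starting threshold of -1 corresponds to about 27%.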

SLIDE 11

Before we impose a relationship between the two latent variables we are going to investigate the hypothesis of measurement invariance across the two waves. Full measurement invariance means that we assume the indicators have the same thresholds (same conditional response probabilities) in the respective classes of the latent variables x and y. In other words, the probability of being in the normal range for an indicator (e.g. emotional symptoms) in wave 4 and in wave 5 is the same for the “normative” class of latent variable x and for the “normative” class of latent variable y. Likewise, the probability of being in the normal range of an indicator is the same for the “difficulty” class of latent variable x and the “difficulty” class of latent variable y (obviously, these probabilities differ between the normative and the difficulty classes).

Although measurement invariance simplifies estimation (fewer parameters are estimated) and interpretation (the classes have the same meaning in both waves), it may not be a tenable hypothesis. Inspecting the results of the latent class estimation at each wave should already give some hint as to whether it looks reasonable.

Remember that equality constraints are imposed by using numbers between parentheses after the parameter (hence, before the semicolon). The same number after two parameters indicates equality of the two parameters.

TASKS:

1. Estimate a model with full measurement invariance across wave 4 and wave 5
2. Estimate a model with full non-invariance
3. Use a Likelihood Ratio Test to test whether full measurement invariance is a tenable assumption (don’t forget the correction necessary if you are using the MLR estimator, which is the default estimator for TYPE=MIXTURE analyses).
4. Estimate a model with partial measurement invariance: conditional item response probability is invariant across waves for the “difficulty” class
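One way to write the full measurement invariance model of task 1 is sketched below. It is a hedged illustration, not the official solution: the starting values (3 and -1) and the choice of class 1 as the “normative” class are assumptions, and the VARIABLE, DEFINE and ANALYSIS commands are taken to be set up as in the previous exercise with CLASSES = x(2) y(2) and all ten indicators cut at 0.

```mplus
MODEL:
  %OVERALL%
MODEL x:
  %x#1%                     ! assumed normative class at wave 4
  [sw4emo$1*3]   (1);
  [sw4con$1*3]   (2);
  [sw4hyp$1*3]   (3);
  [sw4peer$1*3]  (4);
  [sw4pros$1*3]  (5);
  %x#2%                     ! assumed difficulty class at wave 4
  [sw4emo$1*-1]  (6);
  [sw4con$1*-1]  (7);
  [sw4hyp$1*-1]  (8);
  [sw4peer$1*-1] (9);
  [sw4pros$1*-1] (10);
MODEL y:
  %y#1%                     ! same numbers = same thresholds as %x#1%
  [sw5emo$1*3]   (1);
  [sw5con$1*3]   (2);
  [sw5hyp$1*3]   (3);
  [sw5peer$1*3]  (4);
  [sw5pros$1*3]  (5);
  %y#2%                     ! same numbers = same thresholds as %x#2%
  [sw5emo$1*-1]  (6);
  [sw5con$1*-1]  (7);
  [sw5hyp$1*-1]  (8);
  [sw5peer$1*-1] (9);
  [sw5pros$1*-1] (10);
```

Removing all the numbers in parentheses gives the fully non-invariant model of task 2, and removing only (1) to (5) leaves the thresholds equal across waves for the second class alone, which corresponds to task 4 if that class is the “difficulty” class.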

SLIDE 12

Exercise 4 Specify an LTA model with 2 classes and partial measurement invariance

Working on the basis of the partial measurement invariance model specified in the previous exercise, we are going to impose a relationship between the latent variables x and y (the structural part of the model). We want to specify the regression of y on x. This is achieved by writing y ON x in the %overall% part of the MODEL: command in Mplus. A more explicit way of specifying this regression is:

MODEL:
%overall%
[x#1];
[y#1];
y#1 ON x#1;

If x and y had more than two classes (e.g. 3 classes in each latent variable), the statement could have been written:

MODEL:
%overall%
[x#1];
[x#2];
[y#1];
[y#2];
y#1 ON x#1;
y#1 ON x#2;
y#2 ON x#1;
y#2 ON x#2;

The notation above maps onto the parameters needed to calculate the conditional transition probabilities from x to y: [y#1] and [y#2] are the intercepts for y, and the y#. ON x#. statements are the logistic regression coefficients.
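To make the mapping from these parameters to transition probabilities concrete, the multinomial logistic form can be written out. This is a sketch in notation introduced here, not taken from the handout: a_k stands for the intercept [y#k], b_{kj} for the coefficient of y#k ON x#j, K is the number of classes, and the last class of each latent variable is the reference class, so its parameters are fixed at zero.

```latex
\begin{aligned}
P(y = k \mid x = j) &= \frac{\exp(a_k + b_{kj})}{\sum_{m=1}^{K} \exp(a_m + b_{mj})},
\qquad a_K = 0,\quad b_{Kj} = 0,\quad b_{kK} = 0 \\[6pt]
\text{With } K = 2:\qquad
P(y = 1 \mid x = 1) &= \frac{1}{1 + e^{-(a_1 + b_{11})}}, \qquad
P(y = 1 \mid x = 2) = \frac{1}{1 + e^{-a_1}}
\end{aligned}
```

The probabilities of ending in class 2 follow as one minus these values, which gives the full 2 by 2 transition matrix.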

SLIDE 13

TASKS:

1. Using the partial measurement invariance model of the last exercise with two classes in wave 4 and wave 5, specify a latent transition model.
2. Inspect the transition probabilities
3. Constrain the “normative” class to be an absorbing class: null probability of transitioning from the “normative” to the “difficulty” class
4. Is the constraint of null probability of leaving the normative class tenable based on the LRT?
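Because MLR is the default estimator, the likelihood ratio tests requested in these exercises need a scaling correction before the difference can be referred to a chi-square distribution. The standard scaled difference computation for MLR loglikelihoods is sketched below; the symbols are notation introduced here. L_0, c_0 and p_0 are the loglikelihood, the scaling correction factor and the number of free parameters of the more constrained (nested) model, and L_1, c_1 and p_1 are the same quantities for the less constrained model, all read from the Mplus output.

```latex
\begin{aligned}
c_d  &= \frac{p_0\, c_0 - p_1\, c_1}{p_0 - p_1} \\[6pt]
TR_d &= \frac{-2\,(L_0 - L_1)}{c_d}, \qquad \text{with } p_1 - p_0 \text{ degrees of freedom}
\end{aligned}
```

A non-significant TR_d suggests that the constraints of the nested model (for example, full measurement invariance) do not worsen fit appreciably.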

SLIDE 14

EXERCISE 5 Include covariates

We will work with the latent transition model with constraints on transition probabilities that we specified in the previous exercise. We will include the effects of the covariates gender (dummy coded as “male”) and exposure to tobacco during gestation (pregsmok: a binary variable, where 0 indicates no exposure to tobacco during gestation [that is, the mother did not report smoking tobacco during pregnancy] and 1 indicates exposure to tobacco during gestation).

TASKS:

1. Investigate the effects of gender on latent statuses in wave 4
2. Investigate the effects of gender on the transitions from wave 4 to wave 5
3. Report transition matrices for males and females
4. Investigate the effects of exposure to tobacco during gestation while controlling for the effects of gender
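The structural part for tasks 1 and 2 might be sketched as follows. This is a hedged fragment, not the official solution: it follows the usual Mplus pattern in which a covariate effect on transitions is specified by regressing y on the covariate within each class of x, and it omits the measurement thresholds and the transition constraint carried over from the previous exercises. Those statements would have to be merged into the same %x#1% and %x#2% sections.

```mplus
! "male" must be added to USEVARIABLES but NOT to CATEGORICAL,
! because it is a covariate rather than a dependent variable.
MODEL:
  %OVERALL%
  y ON x;          ! structural part, as in Exercise 4
  x ON male;       ! task 1: gender effect on wave 4 status
MODEL x:
  %x#1%
  y ON male;       ! task 2: gender effect on y for those in class 1 of x
  %x#2%
  y ON male;       ! ... and for those in class 2 of x
```

For task 4, pregsmok would be added next to male in the same statements (for example x ON male pregsmok;), again as an assumption about how the exercise is meant to be solved.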

SLIDE 15

EXERCISE 6 Test differences in distal outcomes across latent statuses

We are going to use the latent transition model with a constrained transition probability that we specified in Exercise 4 to test differences in means and proportions of outcomes across the two latent classes at the last measurement point (wave 5). We are not going to include covariates, although they can easily be incorporated.

The dependent variables are:

ZDeNamVAS = z scores in the BAS naming test. This variable is on an interval scale: we will test differences in its means between the difficulty and the normative group at wave 5.

ZDePicSAS = z scores in the BAS picture test. We will transform this variable into a binary one (scores below 0 vs. above 0) and investigate the proportions of children in the difficulty and normative classes in wave 5 that scored above 0.

TASKS:

1. Estimate means of ZDeNamVAS for difficulty and normative classes at wave 5
2. Test if these means are significantly different
3. Estimate proportions of children in difficulty and normative groups at wave 5 that scored above 0 on ZDePicSAS
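Tasks 1 and 2 could be approached along the lines of the fragment below. It is a sketch under assumptions rather than the official solution: the labels m1 and m2 are hypothetical, the variable name is written as in the exercise text (use the spelling in BASIC.inp if it differs), ZDeNamVAS must be added to USEVARIABLES, and the threshold statements of the existing model have to be merged into the same %y#1% and %y#2% sections.

```mplus
MODEL y:
  %y#1%
  [ZDeNamVAS] (m1);    ! mean of the naming test in class 1 of y
  %y#2%
  [ZDeNamVAS] (m2);    ! mean of the naming test in class 2 of y
MODEL TEST:
  0 = m1 - m2;         ! Wald test of equal means across y classes
```

Note that adding the outcome to the model in this way lets it contribute to the estimation of the classes, so it is worth checking that class proportions and thresholds stay close to those of the model without the outcome.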
