Unit 19: Road Map (VERBAL)
Continuous | Polychotomous | Dichotomous
Continuous: Regression | Regression, ANOVA | Regression, ANOVA, t-tests
Polychotomous: Logistic Regression | Chi-Squares | Chi-Squares
Dichotomous: Chi-Squares | Chi-Squares

Continuous | Polychotomous | Dichotomous
Continuous: Multiple Regression | Regression, ANOVA | Regression, ANOVA
Polychotomous: Logistic Regression | Chi-Squares | Chi-Squares
Dichotomous: Chi-Squares | Chi-Squares
Units 11-14, 19: Dealing with Assumption Violations
- We will do this step-by-step together in class.
- We will restructure the data set together step-by-step in class.
- You will dummy code the variables by yourself but with as much help as you need.
- You will fit the model by yourself but with as much help as you need.
- We will interpret the results together step-by-step in class.
We are going to answer this funkily abstract research question using the tools that we know and love. There is nothing new in this section. What makes this research question funky is my withholding of the meaning of the grouping variable. If you get confused, you can replace the grouping variable in your mind with a familiar one. So, instead of thinking about Group 1 and Group 0, you can think about females and males.
On average in the population, Funky Group 1 tends to score higher than Funky Group 0 on the IRT-scaled reading test, t(11854) = 22.93, p < .001. Based on 95% confidence intervals, we conclude that, in the population, the average score for Funky Group 1 (M = 52.0) is between 3.5 and 4.2 points higher than the average score for Funky Group 0 (M = 48.2).
Note that the F statistic is simply the square of the t statistic (in simple linear regression). The intercept (aka, constant) is going to play a very important role in things to come. Recall that the intercept is the mean of our reference category.
The y-intercept is represented by β0, which in turn represents the mean of our reference category. What does β0 represent when there are no predictors in the model?
When there are no predictors in the model, β0 represents the (unconditional) mean of the outcome. Recall that in the absence of further information, the mean is our best guess for individuals, but we recognize that the guess is in all probability wrong by a certain amount, so we make sure that we have an error term in our model, ε. Variance (i.e., the average squared mean deviation) is a measure of how wrong the mean is as a predictor of individuals.
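To make the unconditional model concrete, here is a minimal sketch in Python. The five scores are hypothetical (they are not from the course data set), and the variance is computed as the plain average of squared mean deviations, as described above; software that reports a sample variance divides by n - 1 instead.

```python
from statistics import mean

# Hypothetical reading scores (illustration only, not the course data).
scores = [48.0, 52.0, 55.0, 45.0, 50.0]

# With no predictors, the fitted intercept is just the mean of the outcome.
beta0 = mean(scores)

# Residuals: how wrong the mean is for each individual.
residuals = [y - beta0 for y in scores]

# Sum of squared mean deviations, and their average (the variance).
sum_of_squares = sum(e ** 2 for e in residuals)
variance = sum_of_squares / len(scores)

print(beta0, sum_of_squares, variance)
```

The residuals always sum to zero, which is one way to see that the mean is the "balanced" guess even though it is wrong for nearly every individual.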
Output labels: Mean; Standard Deviation; Variance; Sum of Squared Mean Deviations.
Note that SPSS does not allow us to fit unconditional OLS regression models, so I made this output by hand.

Output labels: Mean Conditional on Funky Group = 0; Standard Deviation*; Variance* of the Residuals; Sum of Squared Residuals (i.e., Deviations From the Regression Line).
In this model, we make predictions of our outcome conditional on our predictor, which is the Funky Group variable for us.
*Basically
- Heteroskedasticity: We can judge by looking at the right graphs.
- Independence: We cannot judge by looking at any graphs. We need to understand our sample and our variable(s).
- Normality: We can judge by looking at the right graphs.
- Linearity: We can judge by looking at the right graphs.
- Outliers: We can judge by looking at the right graphs.
But what is our variable?
For example, this kid and this kid are the same kid. Our observations are clustered (in pairs); thus, the independence assumption is violated.
A new (multilevel) way of thinking:
- Students Nested in Classrooms
- Classrooms Nested in Schools
- Schools Nested in Districts
- Districts Nested in States
- Scores Nested in Students
- Children Nested in Families
- Families Nested in Neighborhoods
- Neighborhoods Nested in Cities
- Babies Nested in Nurseries
- Nurseries Nested in Hospitals
We will learn to handle two levels at a time: “Observations” Nested in “Clusters”
Even when a huge sample size makes statistical significance a foregone conclusion, we still want the right standard errors for our confidence intervals.
Note the Standard Errors. Note the Correlations.
Mistaking order for chaos is no way to go about the business of truth.
http://onlinestatbook.com/stat_sim/repeated_measures/index.html
Riddle: A class of students takes a midterm exam and a final exam. The average score on the midterm exam is 78, and the average score on the final exam is 92. What is the correlation between the two sets of exam scores? Can you say exactly? Can you at least say the direction?
Answer: We have no clue! If you are like me, your intuition is that the correlation should be positive, but it could be negative. Imagine if all the people who did the worst on the midterm exam were jarred into working harder (and smarter), so they ended up doing the best on the final exam.
In this data set (n = 7), there is a perfect negative correlation between the midterm scores and the final scores. The means are different (M = 78 and M = 92), and the standard deviations also happen to be different (SD = 8.6 and SD = 2.2). But the correlation does not care! I teach the correlation coefficient as the slope coefficient from the regression of a standardized outcome on a standardized predictor. When we standardize, we force the means to be zero and the standard deviations to be one so that we can compare apples to apples. See Unit 4 for a refresher. Algebraically, a correlation is the average of the products of the z-scores:
$r_{XY} = \frac{\sum_{i=1}^{n} z_{X_i} z_{Y_i}}{n - 1}$
We subtract away the means and divide away the standard deviations.
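The riddle can be checked with a short Python sketch. The seven pairs of scores below are hypothetical, chosen only so that the means come out to 78 and 92; the slide's actual data are not reproduced here. The sketch standardizes with sample standard deviations and therefore divides the sum of z-score products by n - 1.

```python
from statistics import mean, stdev

def z_scores(values):
    """Subtract away the mean and divide away the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Sum of the products of the z-scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical exam scores: the worst midterm scorers do best on the final.
midterm = [66, 70, 74, 78, 82, 86, 90]
final = [95, 94, 93, 92, 91, 90, 89]

r = correlation(midterm, final)
print(mean(midterm), mean(final), round(r, 6))
```

Adding a constant to every final score changes the mean but leaves r untouched, which is the point of the riddle: the means tell us nothing about the correlation.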
Because this research question is so basic, we have a wide choice of tools: paired-samples t-tests, repeated-measures ANOVA, and multilevel regression modeling. We will try all three in order from simple (and least flexible) to complicated (and most flexible).
From the t-test perspective there is no real predictor, just two (paired) samples. From the ANOVA perspective there is no real predictor, just a single repeated-measures factor, a sort of fusion of our outcome information and wave information. However, from the regression perspective, we get to think in terms of outcomes and predictors and apply all our model-building strategies:
$READINGL_{ij} = \beta_0 + \beta_1 WAVE_{ij} + (u_i + \varepsilon_{ij})$
Notice the ij subscripts and a second type of error.
This data structure is very familiar to us. Rows represent kids. We see that the first kid in our data set has 632790 for an ID number and scores 51.15 points on the 1988 (8th grade, baseline) reading test and 70.06 points on the 1990 (10th grade, follow-up) reading test. Columns represent variables. We have an ID variable to help us identify kids, and we have two test-score variables. For multilevel regression modeling, we will need to restructure this data set into a “person-period data set.” But, no worries, because SPSS will basically do the work for us. For now, however, while we work through t-tests and ANOVAs, we’ll stay in this familiar territory.
Standard errors come in many flavors, but at their core they are just special standard deviations; they are standard deviations of sampling distributions. The bigger the sample size, the smaller the standard deviation of the sampling distribution, so we estimate standard errors by dividing our standard deviation by the square root of our sample size. See Unit 6 for a refresher. There are slight twists for different tests, and the twist here is that we take into consideration the correlation. Take some time to work through this. Here is a spot where the algebra can be insightful. For example, we know that a large sample size is good. See how the goodness of the sample size works into the equation.
http://onlinestatbook.com/stat_sim/repeated_measures/index.html
Note that when the correlation is zero, the entire −2r(s_x)(s_y) term is zeroed out, and we end up with a run-of-the-mill standard error for independent samples.
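A small Python sketch of the standard error of a paired mean difference may help here. It assumes the usual formula, sqrt((s_x² + s_y² − 2·r·s_x·s_y) / n). The standard deviations (8.38 and 9.75) are the ones reported for the two reading tests, n = 5928 is inferred from the ANOVA's 5927 error degrees of freedom, and the correlation of 0.8 is a hypothetical value used only for illustration.

```python
import math

def paired_se(sx, sy, r, n):
    """Standard error of the mean difference for paired samples."""
    return math.sqrt((sx ** 2 + sy ** 2 - 2 * r * sx * sy) / n)

def independent_se(sx, sy, n):
    """Standard error of the mean difference for two independent samples of size n."""
    return math.sqrt(sx ** 2 / n + sy ** 2 / n)

sx, sy, n = 8.38, 9.75, 5928   # SDs from the slides; n inferred from df = 5927
r_assumed = 0.8                # hypothetical correlation, not from the slides

se_paired = paired_se(sx, sy, r_assumed, n)
se_uncorrelated = paired_se(sx, sy, 0.0, n)
se_independent = independent_se(sx, sy, n)

print(se_paired, se_uncorrelated, se_independent)
```

With r = 0 the paired formula collapses to the independent-samples formula, and any positive correlation shrinks the standard error. The sample size sits in the denominator, so a bigger n shrinks it too.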
T-TEST PAIRS=READING88 WITH READING90 (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.

Go to Analyze > Compare Means > Paired-Samples T Test… Select your first measure and assign it to the Variable 1 column, and select your second measure and assign it to the Variable 2 column (shown). Click “Paste” when you are done, and run your syntax.
Go to Analyze > General Linear Model > Repeated Measures…
Define your repeated-measures factor(s): (1) Give it a name. (2) Note the number of levels (i.e., waves, measures). (3) Add it. (4) Click “Define.”
Build your ANOVA model. The structure of your within-subjects variable(s) is all set up from the last dialogue box, so all you need to do is plug and play. Click “Paste” when you are done.
(You may note that there is room to add good old between-subjects factors and covariates, i.e., continuous controls.)
GLM READING88 READING90
  /WSFACTOR=R88vsR90 2 Simple
  /METHOD=SSTYPE(3)
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=R88vsR90.
The syntax is fairly simple, and the output should be very simple, but SPSS produces a crapload of distracting output. Much of the distracting output has to do with the sphericity assumption, which you can read about in Chapter 13 of OnlineStatBook.com. Of the umpteen tables, this is the only really important table, and still it’s cluttered with junk. It should only be two lines:
We conducted a one-way within-subjects ANOVA to determine whether IRT-scaled reading scores improved from 8th grade to 10th grade in the population of U.S. school children of the late ’80s and early ’90s. We observe a statistically significant F value, F(1, 5927) = 2508.04, p < .001, partial η² = .28. A comparison of means shows an improvement from the 1988 8th grade reading test (M = 48.15, SD = 8.38) to the 1990 10th grade reading test (M = 51.98, SD = 9.75).
As always with ANOVA, we need to use planned contrasts, graphical plots, post hoc tests, and other options to get the juicy details.
Recall that the F statistic is the square of the t statistic. The t statistic from our paired-samples t-test was −50.08. (−50.08)² ≈ 2508.04, with the small discrepancy due to rounding in t.
In ANOVA, the correlation gets worked in through the mean squares. (And, that’s all we really need to know.)
This looks very much like the regression models with which we have been working all along the way. The only differences are that now we have ij subscripts and a second error term. In the next few slides, we will examine the two differences and their implications.
$READINGL_{ij} = \beta_0 + \beta_1 WAVE_{ij} + (u_i + \varepsilon_{ij})$
Note that the subscript issue is really just a picky detail, but I want to emphasize it in order to get us thinking about cluster-observation data structure. In particular, we want to think about student-score data structures (aka, person-period data structures) for our research question. For other research questions, we may want to think about mother-child data structures or school-student data structures.
The magic of multilevel regression modeling happens in the complex error term: we have one error term for the student-level error, $u_i$, and one error term for the score-level error, $\varepsilon_{ij}$. The key to parsing the error will be the unconditional model:
$READINGL_{ij} = \beta_0 + (u_i + \varepsilon_{ij})$
Or, equivalently:
$READINGL_{ij} = (\beta_0 + u_i) + \varepsilon_{ij}$
$READINGL_{ij} = \beta_0 + \beta_1 WAVE_{ij} + (u_i + \varepsilon_{ij})$
We use ij subscripts to distinguish our observation-level variables from our cluster-level variables. Observation-level variables get an ij subscript. Cluster-level variables get simply an i subscript. In the problem at hand, we have scores (i.e., our observations) nested within students (i.e., our clusters). However, the system we are going to develop is flexible enough to handle any two-level nested structure. For example, we might have children (i.e., our observations) nested within mothers (i.e., our clusters), or we might have students (i.e., our observations) nested within schools (i.e., our clusters).
$WAVE_{ij}$ represents the value of the WAVE variable for the jth score of the ith student. E.g., for the 2nd score of the 896th student, WAVE = 1.
$READINGL_{ij}$ represents the value of the READINGL variable for the jth score of the ith student. E.g., for the 2nd score of the 896th student, READINGL = 61.
A student-level variable, written with only an i subscript, represents the value of that variable for the ith student. E.g., for the 896th student, its value is 0. (Note that since this is a student-level variable, there is no need to attach it to a particular score.)
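This is a study in which we ask whether smarter mothers have heavier newborns, controlling for length of gestation. We do not want to ignore the fact that newborns are nested within mothers, because we have twins and other sibs in our study.
$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 W_i + (u_i + \varepsilon_{ij})$, where Y is the newborn's weight, X is a child-level predictor, and W is a mother-level predictor.
The child-level predictor (with ij subscripts) takes a value for the jth child of the ith mother. E.g., for the 3rd child of the 57th mother, its value is 271. The mother-level predictor (with only an i subscript) takes a value for the ith mother. E.g., for the 57th mother, its value is 105.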
This is a study in which we ask about the Black/White math achievement gap and whether it varies by the racial composition of schools.
The student-level predictor (with ij subscripts) takes a value for the jth student of the ith school. E.g., for the 83rd student of the 5th school, its value is 1. The school-level predictor (with only an i subscript) takes a value for the ith school. E.g., for the 5th school, its value is 0.75.
Old Structure / New Structure
Note that for most multilevel data, the cluster-observation data set structure is natural. Person-period data sets are the exception. For example, in a mother-child data set, every child will have a mother ID and a child ID, or in a school-student data set, every student will have a school ID and a student ID.
A person-period data set has one time slice per row, but the rows are grouped by an identifying variable and distinguished within the groups by an index variable. No information is lost when converting to person-period data sets.
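The restructuring itself is mechanical, and a minimal Python sketch may make the two structures concrete. The first row uses the ID and scores quoted earlier for the first kid in the data set; the second row is hypothetical. The long-format names WAVE and READINGL follow the MIXED syntax used later, and the helper name `to_person_period` is just an illustration (in class, SPSS does this work).

```python
# Old structure: one row per kid, one column per wave.
wide_rows = [
    {"ID": 632790, "READING88": 51.15, "READING90": 70.06},
    {"ID": 1, "READING88": 40.0, "READING90": 46.0},  # hypothetical kid
]

def to_person_period(rows, wave_columns=("READING88", "READING90")):
    """Return one row per kid per wave, indexed by WAVE (0, 1, ...)."""
    long_rows = []
    for row in rows:
        for wave, column in enumerate(wave_columns):
            long_rows.append(
                {"ID": row["ID"], "WAVE": wave, "READINGL": row[column]}
            )
    return long_rows

# New structure: rows grouped by ID and distinguished by WAVE.
long_rows = to_person_period(wide_rows)
for row in long_rows:
    print(row)
```

Every score in the wide data set appears exactly once in the long data set, so no information is lost; the wave that used to live in the column name now lives in the WAVE variable.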
Go to Data > Restructure and SPSS will walk you through all the steps.
Now that we have one outcome, we can ask about the mean and variance of THE outcome.
However, we know that there is a multilevel structure to our data and, consequently, to the variation in our outcome. Some of the variation in scores is attributable to the fact that some students are better readers than others, and some of the variation is attributable to the fact that students improved from the 8th grade to the 10th grade. In other words, we have person-level variation and period-level variation. In still other words, we have student-level variation and score-level variation (where “score” refers to the differing scores for each student depending on the wave).
Not Quite Right!
That’s Right!
$READINGL_{ij} = \beta_0 + u_i + \varepsilon_{ij}$
$\varepsilon_{ij}$ represents the residual for the jth score of the ith student over and above $u_i$, which represents the residual for the ith student.
$READINGL_{ij} = \beta_0 + (u_i + \varepsilon_{ij})$
Equivalent, “Random Intercepts” Model:
$READINGL_{ij} = (\beta_0 + u_i) + \varepsilon_{ij}$

MIXED READINGL
  /PRINT=SOLUTION
  /RANDOM INTERCEPT | SUBJECT(ID).

Command SPSS to fit an intercept-only model (i.e., unconditional model) that takes into consideration the multilevel structure of the data.
We are now touching on the distinction between random effects and fixed effects in the general linear model. Up until now, we have only dealt with fixed effects models. Now, we are dealing with a mixed model: part fixed, part random. But, let’s save a deep discussion of random effects for another day. (We are in deep enough already!)
Specify your outcome variable. Specify your clustering variable.
Variance decomposition: 86.4 = 24.7 + 61.7
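$READINGL_{ij} = \beta_0 + (u_i + \varepsilon_{ij})$
The intraclass correlation is the proportion of total variance attributable to the cluster level. When the intraclass correlation is extremely high, all the observations within each cluster are basically the same with respect to the outcome. When the intraclass correlation is extremely low, the observations within each cluster are clustered together in name only since nothing is tying together their outcome values.
$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$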
Whereas the t-test and ANOVA use the Pearson correlation, regression uses the intraclass correlation to account for the non-independence (i.e., clustering) of observations.
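As a worked example, the intraclass correlation can be computed from the two variance components of the unconditional model. This Python sketch uses the 24.7 and 61.7 from the variance decomposition above and assumes that 61.7 is the student-level (cluster) variance and 24.7 is the score-level (observation) variance; that labeling is an assumption, so check it against the SPSS output before quoting the result.

```python
def intraclass_correlation(cluster_variance, observation_variance):
    """Proportion of total variance attributable to the cluster level."""
    total_variance = cluster_variance + observation_variance
    return cluster_variance / total_variance

# Assumed labeling of the two variance components (see the note above).
student_level_variance = 61.7
score_level_variance = 24.7

icc = intraclass_correlation(student_level_variance, score_level_variance)
print(round(icc, 3))
```

Under that assumption, roughly seven tenths of the variation in reading scores is between students rather than within them, which is why treating the two scores per student as independent would be a mistake.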
In studies of students nested within schools, what is the intraclass correlation?
The answer is going to depend on our outcome. Reading scores? Emotional disorders? Community service? Self-esteem? Locus of control? For giggles, suppose that our outcome has to do with school clothing, and our data include students clustered within schools. Below are two school-clothing studies, each with its own data set. Which of the two data sets will have the higher intraclass correlation?
MIXED READINGL WITH WAVE
  /PRINT=SOLUTION
  /FIXED=WAVE
  /RANDOM INTERCEPT | SUBJECT(ID).

Command SPSS to fit a model (i.e., conditional model) that takes into consideration the multilevel structure of the data.
Specify your predictor variable(s).
$READINGL_{ij} = \beta_0 + \beta_1 WAVE_{ij} + (u_i + \varepsilon_{ij})$
Interpret your fitted multilevel regression model as you would interpret any fitted regression model. But, do so with more confidence because you have not ignored the independence assumption! You may note that the estimated difference between waves 0 and 1 and the associated standard error and t-value are identical to those from the paired-samples t-test. We have come back full circle.
We can use all our MR modeling skills (controlling, interacting, and taxonomizing) to build this simple model into a fully fledged multiple regression model.
We present and interpret our final model just as we would any regression model, except we include our unconditional model (i.e., intercept-only model) as a baseline. From this baseline, we can compare cluster-level variances and observation-level variances.
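$\text{pseudo-}R^2_u = \frac{\sigma^2_{u,\text{unconditional}} - \sigma^2_{u,\text{conditional}}}{\sigma^2_{u,\text{unconditional}}}$
$\text{pseudo-}R^2_\varepsilon = \frac{\sigma^2_{\varepsilon,\text{unconditional}} - \sigma^2_{\varepsilon,\text{conditional}}}{\sigma^2_{\varepsilon,\text{unconditional}}}$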
The pseudo-R² statistic is a nice (but sometimes flawed) way to describe the goodness of fit. The true R² statistic in the OLS regression to which we are accustomed describes the proportion of variance in the outcome predicted by the model. Now that there are two variances associated with the outcome, we want two R² statistics, one for each type of variation: cluster-level variation (i.e., between-cluster variation) and observation-level variation (i.e., within-cluster variation). However, we are no longer doing ordinary least squares (OLS) regression. Instead of fitting our model based on the least sum of squares, we are fitting our model based on maximum likelihood. Thus, our statistic is not a true R² statistic but a pseudo-R² statistic. In a multilevel model, the pseudo-R² statistic is prone to breaking down when we include only cluster-level variables or only observation-level variables.
On average, in the population, students improve 3.8 points on the IRT-scaled reading test from the 8th grade to the 10th grade. Based on a pseudo-R² statistic of 0.30, WAVE predicts 30% of the within-student variation in IRT-scaled reading scores.
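The pseudo-R² calculation is a proportional reduction in variance from the unconditional model to the conditional model. In the Python sketch below, 24.7 is taken to be the unconditional score-level variance (an assumption about the decomposition shown earlier), while 17.3 for the conditional score-level variance and both student-level values are hypothetical numbers chosen for illustration; read the real values off your own SPSS output.

```python
def pseudo_r2(unconditional_variance, conditional_variance):
    """Proportional reduction in a variance component after adding predictors."""
    return (unconditional_variance - conditional_variance) / unconditional_variance

# Score-level (within-student) variation: assumed 24.7, hypothetical 17.3.
within_r2 = pseudo_r2(24.7, 17.3)

# Student-level (between-student) variation: hypothetical values in which the
# estimated variance goes UP after adding an observation-level predictor.
between_r2 = pseudo_r2(61.7, 65.4)

print(round(within_r2, 2), round(between_r2, 2))
```

The second result is negative, which a true R² can never be. That is the kind of breakdown the slides warn about when the model includes predictors at only one level.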
Hitherto, we have neglected the crucial bookends to regression modeling, exploratory data analysis and assumption checking. We can (and should!) use all the tools that we have learned in these regards, but twice over. Because we have two levels (the cluster level and the observation level), we want to explore each level and check the residuals associated with each level.
- SPLASH, DOLMAS and ABORT for the cluster-level data. Use the mean observation for each cluster.
- SPLASH, DOLMAS and ABORT for the observation-level data, but subtract away from each observation the mean observation of its respective cluster.
- Examine RVF plots using residuals from the cluster level, u.
- Examine RVF plots using residuals from the observation level, ε.
*Obtaining mean observations for each cluster.
*Obtaining observations minus cluster mean.
This is not finished, but for now, you can find the SPSS code in this article:
http://www.upa.pdx.edu/IOA/newsom/mlrclass/ho_centering%20in%20SPSS.pdf
*Obtaining cluster-level residuals.
*Obtaining observation-level residuals.
This is not finished, but for now, you can find the SPSS code in this article:
http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/reviewspss.pdf
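Until the SPSS code is filled in, the two data-preparation steps (cluster means, and observations minus their cluster mean) can be sketched in Python. This is only an illustration of the arithmetic, not the SPSS procedure from the linked articles. The first student's scores are the ones quoted earlier for ID 632790, the second student is hypothetical, and the DEVIATION column name is made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Person-period rows: scores (observations) nested within students (clusters).
rows = [
    {"ID": 632790, "WAVE": 0, "READINGL": 51.15},
    {"ID": 632790, "WAVE": 1, "READINGL": 70.06},
    {"ID": 1, "WAVE": 0, "READINGL": 40.0},  # hypothetical student
    {"ID": 1, "WAVE": 1, "READINGL": 46.0},
]

def cluster_means(rows):
    """Mean observation for each cluster (cluster-level data)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ID"]].append(row["READINGL"])
    return {cluster_id: mean(values) for cluster_id, values in grouped.items()}

def center_within_cluster(rows, means):
    """Each observation minus its own cluster's mean (observation-level data)."""
    return [
        dict(row, DEVIATION=row["READINGL"] - means[row["ID"]]) for row in rows
    ]

means = cluster_means(rows)
centered = center_within_cluster(rows, means)
print(means)
print([round(row["DEVIATION"], 3) for row in centered])
```

The cluster means feed the cluster-level exploration, and the deviations feed the observation-level exploration; within each cluster the deviations sum to zero by construction.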
Question: If t-tests, ANOVAs and regressions yield identical results, why ever choose the complex ANOVA or the even more complex regression over the simple t-test?
Answer: FLEXIBILITY
Data or predictors | t-test | ANOVA | Regression
Once Repeated Measures | Yes | Yes | Yes
Multiply Repeated Measures | No | Yes | Yes
Categorical Predictors (with or without interactions) | No | Yes | Yes
Continuous Predictors (without interactions) | No | Yes | Yes
Continuous Predictors (with interactions) | No | No | Yes
Any Cluster-Observation Data (e.g., students within schools) | No | No | Yes!
Multilevel regression modeling is known by many names, including “mixed modeling,” “nested modeling” and “hierarchical linear modeling (HLM).” Unfortunately, “HLM” is not only the acronym for hierarchical linear modeling, but it is also the name of proprietary software. You can use HLM (the proprietary software) to do HLM, but you can do HLM in most software packages, including SPSS.
There are an infinite number of models that we can specify in multilevel regression modeling, and we touched on the most basic. Consider scores nested within students nested within various teachers nested within schools.