
SLIDE 1

Unit 12: Road Map (VERBAL)

Variables: READING, RACE, HOMEWORK, FREELUNCH, ESL.

Assumptions to check: outliers, linearity, normality, homoskedasticity, independence.

SLIDE 2

Unit 12: Road Map (Schematic)

Single Predictor: grid of outcome type (continuous, dichotomous, polychotomous) by predictor type (continuous, dichotomous, polychotomous). Methods shown: chi-squares, t-tests, ANOVA, regression, logistic regression.

Multiple Predictors: the same grid. Methods shown: chi-squares, ANOVA, regression, multiple regression, logistic regression.

Units 11-14, 19, B: Dealing with Assumption Violations

SLIDE 3

Unit 12: Roadmap (SPSS Output)

Units shown: 9, 16, 17, 18, 8, 11, 12, 13, 14, 19.

SLIDE 4

Unit 12: Checking GLM Assumptions with Regression Diagnostics

Unit 12 Post Hole
Unit 12 Technical Memo and School Board Memo
Unit 12 Review
Unit 12 Reading

SLIDE 5

Unit 12: Technical Memo and School Board Memo

Work Products (Part I of II):

I. Technical Memo: Have one section per analysis. For each section, follow this outline.
  A. Introduction
    i. State a theory (or perhaps hunch) for the relationship: think causally, be creative. (1 Sentence)
    ii. State a research question for each theory (or hunch): think correlationally, be formal. Now that you know the statistical machinery that justifies an inference from a sample to a population, begin each research question, “In the population, …” (1 Sentence)
    iii. List your variables, and label them “outcome” and “predictor,” respectively.
    iv. Include your theoretical model.
  B. Univariate Statistics. Describe your variables, using descriptive statistics. What do they represent or measure?
    i. Describe the data set. (1 Sentence)
    ii. Describe your variables. (1 Paragraph Each)
      a. Define the variable (parenthetically noting the mean and s.d. as descriptive statistics).
      b. Interpret the mean and standard deviation in such a way that your audience begins to form a picture of the way the world is. Never lose sight of the substantive meaning of the numbers.
      c. Polish off the interpretation by discussing whether the mean and standard deviation can be misleading, referencing the median, outliers and/or skew as appropriate.
      d. Note validity threats due to measurement error.
  C. Correlations. Provide an overview of the relationships between your variables using descriptive statistics. Focus first on the relationship between your outcome and question predictor, second (tied) on the relationships between your outcome and control predictors, second (tied) on the relationships between your question predictor and control predictors, and fourth on the relationship(s) between your control variables.
      a. Include your own simple/partial correlation matrix with a well-written caption.
      b. Interpret your simple correlation matrix. Note what the simple correlation matrix foreshadows for your partial correlation matrix; “cheat” here by peeking at your partial correlation and thinking backwards. Sometimes, your simple correlation matrix reveals possibilities in your partial correlation matrix. Other times, your simple correlation matrix provides foregone conclusions. You can stare at a correlation matrix all day, so limit yourself to two insights.
      c. Interpret your partial correlation matrix controlling for one variable. Note what the partial correlation matrix foreshadows for a partial correlation matrix that controls for two variables. Limit yourself to two insights.

SLIDE 6

Unit 12: Technical Memo and School Board Memo

Work Products (Part II of II):

I. Technical Memo (continued)
  D. Regression Analysis. Answer your research question using inferential statistics. Weave your strategy into a coherent story.
    i. Include your fitted model.
    ii. Use the R² statistic to convey the goodness of fit for the model (i.e., strength).
    iii. To determine statistical significance, test each null hypothesis that the magnitude in the population is zero, reject (or not) the null hypothesis, and draw a conclusion (or not) from the sample to the population.
    iv. Create, display and discuss a table with a taxonomy of fitted regression models.
    v. Use spreadsheet software to graph the relationship(s), and include a well-written caption.
    vi. Describe the direction and magnitude of the relationship(s) in your sample, preferably with illustrative examples. Draw out the substance of your findings through your narrative.
    vii. Use confidence intervals to describe the precision of your magnitude estimates so that you can discuss the magnitude in the population.
    viii. If regression diagnostics reveal a problem, describe the problem and the implications for your analysis and, if possible, correct the problem.
      i. Primarily, check your residual-versus-fitted (RVF) plot. (Glance at the residual histogram and P-P plot.)
      ii. Check your residual-versus-predictor plots.
      iii. Check for influential outliers using leverage, residual and influence statistics.
      iv. Check your main effects assumptions by checking for interactions before you finalize your model.
  X. Exploratory Data Analysis. Explore your data using outlier-resistant statistics.
    i. For each variable, use a coherent narrative to convey the results of your exploratory univariate analysis of the data. Don’t lose sight of the substantive meaning of the numbers. (1 Paragraph Each)
    ii. For each relationship between your outcome and predictor, use a coherent narrative to convey the results of your exploratory bivariate analysis of the data. (1 Paragraph Each)
      1. If a relationship is non-linear, transform the outcome and/or predictor to make it linear.
      2. If a relationship is heteroskedastic, consider using robust standard errors.

II. School Board Memo: Concisely, precisely and plainly convey your key findings to a lay audience. Note that, whereas you are building on the technical memo for most of the semester, your school board memo is fresh each week. (Max 200 Words)

III. Memo Metacognitive

SLIDE 7

NELS88.sav Codebook

National Education Longitudinal Study

Source: U.S. Department of Education

Summary: Here are select variables from the NELS88 data set.

Notes: I created the FREELUNCH variable based on annual family income of less than $25,000. I converted the HOMEWORK variable from an ordinal/categorical variable to a continuous variable, which is why it is so “binny.” I removed from the data set students who self-identified as other than Asian, Black, Latino, or White. I then created a set of indicator variables from RACE: ASIAN, BLACK and LATINO, with WHITE as an (implicit) reference category.

Sample: A nationally representative sample of 7,800 8th graders.

Variables: READING, FREELUNCH, HOMEWORK, ESL, RACE, ASIAN, LATINO, BLACK.

SLIDE 8

Select Variables from the NELS Data Set

SLIDE 9

Introduction to Regression Diagnostics

Search HI-N-LO for Assumption Violations
• Heteroskedasticity
• Independence
• Normality
• Linearity
• Outliers

At least in simple linear regression, diagnostics provide information that we could conceivably glean from a bivariate scatterplot of the outcome versus predictor; nevertheless, they can provide a helpfully detailed view. In multiple regression, however, diagnostics provide information that we could never gather by eye.

SLIDE 10

Setting Up Our Question (Part 1 of 3)

\[ \widehat{READING92} = 10.7 + 0.9 \cdot READING88 \]

The average student scored 48 points as an 8th grader. How many points do you predict that she improved by the 12th grade?

\[ 10.7 + 0.9 \cdot 48 \approx 54 \]

For the average student, we expect an increase of 6 reading points from the 8th grade to the 12th grade, from 48 points to 54 points.

Notice that I use “increase.” The longitudinal data warrant the developmental conclusion!

Recall that the y-intercept (i.e., β0, e.g., 10.7) is our prediction when x is zero (e.g., READING88 = 0). Since READING88 never equals zero within the range of our data, the y-intercept is merely a mathematical abstraction in our fitted model. But, we can make zero more interesting...

For the next slide, remember 54, and 6 for the slide after.
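The arithmetic behind the 54 and the 6 can be checked in a few lines. This is a minimal sketch that plugs the rounded coefficients from the slide into the fitted line; the function name `predict_reading92` is made up for illustration and is not part of the course materials.

```python
# Fitted model from the slide: predicted READING92 = 10.7 + 0.9 * READING88.
INTERCEPT = 10.7
SLOPE = 0.9


def predict_reading92(reading88):
    """Predicted 12th-grade score for a given 8th-grade score."""
    return INTERCEPT + SLOPE * reading88


average_88 = 48  # average 8th-grade score quoted on the slide
predicted_92 = predict_reading92(average_88)
expected_gain = predicted_92 - average_88

print(round(predicted_92, 1))   # 53.9, which the slide rounds to 54
print(round(expected_gain, 1))  # 5.9, which the slide rounds to 6
```

Because the coefficients are rounded, the result is 53.9 rather than exactly 54.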

SLIDE 11

Setting Up Our Question (Part 2 of 3)

* Example SPSS syntax for computing transformed variables.
* This linear transformation is not a z-transformation because I did not divide the difference by the standard deviation.
COMPUTE ZEROCENTEREDREADING88 = READING88 - 48.0155.
EXECUTE.
* This (goofy) transformation is non-linear because I do more than add/subtract and/or multiply/divide by a constant. I use powers and logs.
COMPUTE Sean_Is_A_Great_SPSS_Programmer = READING88 * 48.0155 - 1975/27 + FREELUNCH**(1/2) + LN(HOMEWORK + 1).
EXECUTE.

Look familiar? The y-intercept now has an interesting interpretation. It is our prediction for the average student now that the average student has an x-value of 0. (Also notice that the slope has not changed.)

\[ \widehat{READING92} = 54.2 + 0.9 \cdot ZEROCENTEREDREADING88 \]
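The claim that centering moves the intercept but leaves the slope alone can be demonstrated on any data. The sketch below uses a hand-rolled least-squares fit and made-up scores (not NELS88 data); `fit_line` and the variable names are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up scores for illustration only (not NELS88 data).
reading88 = [35.0, 42.0, 48.0, 51.0, 60.0, 64.0]
reading92 = [41.0, 50.0, 53.0, 58.0, 63.0, 71.0]

# Zero-center the predictor by subtracting its mean.
mean88 = sum(reading88) / len(reading88)
centered88 = [x - mean88 for x in reading88]

raw_intercept, raw_slope = fit_line(reading88, reading92)
ctr_intercept, ctr_slope = fit_line(centered88, reading92)
mean92 = sum(reading92) / len(reading92)

print(raw_slope, ctr_slope)   # the two slopes agree
print(ctr_intercept, mean92)  # the centered intercept is the mean outcome
```

After centering, the intercept is the prediction for a student at the mean of the predictor, which in a least-squares fit equals the mean of the outcome.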

SLIDE 12

Setting Up Our Question (Part 3 of 3)

* If we are interested in changes, let’s compute a change score and use it as our outcome. This is not a linear transformation because I add/subtract and/or multiply/divide by a variable, not a constant.
COMPUTE READINGIMPROVEMENT = READING92 - READING88.
EXECUTE.

Look familiar? The y-intercept now has another interesting interpretation. It is our predicted change for the average student now that the average student has an x-value of 0 and our outcome variable is a change variable. The slope now tells us the difference in change associated with a 1-point difference in 1988 reading score. If we take two students who differed by 10 points in 1988, we expect the higher-scoring student to have improved her score less, by about 1 point less.

READING92 has measurement error, and READING88 has measurement error. When I take their difference, the difference has more measurement error than either, since they are positively correlated! Ugh.

Whenever there is an element of randomness in the outcome, we expect regression to the mean. Measurement error is one possible source of randomness, but not the only possible source of randomness. If we predict adult height by mother’s height, we will get regression to the mean, even though there is only trivial measurement error with height. Why? There is genetic and environmental luck involved in height. To be extremely tall (or short) requires luck, but there is no guarantee that the luck is hereditary. Most extremely tall moms have not-as-tall daughters.*

*But, if variance in the outcome is greater than variance in the predictor, there can be regression from the mean!
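The "about 1 point less" figure follows from the centered model on the previous slide. The derivation below is a sketch using the rounded estimates from the slides (intercept 54.2, slope 0.9, centering constant 48.0155); the symbols $b_0$, $b_1$, and $Z$ are shorthand introduced here for illustration.

```latex
\begin{aligned}
\widehat{READING92} &= b_0 + b_1 Z, \qquad Z = READING88 - 48.0155 \\
\widehat{READINGIMPROVEMENT} &= \widehat{READING92} - READING88 \\
&= (b_0 - 48.0155) + (b_1 - 1)\, Z \\
&\approx (54.2 - 48.0) + (0.9 - 1)\, Z \;=\; 6.2 - 0.1\, Z
\end{aligned}
```

The intercept of about 6 is the predicted gain for the average student. The slope of about -0.1 means that a 10-point difference in 1988 scores goes with about 0.1 × 10 = 1 point less predicted improvement.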

SLIDE 13

Unit 12: Research Question I

\[ READINGIMPROVEMENT = \beta_0 + \beta_1 \cdot ZEROCENTEREDREADING88 + \varepsilon \]

SLIDE 14

Unit 12: Research Question II

\[ READINGIMPROVEMENT = \beta_0 + \beta_1 \cdot ZEROCENTEREDREADING88 + \varepsilon \]

SLIDE 15

Exploratory Graphs

A residual (aka error) is the difference between our observed outcome and our predicted outcome. If the residual is negative, that means we should have predicted lower (i.e., we overpredicted). If the residual is positive, we should have predicted higher (i.e., we underpredicted). Of course, we expect residuals because of individual variation, hidden variables, and measurement error.

Observation: -32
Prediction: 3
Observed - Predicted = Residual
-32 - 3 = -35

SLIDE 16

Residuals

Every datum has an associated residual, and we can graph the residuals with a histogram.

The average residual will always be zero. If it were not zero, we would need to draw a better trend line.

What would happen to our trend line if we removed the outlier with a residual of -35? You can think of every datum as pulling the line with a rubber band. What happens when our outlier lets go of its rubber band?
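The zero-average property of residuals is easy to verify numerically. The sketch below fits a least-squares line to made-up data (not from any course data set), then shows that a deliberately shifted line no longer has residuals that average to zero; `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 4.0, 7.5, 8.0]

intercept, slope = fit_line(xs, ys)

# Residual = observed - predicted, one per datum.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# A deliberately worse line: same slope, intercept raised by 1 point.
worse_residuals = [y - (intercept + 1.0 + slope * x) for x, y in zip(xs, ys)]
worse_mean = sum(worse_residuals) / len(worse_residuals)

print(mean_residual)  # zero up to floating-point rounding
print(worse_mean)     # about -1: this line overpredicts on average
```

A non-zero average residual signals that the line sits too high or too low, which is the sense in which "we would need to draw a better trend line."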

SLIDE 17

Playing Around For A Few Minutes

http://www.istics.net/stat/PutPoints/

Expanding our View of The Scatterplot

1. Scatterplot
2. Residual Histogram
3. Residual Vs. Fitted Plot (RVF Plot)

SLIDE 18

Playing Around For A Few Minutes

Extreme Example 1: The part/whole problem solved by deleted residuals.
Extreme Example 2: High leverage is not necessarily high influence.

SLIDE 19

Playing Around For A Few Minutes

Extreme Example 3: Low leverage, high residuals influence the y-intercept.
Extreme Example 4: High leverage, high residuals influence the slope.

SLIDE 20

Playing Around For A Few Minutes

Extreme Example 5: RVF plots blow up non-linear horseshoes.
Extreme Example 6: RVF plots blow up heteroskedastic funnels.

SLIDE 21

Playing Around For A Few Minutes

Extreme Example 7: Residual histograms provide insight into normality.
Extreme Example 8: Residual histograms don’t show conditional normality.

SLIDE 22

Outlier Detection: Deleted Residuals (Part 1 of 3)

Original Regression (n=71):

Regression With Outlier Removed (n=70):

* Identify the residual, temporarily remove it, and refit the line.
TEMPORARY.
SELECT IF NOT (ID = 2999973).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88.

As before, the (raw) residual is -35. The deleted residual is -37.

Notice that the slope is no longer statistically significant!

SLIDE 23

Outlier Detection: Deleted Residuals (Part 2 of 3)

* We do not have to calculate deleted residuals “by hand”; we can have the computer do it automatically for every case, and, along the way, we can have the computer do a whole bunch of other things.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88
  /SCATTERPLOT=(*RESID ,*PRED)
  /RESIDUALS HIST(RESID) NORM(RESID)
  /SAVE PRED RESID DRESID LEVER COOK.

/SCATTERPLOT: Create a residual vs. fitted plot (i.e., a residual vs. predicted plot).
/RESIDUALS: Create a histogram of residuals and a normal probability plot.
/SAVE: Create five new variables:
• PRE_#: A predicted/fitted value for each observation.
• RES_#: A residual for each observation.
• DRE_#: A deleted residual for each observation.
• LEV_#: A leverage statistic for each observation.
• COO_#: An influence statistic (Cook’s D) for each observation.

* Once we produce our variables, we can examine them.
EXAMINE VARIABLES=DRE_1 LEV_1 COO_1
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
GRAPH
  /HISTOGRAM(NORMAL)=DRE_1.
GRAPH
  /HISTOGRAM=LEV_1.
GRAPH
  /HISTOGRAM=COO_1.

SLIDE 24

Outlier Detection: Deleted Residuals (Part 3 of 3)

A deleted residual is a residual based on subtracting the predicted value from the observed value, just like a typical, raw residual, except that the predicted value is calculated with the observation removed in order to avoid the part/whole problem in which we are looking for outliers from the trend but the outlier is part of the trend.

Two obvious outliers. Who are they?
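The leave-one-out idea behind a deleted residual can be written out directly. This is a minimal sketch for one predictor using made-up data with a planted outlier; `fit_line` and `deleted_residual` are illustrative helpers, not SPSS functions, and SPSS computes the same quantity without refitting case by case.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def deleted_residual(xs, ys, i):
    """Residual for case i, using a line fitted WITHOUT case i."""
    keep_x = xs[:i] + xs[i + 1:]
    keep_y = ys[:i] + ys[i + 1:]
    intercept, slope = fit_line(keep_x, keep_y)
    return ys[i] - (intercept + slope * xs[i])


# Made-up data: five points on the line y = x + 1, plus one outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 3.0, 4.0, 5.0, 6.0, -20.0]
outlier = 5

intercept, slope = fit_line(xs, ys)
raw = ys[outlier] - (intercept + slope * xs[outlier])
deleted = deleted_residual(xs, ys, outlier)

print(raw)      # raw residual: the outlier has pulled the line toward itself
print(deleted)  # deleted residual: larger in size, measured from the other five
```

The raw residual understates how unusual the outlier is because the outlier helped shape the line it is being compared with; the deleted residual removes that part/whole problem.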

SLIDE 25

Outlier Detection: The Leverage Statistic

A leverage statistic is a measure of the extremity of an observation based on the value(s) of its predictor(s). When we have one predictor, we can easily see who is extreme on that predictor, but when we have 12 predictors, it can be impossible to see who is generally extreme on all predictors.

Some high leverage observations. Who are they?
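For a single predictor, leverage has a simple closed form: 1/n plus the squared distance from the predictor mean divided by the total squared distance. The sketch below applies that standard formula to made-up predictor values. SPSS's saved LEV_# variable is, to my understanding, a centered version that omits the 1/n term, so treat the exact values here as illustrative.

```python
def leverages(xs):
    """Leverage of each case in a one-predictor regression.

    Uses the standard formula h_i = 1/n + (x_i - mean)^2 / Sxx.
    Only the predictor values matter; the outcome plays no role.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Made-up predictor values: the last case is far from the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(xs)
most_extreme = h.index(max(h))

for x, lev in zip(xs, h):
    print(x, round(lev, 3))
print(sum(h))  # leverages sum to the number of fitted coefficients (2 here)
```

With several predictors the same idea holds, but the formula involves the full predictor matrix, which is why the statistic is useful when extremity cannot be seen by eye.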

SLIDE 26

Outlier Detection: The Cook’s D Statistic

A high influence observation. Who is it?

An influence statistic compares the trend line (calculated from all the data, including the observation) with a hypothetical trend line (calculated from all the data except the observation). The bigger the difference between the two trend lines, the greater the influence. Cook’s D statistic is the influence statistic that we will use, but there are others.
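The "two trend lines" description translates directly into a computation. The sketch below follows the usual textbook definition of Cook's distance for one predictor: the squared gap between the two sets of fitted values, scaled by the number of coefficients times the mean squared error. The data are made up and the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cooks_distance(xs, ys, i):
    """Cook's D for case i: how far the fitted values move when i is dropped."""
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    a_all, b_all = fit_line(xs, ys)
    a_del, b_del = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    sse = sum((y - (a_all + b_all * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - p)
    # Squared gap between the two trend lines, summed over every case.
    shift = sum(((a_all + b_all * x) - (a_del + b_del * x)) ** 2 for x in xs)
    return shift / (p * mse)


# Made-up data: five points on y = x + 1, plus one outlier at a high x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 3.0, 4.0, 5.0, 6.0, -20.0]

d_values = [cooks_distance(xs, ys, i) for i in range(len(xs))]
most_influential = d_values.index(max(d_values))
print([round(d, 3) for d in d_values])
print(most_influential)  # index of the case that moves the line the most
```

The planted outlier is extreme in both x and y, so dropping it moves the line the most and it gets the largest Cook's D.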

SLIDE 27

Outlier Detection: Residuals, Leverage, Influence

Case Number   Deleted Residual   Leverage   Cook’s Distance   Result
8             Extreme            Minimal    Moderate          Extreme in Y, Not in X: Influence Y-Intercept
27            Extreme            Extreme    Extreme           Extreme in Y And in X: Influence Slope
31            Minimal            Extreme    Minimal           Not Extreme in Y, But in X: Little Influence
54            Minimal            Minimal    Minimal           Neither Extreme in Y Nor in X: Little Influence

SLIDE 28

Non-Linearity Detection: RVF Plot

A residual versus fitted plot (RVF plot), also known as a residual versus predicted plot, is just what it says it is: a scatterplot of residual values versus fitted/predicted values.

Horseshoe shapes indicate non-linearity. If there were a horseshoe shape in our outcome versus predictor plot, it would be magnified in the residual versus fitted plot, but everything looks okay here. In Unit 13, we’ll see examples of non-linear relationships (and attendant horseshoes). If you are wondering “what’s the big deal?” wait until we have 7 predictors. No matter how many predictors, we will still have only one residual and only one fitted value for each observation, so we can still use an RVF plot for the multiple regression model, whereas we would need not a two-dimensional scatterplot of the outcome versus predictors but an 8-dimensional scatterplot!

Good Old Outcome Vs. Predictor Scatterplot:

Shiny New Residual Vs. Fitted Scatterplot:

SLIDE 29

Heteroskedasticity Detection: RVF Plot

A residual versus fitted plot (RVF plot), also known as a residual versus predicted plot, is just what it says it is: a scatterplot of residual values versus fitted/predicted values.

Funnel shapes indicate heteroskedasticity. If there were a funnel shape in our outcome versus predictor plot, it would be magnified in the residual versus fitted plot, but everything looks okay here. In Unit 14, we’ll see examples of heteroskedastic relationships (and attendant funnels).

Good Old Outcome Vs. Predictor Scatterplot:

Shiny New Residual Vs. Fitted Scatterplot:

SLIDE 30

Non-Normality Detection: Residual Histograms

A histogram of residuals can give an indication whether or not the residuals are normally distributed; however, use with caution, because histograms of residuals show an unconditional distribution (i.e., they don’t think vertically). We are ultimately concerned with normality (and homoskedasticity) conditional on X. Nevertheless, such histograms can be useful, especially when supplemented with an RVF plot, which allows you to think in terms of vertical slices and consequently think about conditional distributions.

SLIDE 31

Non-Normality Detection: P-P Plots

A probability-probability plot (P-P plot) is another way of looking at a residual histogram, with a focus on normality. In a normal distribution we expect 50% of the observations to be below average, and, because it’s a mathematical construct, we observe 50% of the observations to be below average. This simple truth forms our baseline of comparison (the red line below). In a sample distribution from a population with a normal distribution, we expect 50% of the observations to be below average, but due to sampling error (or perhaps due to a non-normal population distribution), we may observe more or fewer than 50% of observations to be below average.

Here, we observe 50% of our sample, but we expect a smidge more than 50%.

Expect more. Expect less.

The take-home message for P-P plots is that we want the dotted line to lie on top of the straight line, and where the dotted line deviates, we have non-normality in our sample, which may indicate non-normality in our population.
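The points on a P-P plot pair an observed cumulative proportion with the proportion a normal distribution would predict at the same value. The sketch below computes those pairs with Python's standard library. The plotting position (rank - 0.5)/n is an assumption chosen for simplicity; SPSS may use a different rule by default, and the residuals are made up.

```python
from statistics import NormalDist, mean, stdev


def pp_points(residuals):
    """Return (observed, expected) cumulative proportions for a normal P-P plot.

    observed: share of the sample at or below each residual, using the
              (rank - 0.5) / n plotting position (an assumption of this sketch).
    expected: share a normal distribution with the sample's mean and standard
              deviation places below that residual.
    """
    n = len(residuals)
    normal = NormalDist(mean(residuals), stdev(residuals))
    points = []
    for rank, value in enumerate(sorted(residuals), start=1):
        observed = (rank - 0.5) / n
        expected = normal.cdf(value)
        points.append((observed, expected))
    return points


# Made-up, perfectly symmetric residuals for illustration.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = pp_points(residuals)

for observed, expected in points:
    print(round(observed, 3), round(expected, 3))
```

Points that fall on the diagonal (observed equal to expected) are consistent with normality; the middle residual here lands exactly at (0.5, 0.5), the "50% below average" baseline described above.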

SLIDE 32

Reflecting on our Unit 12 Research Questions

To answer the first question, we can sort our data by residuals and find the largest positive residuals.

To answer our second question, we see from our RVF plot that the relationship appears linear (no horseshoe) and homoskedastic (no funnel).

SLIDE 33

Checking Regression Assumptions With Regression Diagnostics

HI-N-LO: Heteroskedasticity, Independence, Normality, Linearity, Outliers

SLIDE 34

Answering our Roadmap Question

\[
\begin{aligned}
READING = {} & \beta_0 + \beta_1 ASIAN + \beta_2 BLACK + \beta_3 LATINO + \beta_4 L2HOMEWORKP1 + \beta_5 ESL + \beta_6 FREELUNCH \\
& + \beta_7 ESLxASIAN + \beta_8 ESLxBLACK + \beta_9 ESLxLATINO \\
& + \beta_{10} FREELUNCHxASIAN + \beta_{11} FREELUNCHxBLACK + \beta_{12} FREELUNCHxLATINO + \varepsilon
\end{aligned}
\]

From the RVF plot, we do not appear to have a problem with meeting the linearity assumption. However, due to a ceiling effect, the homoskedasticity and normality assumptions are questionably met. Other than the ceiling effect, the conditional variances appear roughly equal. We are concerned that the high-end predictions are negatively skewed because of the ceiling effect.

SLIDE 35

Looking at Normality

From a histogram of residuals and P-P plot, we see a slight negative skew of the residuals that we attribute to the ceiling effect of our reading measure.

SLIDE 36

Looking for Outliers

There are no outliers of concern, in part because the large sample size minimizes the influence of any one datum.

Because of the large sample size, the histograms below are fairly useless, so I will turn the distribution of Cook’s D statistics into a scatterplot…

SLIDE 37

A Better Look at The Influential Outliers

When we plot the Cook’s D statistics versus an arbitrary x-variable, we see about 10 students that stand out from the pack. We will inspect those 10 students more closely to see if there is a further pattern.

SLIDE 38

Looking For Patterns in the Influential Outliers

* Here is the SPSS syntax for outputting case-summary tables.
* Sort the cases by Cook’s distance, largest first.
* Data > Sort Cases...
* Sort by the Cook’s D, descending.
SORT CASES BY COO_1 (D).
* Check out the first twenty cases, which have the highest Cook’s D as per your sorting.
* Analyze > Reports > Case Summaries...
SUMMARIZE
  /TABLES=ID READING HOMEWORK FREELUNCH ESL RACE
  /FORMAT=LIST NOCASENUM TOTAL LIMIT=20
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT.

SLIDE 39

Unit 12 Appendix: Key Concepts

Notice that I use “increase.” The longitudinal data warrant the developmental conclusion!

READING92 has measurement error, and READING88 has measurement error. When I take their difference, the difference necessarily has more measurement error than either! Ah, well.

Whenever there is an element of randomness in the outcome, we expect regression to the mean. Measurement error is one possible source of randomness, but not the only possible source of randomness. If we predict adult height by mother’s height, we will get regression to the mean, even though there is only trivial measurement error with height.

SLIDE 40

Unit 12 Appendix: Key Interpretations

For the average student, we expect an increase of 6 reading points from the 8th grade to the 12th grade, from 48 points to 54 points.

From the RVF plot, we do not appear to have a problem with meeting the linearity assumption. However, due to a ceiling effect, the homoskedasticity and normality assumptions are questionably met. Other than the ceiling effect, the conditional variances appear roughly equal. We are concerned that the high-end predictions are negatively skewed because of the ceiling effect.

From a histogram of residuals and P-P plot, we see a slight negative skew of the residuals that we attribute to the ceiling effect of our reading measure.

There are no outliers of concern, in part because the large sample size minimizes the influence of any one datum.

When we plot the Cook’s D statistics versus an arbitrary x-variable, we see about 10 students that stand out from the pack. We will inspect those 10 students more closely to see if there is a further pattern.

SLIDE 41

Unit 12 Appendix: Key Terminology

At least in simple linear regression, diagnostics provide information that we could conceivably glean from a bivariate scatterplot of the outcome versus predictor; nevertheless, they can provide a helpfully detailed view. In multiple regression, however, diagnostics provide information that we could never gather by eye.

A residual (aka error) is the difference between our observed outcome and our predicted outcome. If the residual is negative, that means we should have predicted lower (i.e., we overpredicted). If the residual is positive, we should have predicted higher (i.e., we underpredicted). Of course, we expect residuals because of individual variation, hidden variables, and measurement error. Every datum has an associated residual, and we can graph the residuals with a histogram.

A deleted residual is a residual based on subtracting the predicted value from the observed value, just like a typical, raw residual, except that the predicted value is calculated with the observation removed in order to avoid the part/whole problem in which we are looking for outliers from the trend but the outlier is part of the trend.

A leverage statistic is a measure of the extremity of an observation based on the value(s) of its predictor(s). When we have one predictor, we can easily see who is extreme on that predictor, but when we have 12 predictors, it can be impossible to see who is generally extreme on all predictors.

An influence statistic compares the trend line (calculated from all the data, including the observation) with a hypothetical trend line (calculated from all the data except the observation). The bigger the difference between the two trend lines, the greater the influence. Cook’s D statistic is the influence statistic that we will use, but there are others.

A residual versus fitted plot (RVF plot), also known as a residual versus predicted plot, is just what it says it is: a scatterplot of residual values versus fitted/predicted values.

A histogram of residuals can give an indication whether or not the residuals are normally distributed; however, use with caution, because histograms of residuals show an unconditional distribution (i.e., they don’t think vertically). We are ultimately concerned with normality (and homoskedasticity) conditional on X. Nevertheless, such histograms can be useful, especially when supplemented with an RVF plot, which allows you to think in terms of vertical slices and consequently think about conditional distributions.

A probability-probability plot (P-P plot) is another way of looking at a residual histogram, with a focus on normality. In a normal distribution we expect 50% of the observations to be below average and, because it’s a mathematical construct, we observe 50% of the observations to be below average. This simple truth forms our baseline of comparison (in red below). In a sample distribution from a population with a normal distribution, we expect 50% of the observations to be below average, but due to sampling error, we may observe more or fewer than 50% of observations to be below average.

SLIDE 42

Unit 12 Appendix: Formulas

Reliability Notation

The reliability of measurement X is denoted $\rho_{XX'}$, where the Greek letter rho stands for the population correlation, the subscript X stands for one form of measurement X, and the subscript X' stands for a parallel form of the measurement.

The population variance of measurement X is denoted $\sigma_X^2$.

The population standard deviation of measurement X is denoted $\sigma_X$.

The population correlation between measurements X and Y is denoted $\rho_{XY}$.

The Reliability of a Difference

\[ \rho_{DD'} = \frac{\sigma_X^2 \rho_{XX'} + \sigma_Y^2 \rho_{YY'} - 2 \rho_{XY} \sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X \sigma_Y} \]

SLIDE 43

Unit 12 Appendix: Formulas

The Reliability of a Difference

\[ \rho_{DD'} = \frac{\sigma_X^2 \rho_{XX'} + \sigma_Y^2 \rho_{YY'} - 2 \rho_{XY} \sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X \sigma_Y} \]

Annotations on the formula: the two variance terms are marked “Want Big” and the correlation term is marked “Want Small.”

Baseline of perfect reliability: when $\rho_{XX'} = \rho_{YY'} = 1$, $\rho_{DD'} = 1$.

• We want the reliable variance in measurement X to be big.
• We want the reliable variance in measurement Y to be big.
• We want the correlation between measurements X and Y to be small. If the correlation is negative, then the reliability of the difference can actually exceed the reliability of the individual tests!
• What happens when measurements X and Y are perfectly reliable?
• What happens when measurements X and Y are perfectly unreliable? Note that, if measurements X and Y are perfectly unreliable, then they must be perfectly uncorrelated as well.
• What happens when measurements X and Y are perfectly correlated? Note that, if measurements X and Y are perfectly correlated (and they have the same standard deviation), then everybody has the exact same difference score.

With equal standard deviations ($\sigma_X = \sigma_Y$), the formula simplifies to:

\[ \rho_{DD'} = \frac{\rho_{XX'} + \rho_{YY'} - 2 \rho_{XY}}{2 - 2 \rho_{XY}} \]
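Plugging numbers into the reliability-of-a-difference formula makes the "want big, want small" guidance concrete. The sketch below implements the general formula; the function name and the reliabilities and correlations passed to it are hypothetical values chosen for illustration, not estimates for READING88 or READING92.

```python
def reliability_of_difference(rel_x, rel_y, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of the difference score D = X - Y.

    rel_x, rel_y: reliabilities of X and Y (rho_XX', rho_YY').
    r_xy:         correlation between X and Y (rho_XY).
    sd_x, sd_y:   standard deviations; the defaults give the equal-SD case.
    """
    covariance_term = 2.0 * r_xy * sd_x * sd_y
    numerator = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - covariance_term
    denominator = sd_x ** 2 + sd_y ** 2 - covariance_term
    if denominator == 0:
        # Perfectly correlated measures with equal SDs: everyone has the same
        # difference score, so its reliability is undefined.
        raise ValueError("difference score has zero variance")
    return numerator / denominator


# Hypothetical inputs: two tests that each have reliability 0.8.
positively_correlated = reliability_of_difference(0.8, 0.8, 0.6)
uncorrelated = reliability_of_difference(0.8, 0.8, 0.0)
negatively_correlated = reliability_of_difference(0.8, 0.8, -0.3)
perfectly_reliable = reliability_of_difference(1.0, 1.0, 0.6)

print(round(positively_correlated, 3))  # 0.5: well below either test
print(round(uncorrelated, 3))           # 0.8: matches the individual tests
print(round(negatively_correlated, 3))  # 0.846: exceeds the individual tests
print(round(perfectly_reliable, 3))     # 1.0: the perfect-reliability baseline
```

The first case is the one that matters for a change score such as READINGIMPROVEMENT: two reasonably reliable but positively correlated tests can yield a much less reliable difference.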

SLIDE 44

Unit 12 Appendix: SPSS Syntax

* Example SPSS syntax for computing transformed variables.
* This linear transformation is not a z-transformation because I did not divide the difference by the standard deviation.
COMPUTE ZEROCENTEREDREADING88 = READING88 - 48.0155.
EXECUTE.
* This (goofy) transformation is non-linear because I do more than add/subtract and/or multiply/divide by a constant. I use powers and logs.
COMPUTE Sean_Is_A_Great_SPSS_Programmer = READING88 * 48.0155 - 1975/27 + FREELUNCH**(1/2) + LN(HOMEWORK + 1).
EXECUTE.
* If we are interested in changes, let’s compute a change score and use it as our outcome. This is not a linear transformation because I add/subtract and/or multiply/divide by a variable, not a constant.
COMPUTE READINGIMPROVEMENT = READING92 - READING88.
EXECUTE.
* Identify the residual, temporarily remove it, and refit the line.
TEMPORARY.
SELECT IF NOT (ID = 2999973).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88.

SLIDE 45

Unit 12 Appendix: SPSS Syntax

* We do not have to calculate deleted residuals “by hand”; we can have the computer do it automatically for every case, and, along the way, we can have the computer do a whole bunch of other things.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88
  /SCATTERPLOT=(*RESID ,*PRED)
  /RESIDUALS HIST(RESID) NORM(RESID)
  /SAVE PRED RESID DRESID LEVER COOK.
* Once we produce our variables, we can examine them.
EXAMINE VARIABLES=DRE_1 LEV_1 COO_1
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
GRAPH
  /HISTOGRAM(NORMAL)=DRE_1.
GRAPH
  /HISTOGRAM=LEV_1.
GRAPH
  /HISTOGRAM=COO_1.

SLIDE 46

Output Your New Variables and Nifty Plots

Start from: Analyze > Regression > Linear

SLIDE 47

Examine Your New Variables

Look around and check out your options.

SLIDE 48

Perceived Intimacy of Adolescent Girls (Intimacy.sav)

• Source: HGSE thesis by Dr. Linda Kilner entitled Intimacy in Female Adolescent's Relationships with Parents and Friends (1991). Kilner collected the ratings using the Adolescent Intimacy Scale.
• Sample: 64 adolescent girls in the sophomore, junior and senior classes of a local suburban public school system.
• Overview: Data set contains self-ratings of the intimacy that adolescent girls perceive themselves as having with: (a) their mother and (b) their boyfriend.

SLIDE 49

Perceived Intimacy of Adolescent Girls (Intimacy.sav)

SLIDE 50

Perceived Intimacy of Adolescent Girls (Intimacy.sav)

SLIDE 51

Perceived Intimacy of Adolescent Girls (Intimacy.sav)

SLIDE 52

Perceived Intimacy of Adolescent Girls (Intimacy.sav)

SLIDE 53

High School and Beyond (HSB.sav)

• Source: Subset of data graciously provided by Valerie Lee, University of Michigan.
• Sample: This subsample has 1044 students in 205 schools. Missing data on the outcome test score and family SES were eliminated. In addition, schools with fewer than 3 students included in this subset of data were excluded.
• Overview: High School & Beyond. Subset of data focused on selected student and school characteristics as predictors of academic achievement.

SLIDE 54

High School and Beyond (HSB.sav)

SLIDE 55

High School and Beyond (HSB.sav)

SLIDE 56

High School and Beyond (HSB.sav)

SLIDE 57

High School and Beyond (HSB.sav)

SLIDE 58

Understanding Causes of Illness (ILLCAUSE.sav)

• Source: Perrin E.C., Sayer A.G., and Willett J.B. (1991). Sticks And Stones May Break My Bones: Reasoning About Illness Causality And Body Functioning In Children Who Have A Chronic Illness, Pediatrics, 88(3), 608-19.
• Sample: 301 children, including a sub-sample of 205 who were described as asthmatic, diabetic, or healthy. After further reductions due to the list-wise deletion of cases with missing data on one or more variables, the analytic sub-sample used in class ends up containing: 33 diabetic children, 68 asthmatic children and 93 healthy children.
• Overview: Data for investigating differences in children’s understanding of the causes of illness, by their health status.

SLIDE 59

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 60

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 61

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 62

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 63

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 64

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 65

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 66

Understanding Causes of Illness (ILLCAUSE.sav)

SLIDE 67

Children of Immigrants (ChildrenOfImmigrants.sav)

• Source: Portes, Alejandro, & Ruben G. Rumbaut (2001). Legacies: The Story of the Immigrant Second Generation. Berkeley CA: University of California Press.
• Sample: Random sample of 880 participants obtained through the website.
• Overview: “CILS is a longitudinal study designed to study the adaptation process of the immigrant second generation which is defined broadly as U.S.-born children with at least one foreign-born parent or children born abroad but brought at an early age to the United States. The original survey was conducted with large samples of second-generation children attending the 8th and 9th grades in public and private schools in the metropolitan areas of Miami/Ft. Lauderdale in Florida and San Diego, California” (from the website description of the data set).

SLIDE 68

Children of Immigrants (ChildrenOfImmigrants.sav)

SLIDE 69

Children of Immigrants (ChildrenOfImmigrants.sav)

SLIDE 70

Children of Immigrants (ChildrenOfImmigrants.sav)

SLIDE 71

Children of Immigrants (ChildrenOfImmigrants.sav)

SLIDE 72

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

• Source: Sampson, R.J., Raudenbush, S.W., & Earls, F. (1997). Neighborhoods and violent crime: A multilevel study of collective efficacy. Science, 277, 918-924.
• Sample: The data described here consist of information from 343 Neighborhood Clusters in Chicago, Illinois. Some of the variables were obtained by project staff from the 1990 Census and city records. Other variables were obtained through questionnaire interviews with 8782 Chicago residents who were interviewed in their homes.
• These data were collected as part of the Project on Human Development in Chicago Neighborhoods in 1995.

SLIDE 73

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 74

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 75

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 76

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 77

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 78

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 79

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 80

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 81

Human Development in Chicago Neighborhoods (Neighborhoods.sav)

SLIDE 82

4-H Study of Positive Youth Development (4H.sav)

• Source: Subset of data from IARYD, Tufts University.
• Sample: These data consist of seventh graders who participated in Wave 3 of the 4-H Study of Positive Youth Development at Tufts University. This subfile is a substantially sampled-down version of the original file, as all the cases with any missing data on these selected variables were eliminated.

SLIDE 83

4-H Study of Positive Youth Development (4H.sav)

SLIDE 84

4-H Study of Positive Youth Development (4H.sav)

SLIDE 85

4-H Study of Positive Youth Development (4H.sav)

SLIDE 86

4-H Study of Positive Youth Development (4H.sav)

SLIDE 87

4-H Study of Positive Youth Development (4H.sav)

SLIDE 88

4-H Study of Positive Youth Development (4H.sav)

SLIDE 89

4-H Study of Positive Youth Development (4H.sav)

SLIDE 90