Unit 12: Road Map (VERBAL)



slide-1
SLIDE 1
  • Unit 12: Road Map (VERBAL)

Variables named on this slide: READING, RACE, HOMEWORK, FREELUNCH, ESL.

Assumptions named on this slide: outliers, linearity, normality, homoskedasticity, independence.

slide-2
SLIDE 2
  • Unit 12: Road Map (Schematic)

Single Predictor (outcome type by predictor type; each is continuous, dichotomous, or polychotomous)
Methods shown: Regression; Regression, ANOVA; Regression, ANOVA, T-tests; Logistic Regression; Chi-Squares

Multiple Predictors (outcome type by predictor type; each is continuous, dichotomous, or polychotomous)
Methods shown: Multiple Regression; Regression, ANOVA; Logistic Regression; Chi-Squares

Units 11-14, 19, B: Dealing with Assumption Violations

slide-3
SLIDE 3

Unit 12: Roadmap (SPSS Output)

Output panels labeled by unit: Unit 9, Unit 16, Unit 17, Unit 18, Unit 8, Unit 11, Unit 12, Unit 13, Unit 14, Unit 19

slide-4
SLIDE 4
Unit 12: Checking GLM Assumptions with Regression Diagnostics
  • Unit 12 Post Hole
  • Unit 12 Technical Memo and School Board Memo
  • Unit 12 Review
  • Unit 12 Reading

slide-5
SLIDE 5

Unit 12: Technical Memo and School Board Memo

Work Products (Part I of II):
I. Technical Memo: Have one section per analysis. For each section, follow this outline.
  A. Introduction
    i. State a theory (or perhaps hunch) for the relationship. Think causally, be creative. (1 Sentence)
    ii. State a research question for each theory (or hunch). Think correlationally, be formal. Now that you know the statistical machinery that justifies an inference from a sample to a population, begin each research question, “In the population, …” (1 Sentence)
    iii. List your variables, and label them “outcome” and “predictor,” respectively.
    iv. Include your theoretical model.
  B. Univariate Statistics. Describe your variables, using descriptive statistics. What do they represent or measure?
    i. Describe the data set. (1 Sentence)
    ii. Describe your variables. (1 Paragraph Each)
      a. Define the variable (parenthetically noting the mean and s.d. as descriptive statistics).
      b. Interpret the mean and standard deviation in such a way that your audience begins to form a picture of the way the world is. Never lose sight of the substantive meaning of the numbers.
      c. Polish off the interpretation by discussing whether the mean and standard deviation can be misleading, referencing the median, outliers and/or skew as appropriate.
      d. Note validity threats due to measurement error.
  C. Correlations. Provide an overview of the relationships between your variables using descriptive statistics. Focus first on the relationship between your outcome and question predictor, second (tied) on the relationships between your outcome and control predictors, second (tied) on the relationships between your question predictor and control predictors, and fourth on the relationship(s) between your control variables.
    a. Include your own simple/partial correlation matrix with a well-written caption.
    b. Interpret your simple correlation matrix. Note what the simple correlation matrix foreshadows for your partial correlation matrix; “cheat” here by peeking at your partial correlation and thinking backwards. Sometimes, your simple correlation matrix reveals possibilities in your partial correlation matrix. Other times, your simple correlation matrix provides foregone conclusions. You can stare at a correlation matrix all day, so limit yourself to two insights.
    c. Interpret your partial correlation matrix controlling for one variable. Note what the partial correlation matrix foreshadows for a partial correlation matrix that controls for two variables. Limit yourself to two insights.

slide-6
SLIDE 6

Unit 12: Technical Memo and School Board Memo

Work Products (Part II of II):
I. Technical Memo (continued)
  D. Regression Analysis. Answer your research question using inferential statistics. Weave your strategy into a coherent story.
    i. Include your fitted model.
    ii. Use the R² statistic to convey the goodness of fit for the model (i.e., strength).
    iii. To determine statistical significance, test each null hypothesis that the magnitude in the population is zero, reject (or not) the null hypothesis, and draw a conclusion (or not) from the sample to the population.
    iv. Create, display and discuss a table with a taxonomy of fitted regression models.
    v. Use spreadsheet software to graph the relationship(s), and include a well-written caption.
    vi. Describe the direction and magnitude of the relationship(s) in your sample, preferably with illustrative examples. Draw out the substance of your findings through your narrative.
    vii. Use confidence intervals to describe the precision of your magnitude estimates so that you can discuss the magnitude in the population.
    viii. If regression diagnostics reveal a problem, describe the problem and the implications for your analysis and, if possible, correct the problem.
      i. Primarily, check your residual-versus-fitted (RVF) plot. (Glance at the residual histogram and P-P plot.)
      ii. Check your residual-versus-predictor plots.
      iii. Check for influential outliers using leverage, residual and influence statistics.
      iv. Check your main effects assumptions by checking for interactions before you finalize your model.
  X. Exploratory Data Analysis. Explore your data using outlier-resistant statistics.
    i. For each variable, use a coherent narrative to convey the results of your exploratory univariate analysis of the data. Don’t lose sight of the substantive meaning of the numbers. (1 Paragraph Each)
    ii. For each relationship between your outcome and predictor, use a coherent narrative to convey the results of your exploratory bivariate analysis of the data. (1 Paragraph Each)
      1. If a relationship is non-linear, transform the outcome and/or predictor to make it linear.
      2. If a relationship is heteroskedastic, consider using robust standard errors.
II. School Board Memo: Concisely, precisely and plainly convey your key findings to a lay audience. Note that, whereas you are building on the technical memo for most of the semester, your school board memo is fresh each week. (Max 200 Words)
III. Memo Metacognitive
slide-7
SLIDE 7
NELS88.sav Codebook

National Education Longitudinal Study
Source: U.S. Department of Education
Summary: Here are select variables from the NELS88 data set.
Notes: I created the FREELUNCH variable based on annual family income of less than $25,000. I converted the HOMEWORK variable from an ordinal/categorical variable to a continuous variable, which is why it is so “binny.” I removed from the data set students who self-identified as other than Asian, Black, Latino, or White. I then created a set of indicator variables from RACE: ASIAN, BLACK and LATINO, with WHITE as an (implicit) reference category.
Sample: A nationally representative sample of 7,800 8th graders.
Variables: READING, FREELUNCH, HOMEWORK, ESL, RACE, ASIAN, LATINO, BLACK

slide-8
SLIDE 8
Select Variables from the NELS Data Set

slide-9
SLIDE 9
  • Introduction to Regression Diagnostics

Search HI-N-LO for Assumption Violations
  • Heteroskedasticity
  • Independence
  • Normality
  • Linearity
  • Outliers

At least in simple linear regression, diagnostics provide information that we could conceivably glean from a bivariate scatterplot of the outcome versus predictor; nevertheless, they can provide a helpfully detailed view. In multiple regression, however, diagnostics provide information that we could never gather by eye.

slide-10
SLIDE 10
  • Setting Up Our Question (Part 1 of 3)

$$\widehat{\text{READING92}} = 10.7 + 0.9 \cdot \text{READING88}$$

The average student scored 48 points as an 8th grader. How many points do you predict that she improved by the 12th grade?

$$10.7 + 0.9 \cdot 48 \approx 54$$

For the average student, we expect an increase of 6 reading points from the 8th grade to the 12th grade, from 48 points to 54 points.

Notice that I use “increase.” The longitudinal data warrant the developmental conclusion!

Recall that the y-intercept (i.e., β0, e.g., 10.7) is our prediction when x is zero (e.g., READING88 = 0). Since READING88 never equals zero within the range of our data, the y-intercept is merely a mathematical abstraction in our fitted model. But, we can make zero more interesting...

For the next slide, remember 54, and 6 for the slide after.
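The prediction for the average student can be checked with a few lines of arithmetic. This is only a sketch in Python (not the SPSS used in the course); the coefficients 10.7 and 0.9 are the rounded values shown on the slide, and the function name is made up for illustration.

```python
# Rounded coefficients as read off the slide:
# predicted READING92 = 10.7 + 0.9 * READING88.
INTERCEPT = 10.7
SLOPE = 0.9


def predict_reading92(reading88):
    """Predicted 12th-grade score for a given 8th-grade score."""
    return INTERCEPT + SLOPE * reading88


average_reading88 = 48
predicted_reading92 = predict_reading92(average_reading88)
expected_gain = predicted_reading92 - average_reading88

# Rounded to whole points, these are the 54 and the 6 to remember.
print(round(predicted_reading92), round(expected_gain))
```

Because the slide's coefficients are rounded, the unrounded arithmetic lands slightly under 54 and 6.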

slide-11
SLIDE 11
  • Setting Up Our Question (Part 2 of 3)

* Example SPSS syntax for computing transformed variables.
* This linear transformation is not a z-transformation because I did not divide the difference by the standard deviation.
COMPUTE ZEROCENTEREDREADING88 = READING88 - 48.0155.
EXECUTE.
* This (goofy) transformation is non-linear because I do more than add/subtract and/or multiply/divide by a constant. I use powers and logs.
COMPUTE Sean_Is_A_Great_SPSS_Programmer = READING88 * 48.0155 - 1975/27 + FREELUNCH**(1/2) + LN(HOMEWORK + 1).
EXECUTE.

Look familiar? The y-intercept now has an interesting interpretation. It is our prediction for the average student, now that the average student has an x-value of 0. (Also notice that the slope has not changed.)

$$\widehat{\text{READING92}} = 54.2 + 0.9 \cdot \text{ZEROCENTEREDREADING88}$$
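To see why zero-centering moves the intercept but leaves the slope alone, here is a minimal Python sketch. The five score pairs are invented for illustration (they are not NELS88 data), and `fit_line` is a hypothetical helper implementing ordinary least squares for one predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented scores for five students (illustration only).
reading88 = [40.0, 44.0, 48.0, 52.0, 56.0]
reading92 = [47.0, 50.0, 55.0, 57.0, 61.0]

mean88 = sum(reading88) / len(reading88)
mean92 = sum(reading92) / len(reading92)
centered88 = [x - mean88 for x in reading88]  # subtract a constant: a linear transformation

raw_intercept, raw_slope = fit_line(reading88, reading92)
centered_intercept, centered_slope = fit_line(centered88, reading92)

print(raw_intercept, raw_slope)            # prediction at READING88 = 0
print(centered_intercept, centered_slope)  # prediction for the average student
```

After centering, the intercept equals the mean outcome, which is the prediction at the mean of the predictor.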

slide-12
SLIDE 12
  • Setting Up Our Question (Part 3 of 3)

* If we are interested in changes, let’s compute a change score and use it as our outcome. This is not a linear transformation because I add/subtract and/or multiply/divide by a variable, not a constant.
COMPUTE READINGIMPROVEMENT = READING92 - READING88.
EXECUTE.

Look familiar? The y-intercept now has another interesting interpretation. It is our predicted change for the average student, now that the average student has an x-value of 0 and our outcome variable is a change variable. The slope now tells us the difference in change associated with a 1-point difference in 1988 reading score. If we take two students who differed by 10 points in 1988, we expect the higher-scoring student to have improved her score less, by about 1 point less.
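The link between the two models can be sketched in Python: regressing the change score on the centered predictor gives a slope exactly one less than the slope for the 12th-grade score itself, so a slope of about 0.9 becomes about -0.1. The data and the `fit_line` helper below are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented scores for five students (illustration only).
reading88 = [40.0, 44.0, 48.0, 52.0, 56.0]
reading92 = [47.0, 50.0, 55.0, 57.0, 61.0]

mean88 = sum(reading88) / len(reading88)
centered88 = [x - mean88 for x in reading88]
improvement = [y - x for x, y in zip(reading88, reading92)]  # the change score

_, level_slope = fit_line(centered88, reading92)
change_intercept, change_slope = fit_line(centered88, improvement)

# The intercept is the average improvement; the slope is level_slope - 1.
print(change_intercept, change_slope, 10 * change_slope)
```

In this toy data, two students 10 points apart in 1988 are predicted to differ by 1.25 points in improvement, with the higher scorer improving less.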

READING92 has measurement error, and READING88 has measurement error; when I take their difference, the difference has more measurement error than either, and it is less reliable still because the two scores are positively correlated! Ugh.

Whenever there is an element of randomness in the outcome, we expect regression to the mean. Measurement error is one possible source of randomness, but not the only possible source of randomness. If we predict adult height by mother’s height, we will get regression to the mean, even though there is only trivial measurement error with height. Why? There is genetic and environmental luck involved in height. To be extremely tall (or short) requires luck, but there is no guarantee that the luck is hereditary. Most extremely tall moms have not-as-tall daughters.*

*But, if variance in the outcome is greater than variance in the predictor, there can be egression from the mean!
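A small simulation can make regression to the mean concrete. This Python sketch assumes standardized heights, a mother-daughter slope of 0.5, and independent "luck"; all of those numbers are invented for illustration and are not estimates from any data set.

```python
import random

random.seed(12)
n = 2000

# Standardized heights: daughter = 0.5 * mother + independent luck.
# The luck variance (0.75) keeps daughters' heights at unit variance.
mothers = [random.gauss(0, 1) for _ in range(n)]
daughters = [0.5 * m + random.gauss(0, 0.75 ** 0.5) for m in mothers]

# Look only at the tall mothers (more than 1.5 s.d. above average).
tall_pairs = [(m, d) for m, d in zip(mothers, daughters) if m > 1.5]
mean_tall_mothers = sum(m for m, _ in tall_pairs) / len(tall_pairs)
mean_their_daughters = sum(d for _, d in tall_pairs) / len(tall_pairs)

# Daughters of tall mothers are tall on average, but not as tall.
print(round(mean_tall_mothers, 2), round(mean_their_daughters, 2))
```

No measurement error is simulated here; the randomness in the outcome alone produces the pull toward the mean.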

slide-13
SLIDE 13

Unit 12: Research Question I

$$\text{READINGIMPROVEMENT} = \beta_0 + \beta_1 \, \text{ZEROCENTEREDREADING88} + \varepsilon$$

slide-14
SLIDE 14
Unit 12: Research Question II

$$\text{READINGIMPROVEMENT} = \beta_0 + \beta_1 \, \text{ZEROCENTEREDREADING88} + \varepsilon$$

slide-15
SLIDE 15

Exploratory Graphs

A residual (aka error) is the difference between our observed outcome and our predicted outcome. If the residual is negative, that means we should have predicted lower (i.e., we overpredicted). If the residual is positive, we should have predicted higher (i.e., we underpredicted). Of course, we expect residuals because of individual variation, hidden variables, and measurement error.

Observation: -32
Prediction: 3
Observed - Predicted = Residual
-32 - 3 = -35

slide-16
SLIDE 16

Residuals

Every datum has an associated residual, and we can graph the residuals with a histogram.

What would happen to our trend line if we removed the outlier with a residual of -35? You can think of every datum as pulling the line with a rubber band. What happens when our outlier lets go of its rubber band?

The average residual will always be zero. If it were not zero, we would need to draw a better trend line.
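Both ideas on this slide, residuals averaging zero and an outlier tugging on the line, can be sketched in Python. The six data points are invented (five on the line y = 2x plus one low outlier), and `fit_line` is a hypothetical least-squares helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented data: five points on y = 2x and one outlier far below the trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, -20.0]

intercept, slope_with = fit_line(xs, ys)
residuals = [y - (intercept + slope_with * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# Let the outlier "let go of its rubber band" and refit.
_, slope_without = fit_line(xs[:-1], ys[:-1])

print(mean_residual)              # zero, up to floating-point rounding
print(slope_with, slope_without)  # the outlier drags the slope down
```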

slide-17
SLIDE 17
  • Playing Around For A Few Minutes

http://www.istics.net/stat/PutPoints/

Expanding our View of The Scatterplot
  • 1. Scatterplot
  • 2. Residual Histogram
  • 3. Residual Vs. Fitted Plot (RVF Plot)

slide-18
SLIDE 18
  • Playing Around For A Few Minutes

Extreme Example 1: The part/whole problem solved by deleted residuals.
Extreme Example 2: High leverage is not necessarily high influence.

slide-19
SLIDE 19
  • Playing Around For A Few Minutes

Extreme Example 3: Low leverage high residuals influence the y-intercept.
Extreme Example 4: High leverage high residuals influence the slope.

slide-20
SLIDE 20
  • Playing Around For A Few Minutes

Extreme Example 5: RVF plots blow up non-linear horseshoes.
Extreme Example 6: RVF plots blow up heteroskedastic funnels.

slide-21
SLIDE 21
  • Playing Around For A Few Minutes

Extreme Example 7: Residual histograms provide insight into normality.
Extreme Example 8: Residual histograms don’t show conditional normality.

slide-22
SLIDE 22
  • Outlier Detection: Deleted Residuals (Part 1 of 3)

Output panels: Original Regression (n = 71); Regression With Outlier Removed (n = 70).

* Identify the outlier, temporarily remove it, and refit the line.
TEMPORARY.
SELECT IF NOT (ID = 2999973).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88.

As before, the (raw) residual is -35. The deleted residual is -37.

Notice that the slope is no longer stat sig!

slide-23
SLIDE 23

Outlier Detection: Deleted Residuals (Part 2 of 3)

* We do not have to calculate deleted residuals “by hand”; we can have the computer do it automatically for every case, and, along the way, we can have the computer do a whole bunch of other things.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88
  /SCATTERPLOT=(*RESID ,*PRED)
  /RESIDUALS HIST(RESID) NORM(RESID)
  /SAVE PRED RESID DRESID LEVER COOK.

Create a residual vs. fitted plot (i.e., a residual vs. predicted plot).
Create a histogram of residuals and a normal probability plot.
Create five new variables:
  • PRE_#: A predicted/fitted value for each observation.
  • RES_#: A residual for each observation.
  • DRE_#: A deleted residual for each observation.
  • LEV_#: A leverage statistic for each observation.
  • COO_#: An influence statistic (Cook’s D) for each observation.

* Once we produce our variables, we can examine them.
EXAMINE VARIABLES=DRE_1 LEV_1 COO_1
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
GRAPH
  /HISTOGRAM(NORMAL)=DRE_1.
GRAPH
  /HISTOGRAM=LEV_1.
GRAPH
  /HISTOGRAM=COO_1.

slide-24
SLIDE 24
  • Outlier Detection: Deleted Residuals (Part 3 of 3)

A deleted residual is a residual based on subtracting the predicted value from the observed value, just like a typical, raw residual, except that the predicted value is calculated with the observation removed in order to avoid the part/whole problem in which we are looking for outliers from the trend but the outlier is part of the trend.

Two obvious outliers. Who are they?
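The definition of a deleted residual translates directly into a refit-without-the-case loop. The Python sketch below uses invented data (five points on y = 2x plus one outlier); `fit_line` and `deleted_residual` are hypothetical helpers, not the routine SPSS uses internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def deleted_residual(xs, ys, i):
    """Observed minus predicted, where the line is fit WITHOUT case i."""
    intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return ys[i] - (intercept + slope * xs[i])


# Invented data: the last case is an outlier that is part of the trend it bends.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, -20.0]

intercept, slope = fit_line(xs, ys)
raw = ys[5] - (intercept + slope * xs[5])
deleted = deleted_residual(xs, ys, 5)

print(raw, deleted)  # the deleted residual is the more extreme of the two
```

The raw residual understates how unusual the case is, because the case itself pulled the line toward it.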

slide-25
SLIDE 25

Outlier Detection: The Leverage Statistic

A leverage statistic is a measure of the extremity of an observation based on the value(s) of its predictor(s). When we have one predictor, we can easily see who is extreme on that predictor, but when we have 12 predictors, it can be impossible to see who is generally extreme on all predictors.

Some high leverage observations. Who are they?
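For a single predictor, leverage has a simple closed form, sketched here in Python. This uses the standard (uncentered) hat value, 1/n plus the squared distance from the predictor mean divided by the total sum of squares; statistical packages may save a centered variant, so treat the exact scale as an assumption. The x-values are invented.

```python
def leverages(xs):
    """Hat values for a one-predictor regression with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Invented predictor values: the last case is extreme in X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
lev = leverages(xs)

most_extreme_case = lev.index(max(lev))
print([round(h, 3) for h in lev], most_extreme_case)
```

Leverage depends only on the predictor values, never on the outcome, and with one predictor plus an intercept the hat values sum to 2.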

slide-26
SLIDE 26

Outlier Detection: The Cook’s D Statistic

A high influence observation. Who is it?

An influence statistic compares the trend line (calculated from all the data, including the observation) with a hypothetical trend line (calculated from all the data except the observation). The bigger the difference between the two trend lines, the greater the influence. Cook’s D statistic is the influence statistic that we will use, but there are others.
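The "two trend lines" description can be sketched in Python by literally fitting both lines and comparing their fitted values. The scaling below (squared differences in fitted values divided by the number of coefficients times the mean squared error) is the usual textbook form of Cook's D; the data and helper names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cooks_distance(xs, ys, i, n_params=2):
    """Compare fitted values from the full line and the line fit without case i."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (len(xs) - n_params)
    d_int, d_slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    shifted = [d_int + d_slope * x for x in xs]
    return sum((f - s) ** 2 for f, s in zip(fitted, shifted)) / (n_params * mse)


# Invented data: the last case has high leverage (x = 20) in both versions.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys_on_trend = [3.0, 2.0, 7.0, 8.0, 10.0, 40.0]   # extreme in X, but on the trend
ys_off_trend = [3.0, 2.0, 7.0, 8.0, 10.0, 0.0]   # extreme in X and far off the trend

d_on = cooks_distance(xs, ys_on_trend, 5)
d_off = cooks_distance(xs, ys_off_trend, 5)
print(d_on, d_off)
```

The same high-leverage case has essentially no influence when it sits on the trend and a great deal when it does not, which is the "high leverage is not necessarily high influence" point.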
slide-27
SLIDE 27
  • Outlier Detection: Residuals, Leverage, Influence

Case Number | Deleted Residual | Leverage | Cook’s Distance | Result
8 | Extreme | Minimal | Moderate | Extreme in Y, not in X: influences the y-intercept
27 | Extreme | Extreme | Extreme | Extreme in Y and in X: influences the slope
31 | Minimal | Extreme | Minimal | Not extreme in Y, but in X: little influence
54 | Minimal | Minimal | Minimal | Neither extreme in Y nor in X: little influence

slide-28
SLIDE 28
  • Non-Linearity Detection: RVF Plot

A residual versus fitted plot (RVF plot), also known as a residual versus predicted plot, is just what it says it is: a scatterplot of residual values versus fitted/predicted values.

Horseshoe shapes indicate non-linearity. If there were a horseshoe shape in our outcome versus predictor plot, it would be magnified in the residual versus fitted plot, but everything looks okay here. In Unit 13, we’ll see examples of non-linear relationships (and attendant horseshoes). If you are wondering “what’s the big deal?” wait until we have 7 predictors. No matter how many predictors, we will still have only one residual and only one fitted value for each observation, so we can still use an RVF plot for the multiple regression model, whereas we would need not a two-dimensional scatterplot of the outcome versus predictors but an 8-dimensional scatterplot!

Plots shown: Good Old Outcome Vs. Predictor Scatterplot; Shiny New Residual Vs. Fitted Scatterplot.
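A horseshoe is easy to produce on purpose. This Python sketch fits a straight line to an invented curved relationship (y = x squared) and lists the (fitted, residual) pairs that an RVF plot would display; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented curved relationship: a straight line is the wrong model here.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# These pairs are the points of the RVF plot, ordered by fitted value.
for f, r in sorted(zip(fitted, residuals)):
    print(round(f, 2), round(r, 2))
```

The residuals are positive at both ends of the fitted range and negative in the middle, which is the horseshoe.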

slide-29
SLIDE 29
  • Heteroskedasticity Detection: RVF Plot

A residual versus fitted plot (RVF plot), also known as a residual versus predicted plot, is just what it says it is: a scatterplot of residual values versus fitted/predicted values.

Funnel shapes indicate heteroskedasticity. If there were a funnel shape in our outcome versus predictor plot, it would be magnified in the residual versus fitted plot, but everything looks okay here. In Unit 14, we’ll see examples of heteroskedastic relationships (and attendant funnels).

Plots shown: Good Old Outcome Vs. Predictor Scatterplot; Shiny New Residual Vs. Fitted Scatterplot.
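A funnel can be checked numerically as well as by eye. In this Python sketch the invented data scatter more widely as x grows, and the spread of the residuals is compared between the lower and upper halves of the fitted values. Comparing mean absolute residuals is just one rough way to summarize a funnel, not a formal test.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented data: deviations from y = 2x alternate in sign and grow with x.
xs = [float(x) for x in range(1, 9)]
ys = [2 * x + 0.5 * x * (1 if i % 2 == 0 else -1) for i, x in enumerate(xs)]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Split the RVF points into low-fitted and high-fitted halves.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
low_spread = sum(abs(r) for _, r in pairs[:half]) / half
high_spread = sum(abs(r) for _, r in pairs[half:]) / (len(pairs) - half)

print(round(low_spread, 2), round(high_spread, 2))  # spread widens: a funnel
```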

slide-30
SLIDE 30

Non-Normality Detection: Residual Histograms

A histogram of residuals can give an indication whether or not the residuals are normally distributed; however, use with caution, because histograms of residuals show an unconditional distribution (i.e., they don’t think vertically). We are ultimately concerned with normality (and homoskedasticity) conditional on X. Nevertheless, such histograms can be useful, especially when supplemented with an RVF plot, which allows you to think in terms of vertical slices and consequently think about conditional distributions.

slide-31
SLIDE 31

Non-Normality Detection: P-P Plots

A probability-probability plot (P-P plot) is another way of looking at a residual histogram, with a focus on normality. In a normal distribution we expect 50% of the observations to be below average, and, because it’s a mathematical construct, we observe 50% of the observations to be below average. This simple truth forms our baseline of comparison (the red line below). In a sample distribution from a population with a normal distribution, we expect 50% of the observations to be below average, but due to sampling error (or perhaps due to a non-normal population distribution), we may observe more or fewer than 50% of observations to be below average.

Plot annotations: Here, we observe 50% of our sample, but we expect a smidge more than 50%. Expect more. Expect less.

The take-home message for P-P plots is that we want the dotted line to lie on top of the straight line, and where the dotted line deviates, we have non-normality in our sample, which may indicate non-normality in our population.
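The observed-versus-expected comparison behind a P-P plot can be sketched with Python's standard library. The residuals below are invented and deliberately negatively skewed. The plotting position (rank + 0.5)/n is one common convention, assumed here; statistical packages may use a slightly different one.

```python
from statistics import NormalDist, mean, pstdev

# Invented residuals with a long left tail (negative skew).
residuals = [-8.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0]

mu = mean(residuals)
sigma = pstdev(residuals)
normal = NormalDist(mu, sigma)  # normal curve with the sample's mean and s.d.

ordered = sorted(residuals)
n = len(ordered)

# Each P-P point pairs an observed cumulative proportion with the proportion
# a normal distribution would put at or below that residual.
pp_points = [((rank + 0.5) / n, normal.cdf(r)) for rank, r in enumerate(ordered)]

expected_below_mean = normal.cdf(mu)
observed_below_mean = sum(r < mu for r in residuals) / n

for observed, expected in pp_points:
    print(round(observed, 2), round(expected, 2))
print(expected_below_mean, round(observed_below_mean, 2))
```

With this skewed sample, far fewer than half of the residuals fall below the mean, so the points drift away from the diagonal.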
slide-32
SLIDE 32

Reflecting on our Unit 12 Research Questions

To answer the first question, we can sort our data by residuals and find the largest positive residuals.

To answer our second question, we see from our RVF plot that the relationship appears linear (no horseshoe) and homoskedastic (no funnel).

slide-33
SLIDE 33

Checking Regression Assumptions With Regression Diagnostics

Search HI-N-LO for assumption violations:
  • H: Heteroskedasticity
  • I: Independence
  • N: Normality
  • L: Linearity
  • O: Outliers

slide-34
SLIDE 34

Answering our Roadmap Question

$$\begin{aligned}
\text{READING} = {} & \beta_0 + \beta_1 \text{ASIAN} + \beta_2 \text{BLACK} + \beta_3 \text{LATINO} + \beta_4 \text{L2HOMEWORKP1} + \beta_5 \text{ESL} + \beta_6 \text{FREELUNCH} \\
& + \beta_7 \text{ESLxASIAN} + \beta_8 \text{ESLxBLACK} + \beta_9 \text{ESLxLATINO} \\
& + \beta_{10} \text{FREELUNCHxASIAN} + \beta_{11} \text{FREELUNCHxBLACK} + \beta_{12} \text{FREELUNCHxLATINO} + \varepsilon
\end{aligned}$$

From the RVF plot, we do not appear to have a problem with meeting the linearity assumption. However, due to a ceiling effect, the homoskedasticity and normality assumptions are questionably met. Other than the ceiling effect, the conditional variances appear roughly equal. We are concerned that the high-end predictions are negatively skewed because of the ceiling effect.

slide-35
SLIDE 35

Looking at Normality

From a histogram of residuals and P-P plot, we see a slight negative skew of the residuals that we attribute to the ceiling effect of our reading measure.

slide-36
SLIDE 36

Looking for Outliers

There are no outliers of concern, in part because the large sample size minimizes the influence of any one datum.

Because of the large sample size, the histograms below are fairly useless, so I will turn the distribution of Cook’s D statistics into a scatterplot.

slide-37
SLIDE 37

A Better Look at The Influential Outliers

When we plot the Cook’s D statistics versus an arbitrary x-variable, we see about 10 students that stand out from the pack. We will inspect those 10 students more closely to see if there is a further pattern.

slide-38
SLIDE 38

Looking For Patterns in the Influential Outliers

* Here is the SPSS syntax for outputting case-summary tables.
* Sort the cases by Cook’s distance.
* Data > Sort Cases.
* Sort by the Cook’s D in descending order, so the highest values come first.
SORT CASES BY COO_1 (D).
* Check out the first twenty cases, which have the highest Cook’s D as per your sorting.
* Analyze > Reports > Case Summaries.
SUMMARIZE
  /TABLES=ID READING HOMEWORK FREELUNCH ESL RACE
  /FORMAT=LIST NOCASENUM TOTAL LIMIT=20
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT.

slide-39
SLIDE 39

Unit 12 Appendix: Key Concepts

  • Notice that I use “increase.” The longitudinal data warrant the developmental conclusion!
  • READING92 has measurement error, and READING88 has measurement error; when I take their difference, their difference necessarily has more measurement error than either! Ah, well.
  • Whenever there is an element of randomness in the outcome, we expect regression to the mean. Measurement error is one possible source of randomness, but not the only possible source of randomness. If we predict adult height by mother’s height, we will get regression to the mean, even though there is only trivial measurement error with height.

slide-40
SLIDE 40
Unit 12 Appendix: Key Interpretations

  • For the average student, we expect an increase of 6 reading points from the 8th grade to the 12th grade, from 48 points to 54 points.
  • From the RVF plot, we do not appear to have a problem with meeting the linearity assumption. However, due to a ceiling effect, the homoskedasticity and normality assumptions are questionably met. Other than the ceiling effect, the conditional variances appear roughly equal. We are concerned that the high-end predictions are negatively skewed because of the ceiling effect.
  • From a histogram of residuals and P-P plot, we see a slight negative skew of the residuals that we attribute to the ceiling effect of our reading measure.
  • There are no outliers of concern, in part because the large sample size minimizes the influence of any one datum.
  • When we plot the Cook’s D statistics versus an arbitrary x-variable, we see about 10 students that stand out from the pack. We will inspect those 10 students more closely to see if there is a further pattern.

slide-41
SLIDE 41
Unit 12 Appendix: Key Terminology

  • At least in simple linear regression, diagnostics provide information that we could conceivably glean from a bivariate scatterplot of the outcome versus predictor; nevertheless, they can provide a helpfully detailed view. In multiple regression, however, diagnostics provide information that we could never gather by eye.
  • A residual (aka error) is the difference between our observed outcome and our predicted outcome. If the residual is negative, that means we should have predicted lower (i.e., we overpredicted). If the residual is positive, we should have predicted higher (i.e., we underpredicted). Of course, we expect residuals because of individual variation, hidden variables, and measurement error.
  • Every datum has an associated residual, and we can graph the residuals with a histogram.
  • A deleted residual is a residual based on subtracting the predicted value from the observed value, just like a typical, raw residual, except that the predicted value is calculated with the observation removed in order to avoid the part/whole problem in which we are looking for outliers from the trend but the outlier is part of the trend.
  • A leverage statistic is a measure of the extremity of an observation based on the value(s) of its predictor(s). When we have one predictor, we can easily see who is extreme on that predictor, but when we have 12 predictors, it can be impossible to see who is generally extreme on all predictors.
  • An influence statistic compares the trend line (calculated from all the data, including the observation) with a hypothetical trend line (calculated from all the data except the observation). The bigger the difference between the two trend lines, the greater the influence. Cook’s D statistic is the influence statistic that we will use, but there are others.
  • A residual versus fitted plot (RVF plot), also known as a residual versus predicted plot, is just what it says it is: a scatterplot of residual values versus fitted/predicted values.
  • A histogram of residuals can give an indication whether or not the residuals are normally distributed; however, use with caution, because histograms of residuals show an unconditional distribution (i.e., they don’t think vertically). We are ultimately concerned with normality (and homoskedasticity) conditional on X. Nevertheless, such histograms can be useful, especially when supplemented with an RVF plot, which allows you to think in terms of vertical slices and consequently think about conditional distributions.
  • A probability-probability plot (P-P plot) is another way of looking at a residual histogram, with a focus on normality. In a normal distribution we expect 50% of the observations to be below average and, because it’s a mathematical construct, we observe 50% of the observations to be below average. This simple truth forms our baseline of comparison (in red below). In a sample distribution from a population with a normal distribution, we expect 50% of the observations to be below average, but due to sampling error, we may observe more or fewer than 50% of observations to be below average.

slide-42
SLIDE 42
Unit 12 Appendix: Formulas

Reliability Notation
  • The population variance of measurement X is denoted $\sigma_X^2$.
  • The population standard deviation of measurement X is denoted $\sigma_X$.
  • The population correlation between measurements X and Y is denoted $\rho_{XY}$.
  • The reliability of measurement X is denoted $\rho_{XX'}$, where the Greek letter rho stands for the population correlation, the subscript X stands for one form of measurement X, and the subscript X' stands for a parallel form of the measurement.

The Reliability of a Difference

$$\rho_{DD'} = \frac{\sigma_X^2 \rho_{XX'} + \sigma_Y^2 \rho_{YY'} - 2 \rho_{XY} \sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X \sigma_Y}$$

slide-43
SLIDE 43
Unit 12 Appendix: Formulas

The Reliability of a Difference

$$\rho_{DD'} = \frac{\sigma_X^2 \rho_{XX'} + \sigma_Y^2 \rho_{YY'} - 2 \rho_{XY} \sigma_X \sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X \sigma_Y}$$

In the numerator, we want $\sigma_X^2 \rho_{XX'}$ big, $\sigma_Y^2 \rho_{YY'}$ big, and $2 \rho_{XY} \sigma_X \sigma_Y$ small. Baseline of perfect reliability: when $\rho_{XX'} = \rho_{YY'} = 1$, then $\rho_{DD'} = 1$.

  • We want the reliable variance in measurement X to be big.
  • We want the reliable variance in measurement Y to be big.
  • We want the correlation between measurements X and Y to be small. If the correlation is negative, then the reliability of the difference can actually exceed the reliability of the individual tests!
  • What happens when measurements X and Y are perfectly reliable?
  • What happens when measurements X and Y are perfectly unreliable? Note that, if measurements X and Y are perfectly unreliable, then they must be perfectly uncorrelated as well.
  • What happens when measurements X and Y are perfectly correlated? Note that, if measurements X and Y are perfectly correlated (and they have the same standard deviation), then everybody has the same exact difference score.

With equal standard deviations ($\sigma_X = \sigma_Y$), the formula reduces to:

$$\rho_{DD'} = \frac{\rho_{XX'} + \rho_{YY'} - 2 \rho_{XY}}{2 - 2 \rho_{XY}}$$
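The formula is easy to explore numerically. This Python sketch implements the reliability-of-a-difference formula as reconstructed above; the function name, argument names, and example reliabilities are invented for illustration.

```python
def difference_reliability(rel_x, rel_y, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of the difference score D = Y - X.

    rel_x, rel_y: reliabilities of X and Y; r_xy: correlation between X and Y;
    sd_x, sd_y: standard deviations (equal by default).
    """
    covariance_term = 2 * r_xy * sd_x * sd_y
    reliable_variance = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - covariance_term
    total_variance = sd_x ** 2 + sd_y ** 2 - covariance_term
    return reliable_variance / total_variance


# Two quite reliable tests that are strongly positively correlated.
positively_correlated = difference_reliability(0.9, 0.9, 0.8)

# The same two tests, but negatively correlated.
negatively_correlated = difference_reliability(0.9, 0.9, -0.5)

# Perfectly reliable tests give a perfectly reliable difference.
perfect = difference_reliability(1.0, 1.0, 0.5)

print(round(positively_correlated, 3), round(negatively_correlated, 3), perfect)
```

With reliabilities of 0.9 and a correlation of 0.8, the difference score is only about half reliable, which is the "Ugh" behind change scores built from two positively correlated tests.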

slide-44
SLIDE 44
Unit 12 Appendix: SPSS Syntax

* Example SPSS syntax for computing transformed variables.
* This linear transformation is not a z-transformation because I did not divide the difference by the standard deviation.
COMPUTE ZEROCENTEREDREADING88 = READING88 - 48.0155.
EXECUTE.
* This (goofy) transformation is non-linear because I do more than add/subtract and/or multiply/divide by a constant. I use powers and logs.
COMPUTE Sean_Is_A_Great_SPSS_Programmer = READING88 * 48.0155 - 1975/27 + FREELUNCH**(1/2) + LN(HOMEWORK + 1).
EXECUTE.
* If we are interested in changes, let’s compute a change score and use it as our outcome. This is not a linear transformation because I add/subtract and/or multiply/divide by a variable, not a constant.
COMPUTE READINGIMPROVEMENT = READING92 - READING88.
EXECUTE.
* Identify the outlier, temporarily remove it, and refit the line.
TEMPORARY.
SELECT IF NOT (ID = 2999973).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88.

slide-45
SLIDE 45
Unit 12 Appendix: SPSS Syntax

* We do not have to calculate deleted residuals “by hand”; we can have the computer do it automatically for every case, and, along the way, we can have the computer do a whole bunch of other things.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT READINGIMPROVEMENT
  /METHOD=ENTER ZEROCENTEREDREADING88
  /SCATTERPLOT=(*RESID ,*PRED)
  /RESIDUALS HIST(RESID) NORM(RESID)
  /SAVE PRED RESID DRESID LEVER COOK.
* Once we produce our variables, we can examine them.
EXAMINE VARIABLES=DRE_1 LEV_1 COO_1
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
GRAPH
  /HISTOGRAM(NORMAL)=DRE_1.
GRAPH
  /HISTOGRAM=LEV_1.
GRAPH
  /HISTOGRAM=COO_1.

slide-46
SLIDE 46
Output Your New Variables and Nifty Plots

Start from: Analyze > Regression > Linear

slide-47
SLIDE 47
  • Examine Your New Variables

Look around and check out your options.
slide-48
SLIDE 48
Perceived Intimacy of Adolescent Girls (Intimacy.sav)

  • Source: HGSE thesis by Dr. Linda Kilner entitled Intimacy in Female Adolescent's Relationships with Parents and Friends (1991). Kilner collected the ratings using the Adolescent Intimacy Scale.
  • Sample: 64 adolescent girls in the sophomore, junior and senior classes of a local suburban public school system.
  • Overview: Data set contains self-ratings of the intimacy that adolescent girls perceive themselves as having with: (a) their mother and (b) their boyfriend.


slide-53
SLIDE 53

High School and Beyond (HSB.sav)

  • Source: Subset of data graciously provided by Valerie Lee, University of Michigan.
  • Sample: This subsample has 1044 students in 205 schools. Cases with missing data on the outcome test score and family SES were eliminated. In addition, schools with fewer than 3 students included in this subset of data were excluded.
  • Overview: High School & Beyond. Subset of data focused on selected student and school characteristics as predictors of academic achievement.


slide-58
SLIDE 58

Understanding Causes of Illness (ILLCAUSE.sav)

  • Source: Perrin E.C., Sayer A.G., and Willett J.B. (1991). Sticks And Stones May Break My Bones: Reasoning About Illness Causality And Body Functioning In Children Who Have A Chronic Illness, Pediatrics, 88(3), 608-19.
  • Sample: 301 children, including a sub-sample of 205 who were described as asthmatic, diabetic, or healthy. After further reductions due to the list-wise deletion of cases with missing data on one or more variables, the analytic sub-sample used in class ends up containing: 33 diabetic children, 68 asthmatic children and 93 healthy children.
  • Overview: Data for investigating differences in children’s understanding of the causes of illness, by their health status.


slide-67
SLIDE 67

Children of Immigrants (ChildrenOfImmigrants.sav)

  • Source: Portes, Alejandro, & Ruben G. Rumbaut (2001). Legacies: The Story of the Immigrant Second Generation. Berkeley CA: University of California Press.
  • Sample: Random sample of 880 participants obtained through the website.
  • Overview: “CILS is a longitudinal study designed to study the adaptation process of the immigrant second generation which is defined broadly as U.S.-born children with at least one foreign-born parent or children born abroad but brought at an early age to the United States. The original survey was conducted with large samples of second-generation children attending the 8th and 9th grades in public and private schools in the metropolitan areas of Miami/Ft. Lauderdale in Florida and San Diego, California” (from the website description of the data set).


slide-72
SLIDE 72
Human Development in Chicago Neighborhoods (Neighborhoods.sav)

  • Source: Sampson, R.J., Raudenbush, S.W., & Earls, F. (1997). Neighborhoods and violent crime: A multilevel study of collective efficacy. Science, 277, 918-924.
  • Sample: The data described here consist of information from 343 Neighborhood Clusters in Chicago, Illinois. Some of the variables were obtained by project staff from the 1990 Census and city records. Other variables were obtained through questionnaire interviews with 8782 Chicago residents who were interviewed in their homes.
  • These data were collected as part of the Project on Human Development in Chicago Neighborhoods in 1995.


slide-82
SLIDE 82
4-H Study of Positive Youth Development (4H.sav)

  • Source: Subset of data from IARYD, Tufts University.
  • Sample: These data consist of seventh graders who participated in Wave 3 of the 4-H Study of Positive Youth Development at Tufts University. This subfile is a substantially sampled-down version of the original file, as all the cases with any missing data on these selected variables were eliminated.
