SLIDE 41
!$%:
Unit12Appendix:KeyTerminology
Atleastinsimplelinearregression,diagnosticsprovideinformationthatwecouldconceivablygleanfromabivariate scatterplot oftheoutome versuspredictor;neverthelesstheycanprovideahelpfullydetailedview.Inmultiple regression,however,diagnosticsprovideinformationthatwecouldnevergatherbyeye. Aresidual(akaerror)isthedifferencebetweenourobservedoutcomeandourpredictedoutcome.Iftheresidualis negativethatmeansweshouldhavepredictedlower(i.e.,weoverpredicted).Iftheresidualispositive,weshouldhave predictedhigher(i.e.,weunderpredicted).Ofcourse,weexpectresidualsbecauseofindividualvariation,hidden variables,andmeasurementerror. Everydatumhasanassociatedresidual,andwecangraphtheresidualswithahistogram. Adeletedresidualisaresidualbasedonsubtractingthepredictedvaluefromtheobservedvalue,justlikeatypical,raw residual,exceptthatthepredictedvalueiscalculatedwiththe observationremovedinordertoavoidthepart/whole probleminwhichwearelookingforoutliersfromthetrendbuttheoutlierispartofthetrend. Aleveragestatisticisameasureoftheextremityofanobservationbasedonthevalue(s)ofitspredictor(s).Whenwe haveonepredictor,wecaneasilyseewhoisextremeonthatpredictor,butwhenwehave12predictors,itcanbe impossibletoseewhoisgenerally extremeonall predictors. Aninfluencestatisticcomparesthetrendline(calculatedfromallthedata,includingtheobservation)withahypothetical trendline(calculatedfromallthedataexcepttheobservation).Thebiggerthedifferencebetweenthetwotrendlines, thegreattheinfluence.Cook’sDstatisticistheinfluencestatisticthatwewilluse,butthereareothers. Aresidualversusfittedplot(RVFplot),alsoknownasaresidualversuspredictedplot,isjustwhatitsaysitis:a scatterplot ofresidualvaluesversusfitted/predictedvalues. Ahistogramofresidualscangiveanindicationwhetherornottheresidualsarenormallydistributed;however,usewith caution,becausehistogramsofresidualsshowanunconditionaldistribution(i.e.,theydon’tthinkvertically).Weare ultimatelyconcernedwithnormality(andhomoskedasticity)conditionalonX.Nevertheless,suchhistogramscanbe useful,especiallywhensupplementedwithanRVFplotwhichallowsyoutothinkintermsofverticalslicesand consequentlythinkaboutconditionaldistributions. Aprobability(probabilityplot(P(Pplot)isanotherwayoflookingataresidualhistogram,withafocusonnormality.Ina normaldistributionweexpect50%oftheobservationstobebelowaverageand,becauseit’samathematicalconstruct, weobserve50%oftheobservationstobebelowaverage.Thissimpletruthformsourbaselineofcomparison(inred below).Inasampledistributionfromapopulationwithanormal distribution,weexpect50%oftheobservationstobe belowaverage,butduetosamplingerror,wemayobservemoreor fewerthan50%ofobservationstobebelowaverage.