- Indiana University
- Principles of Workflow in Data Analysis
Scott Long
- November 2010
- 1. A coordinated framework for conducting data analysis
2. WF involves coordinated procedures for:
- Planning, organizing and documenting research
- Cleaning data
- Analyzing data
- Presenting results
- Backing up and archiving materials
- 1. Your WF might be:
A. Planned and carefully orchestrated.
B. Ad hoc, piecemeal, developed in reaction to mistakes.
2. You can improve your WF with a modest investment of time.
A. The less experience you have, the easier it is.
B. It will save you time and make you a better data analyst.
- 1. Replication
- Replication is essential for good science.
- An effective workflow is essential for replication.
2. Getting the right answers
- Retractions are embarrassing and can end careers.
3. Time
- “Science is a voracious institution.”
- An effective workflow makes you more efficient.
4. Errors are inevitable; an effective workflow helps you find and fix them.
- 5. Gaining the IU advantage
- “The publication of [The Workflow of Data Analysis Using Stata] may even reduce Indiana’s comparative advantage of producing hotshot quant PhDs now that grad students elsewhere can vicariously benefit from this important aspect of the training there.” Gabriel Rossman on his blog
- 1. Easy things: consulting on easy things, instead of hard things.
2. Incorrect results with clever “explanations”.
3. A dissertation delayed 18 months to determine why results changed.
4. Irreproducible results from a single, 743-line do-file.
5. Analyzing the wrong dataset: “The datasets are exactly the same except that I changed the married variable.”
6. Analyzing the wrong variable while writing an NAS report.
7. Miscoded genes that delayed progress in a study of alcoholism.
8. Collaborations that multiply the ways things can go wrong.
9. Misleading or ambiguous output such as...