[PPT] - Experimental Models for Validating Technology Marvin V. PowerPoint Presentation

SLIDE 1

Experimental Models for Validating Technology

Marvin V. Zelkowitz

PRESENTED BY: Djedjiga OuAoua

SLIDE 2

INTRODUCTION

Effective software (new technology) has attributes such as low cost, reliability, rapid development, safety, or some other relevant attribute.

To say that one technique is more or less effective than another, we need to measure each software attribute, for example:
  number of failures per day
  errors found during development
  MTBF (mean time between failures)
  count of the errors found during testing (are there errors remaining to be found?)
  ...
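As a rough illustration of measuring one such attribute, the sketch below computes MTBF as the mean gap between consecutive failure timestamps. This is a simplification that ignores repair time, and the failure log is hypothetical, not data from the paper.

```python
from datetime import datetime, timedelta

def mean_time_between_failures(failure_times):
    """Return the average gap between consecutive failures as a timedelta."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    ordered = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)

# Hypothetical failure log (illustrative timestamps only).
failures = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 20, 0),
    datetime(2024, 1, 2, 14, 0),
]
mtbf = mean_time_between_failures(failures)
```

A single number like this only becomes evidence once it is compared against the same measure taken under another treatment.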

SLIDE 3

INTRODUCTION

We should do experimentation in order to:
  evaluate attributes
  determine whether the methods used are effective and necessary

Today, in most cases, the creator of the technology:
  implements the technology
  shows that it works

We need to do more than simply say, "I tried it, and I like it."

Researchers write papers that explain some new technology; then they perform "experiments" to show how effective the technology is.

SLIDE 4

HOW DO WE EXPERIMENT?

Experimentation is a crucial part of attribute evaluation and can help determine whether methods used in accordance with some theory during product development will result in software being as effective as necessary.

How to experiment:
  We collect enough data from a sufficient number of subjects, all adhering to the same treatment, in order to obtain a statistically significant result on the attribute of concern, compared to some other treatment.

Data collection and analysis is one type of experimentation.

Other approaches are grouped into 4 general categories of experimentation:
  Scientific method: test alternative variations of the hypothesis
  Engineering method: test a solution to a hypothesis, then improve the solution
  Empirical method: validate a given hypothesis by a statistical method; data is collected to verify the hypothesis
  Analytical method: develop a formal theory

SLIDE 5

Scientific method

http://www.sciencebuddies.org/science-fair-projects/project_scientific_method.shtml

SLIDE 6

Engineering method

http://ssds-science5774.weebly.com/scientific-method-and-the-engineering-design-process.html

SLIDE 7

Empirical method

https://en.wikipedia.org/wiki/Empirical_research

Observation: the collecting and organisation of empirical facts; forming hypothesis.
Induction: formulating hypothesis.
Deduction: deducing consequences of hypothesis as testable predictions.
Testing: testing the hypothesis with new empirical material.
Evaluation: evaluating the outcome of testing.

SLIDE 8

HOW DO WE EXPERIMENT?

Common to all approaches: the collection of data on the development process or the product itself.

In an experiment, a researcher manipulates one or more variables. The elements of an experiment are:
  independent variables (factors)
  dependent variables
  experimental units

SLIDE 9

HOW DO WE EXPERIMENT?

1- Factor (independent variable): an explanatory variable manipulated by the experimenter. Each factor has two or more levels (i.e., different values of the factor). Combinations of factor levels are called treatments.
   http://stattrek.com/experiments/what-is-an-experiment.aspx?Tutorial=AP

   Example: the researcher is studying the possible effects of Vitamin C and Vitamin E on health.
     2 factors: dosage of Vitamin C and dosage of Vitamin E
     6 treatments (one per combination of factor levels)

2- Dependent variable: in this experiment, the dependent variable would be some measure of health (annual doctor bills, number of colds caught in a year, number of days hospitalized, etc.).

3- Experimental units (subjects): the recipients of experimental treatments: people (participants), plants, animals (subjects), labs, ...
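The relationship between factors, levels, and treatments can be sketched as a Cartesian product. The dosage levels below are assumptions chosen so that the count comes out to the 6 treatments the slide mentions; the slide itself names only the two factors.

```python
from itertools import product

# Assumed levels in mg; the slide gives only the factor names and the count of 6.
factors = {
    "vitamin_c_mg": [0, 250, 500],
    "vitamin_e_mg": [0, 400],
}

def enumerate_treatments(factors):
    """Return one dict per treatment, i.e. per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

treatments = enumerate_treatments(factors)
```

Adding a level to either factor, or adding a third factor, multiplies the number of treatments, which is one reason fully crossed designs become expensive quickly.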

SLIDE 10

The goal of an experiment:
  collect enough data from a sufficient number of subjects
  in order to obtain a statistically significant result on the attribute, compared to some other treatment

SLIDE 11

Characteristics of a Well-Designed Experiment

1- Replication: the practice of assigning each treatment to many experimental units.

2- Influence: we need to know the impact (that is, the influence) that a given experimental design has on the results of an experiment. Methods are classified as passive or active.

3- Local control: steps taken to reduce the effects of extraneous variables (i.e., variables other than the independent variable and the dependent variable).

4- Temporal properties: data collection may be historical (for example, archaeological) or current (for example, monitoring a current project). Historical data will certainly be passive, but may be missing just the information we need to come to a conclusion.

SLIDE 12

Characteristics of a Well-Designed Experiment

Example of no controls

Consider this example. A drug manufacturer tests a new cold medicine with 200 participants: 100 men and 100 women. The men receive the drug, and the women do not. At the end of the test period, the men report fewer colds.

It is impossible to say whether the drug was effective, because the design does not rule out other explanations:
  The men may be less vulnerable to the particular cold virus circulating during the experiment.
  The men may have experienced a placebo effect.
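One common way to add the missing control is to assign participants to groups at random within each sex, and to give the comparison group a placebo. The slide does not prescribe this fix; the sketch below is an assumption about how such an assignment might look, with hypothetical participant records.

```python
import random

def assign_groups(participants, seed=0):
    """Randomly split participants into drug and placebo groups, balanced within each sex."""
    rng = random.Random(seed)
    groups = {"drug": [], "placebo": []}
    for sex in sorted({p["sex"] for p in participants}):
        block = [p for p in participants if p["sex"] == sex]
        rng.shuffle(block)
        half = len(block) // 2
        groups["drug"].extend(block[:half])
        groups["placebo"].extend(block[half:])
    return groups

# Hypothetical participant list matching the slide's 100 men and 100 women.
participants = (
    [{"id": i, "sex": "M"} for i in range(100)]
    + [{"id": 100 + i, "sex": "F"} for i in range(100)]
)
groups = assign_groups(participants)
```

With this design each group contains 50 men and 50 women, so a difference in colds can no longer be explained by sex alone, and the placebo group absorbs the placebo effect.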

SLIDE 13

VALIDATION MODELS

We now describe several approaches, the results of a study, and how these approaches have been used.

Validation models: the authors identify 12 methods that researchers in the computer field have used to validate new technology:

1. Project monitoring: Observational
2. Case study: Observational
3. Assertion: Informal (no validation)
4. Field study: Observational
5. Literature search: Historical
6. Legacy: Historical
7. Lessons learned: Historical
8. Static analysis: Historical
9. Replicated: Controlled
10. Synthetic: Controlled
11. Dynamic analysis: Controlled
12. Simulation: Controlled

SLIDE 14

VALIDATION MODELS

Each approach uses one of the following data collection methods:

1. Observational method: the collection and storage of data that occurs during project development.
2. Historical method: collects data from projects that have already been completed; the data already exist.
3. Controlled method: provides for multiple instances of an observation for statistical validity of the results. It involves the study of alternative strategies to determine the effectiveness of one method as compared to other methods.
4. Informal methods: generally ad hoc and do not provide significant results.

SLIDE 15

OBSERVATIONAL METHODS
  project monitoring
  case study
  assertion
  field study

SLIDE 16

OBSERVATIONAL METHODS

Project monitoring:
  It is the collection and storage of data that occurs during project development.
  The available data will be whatever the project generates; the researchers do not attempt to influence or redirect the development process or the methods being used.
  Researchers assume the data will be used for some immediate analysis. If an experimental design is constructed after the project is finished, then we call this a historical lessons-learned study.

Problems:
  difficulty in retrieving information later
  lack of any experimental goals
  lack of consistency in the collected data

Solution:
  Requires some minimal coordination among the various development activities in an organization.
  Collect the information that allows a baseline to be established.

SLIDE 17

OBSERVATIONAL METHODS

Case study:
  Researchers monitor a project and collect data over time.
  The data collected are derived from a specific goal for the project.
  Data are collected to measure attributes such as reliability or cost.
  A baseline is built to represent the organization's standard process for software development.
  It is an active method: humans influence the development process itself. Filling out a form (hours worked, errors found, ...) is not intrusive, but it may be used to react to certain issues that emerge in the study.

Strengths:
  The development is going to happen regardless of the need to collect experimental data.
  Researchers can amass data from many projects over a short period of time.

Weaknesses:
  Each development is relatively unique, so it is not always possible to compare one development profile with another.
  Determining trends and statistical validity is often difficult.
  The practicality of completing a project on time may mean that experimental goals must be sacrificed. Experimentation may be a risk that management is not willing to undertake.

SLIDE 18

OBSERVATIONAL METHODS

Assertion:
  The developers are both the experimenters and the subjects of study.
  The experiment favours the proposed technology over alternatives.
  The goal is not to understand the difference between two treatments, but to show the superiority of one.

SLIDE 19

OBSERVATIONAL METHODS

Field study:
  May examine data collected from several projects (or subjects) simultaneously.
  Is less intrusive than the case study.
  A primary goal is not to perturb the subject under study, which makes it impossible to collect all relevant data in a field study.
  Typically, data are collected from each activity in order to determine that activity's effectiveness.
  An outside group will monitor the actions of each subject group to collect the relevant information.
  This model best represents an organization that wishes to measure its development practices without changing its processes.
  Works best for products that are already complete. Example: monitoring project groups that use a new tool and those that do not, in order to determine differences.

SLIDE 20

HISTORICAL METHODS
  literature search
  legacy data
  lessons learned
  static analysis

SLIDE 21

HISTORICAL METHODS

Literature search:
  The investigator analyzes the results of papers and other documents that are publicly available.
  It can confirm an existing hypothesis.
  It can enhance the data collected on one project with data that has been previously published on similar projects.
  It places no demands on a given project.

Weaknesses:
  Selection bias: the tendency of researchers, authors, and journal editors to publish positive results. Contradictory results often are not reported.
  The lack of quantitative data.

SLIDE 22

HISTORICAL METHODS

Legacy data: a form of software archaeology / data mining
  Researchers consider the available data from previously completed projects in order to apply it to a new project.
  A lot of quantitative data is available for analysis (from the previous projects).
  Existing files are examined to determine trends and relationships (data mining).
  There are no concerns about costs, schedules, project delivery, or the real-time pressures of delivering a finished product.
  The availability of the collected information varies greatly.
  It does not allow comparing one project with another.

SLIDE 23

HISTORICAL METHODS

Lessons learned:
  Lessons-learned documents are produced after completing a large industrial project.
  A study of these documents often reveals qualitative aspects (good aspects) that can be used to improve future developments.
  If project personnel are still available, it is possible to interview them to understand the effects of the methods used.

Weaknesses:
  Can indicate various trends but cannot be used for statistically validating the results.
  The same comments about what should have been done are repeated in each successive document. We never seem to learn from our previous mistakes.

SLIDE 24

HISTORICAL METHODS

Static analysis:
  Obtains information by looking at a completed product, as with legacy data.
  The focus is on the product developed (whereas legacy data also includes measuring the development process).
  Analyzes the structure of the product to determine its characteristics.
  Applications: software complexity and dataflow.

Weakness:
  It is difficult to show that a model's quantitative definition relates directly to the attribute of interest.
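A small sketch of static analysis applied to software complexity: the function below counts branching constructs in Python source without running it. The choice of node types and the "1 + branches" formula are assumptions loosely modeled on cyclomatic complexity, not a definition taken from the paper.

```python
import ast

# Constructs treated as decision points (an assumed, simplified list).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)

def decision_complexity(source):
    """Return 1 + the number of branching constructs found in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

# Hypothetical product code to analyze.
sample = '''
def classify(values):
    total = 0
    for v in values:
        if v > 0 and v % 2 == 0:
            total += v
    return total
'''
score = decision_complexity(sample)
```

The score here is 4 (one loop, one condition, one boolean operator, plus one). Whether such a number actually tracks an attribute like maintainability is exactly the weakness the slide points out.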

SLIDE 25

CONTROLLED METHODS
  replicated
  synthetic environment
  dynamic analysis
  simulation

SLIDE 26


CONTROLLED METHODS

Replicated experiment:
  Several projects (or subjects) are staffed to perform a task in multiple ways.
  Control variables: duration, staff level, methods used.
  Statistical validity can be established more easily than by relying on case studies.
  Example: replacing a given task with another task, such as replacing Ada with C++.
  Researchers form several treatments that implement products using either the old or the new task. Then they collect data on both approaches and compare the results.
  If there are enough replications (perhaps 20 to 40), researchers can establish the statistical validity of the method under consideration.

Problems:
  The enormous cost of replications.
  Experimenting on human subjects, such as the development team, can disturb the experiment: the groups know that they are part of a replicated experiment.

Solution:
  Each replication represents a slightly different product, each one required by a different customer.
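To make "compare the results" concrete, the sketch below applies a permutation test to two sets of replications. The slide does not name a statistical test, so this choice is an assumption, and the defect counts are hypothetical numbers for 20 replications per treatment.

```python
import random

def permutation_p_value(old, new, rounds=2000, seed=0):
    """Estimate how often a random relabeling yields a gap at least as large as the observed one."""
    rng = random.Random(seed)
    n_old = len(old)
    observed = abs(sum(old) / n_old - sum(new) / len(new))
    pooled = list(old) + list(new)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        first, second = pooled[:n_old], pooled[n_old:]
        gap = abs(sum(first) / len(first) - sum(second) / len(second))
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)

# Hypothetical defect counts per replication (not data from the paper).
old_task = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13] * 2
new_task = [9, 8, 10, 7, 9, 8, 11, 9, 8, 10] * 2
p_value = permutation_p_value(old_task, new_task)
```

A small p-value suggests the difference between treatments is unlikely to come from chance alone. With only a handful of replications the test has little power, which is why the slide asks for 20 to 40.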

SLIDE 27

CONTROLLED METHODS

Synthetic environment experiments:
  Software engineering replications are performed in a smaller artificial setting (synthetic environment experiments) instead of in large projects.
  They seek to investigate some aspect of system design or use.
  Researchers identify a relatively small objective and fix all variables except for the control method being modified: they randomize personnel from a homogeneous pool of subjects, fix the duration of the experiment, and monitor as many variables as possible.
  They are easy to conduct and can potentially lead to statistical validity.

Problems:
  A result covering only a few subjects may not transfer to large group studies.
  We lose sight of the fact that the experiment itself has little value, since it doesn't relate to problems actually encountered in an industrial setting.

SLIDE 28

CONTROLLED METHODS

Dynamic analysis:
  Dynamic analysis methods analyze the product itself (the other controlled methods seen so far evaluate the development process).
  Many such methods instrument the product by adding debugging or testing code so that the product's features can be demonstrated and evaluated when it executes.
  Scripts can be used to compare different products that have similar functionality.
  The dynamic behavior of a product can often be determined without the need to understand the design of the product itself.

Weaknesses:
  If we instrument the product by adding source statements, we may be perturbing its behavior in unpredictable ways.
  Executing a program shows its behavior for that specific data set, which often cannot be generalized to other data sets.
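Instrumentation of a product can be sketched with a decorator that records call counts and elapsed time while the code runs. The `instrument` helper and the `normalize` function are hypothetical names introduced for illustration only.

```python
import functools
import time

def instrument(stats):
    """Decorator factory that records call counts and elapsed time into stats."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                entry = stats.setdefault(func.__name__, {"calls": 0, "seconds": 0.0})
                entry["calls"] += 1
                entry["seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

stats = {}

@instrument(stats)
def normalize(text):
    """Hypothetical product function under observation."""
    return " ".join(text.lower().split())

results = [normalize(s) for s in ["Hello  World", "  A b ", "TEST"]]
```

The timing calls themselves add overhead to every invocation, a small instance of the perturbation weakness noted above, and the recorded figures describe only these three inputs.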

SLIDE 29

CONTROLLED METHODS

Simulation:
  Researchers can evaluate a technology by executing the product using a model of the real environment.
  If researchers can model the behavior of the environment for certain variables, they can often ignore other harder-to-obtain variables.
  By ignoring extraneous variables, a simulation is often easier, faster, and less expensive to run than the full product in the real environment.
  Researchers can often test a technology without the risk of failure on an important project.

Weakness:
  We do not know how well the synthetic environment models reality.
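As a minimal sketch of the idea, the code below runs a small "product" (an LRU cache) against a modeled request stream instead of real traffic. The cache, the skewed workload model, and all parameters are assumptions made for illustration; none of them come from the paper.

```python
import random
from collections import OrderedDict

class LRUCache:
    """The product under evaluation: a fixed-size least-recently-used cache."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, load the key and evict the oldest entry if full."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)
        return False

def modeled_requests(count, keys=50, seed=0):
    """Environment model (assumed): a few hot keys receive most of the traffic."""
    rng = random.Random(seed)
    weights = [1 / (rank + 1) for rank in range(keys)]
    return rng.choices(range(keys), weights=weights, k=count)

def simulated_hit_rate(capacity, requests):
    cache = LRUCache(capacity)
    hits = sum(cache.access(key) for key in requests)
    return hits / len(requests)

requests = modeled_requests(5_000)
small = simulated_hit_rate(5, requests)
large = simulated_hit_rate(20, requests)
```

The simulation is cheap to rerun with other capacities, but its conclusions are only as good as the assumed skew in `modeled_requests`, which is the weakness the slide names.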

SLIDE 30

https://books.google.ca/books?id=tAYaNNaneuMC&pg=PA236&lpg=PA236&dq=an+informal+%28assertion%29+form+of+va..............

SLIDE 31

MODEL VALIDATION

To test whether the classification presented here reflects the software engineering community's idea of experimental design and data collection, we examined 612 software engineering publications covering three different years: 1985, 1990, and 1995.

We classified each paper according to the data collection method used to validate the claims in the paper. For completeness we added the following two classifications:
  Not applicable: no new technology, no data collection.
  No experiment: new technology without experimental validation.

We distinguish between:
  data used as a demonstration of concept ("proof of concept")
  a true attempt at validation of the results

We considered a demonstration of technology via example as part of the analytical phase.

SLIDE 32

MODEL VALIDATION

SLIDE 33

MODEL VALIDATION

SLIDE 34

MODEL VALIDATION

Quantitative observations

We assessed 612 papers and judged 50 to be "not applicable", leaving 562 papers examined.
  Case studies and lessons learned: the most prevalent validation methods, about 10 percent each
  Assertion: a third of the papers
  Simulation: 5 percent
  Remaining techniques: 1 to 3 percent of the papers each

A third of the papers had no experimental validation: 36% in 1985, 29% in 1990, 19% in 1995.

Tichy classified some papers into: formal theory, design and modeling, empirical work, hypothesis testing, and other. His major observations are consistent with our results.

The authors then compared the results with other disciplines: physics, economics, and behavioral sciences. They found that archival research journals do not differ materially from archival journals in the hard sciences.
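The arithmetic behind these figures can be checked in a few lines. The numbers are taken from the slide; the variable names are illustrative, and the relative drop is a derived value, not one reported in the paper.

```python
papers_assessed = 612
not_applicable = 50
papers_examined = papers_assessed - not_applicable

# Share of papers with no experimental validation, by year (percent, from the slide).
no_validation_pct = {1985: 36, 1990: 29, 1995: 19}

# Decline between the first and last sampled year, in percentage points.
drop_points = no_validation_pct[1985] - no_validation_pct[1995]
# The same decline relative to the 1985 share.
relative_drop = drop_points / no_validation_pct[1985]
```

The share of unvalidated papers fell by 17 percentage points over the decade, a little under half of its 1985 level.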

SLIDE 35

MODEL VALIDATION

Qualitative observations:
  Authors often fail to state their goals clearly or to point to the value that their method or tool adds to the experimentation process.
  Authors often fail to state how they validate their hypotheses: sections titled "validation" or "experimental results" are often not found.
  Authors often use terms very loosely: "case study", "controlled experiment", "lessons learned".

We attempted to classify each paper by what the authors did, not by what they called their work.

SLIDE 36


MODEL VALIDATION

Dilemma: the papers are influenced greatly by the publication's editor or, in the case of a conference, by the program committee.
Other factor: in the study, the editors and program committees from 1985, 1990, and 1995 were all different. This difference may have affected our outcome.
The only way to try to understand how research in software engineering is validated is via the publications on software engineering.

SLIDE 37


CONCLUSION

  Too many papers have no experimental validation at all.
  Too many papers use an informal form of validation (assertion).
  Researchers use lessons learned and case studies about 10 percent of the time, with the other techniques being used only a small percent of the time at most.
  Experimentation terminology is sloppy.
  Clearly, more work needs to be done on the part of researchers (even though the number of papers with no experimental validation seems to be dropping).

SLIDE 38

GOAL OF THE PAPER

The authors want to enhance researchers' ability to report on software engineering experimentation so that research can better assist industry in selecting new technology.

SLIDE 39

Questions

Does a paper really describe the effort made by the researchers, and all the detailed steps and difficulties of an experiment, in such a way that we can deduce from the document alone whether the authors used a validation method or not?
What do you think of the fact that no raw data extracted from the 612 papers is provided?
What other factors (other than the differences in the publication's editor or the program committee) could affect the outcome of this study?
In your opinion, what can we use other than published papers to understand how research in software engineering is validated?
Some technologies were validated in later publications but are counted as not validated in the evaluated papers. Do you think this is an important point to consider?
Have you already experimented with one of these 12 approaches? How did you find it? Does it really provide a validation of a new technology of which you are one of the creators?

SLIDE 40


Thank you