SLIDE 1

Teesside University, Social Futures Institute, seminar, 18/11/2015 1

“It is better to observe than to criticise.”

– Bobby Wellins (Jazz Line-up, 13/2/2011)

SLIDE 2

“Best of all is to convey the magnitude of the effect and the degree of certainty explicitly.”

– Pinker (2014, p. 45)

SLIDE 3

“Usually what one wants to know is not whether the change makes any difference, but to know how likely it is that the change will be big enough.”

– Landauer (1997, p. 222)

SLIDE 4

Magnitude-based inference in behavioural research

Paul van Schaik

p.van-schaik@tees.ac.uk
http://sss-studnet.tees.ac.uk/psychology/staff/Paul_vs/index.htm

SLIDE 5

Outline

  • Problem and proposed solution
  • Quantification in behavioural research
  • Statistical inference in behavioural research
  • Magnitude-based inference
  • The application of magnitude-based inference in behavioural research
  • Other approaches
  • Limitations
  • Recommendations
SLIDE 6

The problem

  • A researcher conducts a study comparing two software designs in terms of their usability
  • She conducts usability tests with two groups, each using one of the designs, and collects various measures
  • These include perceived usability, error rate and time-on-task
  • She then compares the two groups in terms of their mean scores on the measures, using a t test
  • She finds that, although differences in mean scores are apparent, the test results do not show statistical significance
  • What should the researcher conclude about the difference in usability between the two designs?

SLIDE 7

A proposed solution

As an alternative to null-hypothesis significance testing (NHST), use information about

  • uncertainty in the data,
  • the observed value of the effect and
  • smallest substantial values for the effect

to make two kinds of magnitude-based inference: mechanistic and practical.

  • Use the results of NHST as input
  • Use spreadsheets available on the Internet to generate inferences
  • The approach was developed, and is influential, in sport and exercise science

SLIDE 8

Quantification in user research

  • “The systematic study of the goals, needs, and capabilities of users so as to specify design, construction, or improvement of tools to benefit how users work and live” (Schumacher, 2009, p. 6)
  • Usability and user-experience data
      • E.g. psychometric data, error rate and time-on-task
  • Formative research
      • users’ interaction with an artefact is studied to generate data that, when analysed, provide information to inform system improvement
  • Summative research
      • establishes the quality of interaction with an artefact in comparison with another artefact or a benchmark

SLIDE 9

Statistical inference in user research

Usually, null-hypothesis significance testing (NHST) is used; limitations:

  1. the null hypothesis of no effect is (almost) always false
  2. NHST ignores the smallest important effect: it has no effect on the inference that is made in NHST
  3. NHST does not address practical relevance; it does not clearly define or distinguish practical and mechanistic significance
  4. a non-significant result is inconclusive and a crude classification of inference is used (reject or retain H0)
  5. sample size estimation is based on NHST

SLIDE 10

Merits of magnitude-based inference

  1. Requires the researcher to define the smallest important effect, rather than the null effect
  2. Uses the smallest important effect as an integral part of inference, so inferences are not an artefact of sample size
  3. Provides a rigorous and principled approach to inferring practical significance; provides a rigorous distinction between practical and mechanistic significance

SLIDE 11

More merits

  4. Provides a more refined classification of inferences that can be made than merely rejecting or retaining the null hypothesis
  5. Estimates of required sample size are based on practical significance or mechanistic significance and the researcher-defined smallest important effect

SLIDE 12

Inference of mechanistic significance (1)

  • For descriptive purposes, an effect can be classified in terms of its size
      • in relation to the smallest important + and - effect size
      • as positive, trivial or negative
  • For inference proper, the chances of an effect being positive, negative or trivial are used
      • The chances of the effect being positive: effect falling above the threshold of the smallest important + effect
      • The chances of the effect being negative: effect falling below the threshold of the smallest important - effect
      • The chances of a trivial effect: 100% minus the sum of the chances of a + effect and those of a - effect
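The three chances described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling distribution of the effect is taken to be normal (the spreadsheets referred to in this talk may use a different distribution, such as t), the thresholds are taken to be symmetric around zero, and the function name `chances` and the numbers in the usage example are hypothetical.

```python
from statistics import NormalDist


def chances(effect, se, smallest_important):
    """Return (p_negative, p_trivial, p_positive) for an observed effect.

    Assumption (not from the slides): the sampling distribution of the
    effect is normal, centred on the observed value with standard error
    `se`, and the smallest important + and - effects are symmetric.
    """
    if se <= 0 or smallest_important <= 0:
        raise ValueError("se and smallest_important must be positive")
    dist = NormalDist(mu=effect, sigma=se)
    # Chance of the effect falling below the smallest important - effect
    p_negative = dist.cdf(-smallest_important)
    # Chance of the effect falling above the smallest important + effect
    p_positive = 1.0 - dist.cdf(smallest_important)
    # Chance of a trivial effect: 100% minus the other two chances
    p_trivial = 1.0 - p_negative - p_positive
    return p_negative, p_trivial, p_positive


# Hypothetical standardised effect of 0.5 with standard error 0.25
p_neg, p_triv, p_pos = chances(effect=0.5, se=0.25, smallest_important=0.2)
print(f"negative {p_neg:.3f}, trivial {p_triv:.3f}, positive {p_pos:.3f}")
```

With these made-up inputs most of the probability mass lies above the positive threshold, so the effect would be described as probably positive.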

SLIDE 13

Inference of mechanistic significance (2)

  • An inference is then made from the chances of each of three ranges of outcome (positivity, triviality and negativity) as follows
      • Unclear effect: both the chances of the effect being + and the chances of the effect being - are too large (e.g., both greater than the default value of 0.05 or other appropriate cut-offs).
      • Otherwise, clear effect, seen as substantially +, - or trivial and considered to have the size of the observed value, with a qualification of probability
  • Proposed interpretation of probability ranges
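The decision rule for mechanistic inference can be written as a short function. This is a sketch of one reading of the slide, not the spreadsheet's implementation: the function name and arguments are hypothetical, and a clear effect is labelled according to where the observed value falls relative to symmetric thresholds.

```python
def mechanistic_inference(effect, smallest_important,
                          p_negative, p_positive, cutoff=0.05):
    """Classify an effect as 'unclear', 'positive', 'negative' or 'trivial'.

    `cutoff` is the default value of 0.05 mentioned on the slide.
    Assumption: a clear effect takes the label of the range in which
    the observed value lies.
    """
    # Unclear: chances of + and chances of - are both too large
    if p_negative > cutoff and p_positive > cutoff:
        return "unclear"
    # Otherwise clear, with the size of the observed value
    if effect > smallest_important:
        return "positive"
    if effect < -smallest_important:
        return "negative"
    return "trivial"


# Hypothetical inputs: a small observed effect with wide uncertainty
print(mechanistic_inference(0.05, 0.2, p_negative=0.20, p_positive=0.30))
# Hypothetical inputs: a larger observed effect with little chance of being -
print(mechanistic_inference(0.5, 0.2, p_negative=0.003, p_positive=0.885))
```

The probability qualifier (e.g. "likely") is then attached separately, using the proposed interpretation of probability ranges.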
SLIDE 14

Probability       Chances          Odds              The effect … positive/trivial/negative, beneficial/negligible/harmful
<0; 0.005]        <0; 0.5%]        <0; 1:199]        is almost certainly not …
<0.005; 0.05]     <0.5%; 5%]       <1:199; 1:19]     is very unlikely to be …
<0.05; 0.25]      <5%; 25%]        <1:19; 1:3]       is unlikely to be …, is probably not …
<0.25; 0.75]      <25%; 75%]       <1:3; 3:1]        is possibly (not) …, may (not) be …
<0.75; 0.95]      <75%; 95%]       <3:1; 19:1]       is likely to be …, is probably …
<0.95; 0.995]     <95%; 99.5%]     <19:1; 199:1]     is very likely to be …
<0.995; 1>        <99.5%; 100%>    <199:1; >         is almost certainly …
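The mapping from a probability to its qualitative descriptor is a simple lookup. The sketch below assumes the intervals are open on the left and closed on the right, as the bracket notation in the table suggests; the names `BANDS` and `describe` are hypothetical, and treating exactly 0 and exactly 1 as belonging to the first and last band is an assumption.

```python
# Upper bound of each probability band and the descriptor from the table
BANDS = [
    (0.005, "almost certainly not"),
    (0.05, "very unlikely"),
    (0.25, "unlikely"),
    (0.75, "possibly"),
    (0.95, "likely"),
    (0.995, "very likely"),
    (1.0, "almost certainly"),
]


def describe(probability):
    """Return the qualitative descriptor for a probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for upper, label in BANDS:
        # Bands are closed on the right: a value equal to the bound
        # belongs to the lower band
        if probability <= upper:
            return label
    return BANDS[-1][1]  # not reached for valid input


print(describe(0.92))  # likely
print(describe(0.25))  # unlikely
```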

SLIDE 17

Inference of practical significance (1)

  • For descriptive purposes, an effect can be classified in terms of its size
      • in relation to the smallest important beneficial and harmful effect size
      • as beneficial, negligible or harmful
  • For inference proper, the chances of an effect being beneficial, harmful or negligible are used
      • The chances of the effect being beneficial: effect falling above the threshold of the smallest important beneficial effect
      • The chances of the effect being harmful: effect falling below the threshold of the smallest important harmful effect
      • The chances of a negligible effect: 100% minus the sum of the chances of a beneficial effect and those of a harmful effect

SLIDE 18

Inference of practical significance (2)

  • Type-1 practical error
      • analogous to a Type-I error in NHST (rejecting the null hypothesis when it is true)
  • Type-2 practical error
      • analogous to a Type-II error in NHST (retaining the null hypothesis when it is false)
  • In the practical (‘clinical’) application of effects
      • the chance of using a harmful effect (a Type-1 practical error) needs to be far smaller than
      • the chance of not using a beneficial effect (a Type-2 practical error)

SLIDE 19

Inference of practical significance (3)

  • An inference is then made from the chances of each of three ranges of outcome (benefit, negligibility and harm) as follows
      • If the chances of benefit are greater than the suggested cut-off of 25% for a Type-2 practical error and the chances of harm are greater than the suggested cut-off of 0.5% for a Type-1 practical error then the effect is unclear
      • If the chances of benefit are greater than 25% and the chances of harm are smaller than 0.5% then the effect is clearly beneficial
      • Otherwise, the effect is clearly negligible or harmful.
  • Proposed interpretation of probability ranges as before
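The three-way rule for practical inference translates directly into code. The sketch below follows the wording of the slide literally, with the suggested cut-offs as default arguments; the function name and the probabilities in the usage example are hypothetical.

```python
def practical_inference(p_benefit, p_harm,
                        benefit_cutoff=0.25, harm_cutoff=0.005):
    """Classify an effect from its chances of benefit and harm.

    Cut-offs are the suggested values on the slide: 25% for a Type-2
    practical error and 0.5% for a Type-1 practical error.
    """
    # Enough chance of benefit, but too much chance of harm
    if p_benefit > benefit_cutoff and p_harm > harm_cutoff:
        return "unclear"
    # Enough chance of benefit and an acceptably small chance of harm
    if p_benefit > benefit_cutoff and p_harm < harm_cutoff:
        return "clearly beneficial"
    # Everything else, as on the slide
    return "clearly negligible or harmful"


# Hypothetical chances of benefit and harm
print(practical_inference(p_benefit=0.40, p_harm=0.10))   # unclear
print(practical_inference(p_benefit=0.80, p_harm=0.001))  # clearly beneficial
print(practical_inference(p_benefit=0.10, p_harm=0.30))   # clearly negligible or harmful
```

The asymmetry between the two cut-offs reflects the requirement that the chance of using a harmful effect be far smaller than the chance of not using a beneficial one.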

SLIDE 20

Example from sport science (1)

  • I am grateful to Matt Weston for providing this example
  • A sports researcher is interested in whether a new, commercially available nutritional supplement has a beneficial or harmful effect on elite cyclists’ 40 km time trial performance (the faster the time, the better the performance)
  • The researcher conducts an experiment to examine the effect of two different doses of the supplement (a low dose and a high dose)
  • Experimental crossover design
      • all of the cyclists perform the time trial under three different conditions (placebo [no supplement], low dose and high dose),
      • in a counterbalanced manner and
      • the researcher’s experience led to the belief that the smallest worthwhile change in 40 km time trial performance was -1%

SLIDE 21

Example from sport science (2)

  • The mean (± SD) performance times
      • 59.5 ± 1.6 min (low dose),
      • 60.9 ± 2.2 min (high dose) and
      • 60.5 ± 1.9 min (placebo)
  • Magnitude-based inferences
      • calculate the chances of benefit (or harm), with reference to a change of -1%
      • compared to placebo, the low dose improved performance: time changed by -1.7% (90% confidence interval -2.4 to -0.9%), with a 92% chance of benefit and a 0.0% chance of harm
      • a low dose of the supplement is therefore likely to be beneficial and is recommended
      • however, compared to placebo, the high dose impaired performance by 0.7% (90% confidence interval -0.1 to 1.5%), with a 0% chance of benefit and a 25% chance of harm
      • a high dose of the supplement is therefore most unlikely to be beneficial and is not recommended
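The figures in this example can be checked approximately from the reported values alone. The sketch below recomputes the percentage changes from the mean times, then estimates the chances of benefit and harm from each 90% confidence interval. It rests on assumptions that are not in the slides: a normal sampling distribution, a standard error back-calculated from the interval width, and a harm threshold of +1% mirroring the stated benefit threshold of -1%. The results should therefore only be roughly in line with the reported chances, which presumably came from the unrounded data.

```python
from statistics import NormalDist


def percent_change(treatment_mean, placebo_mean):
    """Percentage change in time relative to placebo (negative = faster)."""
    return 100.0 * (treatment_mean - placebo_mean) / placebo_mean


def chances_from_ci(lower, upper, smallest_worthwhile=1.0, level=0.90):
    """Approximate (p_benefit, p_harm) from a confidence interval.

    Assumptions: normal sampling distribution; benefit is a change below
    -smallest_worthwhile, harm is a change above +smallest_worthwhile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for 90%
    centre = (lower + upper) / 2.0
    se = (upper - lower) / (2.0 * z)
    dist = NormalDist(mu=centre, sigma=se)
    p_benefit = dist.cdf(-smallest_worthwhile)
    p_harm = 1.0 - dist.cdf(smallest_worthwhile)
    return p_benefit, p_harm


low_change = percent_change(59.5, 60.5)   # low dose vs placebo
high_change = percent_change(60.9, 60.5)  # high dose vs placebo
low_benefit, low_harm = chances_from_ci(-2.4, -0.9)
high_benefit, high_harm = chances_from_ci(-0.1, 1.5)

print(f"low dose:  {low_change:+.1f}%, benefit {low_benefit:.0%}, harm {low_harm:.0%}")
print(f"high dose: {high_change:+.1f}%, benefit {high_benefit:.0%}, harm {high_harm:.0%}")
```

The low dose clears the 25% benefit cut-off with a negligible chance of harm, whereas the high dose has essentially no chance of benefit, which matches the recommendations on the slide.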

SLIDE 22

Demonstration

  • Example: unrelated t test
  • Mechanistic inference
  • Practical inference
  • Spreadsheets available at http://www.sportsci.org/

SLIDE 23

Observations

  • Practical and mechanistic inference, but not statistical inference, depend on the smallest worthwhile effect
  • The range of practical and mechanistic inferences (e.g., “is very (un)likely to be harmful/trivial/beneficial”) is greater than that of statistical inference (dichotomous)
  • The results of practical and mechanistic inference concur about half of the time with those of statistical inference; when the results differ, the latter is more conservative
  • Practical and mechanistic inference mostly concur

SLIDE 24

Smallest          Smallest             Total sample size (N)     Sample size ratio
harmful/-ive d    beneficial/+ive d    P      M      S           S/P     S/M     M/P
-0.2              0.2                  268    274    788         2.94    2.88    1.02
-0.3              0.3                  122    122    352         2.89    2.89    1.00
-0.4              0.4                  70     70     198         2.83    2.83    1.00
-0.5              0.5                  46     46     128         2.78    2.78    1.00
-0.6              0.6                  34     32     90          2.65    2.81    0.94
-0.7              0.7                  26     24     66          2.54    2.75    0.92
-0.8              0.8                  22     20     52          2.36    2.60    0.91
-0.9              0.9                  18     16     42          2.33    2.63    0.89
-1.0              1.0                  14     14     34          2.43    2.43    1.00
-1.1              1.1                  14     12     28          2.00    2.33    0.86
-1.2              1.2                  14     10     24          1.71    2.40    0.71

SLIDE 25

Further alternatives to NHST

  • Counter-null statistic (Rosenthal & Rubin, 1994)
  • p_rep (Killeen, 2005)
  • p-intervals (Cumming, 2008)
  • Minimum-effect tests (Murphy & Myors, 1999)
  • Equivalence-testing (Tryon, 2001)
  • Non-inferiority-testing (Head et al., 2014)
  • Bayesian statistics (Rouder et al., 2009)
SLIDE 26

Limitations

  • Apparent
      • As in NHST, need to make several choices or accept recommended choices
          • Confidence level
          • Type-1 and Type-2 practical-error rates
          • The smallest important effect
          • The mapping of quantitative probabilities onto qualitative descriptors
      • As in NHST, assumptions about the sampling distribution of the outcome statistic; can use bootstrapping
  • Substantive
      • The decision rules do not necessarily take all relevant factors into account, for example the (financial) value of inputs to and outputs from using a harmful or beneficial effect (Murphy & Myors, 1999)

SLIDE 27

Recommendations

  1. Plan sample size using magnitude-based inference
  2. Analyse data using NHST; make better use of the results as input for magnitude-based inference
  3. Always analyse data using mechanistic inference; also use practical inference for effects where benefit and harm can be meaningfully defined
  4. Use appropriate spreadsheets for sample size estimation and magnitude-based inference (http://www.sportsci.org/)
  5. When preparing for journal publication, cogently argue why it is appropriate to use magnitude-based inference in your research; in your Data Analysis section, explain the specific magnitude-based inference that you have used (see, e.g., Barnes et al., 2015)

SLIDE 28

Some publications

Barnes, K. R., Hopkins, W. G., McGuigan, M. R., & Kilding, A. E. (2015). Warm-up with a weighted vest improves running performance via leg stiffness and running economy. Journal of Science and Medicine in Sport, 18, 103-108. doi:10.1016/j.jsams.2013.12.005

Batterham, A. M., & Hopkins, W. G. (2006). Making meaningful inferences about magnitudes. International Journal of Sports Physiology and Performance, 1(1), 50-57.

Hopkins, W. G. (2006). Estimating sample size for magnitude-based inference. Sport Science, 10, 63-70.

Hopkins, W. G. (2006). Spreadsheets for analysis of controlled trials, with adjustment for a subject characteristic. Sport Science, 10, 46-50.

Hopkins, W. G., Marshall, S. W., Batterham, A. M., & Hanin, J. (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise, 41(1), 3-12. doi:10.1249/MSS.0b013e31818cb278

Schaik, P. van & Weston, M. (2016). Magnitude-based inference and its application in user research. International Journal of Human-Computer Studies, 88, 38-50.