Experimental Design & Evaluation 11. Controlled Experiment - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Experimental Design & Evaluation

  • 11. Controlled Experiment

SunyoungKim,PhD

slide-2
SLIDE 2

Today’s agenda

  • Hypothesis testing
  • Threats
  • Experimental Biases
slide-3
SLIDE 3

Hypothesis Testing

slide-4
SLIDE 4

Hypothesis testing

How do we “prove” a hypothesis in science? In most cases, it is impossible to prove a hypothesis directly. Instead, we try to disprove the null hypothesis.

  • Easier to disprove things, by counter-example
  • First, suppose the null hypothesis is true (null hypothesis = opposite of the hypothesis)
  • Then a conflicting result is found
  • The null hypothesis is disproved; hence, the hypothesis is “proved”
slide-5
SLIDE 5

Hypothesis testing

1. Perform statistical analysis
2. Draw conclusion
3. Communicate results

slide-6
SLIDE 6

T-test

1. Assume that the true means of the two populations are not different: null hypothesis (H0)
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing a difference at least this large if H0 is true: P-value
5. If the chance is low (P < 0.05), the data seem to contradict the assumption
6. Thus, the assumption is unlikely to be true
7. Thus, conclude that the true means are different: alternative hypothesis (H1)
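The steps above can be sketched in Python. This is a minimal illustration, not the procedure used to produce any result in these slides: it assumes two independent samples with equal variances (Student's pooled t-test) and a two-sided P-value, and it obtains that P-value by numerically integrating the t density. The function names and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value P(|T| >= |t|), using Simpson's rule on [0, |t|]."""
    x = abs(t)
    if x == 0:
        return 1.0
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)


def two_sample_t_test(a, b):
    """Pooled-variance t-test for two independent samples."""
    n1, n2 = len(a), len(b)
    diff = mean(a) - mean(b)  # steps 2 and 3: sample means and their difference
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = diff / se
    return t, df, two_sided_p(t, df)  # step 4: P-value under H0


# Hypothetical task times in seconds (not data from the slides).
design_a = [10, 11, 12, 13, 14]
design_b = [20, 21, 22, 23, 24]
t, df, p = two_sample_t_test(design_a, design_b)
reject_h0 = p < 0.05  # steps 5 to 7: a low P-value leads us to reject H0
print(f"t({df}) = {t:.3f}, p = {p:.4f}, reject H0: {reject_h0}")
```

A statistics package would normally supply the P-value; the integration here only keeps the sketch free of dependencies.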

slide-7
SLIDE 7

Example) Kindle vs. iPad

Hypothesis: College students type faster using iPad’s keyboard than using Kindle’s keyboard.

  • Independent variable: Device (iPad or Kindle)
  • Dependent variable: Typing speed
  • Confounding variable: Prior technology experience
slide-8
SLIDE 8

Subject   Kindle Time (s)   iPad Time (s)
1         43                34
2         33 (only one value appears for this subject)
3         43                36
4         35                31
5         36                41
6         39                39
7         42                5
8         43                29
9         41                30
10        39                41

“College students type faster using iPad than Kindle, t(16) = 2.827, P = 0.012.”

slide-9
SLIDE 9

Example) Drinking Water

Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water.

  • Independent variable:
  • Dependent variable:
slide-10
SLIDE 10

Pair   Bottom   Surface
1      0.43     0.415
2      0.266    0.238
3      0.567    0.39
4      0.531    0.41
5      0.707    0.605
6      0.716    0.609
7      0.651    0.632
8      0.589    0.523
9      0.469    0.411
10     0.723    0.612

“There is no difference in the concentration of Zinc at the bottom and the surface of the water, t(18) = 1.309, P = 0.207.”

slide-11
SLIDE 11

Example) Caffeine and Metabolism

A study of the effect of caffeine on muscle metabolism used eighteen male volunteers who each underwent arm exercise tests. Nine of the men were randomly selected to take a capsule containing pure caffeine one hour before the test. The other men received a placebo capsule. During each exercise the subject's respiratory exchange ratio (RER) was measured. (RER is the ratio of CO2 produced to O2 consumed and is an indicator of whether energy is being obtained from carbohydrates or fats.)

  • Independent variable:
  • Dependent variable:
slide-12
SLIDE 12

No.   Placebo   Caffeine
1     105       96
2     119       99
3     100       94
4     97        89
5     96        96
6     101       93
7     94        88
8     95        105
9     98        88
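The slide gives no test result for this table, so here is one way the comparison could be run. The sketch assumes the two groups are independent with equal variances (pooled t-test) and that a two-sided test at the 0.05 level is wanted; both are assumptions, not statements from the slides. Instead of computing a P-value, it compares the statistic with the standard tabulated critical value for 16 degrees of freedom.

```python
import math
from statistics import mean, variance

# RER values transcribed from the table on this slide.
placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]

n1, n2 = len(placebo), len(caffeine)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(placebo) + (n2 - 1) * variance(caffeine)) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean(placebo) - mean(caffeine)) / se

# Standard table value: two-sided critical t for alpha = 0.05 and df = 16.
T_CRIT_DF16 = 2.120
significant = abs(t) > T_CRIT_DF16
print(f"t({df}) = {t:.3f}, significant at 0.05 (two-sided): {significant}")
```

With these values the statistic comes out a little under 2.0, below the tabulated 2.120, so a two-sided test at the 0.05 level would not reject H0. A one-sided test, if that matched the study's hypothesis, could conclude differently.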

slide-13
SLIDE 13

What if you have more than two cases?

Use “One-way ANOVA”

  • While t-test is for comparing 2 means, ANOVA is for >2
  • Calculate ‘F’ ratio
slide-14
SLIDE 14

Example) Tar Contents in Cigarettes

We want to see whether the tar content (in milligrams) differs among three brands of cigarettes. Lab Precise took 6 samples from each of the three brands and got the following measurements:

  • Independent variable:
  • Dependent variable:
slide-15
SLIDE 15

Brand A   Brand B   Brand C
10.21     11.32     11.6
10.25     11.2      11.9
10.24     11.4      11.8
9.8       10.5      12.3
9.77      10.68     12.2
9.73      10.9      12.2

The three cigarette brands differ in mean amount of tar, F(2,15) = 65.464, P < 0.001.
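The F ratio for this example can be reproduced by hand. The sketch below is a plain one-way ANOVA computation (between-group mean square divided by within-group mean square) on the values from the table; the variable names are illustrative, and the P-value is left to a statistics package or an F table.

```python
from statistics import mean

# Tar content (mg) per brand, transcribed from the table on this slide.
brands = {
    "A": [10.21, 10.25, 10.24, 9.80, 9.77, 9.73],
    "B": [11.32, 11.20, 11.40, 10.50, 10.68, 10.90],
    "C": [11.60, 11.90, 11.80, 12.30, 12.20, 12.20],
}

groups = list(brands.values())
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)
group_means = [mean(g) for g in groups]

# Variation of the group means around the grand mean.
ss_between = sum(
    len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
)
# Variation of the individual samples around their own group mean.
ss_within = sum(
    (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between},{df_within}) = {f_ratio:.3f}")  # prints F(2,15) = 65.464
```

The degrees of freedom are the number of brands minus one (2) and the number of samples minus the number of brands (15), which is where the F(2,15) in the reported result comes from.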
slide-16
SLIDE 16

Threats

slide-17
SLIDE 17

Threats to Your Findings

  • Validity
  • Reliability
slide-18
SLIDE 18

Validity

Validity is concerned with the study's success at measuring what the researchers set out to measure: how well does a test measure what it purports to measure?

  • Is it internally valid?
  • Is it externally valid?
slide-19
SLIDE 19

Internal Validity

How well is the experiment done, especially in avoiding confounding (more than one possible independent variable [cause] acting at the same time)? Internal validity is the extent to which a causal conclusion based on a study is warranted.

  • Differences (in means) should be a result of experimental factors (e.g. what we are testing)

  • Variances in means result from differences in participants
  • Other variances are controlled or exist randomly
slide-20
SLIDE 20

Threats to Internal Validity

  • Ordering effects: Effects might be due to the order of the test conditions
      • People learn, and people get tired
      • Don’t present tasks or interfaces in the same order for all users
      • Randomize or counterbalance the ordering
  • Selection effects: Effects might be due to participant differences
      • Don’t use pre-existing groups
      • Randomly assign users to the conditions of the independent variable
  • Experimental bias: Effects might be due to the test conditions
      • Experimenter may be enthusiastic about interface X but not Y
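Random assignment and counterbalancing, the remedies listed above for selection and ordering effects, can be sketched as follows. This is one simple scheme under stated assumptions: full counterbalancing over every possible ordering (practical only for a few conditions), hypothetical participant IDs, and hypothetical function names. The device names are borrowed from the earlier Kindle vs. iPad example.

```python
import random
from itertools import permutations


def assign_groups(participants, conditions, seed=None):
    """Between-subjects: shuffle, then deal participants out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


def counterbalanced_orders(participants, conditions, seed=None):
    """Within-subjects: cycle through all orderings so each is used equally often."""
    rng = random.Random(seed)
    orders = list(permutations(conditions))
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


# Hypothetical participant IDs.
people = [f"P{i}" for i in range(1, 13)]
groups = assign_groups(people, ["iPad", "Kindle"], seed=1)
orders = counterbalanced_orders(people, ["iPad", "Kindle"], seed=1)
print(groups)
print(orders)
```

Shuffling before dealing keeps group sizes equal while leaving who lands in which group to chance. With twelve participants and two conditions, six people see each ordering.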
slide-21
SLIDE 21

Internal Validity: Example

An industrial psychologist wants to study the effects of soft classical music on the productivity of a group of typists in a typing pool. At the beginning of the month, the psychologist meets with the typists to explain the study, gets their consent to play the music during the working day, and then begins to have music piped into the office where the typists work. At the end of the month, the typists' supervisor reports a 30% increase in the number of documents completed by the typing pool that month. "Soft music increases productivity," the psychologist concludes.

slide-22
SLIDE 22

External Validity

How generalizable is the result? The extent to which the results of a study can be generalized to other situations and to other people

  • Extent to which results can be generalized to broader context
  • Participants in your study are “representative”
  • Test conditions can be generalized to real world
slide-23
SLIDE 23

Threats to External Validity

  • Population: Findings are not generalizable to other people
      • Draw a random sample from your real target population
  • Ecological: Findings are not generalizable to other situations
      • Make lab conditions as realistic as possible in important respects
  • Training
      • Training should mimic how the real interface would be encountered and learned
  • Task
      • Base your tasks on task analysis
slide-24
SLIDE 24

External Validity: Example

An educational researcher wants to study the effectiveness of a new method of teaching reading to first graders. The researcher asks all 30 of the first-grade teachers in a particular school district if they would like to receive training in the new method and then use it during the coming school year. Fourteen teachers volunteer to learn and use the new method; 16 teachers say that they would prefer to use their current approach. At the end of the school year, students who have been instructed with the new method have significantly higher average scores on a reading achievement test than students who have received more traditional reading instruction. "The new method is definitely better than the old one," the researcher concludes.

slide-25
SLIDE 25

Reliability

  • The quality of measurements
  • Reliability is the "consistency" or "repeatability" of your measures
slide-26
SLIDE 26

Threats to Reliability

  • Uncontrolled variation
      • Previous experience (e.g., novices vs. experts)
      • User differences
      • Task design: Do tasks measure what you try to measure?
      • Measurement error
  • Solutions
      • Eliminate uncontrolled variation
      • Repetition
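One common way to put a number on "repeatability" is to measure the same participants twice and correlate the two sets of scores (often called test-retest reliability). The slides do not prescribe this measure; the sketch below is only an illustration, and the session data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical task times (s) for the same five participants on two occasions.
session_1 = [41, 35, 52, 47, 39]
session_2 = [43, 34, 50, 49, 38]

r = pearson_r(session_1, session_2)
print(f"test-retest correlation: r = {r:.2f}")
```

A value near 1 suggests that the measure ranks participants consistently across repetitions. A low value points to uncontrolled variation or measurement error.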
slide-27
SLIDE 27

Validity vs. Reliability

slide-28
SLIDE 28

Threats to Your Findings

  • Internal Validity: Are observed results actually caused by the independent variables?
  • External Validity: Can observed results be generalized to the world outside the lab?
  • Reliability: Will consistent results be obtained by repeating the experiment?

slide-29
SLIDE 29

Experimental Biases

slide-30
SLIDE 30

Experimental Biases

  • Hawthorne effect
  • Experimenter effect
  • Placebo effect
  • Novelty effect
slide-31
SLIDE 31

Hawthorne Effect

  • The phenomenon in which subjects change their behavior merely because they are being observed.

slide-32
SLIDE 32

Experimenter Effect

  • A researcher’s bias influences what they see
  • Example from Wikipedia: music backmasking
      • Once the subliminal lyrics are pointed out, they become obvious
  • Dowsing
      • Not more likely than chance
  • The issue: If you expect to see something, maybe something in that expectation leads you to see it
  • Solved via double-blind studies
slide-33
SLIDE 33

Placebo Effect

  • Subject expectancy
  • If you think the treatment, condition, etc. has some benefit, then it may
  • Placebo-based anti-depressants, muscle relaxants, etc.
  • In computing, an improved GUI, a better device, etc.
  • Steve Jobs: http://www.youtube.com/watch?v=8JZBLjxPBUU
  • Bill Buxton: http://www.youtube.com/watch?v=Arrus9CxUiA
slide-34
SLIDE 34

Novelty Effect

  • Typically with technology
  • Performance improves when technology is instituted because people have increased interest in new technology
  • Examples: Computer-assisted instruction in secondary schools, computers in the classroom in general, etc.