Experimental Design & Evaluation: 10. Controlled Experiment



slide-1
SLIDE 1

Experimental Design & Evaluation

  • 10. Controlled Experiment

Sunyoung Kim, PhD

slide-2
SLIDE 2

Last week

  • Prototyping
slide-3
SLIDE 3
  • Recap. What is Prototyping?
  • Prototypes are experimental, incomplete designs that are developed cheaply and quickly

  • An integral part of iterative user-centered design
  • It enables designers to try out their ideas with users and to gather

feedback

slide-4
SLIDE 4
  • Recap. Sketches vs. Prototypes
  • Sketches
  • Early ideation

stages of design

  • Exploring ideas
  • Prototypes
  • Capturing/detailing

the actual design

  • Testing ideas
slide-5
SLIDE 5
  • Recap. Wireframe (Mid-fi prototype)
  • Using computer-based tools (e.g., balsamiq, wireframe.cc)
  • Take more time and effort but look more formal and refined: more

detailed than sketches

  • Interactivity can be simulated

→ You don’t need to make these things pretty, but you do need to include enough detail to see how the system performs
→ Forces users to view it as a draft or work in progress, rather than a polished and finished product
→ A prototype with high visual fidelity (e.g., done in Photoshop) makes users focus on the visual design and look and feel, including color, fonts, layout, logo, and images

slide-6
SLIDE 6
  • Recap. High fidelity
  • The most realistic but time-intensive
  • The only way to create high-fidelity prototypes used to be to actually code using a programming language; these days, you can create high-fidelity prototypes that simulate the functionality of the final product without coding (e.g., Axure, iRise, OmniGraffle)

  • Appropriate when high visual and functional fidelity is required
  • An excellent reference for developers
  • Tools: https://www.cooper.com/prototyping-tools
slide-7
SLIDE 7
  • Recap. Wizard of Oz
  • A rapid-prototyping method for systems costly to build or

requiring new technology. A human “Wizard” simulates the system’s intelligence and interacts with the user through a real or mock computer interface.

  • Makes it possible to test functionality that does not yet

exist

  • Can simulate different system behaviors and test result

(e.g., speed of ticket from input to output)

  • Can simulate errors and test result
  • Common in areas such as intelligent agents and human-robot interaction

slide-8
SLIDE 8

Today’s agenda

  • Controlled experiments
  • Hypothesis testing
  • Threats
slide-9
SLIDE 9

Controlled Experiment

slide-10
SLIDE 10
  • Controlled (lab) experiment
  • Field Experiment

HCI Research Methods

slide-11
SLIDE 11
  • Controlled (lab) experiment
  • Create a situation with desired conditions
  • Manipulate some variables while controlling others
  • Examine the dependent variable
  • Field Experiment
  • Conduct study in a natural setting
  • Manipulate some variables
  • Controlling other variables is often not possible
  • Examine the dependent variables

HCI Research Methods

slide-12
SLIDE 12

A test of the effect of a single variable by changing it while keeping all other variables the same. A controlled experiment generally compares the results obtained from an experimental sample against a control sample.

Controlled Experiment

slide-13
SLIDE 13
  • 1. Create a situation with desired conditions
  • 2. Manipulate some variables while controlling others
  • 3. Examine the dependent variable

Controlled Experiment

slide-14
SLIDE 14
  • 1. State a research question(s)
  • 2. State a testable hypothesis
  • 3. Identify independent and dependent variables
  • 4. Design the experimental protocol
  • 5. Choose the user population
  • 6. Run an experiment

1) Manipulate an independent variable
2) Measure dependent variables
3) Use statistical tests to accept or reject the hypothesis

Designing an Experiment

slide-15
SLIDE 15

Protocol

slide-16
SLIDE 16

It's essential to develop a research question to focus your research.

1. Choose an appropriate topic or issue for your research
2. List all of the questions that you'd like answered yourself
3. Choose the best question, one that is neither too broad nor too narrow

  • What is the 1994 rate of juvenile delinquency in the U.S.?
  • What can we do to reduce juvenile delinquency in the U.S.?
  • Does education play a role in reducing juvenile delinquents' return

to crime?

Research Question

slide-17
SLIDE 17
  • How does the temperature of sea water affect the amount of

calcium carbonate that can be dissolved in it?

  • What can be done to stop the pH of the ocean changing?
  • How does the amount of light influence the rate of algal

growth?

  • Do paper bags biodegrade faster than plastic grocery bags?
  • What type of packaging preserves antioxidant activity in food

the best?

  • How can chemicals be used to reduce the spread of bacteria?

Example Research Questions

slide-18
SLIDE 18

A statement of the predicted or expected relationship between at least two variables

  • A provisional answer to a research question
  • Has to define the variables involved
  • Has to define a relationship
  • Example
  • Research question: Will a lower pH of seawater increase the rate of ice

melting?

  • Hypothesis: An increase in the number of ions in solution will increase

the rate that water molecules move from a solid into a liquid state

Hypothesis

slide-19
SLIDE 19

A statement of the predicted or expected relationship between at least two variables

  • Research question: How does having information on the context of a caller affect whether the receiver picks up the call?

  • Hypothesis: Receivers will be more likely to pick up a call when they have information about the caller’s context than when they do not.

Hypothesis

Variable 1: Information of caller’s context
Relationship: Improve
Variable 2: Call pickup

slide-20
SLIDE 20
  • Testable: The means for manipulating the variables and/or

measuring the outcome variable must potentially exist

  • Falsifiable: Must be able to disprove the hypothesis with data
  • Parsimonious: Should be stated in the simplest adequate form
  • Precise: Should be specific (operationalized)
  • Useful: Should relate to existing theories and/or “point” toward new theories. It should lead to studies beyond the present one (often hard to determine in advance)

Good Hypothesis

slide-21
SLIDE 21

“iPad is better than Kindles”: Is it a testable hypothesis?

Hypothesis

slide-22
SLIDE 22

“iPad is better than Kindles”: Is it a testable hypothesis?

No! because:

  • Broad questions are not testable
  • Broad questions can be investigated by posing multiple narrow testable

questions

Hypothesis

slide-23
SLIDE 23

“iPad is better than Kindles”: Is it a testable hypothesis?

No! Because it is unclear about:

  • What feature?
  • What task?
  • What measurement?
  • What population?

Hypothesis

slide-24
SLIDE 24

“College students (population) type (task) faster (measurement) using iPad’s keyboard (feature) than using Kindle’s keyboard”

* Can be narrowed even further, e.g., in a classroom

Hypothesis

slide-25
SLIDE 25

Variables

Variable 1 Variable 2

Relationship

Independent variable: What you manipulate
Dependent variable: What you measure

slide-26
SLIDE 26
  • Independent variable: Things we want to compare
  • Dependent variable: Things we want to measure
  • Confounding variable: Things that correlate with both the independent and the dependent variable

Variables

slide-27
SLIDE 27

Independent Variable

  • IV: what you vary
  • Independent of participant behavior
  • Examples: interface, visual layout, gender, age
  • Test conditions: the levels, or values, of an IV
  • Provide a name for both IV and its levels (test conditions)
slide-28
SLIDE 28

Dependent Variable

  • DV, what you measure
  • User performance time
  • Accuracy, errors
  • Subjective satisfaction
slide-29
SLIDE 29

Confounding Variable

  • A confounding variable is one that provides an alternative

explanation for the thing we are trying to explain with our IVs.

  • Example: we want to compare two systems (windows 7 vs. 8)
  • All participants have prior experience with windows 7, but no

experience with windows 8

  • “Prior experience” is a confounding variable
  • A major issue in observational studies is that we often don't know what the potential confounding factors may be.

slide-30
SLIDE 30
  • How does the temperature of sea water affect the amount of

calcium carbonate that can be dissolved in it?

  • What can be done to stop the pH of the ocean changing?
  • How does the amount of light influence the rate of algal

growth?

  • Do paper bags biodegrade faster than plastic grocery bags?
  • What type of packaging preserves antioxidant activity in food

the best?

  • How can chemicals be used to reduce the spread of bacteria?

Example Research Questions

slide-31
SLIDE 31

Hypothesis

“College students (population) type (task) faster (measurement) using iPad’s keyboard (feature) than using Kindle’s keyboard”

  • Independent variable
  • Dependent variable
  • Control variable
  • Confounding variable
slide-32
SLIDE 32

Hypothesis

“College students (population) type (task) faster (measurement) using iPad’s keyboard (feature) than using Kindle’s keyboard”

  • Independent variable: device (iPad or Kindle)
  • Dependent variable: typing speed
  • Control variable: College students
  • Confounding variable:

Prior technology experience

slide-33
SLIDE 33

Causality vs. Correlation

  • Causal: One variable depends on and is affected by the other
  • Correlational: Two variables are affected by a third variable in the

same direction

Hypothesis testing

slide-34
SLIDE 34

Causality vs. Correlation

[Diagram: Variable 1, Variable 2, Variable 3; labels: one independent variable, two dependent variables, correlation]

slide-35
SLIDE 35

Causal

slide-36
SLIDE 36

Correlational

slide-37
SLIDE 37

Causality vs. Correlation

[Diagram: Variable 1, Variable 2, Variable 3; labels: one independent variable, two dependent variables, correlation]

slide-38
SLIDE 38

For studies examining the relationships between variables such as personality traits, work habits, gender, etc., the hypothesis is a specific statement about relationships

  • If we observe an increase in X, then we will also observe an increase (or decrease) in Y

  • Example questions:
  • Is there a relationship between smoking and lung cancer?
  • Is there a relationship between anxiety and test-taking performance?
  • Correlation does NOT imply causation

Correlational Design
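
As a hedged illustration of why correlation does not imply causation, the sketch below simulates a hypothetical third variable `z` that drives both `x` and `y`. The variable names, noise levels, and sample size are assumptions chosen for illustration and are not taken from the slides. `x` never influences `y`, yet the two come out strongly correlated.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical third variable that affects both x and y.
z = [rng.gauss(0, 1) for _ in range(1000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]  # x depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]  # y depends on z only

r_xy = pearson_r(x, y)
print(f"correlation between x and y: {r_xy:.2f}")
```

A controlled experiment avoids this trap by manipulating the independent variable directly instead of only observing it.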

slide-39
SLIDE 39

What tasks should each subject perform?
– E.g., what sentence to type
Which conditions should each subject be tested on?
– Within subject vs. between subject
What to measure?
– E.g., task completion time

Protocol

slide-40
SLIDE 40

Within subject

  • Each subject is tested on all conditions
  • Each person is his or her own control
  • Need to deal with “order effects”
  • Longer experiment time but need fewer subjects

Between subject

  • Each subject is tested on one condition
  • Simpler design and analysis
  • Shorter experiment time but need more subjects
  • Easier to avoid bias

Within vs. Between Subjects

slide-41
SLIDE 41
  • Each subject is tested on one condition
  • Simpler design and analysis
  • Shorter experiment time but need more subjects
  • Easier to avoid bias

Between Subjects

slide-42
SLIDE 42
  • Each subject is tested on all conditions
  • Each person is his or her own control
  • Frequently used in HCI research

Within Subjects

slide-43
SLIDE 43

Within Subjects: Advantages

  • Fewer subjects
  • Less time
  • Less expensive
  • Increased control of subject variability: comparisons between conditions happen within each subject

  • More power to detect significant difference
slide-44
SLIDE 44

Within Subjects: Disadvantages

  • Learning effect
  • Carryover effect
  • Fatigue, boredom

Solutions:

  • More practice before testing; randomization; counterbalancing

(Latin square)

  • Have rest between tasks
  • Limit the testing time
slide-45
SLIDE 45

Defeats ordering effects by varying the order of conditions systematically (rather than randomly)

  • Particularly important in within-subject designs
  • Latin Square Design
  • Randomly assign subjects to equal-size groups
  • A, B, C... are the experimental conditions
  • Latin Square ensures that each condition occurs in every position in the ordering for an equal number of users

Counter-balancing
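
The counterbalancing idea above can be sketched in a few lines. This is a minimal example under stated assumptions: it builds a simple cyclic Latin square (each row is the condition list rotated by one), and the function names, participant IDs, and seed are hypothetical. A cyclic square balances position only; it does not balance which condition immediately precedes which.

```python
import random


def latin_square(conditions):
    """Return a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(participants, conditions, seed=None):
    """Shuffle participants, then give each one a row of the square in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment to the equal-size groups
    square = latin_square(conditions)
    return {p: square[i % len(square)] for i, p in enumerate(shuffled)}


conditions = ["A", "B", "C"]
participants = [f"P{i}" for i in range(1, 7)]  # hypothetical participant IDs

orders = assign_orders(participants, conditions, seed=1)
for participant, order in sorted(orders.items()):
    print(participant, order)
```

With six participants and three conditions, each condition appears in each position for exactly two participants. The number of participants should be a multiple of the number of conditions for the groups to be equal in size.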

slide-46
SLIDE 46
  • Target population: E.g., College students
  • Sample from this population: E.g., UMD students
  • Inclusion criteria
  • Exclusion criteria

Population

slide-47
SLIDE 47

How many subjects do we need?
– Depends on how diverse the population is
How do we know we have enough subjects?
– At the very least, when there’s statistical significance

Population

slide-48
SLIDE 48
  • Choose participants randomly from the entire population
  • Allows generalization to population
  • Randomization allows the later use of probability theory and gives

a solid foundation for statistical analysis

  • Avoid bias

Random Assignment

  • Random does not mean haphazard
  • One needs to explicitly randomize: Random assignment at arrival,

counterbalancing, matching

Random Sampling
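
Explicit randomization for a between-subjects design might look like the sketch below. It is one possible approach, not a prescribed procedure: participants are shuffled with a seeded generator and then dealt round-robin into conditions so the groups end up equal in size. The function name, participant IDs, and seed are assumptions made for illustration.

```python
import random


def random_assignment(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups


participants = [f"P{i}" for i in range(1, 21)]  # hypothetical participant IDs
groups = random_assignment(participants, ["Kindle", "iPad"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Recording the seed makes the assignment reproducible, which helps when the procedure has to be reported or audited later.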

slide-49
SLIDE 49

Hypothesis Testing

slide-50
SLIDE 50

Hypothesis testing

How to “prove” a hypothesis in science? In most cases, it is impossible to prove the hypothesis directly. Instead, this is done by disproving the null hypothesis.

  • Easier to disprove things, by counter-example
  • First we suppose the null hypothesis is true (null hypothesis = opposite of hypothesis)
  • Then a conflicting result is found
  • Disprove the null hypothesis; hence, the hypothesis is supported
slide-51
SLIDE 51

Hypothesis testing

1. Perform statistical analysis 2. Draw conclusion 3. Communicate results

slide-52
SLIDE 52

Example: Kindle vs. iPad

Hypothesis: College students type faster using iPad’s keyboard than using Kindle’s keyboard.

  • Independent variable: Device (iPad or Kindle)
  • Dependent variable: Typing speed
  • Control variable: College students
  • Confounding variable:

Prior technology experience

slide-53
SLIDE 53

Subject   Kindle Time (s)   iPad Time (s)
1         43                34
2         -                 33
3         43                36
4         35                31
5         36                41
6         39                39
7         42                5
8         43                29
9         41                30
10        39                41

slide-54
SLIDE 54

Subject   Kindle Time (s)   iPad Time (s)
1         43                34
2         -                 33
3         43                36
4         35                31
5         36                41
6         39                39
7         42                5
8         43                29
9         41                30
10        39                41

Between-subjects or Within-Subjects?

slide-55
SLIDE 55

Cleanup Data

  • Are there outliers?
  • Are there junk or missing data?
  • Some participants may have fallen asleep
  • Some participants may do random things just to earn the

money

  • Recording devices may have failed a couple of times
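
A data-cleanup pass over the Kindle vs. iPad times could be sketched as follows. This is an illustration under assumptions: the blank Kindle cell for subject 2 is treated as a missing measurement, and outliers are flagged with a median-absolute-deviation rule (modified z-score above 3.5), which is one common convention and not a rule stated in the slides. The function name and threshold are hypothetical.

```python
from statistics import median

# Times (s) transcribed from the Kindle vs. iPad table; None marks the blank
# Kindle cell for subject 2 (assumed to be a missing measurement).
times = {
    "Kindle": [43, None, 43, 35, 36, 39, 42, 43, 41, 39],
    "iPad": [34, 33, 36, 31, 41, 39, 5, 29, 30, 41],
}


def flag_problems(values, threshold=3.5):
    """Return {subject: reason} for missing values and MAD-based outliers."""
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    flags = {}
    for subject, value in enumerate(values, start=1):
        if value is None:
            flags[subject] = "missing"
        elif mad > 0 and 0.6745 * abs(value - med) / mad > threshold:
            flags[subject] = "outlier"
    return flags


problems = {device: flag_problems(values) for device, values in times.items()}
print(problems)
```

Flagged values still need a human decision: a 5-second typing time may be a recording failure or a participant who skipped the task, and the reason for excluding it should be reported.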
slide-56
SLIDE 56

Subject   Kindle Time (s)   iPad Time (s)
1         43                34
2         -                 33
3         43                36
4         35                31
5         36                41
6         39                39
7         42                5
8         43                29
9         41                30
10        39                41

slide-57
SLIDE 57

[Chart: Kindle vs. iPad; axis ticks from 32 to 41]

slide-58
SLIDE 58

Statistical Questions

  • Descriptive questions (What)
  • What is the typical performance
  • How large are the differences between individuals?
  • Analytical questions (yes or no)
  • Is there a difference
  • Is the difference large or small?
  • Is the difference significant or due to chance?
slide-59
SLIDE 59

Statistical Tools

  • Descriptive statistics (what): Can’t be generalized
  • Mean
  • Median
  • Standard deviation
  • Correlation
  • Regression
  • Analytical (or inferential) statistics (yes or no): Allow us to generalize the findings beyond our data

  • T-test
  • ANOVA
slide-60
SLIDE 60
  • Is this difference significant?
  • What if there’s no inherent difference?
  • If two are the same, can we get this result by chance?
  • We may happen to select those with higher values from one group

and those with lower values from another group.

Statistical Questions

[Chart: Kindle vs. iPad; axis ticks from 32 to 41]

slide-61
SLIDE 61

T-test

T-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups.
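
As a rough sketch of what such a comparison computes, the code below evaluates Welch's t statistic (difference of the two sample means divided by its standard error) on the Kindle vs. iPad times. Several things here are assumptions: the slides do not say which t-test variant is intended, the two columns are treated as independent groups, and the blank Kindle cell (subject 2) and the 5 s iPad time (subject 7) are excluded. The value it prints need not match the illustrative numbers shown on the slides.

```python
from math import sqrt
from statistics import mean, variance

# Cleaned times (s) from the Kindle vs. iPad table, assuming the blank Kindle
# cell (subject 2) and the 5 s iPad time (subject 7) are excluded.
kindle = [43, 43, 35, 36, 39, 42, 43, 41, 39]
ipad = [34, 33, 36, 31, 41, 39, 29, 30, 41]


def welch_t(a, b):
    """Welch's t statistic: difference of means over its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t_value = welch_t(kindle, ipad)
print(f"t = {t_value:.2f}")
```

If every subject used both devices, a paired test on the per-subject differences would be the more natural choice; the independent-groups form is used here only to keep the sketch short.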

slide-62
SLIDE 62

$T = \dfrac{\text{variance between groups}}{\text{variance within groups}}$

[Chart: Kindle vs. iPad; axis ticks from 32 to 41]

slide-63
SLIDE 63

Subject     Kindle Time (s)   iPad Time (s)
1           43                34
2           -                 33
3           43                36
4           35                31
5           36                41
6           39                39
7           42                5
8           43                29
9           41                30
10          39                41

Average     40.1              34.9
Deviation   3.1               4.6
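
The averages and deviations above can be reproduced with Python's standard library. The sketch assumes that the blank Kindle cell for subject 2 and the 5 s iPad time for subject 7 were left out, and that "Deviation" means the sample standard deviation; neither is stated on the slide, but both are consistent with the reported figures.

```python
from statistics import mean, stdev

kindle = [43, 43, 35, 36, 39, 42, 43, 41, 39]  # subject 2 blank, left out
ipad = [34, 33, 36, 31, 41, 39, 29, 30, 41]    # subject 7 (5 s) left out

# (mean, sample standard deviation), each rounded to one decimal place
summary = {
    name: (round(mean(values), 1), round(stdev(values), 1))
    for name, values in {"Kindle": kindle, "iPad": ipad}.items()
}
print(summary)
```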

slide-64
SLIDE 64

$T = \dfrac{1.6}{0.93} = 1.7$

[Chart: Kindle vs. iPad; axis ticks from 32 to 41]

slide-65
SLIDE 65

How do we know the t-value is big enough to show a difference?

  • Each t-value has a corresponding p-value.
  • The p-value tells us how likely a difference at least this large would be if there were no real difference.
slide-66
SLIDE 66

P value

The probability that the pattern of data in the sample could be produced by random data

  • It tells us how likely an event is to occur by chance.
  • How likely is “very likely”?
  • If p = 0.10, there is a 10% chance of getting this result with random data
  • If p = 0.05, there is a 5% chance
  • If p = 0.01, there is a 1% chance
  • By convention, we consider “significant” those differences that occur less than .05 (5%) of the time by chance alone.
slide-67
SLIDE 67

$T = \dfrac{1.6}{0.93} = 1.7$ (p = 0.02)

[Chart: Kindle vs. iPad; axis ticks from 32 to 41]

There is a significant difference between the two populations.

slide-68
SLIDE 68

T-test

1. Assume that the true means of the two populations are not different
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing this much difference
5. If the chance is low, this seems contradictory
6. Thus, the assumption is unlikely to be true
7. Thus, the true means are different

slide-69
SLIDE 69

T-test

1. Assume that the true means of the two populations are not different: Null Hypothesis (H0)
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing this much difference
5. If the chance is low, this seems contradictory
6. Thus, the assumption is unlikely to be true
7. Thus, the true means are different

slide-70
SLIDE 70

T-test

1. Assume that the true means of the two populations are not different
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing this much difference: P-value
5. If the chance is low, this seems contradictory
6. Thus, the assumption is unlikely to be true
7. Thus, the true means are different

slide-71
SLIDE 71

T-test

1. Assume that the true means of the two populations are not different
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing this much difference
5. If the chance is low, this seems contradictory
6. Thus, the assumption is unlikely to be true
7. Thus, the true means are different: H1: Alternative hypothesis

slide-72
SLIDE 72

T-test

1. Assume that the true means of the two populations are not different
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing this much difference
5. If the chance is low, this seems contradictory: P < 0.05
6. Thus, the assumption is unlikely to be true
7. Thus, the true means are different
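
The seven steps can be walked through in code. The sketch below deliberately swaps in a permutation test for the t distribution lookup, because it makes step 4 ("the chance of observing this much difference") directly visible: if the null hypothesis holds, the device labels are interchangeable, so we shuffle them many times and count how often the difference is at least as large as the observed one. The data cleaning (subject 2's blank Kindle cell and subject 7's 5 s iPad time excluded), the treatment of the columns as independent groups, the function name, the number of permutations, and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean

kindle = [43, 43, 35, 36, 39, 42, 43, 41, 39]  # cleaned Kindle times (s)
ipad = [34, 33, 36, 31, 41, 39, 29, 30, 41]    # cleaned iPad times (s)


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate how often random relabeling gives a mean difference at
    least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))     # steps 2 and 3
    pooled = list(a) + list(b)            # step 1: labels are interchangeable
    hits = 0
    for _ in range(n_permutations):       # step 4
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


p_value = permutation_p_value(kindle, ipad)
print(f"p is approximately {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "cannot reject H0")  # steps 5 to 7
```

The estimate varies slightly with the seed and the number of permutations, so it should be read as an approximation of the p-value, not an exact figure.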

slide-73
SLIDE 73

P < 0.05

  • Woohoo!
  • E.g. p=0.001
  • P<0.05
  • Found a “statistically significant” difference
  • Effect is likely to have resulted from the IV
  • Null hypothesis rejected
  • Hypothesis confirmed
slide-74
SLIDE 74

P >= 0.05

  • P>0.05 (e.g., p=0.25)
  • We conclude effect could have happened by chance
  • Cannot say that the IV affected the DV
  • Hence, no difference?
  • Not sure
  • Did not detect a difference, but could still be different
  • Potential real effect did not overcome random variation
  • Boring, basically found nothing
  • How?
  • Not enough users
  • Measurement of DVs is not accurate enough
  • Need better tasks, data, …
slide-75
SLIDE 75

Group Project

slide-76
SLIDE 76

Group Project: Wireframe and Flowchart

From the sketches you created in the last assignment, pick some, combine some, or update some. Then come up with a final set of wireframes with a flowchart.

Wireframe

  • Use software of your choice.
  • Recommended: Wireframe.cc https://wireframe.cc/
  • Recommended: Balsamiq https://balsamiq.com/products/
  • Recommended: Indigo studio https://www.infragistics.com/products/indigo-studio
  • Check for more:

http://mashable.com/2010/07/15/wireframing-tools/#oqegDW3EXZqq

  • Do not apply colors
  • Don't focus on look and feel or visual details; focus on the content structure, workflow, and system usability.

slide-77
SLIDE 77

Flowchart

  • Use software of your choice
  • Recommended: Powerpoint
  • Recommended: https://www.draw.io/

Turn in: a PDF with

  • Your project proposal on top
  • A link to your Website with wireframe and a flowchart
  • Wireframe of your entire system
  • Flowchart
  • A link to a working hi-fi prototype (optional)
  • State which team members contributed to which part
  • Due by midnight 11/13

#Disclaimer. Further instruction of this submission can be given verbally during class or through Piazza.

Group Project: Wireframe and Flowchart

slide-78
SLIDE 78

Create a hi-fi prototype

  • We recommend Indigo Studio from Infragistics as the hi-fi prototyping tool.

  • Download the free 30-day trial license for Infragistics Indigo Studio to your computer (http://indigo.infragistics.com/).

  • You can get a free 1-year academic license to use it beyond the 30-day trial by using your Rutgers.edu email address here (http://www.infragistics.com/products/indigo-studio/indigo-academic-license)


Group Project: Hi-Fi prototype (optional)

slide-79
SLIDE 79

Rubric

  • Wireframe (6pt)
  • If the wireframe does not miss any screens that are an important part of the system (2pt)

  • If each screen in the wireframe clearly demonstrates what it's about (2pt)
  • If each screen does not have any usability problems (2pt)
  • Workflow (4pt)
  • If the workflow does not miss any steps that are an important part of the system (2pt)

  • If the workflow clearly demonstrates how a user would navigate through a

system (1pt)

  • If the workflow does not have any usability problem (1pt)


Group Project: Hi-Fi prototype (optional)

slide-80
SLIDE 80

Recap

slide-81
SLIDE 81

A test of the effect of a variable(s) by changing it while keeping all other variables the same. A controlled experiment generally compares the results obtained from an experimental sample against a control sample.

Recap: Controlled Experiment

slide-82
SLIDE 82
  • 1. State a research question(s)
  • 2. State a testable hypothesis
  • 3. Identify independent and dependent variables
  • 4. Design the experimental protocol
  • 5. Choose the user population
  • 6. Run an experiment

1) Manipulate an independent variable
2) Measure dependent variables
3) Use statistical tests to accept or reject the hypothesis

Recap: Designing an Experiment

slide-83
SLIDE 83

Recap: Confounding Variable

  • A confounding variable is one that provides an alternative

explanation for the thing we are trying to explain with our IVs.

  • Example: we want to compare two systems (windows 7 vs. 8)
  • All participants have prior experience with windows 7, but no

experience with windows 8

  • “Prior experience” is a confounding variable
  • A major issue in observational studies is that we often don't know what the potential confounding factors may be.

slide-84
SLIDE 84

Causality vs. Correlation

  • Causal: One variable depends on and is affected by the other
  • Correlational: Two variables are affected by a third variable in the

same direction

Recap: Hypothesis testing

slide-85
SLIDE 85

Within subject

  • Each subject is tested on all conditions
  • Each person is his or her own control
  • Need to deal with “order effects”
  • Longer experiment time but need fewer subjects

Between subject

  • Each subject is tested on one condition
  • Simpler design and analysis
  • Shorter experiment time but need more subjects
  • Easier to avoid bias

Recap: Within vs. Between Subjects

slide-86
SLIDE 86

Recap: Within Subjects: Disadvantages

  • Learning effect
  • Carryover effect
  • Fatigue, boredom

Solutions:

  • More practice before testing; randomization; counterbalancing

(Latin square)

  • Have rest between tasks
  • Limit the testing time
slide-87
SLIDE 87

Recap: Statistical Questions

  • Descriptive questions (What)
  • What is the typical performance
  • How large are the differences between individuals?
  • Analytical questions (yes or no)
  • Is there a difference
  • Is the difference large or small?
  • Is the difference significant or due to chance?
slide-88
SLIDE 88

Recap: T-test

1. Assume that the true means of the two populations are not different: Null Hypothesis (H0)
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing this much difference: P-value
5. If the chance is low, this seems contradictory: P < 0.05
6. Thus, the assumption is unlikely to be true
7. Thus, the true means are different: H1: Alternative hypothesis

slide-89
SLIDE 89

Recap: P value

The probability that the pattern of data in the sample could be produced by random data

  • It tells us how likely an event is to occur by chance.
  • How likely is “very likely”?
  • If p = 0.10, there is a 10% chance of getting this result with random data
  • If p = 0.05, there is a 5% chance
  • If p = 0.01, there is a 1% chance
  • By convention, we consider “significant” those differences that occur less than .05 (5%) of the time by chance alone.