Experimental Design & Evaluation 11. Controlled Experiment - PowerPoint PPT Presentation

Experimental Design & Evaluation 11. Controlled Experiment Sunyoung Kim, PhD

Today’s agenda
• Hypothesis testing
• Threats
• Experimental Biases

Hypothesis Testing

Hypothesis testing
How do we “prove” a hypothesis in science? In most cases it is impossible to prove the hypothesis directly, so this is done by disproving the null hypothesis. It is easier to disprove things, by counter-example.
• First we suppose the null hypothesis is true (null hypothesis = opposite of the hypothesis)
• Then a conflicting result is found
  o This disproves the null hypothesis
• Hence, the hypothesis is “proved” (more precisely, supported by the data)

Hypothesis testing
1. Perform statistical analysis
2. Draw conclusion
3. Communicate results

T-test
1. Assume that the true means of the two populations are not different: Null Hypothesis (H0)
2. Compute the means of the two samples
3. Compute the difference between the two sample means
4. Compute the chance of observing a difference at least this large if H0 were true: P-value
5. If the chance is low (P < 0.05), this seems contradictory
6. Thus, the assumption is unlikely to be true
7. Thus, we conclude that the true means are different: Alternative Hypothesis (H1)
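Steps 2 to 4 can be written out as a formula. The following is a sketch that assumes the pooled-variance (Student's) two-sample t-test, which is consistent with the degrees of freedom reported in the examples in this deck (n1 + n2 - 2); the slides themselves do not say which variant is used, and the symbols are introduced here for illustration.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

Here \bar{x}_i, s_i^2 and n_i are the mean, variance and size of sample i. The two-sided P-value is then the probability, under H0, that a t-distributed variable with df degrees of freedom is at least as far from zero as the observed t.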

Example) Kindle vs. iPad
Hypothesis: College students type faster using iPad’s keyboard than using Kindle’s keyboard.
• Independent variable: Device (iPad or Kindle)
• Dependent variable: Typing speed
• Confounding variable: Prior technology experience

Subject   Kindle Time (s)   iPad Time (s)
1         43                34
2         33
3         43                36
4         35                31
5         36                41
6         39                39
7         42                5
8         43                29
9         41                30
10        39                41

“College students type faster using iPad than Kindle, t(16) = 2.827, P = 0.012.”

Example) Drinking Water
Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water.
• Independent variable:
• Dependent variable:

Pair   Bottom   Surface
1      0.43     0.415
2      0.266    0.238
3      0.567    0.39
4      0.531    0.41
5      0.707    0.605
6      0.716    0.609
7      0.651    0.632
8      0.589    0.523
9      0.469    0.411
10     0.723    0.612

“There is no statistically significant difference in the concentration of zinc at the bottom and the surface of the water, t(18) = 1.309, P = 0.207.”
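Because the measurements are described as ten pairs, it can be instructive to compute both an unpaired (pooled) and a paired t statistic on the zinc data. The following is a minimal sketch using only the Python standard library; the function names `unpaired_t` and `paired_t` are illustrative and not from the slides, and only the t statistics and degrees of freedom are computed, not P-values.

```python
import math
from statistics import mean, stdev, variance

# Zinc concentrations from the slide, one entry per pair
bottom = [0.43, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.39, 0.41, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]


def unpaired_t(a, b):
    """Pooled-variance t statistic and df, treating the samples as independent."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


def paired_t(a, b):
    """t statistic on the within-pair differences, with n - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


t_ind, df_ind = unpaired_t(bottom, surface)
t_pair, df_pair = paired_t(bottom, surface)
print(f"unpaired: t({df_ind}) = {t_ind:.3f}")
print(f"paired:   t({df_pair}) = {t_pair:.3f}")
```

The unpaired version has 18 degrees of freedom, the same as the result quoted on the slide, while the paired version has 9. On these values the paired statistic is considerably larger, because the variation shared within each pair cancels out in the difference, so the choice of test can change the conclusion.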

Example) Caffeine and Metabolism
A study of the effect of caffeine on muscle metabolism used eighteen male volunteers who each underwent arm exercise tests. Nine of the men were randomly selected to take a capsule containing pure caffeine one hour before the test. The other men received a placebo capsule. During each exercise the subject's respiratory exchange ratio (RER) was measured. (RER is the ratio of CO2 produced to O2 consumed and is an indicator of whether energy is being obtained from carbohydrates or fats.)
• Independent variable:
• Dependent variable:

No.   Placebo   Caffeine
1     105       96
2     119       99
3     100       94
4     97        89
5     96        96
6     101       93
7     94        88
8     95        105
9     98        88
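The T-test steps can be traced on this data set. The sketch below is one way to do so under stated assumptions: it uses the pooled-variance (Student) form of the test, a two-sided P-value, and a simple Simpson's-rule integration of the t density in place of a statistics library. The helper names (`pooled_t`, `t_pdf`, `two_sided_p`) are made up for illustration.

```python
import math
from statistics import mean, variance

# RER measurements from the slide (two independent groups of nine men)
placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]


def pooled_t(a, b):
    """Steps 2-3: difference of sample means, scaled by its pooled standard error."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Step 4: P(|T| >= |t|) under H0, via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


t_stat, df = pooled_t(placebo, caffeine)
p_value = two_sided_p(t_stat, df)
print(f"t({df}) = {t_stat:.3f}, P = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")  # step 5
```

With these numbers the two-sided P-value falls just above the 0.05 threshold, so under these assumptions H0 would not be rejected. A one-sided test, which the slides do not specify, could read differently.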

What if you have more than two cases?
Use “One-way ANOVA”
• While the t-test is for comparing 2 means, ANOVA is for more than 2
• Calculate the ‘F’ ratio

Example) Tar Contents in Cigarettes
We want to see whether the tar content (in milligrams) differs across three brands of cigarettes. Lab Precise took 6 samples from each of the three brands and got the following measurements:
• Independent variable:
• Dependent variable:

Brand A   Brand B   Brand C
10.21     11.32     11.6
10.25     11.2      11.9
10.24     11.4      11.8
9.8       10.5      12.3
9.77      10.68     12.2
9.73      10.9      12.2

“The three cigarette brands differed in mean amount of tar, F(2,15) = 65.464, P < 0.001.”
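The ‘F’ ratio for this example can be computed by hand as the between-brand mean square divided by the within-brand mean square. The following is a minimal standard-library sketch; the function name `one_way_anova` and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
from statistics import mean

# Tar measurements (mg) from the slide, six samples per brand
brands = {
    "A": [10.21, 10.25, 10.24, 9.8, 9.77, 9.73],
    "B": [11.32, 11.2, 11.4, 10.5, 10.68, 10.9],
    "C": [11.6, 11.9, 11.8, 12.3, 12.2, 12.2],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_b, df_w = one_way_anova(list(brands.values()))
print(f"F({df_b},{df_w}) = {f_ratio:.3f}")
```

On the values as listed, this should agree with the F(2,15) = 65.464 reported on the slide. Turning F into a P-value requires the F distribution, which is left out of this sketch.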

Threats

Threats to Your Findings
• Validity
• Reliability

Validity
Validity is concerned with the study's success at measuring what the researchers set out to measure: how well does a test measure what it is purported to measure?
• Is it internally valid?
• Is it externally valid?

Internal Validity
How well is the experiment done, especially in whether it avoids confounding (more than one possible independent variable [cause] acting at the same time)? Internal validity is the extent to which a causal conclusion based on a study is warranted.
• Differences (in means) should be a result of experimental factors (e.g. what we are testing)
• Variances in means result from differences in participants
• Other variances are controlled or exist randomly

Threats to Internal Validity
• Ordering effects: Effects might be due to the order of the test conditions
  o People learn, and people get tired
  o Don’t present tasks or interfaces in the same order for all users
  o Randomize or counterbalance the ordering
• Selection effects: Effects might be due to participant differences
  o Don’t use pre-existing groups
  o Randomly assign users to conditions (levels of the independent variable)
• Experimental bias: Effects might be due to the experimenter
  o Experimenter may be enthusiastic about interface X but not Y
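Randomizing and counterbalancing can be combined in a few lines. The sketch below assumes a within-subjects study with exactly two conditions, borrowing the iPad and Kindle labels from the earlier example; the function name `counterbalance` and the participant IDs are hypothetical.

```python
import random


def counterbalance(participants, conditions=("iPad", "Kindle"), seed=None):
    """Randomly split participants across the two possible condition orders."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment guards against selection effects
    orders = [tuple(conditions), tuple(reversed(conditions))]
    # Alternate orders down the shuffled list so each order is used equally often
    return {p: orders[i % 2] for i, p in enumerate(shuffled)}


schedule = counterbalance(range(1, 11), seed=42)
for participant in sorted(schedule):
    print(participant, schedule[participant])
```

With an even number of participants, half start on each device, so learning and fatigue affect both conditions equally on average. Designs with more than two conditions usually need a different scheme, such as a Latin square, which this sketch does not cover.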

Internal Validity: Example An industrial psychologist wants to study the effects of soft classical music on the productivity of a group of typists in a typing pool. At the beginning of the month, the psychologist meets with the typists to explain the study, gets their consent to play the music during the working day, and then begins to have music piped into the office where the typists work. At the end of the month, the typists' supervisor reports a 30% increase in the number of documents completed by the typing pool that month. "Soft music increases productivity," the psychologist concludes.

External Validity
How generalizable is the result? External validity is the extent to which the results of a study can be generalized to other situations and to other people.
• Results can be generalized to a broader context
• Participants in your study are “representative”
• Test conditions can be generalized to the real world

Threats to External Validity
• Population: Findings are not generalizable to other people
  o Draw a random sample from your real target population
• Ecological: Findings are not generalizable to other situations
  o Make lab conditions as realistic as possible in important respects
• Training
  o Training should mimic how the real interface would be encountered and learned
• Task
  o Base your tasks on task analysis

External Validity: Example An educational researcher wants to study the effectiveness of a new method of teaching reading to first graders. The researcher asks all 30 of the first-grade teachers in a particular school district if they would like to receive training in the new method and then use it during the coming school year. Fourteen teachers volunteer to learn and use the new method; 16 teachers say that they would prefer to use their current approach. At the end of the school year, students who have been instructed with the new method have significantly higher average scores on a reading achievement test than students who have received more traditional reading instruction. "The new method is definitely better than the old one," the researcher concludes.

Reliability
• The quality of measurements
• Reliability is the "consistency" or "repeatability" of your measures

Threats to Reliability
• Uncontrolled variation
  o Previous experience (e.g., novices vs. experts)
  o User differences
  o Task design: Do tasks measure what you try to measure?
  o Measurement error
• Solutions
  o Eliminate uncontrolled variation
  o Repetition

Validity vs. Reliability

Threats to Your Findings
• Internal Validity: Are observed results actually caused by the independent variables?
• External Validity: Can observed results be generalized to the world outside the lab?
• Reliability: Will consistent results be obtained by repeating the experiment?

Experimental Biases

Experimental Biases
• Hawthorne effect
• Experimenter effect
• Placebo effect
• Novelty effect

Hawthorne Effect
• The phenomenon that subjects’ behavior changes by the mere fact that they are being observed.

Experimenter Effect
• A researcher’s bias influences what they see
• Example from Wikipedia: music backmasking
  o Once the subliminal lyrics are pointed out, they become obvious
• Dowsing
  o Not more likely than chance
• The issue: If you expect to see something, maybe something in that expectation leads you to see it
• Solved via double-blind studies

Placebo Effect
• Subject expectancy
• If you think the treatment, condition, etc. has some benefit, then it may
• Placebo-based anti-depressants, muscle relaxants, etc.
• In computing, an improved GUI, a better device, etc.
• Steve Jobs: http://www.youtube.com/watch?v=8JZBLjxPBUU
• Bill Buxton: http://www.youtube.com/watch?v=Arrus9CxUiA

Novelty Effect
• Typically with technology
• Performance improves when technology is instituted because people have increased interest in new technology
• Examples: Computer-assisted instruction in secondary schools, computers in the classroom in general, etc.
