Rigorous Evaluation: Analysis and Reporting

Structure is from A Practical Guide to Usability Testing by J. Dumas, J. Redish
R.I.T S. Ludi/R. Kuehl p. 1 R I T Software Engineering

Summarize and Analyze Test Data
• Qualitative data: comments, observations, test logs, surveys, …
  • Group into meaningful categories (+ or – for a particular task/screen)
• Quantitative data: times, error rates, …
  • Tabulate survey multiple-choice questions
  • Use statistical analysis when appropriate
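Tabulating a multiple-choice survey question can be sketched as below. The question, the answer choices, and the `tabulate` helper are hypothetical and only illustrate the counting step.

```python
from collections import Counter

# Hypothetical answers to one multiple-choice survey question.
responses = ["Easy", "Easy", "Hard", "Neutral", "Easy", "Hard"]


def tabulate(answers):
    """Return {choice: (count, percent of respondents)}."""
    counts = Counter(answers)
    total = len(answers)
    return {choice: (n, round(100 * n / total, 1)) for choice, n in counts.items()}


table = tabulate(responses)
for choice, (n, pct) in sorted(table.items()):
    print(f"{choice:8s} {n:2d} {pct:5.1f}%")
```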

Look for Data Trends/Surprises
• Examine the quantitative data …
  • Trends or patterns in task completion, error rates, etc.
  • Identify extremes, outliers
• Outliers: what can they tell us? Ignore them at your peril
  • Non-usability anomaly such as a technical problem?
  • Difficulties unique to one participant?
  • Unexpected usage patterns?
• Correlate with qualitative data such as written comments: why?
• If appropriate, compare across program versions (A/B testing) or different user groups
• Identify critical instances (notable UX impact)

Examining the Data for Problems
• Have you achieved the usability goals (learnable, memorable, efficient, understandable, satisfying, …)?
• Unanticipated usability problems?
  • Usability concerns that are not addressed in the design
• Have the quantitative criteria that you set been met or exceeded?
• Was the expected emotional impact observed?

Task and Error Analysis
• What tasks did users have the most problems with (usability goals not met)?
• Conduct error analysis
  • Categorize errors per task: requirement or design defect (or bug)
• % of participants performing successfully within the benchmark time
• % of participants performing successfully regardless of time (with or without assistance)
  • If low, then BIG problems
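The two success percentages can be computed as in the following sketch. The results, the benchmark value, and the time unit are invented for illustration.

```python
# Hypothetical results for one task:
# (completion time in minutes, completed?, needed assistance?)
results = [
    (2.5, True, False),
    (3.9, True, False),
    (6.2, True, True),
    (4.8, True, False),
    (7.0, False, True),
    (3.1, True, False),
]
BENCHMARK = 5.0  # assumed benchmark time, same unit as the results


def percent(count, total):
    """Whole-number percentage of count out of total."""
    return round(100 * count / total)


n = len(results)
within_benchmark = sum(1 for t, done, _ in results if done and t <= BENCHMARK)
regardless_of_time = sum(1 for _, done, _ in results if done)

pct_within = percent(within_benchmark, n)
pct_any = percent(regardless_of_time, n)
print(f"Within benchmark: {pct_within}%  Regardless of time: {pct_any}%")
```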

Prioritize Problems
• Criticality = Severity + Probability
• Severity
  • 4: Unusable – not able/want to use that part of product due to design/implementation
  • 3: Severe – severely limited in ability to use product (hard to work around)
  • 2: Moderate – can use product in most cases, with moderate workaround
  • 1: Irritant – intermittent issue with easy workaround; cosmetic
• Factor in scope: local to a task (e.g., one screen) versus global to the application (e.g., main menu)

Prioritize Problems (cont)
• Probability = frequency * scale
• Frequency (% of time used)
  • 4: 90%+, 3: 51-89%, 2: 11-50%, 1: 10% or less
  • Between 0 and 1
• Scale (% of users)
  • % of target population
  • Between 0 and 1
• When done, sort by severity (priority)
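A minimal sketch of the criticality calculation follows. It assumes frequency and scale are both entered as fractions between 0 and 1, and it sorts by the combined criticality score; the `Problem` class and the example problems are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    severity: int      # 1 (irritant) .. 4 (unusable)
    frequency: float   # assumed: fraction of the time the feature is used, 0..1
    scale: float       # assumed: fraction of the target population affected, 0..1

    @property
    def probability(self) -> float:
        return self.frequency * self.scale

    @property
    def criticality(self) -> float:
        return self.severity + self.probability


# Hypothetical problem list.
problems = [
    Problem("Unclear main menu labels", severity=3, frequency=0.9, scale=0.8),
    Problem("Misaligned icon", severity=1, frequency=0.5, scale=1.0),
    Problem("Export dialog crashes", severity=4, frequency=0.1, scale=0.2),
]

# Highest criticality first.
ranked = sorted(problems, key=lambda p: p.criticality, reverse=True)
for p in ranked:
    print(f"{p.criticality:.2f}  {p.name}")
```

Because probability never exceeds 1 under these assumptions, severity dominates the ranking and probability mostly breaks ties between problems of equal severity.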

Errors in Testing
• Sample is not big enough
• The sample is biased
• You have failed to notice and compensate for factors that can bias the results
• Sloppy measurement of data
• Outliers were left in when they should have been removed
  • Is an outlier a fluke or a sign of something more serious in the context of a larger data set?

Statistical Analysis
• Summarize quantitative data to help discover patterns of performance and preference, and detect usability problems
• Descriptive and inferential techniques

Descriptive Statistics
• Describe the properties of a specific data set
• Measures of central tendency (single variable)
  • Frequency distribution (e.g., of errors)
  • Mean (average), median (middle value), mode (most frequent value in a set)
• Measures of spread (single variable)
  • Amount of variance from the mean, standard deviation
• Relationships between pairs of variables
  • Scatterplot
  • Correlation
• Sufficient to make meaningful recommendations for most tests

Using Descriptive Statistics to Summarize Performance Data
E.g., Task Completion Times
• Mean time to complete: rough estimate of the group as a whole
  • Compare with original benchmark: is it skewed above/below?
  • TRIMMEAN (Excel function): excludes a given fraction of the data points, split evenly between the top and bottom, before calculating the mean (e.g., a fraction of 0.2 trims 10% from each end) to exclude outliers
• Median time to complete: use if the data are very skewed
• Range (largest value – smallest value): spread of the data
  • If the spread is small, then the mean is representative of the group
  • A good measure
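These summaries can be sketched with Python's standard library. The completion times are invented, and `trimmed_mean` is a hypothetical helper that follows the documented behaviour of Excel's TRIMMEAN rather than a reimplementation verified against Excel.

```python
import statistics

# Hypothetical task completion times in minutes; 12.0 is an extreme value.
times = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 12.0]


def trimmed_mean(values, percent):
    """Mean after dropping `percent` of the points, split evenly between both ends.

    The number of dropped points is rounded down to the nearest multiple
    of 2, as Excel's TRIMMEAN is documented to do.
    """
    data = sorted(values)
    k = int(len(data) * percent) // 2  # points to drop from each end
    return statistics.mean(data[k:len(data) - k])


mean_time = statistics.mean(times)
median_time = statistics.median(times)
value_range = max(times) - min(times)
trimmed = trimmed_mean(times, 0.2)  # drops the lowest and highest value

print(mean_time, median_time, value_range, trimmed)
```

With this data the single extreme value pulls the mean above both the median and the trimmed mean, which is the kind of skew the slide asks you to look for.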

Summarizing Performance Data (Cont'd)
• Interquartile range (IQR): another measure of statistical spread
  • Find the three data points (quartiles) that divide the data set into four equal parts, where each part has one quarter of the data
  • The difference between the upper (Q3) and lower (Q1) quartile points is the IQR: IQR = Q3 - Q1 ("middle fifty")
• Rough check for a normal distribution: if the data are normal, the actual Q1 and Q3 are close to the values calculated from the mean and SD (mean ± 0.6745 SD)
• Find outliers: values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)
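The quartile and outlier rule can be sketched as follows, using invented completion times. Quartile conventions differ between tools; the `method="inclusive"` option of `statistics.quantiles` is one choice among several, so results may differ slightly from other software.

```python
import statistics

# Hypothetical task completion times in minutes.
times = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 12.0]

# Three cut points that split the data into four equal parts.
q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")
iqr = q3 - q1

# Fences from the 1.5 * IQR rule.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [t for t in times if t < low_fence or t > high_fence]

print(f"Q1={q1} Q3={q3} IQR={iqr} fences=({low_fence}, {high_fence})")
print("Outliers:", outliers)
```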

Summarizing Performance Data (Cont'd)
• Standard Deviation (SD) is the square root of the variance
  • How much variation or "dispersion" is there from the average (mean or expected value) in a normal distribution
  • Sample SD with "Bessel's Correction" (divide by N - 1 rather than N):

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

• E.g., standard deviation of completion times
  • If small, then performance is similar
  • If large, then more analysis is needed
• A better measure

Standard Deviation (SD)
• The smaller the value of SD, the sharper the curve (narrow peak and steep sides)
  • Results are grouped around the mean
• The larger the value of SD, the broader the curve
  • And the larger the difference that values have from the mean
• SD can be influenced by outliers, so rerun without them as well
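A minimal sketch of the sample SD with Bessel's correction, run with and without a suspected outlier. The data and the choice of which value to drop are illustrative assumptions.

```python
import math
import statistics

# Hypothetical task completion times in minutes; 12.0 is a suspected outlier.
times = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 12.0]


def sample_sd(values):
    """Sample standard deviation with Bessel's correction (divide by N - 1)."""
    mean = sum(values) / len(values)
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


sd_all = sample_sd(times)
sd_without_outlier = sample_sd([t for t in times if t != 12.0])

# statistics.stdev applies the same N - 1 correction.
print(sd_all, statistics.stdev(times), sd_without_outlier)
```

The large drop in SD once the extreme value is removed shows how strongly a single point can influence this measure.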

Normal Curve and Standard Deviation
• Within ±1 SD of the mean: 68% of values
• Within ±2 SD of the mean: 95% of values
• Within ±3 SD of the mean: 99.7% of values
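The percentages follow from the normal distribution itself. The sketch below computes them with the error function; `fraction_within` is a hypothetical helper name.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} SD: {100 * fraction_within(k):.1f}%")
```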

Sample Data

Tasks                          % of Participants Performing   Mean Time   SD
                               within Benchmark
Set Temp and Pressure          83                             3.21        0.67
Set flows                      33                             12.08       10.15
Load the sample tray           100                            0.46        0.17
Set oven temperature program   66                             6.54        2.56

Correlation
• Allows exploration of the strength of the linear relationship between two continuous variables
• You get two pieces of information: direction and strength of the relationship
• Direction
  • +, as one variable increases so does the other
  • -, as one variable increases, the other variable decreases
• Strength
  • Small: ± .01 to .29
  • Medium: ± .3 to .49
  • Large: ± .5 to 1

Correlation
• Pearson's correlation coefficient (r) is most often used to measure correlation
  • Sensitive to a linear relationship between two variables
• Pearson's correlation coefficient (r) is supported in Excel

$$\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}, \qquad r = \frac{\mathrm{cov}_{XY}}{SD_X \, SD_Y}$$

where X = X axis data point, Y = Y axis data point, $\bar{X}$ = mean of the X points, $\bar{Y}$ = mean of the Y points, N = number of data points, and SD = standard deviation.
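A minimal sketch of Pearson's r computed as covariance divided by the product of the standard deviations. It assumes the sample (N - 1) form for both; r comes out the same with N as long as the covariance and both SDs use the same divisor. The error counts and times are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r = cov(X, Y) / (SD_X * SD_Y), using N - 1 throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


# Hypothetical per-participant data: error count vs. completion time (minutes).
errors = [0, 1, 2, 3, 4]
minutes = [2.0, 2.5, 3.5, 4.0, 5.5]

r = pearson_r(errors, minutes)
print(f"r = {r:.2f}")
```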

Scatterplots
• Limitations of the Pearson correlation coefficient:
  • Its value generally does not completely characterize the relationship between variables
  • Non-linear and non-normal distributions, outliers
• Need to visually examine the data points
• Scatterplot: plot (X,Y) data point coordinates on a Cartesian diagram
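The first limitation can be illustrated with a small made-up data set in which Y is completely determined by X, yet r is zero because the relationship is not linear. The `pearson_r` helper is a hypothetical implementation of the usual formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r = cov(X, Y) / (SD_X * SD_Y), using N - 1 throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


# Y depends entirely on X, but along a curve rather than a line.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(f"r = {r_curve:.2f}")
```

A scatterplot of these points shows the U shape immediately, which is why the numbers alone are not enough.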

Scatterplot Samples
[Figures: three sample scatterplots showing r = .00, r = .40, and r = .99]

Data Analysis Activity
• See the Excel spreadsheet "Sample Usability Data File" under "Assignments and In-Class Activities" in myCourses
• Follow the directions
• Submit to the Activity dropbox "Data Analysis"

Supplemental Information
Inferential Statistics

Inferential Statistics
• Infer some property or general pattern about a larger data set by studying a statistically significant sample (large enough to obtain repeatable results)
  • With the expectation that the results will generalize to the larger group
• Analyze data subject to random variation as a sample from a larger data set
• Techniques:
  • Estimation of descriptive parameters
  • Testing of statistical hypotheses
• Can be complex to use, controversial
• Keep Inferential Statistics Simple (KISS 2.0)
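As one example of estimating a descriptive parameter, the sketch below computes an approximate 95% confidence interval for the mean completion time. It assumes a normal approximation, which is only a rough guide for the small samples typical of usability tests (a t-based interval would be wider); the data and the `mean_confidence_interval` helper are hypothetical.

```python
import math
import statistics

# Hypothetical task completion times in minutes.
times = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0]


def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


low, high = mean_confidence_interval(times)
print(f"Mean {statistics.mean(times):.2f}, approx. 95% CI ({low:.2f}, {high:.2f})")
```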
