Reliability and Validity of Angoff Ratings
J. Anthony Bayless
Henry Busciglio
Personnel Research and Assessment Division
Office of Human Resources Management

Standard Setting
• Process to establish a performance standard, cut score, or passing score
• Process is not purely technical or empirical
• Process involves value judgments (Standards for Educational and Psychological Testing)
• Various methods of standard setting, for example:
  • Contrasting Groups and Borderline Groups (Livingston & Zieky, 1982)
  • Angoff (1971)
  • Ebel (1972)
  • Nedelsky (1954)
OHRM/PRAD June 10, 2008

Angoff Procedure
• SMEs are administered the test
• SMEs estimate the proportion of “minimally qualified” or “minimally competent” examinees who would answer each item correctly
• Average Angoff rating is calculated for each item
• Grand average of the Angoff ratings across items is calculated to represent the recommended performance standard (or cut score)
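The two averaging steps of the Angoff procedure can be sketched in Python as follows. This is a minimal illustration rather than the presenters' own code: the function name `angoff_cut_score` and the ratings (3 SMEs by 4 items) are hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """ratings[s][i] is SME s's estimated proportion of minimally
    qualified examinees answering item i correctly."""
    n_items = len(ratings[0])
    # Step 1: average the SME estimates for each item.
    item_means = [mean(sme[i] for sme in ratings) for i in range(n_items)]
    # Step 2: average the item means to get the recommended cut score.
    return item_means, mean(item_means)

# Hypothetical ratings: 3 SMEs (rows) by 4 items (columns).
ratings = [
    [0.60, 0.70, 0.50, 0.80],
    [0.70, 0.80, 0.40, 0.90],
    [0.50, 0.60, 0.60, 0.70],
]
item_means, cut = angoff_cut_score(ratings)
print([round(m, 2) for m in item_means], round(cut, 2))
```

The cut score here is a proportion; multiplying it by the number of scored items would give a raw-score standard.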

Promotional Assessments
• Career Experience Inventory
• Critical Thinking Skills
• In-Basket Job Simulation
• Managerial Writing Skills
• Job Knowledge Test

Job Knowledge Test
• 80 items for each occupation’s (IEA and DO) test
• Multiple-choice items with four response options
• Dichotomously scored items
• Power tests

Research Interest
• How good are SMEs at conceptualizing and consistently applying a hypothetical construct of “minimally qualified” examinees?
  • Specifically, how reliable are the SME estimates?
  • Specifically, how valid are the SME estimates?

Methodology – Angoff
IEA SMEs: n=5 (Time 1 + Time 2); no group discussion
DO SMEs: n=8; group discussion

Methodology - Study
• Two post hoc studies, one per occupation
  • DO sample (N=259 examinees)
  • IEA sample (N=318 examinees)
• Assessed interjudge reliability via internal consistency estimate of reliability
• Assessed validity via correlation of average Angoff rating and actual (observed) item difficulty index for a “minimally qualified” group of examinees

Results - Reliability
• DO Sample (72 scored items, 8 SMEs)
  • Alpha = .863, no removable SMEs
  • Item-total correlations from .582 to .680
• IEA Sample (70 usable items, 5 SMEs)
  • Initial Alpha = .429, with 2 removable SMEs
  • Final Alpha = .547, using 3 SMEs
  • Item-total correlations from .364 to .422
• We used both 5- and 3-SME groups for further analyses.
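A reliability analysis of this kind can be sketched as below. The slides do not give the formula, so this assumes Cronbach's alpha computed with each SME treated as an "item" and each test item as a case, and it assumes "removable SMEs" means SMEs whose deletion raises alpha. The function names and the ratings are hypothetical.

```python
from statistics import variance

def cronbach_alpha(columns):
    """columns[j] holds SME j's ratings across all test items."""
    k = len(columns)
    # Total rating per test item, summed over SMEs.
    totals = [sum(vals) for vals in zip(*columns)]
    sum_of_variances = sum(variance(c) for c in columns)
    return k / (k - 1) * (1 - sum_of_variances / variance(totals))

def alpha_if_deleted(columns):
    """Alpha recomputed with each SME left out in turn."""
    return [cronbach_alpha(columns[:j] + columns[j + 1:])
            for j in range(len(columns))]

# Hypothetical ratings: 3 SMEs, 4 test items. The third SME is inconsistent.
sme_a = [0.5, 0.6, 0.7, 0.8]
sme_b = [0.6, 0.6, 0.8, 0.8]
sme_c = [0.8, 0.5, 0.7, 0.6]
columns = [sme_a, sme_b, sme_c]

alpha = cronbach_alpha(columns)
drop = alpha_if_deleted(columns)
print(round(alpha, 3), [round(a, 3) for a in drop])
```

In this toy data, alpha rises when the third SME is removed, which is the pattern the IEA result (alpha moving from .429 to .547 after dropping 2 SMEs) describes.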

Results - Validity
• Validity: agreement between SMEs’ Angoff estimates and actual p-values among a group of “minimally qualified” test takers.
• “Minimally qualified” defined two ways:
  • Candidates scoring close to 50th percentile
  • Candidates getting 70% of items correct
• Used both correlations and t-tests to assess validity
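Observed p-values for a "minimally qualified" group can be computed along the following lines. This is only a sketch: the slides do not say how wide the band around the 50th percentile or the 70%-correct mark was, so the `target` and `tol` parameters, the function names, and the 0/1 response data are all assumptions made for illustration.

```python
def borderline_group(responses, target, tol):
    """Keep examinees whose proportion correct is within tol of target."""
    n_items = len(responses[0])
    return [r for r in responses if abs(sum(r) / n_items - target) <= tol]

def item_p_values(responses):
    """Proportion of examinees answering each item correctly."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]

# Hypothetical dichotomous scores: 6 examinees (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
# A 4-item toy test cannot hit 70% exactly, so 75% is used here.
group = borderline_group(responses, target=0.75, tol=0.01)
p_values = item_p_values(group)
print(len(group), p_values)
```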

Results – Validity (Corr.)
• For DO sample, correlations were:
  • .591** for 50th percentile group
  • .479** for 70% correct group
• For IEA sample, correlations (for 5- and 3-SME groups, respectively) were:
  • .311** and .243* for 50th percentile group
  • .282* and .183 for 70% correct group
** p < .01. * p < .05.
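The correlations reported here relate each item's average Angoff rating to its observed p-value in the borderline group. Assuming these are ordinary Pearson correlations across items (the slides do not name the coefficient), a minimal sketch with hypothetical data looks like this:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for 4 items.
angoff_means = [0.60, 0.70, 0.50, 0.80]   # average SME rating per item
observed_p = [0.55, 0.75, 0.50, 0.70]     # p-value in the borderline group

r = pearson_r(angoff_means, observed_p)
print(round(r, 3))
```

A high correlation indicates that SMEs rank items by difficulty much as the borderline examinees experienced them; it says nothing about whether the ratings are too high or too low overall, which is what the t-tests address.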

Results – Validity (T-tests)
• Agreement: magnitude of mean differences between the Angoff ratings for each item and the corresponding p-value among minimally qualified test takers.
• Used paired-samples t-tests
• For DO sample:
  • Grand average Angoff rating = .6310
  • Average p-value for 50th percentile group = .6315
    • t = 0.025, df = 71, p = .980
  • Average p-value for 70% correct group = .6906
    • t = 2.750, df = 71, p = .008
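The paired-samples t statistic pairs each item's observed p-value with its average Angoff rating. The sketch below uses hypothetical data and takes the difference as p-value minus Angoff rating, a direction inferred from the signs of the reported t values rather than stated in the slides.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical values for 4 items.
observed_p = [0.55, 0.75, 0.50, 0.70]
angoff_means = [0.60, 0.70, 0.50, 0.80]

t, df = paired_t(observed_p, angoff_means)
print(round(t, 3), df)
```

With one pair per scored item, df equals the number of items minus one, which matches the reported df = 71 for the 72 scored DO items. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic along with its p-value.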

Results – Validity (T-tests)
For IEA sample:
• Grand average Angoff ratings
  • 5-SME = .7716
  • 3-SME = .7710
• Average p-values
  • 50th percentile group = .6810
  • 70% correct group = .6980

Results – Validity (T-tests)
For IEA sample, continued:
• Comparisons:
  • 1: 50th percentile p-values compared to 5-SME Angoffs
    • t = -3.233, p = .002
  • 2: 70% correct p-values compared to 5-SME Angoffs
    • t = -2.685, p = .009
  • 3: 50th percentile p-values compared to 3-SME Angoffs
    • t = -3.148, p = .002
  • 4: 70% correct p-values compared to 3-SME Angoffs
    • t = -2.587, p = .012

Results – Validity (T-tests)
IEA T-Test Comparisons

                          50th Percentile p-values   70% Correct p-values
Avg. Angoffs for 5 SMEs   t = -3.233, p = .002       t = -2.685, p = .009
Avg. Angoffs for 3 SMEs   t = -3.148, p = .002       t = -2.587, p = .012

Results – Summary
• DO SMEs gave reasonably reliable and valid estimates of actual p-values, especially for test takers at the 50th percentile.
• IEA SMEs gave less reliable and valid estimates by exhibiting less interrater agreement, demonstrating less insight into the relative difficulty of items, and overestimating p-values.
• The notably superior performance of the DO SMEs is reasonable given the differences between the procedures used to obtain Angoff estimates from the two groups.

Limitations of Current Study
• Post hoc studies
• Did not retain initial round of Angoff ratings prior to group discussions during second round

How Does This Help You?
• The more SMEs, the merrier!
• Group discussion is critical
• SMEs need to be experienced and representative of occupational workforce

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (pp. 508-600). Washington, DC: American Council on Education.
Cizek, G. J. (2001). Setting performance standards: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.

References (continued)
Ebel, R. L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.
Goodwin, L. D. (1999). Relations between observed item difficulty levels and Angoff minimum passing levels for a group of borderline examinees. Applied Measurement in Education, 12(1), 13-28.
Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3-19.
