HFES Webinar Series: How Do You Know That Your Metrics Work? (PowerPoint PPT Presentation)
SLIDE 1

HFES Webinar Series

How Do You Know That Your Metrics Work? Fundamental Psychometric Approaches to Evaluating Metrics

Presented by Fred Oswald, Rice University Moderated by Rebecca A. Grier, Ford Motor Company Hosted by the Perception and Performance Technical Group

SLIDE 2

HFES Webinar Series

  • Began in 2011
  • Organized by the Education & Training Committee
  • This webinar is organized and hosted by the HFES Perception and Performance Technical Group, http://hfes-pptg.org/.
  • See upcoming and past webinars at http://bit.ly/HFES_Webinars

SLIDE 3

HFES Webinar FAQs

1. There are no CEUs for this webinar.
2. This webinar is being recorded. HFES will post links to the recording and presentation slides on the HFES Web site within 3-5 business days. Watch your e-mail for a message containing the links.
3. Listen over your speakers or via the telephone. If you are listening over your speakers, make sure your speaker volume is turned on in your operating system and your speakers are turned on.
4. All attendees are muted. Only the presenters can be heard.
5. At any time during the webinar, you can submit questions using the Q&A panel. The moderator will read the questions following the last presentation.
6. Trouble navigating in Zoom? Type a question into Chat. HFES staff will attempt to help.
7. HFES cannot resolve technical issues related to the webinar service. If you have trouble connecting or hearing the audio, click the “Support” link at www.zoom.us.

SLIDE 4

About the Presenters

Presenter: Fred Oswald, PhD, is a professor in the Department of Psychology at Rice University. An organizational psychologist, he addresses issues pertaining to personnel selection, college admission, military selection and classification, and school-to-work transition. Oswald publishes statistical and methodological research in the areas of big data, meta-analysis, measure development, and psychometrics. He is an Associate Editor of Journal of Management, Psychological Methods, Advances in Methods and Practices in Psychological Science, and Journal of Research in Personality. Fred received his MA and PhD in industrial-organizational psychology from the University of Minnesota in 1999.

Moderator: Rebecca A. Grier, PhD, is a human factors scientist at Ford Motor Company researching human interaction with highly automated vehicles. In addition, she is secretary/treasurer of the HFES Perception and Performance Technical Group and chair of the Society of Automotive Engineers Taskforce on Identifying Automated Driving Systems - Dedicated Vehicles (ADS-DV) User Issues for Persons with Disabilities. Rebecca received her MA and PhD in human factors/experimental psychology from the University of Cincinnati and a BS With Honors in psychology from Loyola University, Chicago.

SLIDE 5

How do you know that your metrics work? Fundamental questions about metrics

Fred Oswald Rice University

HFES Webinar Series April 12, 2018

[Image credit: animalmascots.com/01-00887/German-Shepard-Mascot-Costume]

SLIDE 6

Outline: Questions Addressed

Q0: Context of measurement?
Q1: Develop a new measure?
Q2: How to develop good items?
Q3: Format of the measure?
Q4: Evidence for reliability?
Q5: Practical analysis tips?

SLIDE 7

Q0: Context of measurement?

Purposes

  • Evaluative (e.g., system comparison, individual differences)
  • Developmental (e.g., training evaluation)
  • Managerial decision-making (e.g., compensate, promote, transfer, terminate)

SLIDE 8

Q0: Context of measurement?

Content

  • General ↔ Specific/Contextualized
  • Multiple ↔ Single measures
  • Strong ↔ Weak or “subtle” indicators

Form

  • Many items ↔ Few items
  • Self-report ↔ “Other” report
  • Traditional ↔ Innovative

SLIDE 9

Q0: Context of measurement?

Broad Contexts

  • Academic vs. organizational
  • IRB vs. organizational climate for surveying
  • Perceptions of fairness, relevance (from all stakeholders, including those of the test-taker)
  • Legal concerns
SLIDE 10

Q1: Develop a new measure?

No → Use an existing measure when

  • there is a strong theoretical basis
  • past empirical research demonstrates reliability and validity
  • you are not interested in a measure-development study (do not toss an ad hoc measure into a study)…

SLIDE 11

Q1: Develop a new measure?

Yes → Develop a new measure when

  • access to existing measures is limited (expensive, proprietary)
  • there is room for improvement (improved theory, aligning the measure with the intended purpose, increased sensitivity to the test-taker perspective, updating language)
  • test security is of concern (“freshening” item pools, previous test compromises)
  • there is limited testing time…
SLIDE 12

A Common Context: Limited Testing Time

Problem: Too many constructs and not enough time, resources…or test-taker patience
Reasons: Theories get complex; organizations place high demand on measures/data to answer many practical organizational questions
Solutions: Reduce constructs to “essential” ones? Abandon use of multiple scales for a construct? Shorten measures?

SLIDE 13

Q2: How to develop good items?

Good measure development – and therefore good results – requires sound investment.

  • Expertise (substantive researchers, SMEs, psychometricians, sensitivity review)
  • Development process (item generation, refinement, translation/backtranslation)
  • Research/evidence (reliability, validity, low adverse impact, generalizability)

SLIDE 14

Q2: How to develop good items?

Item content can be evaluated for relevancy, deficiency and contamination; however, these three characteristics can also be psychological phenomena (e.g., did the test-taker forget or get confused by the item content?).

SLIDE 15

Q2: How to develop good items?

Appropriate content sampling from a construct domain is a necessary condition for obtaining interpretable reliability evidence for a set of items.

A high reliability coefficient does not ensure adequate content sampling: collections of items can covary due to shared contaminants or shared deficiencies.

SLIDE 16

Q2: How to develop good items?

[Diagram: construct “job satisfaction” indicated by item 1, item 2, item 3, …, item k]

  • Items sample different aspects of the theoretical construct, e.g., satisfaction with: autonomy, salary, job variety, management, coworkers….
  • Controlled heterogeneity entails varying these aspects across items to triangulate on the psychological construct.
  • Varying items allows for distinguishing item content that is item-specific vs. construct-relevant.

SLIDE 17

Q2: How to develop good items?

[Diagram: construct “workload (task load)” indicated by item 1, item 2, item 3, …, item k]

Another construct: Workload

  • Identified facets (aspects) under its construct umbrella (NASA-TLX): Mental Demand, Physical Demand, Temporal Demand, Performance, Frustration, Effort
  • Controlled heterogeneity: create items reflecting each facet
  • Given enough items, a facet can become a reliable scale on its own (e.g., high alpha, strong single factor).

SLIDE 18

Q2: How to develop good items?

Who generates items?

  • SMEs: domain-specific experts/theorists; job analysts; job incumbents; researchers themselves relying on past theories and measures
  • Item categorization process

How many people are needed to generate items?

  • Generate items (from SMEs, research literature) until themes (and possibly content) are redundant. Often need fewer SMEs than one might think.

SLIDE 19

Q2: How to develop good items?

What items are appropriate given the measurement goals? e.g.,

  • Knowledge: easier items (minimal competence); difficult items (certify professionals); full range (accurately measure people’s knowledge across the range)
  • Personality: screen out extremes (e.g., antisocial) vs. assess normal personality (e.g., agreeableness)
  • Adaptive items (regardless of domain): the initial item is in the “middle” of the construct continuum; subsequent items are tailored to test-takers’ past responses (reliable items consistent with one’s true score improve estimation, reducing test time); see the sketch below
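To make the adaptive idea concrete, here is a minimal sketch, not the presenter’s method: it starts an examinee in the middle of the continuum, administers the unused item closest to the current estimate, and narrows the estimate with each response. The item pool, the step-halving update (a stand-in for a real IRT ability estimate), and the Rasch-style simulated test-taker are all illustrative assumptions.

```python
# Minimal adaptive-item sketch (illustrative; real computer-adaptive
# testing would use an IRT ability estimate rather than step-halving).
import math
import random

def adaptive_test(difficulties, get_response, n_items=5):
    """Administer n_items adaptively from a pool of item difficulties."""
    remaining = dict(enumerate(difficulties))
    theta, step = 0.0, 1.0          # start in the "middle" of the continuum
    for _ in range(n_items):
        # next item: the unused one whose difficulty is closest to theta
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        correct = get_response(item)          # True/False from the test-taker
        theta += step if correct else -step   # move toward the true score
        step /= 2                             # tighten the estimate each time
        del remaining[item]
    return theta

# Toy usage: simulate a Rasch-like test-taker with true ability 0.8
pool = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
respond = lambda i: random.random() < 1 / (1 + math.exp(pool[i] - 0.8))
print(adaptive_test(pool, respond))
```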

SLIDE 20

Q2: How to develop good items?

Items developed will eventually be refined based on psychometric analysis (item-remainder correlations; CFA factor loadings); one common check is sketched below.
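As a concrete example of the first check, here is a minimal sketch (my illustration; the function name and toy data are assumptions) of item-remainder correlations: each item is correlated with the sum of the remaining items, so the item does not inflate its own correlation.

```python
import numpy as np

def item_remainder_correlations(responses: np.ndarray) -> np.ndarray:
    """responses: (n_people, n_items) matrix of item scores."""
    n_items = responses.shape[1]
    corrs = np.empty(n_items)
    for j in range(n_items):
        # sum of all items except item j
        remainder = responses[:, np.arange(n_items) != j].sum(axis=1)
        corrs[j] = np.corrcoef(responses[:, j], remainder)[0, 1]
    return corrs  # low values flag items to revise or drop

# Toy usage: 5 people x 4 Likert items; item 4 runs against the others
data = np.array([[4, 5, 4, 1],
                 [3, 4, 3, 2],
                 [5, 5, 5, 1],
                 [2, 2, 1, 5],
                 [4, 3, 4, 2]])
print(item_remainder_correlations(data).round(2))
```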

SLIDE 21

Q3: Format of the measure?

Instructions

  • Often, instructions are written at too high a grade level (5 grade levels too high for a patient discharge interview; Spandorfer et al., 1993).
  • Very few people read them (Novick & Ward, 2006), though novices who read them will improve (Catrambone, 1990).
  • Detailed instructions about providing “objective” information (BARS, time spans, frequencies) often do not change the subjective response process.
  • General suggestion: Assume that test-takers will ignore (or skim) instructions and proceed accordingly! Novel formats requiring instructions demand pilot testing to ensure comprehension and quality responding.

SLIDE 22

Q3: Format of the measure?

Look and feel?

  • Grammar and syntax matter, not just for understandability and better data but for credibility.
  • Clear, readable font (this is HFES! ☺).
  • Intuitive method of responding.
  • For web-based measures, check browser types and screen resolutions.
  • Minimize drudgery; maximize simplicity and readability. Keep all cognitive burdens to a necessary minimum (see Dillman’s Tailored Design Method; Dillman, 2014).

SLIDE 23

Q3: Format of the measure?

Randomize items?

  • Cluster items within a construct; cycle across constructs; randomize items?
  • Randomizing does not seem to make a difference. It might increase honesty (by not knowing the construct being assessed), though it may also heighten confusion (by shifting gears across constructs).
  • General suggestion: Group items within construct, especially for longer batteries of measures.

SLIDE 24

Q3: Format of the measure?

Negatively worded items?

  • Responses based on misreading even a small number of negatively worded items can distort the reliability, and therefore the validity, of a measure (Schmitt & Stults, 1985).
  • Double negatives are especially bad.
  • General advice: Keeping items in the ‘direction’ of the construct and avoiding negation outweighs ‘balancing’ positively and negatively worded items.

SLIDE 25

Q3: Format of the measure?

Pilot test a new measure before collecting data. Have people typical of your future sample(s) take the measure and provide feedback about the items (e.g., relevant, non-invasive, culturally sensitive, no typos, easy to take?).

SLIDE 26

Q3: How many items?

Are single-item measures useful? Perhaps in some cases:

  • Research using single-item measures of each of the five JDI job satisfaction facets found correlations between .60 and .72 with the full-length versions of the JDI scales (Nagy, 2002)
  • Review of single-item graphical representation scales, so-called “faces” scales (Patrician, 2004)
  • Single-item measures work best for “homogeneous” constructs (Loo, 2002)

Pop Quiz: …Why not always administer a single-item test (hey, it would be quick to administer, easy to grade)?

SLIDE 27

Q4: Evidence for reliability?

          Item 1   Item 2   Item 3   Item 4   Item 5   Item 6   Item 7
Item 1    var1     cov12    cov13    cov14    cov15    cov16    cov17
Item 2    cov21    var2     cov23    cov24    cov25    cov26    cov27
Item 3    cov31    cov32    var3     cov34    cov35    cov36    cov37
Item 4    cov41    cov42    cov43    var4     cov45    cov46    cov47
Item 5    cov51    cov52    cov53    cov54    var5     cov56    cov57
Item 6    cov61    cov62    cov63    cov64    cov65    var6     cov67
Item 7    cov71    cov72    cov73    cov74    cov75    cov76    var7

  • When an item composite (scale score) is a simple average of items, then its variance is the average of all entries in the var/cov matrix (e.g., above); see the derivation sketched below.
  • With the addition of items to a composite, item variances increase linearly, but covariances increase geometrically; even when items are moderately correlated, the latter quickly swamps the former.
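A short derivation of the “average of all entries” point (my sketch, consistent with the slide’s claim, not taken from it):

```latex
% Variance of a simple k-item average composite \bar{X} = \frac{1}{k}\sum_i x_i:
\mathrm{Var}(\bar{X})
  = \frac{1}{k^{2}} \sum_{i=1}^{k} \sum_{j=1}^{k} \mathrm{Cov}(x_i, x_j)
% i.e., the mean of all k^2 entries of the var/cov matrix above.
% The diagonal contributes k variance terms (linear in k); the
% off-diagonal contributes k(k-1) covariance terms (quadratic in k),
% which is why covariances quickly dominate as items are added.
```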

SLIDE 28

Q4: Evidence for reliability?

Assuming that items reflect a single dimension:

  • α = .70 is usually not high enough (Lance, Butts, & Michels, 2006, ORM)
  • higher item intercorrelations = higher alpha (add similar items)
  • additional items = higher alpha (add more items); see the sketch below
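To tie this back to the var/cov matrix two slides earlier, here is a minimal sketch (my illustration; the function name and toy data are assumptions) of coefficient alpha computed directly from the item var/cov matrix:

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """responses: (n_people, n_items) matrix of item scores."""
    k = responses.shape[1]
    cov = np.cov(responses, rowvar=False)   # the k x k item var/cov matrix
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the sum)
    return (k / (k - 1)) * (1 - np.trace(cov) / cov.sum())

# Toy usage: 6 people x 4 items. Raising item intercorrelations or adding
# items grows the covariance share of cov.sum(), so alpha rises.
data = np.array([[4, 5, 4, 4],
                 [3, 4, 3, 3],
                 [5, 5, 5, 4],
                 [2, 2, 1, 2],
                 [4, 3, 4, 4],
                 [1, 2, 2, 1]])
print(round(cronbach_alpha(data), 2))
```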

SLIDE 29

Q5: Practical analysis tips? #1 - Creating an Appropriate Scale Score

1. Summing across items with missing data will lead to some people with artificially deflated scale scores. This practice seems to happen enough that I offer this tidbit:
2. In the presence of missing data, average the scores on items within a scale.
3. Guard against too much missingness: compute an average only when a person has answered at least, say, 80% of the items in the scale (e.g., 4 out of 5 items).
4. The average score can be a more interpretable metric (e.g., the score is back on the Likert scale for each item). However, to bring scale scores (and their descriptives) back to the metric of the raw sum scale, multiply the mean by the number of items in the scale. (A short sketch follows.)
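A minimal sketch of tips 1-4 (my illustration; the function name, threshold parameter, and toy data are assumptions): average within a scale, require an 80% response rate, and optionally rescale the mean back to the sum metric.

```python
import numpy as np

def scale_score(items: np.ndarray, min_answered: float = 0.8,
                as_sum_metric: bool = False) -> np.ndarray:
    """items: (n_people, n_items) with np.nan marking missing responses."""
    n_items = items.shape[1]
    answered = np.sum(~np.isnan(items), axis=1)
    means = np.nanmean(items, axis=1)                 # average, never sum
    means[answered < min_answered * n_items] = np.nan  # too much missing
    return means * n_items if as_sum_metric else means

# Toy usage: person 2 skipped 2 of 5 items (60% answered -> no score)
data = np.array([[4, 5, 4, 3, 4],
                 [3, np.nan, np.nan, 2, 3],
                 [5, 5, np.nan, 4, 5]])
print(scale_score(data))   # [4.0, nan, 4.75]
```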

SLIDE 30

Q5: Practical analysis tips? #2 - Labeling and Reverse-Scoring Items

1. Label your items and variables as if you were giving the data file to someone who could not contact you for further help in using the file.
2. …This “someone” may well be you: you are busy, and notes that were once clear are now cryptic (including mental notes – you aren’t getting younger)!
3. Suggestion: Indicate the item # within the variable label (e.g., CONSC31); create a code book and/or a very clear variable list; create/save all program and syntax files; save reverse-coded items as separate variables (so that you never unknowingly reverse the reverse-coding back into the original variable). A short sketch follows.
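A minimal sketch of tip 3 (my illustration; the function name and the "_R" suffix convention are assumptions): reverse-code into a new, separately named variable so the original item can never be accidentally reverse-coded twice.

```python
import pandas as pd

def add_reverse_coded(df: pd.DataFrame, item: str, scale_max: int,
                      scale_min: int = 1) -> pd.DataFrame:
    """Create e.g. CONSC31_R from CONSC31 on a scale_min..scale_max Likert scale."""
    # keep the original column untouched; write the recode to a new variable
    df[item + "_R"] = (scale_max + scale_min) - df[item]
    return df

# Toy usage on a 1-5 Likert item
df = pd.DataFrame({"CONSC31": [1, 3, 5]})
print(add_reverse_coded(df, "CONSC31", scale_max=5))  # CONSC31_R: 5, 3, 1
```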

SLIDE 31

Some Useful References

General Guidelines

DeVellis, R. F. (2017). Scale development: Theory and applications. Thousand Oaks, CA: Sage. [useful book]

Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1, 104-121. [useful tips]

Shortening Measures

Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for reducing the length of self-report scales. Personnel Psychology, 55, 167-193. [tips]

Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The Mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment, 18, 192-203. [example process of scale development]

SLIDE 32

Thank you!

Fred Oswald Rice University 6100 Main St – MS25 Houston, TX 77005 foswald@rice.edu workforce.rice.edu

[Image credit: animalmascots.com/01-00887/German-Shepard-Mascot-Costume]