Fast Item Response Theory (IRT) Analysis by Using GPUs, Lei Chen (PowerPoint presentation)



SLIDE 1

Fast Item Response Theory (IRT) Analysis by using GPUs

Lei Chen lei.chen@liulishuo.com Liulishuo Silicon Valley AI Lab

SLIDE 2

Outline

  • A brief introduction of Item Response Theory (IRT)
  • Edward, a new probabilistic programming (PP) toolkit
  • An experiment using Edward to do IRT model estimation on both CPU and GPU computing platforms
  • Summary

SLIDE 3

A concise introduction to adaptive learning

  • What's up with adaptive learning

SLIDE 4

Adaptive learning is hot in the edtech market

  • Increasing demand
  • Districts’ spending on adaptive learning products has grown threefold between 2013 and 2016, according to a new analysis. EdWeek Market Brief, 7/14/2017
  • Increasing suppliers

SLIDE 5

Precisely knowing students’ ability levels is important

  • Adaptive learning needs correct inputs about students’ ability levels, which are latent
  • Assessments are developed for inferring latent abilities
  • For a Yes/No question, the probability that a student provides a correct answer, p(X=1), depends on
  • his/her latent ability (theta)
  • also other related factors, e.g., the item’s difficulty, making a lucky guess, carelessness …

SLIDE 6

Item Response Theory (IRT)

  • IRT provides a principled statistical method to quantify these factors and has been widely used to build up the modern assessment industry
  • A widely used model is the 2-parameter logistic model (2-PL)
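The 2-PL formula on the slide did not survive extraction; the standard form, with discrimination a_j, difficulty b_j, and ability theta_i, is:

```latex
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}
```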

SLIDE 7

IRT with fewer or more parameters

  • 1-PL
  • Only has b; assumes all items share the same a
  • 3-PL
  • c for random guessing
  • 4-PL
  • d for inattention
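The bullets above all fit into the general 4-PL response function; setting d_j = 1 recovers the 3-PL, additionally setting c_j = 0 recovers the 2-PL, and fixing a_j to a shared constant gives the 1-PL:

```latex
P(X_{ij} = 1 \mid \theta_i) = c_j + \frac{d_j - c_j}{1 + e^{-a_j(\theta_i - b_j)}}
```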

SLIDE 8

IRT’s wide usages

  • More precise description of item performance
  • More precise scoring
  • More powerful test assembly
  • Supporting advanced linking & equating to make standardized tests possible
  • Supporting adaptive testing by placing examinees and items on the same scale

SLIDE 14

Concrete examples

  • “Item response theory and computerized adaptive testing”, a presentation made for a hands-on workshop by Rust, Cek, Sun, and Kosinski from the University of Cambridge Psychometrics Centre
  • Very nice animations to explain IRT, how to use IRT to score, and CAT

SLIDE 15

Item Response Function

Binary items

[Figure: item characteristic curve, probability of getting the item right vs. the measured concept (theta), annotated with difficulty, discrimination (slope), guessing, and inattention]

Parameters:

  • Difficulty
  • Discrimination (slope)
  • Guessing
  • Inattention

Models:

  • 1 Parameter
  • 2 Parameter
  • 3 Parameter
  • 4 Parameter
  • Unfolding

SLIDE 21

Scoring

[Figure: posterior over theta, probability 0.0–1.0 on the y-axis vs. theta from -3.0 to 3.0 on the x-axis, updated after each response, with the most likely score marked]

Test:

  • 1. Normal distribution (prior)
  • 2. q1 – Correct
  • 3. q2 – Correct
  • 4. q3 – Incorrect

SLIDE 31

Computer Adaptive Testing

  • Standard tests
  • Contain a fixed number of questions
  • Some items are too easy and some too difficult for a specific test-taker
  • CAT
  • Items can be tailored to each test-taker
  • Saves time/money
  • Measures a test-taker’s ability more accurately

SLIDE 32

Example of CAT

[Figure: item characteristic curves, probability 0.0–1.0 vs. theta from -3.0 to 3.0, showing the most likely score and the next item’s difficulty]

Start the test:

  • 1. Ask first question, e.g. of medium difficulty
  • 2. Correct!
  • 3. Score it
  • 4. Select next item with a difficulty around the most likely score (or with the max information)
  • 5. And so on… until the stopping rule is reached

SLIDE 39

IRT model estimation

  • Mostly used: Marginal Maximum Likelihood Estimation (MMLE)
  • Find the marginal distribution of the item parameters by integrating over theta
  • Estimate item parameters by MLE
  • Obtain theta by MLE based on the estimated item parameters
  • For more efficient estimation, use EM
  • Other ways
  • Joint Maximum Likelihood (JML)
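The marginal likelihood that MMLE maximizes integrates the latent ability out of the joint likelihood; here phi is the assumed ability density (typically standard normal), N the number of examinees, and J the number of items:

```latex
L(a, b) = \prod_{i=1}^{N} \int \left[ \prod_{j=1}^{J} P(x_{ij} \mid \theta, a_j, b_j) \right] \phi(\theta)\, d\theta
```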

SLIDE 40

Bayesian solution

  • Issues with MLE
  • Depends on the distribution of the data
  • Estimation is not accurate when samples are small
  • Hard to handle when the ability distribution is not normal
  • Bayesian solutions incorporate priors on theta

SLIDE 41

MCMC

  • Markov chain Monte Carlo (MCMC) is used for Bayesian estimation
  • The ultimate goal is to approximate p(parameters|data) by sampling many data points from the posterior probability
  • Hamiltonian Monte Carlo (HMC) is good at dealing with high-dimensional parameter spaces. HMC utilizes the geometry of the important regions of the posterior to make better proposals.
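The target posterior is given by Bayes’ rule; the normalizing constant p(data) is intractable, which is why MCMC approximates the posterior by sampling rather than computing it directly:

```latex
p(\text{parameters} \mid \text{data})
= \frac{p(\text{data} \mid \text{parameters})\, p(\text{parameters})}{p(\text{data})}
\propto p(\text{data} \mid \text{parameters})\, p(\text{parameters})
```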

SLIDE 42

Variational Inference

  • Approximate an intractable distribution by using a family of distributions and finding the member of this family that minimizes divergence to the true posterior
  • Approximating the posterior with a simpler function leads to faster estimation
  • Kullback–Leibler (K-L) divergence is frequently used to measure the closeness of two distributions
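In symbols: pick the member q* of the family Q closest in K-L divergence to the posterior; in practice this is done by maximizing the evidence lower bound (ELBO), since the K-L term itself contains the intractable posterior:

```latex
q^{*} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid X)\big),
\qquad
\mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(X, \theta)\big] - \mathbb{E}_{q}\big[\log q(\theta)\big]
```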

SLIDE 43

Previous efforts of using GPUs for fast estimation

  • Sheng, Y., Welling, W. S., & Zhu, M. M. (2014). A GPU-based Gibbs sampler for a unidimensional IRT model. International Scholarly Research Notices, 2014.
  • C programming using CUDA
  • Challenges
  • C/CUDA is not familiar to many data scientists
  • Low-level implementation

SLIDE 44

Edward

  • A library for probabilistic modeling, inference, and criticism
  • Developed by Dustin Tran and others at Columbia University
  • Named in honor of the innovative statistician George Edward Pelham Box
  • Created in 2016 but has attracted many users for doing probabilistic programming

SLIDE 45

Attractions of Edward

  • Rich optimization/inference methods
  • Very convenient for users who are interested in trying PP but don’t want to be swamped by math and statistics details
  • Developed as a higher-level abstraction on TensorFlow
  • GPU execution is enabled automatically via TensorFlow

SLIDE 46

Edward uses Box’s loop

  • a) Build a model (forward direction)
  • b) Use observed data to infer the posterior (backward direction)
  • c) Criticize the model and revise (=> a)

SLIDE 47

A concrete example

  • An example from Torsten Scholak and Diego Maniloff, “Intro to Bayesian Machine Learning with PyMC3 and Edward”, PyCon 2017
  • From a coin toss sequence [H,T,H,T,H,H,T,H,…], estimate prob(H)
  • Model
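The Edward model code from the talk is not reproduced in this transcript. As a stand-in, the same Beta-Bernoulli coin model has a closed-form conjugate posterior, sketched here in plain NumPy (the toss sequence and prior are hypothetical):

```python
import numpy as np

# Coin toss data: 1 = heads, 0 = tails (hypothetical sequence)
tosses = np.array([1, 0, 1, 0, 1, 1, 0, 1])

# Beta(a0, b0) prior on prob(H); Beta(1, 1) is the uniform prior
a0, b0 = 1.0, 1.0

# Beta-Bernoulli conjugacy: posterior is Beta(a0 + #heads, b0 + #tails)
a_post = a0 + tosses.sum()
b_post = b0 + (len(tosses) - tosses.sum())

# Posterior mean of prob(H)
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)  # (1 + 5) / (1 + 5 + 1 + 3) = 0.6
```

In Edward the same model would instead be declared with random variables and handed to an inference engine, but the posterior it approximates is this one.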

SLIDE 48

Using Edward to infer

  • Inference

SLIDE 49

Experiment

  • Generate simulated test data
  • Binary answers from 2,000 students working on 250 test questions; need to jointly estimate 2,000 + 250 ability and item parameters for a 1-PL model
  • Generated true ability (theta) and item difficulty (threshold) from two normal distributions
  • Based on each examinee’s ability and item difficulty, generate the answer vector
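The generation procedure above can be sketched as follows; the matrix sizes come from the slide, while the standard-normal parameters and seed are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 2000, 250

# True latent abilities and item difficulties, each drawn from a normal
theta = rng.normal(0.0, 1.0, size=n_students)        # ability
threshold = rng.normal(0.0, 1.0, size=n_items)       # difficulty

# 1-PL: P(correct) = sigmoid(theta_i - threshold_j)
logits = theta[:, None] - threshold[None, :]
p_correct = 1.0 / (1.0 + np.exp(-logits))

# Binary answer matrix, shape (2000, 250)
answers = (rng.random((n_students, n_items)) < p_correct).astype(int)
print(answers.shape)  # (2000, 250)
```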

SLIDE 50

Experiment : model

  • Specify a generative model
  • Treat the priors of both trait and threshold as normal
  • The logit transformation is logit = ln(s / (1 - s)); inverting, s = exp(logit) / (1 + exp(logit)), which gives the 1-PL IRT model when logit = trait - threshold

SLIDE 51

Experiment: inference

  • Inference using HMC
  • Posteriors are obtained from the samples generated by running HMC
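The slide’s Edward code (ed.HMC) is not in this transcript. Purely to illustrate what posterior sampling buys, here is a toy random-walk Metropolis sampler (simpler than HMC, no gradients) for a single student’s ability under the 1-PL model; the item difficulties, answers, and step size are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Known item difficulties and one student's observed answers (toy data)
b = np.array([-1.0, 0.0, 1.0])
x = np.array([1, 1, 0])

def log_post(theta):
    # log p(x | theta) + log N(theta; 0, 1), up to an additive constant
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) - 0.5 * theta ** 2

# Random-walk Metropolis: propose theta' = theta + noise,
# accept with probability min(1, post(theta') / post(theta))
samples, theta = [], 0.0
for _ in range(5000):
    prop = theta + rng.normal(0.0, 0.5)
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

posterior_mean = np.mean(samples[1000:])  # discard burn-in
```

HMC does the same accept/reject dance but uses the gradient of log_post to make long, informed proposals, which is what keeps it efficient in the 2,250-dimensional space of the experiment.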

SLIDE 52

Experiment: inference

  • Inference using Variational Inference
  • For both trait and threshold, use a normal distribution family with variational loc and scale parameters
  • ed.KLqp does the backward inference to determine loc and scale

SLIDE 53

Experiment: setups

  • Metrics
  • Speed: running time in seconds
  • Estimation accuracy: MSE between the true parameters and the estimated parameters
  • Hardware
  • A gaming PC running Ubuntu Linux
  • CPU: Intel Xeon E5-1660 v3
  • GPU: NVIDIA Titan X

SLIDE 54

Experiment: result

  • For the two inference methods, GPU running is about 4X faster than CPU running
  • Compared to MCMC, VB is much more time efficient
  • On our simulated data, VB shows more accurate parameter estimation

  Inference/Platform   Running time (sec)   MSE
  HMC/CPU              893
  HMC/GPU              222                  0.900
  VB/CPU               116
  VB/GPU               29                   0.023

SLIDE 55

Summary

  • IRT models act as a cornerstone for many educational applications
  • Estimating a large number of model parameters (students’ ability levels and item parameters) can be time consuming
  • Edward, a probabilistic programming toolkit, provides a convenient way to do IRT parameter estimation using Bayesian methods, and it enables fast GPU computation

SLIDE 56

Useful resources

  • https://www.psychometrics.cam.ac.uk/uploads/documents/Concerto/irtandcat.pdf
  • http://tscholak.github.io/assets/PyConEdward/#/
  • Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian Prior Choice in IRT Estimation Using MCMC and Variational Bayes. Frontiers in Psychology, 7, 1422. http://doi.org/10.3389/fpsyg.2016.01422
  • http://edwardlib.org/
