“Why is ‘Chicago’ deceptive?”: Towards Building Model-Driven Tutorials for Humans
Vivian Lai, Han Liu, and Chenhao Tan
@vivwylai | @HanLiuAI | @ChenhaoTan
University of Colorado Boulder
machineintheloop.com
AI used in societally critical tasks
- Amazon secret AI hiring tool
- Recidivism prediction
- Autonomous driving
- Medical diagnosis
(Geiger et al. 2012; European Parliament 2016; Kleinberg et al. 2017; Dastin 2018)
Explanations!
Explaining AI is tricky
Why is explaining AI tricky?
Two distinct learning modes
- Emulating
- Discovering
AI can discover inconspicuous and counterintuitive patterns.
So, how can explaining AI be less tricky?
Model-driven tutorials
- Elucidate counterintuitive patterns
- Enhance humans' ability to understand patterns
Model-driven tutorials: Guidelines
- State-of-the-art science communication
Model-driven tutorials: Examples
How do we choose examples?
- SP-LIME
- Spaced repetition
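The slides name SP-LIME as one way to choose tutorial examples but do not show how it works. Below is a minimal sketch of the greedy submodular pick step as described by Ribeiro et al. (2016), assuming an explanation matrix `W` whose entry `W[i, j]` is the explanation weight of feature `j` for example `i`. The function name `sp_lime_pick`, the `budget` parameter, and the toy matrix are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sp_lime_pick(W, budget):
    """Greedily pick examples whose explanations cover important features.

    W[i, j]: hypothetical explanation weight of feature j for example i.
    budget:  maximum number of examples to pick.
    Returns the indices of the picked examples, in pick order.
    """
    W = np.abs(np.asarray(W, dtype=float))
    # Global importance of a feature: sqrt of its total absolute weight.
    importance = np.sqrt(W.sum(axis=0))
    covered = np.zeros(W.shape[1], dtype=bool)
    chosen = []
    for _ in range(min(budget, W.shape[0])):
        best_i, best_gain = None, -1.0
        for i in range(W.shape[0]):
            if i in chosen:
                continue
            # Marginal gain: importance of features this example newly covers.
            gain = importance[(W[i] > 0) & ~covered].sum()
            if gain > best_gain:
                best_i, best_gain = i, gain
        chosen.append(best_i)
        covered |= W[best_i] > 0
    return chosen

# Toy explanation matrix (4 examples x 4 features), made up for illustration.
toy_W = [
    [2, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
picked = sp_lime_pick(toy_W, budget=2)
```

With this toy matrix, two examples are enough to cover all four features, which is the kind of non-redundant coverage the pick step aims for.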
Experimental Design & Research Questions
Two phases: training, then prediction.
- RQ1: Effect of different tutorials (training phase). Different tutorials in training; no assistance in prediction.
- RQ2: Effect of real-time assistance (prediction phase). Same tutorial (spaced repetition) in training; different real-time assistance in prediction.
- RQ3: Effect of model complexity. Linear model vs. deep model.
Performed qualitative study to improve interface design.
Can model-driven tutorials improve human performance without any real-time assistance in the prediction phase?
Research question 1
Tutorials are useful to some extent
Number of stars indicates p-value: ***: p < 0.001, **: p < 0.01, *: p < 0.05
“The tutorial is helpful but it’s just hard not being able to reference it.”
If not, how do varying levels of real-time assistance in the prediction phase affect human performance after training?
Research question 2
Prediction: various levels of real-time assistance
Information from AI increases from left to right, from full human agency to full automation:
- Unsigned explanations
- Signed explanations
- Signed explanations + predicted label
- Signed explanations + predicted label + guidelines
- Signed explanations + predicted label + guidelines + accuracy statement
[Figure: unsigned explanations vs. signed explanations]
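The difference between unsigned and signed explanations can be illustrated with a small sketch. It assumes the explanation comes from per-word coefficients of a linear model, and that a positive coefficient counts as evidence for "deceptive"; the function `highlight`, the sign convention, and the example weights are all hypothetical and not taken from the talk.

```python
def highlight(tokens, weights, top_k=3, signed=True):
    """Tag the top_k words by absolute weight.

    tokens:  list of words in a text.
    weights: dict mapping lowercase word -> hypothetical linear coefficient
             (assumed: positive = evidence for 'deceptive').
    Returns (token, tag) pairs. tag is None for unmarked words,
    'highlight' for unsigned explanations, '+' or '-' for signed ones.
    """
    known = {t.lower() for t in tokens if t.lower() in weights}
    ranked = sorted(known, key=lambda w: abs(weights[w]), reverse=True)
    top = set(ranked[:top_k])
    tagged = []
    for t in tokens:
        w = t.lower()
        if w not in top:
            tagged.append((t, None))
        elif signed:
            # Signed: also reveal which label the word pushes toward.
            tagged.append((t, "+" if weights[w] > 0 else "-"))
        else:
            # Unsigned: reveal only that the word matters.
            tagged.append((t, "highlight"))
    return tagged

# Made-up weights for illustration only.
example_weights = {"chicago": 0.9, "loved": -0.5, "hotel": 0.1}
example_tokens = "We loved the Chicago hotel".split()
unsigned_view = highlight(example_tokens, example_weights, top_k=2, signed=False)
signed_view = highlight(example_tokens, example_weights, top_k=2, signed=True)
```

Both views mark the same words; the signed view adds the direction of each word's contribution, which is the extra information this assistance level exposes.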
Real-time assistance improves performance
Signed highlights are sufficient
(plot annotation: p > 0.05)
Gap between human+AI & AI
(Poursabzi-Sangdeh et al. 2018; Green & Chen 2019; Lage et al. 2019; Lai & Tan 2019; Carton et al. 2020; Lai et al. 2020)
Can our results generalize to other models? How do model complexity and explanation methods affect human performance with/without training?
Research question 3
Simple model vs. deep model
- SVM explanations vs. BERT attention explanations
- SVM explanations vs. BERT LIME explanations
Simple model = better human performance
(plot annotations: p=0.001***, p=0.001***)
(Lai et al. 2019)
[Bar chart: accuracy (%) for BERT-LIME, BERT-ATT, and SVM, with training and with no training; plotted values: 59.2%, 54.1%, 64.1%, 64.9%, 58.2%, 72.8%]
Training leads to better performance
(plot annotations: p=0.001***, p=0.001***, p=0.001***)
Takeaway
- Tutorials somewhat improve
- Explanations from simple
- Future directions for human-