Why is Chicago deceptive?: Towards Building Model-Driven Tutorials (PowerPoint presentation)



SLIDE 1

“Why is ‘Chicago’ deceptive?”: Towards Building Model-Driven Tutorials for Humans

Vivian Lai, Han Liu, and Chenhao Tan (@vivwylai | @HanLiuAI | @ChenhaoTan)
University of Colorado Boulder, machineintheloop.com

SLIDE 2

AI used in societally critical tasks

  • Hiring (Amazon's secret AI hiring tool)
  • Recidivism prediction
  • Autonomous driving
  • Medical diagnosis

Geiger et al. 2012; European Parliament 2016; Kleinberg et al. 2017; Dastin 2018

SLIDE 3

SLIDE 4

Explanations!

SLIDE 5

Explaining AI is tricky

SLIDE 6

Why is explaining AI tricky?

Two distinct learning modes
  • Emulating
  • Discovering

SLIDE 7

Why is explaining AI tricky?

Two distinct learning modes
  • Emulating

SLIDE 8

Why is explaining AI tricky?

Two distinct learning modes
  • Discovering

SLIDE 9

Why is explaining AI tricky?

Two distinct learning modes
  • Discovering

AI can discover inconspicuous and counterintuitive patterns.

SLIDE 10

So, how can explaining AI be less tricky?

Model-driven tutorials
  • Elucidate counterintuitive patterns
  • Enhance humans' ability to understand patterns

SLIDE 11

Model-driven tutorials: Guidelines

State-of-the-art science communication

SLIDE 12

Model-driven tutorials: Examples

How do we choose examples?
  • SP-LIME (Ribeiro et al. 2016)
  • Spaced repetition

SLIDE 13

Model-driven tutorials: Examples

How do we choose examples?
  • SP-LIME (Ribeiro et al. 2016)
  • Spaced repetition
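
SP-LIME (Ribeiro et al. 2016) selects a small, non-redundant set of examples whose explanations together cover the features that matter most globally. The sketch below follows the greedy coverage formulation from that paper: global feature importance is I_j = sqrt(sum_i |W_ij|), and instances are added one at a time to maximize the total importance of the features they cover. It is a minimal illustration under stated assumptions, not the authors' tutorial code; the explanation matrix W (rows are reviews, columns are word features) and its values are hypothetical.

```python
import math


def global_importance(W):
    """I_j = sqrt(sum_i |W_ij|): how much feature j matters across all explanations."""
    return [math.sqrt(sum(abs(row[j]) for row in W)) for j in range(len(W[0]))]


def coverage(picked, W, importance):
    """Total importance of features that appear in at least one picked explanation."""
    covered = {j for i in picked for j, w in enumerate(W[i]) if w != 0}
    return sum(importance[j] for j in covered)


def submodular_pick(W, budget):
    """Greedily add the instance that most increases coverage, up to `budget` picks."""
    importance = global_importance(W)
    picked = []
    for _ in range(min(budget, len(W))):
        candidates = [i for i in range(len(W)) if i not in picked]
        best = max(candidates, key=lambda i: coverage(picked + [i], W, importance))
        picked.append(best)
    return picked


# Hypothetical explanation weights: rows are reviews, columns are word features.
W = [
    [0.9, 0.0, 0.0],  # review 0 relies on feature 0 only
    [0.8, 0.1, 0.0],  # review 1 relies on features 0 and 1
    [0.0, 0.0, 0.7],  # review 2 relies on feature 2 only
]
print(submodular_pick(W, 2))  # [1, 2]
```

With a budget of two, review 0 is skipped because review 1 already covers its only feature; that redundancy check is what makes the picked examples diverse.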

SLIDE 14

Experimental Design & Research Questions

RQ1: Effect of different tutorials

Training phase → Prediction phase

SLIDE 15

Experimental Design & Research Questions

RQ1: Effect of different tutorials

Training: different tutorials
Prediction: no assistance

SLIDE 16

Experimental Design & Research Questions

RQ1: Effect of different tutorials

Training

SLIDE 17

Experimental Design & Research Questions

RQ2: Effect of real-time assistance

Training: same tutorial (spaced repetition)
Prediction: different real-time assistance

SLIDE 18

Experimental Design & Research Questions

RQ1: Effect of different tutorials (training phase)
RQ2: Effect of real-time assistance (prediction phase)

SLIDE 19

Experimental Design & Research Questions

Linear model: RQ1 & RQ2
Deep model: RQ3

SLIDE 20

Experimental Design & Research Questions

RQ1: Effect of different tutorials (training phase)
RQ2: Effect of real-time assistance (prediction phase)
RQ3: Effect of model complexity

SLIDE 21

Experimental Design & Research Questions

RQ1: Effect of different tutorials (training phase)
RQ2: Effect of real-time assistance (prediction phase)
RQ3: Effect of model complexity

We performed a qualitative study to improve the interface design.

SLIDE 22

Research question 1

Can model-driven tutorials improve human performance without any real-time assistance in the prediction phase?

Model-driven tutorials → Human accuracy?

Training phase → Prediction phase

SLIDE 23

Tutorials are useful to some extent

Chart: accuracy (%) by tutorial condition (Spaced repetition + guidelines, Spaced repetition, Guidelines, Control; y-axis from 50 to 80). Data labels as extracted: 59.2%, 57.9%, 60.4%, 54.6%. Significance annotations: p=0.018*, p=0.1.

Number of stars indicates p-value: *** p < 0.001, ** p < 0.01, * p < 0.05.

SLIDE 24

Tutorials are useful to some extent

Chart: same accuracy chart as slide 23 (59.2%, 57.9%, 60.4%, 54.6%; p=0.018*, p=0.1).

“The tutorial is helpful, but it’s just hard not being able to reference it.”

SLIDE 25

Research question 2

If not, how do varying levels of real-time assistance in the prediction phase affect human performance after training?

Full human agency ← ? → Full automation

SLIDE 26

Prediction: various levels of real-time assistance

Full human agency → Unsigned explanations → Signed explanations → Signed explanations + predicted label → Signed explanations + predicted label + guidelines → Signed explanations + predicted label + guidelines + accuracy statement → Full automation

Information from AI increases from left to right.

SLIDE 27

Prediction: various levels of real-time assistance

Full human agency → Unsigned explanations → Signed explanations → Signed explanations + predicted label → Signed explanations + predicted label + guidelines → Signed explanations + predicted label + guidelines + accuracy statement → Full automation

SLIDE 28

Unsigned explanations vs. signed explanations
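
For a linear model such as the SVM, one simple way to produce word highlights is to score each word by its coefficient times its count in the text. A signed explanation keeps the sign, so each highlight also tells the reader which label the word supports; an unsigned explanation shows only the magnitude. The sketch below assumes positive weights push a review toward "deceptive". The coefficient values, and every word other than the title's "chicago", are made up for illustration and are not the trained model's weights.

```python
from collections import Counter

# Hypothetical weights for illustration only: positive -> "deceptive", negative -> "genuine".
COEF = {"chicago": 1.2, "my": 0.6, "husband": 0.5, "floor": -0.9, "bathroom": -0.7}


def contributions(text, coef):
    """Per-word contribution to the linear score: coefficient times word count."""
    counts = Counter(text.lower().split())
    return {w: coef[w] * n for w, n in counts.items() if w in coef}


def highlight(text, coef, k=3, signed=True):
    """Return the k words with the largest absolute contribution."""
    contrib = contributions(text, coef)
    top = sorted(contrib, key=lambda w: abs(contrib[w]), reverse=True)[:k]
    if signed:
        # Signed: each highlight also says which label the word supports.
        return [(w, "deceptive" if contrib[w] > 0 else "genuine") for w in top]
    # Unsigned: only the strength of each highlight is shown.
    return [(w, abs(contrib[w])) for w in top]


review = "my husband and i loved chicago but the bathroom was small"
print(highlight(review, COEF, signed=True))
# [('chicago', 'deceptive'), ('bathroom', 'genuine'), ('my', 'deceptive')]
print(highlight(review, COEF, signed=False))
# [('chicago', 1.2), ('bathroom', 0.7), ('my', 0.6)]
```

Both variants highlight the same words; only the signed version lets a reader see that "bathroom" argues against the "deceptive" label while "chicago" argues for it.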

SLIDE 29

Real-time assistance improves performance

Chart: accuracy (%) by level of assistance (Machine; Signed + predicted label + guidelines + accuracy; Signed; Unsigned; No assistance; y-axis from 50 to 90). Data labels as extracted: 86, 74%, 70.7%, 57.8%, 60.4%. Significance annotations: p=0.001***, p=0.001***.

SLIDE 30

Signed highlights are sufficient

Chart: same accuracy chart as slide 29, annotated p>0.05.

SLIDE 31

Gap between human+AI and AI

Chart: same accuracy chart as slide 29.

Poursabzi-Sangdeh et al. 2018; Green & Chen 2019; Lage et al. 2019; Lai & Tan 2019; Carton et al. 2020; Lai et al. 2020

SLIDE 32

Research question 3

Can our results generalize to other models? How do model complexity and explanation methods affect human performance with and without training?

Simple model vs. deep model

SLIDE 33

SVM explanations vs. BERT attention explanations

SLIDE 34

SVM explanations vs. BERT LIME explanations

SLIDE 35

Simple model = better human performance

Chart: accuracy (%) by model and explanation method (BERT-LIME, BERT-ATT, SVM), with and without training (y-axis from 50 to 80). Data labels as extracted: 59.2%, 54.1%, 64.1%, 64.9%, 58.2%, 72.8%. Significance annotations: p=0.001***, p=0.001***.

Lai et al. 2019

SLIDE 36

Simple model = better human performance

Chart: same accuracy chart as slide 35 (p=0.001***, p=0.001***).

SLIDE 37

Training leads to better performance

Chart: same accuracy chart as slide 35. Significance annotations: p=0.001***, p=0.001***, p=0.001***.

SLIDE 38

Takeaway
  • Tutorials somewhat improve human performance
  • Explanations from simple models are preferred
  • Future directions for human-centered tutorials and explanations

Vivian Lai, Han Liu, Chenhao Tan (@vivwylai | vivwylai@gmail.com, @HanLiuAI | @ChenhaoTan)
University of Colorado Boulder
Website: machineintheloop.com
Paper: https://tinyurl.com/model-driven-tutorials
Workshop: https://tinyurl.com/harness-explanations