How Do Visual Explanations Foster End Users' Appropriate Trust In Machine Learning? - PowerPoint PPT Presentation



SLIDE 1

Fumeng Yang1, Zhuanyi (Yi) Huang2, Jean Scholtz3, and Dustin L. Arendt2

How Do Visual Explanations Foster End Users’ Appropriate Trust In Machine Learning?

1 Fumeng Yang is with Brown University. She conducted this research as a Ph.D. Intern at Pacific Northwest National Laboratory. 2 Zhuanyi Huang and Dustin L. Arendt are with Pacific Northwest National Laboratory. 3 Jean Scholtz retired from Pacific Northwest National Laboratory in September 2018.

Honorable Mention at 25th International Conference on Intelligent User Interfaces (IUI '20)

Brown Visual Computing Seminar | May 11, 2020

SLIDE 2
  • Visual explanations improve end users’ trust in an automated system.
  • Such trust must be appropriate.
  • The design of visual explanations affects users’ appropriate trust.

Highlights

SLIDE 3

"Human-computer trust is defined in this study to be the extent to which a user is confident in, and willing to act on the basis of, the recommendations, actions, and decisions of an artificially intelligent decision aid." (Madsen and Gregor)

Madsen, M., & Gregor, S. (2000). Measuring human-computer trust. In Proceedings of the 11th Australasian Conference on Information Systems (Vol. 53, pp. 6-8).
SLIDE 4

Appropriate Trust is the alignment between the perceived and actual performance of the system.

McBride, M., & Morgan, S. (2010). Trust calibration for automated decision aids. Institute for Homeland Security Solutions. Available: https://www.ihssnc.org/portals/0/Documents/VIMSDocuments/McBride_Research_Brief.pdf
McGuirl, J. M., & Sarter, N. B. (2006). Supporting trust calibration and the effective use of decision aids by presenting dynamic system confidence information. Human Factors, 48(4), 656-665.
Marsh, S., & Dibben, M. R. (2005). Trust, untrust, distrust and mistrust – an exploration of the dark(er) side. In International Conference on Trust Management (pp. 17-33). Springer.
de Visser, E. J., Cohen, M., Freedy, A., & Parasuraman, R. (2014). A design methodology for trust cue calibration in cognitive agents. In International Conference on Virtual, Augmented and Mixed Reality (pp. 251-262). Springer.
SLIDE 5

Appropriate Trust

The quadrants of user decision (follow / not follow) crossed with system recommendation (correct / incorrect):

  • Correct recommendation + follow → appropriate trust
  • Correct recommendation + not follow → undertrust
  • Incorrect recommendation + follow → overtrust
  • Incorrect recommendation + not follow → appropriate trust

Marsh, S., & Dibben, M. R. (2005). Trust, untrust, distrust and mistrust – an exploration of the dark(er) side. In International Conference on Trust Management (pp. 17-33). Springer.
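The follow/not-follow vs. correct/incorrect quadrants above can be sketched as a tiny helper that labels a single trial. The function name is hypothetical; this is a sketch of the definitions, not the study's analysis code:

```python
def trust_outcome(recommendation_correct: bool, followed: bool) -> str:
    """Label one trial by crossing the system recommendation
    (correct / incorrect) with the user decision (follow / not follow)."""
    if recommendation_correct:
        # Following a correct recommendation is appropriate trust;
        # rejecting it is undertrust.
        return "appropriate trust" if followed else "undertrust"
    # Following an incorrect recommendation is overtrust;
    # rejecting it is appropriate trust.
    return "overtrust" if followed else "appropriate trust"
```

For example, `trust_outcome(True, False)` labels a rejected correct recommendation as undertrust.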

SLIDE 6

Example: my trust in an iRobot

My confidence that it can clean the floor, and my willingness to let it do the work;

  • Overtrust is when I think it will avoid hitting the wall, but it does not;
  • Undertrust is when I think it will hit the wall, but it makes a turn.

SLIDE 7

Goals

  • The relationship between users' trust in a system and visual explanations;
  • The effects of different visualization designs on users' trust in machine learning;
  • An understanding of users' appropriate trust for proper usage of an automated system.
SLIDE 8

Experiment

  • Materials: example-based explanations
  • Experimental variables: instance representation, spatial layout
  • Measures: appropriate trust metrics, usability, individual differences
  • Task: act as assistant botanists and classify leaves, aided by classifiers with or without visual explanations
SLIDE 9

Example-based Explanation

  • k-nearest neighbors graph
  • Internal representation of the training set
  • Minkowski distance
  • A shortest path tree rooted at the input node
  • Prune until only leaves may have a different class from the input node

[Figure: a map metaphor with cities in Tennessee, Kentucky, and Arkansas illustrating "escape routes": the shortest paths to travel to another state (class).]
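The construction above can be sketched in plain Python: build a symmetric k-nearest-neighbors graph under a Minkowski metric, run Dijkstra from the input node, and keep only shortest paths whose interior stays in the input's class, so only leaves carry a different class. This is a loose sketch with hypothetical data and function names, not the authors' implementation:

```python
import heapq

def minkowski(a, b, p=2):
    """Minkowski distance between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_graph(points, k, p=2):
    """Symmetric k-nearest-neighbors graph as an adjacency dict."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        nearest = sorted((minkowski(points[i], points[j], p), j)
                         for j in range(len(points)) if j != i)[:k]
        for _, j in nearest:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize
    return adj

def escape_routes(points, labels, adj, query):
    """Shortest paths (Dijkstra) from the query node to nodes of a
    different class, pruned so only the leaf of each route differs."""
    dist, prev, pq = {query: 0.0}, {}, [(0.0, query)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v in adj[u]:
            nd = d + minkowski(points[u], points[v])
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    routes = []
    for v in sorted(dist, key=dist.get):  # nearest escapes first
        if labels[v] == labels[query]:
            continue
        path = [v]
        while path[-1] != query:
            path.append(prev[path[-1]])
        path.reverse()
        # prune: interior nodes must share the query's class
        if all(labels[n] == labels[query] for n in path[:-1]):
            routes.append(path)
    return routes
```

With four collinear points labeled a, a, b, b and k = 1, the only escape route from node 0 is 0 → 1 → 2: the path to node 3 is discarded because it passes through the differently-labeled node 2.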

SLIDE 10

Instance Representation

To represent each instance in a dataset:

  • Images
  • Rose charts ("Roses") for the feature vector

SLIDE 11

Spatial Layout

To arrange instances and illustrate the relationship between them

  • Grid: sort instances within a column by their weighted geodesic distance to the input node
  • Tree: use a layered graph layout of the pruned shortest path tree
  • Graph: use a force-directed layout algorithm to arrange instances based on their connections

SLIDE 12

Examples

[Example explanations in the grid, tree, and graph layouts]

SLIDE 13

Examples

[Example explanations in the grid, tree, and graph layouts]

SLIDE 14

Interface & Task

SLIDE 15

Measuring Trust in the Classifier

“Participants’ willingness to follow the recommendation and their self-confidence in the decision.”

  • Will you follow this recommendation?
  • How do you feel about your decision above?
  • Was the explanation helpful in making the decision above?
  • A linear "Trust Meter" ranging from −100 to +100
SLIDE 16

Experimental Design

A series of trials

27 trials for each condition: 20 correct and 7 incorrect recommendations (74%, vs. the classifier's 71% accuracy); a fixed sequence by MC with randomized instances

A complete within-subjects design

Each participant finished two instance representations on two different days; three layouts and a control condition (no explanation), e.g., tree + roses, none + images

33 participants from PNNL

19 female, 14 male; 16 data scientists, 17 others

SLIDE 17

Data Collected

Trust Measures

  • Appropriate trust: correct decision rate
  • Overtrust: following an incorrect recommendation
  • Undertrust: not following a correct recommendation
  • Self-confidence
  • Perceived helpfulness
  • Trust meter

8,184 / 7,128 trials = (3+1) layout conditions × 2 representations × 27 trials × 33 participants
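The three trust rates can be computed from per-trial records of whether the recommendation was correct and whether the participant followed it. A minimal sketch with a hypothetical function name, not the study's analysis code:

```python
def trust_rates(trials):
    """Aggregate (recommendation_correct, followed) pairs into rates.

    Appropriate trust: correct decisions (follow a correct recommendation
    or reject an incorrect one). Overtrust: follow an incorrect
    recommendation. Undertrust: reject a correct recommendation.
    """
    n = len(trials)
    appropriate = sum(rc == f for rc, f in trials) / n
    overtrust = sum((not rc) and f for rc, f in trials) / n
    undertrust = sum(rc and (not f) for rc, f in trials) / n
    return appropriate, overtrust, undertrust

# Planned trial count from the slide:
# (3 + 1) layout conditions x 2 representations x 27 trials x 33 participants
assert (3 + 1) * 2 * 27 * 33 == 7128
```

By construction the three rates of a session sum to 1, since every trial falls into exactly one quadrant of the trust matrix.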

SLIDE 18

Analyses and Results

Methods: bootstrapped 95% CIs, effect sizes, mixed-effects models for individual differences; aggregated within each participant and subtracted within participants

Interpretation: summarizing all confidence intervals

Research Questions: five research questions (four for this talk)
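The bootstrapped 95% CIs can be illustrated with a simple percentile bootstrap over per-participant means. This is a stdlib-only sketch under assumed data, not the authors' analysis code:

```python
import random

def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs),
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `values`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    stats = sorted(
        stat([rng.choice(values) for _ in values])  # resample with replacement
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

Per-participant appropriate-trust rates would go in as `values`; for the differences-by-subtraction plots, the within-participant differences would be bootstrapped instead.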

SLIDE 19

[Chart: differences by subtraction (grid/tree/graph − none, and each layout − none), with means and 95% CIs, for (a) appropriate trust, (b) overtrust, (c) undertrust, and (d) self-confidence, shown separately for images and roses. Arrows indicate the "better" direction.]

RQ1 Do our visual explanations foster more appropriate trust?

All our visual explanations largely increase appropriate trust, decrease overtrust and undertrust, and improve self-confidence.
SLIDE 20

[Chart: pairwise differences by subtraction (graph − grid, tree − graph, grid − tree), with means and 95% CIs, for (a) appropriate trust, (b) overtrust, (c) undertrust, (d) self-confidence, and (e) helpfulness, shown separately for images and roses. Arrows indicate the "better" direction.]

RQ2 How did the three spatial layouts (grid, tree, and graph) affect users’ trust?

Images: grid explanations are slightly more helpful than tree explanations, which are slightly more helpful than graph explanations.


Roses: tree and graph explanations, especially tree, lead to more appropriate trust than grid explanations.

SLIDE 21

[Chart: differences by subtraction (images − roses) per condition (none, grid, tree, graph, and grid/tree/graph combined), with means and 95% CIs, for (a) appropriate trust, (b) overtrust, (c) undertrust, (d) self-confidence, and (e) helpfulness. Arrows indicate the "better" direction.]

RQ3 How did the two instance representations (images and roses) affect users’ trust?

Image-based explanations outperform rose-based explanations on all the dimensions.

SLIDE 22

grid/tree/graph cf. none images cf. roses leaf familiarity non- cf. scitists propensity

−0.3 −0.2 −0.1 0.0 0.1 0.2

(a) Appopriate trust

0.3 0.2 0.1 0.0 −0.1 −0.2

(b) Overtrust

0.3 0.2 0.1 0.0 −0.1 −0.2

(c) Undertrust

−3 −2 −1 1 2

(d) Self−confidence “better” Coefficients of fixed effects

RQ4 How did individual differences (e.g., expert users vs. non-expert users, prior knowledge, and propensity to trust) affect users’ trust?

The strongest effects come from the two experimental variables: images outperform roses; having a visual explanation outperforms no explanation.


The only exception is that non-expert users seem to have more confidence in their decisions.

SLIDE 23

  • Use a grid layout if the representation is easy to understand; use a tree layout if the representation is difficult to read or its usability is unknown.
  • Future research should consider appropriate trust, instead of simply measuring an increase in users' trust. Overtrust and undertrust should be avoided.

Summary & Takeaways

Understanding and trust are relevant but different.

SLIDE 24

Thank You

Fumeng Yang (fy@brown.edu), Zhuanyi (Yi) Huang (zhuanyi.huang@pnnl.gov), Jean Scholtz (jean.scholtz@pnnl.gov), Dustin L. Arendt (dustin.arendt@pnnl.gov)

“HOW DO VISUAL EXPLANATIONS FOSTER END USERS’ APPROPRIATE TRUST IN MACHINE LEARNING?”


http://www.fmyang.com/projs/ml-trust