On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection

Oct 22, 2022

SLIDE 1

On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection

Vivian Lai and Chenhao Tan @vivwylai | @chenhaotan vivlai.github.io | chenhaot.com University of Colorado Boulder deception.machineintheloop.com

SLIDE 2

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Risk assessment: COMPAS

SLIDE 3

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Most previous studies are concerned with the impact of such tools used in full automation

SLIDE 4

In Wisconsin, judges are required to take account of the algorithm's limitations

In the end, though, Justice Bradley allowed sentencing judges to use Compas. They must take account of the algorithm's limitations and the secrecy surrounding it, she wrote, but she said the software could be helpful “in providing the sentencing court with as much information as possible in order to arrive at an individualized sentence.”

https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret-algorithms.html

SLIDE 5

Full automation is not desired

SLIDE 6

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

How do judges make decisions with COMPAS?

SLIDE 7

How do humans make decisions with machine assistance in challenging tasks?

[Diagram: a spectrum from full human agency to full automation]

SLIDE 8

A spectrum between full human agency and full automation

[Diagram: conditions placed between full human agency and full automation]
Showing machine predicted labels
Showing machine predicted labels and explanations
Showing machine predicted labels and suggesting high accuracy
Showing only explanations (by highlighting salient information)

SLIDE 9

Deception Detection as a Case Study

Machine accuracy: 87%; human accuracy: ~50%

SLIDE 10

I would not stay at this hotel again. The rooms had a fowl odor. It seemed as though the carpets have never been cleaned. The neighborhood was also less than desirable. The housekeepers seemed to be snooping around while they were cleaning the rooms. I will say that the front desk staff was friendly albeit slightly dimwitted.

SLIDE 11

I would not stay at this hotel again. The rooms had a fowl odor. It seemed as though the carpets have never been cleaned. The neighborhood was also less than desirable. The housekeepers seemed to be snooping around while they were cleaning the rooms. I will say that the front desk staff was friendly albeit slightly dimwitted.

SLIDE 12

The machine predicts that the review below is deceptive.

I would not stay at this hotel again. The rooms had a fowl odor. It seemed as though the carpets have never been cleaned. The neighborhood was also less than desirable. The housekeepers seemed to be snooping around while they were cleaning the rooms. I will say that the front desk staff was friendly albeit slightly dimwitted.

SLIDE 13

Can explanations alone improve human performance?

SLIDE 14

Explanations alone slightly improve human performance

[Bar chart: accuracy (%)]
Machine: 87%
Heatmap: 57.6%
Highlight: 55.9%
Examples: 54.4%
Control: 51.1%
Significance annotations on the chart: p=0.006, p<0.001, p=0.056
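To make "slightly" concrete, the short sketch below computes how many percentage points each explanation-only condition gains over the control, using only the accuracy figures on this slide. The function and variable names are illustrative and not taken from the talk.

```python
# Accuracy figures (%) as reported on this slide.
control = 51.1
machine = 87.0
explanation_only = {"Heatmap": 57.6, "Highlight": 55.9, "Examples": 54.4}


def gain_over_control(accuracy, baseline=control):
    """Percentage-point improvement over the control condition."""
    return round(accuracy - baseline, 1)


# Gain of each explanation-only condition over unaided humans.
gains = {name: gain_over_control(acc) for name, acc in explanation_only.items()}

# Distance between the best explanation-only condition and the machine itself.
gap_to_machine = round(machine - max(explanation_only.values()), 1)

for name, gain in gains.items():
    print(f"{name}: +{gain} points over control")
print(f"Best explanation-only condition is {gap_to_machine} points below the machine")
```

The gains are all in the single digits, while the distance to the machine remains several times larger.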

SLIDE 15

Predicted labels > explanations

SLIDE 16

Showing explicit accuracy improves human performance drastically

[Bar chart: accuracy (%)]
Machine: 87%
Predicted label with accuracy: 74.6%
Predicted label without accuracy: 61.9%
Heatmap: 57.6%
Control: 51.1%
Significance annotations on the chart: p<0.001, p<0.001, p<0.001

SLIDE 17

Tradeoff between human performance and human agency

Higher agency, lower performance
Lower agency, higher performance

SLIDE 18

Can explanations moderate this tradeoff?

SLIDE 19

Predicted labels + explanations ≈ explicit accuracy

[Bar chart: accuracy (%)]
Machine: 87%
Predicted label with accuracy: 74.6%
Predicted label & heatmap: 72.5%
Predicted label without accuracy: 61.9%
Significance annotations on the chart: p<0.001, p<0.001
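One way to read these numbers is as the share of the gap between unaided humans and the machine that each condition closes. The sketch below is an illustrative calculation, not an analysis from the talk. It takes the control accuracy of 51.1% from the earlier results slides, since this chart does not show it, and the helper name is hypothetical.

```python
# Accuracy figures (%) from the results slides.
control = 51.1   # unaided humans, reported on the earlier charts
machine = 87.0
with_labels = {
    "Predicted label with accuracy": 74.6,
    "Predicted label & heatmap": 72.5,
    "Predicted label without accuracy": 61.9,
}


def share_of_gap_closed(accuracy, low=control, high=machine):
    """Percentage of the control-to-machine gap recovered by a condition."""
    return 100 * (accuracy - low) / (high - low)


closed = {name: round(share_of_gap_closed(acc), 1) for name, acc in with_labels.items()}

for name, pct in closed.items():
    print(f"{name}: closes {pct}% of the gap")
```

Under this reading, adding a heatmap to the predicted label recovers nearly as much of the gap as stating the machine's accuracy, which is the sense of "≈" in the slide title.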

SLIDE 20

How much do humans trust the predictions?

SLIDE 21

Explanations help increase human trust in predictions

[Bar chart: trust (%)]
Predicted label with accuracy: 79.6%
Predicted label & heatmap: 78.7%
Predicted label without accuracy: 64.4%
Significance annotations on the chart: p<0.001, p<0.001

SLIDE 22

Humans are more likely to trust predictions when they are correct

[Bar chart: trust (%) when the machine prediction is correct vs. incorrect]
Predicted label with accuracy: 81.1% (correct), 69.8% (incorrect)
Predicted label & heatmap: 79.4% (correct), 74.1% (incorrect)
Predicted label without accuracy: 65.1% (correct), 60% (incorrect)
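As a consistency check, the overall trust figures from the previous slide should be roughly a weighted average of trust on correct and incorrect predictions. The sketch below rests on two assumptions that the slides do not state: that the higher value in each pair is the trust on correct predictions (as the slide title suggests), and that about 87% of the predictions shown to participants were correct, matching the reported machine accuracy.

```python
# Assumption: share of shown machine predictions that are correct,
# taken to equal the reported machine accuracy.
machine_accuracy = 0.87

# Trust figures (%) from the two trust slides. The correct/incorrect
# assignment follows the slide title and is an assumption.
trust = {
    "Predicted label with accuracy": {"correct": 81.1, "incorrect": 69.8, "overall": 79.6},
    "Predicted label & heatmap": {"correct": 79.4, "incorrect": 74.1, "overall": 78.7},
    "Predicted label without accuracy": {"correct": 65.1, "incorrect": 60.0, "overall": 64.4},
}


def weighted_trust(correct, incorrect, p_correct=machine_accuracy):
    """Overall trust implied by the per-outcome trust rates."""
    return p_correct * correct + (1 - p_correct) * incorrect


# Difference between the implied and the reported overall trust, per condition.
differences = {
    name: weighted_trust(t["correct"], t["incorrect"]) - t["overall"]
    for name, t in trust.items()
}

for name, diff in differences.items():
    print(f"{name}: implied minus reported overall trust = {diff:+.2f} points")
```

Under these assumptions the implied values land within a tenth of a point of the reported overall trust for all three conditions.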

SLIDE 23

Other analysis

Showing varying accuracies
Heterogeneity between participants

50 60 70

SLIDE 24

Showing machine predicted labels: higher agency, lower performance
Showing machine predicted labels and suggesting high accuracy: lower agency, higher performance

Vivian Lai and Chenhao Tan @vivwylai | @chenhaotan vivlai.github.io | chenhaot.com University of Colorado Boulder deception.machineintheloop.com

Takeaway

Explanations alone only slightly improve human performance
Explanations can moderate the tradeoff