Deception Detection in Transcribed Speech and Written Text Rebecca - - PowerPoint PPT Presentation



SLIDE 1

Deception Detection in Transcribed Speech and Written Text

Rebecca Pottenger

SLIDE 2

Background

  • Detecting deception is a difficult problem
  • Human detection accuracy is low
  • For text: on average, correctly classify 47% of lies and 61% of truths [1]
  • For transcribed speech: on average, correctly classify 58.2% [2]
  • Not many automated methods; most existing automated methods have low accuracy
      • Logistic regression; Linguistic Inquiry and Word Count (LIWC) features; 5-fold CV: 59% of lies and 62% of truths [3]
      • Ripper rule induction; acoustic, LIWC, and speaker-dependent features; 10-fold CV: 66.4% [4]
      • Naïve Bayes and SVM; bag-of-words; 10-fold CV: 70% [5]
      • Best method in existing literature: SVM; LIWC and bigram features; 5-fold CV: 89.8% [6]

SLIDE 3

Questions To Explore

  • Is there an underlying distribution to deceptive language? What is it?
  • Is this distribution different depending on whether the person was speaking (i.e., transcribed speech) or writing?
  • Can we improve the accuracy of automated deception detection with better features? What should those features be?

SLIDE 4

Dataset

  • Dataset #1: 400 truthful (TripAdvisor) and 400 gold-standard deceptive (Amazon Mechanical Turk) hotel reviews [6]
  • Dataset #2: Michigan State University cheating game [7]
      • 7 lied about cheating, 9 confessed to cheating, 44 did not cheat
  • Possibly: testimony from convicted perjurers and other cases

SLIDE 5

Experimental Methods

1) Re-create existing best method on dataset #2

  • Use a variety of supervised algorithms in addition to SVM (Naïve Bayes, Artificial Neural Networks, etc.)

2) Distribution building

  • Identify the space of possible features to work with (entire word set, bigram, LIWC, etc.)
  • Build probability distributions from deceptive and truthful data for both datasets

3) Use new sets of features to learn the model on dataset #1 and #2

  • Use variety of features as well as variety of supervised algorithms
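
As a rough illustration of the distribution-building step, the sketch below estimates one word distribution per class and scores a new text by its log-likelihood ratio. It is only one possible reading of the slides, not the presenter's implementation: the function names, the whitespace tokenizer, the add-one smoothing, and the toy sentences are all assumptions made for the example. Without smoothing (`alpha=0`) the estimate is the plain relative frequency, which is the maximum likelihood estimate for a multinomial; smoothing is added here so unseen words do not get zero probability.

```python
import math
from collections import Counter


def ngrams(text, n=1):
    """Lowercase, split on whitespace, and return the list of n-gram tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_distribution(docs, vocab, n=1, alpha=1.0):
    """Estimate P(ngram | class) with add-alpha smoothing over a fixed vocab."""
    counts = Counter(g for doc in docs for g in ngrams(doc, n))
    total = sum(counts[g] for g in vocab) + alpha * len(vocab)
    return {g: (counts[g] + alpha) / total for g in vocab}


def log_likelihood_ratio(text, p_deceptive, p_truthful, n=1):
    """Sum of log P(g|deceptive) - log P(g|truthful); positive leans deceptive."""
    return sum(
        math.log(p_deceptive[g]) - math.log(p_truthful[g])
        for g in ngrams(text, n)
        if g in p_deceptive  # n-grams outside the vocab are skipped
    )


# Toy, made-up sentences (hypothetical; not taken from either dataset).
deceptive_docs = ["my husband and i loved this amazing hotel",
                  "i loved the amazing luxury experience"]
truthful_docs = ["the room was small and the bathroom was clean",
                 "the location was close to the station"]

vocab = {g for doc in deceptive_docs + truthful_docs for g in ngrams(doc)}
p_dec = build_distribution(deceptive_docs, vocab)
p_tru = build_distribution(truthful_docs, vocab)

score = log_likelihood_ratio("i loved the amazing room", p_dec, p_tru)
print("deceptive" if score > 0 else "truthful", round(score, 3))
```

Passing `n=2` to all three functions switches the same sketch to bigram features. The per-class distributions can also be inspected directly, for example by sorting words by the ratio of their two probabilities to see which ones separate the classes most.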
SLIDE 6

Methods of Analysis

  • Maximum Likelihood Estimate to find the best-fitting distribution

  • 10-fold cross validation
  • Accuracy, Precision, Recall, F-score
  • Feature weights
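
The evaluation items above could be wired together roughly as follows. This is a minimal standard-library sketch under stated assumptions: the helper names, the `"deceptive"`/`"truthful"` label strings, and the majority-class placeholder model are all hypothetical, and a real experiment would plug in the SVM, Naïve Bayes, or other learners mentioned in the slides.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle indices 0..n_items-1 and deal them into k near-equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred, positive="deceptive"):
    """Accuracy, precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(items, labels, train_fn, predict_fn, k=10, seed=0):
    """Pool held-out predictions over k folds, then score them once."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(items), k, seed):
        held_out = set(fold)
        train = [(items[i], labels[i]) for i in range(len(items))
                 if i not in held_out]
        model = train_fn(train)
        for i in fold:
            y_true.append(labels[i])
            y_pred.append(predict_fn(model, items[i]))
    return binary_metrics(y_true, y_pred)


# Placeholder learner (assumption): always predict the majority training label.
def train_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_majority(model, item):
    return model


# Toy, made-up data: 14 truthful and 6 deceptive items.
items = ["text %d" % i for i in range(20)]
labels = ["truthful"] * 14 + ["deceptive"] * 6
result = cross_validate(items, labels, train_majority, predict_majority, k=5)
print(result)
```

Predictions are pooled across folds before scoring rather than averaged per fold. With a class as small as the 7 liars in the cheating-game data, some folds may contain no deceptive examples at all, which would leave per-fold precision and recall undefined. The majority-class baseline also shows why accuracy alone can mislead on imbalanced data: it can score well on accuracy while its recall on the deceptive class is zero.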
SLIDE 7

Sources

1) C.F. Bond and B.M. DePaulo. 2006. Accuracy of deception judgments. Personality and Social Psychology Review, 10(3):214.

2) F. Enos, S. Benus, R.L. Cautin, M. Graciarena, J. Hirschberg, and E. Shriberg. Personality Factors in Human Deception Detection: Comparing Human to Machine Performance. In Proceedings of INTERSPEECH-2006, Pittsburgh, Pennsylvania, USA.

3) M.L. Newman, J.W. Pennebaker, D.S. Berry, and J.M. Richards. 2003. Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5):665.

4) J. Hirschberg, S. Benus, J. Brenier, F. Enos, S. Friedman, S. Gilman, C. Girand, M. Graciarena, A. Kathol, L. Michaelis, B. Pellom, E. Shriberg, and A. Stolcke. 2005. Distinguishing deceptive from non-deceptive speech. In Proceedings of INTERSPEECH-2005, Lisbon, Portugal.

5) R. Mihalcea and C. Strapparava. 2009. The lie detector: Explorations in the automatic recognition of deceptive language. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 309–312. Association for Computational Linguistics.

6) M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 309–319. Association for Computational Linguistics.

7) M. Ali and T. Levine. 2008. The Language of Truthful and Deceptive Denials and Confessions. Communication Reports, 21(2):82–91.