Fools Gold: Understanding the Linguistic Features of Deception and - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Fools’ Gold:

Understanding the Linguistic Features of Deception and Humour Through April Fools’ Hoaxes

e.dearden@lancaster.ac.uk

Ed Dearden

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19

Hell Planet

slide-20
SLIDE 20

Why do we care about April Fools’?

slide-21
SLIDE 21

False Information

slide-22
SLIDE 22

But where does April Fools’ day fit into this?

slide-23
SLIDE 23

April Fools’ Day

slide-24
SLIDE 24

What’s the Difference?

slide-25
SLIDE 25

What’s the Difference?

slide-26
SLIDE 26

Deceptive Intent:

Is the author trying to deceive me?

Not Deceive? Deceive?

slide-27
SLIDE 27

Research Questions

slide-28
SLIDE 28

What are the linguistic features of an April Fools’ article compared to regular news?

slide-29
SLIDE 29

How similar are the features of April Fools’ to those of “Fake News”?

slide-30
SLIDE 30

I need some background

slide-31
SLIDE 31

Deception

  • Exaggeration.
  • Vagueness.
  • Details.
slide-32
SLIDE 32

Humour

  • Contextual Imbalance.
  • Emotional Language.
  • Ambiguity.
slide-33
SLIDE 33

Irony

  • Part humour, part deception.
  • Negative Emotional Language.
  • Polarity Contrast.
slide-34
SLIDE 34

How about the data?

slide-35
SLIDE 35

Catching Fools’!

  • 519 April Fools’ articles.
  • 371 websites.
  • 213,776 words.
  • 2004-2018.
slide-36
SLIDE 36

Matching Fools’!

  • 519 regular news articles.
  • 240 websites.
  • 344,927 words.
  • 2004-2018.
slide-37
SLIDE 37

Fake News!

  • Flagged as fake by Buzzfeed.
  • 2016 Election.
  • Horne and Adali, 2017.
slide-38
SLIDE 38

But what are you going to do with it?

slide-39
SLIDE 39

Building a feature set

  • Vagueness
  • Imagination
  • Humour
  • Deception
  • Details
  • Formality
  • Complexity

slide-40
SLIDE 40

Vagueness CLAWS Ambiguity Superlatives Exaggeration Comparative Adverbs Degree Adverbs Vague Degree USAS Ambiguity Wordnet Ambiguity

slide-41
SLIDE 41

Details

  • Time Related
  • Spatial Terms
  • Dates
  • Numbers
  • Proper Nouns
  • Sense Terms
  • Motion Terms

slide-42
SLIDE 42

Imagination Imaginative Conjunctions Articles Adjectives Imaginative Determiners Prepositions Informative Verbs Imaginative Verbs

slide-43
SLIDE 43

Deception

  • Negative Emotional Terms
  • Negations
  • First Person Pronouns
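Two of the deception features above can be approximated with simple word counts. The sketch below is only illustrative: the word lists, the tokenizer, and the function name `deception_features` are assumptions, not the feature extractors used in the talk. Negative emotional terms are left out because they need an emotion lexicon.

```python
import re

# Hypothetical word lists chosen for illustration; the talk does not give its own.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
NEGATIONS = {"no", "not", "never", "none", "nothing", "nobody", "neither", "nor"}


def deception_features(text):
    """Return first-person and negation rates as fractions of all tokens."""
    # Lowercased word tokens; a straight apostrophe is kept inside a word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return {"first_person_rate": 0.0, "negation_rate": 0.0}
    total = len(tokens)
    first_person = sum(t in FIRST_PERSON for t in tokens)
    # Contractions such as "don't" count as negations too.
    negations = sum(t in NEGATIONS or t.endswith("n't") for t in tokens)
    return {
        "first_person_rate": first_person / total,
        "negation_rate": negations / total,
    }


rates = deception_features("We never said that. I don't think our readers noticed.")
```

Normalising by article length keeps long and short articles comparable.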

slide-44
SLIDE 44

Humour

  • Positive Emotion
  • Body Contextual Imbalance
  • Profanity
  • Alliteration
  • Relationships
  • Head Contextual Imbalance

slide-45
SLIDE 45

Formality

  • Associated Press Title Guidelines
  • Spelling Errors
  • Associated Press Date Guidelines
  • Associated Press Number Guidelines

slide-46
SLIDE 46

Complexity

  • Average Sentence Length
  • Lexical Diversity
  • Lexical Density
  • Function Words
  • Readability
  • Body Punctuation
  • Head Punctuation
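As a rough illustration of two of these complexity features, the sketch below computes average sentence length and lexical diversity for one article. It assumes a naive regex sentence splitter and uses the type-token ratio as the diversity measure; the talk does not say which definitions were used, and `complexity_features` is a hypothetical name.

```python
import re


def complexity_features(text):
    """Return average sentence length (in words) and type-token ratio."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens; a straight apostrophe is kept inside a word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not sentences or not tokens:
        return {"avg_sentence_length": 0.0, "lexical_diversity": 0.0}
    return {
        # Mean number of word tokens per sentence.
        "avg_sentence_length": len(tokens) / len(sentences),
        # Distinct words divided by total words (type-token ratio).
        "lexical_diversity": len(set(tokens)) / len(tokens),
    }


features = complexity_features("The cat sat. The cat ran away!")
```

The type-token ratio falls as texts get longer, so comparing articles of very different lengths this way needs care.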

slide-47
SLIDE 47

Corpus → Create feature matrix → Feature Selection → Classification → Analysis

Feature 1   …   Feature N   Class
0.111       …   0.552       AF
…           …   …           …
0.444       …   0.654       NAF

  • Which features are most informative?
  • Can we learn to automatically differentiate?
  • What do the results mean?

slide-48
SLIDE 48

Feature Selection

  • Chi-squared test
  • ANOVA
  • Mutual Information
  • Recursive Feature Elimination
  • Logistic Regression Coefficients
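To make one of these methods concrete, here is a minimal sketch of ANOVA-based feature ranking written in plain Python. The function names and the toy feature values are invented for illustration and are not taken from the corpus; a statistics library would normally do this instead.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature (needs at least two classes)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Variation of the class means around the overall mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the values around their own class mean.
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(matrix, labels, names):
    """Sort feature names from most to least class-separating."""
    scores = {
        name: anova_f([row[i] for row in matrix], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy values invented for illustration only.
names = ["avg_sentence_length", "first_person_pronouns"]
matrix = [[12.0, 0.02], [14.0, 0.01], [21.0, 0.02], [23.0, 0.01]]
labels = ["AF", "AF", "NAF", "NAF"]
ranking = rank_features(matrix, labels, names)
```

A feature whose class means sit far apart relative to the spread inside each class gets a high F score and ranks first.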

slide-49
SLIDE 49

Feature Selection

Complexity

  • Avg Sentence Length
  • Body Punctuation
  • Readability
  • Lexical Diversity

Details

  • Time Related Terms
  • Sense Terms
  • Proper Nouns

Imagination

  • Preposition
  • Adjectives
  • Imagination Conjunctions

Deception

  • First Person Pronouns

Formality

  • Associated Press Date
  • Associated Press Number

Vagueness

  • Degree Adverbs

slide-50
SLIDE 50

Classification

Feature 1   …   Feature N   Class
0.111       …   0.552       AF
…           …   …           …
0.444       …   0.654       NAF

Article   Prediction   Truth
1         AF           AF
2         NAF          AF
…         …            …
n-1       NAF          NAF
n         AF           NAF
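Accuracy is the share of articles where the prediction matches the truth. The sketch below applies that to the four rows visible on the slide (articles 1, 2, n-1 and n); the elided rows are not filled in, and the helper name `accuracy` is an assumption.

```python
def accuracy(predictions, truths):
    """Fraction of predictions that equal the true label."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("need at least one labelled article")
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


# Only the four rows shown on the slide: articles 1, 2, n-1 and n.
predictions = ["AF", "NAF", "NAF", "AF"]
truths = ["AF", "AF", "NAF", "NAF"]
score = accuracy(predictions, truths)  # 2 of the 4 shown rows match
```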

slide-51
SLIDE 51

Classification Accuracies for all Feature Sets

  • Hoax Set: 74%
  • Bag-of-Words: 80%
  • Complexity + Detail: 71%

slide-52
SLIDE 52

What are we seeing so far?

slide-53
SLIDE 53

Our feature set can differentiate between hoax and genuine.

slide-54
SLIDE 54

Most individual feature groups don’t do so well.

slide-55
SLIDE 55

Complexity and Detail are Important.

slide-56
SLIDE 56

How does this compare to Fake News?

slide-57
SLIDE 57

Classifying Fakes

1. One classifier trained on Fake News.
2. Second classifier trained on April Fools’ and tested on Fake News.
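The second setup (train on April Fools’, test on Fake News) can be sketched as follows. This is only an illustration: a nearest-centroid classifier stands in for whatever model was actually used, the feature values are invented, and treating fake news as the "hoax" class is an assumption.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Average the feature vectors of each class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {
        label: [mean(column) for column in zip(*group)]
        for label, group in by_class.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Toy feature vectors invented for illustration only.
af_rows = [[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.7, 0.8]]
af_labels = ["hoax", "hoax", "genuine", "genuine"]
fake_rows = [[0.25, 0.3], [0.9, 0.7], [0.6, 0.6]]
fake_labels = ["hoax", "genuine", "hoax"]

# Train on the April Fools' data only, then test on the fake news data.
model = fit_centroids(af_rows, af_labels)
predicted = [predict(model, row) for row in fake_rows]
cross_accuracy = sum(
    p == t for p, t in zip(predicted, fake_labels)
) / len(fake_labels)
```

Because the model never sees fake news during training, its accuracy on that set shows how well the April Fools’ features transfer.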

slide-58
SLIDE 58

Classification Accuracies for Fake News

  • Hoax Set: 76.9%
  • Bag-of-Words: 77.7%
  • Complexity + Detail: 78.1%

slide-59
SLIDE 59

Classification Accuracies for Fake News

  • Hoax Set: 64.5%
  • Bag-of-Words: 49.4%
  • Complexity + Detail: 65.7%
  • Complexity: 75.7%

slide-60
SLIDE 60

What does this suggest?

slide-61
SLIDE 61

Our feature set differentiates fake news similarly well to April Fools’.

slide-62
SLIDE 62

Some feature groups perform much worse.

slide-63
SLIDE 63

Complexity and Detail remain the most important feature groups.

slide-64
SLIDE 64

Our classifier trained on AF seems to work (to some extent) on Fake News.

slide-65
SLIDE 65

But what does the data say?

slide-66
SLIDE 66

Readability (Complexity)

slide-67
SLIDE 67

Lexical Diversity (Complexity)

slide-68
SLIDE 68

Time Related Vocabulary (Detail)

slide-69
SLIDE 69

Proper Nouns (Detail)

slide-70
SLIDE 70

Dates (Detail)

slide-71
SLIDE 71

First Person Pronouns (Deception)

slide-72
SLIDE 72

Can you sum it all up?

slide-73
SLIDE 73

Conclusions – Part 1

  • Created a new corpus of April Fools’ hoaxes.
  • Used features from deception, humour, and irony detection to classify hoaxes with moderate success.
  • Showed that features relating to complexity and detail seem to be the most important.

slide-74
SLIDE 74

Conclusions – Part 1

  • Created a new corpus of April Fools’ hoaxes.
  • Used features from deception, humour, and irony detection to classify hoaxes with moderate success.
  • Showed that features relating to complexity and detail seem to be the most important.

slide-75
SLIDE 75

Conclusions – Part 1

  • Created a new corpus of April Fools’ hoaxes.
  • Used features from deception, humour, and irony detection to classify hoaxes with moderate success.
  • Showed that features relating to complexity and detail seem to be the most important.

slide-76
SLIDE 76

Conclusions – Part 2

  • Found that similar features are useful in identifying April Fools’ and Fake News.
  • Some of these features manifest themselves similarly for both AF Hoaxes and Fake News.

slide-77
SLIDE 77

Conclusions – Part 2

  • Found that similar features are useful in identifying April Fools’ and Fake News.
  • Some of these features manifest themselves similarly for both AF Hoaxes and Fake News.

slide-78
SLIDE 78

Future Work

slide-79
SLIDE 79

Thanks for listening!

Questions? e.dearden@lancaster.ac.uk