Personality Driven Differences in Paraphrase Preference

SLIDE 1

Personality Driven Differences in Paraphrase Preference

Daniel Preoţiuc-Pietro

Joint work with Jordan Carpenter (Kenan Institute for Ethics, Duke) Lyle Ungar (Computer & Information Science, UPenn)

3 August 2017

SLIDE 2

Motivation

Predicting user attributes from text has been successful for:

◮ Age (Rao et al. 2010 ACL)
◮ Gender (Burger et al. 2011 EMNLP)
◮ Location (Eisenstein et al. 2010 EMNLP)
◮ Personality (Schwartz et al. 2013 PLoS One)
◮ Impact (Lampos et al. 2014 EACL)
◮ Political Orientation (Volkova et al. 2014 ACL)
◮ Mental Illness (Coppersmith et al. 2014 ACL)
◮ Occupation (Preoţiuc-Pietro et al. 2015 ACL)
◮ Income (Preoţiuc-Pietro et al. 2015 PLoS One)

SLIDE 3

However...

Most text-based prediction methods uncover topical differences between groups:

[Figure: word cloud of phrases correlated with Openness to Experience; legend: correlation strength, relative frequency]

SLIDE 4

However...

Most text-based prediction methods uncover topical differences between groups:

[Figure: word cloud of phrases correlated with Extraversion; legend: correlation strength, relative frequency]

SLIDE 5

Stylistic differences

We need to capture stylistic differences rather than topical ones. Topical differences are not useful for many practical applications that adapt to user traits:

◮ machine translation (Mirkin et al. 2015 EMNLP; Rabinovich et al. 2017 EACL)
◮ agents (e.g. customer service, tutoring)
◮ controlling for gender or racial bias

SLIDE 6

Stylistic differences

One type of stylistic difference is phrase choice in context:

◮ Openness: Splendid, Excellent, Remarkable (image source: https://wikispaces.psu.edu/display/P5PFL/TRAIT+Theory+Page)
◮ Extraversion: Magnificent, Fabulous, Tremendous (image source: http://inwilmingtonde.com/events/thanksgiving-eve-karaoke)

SLIDE 7

Data

We study the Big Five personality traits:

◮ 115,312 Facebook users
◮ Personality scores obtained through the MyPersonality app (Kosinski et al. 2013)
◮ For each trait, we compare the top and bottom 20% of users by trait score
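The group selection on this slide can be sketched as follows, assuming each trait's scores are held in a dict from a user id to a numeric score. `extreme_groups` is a hypothetical helper name, and how the original study handled ties at the 20% cut-off is not stated on the slide.

```python
def extreme_groups(scores, fraction=0.2):
    """Split users into the bottom and top `fraction` of one trait's scores.

    scores: dict mapping a (hypothetical) user id to that user's trait score.
    Returns (low_ids, high_ids); ties at the cut-off keep dict insertion order,
    because Python's sort is stable.
    """
    ranked = sorted(scores, key=scores.get)  # user ids, lowest score first
    k = int(len(ranked) * fraction)          # size of each extreme group
    if k == 0:
        return [], []                        # too few users for any group
    return ranked[:k], ranked[-k:]


# Usage with made-up scores for ten users.
scores = {f"u{i}": float(i) for i in range(10)}
low, high = extreme_groups(scores)
print(low, high)  # ['u0', 'u1'] ['u8', 'u9']
```

Because both groups have the same size k, the resulting binary classification task is balanced by construction.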

SLIDE 8

Paraphrasing

Paraphrases: alternative ways to convey the same information. We use the Paraphrase Database (PPDB) 2.0 (Pavlick et al. 2015 ACL):

◮ annotated with type and confidence; we keep 'equivalent' paraphrases with confidence > 0.2
◮ >6M automatically derived paraphrase pairs
◮ we use only 1–3-grams
◮ the two phrases in a pair must differ by more than stopwords or the inflection of a word (same root form)
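A minimal sketch of the PPDB filters listed above, assuming each pair arrives as a hypothetical `(phrase1, phrase2, relation, confidence)` record with the relation label spelled `"Equivalence"`; this is not PPDB's actual file format. The stopword list and the `crude_root` suffix stripper are illustrative stand-ins for a real stopword list and lemmatizer.

```python
# Tiny illustrative stopword list; a real filter would use a full list.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "very"}


def crude_root(word):
    """Rough stand-in for lemmatization: strip one common English suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def content_roots(phrase):
    """Set of crude roots of the non-stopword tokens in a phrase."""
    return {crude_root(w) for w in phrase.lower().split() if w not in STOPWORDS}


def keep_pair(p1, p2, relation, confidence, min_conf=0.2, max_n=3):
    """Return True if a (hypothetical) PPDB-style record passes the filters."""
    if relation != "Equivalence" or confidence <= min_conf:
        return False  # keep only 'equivalent' pairs with confidence > 0.2
    if len(p1.split()) > max_n or len(p2.split()) > max_n:
        return False  # keep only 1-3-grams
    # Drop pairs that differ only by stopwords or by word inflection.
    return content_roots(p1) != content_roots(p2)


print(keep_pair("splendid", "tremendous", "Equivalence", 0.5))  # True
print(keep_pair("the dog", "dogs", "Equivalence", 0.5))         # False
```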

SLIDE 9

Prediction

Accuracy (Naive Bayes, 90-10 training-testing split, balanced data; chance = .50):

Trait               Paraphrases only   Phrases w/o paraphrases   All phrases
Openness            .603               .573                      .623
Conscientiousness   .551               .589                      .639
Extraversion        .519               .578                      .597
Agreeableness       .551               .553                      .593
Neuroticism         .549               .590                      .631
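To make the evaluation setup concrete, here is a from-scratch multinomial Naive Bayes with a single 90-10 holdout split, using only the standard library. It assumes each user is represented by phrase counts (a `Counter`) and a binary label for the top or bottom 20% group; the add-one smoothing, the single random split and all function names are our assumptions, not details from the slides.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing.

    docs: list of Counter objects mapping phrase -> count for one user.
    labels: list of class labels (e.g. 0 = bottom 20%, 1 = top 20%).
    """
    classes = sorted(set(labels))
    vocab = set()
    for d in docs:
        vocab.update(d)
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        totals = Counter()
        for d in class_docs:
            totals.update(d)  # sum phrase counts over the class
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((totals[w] + alpha) / denom) for w in vocab}
    return classes, log_prior, log_lik


def predict_nb(model, doc):
    classes, log_prior, log_lik = model

    def score(c):
        # Phrases never seen in training are ignored.
        return log_prior[c] + sum(
            n * log_lik[c][w] for w, n in doc.items() if w in log_lik[c]
        )

    return max(classes, key=score)


def holdout_accuracy(docs, labels, test_frac=0.1, seed=0):
    """Shuffle once, hold out `test_frac` of users, report test accuracy."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_frac))
    test, train = idx[:n_test], idx[n_test:]
    model = train_nb([docs[i] for i in train], [labels[i] for i in train])
    correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test)
    return correct / n_test


# Toy, perfectly separable data (made up): each group favors one paraphrase.
docs = [Counter({"fabulous": 3, "the": 2}) for _ in range(10)] + [
    Counter({"splendid": 3, "the": 2}) for _ in range(10)
]
labels = [1] * 10 + [0] * 10
print(holdout_accuracy(docs, labels))  # 1.0
```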

SLIDE 10

Quantifying Preference

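Straightforward measure, e.g. for Extraversion:

$$\mathrm{Extraversion}(w) = \log \frac{\mathrm{Extravert}(w)}{\mathrm{Introvert}(w)} \qquad (1)$$

where Extravert(w) and Introvert(w) denote how frequently each group uses phrase w.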

Within a paraphrase pair (w1, w2), the difference Extraversion(w1) − Extraversion(w2) is the stylistic distance: a positive value means w1 is relatively more favored by extraverts and w2 by introverts. This measure was used previously to study paraphrase preference across age, gender and occupational class (Preoţiuc-Pietro, Xu & Ungar, AAAI 2016).
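Measure (1) and the pair distance can be sketched as below, assuming Extravert(w) and Introvert(w) are relative frequencies of w in each group's text. The `eps` smoothing term is our own addition to avoid log(0) for phrases one group never uses, and the counts in the usage lines are made up.

```python
import math
from collections import Counter


def relative_freqs(counts):
    """Normalize raw phrase counts of one user group to relative frequencies."""
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def trait_score(w, high, low, eps=1e-6):
    """log(high(w) / low(w)); eps is our own smoothing to avoid log(0)."""
    return math.log((high.get(w, 0.0) + eps) / (low.get(w, 0.0) + eps))


def stylistic_distance(w1, w2, high, low):
    """Trait(w1) - Trait(w2); > 0 means w1 leans towards the 'high' group."""
    return trait_score(w1, high, low) - trait_score(w2, high, low)


# Made-up counts, for illustration only.
extraverts = relative_freqs(Counter({"fabulous": 30, "splendid": 10}))
introverts = relative_freqs(Counter({"fabulous": 10, "splendid": 30}))
print(round(stylistic_distance("fabulous", "splendid", extraverts, introverts), 3))
# 2.197, i.e. log 9 up to the eps smoothing
```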

SLIDE 11

Linguistic Theories

We study which properties of the words in a pair are preferred by each group:

◮ Word Length in Characters
◮ Word Length in Syllables

Simple proxies for word complexity

◮ Affective Norms: Valence, Arousal, Dominance

14k rated words; e.g. valence: suicide (0.15) → bacon (0.70) → laughter (1)

◮ Concreteness

40k rated words: spirituality (1) → morning (3.44) → tiger (5)

◮ Age of Acquisition

30k rated words: great (5.05) → splendid (7.22) → tremendous (10.63)

◮ More in the paper ...
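One way to turn the word properties above into per-pair features is to take the difference between the two words, sketched here under stated assumptions: the `AOA` dict holds only the three example ratings from this slide (a real analysis would load the full norms), and `approx_syllables` is a rough vowel-group heuristic rather than a dictionary-based syllable count.

```python
import re

# Only the example Age of Acquisition ratings shown on this slide; a real
# analysis would load the full ~30k-word norms.
AOA = {"great": 5.05, "splendid": 7.22, "tremendous": 10.63}


def approx_syllables(word):
    """Rough heuristic: count runs of vowels (not a real syllabifier)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def pair_differences(w1, w2):
    """Property of w1 minus property of w2 for one paraphrase pair."""
    diffs = {
        "length_chars": len(w1) - len(w2),
        "syllables": approx_syllables(w1) - approx_syllables(w2),
    }
    if w1 in AOA and w2 in AOA:  # rated norms exist only for covered words
        diffs["age_of_acquisition"] = AOA[w1] - AOA[w2]
    return diffs


print(pair_differences("tremendous", "great"))
```

Pairs where either word is missing from a lexicon simply lack that feature, which mirrors the fact that each norm set covers a different number of words (14k, 40k, 30k).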

SLIDE 12

Linguistic Theories

[Figure: bar chart of correlation coefficients (axis from −.200 to .200) for Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism across word properties: Word Length, #Syllables, Happiness, Arousal, Dominance, Concreteness, Age of Acquisition]

Correlation coefficients between paraphrase pair preference and user group usage.
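The slides do not spell out how these coefficients are computed. Under our assumption that, for each trait and word property, the property difference within each pair is correlated (Pearson) with the pair's stylistic distance across all pairs, the computation looks like the sketch below. The per-word trait scores in the usage lines are invented for illustration and are not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def property_correlation(pairs, preference, prop):
    """Correlate property differences with trait preference across pairs.

    pairs: list of (w1, w2) paraphrase pairs
    preference: function (w1, w2) -> stylistic distance for one trait
    prop: function w -> numeric word property (e.g. length, a norm rating)
    """
    xs = [prop(w1) - prop(w2) for w1, w2 in pairs]
    ys = [preference(w1, w2) for w1, w2 in pairs]
    return pearson(xs, ys)


# Invented per-word trait scores, for illustration only (not results).
trait = {"tremendous": 0.3, "splendid": 0.2, "great": -0.1, "good": -0.2}
pairs = [("tremendous", "great"), ("splendid", "good"), ("great", "good")]
r = property_correlation(pairs, lambda a, b: trait[a] - trait[b], len)
print(round(r, 3))
```

Here `len` plays the role of the word-length property; any per-word rating lookup can be passed as `prop` instead.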

SLIDE 14

Take Aways

◮ Stylistic differences between user groups have important applications
◮ Paraphrase choice contains valuable information about personality traits
◮ Paraphrase preferences shed light on psycholinguistic theories
◮ Paraphrasing is a potential way to generate text perceived as written by a user with a different trait

See our EMNLP 2017 paper (Preoţiuc-Pietro, Guntuku, Ungar: Controlling Human Perception of Basic User Traits)

SLIDE 15

Thank you! Questions?