Unsupervised Personality Recognition from Text: Possible Applications. Fabio Celli, fabio.celli@unitn.it (PowerPoint presentation)

SLIDE 1 Sep 16, 2014

Fabio Celli

fabio.celli@unitn.it

Unsupervised Personality Recognition from Text: Possible Applications

SLIDE 6

  • what is personality?
  • what is personality recognition?
  • how can we recognize personality from text?
  • how can we recognize it in an unsupervised way?
  • which applications?
SLIDE 7

“Personality describes persistent human behavioral responses to broad classes of environmental stimuli.” [Adelstein et al. 2011]

SLIDE 10

The Big 5 factor theory (five traits: extraversion, emotional stability, agreeableness, conscientiousness, openness to experience)

  • self-assessments
  • observer assessments (+ agreement between observers)
  • 100-item test
  • 50-item test
  • 44-item test
  • 10-item test

These assessments provide the ground truth.

SLIDE 11

personality recognition
SLIDE 13

Personality recognition is the automatic classification of the personality of authors from behavioral features (text, facial expressions, profile pictures, works, and so on). Gold-standard labels can be obtained by means of Big 5 personality tests. [Norman 1963; Costa & McCrae 1985; Digman 1990]

[Mairesse et al. 2007]: predicting classes works better (F ≈ .57); predicting scores works badly (relative absolute error ≈ 97%).

SLIDE 14

5 classifiers (one per trait) predict binary classes or scores.

SLIDE 15

personality recognition from text
SLIDE 17

Approaches to personality recognition from text:
  • Bottom-up approach: search for patterns associated with personality trait poles (labeled text → extract linguistic features → patterns)
  • Top-down approach: exploit lexical resources as features, finding correlations with personality trait poles (resource + labeled text → correlations)
  • Mixed approach: use many resources (sentiment, psycholinguistic, semantic) + word patterns + feature selection

[Oberlander & Nowson 2006] [Iacobelli et al. 2011] [Mairesse et al. 2007] [Schwartz et al. 2013] [Markovikj et al. 2013]
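The top-down idea of correlating a lexical-resource feature with trait scores can be sketched in a few lines of Python. This is a minimal illustration, assuming each author has a relative frequency for one LIWC-style category (here positive-emotion words) and a numeric extraversion score from a Big 5 test; the data and the helper name `pearson` are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no correlation signal
    return cov / (sx * sy)


# Hypothetical data: per-author share of positive-emotion words and the
# same authors' extraversion scores from a Big 5 test.
posemo = [0.021, 0.035, 0.012, 0.040, 0.028]
extraversion = [3.1, 3.9, 2.4, 4.2, 3.5]

r = pearson(posemo, extraversion)
pole = "positive" if r > 0 else "negative"
print(f"posemo vs extraversion: r = {r:.2f} ({pole} correlation)")
```

A positive r suggests the feature points to the high pole of the trait and a negative r to the low pole; in practice such coefficients come from the literature or from labeled data, not from five authors.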
SLIDE 18

Approaches to personality recognition from text:
  • 5 classifiers (one per trait) predict binary classes or scores
  • large feature space, reduced with feature selection

SLIDE 19

Unsupervised personality recognition from text
SLIDE 20

Unsupervised personality recognition from text. We need:

  • unlabeled text + authors (many texts per author)
  • a small labeled test set
  • correlations between language and personality

In the literature: 3 classes, high (y), mid (o), low (n); or 2 classes, high (y), low (n).
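Gold-standard classes of this kind can be derived from Big 5 test scores by thresholding. The sketch below is a minimal illustration that assumes cut-offs at the sample mean plus or minus half a standard deviation for the 3-class scheme and at the mean for the 2-class scheme; these thresholds and the function name `discretize` are illustrative assumptions, not the ones used in the cited datasets.

```python
from statistics import mean, pstdev


def discretize(scores: list[float], classes: int = 3, k: float = 0.5) -> list[str]:
    """Map trait scores to y (high), o (mid), n (low) labels.

    Hypothetical rule: with 3 classes, scores above mean + k*sd are 'y',
    below mean - k*sd are 'n', the rest 'o'. With 2 classes, scores at or
    above the mean are 'y', the rest 'n'.
    """
    m = mean(scores)
    if classes == 2:
        return ["y" if s >= m else "n" for s in scores]
    if classes != 3:
        raise ValueError("classes must be 2 or 3")
    sd = pstdev(scores)
    labels = []
    for s in scores:
        if s > m + k * sd:
            labels.append("y")
        elif s < m - k * sd:
            labels.append("n")
        else:
            labels.append("o")
    return labels


extraversion = [1.5, 2.5, 3.0, 3.5, 4.5]  # made-up test scores (mean 3, sd 1)
print(discretize(extraversion))             # ['n', 'o', 'o', 'o', 'y']
print(discretize(extraversion, classes=2))  # ['n', 'n', 'y', 'y', 'y']
```

A wider mid band (larger `k`) leaves fewer but more clearly polarized y/n examples.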

SLIDE 21

Author1 text1-1, Author2 text2-1, Author1 text1-2, AuthorN textN-1, Author2 text2-2, AuthorN textN-2, ...
SLIDE 22

[Diagram: sample data → data distribution]
SLIDE 23

[Diagram: sample data → data distribution; each post is labeled with one class per trait]
  post > onoon
  post > nnnoy
  post > noyny
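The per-post step can be sketched as follows, under stated assumptions: the "data distribution" is taken to be the mean of each feature over a sample of posts, and each feature votes for a trait pole when a post's value lies above or below that mean, in the direction given by the sign of its correlation with the trait. The two toy features, their correlation values, the order of the five letters (extraversion, emotional stability, agreeableness, conscientiousness, openness) and the rule "o when votes cancel" are all hypothetical; the slides do not spell out the exact rule.

```python
TRAITS = ["extraversion", "emotional stability", "agreeableness",
          "conscientiousness", "openness"]  # assumed order of the 5 letters

# Hypothetical correlation set: feature -> correlation with each trait
# (same order as TRAITS). Values are made up for illustration.
CORRELATIONS = {
    "exclamations": [0.10, 0.0, 0.05, -0.05, 0.0],
    "first_person": [0.0, -0.10, 0.0, 0.0, -0.05],
}


def features(post: str) -> dict[str, float]:
    """Two toy linguistic features, normalised by post length in words."""
    words = post.split()
    n = max(len(words), 1)
    first_person = sum(w.lower().strip(".,!?") in {"i", "me", "my"} for w in words)
    return {"exclamations": post.count("!") / n, "first_person": first_person / n}


def distribution(sample: list[dict[str, float]]) -> dict[str, float]:
    """Mean of each feature over a sample of posts (the 'data distribution')."""
    return {k: sum(f[k] for f in sample) / len(sample) for k in sample[0]}


def label_post(feats: dict[str, float], means: dict[str, float]) -> str:
    """One y/n/o letter per trait from correlation-signed votes."""
    letters = []
    for t in range(len(TRAITS)):
        score = 0
        for name, value in feats.items():
            r = CORRELATIONS[name][t]
            if r == 0 or value == means[name]:
                continue  # no signal from this feature for this trait
            above = value > means[name]
            # above-average value of a positively correlated feature (or a
            # below-average value of a negatively correlated one) votes y
            score += 1 if above == (r > 0) else -1
        letters.append("y" if score > 0 else "n" if score < 0 else "o")
    return "".join(letters)


posts = ["I love this!!", "The meeting is at noon.",
         "my cat and I are tired", "Great game tonight!"]
feats = [features(p) for p in posts]
means = distribution(feats)  # toy: the whole set doubles as the sample
for post, f in zip(posts, feats):
    print(f"{post!r} > {label_post(f, means)}")
```

On real data the sample would be a random subset of a much larger corpus, and the correlation set would hold many more features.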
SLIDE 24

Per-post labels are grouped by user and aggregated into one label per user, with a confidence score (conf):
  user1: post ynnyn, post nnyon → onoon
  user2: post nnyyy, post nyyyo → noyyo
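The user-level step can be sketched as a per-trait majority vote over an author's post labels, with ties resolved as "o". This rule is an assumption, but applied to the user1 and user2 posts of the example it yields onoon and noyyo. The `confidence` function (share of post-level letters that agree with the user label) is a hypothetical definition, not necessarily the score the system reports.

```python
from collections import Counter, defaultdict


def aggregate(labels: list[str]) -> str:
    """Per-trait majority vote over an author's post labels; ties give 'o'."""
    result = []
    for column in zip(*labels):  # one column of letters per trait
        counts = Counter(column).most_common()
        top, top_n = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_n
        result.append("o" if tied else top)
    return "".join(result)


def confidence(labels: list[str], summary: str) -> float:
    """Hypothetical confidence: share of post letters agreeing with the summary."""
    agree = sum(post[i] == summary[i] for post in labels for i in range(len(summary)))
    return agree / (len(labels) * len(summary))


posts = [("user1", "ynnyn"), ("user1", "nnyon"),
         ("user2", "nnyyy"), ("user2", "nyyyo")]
by_author: dict[str, list[str]] = defaultdict(list)
for author, label in posts:
    by_author[author].append(label)

for author, labels in by_author.items():
    summary = aggregate(labels)
    print(author, summary, f"conf={confidence(labels, summary):.2f}")
# user1 onoon conf=0.50
# user2 noyyo conf=0.70
```

Resolving ties to "o" keeps the system from committing to a pole when an author's posts disagree, which is also what makes a confidence filter useful later.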
SLIDE 25

[Diagram: user-level labels (e.g. onoon, noyyo) → evaluation on the test set]
SLIDE 27 supervised approaches to Computational Personality Recognition

[Diagram: labeled data → features (bottom-up or top-down) → model → new unseen data]

Problems of supervised approaches:
  1) overfitting → social network data samples are too small to extract good models, and bottom-up approaches extract very few good patterns
  2) multilinguality → top-down approaches use language-dependent resources

Domain adaptation is a learning problem where a model is generalized across domains; it is successful when it minimizes the difference in performance from a source to a target domain [Ben-David et al. 2006].

[Diagram: model built on sample data from the source domain (features bottom-up or top-down) = model on the target domain]

Advantage of unsupervised personality recognition: domain adaptability.
SLIDE 28

We added a semi-supervised part to the algorithm: we exploit the high-confidence predictions of the unsupervised system to label a large unlabeled training set, extract n-grams from it, and add them to the initial correlation set.

[Diagram: high-confidence labels → large unlabeled set → n-grams]
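The semi-supervised step can be sketched as follows, assuming each author in the large unlabeled set already carries a five-letter label and a confidence score from the unsupervised system. Only confident authors with a clear pole for the chosen trait are kept, and word bigrams that occur at least twice with one pole and never with the other are returned with a +1/-1 sign, standing in for a correlation. The threshold, bigram order, minimum count, one-pole rule and function names are illustrative assumptions.

```python
from collections import Counter


def ngrams(text: str, n: int = 2) -> list[tuple[str, ...]]:
    """Lower-cased word n-grams (bigrams by default)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_ngrams(authors: list[tuple[list[str], str, float]], trait: int,
                   threshold: float = 0.7, min_count: int = 2) -> dict[tuple[str, ...], int]:
    """Return {n-gram: +1 or -1} for n-grams seen with only one pole of a trait.

    authors holds (texts, five-letter y/o/n label, confidence) per author.
    """
    pos: Counter = Counter()
    neg: Counter = Counter()
    for texts, label, conf in authors:
        if conf < threshold or label[trait] == "o":
            continue  # keep only confident predictions with a clear pole
        target = pos if label[trait] == "y" else neg
        for text in texts:
            target.update(ngrams(text))
    new = {g: 1 for g, c in pos.items() if c >= min_count and g not in neg}
    new.update({g: -1 for g, c in neg.items() if c >= min_count and g not in pos})
    return new


authors = [
    (["see you at the party", "the party was great"], "ynyoy", 0.8),
    (["stayed home reading", "home reading again"], "nnyoy", 0.9),
    (["at the party again"], "yoooo", 0.4),  # low confidence: ignored
]
print(extract_ngrams(authors, trait=0))  # trait 0 = first letter of the label
```

The returned signs could then be merged into the initial correlation set and the unsupervised labeling rerun with the enlarged set.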

SLIDE 29

Two different datasets:
  • essays [Pennebaker & King 1999] [Mairesse et al. 2007]: a big collection of stream-of-consciousness writings by students who took the Big 5 test. Language: English. Unlabeled: ~2000 users; test: ~200 users.
  • PersFB [Celli & Polonio 2013]: a small collection of Facebook statuses by students who took the Big 5 test. Language: Italian. Unlabeled: ~200 users; test: ~30 users.
SLIDE 31

Many different correlation sets:

  • MRC (Mairesse et al. 2007)
  • LIWC (Mairesse et al. 2007)
  • language-independent features (Mairesse et al. 2007)
  • LIWC (Golbeck et al. 2011)
  • n-grams (Iacobelli et al. 2011)
  • n-grams (from unlabeled text)
SLIDE 32

MRC, 12 dimensions: Nchar, Nphon, Nsyl, Kffrq, Kfcat, Brownfrq, Tlfrq, Conc, Fam, Imag, aoa

SLIDE 33

LIWC, 60+ dimensions: posemo, negemo, anx, anger, sad, cogmech, insight, cause, certain, incl, excl, see, hear, feel, bio, body, health, sex, space, time, work, achieve, leisure, home, money, relig, death, …

SLIDE 37

Evaluation: since each personality trait is bipolar, we considered
  • true positives = correct predictions for both poles
  • false positives = wrong predictions for both poles

Results:
  dataset   parameters             avg F1
  PersFB    random baseline (2c)   .608
  essays    random baseline (2c)   .655
  PersFB    all features (2c)      .686
  essays    all features (2c)      .686
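The evaluation rule can be written out as a small function. It follows the slide for true and false positives (a correct or wrong y/n prediction, whichever the pole) and assumes, since the slide does not say, that a missing prediction ("o") for a trait whose gold label is y or n counts as a false negative. F1 is then computed per trait and averaged over the five traits; the function names and the toy labels are hypothetical.

```python
def f1_bipolar(pred: list[str], gold: list[str]) -> float:
    """F1 for one trait; pred/gold hold 'y', 'n' or 'o' per author."""
    tp = fp = fn = 0
    for p, g in zip(pred, gold):
        if g == "o":
            continue  # assumption: authors without a gold pole are skipped
        if p == "o":
            fn += 1   # assumption: no prediction counts as a miss
        elif p == g:
            tp += 1   # correct prediction, for either pole
        else:
            fp += 1   # wrong prediction, for either pole
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def avg_f1(pred: list[str], gold: list[str]) -> float:
    """Average F1 over the 5 traits; each string is one author's label."""
    per_trait = [f1_bipolar([p[t] for p in pred], [g[t] for g in gold])
                 for t in range(5)]
    return sum(per_trait) / len(per_trait)


gold = ["ynyny", "nnyyn", "yyyny"]  # made-up 2-class gold labels
pred = ["ynyoy", "nyyyn", "yyonn"]  # made-up system output
print(f"avg F1 = {avg_f1(pred, gold):.2f}")  # avg F1 = 0.84
```

Counting "o" as a miss means a system cannot raise its precision for free by abstaining on hard authors: abstentions lower recall instead.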
SLIDE 38 Applications of unsupervised personality recognition from text
SLIDE 41

Emotional stability in Twitter conversations [Celli & Rossi 2012]

Collected > 200,000 posts and > 13,000 authors, automatically annotated with personality (secure / neurotic), adding new correlations extracted from Twitter in recent literature [Quercia et al. 2011].

Validation: comparison against analyzewords.com and the essays dataset.

"AnalyzeWords helps reveal your personality by looking at how you use words in Twitter. It is based on good scientific research [Pennebaker et al 2001] connecting word use to who people are."
SLIDE 42

Results: secure users tend to build mutual connections while having conversations. Neurotic users instead tend to build longer chains and have conversations with distant people.
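Results like these rest on simple graph measures over reply data. As an illustration, the share of an author's reply ties that are reciprocated captures "mutual connections". The sketch computes it per personality label; the reply pairs, labels and the function name are made up, and reciprocity is an illustrative measure rather than the exact analysis of [Celli & Rossi 2012].

```python
from collections import defaultdict


def reciprocity_by_label(replies: list[tuple[str, str]],
                         labels: dict[str, str]) -> dict[str, float]:
    """Share of each group's outgoing reply ties that are answered back.

    replies: (author, addressee) pairs; labels: author -> personality label.
    """
    edges = set(replies)
    out_ties: dict[str, int] = defaultdict(int)
    mutual: dict[str, int] = defaultdict(int)
    for a, b in edges:
        group = labels.get(a)
        if group is None or a == b:
            continue  # unlabeled author or self-reply
        out_ties[group] += 1
        if (b, a) in edges:
            mutual[group] += 1
    return {g: mutual[g] / out_ties[g] for g in out_ties}


replies = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("cat", "ann"),
           ("dan", "eve"), ("eve", "fay"), ("fay", "gus")]
labels = {"ann": "secure", "bob": "secure", "cat": "secure",
          "dan": "neurotic", "eve": "neurotic", "fay": "neurotic"}
print(reciprocity_by_label(replies, labels))
```

In the toy data the neurotic group's replies form a chain (dan → eve → fay → gus) rather than reciprocal pairs, so its reciprocity is 0.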
SLIDE 43

We are collecting the personality of Twitter users with 2 apps:
  • http://personality.altervista.org/personalitwit.php (under development)
  • http://personality.altervista.org/mypersonality/en/mypersonality.php
SLIDE 46

Analysis of ego-networks in Facebook [Celli & Polonio 2013]

Data: > 5,000 posts and > 100 authors collected through one access user, automatically annotated with personality types.
Test set: 23 students who took the Big 5 test (Facebook + offline data).
SLIDE 47

Paper-and-pen 100-item Big 5 test
SLIDE 48

Open-minded and introverted users have the highest edge weight (interaction strength).
SLIDE 49

Uncooperative users have the highest clustering coefficient (nodes that tend to participate in conversations).
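The two network measures on these slides can be computed with the standard library. The sketch assumes an undirected, weighted interaction graph in which an edge weight counts interactions (e.g. comments) between two users, takes a node's "interaction strength" to be the mean weight of its edges, and uses the usual local clustering coefficient (linked neighbour pairs over all neighbour pairs). The graph, weights, made-up agreeableness poles and function names are hypothetical; NetworkX offers the clustering measure as `networkx.clustering`.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges: list[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """(u, v, weight) triples -> undirected adjacency dict of dicts."""
    adj: dict[str, dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    return adj


def clustering(adj, node) -> float:
    """Local clustering coefficient: linked neighbour pairs / all pairs."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def mean_edge_weight(adj, node) -> float:
    """Average interaction strength of a node's edges."""
    weights = adj[node].values()
    return sum(weights) / len(weights) if weights else 0.0


def by_label(adj, labels, measure) -> dict[str, float]:
    """Average a node measure over the users sharing each personality label."""
    groups: dict[str, list[float]] = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(measure(adj, node))
    return {label: sum(v) / len(v) for label, v in groups.items()}


edges = [("a", "b", 4), ("a", "c", 1), ("b", "c", 2), ("c", "d", 1), ("d", "e", 1)]
labels = {"a": "n", "b": "n", "c": "y", "d": "y", "e": "y"}  # made-up agreeableness poles
adj = build_graph(edges)
print(by_label(adj, labels, clustering))
print(by_label(adj, labels, mean_edge_weight))
```

Averaging per label, as here, compares personality groups; testing whether the difference is significant would need the real network and more users.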
SLIDE 52

Deception detection via personality [Fornaciari et al. 2013]

Task: predict deception using personality traits as features. Can we detect liars by exploiting personality?

Data: DeCour, 35 defendants from 4 hearings, found guilty of calumny and false testimony, in 4 different Italian courts. Language: Italian.

[Chart: results averaged over the 5 traits]
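Using personality as features means turning each predicted five-letter label into a numeric vector that a classifier can consume. A minimal sketch, assuming y/o/n are encoded as +1/0/-1 and using a nearest-centroid classifier as a stand-in; the labels, the "truthful"/"deceptive" classes and the classifier are illustrative, not the setup of [Fornaciari et al. 2013].

```python
ENCODING = {"y": 1.0, "o": 0.0, "n": -1.0}


def encode(label: str) -> list[float]:
    """Five-letter y/o/n personality label -> numeric feature vector."""
    return [ENCODING[c] for c in label]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(examples: list[tuple[str, str]]) -> dict[str, list[float]]:
    """examples: (personality label, class) pairs -> one centroid per class."""
    by_class: dict[str, list[list[float]]] = {}
    for label, cls in examples:
        by_class.setdefault(cls, []).append(encode(label))
    return {cls: centroid(vs) for cls, vs in by_class.items()}


def predict(centroids: dict[str, list[float]], label: str) -> str:
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    x = encode(label)

    def sq_dist(c: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


# Made-up training data: predicted personality of a statement's author and
# whether the statement was truthful or deceptive.
examples = [("ynyoy", "truthful"), ("yoyny", "truthful"),
            ("nnoyn", "deceptive"), ("onnyn", "deceptive")]
model = train(examples)
print(predict(model, "nnnyn"))
```

Encoding "o" as 0 places an undecided trait halfway between the poles, so it neither pushes toward nor away from either class.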
SLIDE 55

Summing up:
  • Unsupervised/semi-supervised personality recognition is useful in domains where it is difficult to retrieve labeled data [diagram: labeled training, labeled test, unsup./semisup.]
  • It is domain adaptive [diagram: model on source domain = model on target domain]
SLIDE 56

In conclusion:
  • unsupervised: adaptability, applicability in extreme conditions
  • supervised: domain dependent, high performance

SLIDE 57