FRIENDSHIPS RIVALRIES TRYSTS Chenhao Tan Dallas Card (CMU), Noah - - PowerPoint PPT Presentation



slide-1
SLIDE 1

FRIENDSHIPS RIVALRIES TRYSTS

Chenhao Tan, Dallas Card (CMU), Noah Smith (UW)

1

slide-2
SLIDE 2

RELATIONS

2

  • pro-choice and pro-life: rivals
  • undocumented immigrants and illegal alien: rivals
  • small government and free market: friends
  • word alignment and machine translation: friends

Chong and Druckman, 2007; Dawkins, 1976; Entman, 1993; Gitlin, 1980; Lakoff, 2014; Milton, 1964

slide-3
SLIDE 3

3

  • undocumented immigrants and illegal alien: rivals
  • small government and free market: friends

  • First quantitative framework to systematically describe relations between ideas
  • Demonstrate effective explorations with this framework on a wide range of datasets

slide-4
SLIDE 4
  • Topics as ideas
  • Keywords as ideas

4

Our focus is on relations between ideas.

We will use standard approaches to represent ideas:

  • Topics from latent Dirichlet allocation (Blei et al., 2003)
  • Keywords (Monroe et al., 2008)

Hall et al., 2008; Culturomics, Michel et al., 2011
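With either representation, each document ends up as a set of ideas. The following is a minimal sketch of the keyword variant only, assuming the keyword list has already been chosen. The function name `ideas_in_document`, the whitespace tokenization, and the toy keyword list are all illustrative assumptions, not the authors' pipeline.

```python
def ideas_in_document(text, keywords):
    """Return the set of keywords that occur in the text.

    Tokenization is a lowercase whitespace split, which is an assumption
    of this sketch; it ignores punctuation and multi-word keywords.
    """
    tokens = set(text.lower().split())
    return {keyword for keyword in keywords if keyword in tokens}


# Hypothetical keyword list and document, for illustration only.
keywords = {"translation", "alignment", "sentiment"}
document = "Word alignment improves machine translation"
ideas = ideas_in_document(document, keywords)
print(sorted(ideas))
```

A topic-based variant would instead keep the topics whose weight in the document exceeds some threshold; that threshold would be another assumption.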

slide-5
SLIDE 5

QUANTITATIVELY

  • Given a corpus of documents over time, each document consists of a set of ideas

5

Cooccurrence: pointwise mutual information [Church and Hanks, 1990]

Rarely cooccur: undocumented immigrants and illegal alien (rivals)
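As a rough illustration of the cooccurrence measure, the sketch below computes pointwise mutual information from documents represented as sets of ideas. The function name `pmi`, the add-one smoothing, and the toy corpus are assumptions made for this example; the slides do not specify how probabilities are estimated.

```python
import math


def pmi(docs, idea_a, idea_b, smoothing=1.0):
    """Smoothed pointwise mutual information of two ideas.

    docs is a list of sets of ideas. Adding `smoothing` to every count is
    an assumption of this sketch; it keeps the score finite when the two
    ideas never share a document.
    """
    n_docs = len(docs) + smoothing
    p_a = (sum(idea_a in d for d in docs) + smoothing) / n_docs
    p_b = (sum(idea_b in d for d in docs) + smoothing) / n_docs
    p_ab = (sum(idea_a in d and idea_b in d for d in docs) + smoothing) / n_docs
    return math.log(p_ab / (p_a * p_b))


# Toy corpus, made up for illustration: the first pair never cooccurs,
# the second pair always does.
toy_docs = [
    {"undocumented immigrants"},
    {"undocumented immigrants"},
    {"illegal alien"},
    {"illegal alien"},
    {"small government", "free market"},
    {"small government", "free market"},
]
rivals_pmi = pmi(toy_docs, "undocumented immigrants", "illegal alien")
friends_pmi = pmi(toy_docs, "small government", "free market")
print(rivals_pmi < 0 < friends_pmi)
```

A negative score means the two ideas share documents less often than independence would predict; a positive score means more often.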

slide-6
SLIDE 6
  • Given a corpus of documents over time, each document consists of a set of ideas

– Cooccurrence does not capture which idea is winning or losing

6

QUANTITATIVELY

Prevalence correlation: Pearson correlation

[Figure: frequency over time for "undocumented immigrants" and "illegal alien"]
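To make the across-document measure concrete, here is a small sketch that turns documents grouped by time step into prevalence series and correlates them. The helper names `prevalence` and `pearson`, the definition of prevalence as a per-step document fraction, and the toy data are assumptions for illustration.

```python
import math


def prevalence(docs_by_step, idea):
    """Fraction of documents in each time step that contain the idea."""
    return [sum(idea in d for d in docs) / len(docs) for docs in docs_by_step]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data, made up for illustration: one term fades while the other rises.
ia, ui = "illegal alien", "undocumented immigrants"
toy_steps = [
    [{ia}, {ia}, {ia}, {ui}],
    [{ia}, {ia}, {ui}, {ui}],
    [{ia}, {ui}, {ui}, {ui}],
]
r = pearson(prevalence(toy_steps, ia), prevalence(toy_steps, ui))
print(round(r, 3))
```

A strongly negative value says that one idea gains as the other loses, which is the information cooccurrence alone cannot provide.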

slide-7
SLIDE 7
  • Given a corpus of documents over time, each document consists of a set of ideas

7

QUANTITATIVELY

Cooccurrence (within-document) & prevalence correlation (across-document)
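The two measures can be written out as follows. The notation is an assumption of this note and does not appear on the slides: $P(x)$ is the fraction of documents that contain idea $x$, $P(x, y)$ the fraction that contain both ideas, $P_t(x)$ the fraction in time step $t$, $\bar{P}(x)$ its mean over the $T$ time steps, and any smoothing is omitted.

```latex
\[
\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\, P(y)}
\]
\[
r(x, y) = \frac{\sum_{t=1}^{T} \bigl(P_t(x) - \bar{P}(x)\bigr)\bigl(P_t(y) - \bar{P}(y)\bigr)}
               {\sqrt{\sum_{t=1}^{T} \bigl(P_t(x) - \bar{P}(x)\bigr)^2}\;
                \sqrt{\sum_{t=1}^{T} \bigl(P_t(y) - \bar{P}(y)\bigr)^2}}
\]
```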

slide-8
SLIDE 8

RARELY COOCCUR

8

[Figure: time series, 1980 to 2010, for "immigrant, undocumented" and "illegal, alien"]

slide-9
SLIDE 9

9

                   Rarely cooccur    Always cooccur
Correlated         Arms-race         Friendship
Anti-correlated    Head-to-head      Tryst
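The four relation types can be read as a sign test on the two measures: friendship when two ideas cooccur and are correlated, arms-race when they rarely cooccur but are correlated, tryst when they cooccur but are anti-correlated, and head-to-head when they rarely cooccur and are anti-correlated. The sketch below encodes that reading. Splitting at zero and the function name `classify_relation` are assumptions for illustration.

```python
def classify_relation(pmi, correlation):
    """Map the signs of cooccurrence (PMI) and prevalence correlation
    to one of the four relation types.

    Using zero as the cut-off for both measures is an assumption.
    """
    cooccur = pmi > 0
    correlated = correlation > 0
    if cooccur and correlated:
        return "friendship"
    if cooccur:
        return "tryst"
    if correlated:
        return "arms-race"
    return "head-to-head"


print(classify_relation(0.4, 0.8))
print(classify_relation(-0.4, -0.8))
```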

slide-10
SLIDE 10

10

Highlighted quadrant: Friendship (always cooccur, correlated)

slide-11
SLIDE 11

11

LIKELY TO COOCCUR

[Figure: time series, 1980 to 2010, for "immigrant, undocumented" and "obama, president"]

slide-13
SLIDE 13

13

Highlighted quadrant: Arms-race (rarely cooccur, correlated)

slide-14
SLIDE 14

RARELY COOCCUR

14

[Figure: time series, 1980 to 2010, for "immigration, deportation" and "republican, party"]

slide-16
SLIDE 16

16

Highlighted quadrant: Tryst (always cooccur, anti-correlated)

slide-17
SLIDE 17

LIKELY TO COOCCUR

17

[Figure: time series, 1980 to 2010, for "immigration, deportation" and "detainee, detention"]

slide-18
SLIDE 18

18

We have shown a framework to quantitatively describe relations between ideas. Can we use it to effectively explore relations between ideas?

slide-19
SLIDE 19
  • Newspapers and research articles as datasets

– Immigration
– Terrorism
– Same-sex marriage
– Abortion
– Tobacco
– ACL
– NIPS

19

slide-20
SLIDE 20

20

[Figure: scatter plot of prevalence correlation (axis from -1.0 to 1.0) against cooccurrence (axis from -0.6 to 0.6), one point per pair of ideas; pearsonr = 0.55]

Correlated, but many pairs in all four quadrants!

slide-21
SLIDE 21

21

Strength = |PMI| × |correlation|

Extreme pairs are the interesting ones!

[Figure: scatter plot of prevalence correlation (axis from -1.0 to 1.0) against cooccurrence (axis from -0.6 to 0.6), one point per pair of ideas; pearsonr = 0.55]
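The strength score can be used to sort pairs so that the extreme ones come first. The sketch below applies the slide's formula, Strength = |PMI| × |correlation|, to a handful of invented pairs. The idea names and all numeric values are placeholders, not results from the talk.

```python
def strength(pmi, correlation):
    """Strength of a relation: |PMI| x |correlation|."""
    return abs(pmi) * abs(correlation)


# Invented (pmi, correlation) values for hypothetical idea pairs.
toy_pairs = {
    ("a", "b"): (0.5, 0.9),
    ("a", "c"): (-0.4, -0.8),
    ("b", "c"): (0.05, 0.1),
    ("c", "d"): (0.6, 0.05),
}

# Sort pairs from strongest to weakest relation.
ranked = sorted(toy_pairs.items(), key=lambda item: strength(*item[1]), reverse=True)
for pair, (pmi_value, corr_value) in ranked:
    print(pair, round(strength(pmi_value, corr_value), 3))
```

A pair needs a large magnitude on both measures to rank highly: ("c", "d") has the largest PMI in this toy set but a near-zero correlation, so it falls below both extreme pairs.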

slide-22
SLIDE 22
  • Terrorism

– Keywords
– Topics

22

slide-24
SLIDE 24

24

[Figure: frequency (axis from 0.1 to 0.3) over time, 1980 to 2010, for the keywords "arab" and "islam"]

slide-25
SLIDE 25
  • Terrorism

– Keywords
– Topics

25

slide-26
SLIDE 26

26

The relations between these topics are consistent with structural balance theory: the enemy of an enemy is a friend [Cartwright and Harary, 1956; Heider, 1946]
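Structural balance can be checked mechanically on a triangle of three ideas. In the sketch below each edge carries a sign, and a triangle counts as balanced when the product of its three signs is positive. Encoding friends as +1 and enemies as -1, and the name `is_balanced`, are assumptions for illustration; how the talk maps its four relation types onto signs is not stated here.

```python
FRIEND, ENEMY = 1, -1


def is_balanced(sign_ab, sign_bc, sign_ac):
    """A signed triangle is balanced when the product of its edge signs is positive."""
    return sign_ab * sign_bc * sign_ac > 0


# "The enemy of an enemy is a friend": two negative edges and one positive edge.
print(is_balanced(ENEMY, ENEMY, FRIEND))
# Three mutual enemies form an unbalanced triangle.
print(is_balanced(ENEMY, ENEMY, ENEMY))
```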

slide-27
SLIDE 27

27

Rank among all relations

                                               PMI    Correlation    Joint
Keywords
  arab and islam                               106    1,494          2
Topics
  federal, state and afghanistan, taliban      43     99             2
  federal, state and iran, libya               36     56             2

The “interesting” pair is ranked much higher according to our framework.

slide-28
SLIDE 28

28

[Figure: four panels of time series, 1980 to 2010, each pairing "machine translation" with another topic, arranged by rarely/always cooccur and correlated/anti-correlated]

  • Arms-race: machine translation and sentiment analysis
  • Head-to-head: machine translation and discourse (coherence)
  • Tryst: machine translation and rule, forest methods
  • Friendship: machine translation and word alignment

slide-29
SLIDE 29

29

https://github.com/nwrush/Visualization

slide-30
SLIDE 30

30

Thank you!

chenhao@chenhaot.com, Twitter: @ChenhaoTan

Data & code: https://chenhaot.com/papers/idea-relations.html

  • A quantitative way to describe relations between ideas, based on cooccurrence and prevalence correlation: friendships, head-to-head, arms-race, tryst
  • An effective framework to explore temporal text corpora