SLIDE 1

Who, What, When, Where, and Why?

A Computational Approach to Understanding Historical Events Using State Department Cables

Allison J.B. Chaney

Princeton University

Hanna Wallach

Microsoft Research

David M. Blei

Columbia University

Matthew Connelly

History Lab at Columbia

SLIDE 2

We can do nothing but scrutinize historical events themselves if we want to discover what they are.

– Dean W.R. Matthews, What is an Historical Event?

SLIDE 3

Who? What? Where? When?

observable ways to characterize unobservable events

SLIDE 4

Who? What? Where? When? Why?

observable ways to characterize unobservable events

SLIDE 5

data

  • communications between the U.S. State Department and its embassies (“cables”)

  • around two million cables
  • sent between 1973 and 1977
SLIDE 6

[Figure: BANGKOK, CANBERRA, HONG KONG, MANILA, STATE, SEOUL, SINGAPORE, TOKYO, PHNOM PENH, SAIGON, TAIPEI, PEKING, VIENTIANE]

SLIDE 7
SLIDE 8

key actors

cables, entities, events

SLIDE 9

representing cables

SLIDE 10

representing cables

SLIDE 11

representing cables

SLIDE 12

representing cables

Latent Dirichlet allocation. Blei, Ng, and Jordan, 2003.

SLIDE 13

[Figure: documents sent by SAIGON]

SLIDE 14

typical concerns

[Figure: documents sent by SAIGON]

SLIDE 15

typical concerns

[Figure: documents sent by SAIGON, labeled $\theta_1, \theta_2, \theta_3, \ldots, \theta_d$]

SLIDE 16

typical concerns

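[Figure: documents sent by SAIGON, labeled $\theta_1, \theta_2, \theta_3, \ldots, \theta_d$]

$\phi_{0k} \sim \mathrm{Gamma}(\alpha_\phi, \mu_\phi/\alpha_\phi)$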
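The slide gives only the form of the prior on an entity's typical concerns. A minimal sketch in Python, assuming the second Gamma argument is a scale (so each $\phi_{0k}$ has mean $\mu_\phi$ and variance $\mu_\phi^2/\alpha_\phi$) and using made-up hyperparameters; `sample_typical_concerns` is a hypothetical helper built on the standard library's `random.gammavariate(alpha, beta)`, whose `beta` is a scale:

```python
import random
import statistics

def sample_typical_concerns(num_topics, alpha_phi, mu_phi, rng):
    """Hypothetical helper: draw phi_0k ~ Gamma(shape=alpha_phi, scale=mu_phi / alpha_phi).

    Assumes a shape/scale reading of the slide's Gamma(alpha, mu/alpha), under
    which each draw has mean mu_phi and variance mu_phi**2 / alpha_phi.
    """
    return [rng.gammavariate(alpha_phi, mu_phi / alpha_phi) for _ in range(num_topics)]

rng = random.Random(0)
# Made-up hyperparameters; the slides do not give values.
phi_0 = sample_typical_concerns(num_topics=50, alpha_phi=0.3, mu_phi=0.5, rng=rng)

# Empirical check of the mean over many draws.
many = sample_typical_concerns(num_topics=20000, alpha_phi=0.3, mu_phi=0.5, rng=rng)
print(round(statistics.fmean(many), 2))  # close to mu_phi = 0.5
```

A small $\alpha_\phi$ yields a sparse-looking profile, with most topics near zero and a few large ones; a large $\alpha_\phi$ pulls every topic toward $\mu_\phi$.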

SLIDE 17

typical concerns

[Figure: documents sent by SAIGON; events on a timeline from 1973 to 1977]

SLIDE 18

typical concerns

[Figure: documents sent by SAIGON; events on a timeline from 1973 to 1977]

SLIDE 19

[Figure: timeline, 1973 to 1977]

modeling events

SLIDE 20

[Figure: timeline, 1973 to 1977]

modeling events

WHEN?

$\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon)$

SLIDE 21

[Figure: timeline, 1973 to 1977]

modeling events

WHAT? WHEN?

$\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon), \qquad \pi_{ik} \sim \mathrm{Gamma}(\alpha_\pi, \mu_\pi/\alpha_\pi)$

SLIDE 22

[Figure: timeline, 1973 to 1977]

modeling events

WHAT? WHEN?

$\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon), \qquad \pi_{ik} \sim \mathrm{Gamma}(\alpha_\pi, \mu_\pi/\alpha_\pi)$
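The slides state the two event priors but not how $i$ is indexed or how $\epsilon_i$ enters the cable model. A minimal sketch, assuming $i$ ranges over a fixed set of candidate events (here, hypothetically, one per day of a year) and the same shape/scale reading of the Gamma as for $\phi_{0k}$. Python's `random` module has no Poisson sampler, so the hypothetical `sample_poisson` uses Knuth's multiplication method; all names and values are made up:

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_events(num_events, num_topics, eta_eps, alpha_pi, mu_pi, rng):
    """Hypothetical helper.

    WHEN: epsilon_i ~ Poisson(eta_eps) for each candidate event i.
    WHAT: pi_ik ~ Gamma(shape=alpha_pi, scale=mu_pi / alpha_pi) for each topic k.
    """
    occurrence = [sample_poisson(eta_eps, rng) for _ in range(num_events)]
    description = [
        [rng.gammavariate(alpha_pi, mu_pi / alpha_pi) for _ in range(num_topics)]
        for _ in range(num_events)
    ]
    return occurrence, description

rng = random.Random(0)
# 365 candidate events (one per day) and 10 topics are assumptions for illustration.
eps, pi = sample_events(num_events=365, num_topics=10, eta_eps=0.05,
                        alpha_pi=0.5, mu_pi=1.0, rng=rng)
print(sum(eps), "occurrences across", len(eps), "candidate events")
```

With these made-up values roughly 5% of candidate events get a nonzero $\epsilon_i$, since $1 - e^{-0.05} \approx 0.049$.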

SLIDE 23

modeling cables

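$\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\,\pi_{ik}$

typical concerns: $\phi_{0k}$; decay of relevancy: $f(a_i, c_j)$; event description: $\pi_{ik}$; sum over all events: $\sum_i$

[Figure: timeline, 1973 to 1977]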
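The slide does not specify the decay function $f$ or what $a_i$ and $c_j$ denote. As a hypothetical sketch, assume $a_i$ is the date of event $i$ and $c_j$ the date of cable $j$ (both in days), and that $f$ is zero before the event and decays exponentially afterwards with a made-up time constant `tau`:

```python
import math

def decay(event_time, cable_time, tau=7.0):
    """Hypothetical f(a_i, c_j), assuming a_i and c_j are dates in days:
    no influence before the event, exponential fade after it."""
    if cable_time < event_time:
        return 0.0
    return math.exp(-(cable_time - event_time) / tau)

def expected_topic_weights(phi_0, event_times, event_descriptions, cable_time, tau=7.0):
    """phi_jk = phi_0k + sum over events i of f(a_i, c_j) * pi_ik, for every topic k."""
    phi_j = list(phi_0)  # start from the sender's typical concerns (copy, not alias)
    for a_i, pi_i in zip(event_times, event_descriptions):
        weight = decay(a_i, cable_time, tau)
        for k, pi_ik in enumerate(pi_i):
            phi_j[k] += weight * pi_ik
    return phi_j

phi_0 = [0.5, 0.5, 0.5]                  # typical concerns over 3 topics (made up)
event_times = [10.0, 30.0]               # two events, on day 10 and day 30 (made up)
pi = [[0.0, 4.0, 0.0], [0.0, 0.0, 2.0]]  # each event boosts one topic (made up)

print(expected_topic_weights(phi_0, event_times, pi, cable_time=10.0))  # [0.5, 4.5, 0.5]
print(expected_topic_weights(phi_0, event_times, pi, cable_time=5.0))   # [0.5, 0.5, 0.5]
```

Swapping `decay` for another shape (linear, or a step of fixed length) changes only that one function, which is the kind of variation the future-work slide lists under event decay shapes.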

SLIDE 24

modeling cables

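$\theta_{jk} \sim \mathrm{Gamma}(\alpha_\theta, \phi_{jk}/\alpha_\theta)$

$\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\,\pi_{ik}$

typical concerns: $\phi_{0k}$; decay of relevancy: $f(a_i, c_j)$; event description: $\pi_{ik}$; sum over all events: $\sum_i$

[Figure: timeline, 1973 to 1977]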
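Given $\phi_{jk}$, each cable's own topic weights are drawn around it. A minimal sketch, again assuming a shape/scale reading so that $\theta_{jk}$ has mean $\phi_{jk}$ and variance $\phi_{jk}^2/\alpha_\theta$; the helper name, the vector `phi_j`, and both values of `alpha_theta` are made up for illustration:

```python
import random
import statistics

def sample_cable_topics(phi_j, alpha_theta, rng):
    """Hypothetical helper: theta_jk ~ Gamma(shape=alpha_theta, scale=phi_jk / alpha_theta).

    Each draw has mean phi_jk and variance phi_jk**2 / alpha_theta, so a large
    alpha_theta keeps cable j close to phi_j and a small one lets it deviate more.
    """
    return [rng.gammavariate(alpha_theta, phi_jk / alpha_theta) for phi_jk in phi_j]

rng = random.Random(0)
phi_j = [0.5, 4.5, 0.5]  # made-up expected topic weights for one cable

results = {}
for alpha_theta in (0.5, 10.0):
    # Track topic 1 across many simulated cables that share the same phi_j.
    draws = [sample_cable_topics(phi_j, alpha_theta, rng)[1] for _ in range(20000)]
    results[alpha_theta] = (statistics.fmean(draws), statistics.pvariance(draws))
    print(alpha_theta, [round(v, 2) for v in results[alpha_theta]])
```

With these values the variance for `alpha_theta=0.5` should come out near 40.5 and for `alpha_theta=10.0` near 2.0, matching $\phi_{jk}^2/\alpha_\theta$ with $\phi_{jk} = 4.5$, while both means stay near 4.5.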

SLIDE 25

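modeling cables

$\theta_{jk} \sim \mathrm{Gamma}(\alpha_\theta, \phi_{jk}/\alpha_\theta)$

$\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\,\pi_{ik}$

typical concerns: $\phi_{0k}$; decay of relevancy: $f(a_i, c_j)$; event description: $\pi_{ik}$; sum over all events: $\sum_i$

[Figure: timeline, 1973 to 1977, with three events marked "?"]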

SLIDE 26

learned parameters: entity typical concerns, event occurrence, event content

[Diagram: observed data + model assumptions → black box variational inference → learned parameters → exploration]

How do we find the values of the hidden parameters that best fit the data?

Black box variational inference. Ranganath, Gerrish, and Blei, 2014.
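The slides name black box variational inference as the fitting method but give no details. The sketch below is not the cable model: it applies the score-function ELBO gradient at the heart of BBVI to a toy conjugate model ($z \sim N(0, 1)$, $x_n \mid z \sim N(z, 1)$) whose exact posterior is known, so the result can be checked. The leave-one-out baseline (a simple control variate), the AdaGrad-style step sizes, and all function names are illustrative assumptions:

```python
import math
import random

# Toy model, not the cable model; all names here are hypothetical.
def log_joint(z, n, sum_x, sum_x2):
    """log p(x, z) up to a constant for z ~ N(0, 1) and x_n | z ~ N(z, 1)."""
    return -0.5 * z * z - 0.5 * (sum_x2 - 2.0 * z * sum_x + n * z * z)

def log_q(z, m, log_s):
    """log N(z; m, s^2) up to a constant, with s = exp(log_s)."""
    return -log_s - 0.5 * ((z - m) / math.exp(log_s)) ** 2

def bbvi(data, num_iters=2000, num_samples=50, step=0.5, seed=0):
    """Fit q(z) = N(m, s^2) by stochastic ascent on score-function ELBO gradients.
    num_samples must be at least 2 for the leave-one-out baseline."""
    rng = random.Random(seed)
    n, sum_x, sum_x2 = len(data), sum(data), sum(x * x for x in data)
    m, log_s = 0.0, 0.0
    hist_m = hist_s = 1e-8  # AdaGrad accumulators of squared gradients
    for _ in range(num_iters):
        s = math.exp(log_s)
        zs = [rng.gauss(m, s) for _ in range(num_samples)]
        # Only log p and log q are evaluated: no model-specific gradients are needed.
        fs = [log_joint(z, n, sum_x, sum_x2) - log_q(z, m, log_s) for z in zs]
        total = sum(fs)
        grad_m = grad_s = 0.0
        for z, f in zip(zs, fs):
            baseline = (total - f) / (num_samples - 1)  # leave-one-out control variate
            grad_m += (z - m) / (s * s) * (f - baseline)           # score wrt m
            grad_s += (((z - m) / s) ** 2 - 1.0) * (f - baseline)  # score wrt log s
        grad_m /= num_samples
        grad_s /= num_samples
        hist_m += grad_m ** 2
        hist_s += grad_s ** 2
        m += step * grad_m / math.sqrt(hist_m)
        log_s += step * grad_s / math.sqrt(hist_s)
    return m, math.exp(log_s)

data_rng = random.Random(1)
data = [data_rng.gauss(1.0, 1.0) for _ in range(20)]  # synthetic observations
m, s = bbvi(data)
# Exact posterior for this conjugate toy model: N(sum(x) / (n + 1), 1 / (n + 1)).
print(round(m, 3), round(sum(data) / 21, 3))
print(round(s, 3), round(1 / math.sqrt(21), 3))
```

Because the estimator only needs to evaluate $\log p$ and $\log q$ at samples, the same recipe applies to models without convenient conjugacy, which is what makes BBVI attractive for a custom model like the one above.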

SLIDE 27

validation

  • compare discovered events to manually collected examples of known historical events (and corresponding cables)
      • How many of the known events are recovered?
      • How does the average topic distribution of the known cables compare to the discovered event distribution?
  • present the discovered events (date, topic distribution, and entities involved) to an expert historian
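One way to make the two validation questions concrete, as a hypothetical sketch: count a known event as recovered if some discovered event falls within a window of days around it, and compare topic distributions with the Hellinger distance. The window size, the choice of distance, the helper names, and all dates and weights are assumptions, not the authors' protocol:

```python
import math
from datetime import date

def recall(known_dates, discovered_dates, window_days=3):
    """Fraction of known events with at least one discovered event within +/- window_days."""
    hits = sum(
        any(abs((known - found).days) <= window_days for found in discovered_dates)
        for known in known_dates
    )
    return hits / len(known_dates)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def average_distribution(cable_topic_weights):
    """Average the normalized topic distributions of a known event's cables."""
    dists = [normalize(w) for w in cable_topic_weights]
    return [sum(column) / len(dists) for column in zip(*dists)]

def hellinger(p, q):
    """Hellinger distance: 0 for identical distributions, 1 for disjoint ones."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

# All dates and weights below are made up for illustration.
known = [date(1975, 1, 10), date(1975, 6, 2), date(1976, 9, 15)]
discovered = [date(1975, 1, 12), date(1976, 9, 20)]
print(recall(known, discovered, window_days=3))  # 1 of 3 known events recovered
print(recall(known, discovered, window_days=7))  # 2 of 3

known_cables = [[2.0, 1.0, 1.0], [1.0, 1.0, 2.0]]  # topic weights of two known cables
event_pi = normalize([3.0, 2.0, 3.0])                # a discovered event's description
print(hellinger(average_distribution(known_cables), event_pi))  # 0.0: same distribution
```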

SLIDE 28

exploration: Saigon

[Figure: Saigon's typical concerns, $\phi_0$]

SLIDE 29

exploration: Saigon

[Figure: event description $\pi_i$]

SLIDE 30

exploration: Saigon

[Figure: event description $\pi_i$]

SLIDE 31
results summary

  • topic models can describe documents, but they cannot identify when events occur
  • we explicitly model event occurrences and event content
  • our model can be used to identify and explore events

SLIDE 32
future work

  • Main next step: share events between entities
  • Other areas:
      • include interactions between entities
      • learn event duration
      • explore different event decay shapes
      • thorough model validation

SLIDE 33

Thank you!

Questions and suggestions welcome.