LDA
LDA 2
[Credits: Mike Smith, Las Vegas Sun 2013]
[Credits: IITD Library]
In text, the hidden variables are the thematic structure. What are the topics that describe this collection? How does a new document fit into the topic structure?
Credits: [David Blei, KDD12]
P(topics, proportions, assignments | documents)
[Figure: LDA graphical model (plate notation) with θ, z_{d,n}, w_{d,n} and β]
[Figure: LDA graphical model with Dirichlet priors α (on θ) and η (on β)]
[Credits: Wikipedia]
Topic 1: PGM (β_1): Bayesian 0.1, Markov 0.09, Network 0.07, Inference 0.07, …
Topic 2: ML (β_2): Inference 0.2, Posterior 0.15, Regression 0.1, Gradient 0.09, …
Topic 3: AI (β_3): Markov 0.09, Reinforcement 0.08, Planning 0.08, …
Topic 4: Deep Learning (β_4): Backpropagation 0.15, Convolution 0.1, LSTM 0.09, Dropout 0.07, …
θ_d: Topic 1: 0.7, Topic 2: 0.1, Topic 3: 0.15, Topic 4: 0.05
z_{d,n} = Topic 1
w_{d,n} = Markov
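A minimal sketch of LDA's generative story for the example above, using only Python's standard library. Assumptions: the function names are hypothetical, θ_d is drawn from a Dirichlet with a user-chosen α, and because the slide's word lists are truncated ("…"), the listed probabilities are used as relative weights rather than as a complete distribution.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    while True:
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        if total > 0.0:  # guard against every draw underflowing to 0.0
            return [g / total for g in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document following LDA's generative story.

    topics: list of dicts word -> weight, one per topic (the beta_k)
    alpha:  Dirichlet parameters for the topic proportions theta_d
    Returns theta_d and a list of (z_dn, w_dn) pairs.
    """
    theta = sample_dirichlet(alpha, rng)
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # z_dn ~ Categorical(theta_d)
        words, weights = zip(*topics[z].items())
        w = rng.choices(words, weights=weights)[0]             # w_dn ~ Categorical(beta_z)
        doc.append((z, w))
    return theta, doc

# Word weights copied from the (truncated) topic lists above; used as relative weights.
topics = [
    {"Bayesian": 0.1, "Markov": 0.09, "Network": 0.07, "Inference": 0.07},
    {"Inference": 0.2, "Posterior": 0.15, "Regression": 0.1, "Gradient": 0.09},
    {"Markov": 0.09, "Reinforcement": 0.08, "Planning": 0.08},
    {"Backpropagation": 0.15, "Convolution": 0.1, "LSTM": 0.09, "Dropout": 0.07},
]
theta, doc = generate_document(topics, alpha=[0.5] * 4, n_words=8, rng=random.Random(0))
print(theta)
print(doc)
```

Each (z, w) pair plays the role of (z_{d,n}, w_{d,n}): a pair such as (0, "Markov") corresponds to the slide's example of a word drawn from Topic 1.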
[Figures: draws from a Dirichlet distribution with α = 1, 10, 100, then α = 1, 0.1, 0.01]
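The Dirichlet slides vary a single parameter (1, 10, 100, then 0.1, 0.01). The sketch below reproduces the qualitative effect under stated assumptions: a symmetric Dirichlet over 10 components (the component count and the helper names are hypothetical), sampled by normalizing Gamma draws. It reports the average size of the largest component: large parameter values spread mass evenly (close to 1/10), while small values concentrate almost all mass on one component.

```python
import random

def sample_symmetric_dirichlet(alpha, k, rng):
    """Draw one point from Dirichlet(alpha, ..., alpha) on k components."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0.0:  # very small alpha can underflow every draw to 0.0
            return [g / total for g in draws]

def max_share(alpha, k=10, n_samples=2000, seed=0):
    """Average weight of the largest component over n_samples draws."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(alpha, k, rng)) for _ in range(n_samples)) / n_samples

for alpha in (100, 10, 1, 0.1, 0.01):
    print(alpha, round(max_share(alpha), 3))
```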
$$p(\theta, \beta, z \mid w) = \frac{p(\theta, \beta, z, w)}{\int_{\theta}\int_{\beta} \sum_{z} p(\theta, \beta, z, w)}$$
x_{1:n}, z_{1:n}
p(θ, z)
p(z_{1:n})
p(z_i | z_{−i}), with z_{−i} = all z except z_i
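Inference for LDA is commonly done by Gibbs sampling, which resamples each topic assignment given all the others (z_{−i}). The sketch below is the standard collapsed Gibbs sampler, in which θ and β are integrated out analytically. Whether the slides use exactly this variant is an assumption, as are the function name, the symmetric priors α and η, and the integer word ids.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, alpha, eta, n_iters, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha (on theta) and eta (on beta).

    docs: list of documents, each a list of integer word ids in [0, V).
    Each sweep resamples every assignment z_i from
        p(z_i = k | z_{-i}, w) proportional to (n_dk + alpha) * (n_kw + eta) / (n_k + V * eta),
    where all counts exclude token i itself.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    n_dk = [[0] * n_topics for _ in docs]               # tokens in doc d assigned to topic k
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    n_k = [0] * n_topics                                # tokens assigned to topic k overall

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initialization of the assignments.
    z = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            update(d, w, k, +1)
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # counts now describe z_{-i}
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + eta) / (n_k[k] + vocab_size * eta)
                    for k in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)
    return z, n_dk, n_kw

docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, n_topics=2, alpha=0.1, eta=0.01, n_iters=200)
print(z)
```

The document-length term (n_d + Kα) is omitted from the weights because it is the same for every topic k and cancels when normalizing.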
LDA
Entity       Type     Date
iPad         Launch   Mar 7
Steve Jobs   Death    Oct 6
Yelp         IPO      Mar 2
Claim: This is worth investigating
http://statuscalendar.com
- [Prachi] Events shown as URL
[Nupur] Model architecture
[Happy] Normalization?
[Shantanu, Surag] Error accumulation
[Himanshu, Prachi] Reliance on POS tagger
Since the spread of the printing press
TimeBank
MUC & ACE competitions
- Limited to narrow domains
- Performance is still not great
- Short
- Easy to write (even on mobile devices)
- Instantly and widely disseminated
- Many irrelevant messages
- Many redundant messages
`2m', `2ma', `2mar', `2mara', `2maro', `2marrow', `2mor', `2mora', `2moro', `2morow', `2morr', `2morro', `2morrow', `2moz', `2mr', `2mro', `2mrrw', `2mrw', `2mw', `tmmrw', `tmo', `tmoro', `tmorrow', `tmoz', `tmr', `tmro', `tmrow', `tmrrow', `tmrrw', `tmrw', `tmrww', `tmw', `tomaro', `tomarow', `tomarro', `tomarrow', `tomm', `tommarow', `tommarrow', `tommoro', `tommorow', `tommorrow', `tommorw', `tommrow', `tomo', `tomolo', `tomoro', `tomorow', `tomorro', `tomorrw', `tomoz', `tomrw', `tomz'
"The Hobbit has FINALLY started filming! I cannot wait!"
"watchng american dad."
- Annotated 2400 tweets (about 34K tokens)
- Train on in-domain data
[Figure: precision (P), recall (R) and F score (F) for Stanford vs. T-NER]
Sports, Politics, Product releases, …
- Allow more customized calendars
- Could be useful in upstream tasks
- Might start talking about different things
- Might want to focus on different groups of users
Generative Probabilistic Models
- Discovers types which match the data
- No need to annotate individual events
- Don't need to commit to a specific set of types
- Modular, can integrate into various applications
Each Event Phrase is modeled as a mixture of types
Each Event Type is Associated with a Distribution over Entities and Dates
P(SPORTS | cheered) = 0.6
P(POLITICS | cheered) = 0.4
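Combining the two ideas above (a phrase's mixture over types, and each type's distribution over entities), a type can be inferred for a specific tweet by Bayes' rule. This is a simplified sketch, not the paper's inference procedure: it ignores dates, assumes the entity depends on the phrase only through the type, and the entity probabilities (and the entities "Lakers" and "Obama") are hypothetical. Only the P(type | cheered) values come from the slide.

```python
def type_posterior(p_type_given_phrase, p_entity_given_type, entity):
    """Posterior over event types for one (event phrase, entity) pair.

    P(type | phrase, entity) is proportional to P(type | phrase) * P(entity | type),
    assuming the entity depends on the phrase only through the type.
    """
    scores = {
        t: p * p_entity_given_type[t].get(entity, 0.0)
        for t, p in p_type_given_phrase.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError(f"entity {entity!r} has zero probability under every type")
    return {t: s / total for t, s in scores.items()}

# P(type | "cheered") as on the slide.
p_type_given_cheered = {"SPORTS": 0.6, "POLITICS": 0.4}
# Hypothetical entity distributions, for illustration only.
p_entity_given_type = {
    "SPORTS": {"Lakers": 0.05, "Obama": 0.001},
    "POLITICS": {"Lakers": 0.001, "Obama": 0.05},
}
print(type_posterior(p_type_given_cheered, p_entity_given_type, "Obama"))
```

With these made-up numbers, "cheered" paired with "Obama" shifts most of the mass to POLITICS even though the phrase alone slightly favors SPORTS.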
[Happy, Arindam, Akshay, Surag, Dinesh R] Liked
[Akshay] New entities?
[Anshul] Sensitive to parameters
1,000 iterations of burn-in
Parallelized sampling (approximation) using MPI
[Newman et al. 2009]
[Happy, Nupur] Disliked manual annotation
[Anshul] "Legal", "Food" not event categories
Using types discovered by the topic model
Supervised classification using 10-fold cross-validation
Treat event phrases as a bag of words
[Nupur] Multiple entity events?
[Nupur, Anshul] Very simple baseline
What they ate for lunch
Entities such as McDonald's would be frequent on most days
Only show an entity if it appears more often than expected
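$$G^2 = \sum_{x \in \{e, \neg e\},\, y \in \{d, \neg d\}} O_{x,y} \times \ln\left(\frac{O_{x,y}}{E_{x,y}}\right)$$
O_{e,d}, O_{e,¬d}, …: observed fractions of tweets that do or do not mention entity e and date d; E_{e,d}, …: the corresponding fractions expected if e and d were independent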
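A small sketch of the G² score above, assuming the four cells of the 2×2 table are derived from tweet counts (the function name and argument names are hypothetical). Because E_{x,y} is the product of the marginal fractions, this sum equals the mutual information (in nats) between "mentions e" and "mentions d", so it is never negative and is 0 exactly when the observed table matches independence. The textbook G² test statistic is 2N times this quantity, which leaves rankings unchanged when N is fixed.

```python
import math

def g2_score(n_total, n_entity, n_date, n_both):
    """Sum over x in {e, not e}, y in {d, not d} of O_xy * ln(O_xy / E_xy).

    n_total:  number of tweets considered
    n_entity: tweets mentioning entity e
    n_date:   tweets mentioning date d
    n_both:   tweets mentioning both e and d
    """
    if n_total <= 0 or not (0 <= n_both <= min(n_entity, n_date)) \
            or n_entity + n_date - n_both > n_total:
        raise ValueError("inconsistent counts")
    observed = {
        ("e", "d"): n_both,
        ("e", "not_d"): n_entity - n_both,
        ("not_e", "d"): n_date - n_both,
        ("not_e", "not_d"): n_total - n_entity - n_date + n_both,
    }
    p_x = {"e": n_entity / n_total, "not_e": 1 - n_entity / n_total}
    p_y = {"d": n_date / n_total, "not_d": 1 - n_date / n_total}
    score = 0.0
    for (x, y), count in observed.items():
        o = count / n_total
        if o > 0:  # terms with O = 0 contribute 0 (0 * ln 0 is taken as 0)
            e = p_x[x] * p_y[y]  # expected fraction under independence
            score += o * math.log(o / e)
    return score

print(g2_score(100, 10, 20, 2))   # co-occurs exactly as often as expected
print(g2_score(100, 10, 10, 10))  # entity only ever appears with this date
```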
[Happy, Akshay, Shantanu, Nupur, Anshul, Rishab, Dinesh R] Liked
[Barun, Shantanu] Same event on multiple days?
[Rishab] Why not χ²?
[Akshay, Barun] Liked
No named entity recognition
Rely on a significance test to rank n-grams
A few extra heuristics (filter out temporal expressions, etc.)
End-to-end Evaluation