Probabilistic Content Models. Marc Schulder, Saarland University


  1. Probabilistic Content Models. Marc Schulder, Saarland University, presenting "Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization" (Barzilay & Lee, 2004)

  2. Probabilistic Content Models
     • Aim: model the topical structure of a text
     • Means: Hidden Markov Models, language bigrams, clustering
     • Tasks: sentence ordering, extractive summarization

  3. Reminder: Hidden Markov Model. [Diagram: states y_1, y_2, y_3 connected by transitions; each state y_i emits an observation x_i.]

  4. Reminder: Hidden Markov Model. [Diagram: states N, V, N connected by transitions; they emit the observations romanes, eunt, domus.]

  5. Reminder: Hidden Markov Model. [Diagram: the same model with a start symbol: $ → N → V → N, emitting romanes, eunt, domus.]

  6. Reminder: Hidden Markov Model. [Diagram: $ → N → V → N with transition probabilities P(N|$), P(V|N), P(N|V) and emission probabilities P(romanes|N), P(eunt|V), P(domus|N).]

  7. Reminder: Hidden Markov Model. For $ → N → V → N emitting romanes, eunt, domus: P(N|$) * P(romanes|N) * P(V|N) * P(eunt|V) * P(N|V) * P(domus|N) = P(romanes eunt domus, N V N | $)
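
The product on slide 7 can be computed directly by walking along the sequence and multiplying one transition and one emission probability per step. The sketch below is only an illustration: the numeric probabilities and the function name `sequence_probability` are made up, since the slides give no values.

```python
# Illustrative probabilities only; the slides do not give numeric values.
transition = {("$", "N"): 0.5, ("N", "V"): 0.4, ("V", "N"): 0.6}
emission = {("N", "romanes"): 0.1, ("V", "eunt"): 0.2, ("N", "domus"): 0.3}


def sequence_probability(tags, words, start="$"):
    """Multiply transition and emission probabilities along the sequence."""
    prob = 1.0
    prev = start
    for tag, word in zip(tags, words):
        prob *= transition[(prev, tag)] * emission[(tag, word)]
        prev = tag
    return prob


# Probability of the tags N V N together with the words, given the start symbol.
p = sequence_probability(["N", "V", "N"], ["romanes", "eunt", "domus"])
```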

  8. HMM as Content Model. [Diagram: states y_1, y_2, y_3 connected by transitions; each state y_i emits an observation x_i.]

  9. HMM as Content Model. [Diagram: topics t_1, t_2, t_3 connected by transitions; each topic t_i emits a sentence s_i.]

  11. Sentences as Bigram Word Sequences. Example sentence: romanes eunt domus. P(romanes eunt domus) = P(romanes|$) * P(eunt|romanes) * P(domus|eunt)
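
A bigram sentence probability like the one on slide 11 can be estimated from counts. The sketch below is a minimal illustration under assumptions not stated on the slides: a toy two-sentence corpus, add-delta smoothing with `delta=0.1`, and the hypothetical helper names `train_bigrams` and `sentence_probability`.

```python
from collections import Counter


def train_bigrams(sentences, start="$"):
    """Count context words and bigrams, prefixing each sentence with a start symbol."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = [start] + sentence.split()
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()}
    return contexts, bigrams, vocab


def sentence_probability(sentence, contexts, bigrams, vocab, delta=0.1, start="$"):
    """Product of add-delta smoothed bigram probabilities P(word | previous word)."""
    tokens = [start] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, word)] + delta) / (contexts[prev] + delta * len(vocab))
    return prob


# Toy corpus, chosen only for illustration.
contexts, bigrams, vocab = train_bigrams(["romanes eunt domus", "romani ite domum"])
seen = sentence_probability("romanes eunt domus", contexts, bigrams, vocab)
unseen = sentence_probability("domus eunt romanes", contexts, bigrams, vocab)
```

Smoothing keeps the probability of an unseen word order above zero while still ranking the observed order higher.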

  12. HMM as Content Model. [Diagram: topics t_1, t_2, t_3 connected by transitions; each topic t_i emits a sentence s_i.]

  13. Topics as Sentence Clusters
      • Group together similar sentences to form topics
      • A topic is defined by its content
      • But: what does "similar" mean?
      • Here: using the same words

  14. Topics as Sentence Clusters. Step 1: Make text generic. Replace proper names, numbers and dates with placeholders.
      Before: The U.S. Geological Survey said the June earthquake was centered 121 miles west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles.

  15. Topics as Sentence Clusters. Step 1: Make text generic. Replace proper names, numbers and dates with placeholders.
      After: The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
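
The placeholder step on slides 14 and 15 can be approximated with simple token rules. The slides do not say how names, numbers and dates are detected, so everything below is an assumption made for illustration: month names count as dates, digit strings as numbers, and any capitalized token that is not sentence-initial as part of a name. The function name `make_generic` is hypothetical.

```python
import re

MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}


def make_generic(sentence):
    """Replace dates, numbers and (heuristically) proper names with placeholders."""
    tokens = sentence.split()
    result = []
    for i, token in enumerate(tokens):
        core = token.rstrip(",;:!?")
        if i == len(tokens) - 1:
            core = core.rstrip(".")       # sentence-final period is not part of the word
        trail = token[len(core):]         # punctuation to re-attach afterwards
        if core in MONTHS:
            label = "DATE"
        elif re.fullmatch(r"\d+(?:[.,]\d+)*", core):
            label = "NUM"
        elif i > 0 and core[:1].isupper():
            label = "NAME"                # crude heuristic, not real name recognition
        else:
            label = None
        if label is None:
            result.append(token)
        elif label == "NAME" and result and result[-1] == "NAME":
            result[-1] = "NAME" + trail   # merge multi-word names into one placeholder
        else:
            result.append(label + trail)
    return " ".join(result)


generic = make_generic(
    "The U.S. Geological Survey said the June earthquake was centered 121 miles "
    "west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles."
)
```

A real system would more likely rely on a named-entity recognizer; the capitalization rule fails for names at the start of a sentence and for capitalized common nouns.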

  16. Topics as Sentence Clusters. Step 2: Group similar texts together. Sentence similarity = cosine of bigram vectors.
      • The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
      • NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
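
The similarity measure named on slide 16 can be sketched as follows: each sentence becomes a vector of bigram counts, and two sentences are compared by the cosine of their vectors. Whitespace tokenization and the function names are assumptions made for this illustration.

```python
import math
from collections import Counter


def bigram_vector(sentence):
    """Represent a sentence as a sparse vector of word-bigram counts."""
    tokens = sentence.split()
    return Counter(zip(tokens, tokens[1:]))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[bigram] for bigram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = ("The NAME said the DATE earthquake was centered NUM miles "
      "west-northwest of NAME on NAME island, at a depth of NUM miles.")
s2 = ("NAME of NAME's NAME said the quake which was felt in some cities "
      "on NAME did not have the potential to trigger a tsunami.")
similarity = cosine_similarity(bigram_vector(s1), bigram_vector(s2))
```

The two example sentences share only a few bigrams, such as "NAME said" and "said the", so their similarity is small but not zero.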

  17. Topics as Sentence Clusters
      • The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital.
      • Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME.
      • The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
      • NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
      • The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said.
      • It was initially reported as a NUM magnitude but quickly downgraded.

  19. Topics as Sentence Clusters
      Grouped together:
      • The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital.
      • The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
      • Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME.
      • The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said.
      Remaining sentences:
      • NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
      • It was initially reported as a NUM magnitude but quickly downgraded.

  20. Topics as Sentence Clusters
      Location Information:
      • The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital.
      • The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
      • Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME.
      • The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said.
      Remaining sentences:
      • NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
      • It was initially reported as a NUM magnitude but quickly downgraded.

  21. Topics as Sentence Clusters
      Location Information:
      • The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital.
      • The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
      • Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME.
      • The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said.
      Grouped together:
      • NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
      • It was initially reported as a NUM magnitude but quickly downgraded.

  22. Topics as Sentence Clusters
      Location Information:
      • The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital.
      • The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
      • Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME.
      • The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said.
      Etcetera:
      • NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
      • It was initially reported as a NUM magnitude but quickly downgraded.

  23. Topics as Sentence Clusters. Step 3: Viterbi re-estimation
      1. Compute probabilities based on the initial topic clusters
      2. Let the HMM predict the topics of the sentences
      3. Put each sentence in its predicted topic cluster
      4. Rinse and repeat
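
The four steps of slide 23 can be sketched as a loop. This is a simplified illustration, not the presented system: it assumes add-delta smoothing for both emissions and transitions, a uniform start distribution, a fixed number of topics `k`, and hypothetical function names. Documents are lists of sentences; `labels` holds one topic index per sentence.

```python
import math
from collections import Counter


def bigrams(sentence):
    tokens = ["$"] + sentence.split()
    return list(zip(tokens, tokens[1:]))


def estimate(docs, labels, k, delta=0.1):
    """Step 1: smoothed emission and transition models from the current clusters."""
    vocab = {w for doc in docs for s in doc for w in s.split()}
    ctx = [Counter() for _ in range(k)]
    big = [Counter() for _ in range(k)]
    trans, out = Counter(), Counter()
    for doc, labs in zip(docs, labels):
        for sentence, topic in zip(doc, labs):
            for prev, word in bigrams(sentence):
                ctx[topic][prev] += 1
                big[topic][(prev, word)] += 1
        for a, b in zip(labs, labs[1:]):
            trans[(a, b)] += 1
            out[a] += 1

    def log_emit(topic, sentence):
        return sum(math.log((big[topic][(p, w)] + delta)
                            / (ctx[topic][p] + delta * len(vocab)))
                   for p, w in bigrams(sentence))

    def log_trans(a, b):
        return math.log((trans[(a, b)] + delta) / (out[a] + delta * k))

    return log_emit, log_trans


def viterbi(doc, k, log_emit, log_trans):
    """Step 2: most likely topic sequence for one document (uniform start assumed)."""
    score = [log_emit(t, doc[0]) for t in range(k)]
    back = []
    for sentence in doc[1:]:
        pointers, new_score = [], []
        for t in range(k):
            best = max(range(k), key=lambda p: score[p] + log_trans(p, t))
            pointers.append(best)
            new_score.append(score[best] + log_trans(best, t) + log_emit(t, sentence))
        back.append(pointers)
        score = new_score
    path = [max(range(k), key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def reestimate(docs, labels, k, max_iter=20):
    """Steps 3 and 4: reassign sentences to predicted topics until nothing changes."""
    for _ in range(max_iter):
        log_emit, log_trans = estimate(docs, labels, k)
        new_labels = [viterbi(doc, k, log_emit, log_trans) for doc in docs]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data: the first sentence of the third document starts in the wrong cluster.
docs = [["quake hit NAME", "NUM people died"] for _ in range(3)]
final_labels = reestimate(docs, [[0, 1], [0, 1], [1, 1]], k=2)
```

On the toy data the loop moves the mislabeled sentence into the same cluster as its identical copies; `max_iter` only guards against cycling.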

  24. HMM as Content Model. [Diagram: topics t_1, t_2, t_3 connected by transitions; each topic t_i emits a sentence s_i.]

  25. HMM as Content Model. [Diagram: transitions between topics t_1, t_2, t_3; emissions to sentences.]

  26. HMM as Content Model. [Diagram: transitions between topics; emissions to sentences.]

  27. Evaluation

  28. Evaluation 1: Information Ordering

  29. Evaluation 1: Information Ordering. 5 domains:
      • Earthquakes
      • Clashes between armies and rebel groups
      • Drug-related criminal offenses
      • Financial reports
      • Aviation accidents

  30. Evaluation 1: Information Ordering

      Domain        Average Length   Standard Deviation   Vocabulary   Token/Type
      Earthquakes   10.4             5.2                  1182         13.2
      Clashes       14.0             2.6                  1302         4.5
      Drugs         10.3             7.5                  1566         4.1
      Finance       13.7             1.6                  1378         12.8
      Accidents     11.5             6.3                  2003         5.6

  31. Evaluation 1: Information Ordering
      A powerful earthquake with a magnitude of 6.4 on Saturday struck off Mentawai islands in western Indonesia, causing panic but officials said there were no reports of damages or casualties. The U.S. Geological Survey said the earthquake was centered 121 miles west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles. Hardimansyah Maitam, a local maritime patrol officer, said residents in Sikakap town on North Pagai, an island in the Mentawai chain, poured into the streets and ran to higher ground as the quake struck.
