Probabilistic Content Models, Marc Schulder, Saarland University (PowerPoint presentation)

SLIDE 1

Probabilistic Content Models

Marc Schulder Saarland University presenting

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

Barzilay & Lee (2004)

SLIDE 2

Probabilistic Content Models

Aim
  • Model the topical structure of a text
Means
  • Hidden Markov Models
  • Language bigrams
  • Clustering
Tasks
  • Sentence ordering
  • Extractive summarization

SLIDE 3

Reminder: Hidden Markov Model

[Figure: states y1, y2, y3 linked by transitions; each state emits an observation x1, x2, x3 (emission)]

SLIDE 4

Reminder: Hidden Markov Model

[Figure: states N, V, N linked by transitions; each state emits one observation: romanes, eunt, domus]

SLIDE 5

Reminder: Hidden Markov Model

[Figure: a start state $ precedes the states N, V, N (transitions); the states emit romanes, eunt, domus (emissions)]

SLIDE 6

Reminder: Hidden Markov Model

States: $ → N → V → N; observations: romanes, eunt, domus

Transitions: P(N|$), P(V|N), P(N|V)
Emissions: P(romanes|N), P(eunt|V), P(domus|N)

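SLIDE 7

Reminder: Hidden Markov Model

States: $ → N → V → N; observations: romanes, eunt, domus

P(N|$) * P(romanes|N) * P(V|N) * P(eunt|V) * P(N|V) * P(domus|N) = P(romanes eunt domus, N V N)

Multiplying all transition and emission probabilities gives the joint probability of the words and the state sequence.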
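
The product above can be checked with a few lines of Python. This is a minimal sketch: the transition and emission tables are invented toy values (not estimated from any data), and `joint_probability`, `TRANSITIONS` and `EMISSIONS` are illustrative names.

```python
# Toy HMM for the "romanes eunt domus" example; all numbers are invented.
TRANSITIONS = {                      # P(next state | previous state)
    "$": {"N": 0.8, "V": 0.2},       # "$" is the start state
    "N": {"N": 0.3, "V": 0.7},
    "V": {"N": 0.8, "V": 0.2},
}
EMISSIONS = {                        # P(word | state)
    "N": {"romanes": 0.5, "domus": 0.5},
    "V": {"eunt": 1.0},
}


def joint_probability(words, states, transitions, emissions, start="$"):
    """Return P(words, states): the product of every transition and emission."""
    if len(words) != len(states):
        raise ValueError("need exactly one state per observation")
    prob, prev = 1.0, start
    for word, state in zip(words, states):
        prob *= transitions[prev].get(state, 0.0)  # transition prev -> state
        prob *= emissions[state].get(word, 0.0)    # emission state -> word
        prev = state
    return prob


p = joint_probability("romanes eunt domus".split(), ["N", "V", "N"], TRANSITIONS, EMISSIONS)
print(f"{p:.3f}")  # 0.8*0.5 * 0.7*1.0 * 0.8*0.5 = 0.112
```

Any state that cannot emit a word (here V cannot emit "romanes") makes the whole joint probability 0.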

SLIDE 8

HMM as Content Model

[Figure: states y1, y2, y3 linked by transitions; each state emits an observation x1, x2, x3]

SLIDE 9

HMM as Content Model

[Figure: topic states t1, t2, t3 linked by transitions; each topic emits a sentence s1, s2, s3]

SLIDE 11

Sentences as Bigram Word Sequences

P(romanes eunt domus) = P(romanes|$) * P(eunt|romanes) * P(domus|eunt)
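
A minimal sketch of this bigram sentence probability, assuming plain maximum-likelihood estimates from a two-sentence toy corpus. In a content model, each topic state would presumably use such a model, trained on its own sentence cluster, as its emission probability; a real model also needs smoothing, because a single unseen bigram here makes the whole sentence probability 0. `train_bigrams` and `sentence_probability` are hypothetical names.

```python
from collections import Counter


def train_bigrams(sentences, start="$"):
    """Maximum-likelihood estimates of P(word | previous word); no smoothing."""
    pair_counts, history_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = [start] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[prev, word] += 1
            history_counts[prev] += 1
    return {pair: n / history_counts[pair[0]] for pair, n in pair_counts.items()}


def sentence_probability(sentence, bigrams, start="$"):
    """P(w1 ... wn) = P(w1 | $) * P(w2 | w1) * ... * P(wn | wn-1)."""
    tokens = [start] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigrams.get((prev, word), 0.0)  # an unseen bigram gives probability 0
    return prob


model = train_bigrams(["romanes eunt domus", "romani ite domum"])
print(sentence_probability("romanes eunt domus", model))  # 0.5 * 1.0 * 1.0 = 0.5
```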

SLIDE 12

HMM as Content Model

[Figure: topic states t1, t2, t3 linked by transitions; each topic emits a sentence s1, s2, s3]

SLIDE 13

Topics as Sentence Clusters

A topic is defined by its content: group similar sentences together to form topics.

But what does "similar" mean? Here: using the same words.

SLIDE 14

Topics as Sentence Clusters

Step 1: Make the text generic by replacing proper names, numbers and dates with placeholders.

The U.S. Geological Survey said the June earthquake was centered 121 miles west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles.

SLIDE 15

Topics as Sentence Clusters

Step 1: Make the text generic by replacing proper names, numbers and dates with placeholders.

The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
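
Step 1 can be approximated with regular expressions. This is a rough heuristic sketch, not the procedure used in the paper: month and weekday names become DATE, digit sequences become NUM, and runs of capitalized words become NAME, except that a sentence-initial word such as "The" is kept. A sentence-initial proper name (e.g. "Hardimansyah Maitam") would therefore be only partly replaced; a named-entity recognizer would do better. `make_generic` and the patterns are hypothetical.

```python
import re

# Heuristic patterns (an assumption, not the paper's procedure).
DATE_WORDS = (
    "January|February|March|April|May|June|July|August|September|October|"
    "November|December|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
)
DATE_RE = re.compile(rf"\b(?:{DATE_WORDS})\b")
NUM_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")
# One capitalized word that is not already a placeholder; "U.S."-style dots allowed.
CAP_WORD = r"(?!(?:NAME|NUM|DATE)\b)[A-Z](?:[A-Za-z]+|(?:\.[A-Z])+\.)?"
NAME_RUN_RE = re.compile(rf"\b{CAP_WORD}(?:\s+{CAP_WORD})*")
SENTENCE_START_RE = re.compile(r"(?:^|[.!?]\s+)$")


def _replace_name_run(match):
    """Collapse a run of capitalized words into a single NAME placeholder."""
    words = match.group(0).split()
    if SENTENCE_START_RE.search(match.string[: match.start()]):
        # Keep a sentence-initial word such as "The": it is capitalized by position.
        return words[0] + (" NAME" if len(words) > 1 else "")
    return "NAME"


def make_generic(text):
    """Replace dates, numbers and (heuristically) proper names with placeholders."""
    text = DATE_RE.sub("DATE", text)
    text = NUM_RE.sub("NUM", text)
    return NAME_RUN_RE.sub(_replace_name_run, text)


print(make_generic(
    "The U.S. Geological Survey said the June earthquake was centered 121 miles "
    "west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles."
))
# The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.
```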

SLIDE 16

Topics as Sentence Clusters

Step 2: Group similar sentences together. Sentence similarity = cosine of their word-bigram vectors.

The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.

NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami.
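
Step 2 can be sketched as follows: each sentence becomes a vector of word-bigram counts, similarity is the cosine between two vectors, and sentences are merged bottom-up. The slides do not say which clustering algorithm groups the sentences; complete-link agglomerative clustering with a fixed number of clusters is used here as one plausible choice (an assumption), and all function names are hypothetical. Tokenization is a plain whitespace split, so punctuation stays attached to words.

```python
import math
from collections import Counter


def bigram_vector(sentence):
    """Word-bigram counts of a (placeholder-normalized) sentence."""
    words = sentence.lower().split()
    return Counter(zip(words, words[1:]))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def complete_link(sentences, num_clusters):
    """Merge clusters bottom-up until `num_clusters` remain.

    Complete link: two clusters are as similar as their LEAST similar sentence pair.
    """
    vectors = [bigram_vector(s) for s in sentences]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > num_clusters:
        _, i, j = max(
            (min(sim[a][b] for a in ci for b in cj), i, j)
            for i, ci in enumerate(clusters)
            for j, cj in enumerate(clusters)
            if i < j
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [[sentences[k] for k in cluster] for cluster in clusters]


s1 = ("The NAME said the DATE earthquake was centered NUM miles west-northwest "
      "of NAME on NAME island, at a depth of NUM miles.")
s2 = ("The temblor was centered NUM kilometers (NUM miles) northwest of the provincial "
      "capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said.")
s3 = "It was initially reported as a NUM magnitude but quickly downgraded."
print(complete_link([s1, s2, s3], num_clusters=2))  # s1 and s2 share "was centered NUM"
```

Scoring a pair of clusters by its least similar sentence pair tends to keep each cluster tight around one kind of content.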

SLIDE 17

The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital. The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.

Topics as Sentence Clusters

Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME. NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. It was initially reported as a NUM magnitude but quickly downgraded.

SLIDE 20

The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital. The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.

Topics as Sentence Clusters

Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME. NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. It was initially reported as a NUM magnitude but quickly downgraded.

Location Information

SLIDE 22

The NAME seismological institute said the temblor’s epicenter was located NUM kilometers (NUM miles) south of the capital. The temblor was centered NUM kilometers (NUM miles) northwest of the provincial capital of NAME, about NUM kilometers (NUM miles) southwest of NAME, a bureau seismologist said. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.

Topics as Sentence Clusters

Seismologists in NAME’s NAME said the temblor’s epicenter was about NUM kilometers (NUM miles) north of the provincial capital NAME. NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. It was initially reported as a NUM magnitude but quickly downgraded.

Location Information Etcetera

SLIDE 23

Topics as Sentence Clusters

Step 3: Viterbi re-estimation

  • 1. Estimate the HMM's probabilities from the initial topic clusters
  • 2. Let the HMM predict the topic of each sentence
  • 3. Move each sentence into its predicted topic cluster
  • 4. Rinse and repeat
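
Step 2 of this loop relies on Viterbi decoding, which finds the most probable state sequence for a sequence of observations. The sketch below reuses the part-of-speech toy example from the HMM reminder with invented probabilities; in the content model the states would be topics, the observations sentences, and each emission probability would come from a topic's bigram model. Log probabilities avoid numerical underflow on long documents. `viterbi` and the toy tables are illustrative, not the paper's code.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state sequence for the observations, plus its log-probability."""
    # best[s]: log-probability of the best path that ends in state s so far
    best = {s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(observations[0], 0.0))
            for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            score, arg = max((prev[r] + _log(trans_p[r].get(s, 0.0)), r) for r in states)
            best[s] = score + _log(emit_p[s].get(obs, 0.0))
            pointers[s] = arg
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]


STATES = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"romanes": 0.5, "domus": 0.5}, "V": {"eunt": 1.0}}

path, logp = viterbi("romanes eunt domus".split(), STATES, START, TRANS, EMIT)
print(path, round(math.exp(logp), 3))  # ['N', 'V', 'N'] 0.112
```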

SLIDE 24

HMM as Content Model

[Figure: topic states t1, t2, t3 linked by transitions; each topic emits a sentence s1, s2, s3]

SLIDE 25

HMM as Content Model

[Figure: topic states t1, t2, t3; labels: Topic, Sentence, Transition, Emission]

SLIDE 26

HMM as Content Model

[Figure: content model diagram; labels: Topic, Sentence, Transition, Emission]

SLIDE 27

Evaluation

SLIDE 28

Evaluation 1 Information Ordering

SLIDE 29

Evaluation 1 Information Ordering

5 Domains

  • Earthquakes
  • Clashes between armies and rebel groups
  • Drug-related criminal offenses
  • Financial reports
  • Aviation accidents

SLIDE 30

Evaluation 1 Information Ordering

Domain       Average length (sentences)  Standard deviation  Vocabulary  Token/type
Earthquakes  10.4                        5.2                 1182        13.2
Clashes      14.0                        2.6                 1302        4.5
Drugs        10.3                        7.5                 1566        4.1
Finance      13.7                        1.6                 1378        12.8
Accidents    11.5                        6.3                 2003        5.6

SLIDE 31

Evaluation 1 Information Ordering

A powerful earthquake with a magnitude of 6.4 on Saturday struck off Mentawai islands in western Indonesia, causing panic but officials said there were no reports of damages or casualties. The U.S. Geological Survey said the earthquake was centered 121 miles west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles. Hardimansyah Maitam, a local maritime patrol officer, said residents in Sikakap town on North Pagai, an island in the Mentawai chain, poured into the streets and ran to higher ground as the quake struck.

SLIDE 34

Evaluation 1 Information Ordering

Hardimansyah Maitam, a local maritime patrol officer, said residents in Sikakap town on North Pagai, an island in the Mentawai chain, poured into the streets and ran to higher ground as the quake struck. A powerful earthquake with a magnitude of 6.4 on Saturday struck off Mentawai islands in western Indonesia, causing panic but officials said there were no reports of damages or casualties. The U.S. Geological Survey said the earthquake was centered 121 miles west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles.

3 sentences → 3! = 3*2*1 = 6 possible orderings
4 sentences → 4! = 4*3*2*1 = 24 possible orderings

SLIDE 35

Evaluation 1 Information Ordering

Domain       Average length (sentences)
Earthquakes  10.4
Clashes      14.0
Drugs        10.3
Finance      13.7
Accidents    11.5

Hardimansyah Maitam, a local maritime patrol officer, said residents in Sikakap town on North Pagai, an island in the Mentawai chain, poured into the streets and ran to higher ground as the quake struck. A powerful earthquake with a magnitude of 6.4 on Saturday struck off Mentawai islands in western Indonesia, causing panic but officials said there were no reports of damages or casualties. The U.S. Geological Survey said the earthquake was centered 121 miles west-northwest of Bengkulu on Sumatra island, at a depth of 14 miles.

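3 sentences → 3! = 3*2*1 = 6 possible orderings
4 sentences → 4! = 4*3*2*1 = 24 possible orderings
10 sentences → 10! = 10*9*...*1 = 3,628,800 possible orderings
14 sentences → 14! = 14*13*...*1 = 87,178,291,200 possible orderings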
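
The growth can be computed directly for each domain's average length from the table above. A small sketch; rounding the averages to whole sentences is an assumption made only for this illustration (Python's `round` sends 11.5 to the even number 12).

```python
import math

# Average text length in sentences per domain, copied from the table above.
AVERAGE_LENGTH = {"Earthquakes": 10.4, "Clashes": 14.0, "Drugs": 10.3,
                  "Finance": 13.7, "Accidents": 11.5}

for domain, length in AVERAGE_LENGTH.items():
    n = round(length)  # whole sentences; round() sends 11.5 to the even 12
    print(f"{domain:<12}{n:>3} sentences: {math.factorial(n):,} orderings")
```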

SLIDE 36

Evaluation 1 Information Ordering

Task
  • 1. Create all possible sentence orderings
  • 2. Assign a probability to each ordering
  • 3. Sort the orderings by probability

Metric
  • Original sentence order (OSO) rank: position of the original sentence order in the sorted list

Baseline
  • Word bigram model
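
The task and metric can be sketched with `itertools.permutations`. Assumptions: `score` stands in for the content model (here a hypothetical topic-transition toy), and the rank counts how many orderings score strictly higher than the original, so 0 means the original order came out on top. This 0-based convention fits the average rank of 0.05 reported for Finance in the results, but the tie handling is an assumption. Enumerating all n! orderings is only practical for short texts.

```python
from itertools import permutations


def oso_rank(sentences, score):
    """Number of orderings that `score` strictly prefers to the original order.

    0 means the original sentence order (OSO) was ranked highest.
    """
    original = score(tuple(sentences))
    return sum(1 for order in permutations(sentences) if score(order) > original)


# Hypothetical stand-in for a content model: each sentence has a fixed topic,
# and an ordering is scored by multiplying topic-to-topic transition probabilities.
TOPIC = {"quake struck": "summary", "epicenter located": "location", "residents fled": "details"}
TRANSITION = {
    ("$", "summary"): 0.7, ("$", "location"): 0.2, ("$", "details"): 0.1,
    ("summary", "location"): 0.6, ("summary", "details"): 0.4,
    ("location", "summary"): 0.3, ("location", "details"): 0.7,
    ("details", "summary"): 0.2, ("details", "location"): 0.8,
}


def toy_score(order):
    prob, prev = 1.0, "$"
    for sentence in order:
        prob *= TRANSITION.get((prev, TOPIC[sentence]), 0.0)
        prev = TOPIC[sentence]
    return prob


print(oso_rank(["quake struck", "epicenter located", "residents fled"], toy_score))  # 0
print(oso_rank(["epicenter located", "residents fled", "quake struck"], toy_score))  # 2
```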

SLIDE 37

Evaluation 1 Information Ordering

Domain       System   Rank
Earthquakes  Content  2.67
Earthquakes  Bigram   485.16
Clashes      Content  3.05
Clashes      Bigram   635.15
Drugs        Content  15.38
Drugs        Bigram   712.03
Finance      Content  0.05
Finance      Bigram   7.44
Accidents    Content  10.96
Accidents    Bigram   973.75

SLIDE 38

Evaluation 1 Information Ordering

State of the art
  • Lapata (2003): pairwise sentence ordering

Metric
  • OSO prediction rate: percentage of test texts for which the original sentence order is ranked highest
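
Given one OSO rank per test text, both reported metrics follow directly. A minimal sketch, assuming rank 0 means the original order was ranked highest; `summarize_ranks` is a hypothetical name and the ranks in the example are invented.

```python
def summarize_ranks(ranks):
    """Average OSO rank and OSO prediction rate over a set of test texts.

    `ranks` holds one OSO rank per test text (0 = original order ranked highest).
    """
    if not ranks:
        raise ValueError("no test texts")
    average_rank = sum(ranks) / len(ranks)
    prediction_rate = sum(1 for r in ranks if r == 0) / len(ranks)
    return average_rank, prediction_rate


avg, rate = summarize_ranks([0, 0, 3, 0, 12])  # invented ranks for five test texts
print(f"average rank {avg:.2f}, prediction rate {rate:.0%}")  # average rank 3.00, prediction rate 60%
```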

SLIDE 39

Evaluation 1 Information Ordering

Domain       System   Rank     Prediction rate
Earthquakes  Content  2.67     72%
Earthquakes  Bigram   485.16   4%
Earthquakes  Lapata   n/a      24%
Clashes      Content  3.05     48%
Clashes      Bigram   635.15   12%
Clashes      Lapata   n/a      27%
Drugs        Content  15.38    38%
Drugs        Bigram   712.03   11%
Drugs        Lapata   n/a      27%
Finance      Content  0.05     96%
Finance      Bigram   7.44     66%
Finance      Lapata   n/a      18%
Accidents    Content  10.96    41%
Accidents    Bigram   973.75   2%
Accidents    Lapata   n/a      10%

SLIDE 40

Evaluation 2 Summarization

SLIDE 41

Evaluation 2 Summarization

Task
  • Shorten each text to the length of its gold-standard summary

Data
  • Domain: earthquakes
  • Summaries written by AP journalists
  • 60 documents, 900 sentences
  • 50% training, 50% testing
  • Avg. document: 15 sentences
  • Avg. summary: 6 sentences

SLIDE 42

Evaluation 2 Summarization

Baseline
  • Pick the first L sentences (L = length of the gold summary)

Sentence classifier (Kupiec et al., 1999)
  • Looks at the words and where they are in the sentence, but not at the connections between sentences

SLIDE 43

Evaluation 2 Summarization

Content Model
  • 1. Assign topics to all sentences in the documents
      • a. Count in how many documents each topic appears
  • 2. Assign topics to all sentences in the summaries
      • a. Count in how many summaries each topic appears

SLIDE 44

NAME, a local maritime patrol officer, said residents in NAME town on NAME, an island in the NAME chain, poured into the streets and ran to higher ground as the quake struck. NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.

Evaluation 2 Summarization

DATE's quake happened NUM days after a magnitude NUM tremor killed at least NUM people and damaged more than NUM houses and buildings in NAME province. A powerful earthquake with a magnitude of NUM on DATE struck off NAME islands in western NAME, causing panic but officials said there were no reports of damages or casualties. A magnitude NUM earthquake off NAME in DATE triggered a tsunami, killing NUM people in NUM countries.

Document

NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles. A powerful earthquake with a magnitude of NUM on DATE struck off NAME islands in western NAME, causing panic but officials said there were no reports of damages or casualties.

Summary

Topic     In document  In summary
Summary   1            1
Location  1            1
Details   1            1
Previous  1            0

SLIDE 45

NAME, a local maritime patrol officer, said residents in NAME town on NAME, an island in the NAME chain, poured into the streets and ran to higher ground as the quake struck. NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles.

Evaluation 2 Summarization

DATE's quake happened NUM days after a magnitude NUM tremor killed at least NUM people and damaged more than NUM houses and buildings in NAME province. A powerful earthquake with a magnitude of NUM on DATE struck off NAME islands in western NAME, causing panic but officials said there were no reports of damages or casualties. A magnitude NUM earthquake off NAME in DATE triggered a tsunami, killing NUM people in NUM countries.

[Figure: topic labels on the document sentences: Previous event (two sentences), Location Information, Event Summary, Event Details]

Document

NAME of NAME's NAME said the quake which was felt in some cities on NAME did not have the potential to trigger a tsunami. The NAME said the DATE earthquake was centered NUM miles west-northwest of NAME on NAME island, at a depth of NUM miles. A powerful earthquake with a magnitude of NUM on DATE struck off NAME islands in western NAME, causing panic but officials said there were no reports of damages or casualties.

[Figure: topic labels on the summary sentences: Location Information, Event Summary, Event Details]

Summary

Topic     In document  In summary
Summary   1            1
Location  1            1
Details   1            1
Previous  1            0

SLIDE 46

Evaluation 2 Summarization

Content Model
  • 1. Assign topics to all sentences in the documents
      • a. Count in how many documents each topic appears
  • 2. Assign topics to all sentences in the summaries
      • a. Count in how many summaries each topic appears
  • 3. Probability = summary count / document count
  • 4. Choose the sentences whose topic has a high probability of appearing in summaries
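
Steps 3 and 4 can be sketched as below, assuming the topic of every sentence has already been assigned by the content model. Counting each topic at most once per text follows the slide's "in how many documents"; the topic labels, example data and function names are hypothetical, and ties are broken by document order (an assumption).

```python
from collections import Counter


def topic_summary_probability(doc_topics, summary_topics):
    """Step 3: P(topic in summary) = summary count / document count.

    Each argument has one list per training text: the topics assigned to the
    sentences of that document (or of its gold summary).
    """
    doc_count = Counter(t for topics in doc_topics for t in set(topics))
    summary_count = Counter(t for topics in summary_topics for t in set(topics))
    return {t: summary_count[t] / n for t, n in doc_count.items()}


def extract_summary(sentences, topics, prob, length):
    """Step 4: keep the `length` sentences whose topics score highest.

    sorted() is stable (also with reverse=True), so ties go to the earlier
    sentence; the chosen sentences are returned in document order.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: prob.get(topics[i], 0.0), reverse=True)
    return [sentences[i] for i in sorted(ranked[:length])]


# Invented training data: topics per document and per gold summary.
docs = [["summary", "location", "details", "previous"],
        ["summary", "details", "previous"],
        ["summary", "location"]]
summaries = [["summary", "location", "details"], ["summary"], ["summary", "location"]]
prob = topic_summary_probability(docs, summaries)

new_doc = ["Earlier quake killed NUM people.", "It was centered NUM miles west of NAME.",
           "A magnitude NUM quake struck NAME.", "Residents ran to higher ground."]
new_topics = ["previous", "location", "summary", "details"]
print(extract_summary(new_doc, new_topics, prob, length=2))
```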

SLIDE 47

Evaluation 2 Summarization

System               Accuracy
Baseline             69%
Sentence classifier  76%
Content model        88%

SLIDE 48

Evaluation 3 Relation between Tasks

[Figure: ordering and summarization performance (0% to 100%) plotted against the number of topic clusters (10 to 80)]

SLIDE 49

Conclusion

Probabilistic content model
  • Hidden Markov Model
  • Clustering
Applications
  • Sentence ordering
  • Extractive summarization

SLIDE 50

Questions?

SLIDE 51

Sources

  • Barzilay & Lee (2004). Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization.
  • Lapata (2003). Probabilistic text structuring: Experiments with sentence ordering.
  • Kupiec et al. (1999). A trainable document summarizer.