Category Specific Information for Guided Summarization, Jun-Ping Ng

SLIDE 1

Exploiting Category Specific Information for Guided Summarization

Jun-Ping Ng, Praveen Bysani, Ziheng Lin, Min-Yen Kan, Chew-Lim Tan
National University of Singapore

Monday, November 14, 11

SLIDE 2

Outline

  • System Overview
  • Category Specific Features
  • Evaluation and Discussion

SLIDE 3

System Overview

SLIDE 4

Hypothesis

  • Word frequency distribution across different categories should be different
  • Some words are more important in certain categories
  • e.g. ‘health’ is more salient in “Health and Safety Issues”

SLIDE 5

What are those words?

Category:  Attacks      Health    Endangered
           people       people    years
           minister     food      state
           told         years     national
           government   new       •
           two          health    water

SLIDE 6

A Hint of Sentence Saliency

  • Two ways to look at the difference in word distribution
  • Frequency - Words which are used more are more important
  • Difference in usage - Words which are used differently from the “usual” are more important

SLIDE 7

Category Specific Information

  • Category Relevance Score
  • Category KL-Divergence

SLIDE 8

Category Relevance Score

  • Intuition - A word that appears across many documents within a topic and category is more useful
  • Linearly weight topic and document frequency scores
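
The linear weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the exact formula, so the function name `category_relevance_scores`, the mixing weight `alpha`, and the normalisation by topic and document counts are all hypothetical choices.

```python
from collections import defaultdict


def category_relevance_scores(topics, alpha=0.5):
    """Score each word in one category (hypothetical formulation).

    topics: dict mapping a topic id to a list of documents,
            each document being a list of tokens.
    alpha:  assumed weight on topic frequency; (1 - alpha) goes to
            document frequency.
    """
    n_topics = len(topics)
    n_docs = sum(len(docs) for docs in topics.values())
    topic_freq = defaultdict(int)  # number of topics containing the word
    doc_freq = defaultdict(int)    # number of documents containing the word

    for docs in topics.values():
        topic_vocab = set()
        for doc in docs:
            doc_vocab = set(doc)
            for word in doc_vocab:
                doc_freq[word] += 1
            topic_vocab |= doc_vocab
        for word in topic_vocab:
            topic_freq[word] += 1

    # Linear combination of the two normalised frequencies.
    return {
        word: alpha * topic_freq[word] / n_topics
        + (1 - alpha) * doc_freq[word] / n_docs
        for word in doc_freq
    }


example_topics = {
    "t1": [["health", "risk"], ["health"]],
    "t2": [["health", "food"], ["food"]],
}
crs = category_relevance_scores(example_topics)
```

In this toy input, "health" occurs in both topics and in three of the four documents, so it ranks above "food" and "risk".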

SLIDE 9

Category KL-Divergence

  • Intuition - The use of a word varies according to the category an article is written in.
  • KL-Divergence between frequency of word across all categories vs specific category
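
One way to read this is as a per-word KL-divergence term that compares a word's relative frequency inside a category with its relative frequency over all categories. The sketch below makes that reading concrete. The function name `category_kl_scores` and the choice of direction (category distribution against the overall distribution) are assumptions, not taken from the slides.

```python
import math
from collections import Counter


def category_kl_scores(category_tokens, all_tokens):
    """Per-word KL term p_cat(w) * log(p_cat(w) / p_all(w)).

    Assumes category_tokens is drawn from all_tokens, so every word
    seen in the category also has a non-zero overall frequency.
    """
    cat_counts = Counter(category_tokens)
    all_counts = Counter(all_tokens)
    cat_total = sum(cat_counts.values())
    all_total = sum(all_counts.values())

    scores = {}
    for word, count in cat_counts.items():
        p_cat = count / cat_total
        p_all = all_counts[word] / all_total
        scores[word] = p_cat * math.log(p_cat / p_all)
    return scores


health_tokens = ["health", "health", "people", "water"]
corpus_tokens = health_tokens + ["people", "people", "attack", "attack"]
ckld = category_kl_scores(health_tokens, corpus_tokens)
```

Words used more heavily in the category than overall get a positive score, and words used less heavily get a negative one.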

SLIDE 10

Generic Features

  • Bigram document frequency
  • Backoff model with unigram and bigram document frequencies
  • Sentence position
  • Sentence length
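
The slides do not spell out the backoff model, so the following is a minimal sketch of one plausible form: score a sentence by the document frequency of its bigrams, and fall back to a discounted unigram document frequency when a bigram was never seen. The names `document_frequencies` and `backoff_score` and the `discount` value are hypothetical.

```python
from collections import Counter


def document_frequencies(docs):
    """Count how many documents contain each unigram and each bigram."""
    uni_df, bi_df = Counter(), Counter()
    for doc in docs:
        uni_df.update(set(doc))
        bi_df.update(set(zip(doc, doc[1:])))
    return uni_df, bi_df


def backoff_score(sentence, uni_df, bi_df, n_docs, discount=0.4):
    """Average per-bigram score, backing off to unigrams for unseen bigrams."""
    bigrams = list(zip(sentence, sentence[1:]))
    if not bigrams:
        return 0.0
    total = 0.0
    for w1, w2 in bigrams:
        if bi_df[(w1, w2)] > 0:
            total += bi_df[(w1, w2)] / n_docs
        else:
            # Assumed backoff: discounted mean of the two unigram frequencies.
            total += discount * (uni_df[w1] + uni_df[w2]) / (2 * n_docs)
    return total / len(bigrams)


docs = [
    ["oil", "spill", "hits", "coast"],
    ["oil", "spill", "spreads"],
    ["coast", "guard", "responds"],
]
uni_df, bi_df = document_frequencies(docs)
score = backoff_score(["oil", "spill", "coast"], uni_df, bi_df, len(docs))
```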

SLIDE 11

Update Summarization

  • Update summaries generated in similar fashion
  • But we take into account existing snippets from Set A

Typical MMR: penalise sentences similar to those in Set A
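
As a rough illustration of MMR-style selection for the update task, the sketch below greedily picks sentences by relevance minus a redundancy penalty, where redundancy is measured against Set A sentences as well as sentences already chosen. Jaccard overlap, the `penalty` weight, and the function names are assumptions made for the example. The slides do not say which similarity measure or weighting was used.

```python
def jaccard(a, b):
    """Token-set overlap between two tokenised sentences."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def select_update_sentences(candidates, relevance, set_a_sentences,
                            k=2, penalty=0.7):
    """Greedy MMR-style selection; returns indices into candidates."""
    selected = []
    seen = list(set_a_sentences)  # content the reader already knows
    remaining = list(range(len(candidates)))

    def mmr(i):
        redundancy = max((jaccard(candidates[i], s) for s in seen),
                         default=0.0)
        return relevance[i] - penalty * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr)
        selected.append(best)
        seen.append(candidates[best])
        remaining.remove(best)
    return selected


candidates = [
    ["quake", "hits", "city"],
    ["rescue", "teams", "arrive"],
    ["aid", "reaches", "survivors"],
]
relevance = [0.9, 0.8, 0.5]
set_a = [["quake", "hits", "city"]]
picked = select_update_sentences(candidates, relevance, set_a)
```

Here the most relevant candidate repeats a Set A sentence, so it is penalised and the two novel sentences are selected instead.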

SLIDE 12

Evaluation

  • Against ROUGE-2

[Chart: ROUGE-2 on Set A and Set B for NUS1, NUS2, Baseline1 and Baseline2]
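
For readers unfamiliar with the metric, ROUGE-2 is based on bigram overlap between a system summary and reference summaries. The sketch below computes a simplified recall-only version with clipped counts. It omits the stemming, stopword handling and other options of the official ROUGE toolkit, and the function names are hypothetical.

```python
from collections import Counter


def bigram_counts(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2_recall(candidate, references):
    """Fraction of reference bigrams that also appear in the candidate."""
    cand = bigram_counts(candidate)
    matched = total = 0
    for ref in references:
        ref_counts = bigram_counts(ref)
        total += sum(ref_counts.values())
        # Clip each match at the number of times the candidate has the bigram.
        matched += sum(min(c, cand[bg]) for bg, c in ref_counts.items())
    return matched / total if total else 0.0


system = "the cat sat on the mat".split()
reference = "the cat lay on the mat".split()
recall = rouge_2_recall(system, [reference])
```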

SLIDE 13

What is Important?

[Chart: ROUGE-2 on Set A and Set B, comparing CRS, CKLD and CRS-CKLD]

SLIDE 14

All Features

[Chart: ROUGE-2 on Set A and Set B, comparing CRS, CKLD, CRS-CKLD, BDFS, SL and SP]

SLIDE 15

Future Work

  • Do better studies to determine influence of category specific information
  • Exploit aspect-level information

SLIDE 16

Thank You

  • Word distribution within and outside a category plays a significant role in sentence selection

  • Category relevance score
  • Category KL-Divergence score
