
Exploiting Category Specific Information for Guided Summarization



  1. Exploiting Category Specific Information for Guided Summarization
     Jun-Ping Ng, Praveen Bysani, Ziheng Lin, Min-Yen Kan, Chew-Lim Tan
     National University of Singapore
     Monday, November 14, 2011

  2. Outline
     • System Overview
     • Category Specific Features
     • Evaluation and Discussion

  3. System Overview

  4. Hypothesis
     • Word frequency distribution across different categories should be different
     • Some words are more important in certain categories
     • e.g. ‘health’ is more salient in “Health and Safety Issues”

  5. What are those words?

     Category:   Attacks      Health    Endangered
                 people       people    years
                 minister     food      state
                 told         years     national
                 government   new       ---
                 two          health    water

  6. A Hint of Sentence Saliency
     • Two ways to look at the difference in word distribution
       • Frequency - Words which are used more are more important
       • Difference in usage - Words which are used differently from the “usual” are more important

  7. Category Specific Information
     • Category Relevance Score
     • Category KL-Divergence

  8. Category Relevance Score
     • Intuition - A word that appears across many documents within a topic and category is more useful
     • Linearly weight topic and document frequency scores
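The linear weighting on this slide could be sketched as follows. This is only an illustration under stated assumptions: the slide does not give the exact formula, so the function name `category_relevance_scores`, the weight `alpha`, and the definition of the two frequencies (fraction of topics and fraction of documents in a category that contain the word) are all hypothetical.

```python
from collections import defaultdict


def category_relevance_scores(category_docs, alpha=0.5):
    """Score each word in one category (hypothetical formulation).

    category_docs maps a topic id to its documents, each a list of tokens.
    score(w) = alpha * topic_frequency(w) + (1 - alpha) * document_frequency(w)
    """
    n_topics = len(category_docs)
    n_docs = sum(len(docs) for docs in category_docs.values())
    topic_count = defaultdict(int)  # number of topics containing the word
    doc_count = defaultdict(int)    # number of documents containing the word
    for docs in category_docs.values():
        topic_vocab = set()
        for doc in docs:
            doc_vocab = set(doc)
            for word in doc_vocab:
                doc_count[word] += 1
            topic_vocab |= doc_vocab
        for word in topic_vocab:
            topic_count[word] += 1
    return {
        word: alpha * topic_count[word] / n_topics
        + (1 - alpha) * doc_count[word] / n_docs
        for word in doc_count
    }
```

With two topics of two documents each, a word found in both topics and three of the four documents would score 0.5 * 1.0 + 0.5 * 0.75 = 0.875 under the default weight.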

  9. Category KL-Divergence
     • Intuition - The use of a word varies according to the category an article is written in.
     • KL-Divergence between frequency of word across all categories vs specific category
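One way to read this slide is as a per-word contribution to the KL divergence between the category's word distribution and the distribution over all categories. The sketch below assumes that reading; the function name `category_kld_scores`, the direction of the divergence, and the lack of smoothing are assumptions, not details taken from the slides.

```python
import math
from collections import Counter


def category_kld_scores(category_tokens, all_tokens):
    """Per-word term p(w) * log(p(w) / q(w)) (hypothetical formulation).

    p is the relative frequency of w within the category and q its relative
    frequency across all categories. Assumes all_tokens includes the
    category's own tokens, so q(w) > 0 for every scored word.
    """
    cat_counts = Counter(category_tokens)
    all_counts = Counter(all_tokens)
    n_cat = sum(cat_counts.values())
    n_all = sum(all_counts.values())
    scores = {}
    for word, count in cat_counts.items():
        p = count / n_cat
        q = all_counts[word] / n_all
        scores[word] = p * math.log(p / q)
    return scores
```

A word that is relatively more frequent in the category than overall gets a positive score, and a relatively less frequent word gets a negative one. Summing the scores over the category vocabulary gives the full KL divergence, which is non-negative.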

  10. Generic Features
      • Bigram document frequency
      • Backoff model with unigram and bigram document frequencies
      • Sentence position
      • Sentence length
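Three of these generic features could be computed as in the minimal sketch below. The names `bigram_document_frequency` and `sentence_features`, the averaging of bigram frequencies over a sentence, and the position formula are assumptions made for illustration. The backoff model is left out because the slide does not specify it.

```python
def bigrams(tokens):
    """Return the adjacent token pairs of a sentence."""
    return list(zip(tokens, tokens[1:]))


def bigram_document_frequency(documents):
    """Map each bigram to the fraction of documents containing it.

    documents is a list of documents, each a list of tokenized sentences.
    """
    counts = {}
    for doc in documents:
        seen = {bg for sentence in doc for bg in bigrams(sentence)}
        for bg in seen:
            counts[bg] = counts.get(bg, 0) + 1
    return {bg: c / len(documents) for bg, c in counts.items()}


def sentence_features(sentence, index, n_sentences, bigram_df):
    """Feature values for one sentence (hypothetical definitions)."""
    bgs = bigrams(sentence)
    # Average document frequency of the sentence's bigrams.
    avg_df = sum(bigram_df.get(bg, 0.0) for bg in bgs) / len(bgs) if bgs else 0.0
    return {
        "bigram_df": avg_df,
        "position": 1.0 - index / n_sentences,  # earlier sentences score higher
        "length": len(sentence),
    }
```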

  11. Update Summarization
      • Update summaries generated in similar fashion
      • But we take into account existing snippets from Set A
      • Typical MMR: penalise sentences similar to those in Set A
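A greedy Maximal Marginal Relevance (MMR) selection that also penalises similarity to Set A might look like the sketch below. The slides do not say how similarity is measured or how the penalty is weighted, so the Jaccard similarity, the trade-off weight `lam`, and the function name `mmr_select` are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two token lists."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(candidates, relevance, k, set_a_sentences=(), lam=0.7):
    """Greedily pick up to k sentences, trading relevance against redundancy.

    Redundancy is the maximum similarity to any sentence already selected
    or, for update summaries, to any sentence from Set A.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        seen = [candidates[j] for j in selected] + list(set_a_sentences)

        def mmr(i):
            redundancy = max((jaccard(candidates[i], s) for s in seen), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]
```

Passing the Set A summary sentences as `set_a_sentences` lowers the score of any Set B candidate that repeats them, while leaving the argument empty gives plain MMR.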

  12. Evaluation
      • Against ROUGE-2
      [Chart: ROUGE-2 scores (axis from 0 to 0.14) for NUS1, NUS2, Baseline1 and Baseline2 on Set A and Set B]

  13. What is Important?
      [Chart: ROUGE-2 (axis from -0.003 to 0.003) on Set A and Set B; series: -CRS, -CKLD, -CRS-CKLD]

  14. All Features
      [Chart: ROUGE-2 (axis from -0.05 to 0.013) on Set A and Set B; series: -CRS, -CKLD, -CRS-CKLD, -BDFS, -SL, -SP]

  15. Future Work
      • Do better studies to determine influence of category specific information
      • Exploit aspect-level information

  16. Thank You
      • Word distribution within and outside a category plays a significant role in sentence selection
      • Category relevance score
      • Category KL-Divergence score
