A Symbolic Summarizer: Fabrizio Gotti, Guy Lapalme (PowerPoint presentation)



SLIDE 1

GOFAISUM: A Symbolic Summarizer

Fabrizio Gotti, Guy Lapalme (Université de Montréal)
Luka Nerima, Éric Wehrli (Université de Genève)

SLIDE 2

Originality of our approach

  • Symbolic approach
  • Syntactic parser that produces an XML file
  • Tree transformations using only XSLT rules (700 lines):
    no Java, no C++, no Perl, no Python...
  • No outside language information
  • No gazetteer, no WordNet
  • ROUGE used only for evaluation, not within the system itself

SLIDE 3

[Figure: parse tree produced by FIPS for the sentence below]

The U.S. government said today that the European Union (EU) should accept Turkey as its new member in the future.

Node labels used in the tree:

  • AdvP: Adverbial phrase
  • AP: Adjectival phrase
  • CP: Complementizer phrase
  • DP: Determiner phrase
  • NP: Noun phrase
  • PAR: Parenthetical phrase
  • PONC: Punctuation
  • PP: Prepositional phrase
  • TP: Tense phrase
  • VP: Verb phrase

SLIDE 4


Full parse of sentences: 71% for documents, 93% for topics

SLIDE 5

System pipeline

  • Inputs: topic and news cluster
  • Topic preprocessing and cluster preprocessing
  • FIPS
  • Sentence scoring
  • Sentence selection
  • Sentence post-processing
  • Output: summary

SLIDE 6


Minutes per cluster (25 articles): 0.1, 4.0, 4.0, 0.1 (total: 8.2)

SLIDE 7

Sentence scoring

  • Word-based tf⋅idf similarity score (15%)
  • Lemma-based tf⋅idf similarity score (50%)
  • Lemma-based tf⋅idf similarity score with node depth (5%)
  • Sentence weight (20%)
  • Absolute sentence position (10%)
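
The slide gives only the five weights, not how they are combined. A minimal sketch, assuming a plain weighted sum over feature scores that have already been normalized to [0, 1]; the feature names and the `combined_score` function are hypothetical labels introduced here for illustration.

```python
# Weights as listed on the slide. The dictionary keys are hypothetical names.
WEIGHTS = {
    "word_tfidf": 0.15,         # word-based tf.idf similarity
    "lemma_tfidf": 0.50,        # lemma-based tf.idf similarity
    "lemma_tfidf_depth": 0.05,  # lemma-based tf.idf similarity with node depth
    "sentence_weight": 0.20,    # sentence weight
    "position": 0.10,           # absolute sentence position
}


def combined_score(features):
    """Return the weighted sum of the feature scores of one sentence.

    Assumption: a linear combination of scores in [0, 1]. The slide does
    not say how the system actually combines its features.
    """
    missing = set(WEIGHTS) - set(features)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


# A sentence that matches the topic well on lemmas but appears late.
example = {
    "word_tfidf": 0.4,
    "lemma_tfidf": 0.8,
    "lemma_tfidf_depth": 0.6,
    "sentence_weight": 0.5,
    "position": 0.1,
}
score = combined_score(example)
```

Because the weights sum to 100%, the combined score stays in [0, 1] whenever every feature does, which keeps scores comparable across sentences.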

SLIDE 8

Sentence selection

  • Keep sentences with the highest scores
  • Sentences are dismissed (regardless of score) if they
      • cannot be parsed by FIPS (29%)
      • duplicate a sentence from a different document (4%)
      • contain no verb (5%)
      • contain the pronoun « I » (3%)
      • end with « : » or « ? » (2%)
      • consist only of upper-case words or have fewer than 5 words (4%)
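
The dismissal rules above can be sketched as a filter applied while walking the sentences in decreasing score order. This is an illustration only: the `Sentence` record, the `parsed` and `has_verb` flags (assumed to come from the parser output) and the `max_sentences` cut-off are hypothetical, and the real system expresses its rules in XSLT.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    parsed: bool     # True if the parser produced a full parse (assumed flag)
    has_verb: bool   # assumed to be read off the parse tree
    score: float


def rejection_reason(sentence, seen_texts):
    """Return why a sentence is dismissed, or None if it can be kept."""
    words = sentence.text.split()
    if not sentence.parsed:
        return "unparsed"
    if sentence.text in seen_texts:
        return "duplicate"
    if not sentence.has_verb:
        return "no verb"
    # Naive pronoun test: contractions such as "I'm" are not caught.
    if "I" in (w.strip(".,;:!?\"'()") for w in words):
        return "first-person pronoun"
    if sentence.text.rstrip().endswith((":", "?")):
        return "ends with ':' or '?'"
    if len(words) < 5 or all(w.isupper() for w in words):
        return "too short or all upper case"
    return None


def select(sentences, max_sentences):
    """Keep the highest-scoring sentences that pass every rule."""
    kept, seen = [], set()
    for s in sorted(sentences, key=lambda s: s.score, reverse=True):
        if len(kept) >= max_sentences:
            break
        if rejection_reason(s, seen) is None:
            kept.append(s)
        seen.add(s.text)
    return kept
```

Note that a dismissed sentence is dropped even when its score is the highest in the cluster, which matches the "regardless of score" wording on the slide.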

SLIDE 9

Sentence post-processing

  • Referential clarity
      • some pronouns are removed
        Climate is changing, he said ⇒ Climate is changing
      • ambiguous temporal references are fixed
          • Reference to the present day ⇒ date of document
          • Day of the week ⇒ month and year of document
          • No repetition of a date within a summary
  • Sentence compression by pruning non-essential subtrees (e.g. parenthetical expressions)
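
The system performs this pruning with XSLT rules over the parser's XML output. As a rough illustration of the same idea in Python, the sketch below removes every parenthetical (PAR) subtree from a small parse tree. The XML layout is an assumption made for this example (one element per constituent, named after its label, with words in hypothetical `<w>` leaves); the actual FIPS format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Labels whose subtrees are treated as non-essential. Only PAR
# (parenthetical phrase) is listed; the set is an assumption.
PRUNABLE = {"PAR"}


def prune(node, prunable=PRUNABLE):
    """Remove, in place, every descendant subtree whose label is prunable."""
    for child in list(node):  # copy: we modify the node while iterating
        if child.tag in prunable:
            node.remove(child)
        else:
            prune(child, prunable)


def words(node):
    """Return the words of the (possibly pruned) tree in document order."""
    return [w.text for w in node.iter("w")]


# Hypothetical, simplified parse of part of the example sentence.
xml_source = (
    "<TP>"
    "<DP><w>the</w><NP><w>European</w><w>Union</w>"
    "<PAR><w>(EU)</w></PAR></NP></DP>"
    "<VP><w>should</w><w>accept</w><w>Turkey</w></VP>"
    "</TP>"
)
tree = ET.fromstring(xml_source)
prune(tree)
compressed = " ".join(words(tree))
```

Working on the tree rather than on the token string means a whole constituent is removed at once, so the remaining sentence stays grammatical as long as the pruned label really is optional.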

SLIDE 10

Results

  • Content (11th)
  • Linguistic quality (5th)
  • Bad non-redundancy (23rd)
  • Pyramid: 8th out of 11

SLIDE 11

DUC 2007 Average Scores

[Chart: average scores per system, for systems 1 to 32 and A to J. Axes: Content & Linguistic Quality Scores (0.0 to 5.5) and Rouge & Pyramid Scores (0.00 to 0.40). Series: Avg. Content, Avg. Linguistic Quality, Basic Elements, Avg. Pyramid. One system is labelled RALI.]

SLIDE 12

Possible Improvements

  • Parsing: dedicated lexicons
  • Anaphora resolution, including pronoun resolution
  • Reduce redundancy with internal tf⋅idf
  • Better pruning of subordinate clauses, adjectival and adverbial modifiers
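
The slides do not say what "internal tf⋅idf" would look like. One possible reading, sketched below as an assumption, is to compare each candidate sentence with the sentences already selected and skip it when their tf⋅idf cosine similarity is too high. The function names, the whitespace tokenizer, the smoothed idf formula and the 0.5 threshold are all choices made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse tf.idf vector (a dict) per tokenized sentence."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed idf, so that no term gets a zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_redundant(ranked_sentences, threshold=0.5):
    """Greedily keep sentences that are not too similar to those already kept.

    `ranked_sentences` is assumed to be sorted by decreasing score.
    """
    vectors = tfidf_vectors([s.lower().split() for s in ranked_sentences])
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [ranked_sentences[i] for i in kept]
```

Since the input is already ranked, the greedy pass always keeps the higher-scoring member of a near-duplicate pair.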

SLIDE 13

Possible Improvements


Combine with WordNet, gazetteers, etc.

SLIDE 14

Conclusion

  • Simple and powerful
  • Back to the roots of AI
  • Modern tools and reliable syntactic parsers open new possibilities for principled summarization
