Overview of the MultiLing Pilot in TAC 2011, George Giannakopoulos



SLIDE 1

Overview of the MultiLing Pilot in TAC 2011

George Giannakopoulos

NCSR Demokritos, Greece

ggianna@iit.demokritos.gr

November 2011

George Giannakopoulos Overview of the MultiLing Pilot in TAC 2011

SLIDE 2

Outline

1. Introduction
2. MultiLing Pilot
3. The Results
4. Conclusion

SLIDE 3

Multilinguality

  • News
  • Blogs
  • Search results
  • Automatic translation

SLIDE 4

Brief history of DUC/TAC domains

  • Single-document summarization
  • Multi-document summarization (Update, Guided, Opinion, ...)
  • Cross-lingual summarization

Something appears to be missing...

SLIDE 5

The missing piece: MultiLing

Create summaries, regardless of the underlying language, for document sets whose documents all share the same (possibly unknown) language.

SLIDE 6

MultiLing aim

  • Discover existing multilingual multi-document summarization (MMS) research
  • Learn about MMS algorithms
  • Learn about reusable multilingual resources
  • Quantify performance
  • Check existing automatic measures

SLIDE 7

Outline: 2. MultiLing Pilot

SLIDE 8

Task definition

  • Generate a single, fluent, representative summary from a set of documents describing an event sequence
  • The document set may be in any language within a given range of languages
  • The output summary should be 240 to 250 words long

An event sequence is a set of atomic (self-sufficient) event descriptions, sequenced in time, that share main actors, location of occurrence, or some other important factor. Event sequences may refer to topics such as a natural disaster, a crime investigation, a set of negotiations focused on a single political issue, or a sports event.

SLIDE 9

Dataset

Requirements:
  • Human created
  • Multilingual
  • News
  • Freely available
  • Containing event sequences
  • Plain text

Solution:
  • WikiNews (http://www.wikinews.org)
  • Translation
  • Preprocessing

SLIDE 10

Mini-pilot for effort estimation

  • Small-scale corpus (2 topics)
  • Everything was timed
  • Questions were noted

Lesson: Always do a mini-pilot, note everything, and hold follow-up meetings.

SLIDE 11

Overview of full corpus creation

1. Determine topics (10 topics per language)
2. Translate documents (10 documents per topic)
3. Produce model summaries (3 models per topic)
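For a rough sense of scale, the per-language figures above multiply out as in the sketch below. It assumes, for illustration only, that all 7 pilot languages followed the same plan; the constant and function names are hypothetical.

```python
# Per-language plan, as stated on the slide.
TOPICS_PER_LANGUAGE = 10
DOCS_PER_TOPIC = 10
MODELS_PER_TOPIC = 3


def corpus_size(num_languages: int) -> tuple[int, int]:
    """Return (source documents, model summaries) for num_languages languages.

    Assumes every language follows the same topics/documents/models plan.
    """
    topics = num_languages * TOPICS_PER_LANGUAGE
    return topics * DOCS_PER_TOPIC, topics * MODELS_PER_TOPIC


print(corpus_size(1))  # (100, 30)
print(corpus_size(7))  # (700, 210)
```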

SLIDE 12

Determine topics

  • Use metadata (WikiNews categories)
  • Verify the existence of an event sequence
  • Cover several different news types (e.g., politics, environment, sports)
  • Find at least 10 documents per topic

SLIDE 13

Translate documents

  • Sentence alignment
  • Keep the original meaning
  • Produce readable, fluent text
  • Translation verified

Lesson: Translation is a difficult, error-prone, subjective, high-cost process.

SLIDE 14

Summarizing

  • 3 summarizers per topic and language
  • Keep the human subjectivity about which aspects are important
  • Use the minimum possible guidelines

Guidelines: the summary should be self-sufficient, clearly written text, providing no external information, in fluent, easily readable language.

Lesson: Few guidelines are better than many.

SLIDE 15

Types of evaluation

  • Automatic (ROUGE, AutoSummENG)
  • Manual (Overall Responsiveness)

SLIDE 16

Automatic Methods

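  • ROUGE (ROUGE-1, ROUGE-2, ROUGE-SU4): word n-gram matching; ROUGE-SU4 also allows gaps (skip-bigrams)
  • AutoSummENG, Merged Model Graph variant (MeMoG): character n-gram co-occurrence, compared against a merged representation of the model summaries

The two families are not (too) strongly correlated, and possibly capture slightly different aspects of summary quality.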
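To make the difference between the two families concrete, here is a simplified sketch. `rouge2_recall` computes clipped word-bigram recall against all model summaries; the official ROUGE toolkit adds stemming, stopword options and jackknifing, which are omitted. The graph part is a hypothetical simplification in the spirit of AutoSummENG/MeMoG: edges link character n-grams that occur within a small window, the merged model graph is approximated by averaging edge weights across models (an assumption, not necessarily the original merging operator), and graphs are compared with a value-similarity score. Function names, whitespace tokenization, and the defaults n = 3, window = 3 are illustrative choices.

```python
from collections import Counter

Edge = tuple[str, str]


def word_bigrams(text: str) -> Counter:
    tokens = text.lower().split()  # naive whitespace tokenization
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(peer: str, models: list[str]) -> float:
    """Clipped word-bigram recall of `peer` against all model summaries."""
    peer_counts = word_bigrams(peer)
    matched = total = 0
    for model in models:
        model_counts = word_bigrams(model)
        # A model bigram matches at most as many times as it occurs in the peer.
        matched += sum(min(c, peer_counts[bg]) for bg, c in model_counts.items())
        total += sum(model_counts.values())
    return matched / total if total else 0.0


def ngram_graph(text: str, n: int = 3, window: int = 3) -> dict[Edge, float]:
    """Character n-gram graph: edge weight = co-occurrences within `window` positions."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    graph: dict[Edge, float] = {}
    for i, gram in enumerate(grams):
        # Link each n-gram to the n-grams preceding it within the window.
        for j in range(max(0, i - window), i):
            edge = (grams[j], gram)
            graph[edge] = graph.get(edge, 0.0) + 1.0
    return graph


def merge_graphs(graphs: list[dict[Edge, float]]) -> dict[Edge, float]:
    """Approximate merged model graph: mean edge weight (missing edges count as 0)."""
    merged: dict[Edge, float] = {}
    for graph in graphs:
        for edge, weight in graph.items():
            merged[edge] = merged.get(edge, 0.0) + weight
    return {edge: weight / len(graphs) for edge, weight in merged.items()}


def value_similarity(a: dict[Edge, float], b: dict[Edge, float]) -> float:
    """Sum of min/max weight ratios over shared edges, normalized by the larger graph."""
    if not a or not b:
        return 0.0
    shared = a.keys() & b.keys()
    ratios = sum(min(a[e], b[e]) / max(a[e], b[e]) for e in shared)
    return ratios / max(len(a), len(b))


def memog_like_score(peer: str, models: list[str]) -> float:
    merged = merge_graphs([ngram_graph(m) for m in models])
    return value_similarity(ngram_graph(peer), merged)


models = ["the storm hit the coast on monday", "a storm hit the coast"]
peer = "the storm hit the coast"
print(rouge2_recall(peer, models), memog_like_score(peer, models))
```

Word bigrams only reward exact word sequences, while character n-grams also give partial credit for shared substrings (for example, inflected forms), which is one plausible reason the two families can disagree.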

SLIDE 17

Manual Evaluation Guidelines

  • Read the source documents at least once
  • Give a grade between 1 and 5 (Overall Responsiveness: OR)
  • Content and fluency are equally important

SLIDE 18

Guidelines continued

We consider a text to be worth a 5, if it appears to cover all the important aspects of the corresponding document set using fluent, readable language. A text should be assigned a 1, if it is either unreadable, nonsensical, or contains only trivial information from the document set.

SLIDE 19

Outline: 3. The Results

SLIDE 23

Overview

  • Original aim: 3 groups per language
  • Achieved: 8+1 groups
  • Original aim: 5 languages
  • Achieved: 7 languages

SLIDE 24

Baseline and Topline

  • Global baseline system (ID9): vector space, bag-of-words representation; selects the sentences with the highest cosine similarity to the centroid of the documents.
  • Global topline system (ID10): uses the model summaries; produces random summaries by combining sentences and keeps the one closest to the Merged Model Graph of the models.
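A centroid baseline of this kind can be sketched as follows. The sketch assumes raw term-frequency bag-of-words vectors (no IDF weighting), a naive regex tokenizer, sentences already split by the caller, and a greedy fill up to the word limit; the slide does not specify how ID9 handles these details, and all names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_summary(documents: list[list[str]], max_words: int = 250) -> list[str]:
    """Pick the sentences closest to the document-set centroid, up to max_words.

    `documents` holds each document as a list of already-split sentences.
    """
    sentences = [s for doc in documents for s in doc]
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Summed term frequencies; cosine is scale-invariant, so averaging is unnecessary.
    centroid: Counter = Counter()
    for vec in vectors:
        centroid.update(vec)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid), reverse=True)
    summary, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:  # skip sentences that would overflow the limit
            summary.append(sentences[i])
            used += length
    return summary


docs = [["Flood hits city.", "Rescue teams arrive."],
        ["Flood waters rise in city.", "Unrelated note."]]
print(centroid_summary(docs, max_words=8))
```

Sentences are emitted in order of similarity rather than in document order; a real system would probably also reorder them and remove redundancy.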

SLIDE 25

Our champions

Pilot languages: Arabic, Czech, English, French, Greek, Hebrew, Hindi

Participant    System ID  Notes
CIST           ID1        Peer
CLASSY         ID2        Peer
JRC            ID3        Co-organizer (Czech)
LIF            ID4        Co-organizer (French)
SIEL IIITH     ID5        Co-organizer (Hindi)
TALN UPF       ID6        Peer
UBSummarizer   ID7        Peer
UoEssex        ID8        Co-organizer (Arabic)
Baseline       ID9        Centroid baseline for all languages; co-organizer (all)
Topline        ID10       Using model summaries for all languages; co-organizer (all)

Lesson: The community will respond if you take the first step.

SLIDE 26

Evaluation aims

  • Allow, but penalize, out-of-limit text sizes
  • Measure per-language performance
  • Reward multilingual systems

SLIDE 27

Length-Aware Grading (LAG)

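Given a summary S of length |S| (in words) that has been assigned a grade g, a lower word limit l_min and an upper word limit l_max:

$$\mathrm{LAG}(g, S) = g \cdot \left(1 - \frac{\max\left(\max(l_{\min} - |S|,\ |S| - l_{\max}),\ 0\right)}{l_{\min}}\right)$$

Example

An excellent summary (graded with OR 5) with 120 words would be assigned a LAG-OR grade of 2.5 (less than mediocre): with l_min = 240, the summary is 120 words short, so the grade is scaled by 1 − 120/240 = 0.5, giving 5 × 0.5 = 2.5.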
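The LAG formula translates directly into code. This sketch assumes the task's limits, l_min = 240 and l_max = 250 words, as defaults and takes the word count as an input; how words are counted is left to the caller, and the function name is hypothetical.

```python
def lag(grade: float, length_words: int, l_min: int = 240, l_max: int = 250) -> float:
    """Length-Aware Grading: scale `grade` by the relative distance outside [l_min, l_max]."""
    # Words missing below l_min or extra above l_max; zero when within limits.
    distance = max(max(l_min - length_words, length_words - l_max), 0)
    return grade * (1 - distance / l_min)


print(lag(5, 120))  # 2.5, the 120-word example
print(lag(5, 245))  # 5.0, within limits, so no penalty
```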

SLIDE 28

Combined Multi-lingual Performance (CMP)

g_s(l) is the LAG grade of system s in a given language l from the full set of languages L:

$$\mathrm{CMP}_s = \frac{\sum_{l \in L} g_s(l)}{|L|}$$

Non-participation in a language implies a LAG value of 1 for that language.

Instability

If system s participated in the set L_s ⊂ L of languages, and the standard deviation of its LAG grades in these languages is σ_s, then:

$$\mathrm{Instability}_s = \frac{\sigma_s}{\sqrt{|L_s|}}$$

Higher instability indicates more uncertainty about future performance.
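A sketch of both measures follows. It takes the seven pilot languages as L, reads Instability_s as σ_s divided by the square root of |L_s|, and uses the population standard deviation for σ_s; the slide does not say whether the population or sample form is meant, so that choice is an assumption. The dictionary input format and function names are hypothetical.

```python
import math
import statistics

PILOT_LANGUAGES = ("Arabic", "Czech", "English", "French", "Greek", "Hebrew", "Hindi")


def cmp_score(lag_by_language: dict[str, float],
              languages: tuple[str, ...] = PILOT_LANGUAGES) -> float:
    """Combined Multi-lingual Performance: mean LAG grade over all languages."""
    # A language the system did not participate in contributes a LAG grade of 1.
    return sum(lag_by_language.get(lang, 1.0) for lang in languages) / len(languages)


def instability(lag_by_language: dict[str, float]) -> float:
    """Std. dev. of LAG grades over the participated languages, divided by sqrt(|L_s|)."""
    grades = list(lag_by_language.values())
    return statistics.pstdev(grades) / math.sqrt(len(grades))


grades = {"English": 3.0, "Hindi": 4.0}  # hypothetical two-language system
print(round(cmp_score(grades), 3))    # 1.714
print(round(instability(grades), 3))  # 0.354
```

Because missing languages count as 1, a system entering only two languages is pulled toward the bottom of the CMP scale, which matches the aim of rewarding multilingual systems.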

SLIDE 29

Overview

System               CMP    Instability
ID1 (CIST)           2.99   0.19
ID2 (CLASSY)         2.95   0.18
ID3 (JRC)            3.13   0.18
ID4 (LIF)            1.86   0.21
ID5 (SIEL IIITH)     1.6    0.48
ID6 (TALN UPF)       1.6    0.34
ID7 (UBSummarizer)   2.41   0.19
ID8 (UoEssex)        1.63   0.78
ID9 (Baseline)       2.81   0.27
ID10 (Topline)       2.71   0.22

Table: Combined Multi-lingual Performance and Instability per System

SLIDE 30

Per Language Overview — Arabic

[Figure, two panels for Arabic. Left: Overall Responsiveness (1 to 5) by SysID, for model summaries A, B, C and systems ID1, ID2, ID3, ID4, ID6, ID7, ID8, ID9. Right: LAG (systems only) by SysID, for ID1, ID2, ID3, ID4, ID6, ID7, ID8, ID9, ID10.]

Lesson: Model summaries may be bad summaries. How does this influence evaluation?

SLIDE 31

Overall Responsiveness — Czech, English

[Figure, Overall Responsiveness (1 to 5) by SysID. Czech panel: model summaries A, B, C, D and systems ID1, ID2, ID3, ID4, ID7, ID9, ID10. English panel: model summaries A, B, C and systems ID1 to ID9.]

SLIDE 32

Overall Responsiveness — French, Greek

[Figure, Overall Responsiveness (1 to 5) by SysID. French panel: model summaries A to F and systems ID1, ID2, ID4, ID6, ID9. Greek panel: model summaries A, B, C and systems ID1, ID2, ID3, ID4, ID7, ID9, ID10.]

SLIDE 33

Overall Responsiveness — Hebrew, Hindi

[Figure, Overall Responsiveness (1 to 5) by SysID. Hebrew panel: model summaries A, B, C and systems ID1, ID2, ID3, ID4, ID7, ID9, ID10. Hindi panel: model summaries A, B, C and systems ID1, ID2, ID3, ID4, ID5, ID6, ID7, ID9.]

SLIDE 34

Summary of system performances

  • Systems are good enough for many languages
  • Big variance across languages
  • Human grades are not always stable
  • Human grades are not always high

SLIDE 35

Correlations

Language        ROUGE-2 to OR   MeMoG to OR   ROUGE-2 to MeMoG
Arabic          0.25            -0.36         0.11
Czech           0.33            -0.04         0.24
English         0.56            0.47          0.47
French          0.42            0.37          0.50
Greek           0.14            0.33          0.24
Hebrew          0.52            0.05          -0.24
Hindi           0.18            0.33          0.13
All languages   0.12            0.12          0.42

Table: Correlation (Kendall’s Tau) Between Gradings. Note: statistically significant results in bold.

Lesson: There is much room for improvement. Negative examples can be good examples...
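The slides do not say which variant of Kendall's tau was used. Because Overall Responsiveness grades contain many ties, the sketch below computes tau-b, the tie-corrected variant (this choice is an assumption), with a simple O(n²) pass over all pairs of graded summaries. The function name and sample grades are hypothetical.

```python
import math


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b between two gradings of the same summaries."""
    if len(x) != len(y):
        raise ValueError("gradings must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    denom = math.sqrt((pairs - ties_x) * (pairs - ties_y))
    # Undefined when one grading is constant; NaN is returned in that case.
    return (concordant - discordant) / denom if denom else math.nan


rouge2 = [0.10, 0.12, 0.08, 0.15]  # hypothetical automatic scores
human_or = [3, 4, 2, 4]            # hypothetical OR grades (note the tie)
print(round(kendall_tau_b(rouge2, human_or), 3))  # 0.913
```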

SLIDE 36

Outline: 4. Conclusion

SLIDE 38

Community

  • MMS researchers are present
  • MMS researchers are active and collaborating
  • Researchers need data and evaluation

SLIDE 39

Dataset

  • Useful
  • Publicly available
  • A basis for future work
  • Measured effort

SLIDE 40

From pilot to track

  • Dataset
  • Evaluation
  • Support

SLIDE 41

Dataset

  • Change of scale
      • More languages
      • More texts
  • Dataset creation support software (funded)
  • Community work

SLIDE 42

Evaluation

  • Larger dataset
  • Use negative examples of summaries
  • Optimize existing metrics
  • Devise better metrics

SLIDE 43

Support

  • TAC support
  • Community support
  • AIJ funding

SLIDE 44

Thank you!

Last lesson: "United we stand, divided we fall..." (attributed to Aesop, Greek fabulist)
We stand. (TAC MultiLing Pilot Community)

Co-organizers:
  • Ilias Zavitsanos (NCSR Demokritos, Greece)
  • Vasudeva Varma (IIIT Hyderabad, India)
  • Josef Steinberger (JRC, Italy, in collaboration with the Univ. of West Bohemia, Czech Republic)
  • Benoît Favre (LIF, France)
  • Marina Litvak (Sami Shamoon College of Engineering, Israel)
  • Mahmoud El-Haj (Univ. of Essex, UK)
  • William Darling (Univ. of Guelph, Canada)
