Overview of the MultiLing Pilot in TAC 2011
George Giannakopoulos
NCSR Demokritos, Greece
ggianna@iit.demokritos.gr
November 2011

Outline
1. Introduction
2. MultiLing Pilot
3. The Results
4. Conclusion

Multilinguality
- News
- Blogs
- Search results
- Automatic translation

Brief history of DUC/TAC domains
- Single-document summarization
- Multi-document summarization (Update, Guided, Opinion, ...)
- Cross-lingual summarization
Something appears to be missing...

The missing piece: MultiLing
Create summaries, regardless of the underlying language, for document sets written entirely in one (possibly unknown) language.

MultiLing aims
- Identify existing multilingual multi-document summarization (MMS) research
- Learn about MMS algorithms
- Learn about reusable multilingual resources
- Quantify performance
- Check existing automatic evaluation measures

2. MultiLing Pilot: task details, corpus creation, evaluating summaries

Task definition
Generate a single, fluent, representative summary from a set of documents describing an event sequence.
- The language of each document set is one of a given range of languages.
- The output summary should be between 240 and 250 words.
An event sequence is a set of atomic (self-sufficient) event descriptions, sequenced in time, that share main actors, location of occurrence or some other important factor. Event sequences may refer to topics such as a natural disaster, a crime investigation, a set of negotiations focused on a single political issue, or a sports event.
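The length constraint can be checked mechanically. A minimal sketch, assuming that words are whitespace-separated tokens (the official counting rule is not stated on the slide); `word_count` and `within_length` are hypothetical helper names:

```python
# Hypothetical helpers for the 240-250 word target of the task.
# Assumption: a "word" is a whitespace-separated token; the official
# counting rule may differ (e.g. in how punctuation is treated).

MIN_WORDS = 240
MAX_WORDS = 250


def word_count(text: str) -> int:
    """Count whitespace-separated tokens; runs of spaces count once."""
    return len(text.split())


def within_length(text: str, lo: int = MIN_WORDS, hi: int = MAX_WORDS) -> bool:
    """Return True if the summary length lies in the closed range [lo, hi]."""
    return lo <= word_count(text) <= hi


# Usage: a 245-token dummy summary satisfies the constraint.
summary = " ".join(["word"] * 245)
print(word_count(summary), within_length(summary))
```

Whitespace tokenization is a reasonable default for the pilot's languages, all of which separate words with spaces.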

Dataset
- Human-created
- Multilingual
- News
- Freely available
- Containing event sequences
- Plain text
Solution: WikiNews (http://www.wikinews.org), plus translation and preprocessing

Mini-pilot for effort estimation
- Small-scale corpus (2 topics)
- Everything was timed
- Questions were noted
Lesson: Always do a mini-pilot, note everything, and hold follow-up meetings.

Overview of full corpus creation
1. Determine topics (10 topics per language)
2. Translate documents (10 documents per topic)
3. Produce model summaries (3 models per topic)
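The per-topic figures multiply out directly. Each language needs 10 × 10 = 100 documents and 10 × 3 = 30 model summaries. A small sketch of that arithmetic follows. `corpus_size` is a hypothetical helper, the number of languages is a parameter, and it assumes every language receives the full quota:

```python
# Per-language quotas taken from the corpus-creation steps.
TOPICS_PER_LANGUAGE = 10
DOCS_PER_TOPIC = 10
MODELS_PER_TOPIC = 3


def corpus_size(n_languages: int) -> dict:
    """Total documents and model summaries, assuming full quotas per language."""
    docs_per_language = TOPICS_PER_LANGUAGE * DOCS_PER_TOPIC      # 100
    models_per_language = TOPICS_PER_LANGUAGE * MODELS_PER_TOPIC  # 30
    return {
        "documents": n_languages * docs_per_language,
        "model_summaries": n_languages * models_per_language,
    }


print(corpus_size(7))
```

For example, with the 7 languages the pilot achieved, and if each received the full quota, this gives 700 documents and 210 model summaries.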

Determine topics
- Use metadata (WikiNews categories)
- Verify the existence of an event sequence
- Cover several different news types (e.g., politics, environment, sports)
- Find at least 10 documents per topic

Translate documents
- Sentence alignment
- Keep the original meaning
- Produce readable, fluent text
- Verify each translation
Lesson: Translation is a difficult, error-prone, subjective, high-cost process.

Summarizing
- 3 summarizers per topic and language
- Keep the human subjectivity about which aspects are important
- Use the minimum possible guidelines: a self-sufficient, clearly written text, providing no external information, in fluent, easily readable language
Lesson: Few guidelines are better than many.

Types of evaluation
- Automatic (ROUGE, AutoSummENG)
- Manual (Overall Responsiveness)

Automatic methods
- ROUGE (ROUGE-1, ROUGE-2, ROUGE-SU4): word n-gram matching, allowing gaps (skip-bigrams in ROUGE-SU4)
- AutoSummENG, Merged Model Graph (MeMoG) variant: character n-gram co-occurrence, compared against a merged representation of the model summaries
The two are not (too) strongly correlated, so they possibly describe slightly different aspects of summary quality.
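The sketch below makes the contrast between word-level and character-level units concrete. It is a minimal ROUGE-N-style recall, not the official ROUGE toolkit: there is no stemming, no stopword handling and no skip-bigram (SU4) scoring. `char_ngrams` only illustrates the character-level unit. AutoSummENG and MeMoG go further and build n-gram graphs whose edges record co-occurrence within a window, which is not implemented here. All function names are hypothetical.

```python
from collections import Counter
from typing import List


def word_ngrams(text: str, n: int) -> Counter:
    """Bag of word n-grams (lowercased, whitespace tokenization)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text: str, n: int) -> Counter:
    """Bag of character n-grams (the unit AutoSummENG builds on)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_recall(peer: Counter, models: List[Counter]) -> float:
    """ROUGE-N-style recall pooled over all model summaries.

    Counter & Counter keeps the minimum count per n-gram, so a model
    n-gram is matched at most as many times as it occurs in the peer.
    """
    matched = sum(sum((peer & model).values()) for model in models)
    total = sum(sum(model.values()) for model in models)
    return matched / total if total else 0.0


peer = "the storm hit the coast"
model = "a storm hit the coast on monday"
print(ngram_recall(word_ngrams(peer, 1), [word_ngrams(model, 1)]))  # 0.5714285714285714 (4/7)
print(ngram_recall(word_ngrams(peer, 2), [word_ngrams(model, 2)]))  # 0.5
```

The recall function accepts either kind of bag, so `char_ngrams` can be passed in place of `word_ngrams` to see how character units reward partial word matches.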

Manual evaluation guidelines
- Read the source documents at least once
- Give a grade between 1 and 5 (Overall Responsiveness: OR)
- Content and fluency are equally important

Guidelines (continued)
We consider a text to be worth a 5 if it appears to cover all the important aspects of the corresponding document set using fluent, readable language. A text should be assigned a 1 if it is unreadable, nonsensical, or contains only trivial information from the document set.
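The slides do not say how OR grades were aggregated. A common choice is the mean grade per system. A minimal sketch, assuming integer grades on the 1 to 5 scale and a hypothetical `mean_or` helper:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_or(grades: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Average Overall Responsiveness grade per system.

    grades: (system_id, grade) pairs; each grade must lie in 1..5.
    """
    per_system: Dict[str, List[int]] = defaultdict(list)
    for system_id, grade in grades:
        if not 1 <= grade <= 5:
            raise ValueError(f"OR grade out of range: {grade}")
        per_system[system_id].append(grade)
    return {sid: mean(gs) for sid, gs in per_system.items()}


print(mean_or([("ID1", 4), ("ID1", 3), ("ID9", 2)]))
```

Rejecting out-of-range grades early catches data-entry errors before they distort the averages.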

3. The Results: participation, system evaluation, performance, automatic evaluation

Overview
- Original aim: 3 groups per language. Achieved: 8+1 groups
- Original aim: 5 languages. Achieved: 7 languages

Baseline and topline
- Global baseline system (ID9): vector space, bag-of-words; selects the sentences with the highest cosine similarity to the centroid of the documents.
- Global topline system (ID10): uses the model summaries; produces random summaries by combining sentences and keeps the one closest to the Merged Model Graph of the models.
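The baseline's selection rule can be sketched as follows. This is a minimal sketch and not the ID9 implementation. It assumes raw term-frequency vectors (no TF-IDF weighting), a crude regex tokenizer, and a greedy fill up to a word limit. `centroid_summary` and its parameters are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase \w+ tokens; crude for scripts with combining marks.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid_summary(sentences: List[str], max_words: int = 250) -> List[str]:
    """Greedily pick sentences closest to the document-set centroid."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid: Counter = Counter()
    for v in vectors:
        centroid.update(v)  # sum of term frequencies over all sentences
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n > max_words:
            continue  # skip sentences that would exceed the word limit
        chosen.append(i)
        used += n
    return [sentences[i] for i in sorted(chosen)]  # keep original order


sents = ["The storm hit the coast.",
         "Rescue teams arrived at the coast after the storm.",
         "Football scores were announced."]
print(centroid_summary(sents, max_words=15))
```

Cosine similarity ignores vector length, so summing the sentence vectors yields the same ranking as averaging them into a true centroid.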

Our champions
Pilot languages: Arabic, Czech, English, French, Greek, Hebrew, Hindi

Participant     System ID   Notes
CIST            ID1         Peer
CLASSY          ID2         Peer
JRC             ID3         Co-organizer (Czech)
LIF             ID4         Co-organizer (French)
SIEL IIITH      ID5         Co-organizer (Hindi)
TALN UPF        ID6         Peer
UBSummarizer    ID7         Peer
UoEssex         ID8         Co-organizer (Arabic)
Baseline        ID9         Centroid baseline for all languages; co-organizer (all)
Topline         ID10        Using model summaries for all languages; co-organizer (all)

Lesson: The community will respond if you take the first step.

Evaluation aims
- Allow, but penalize, out-of-limit text sizes
- Measure per-language performance
- Reward multilingual systems
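Measuring per-language performance means comparing systems within each language rather than pooling all scores. A minimal sketch, assuming scores arrive as hypothetical `(system_id, language, score)` triples. Counting the languages each system covers is shown only as one possible input to rewarding multilingual systems, since the slide does not say how that reward was computed.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple

Score = Tuple[str, str, float]  # (system_id, language, score)


def per_language_means(scores: Sequence[Score]) -> Dict[str, Dict[str, float]]:
    """Mean score of each system, computed separately for each language."""
    table: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for system, lang, value in scores:
        table[lang][system].append(value)
    return {lang: {sys: mean(vals) for sys, vals in systems.items()}
            for lang, systems in table.items()}


def language_coverage(scores: Sequence[Score]) -> Dict[str, int]:
    """Number of distinct languages in which each system was scored."""
    langs: Dict[str, Set[str]] = defaultdict(set)
    for system, lang, _ in scores:
        langs[system].add(lang)
    return {system: len(ls) for system, ls in langs.items()}


data = [("ID1", "English", 0.4), ("ID1", "English", 0.6),
        ("ID1", "Greek", 0.3), ("ID9", "English", 0.2)]
print(per_language_means(data))
print(language_coverage(data))
```

Keeping the language as the outer key makes each language's ranking directly readable and avoids comparing raw scores across languages.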
