Argument Retrieval in Project Debater
Yufang Hou
IBM Research Europe, Dublin
IBM Research: History of Grand Challenges
1997 First computer to defeat a world champion in Chess (Deep Blue)
2011 First computer to defeat best human Jeopardy! players (Watson)
2019 First computer to successfully debate champion debaters (Project Debater)
Motion: We should subsidize preschool
Selected from test set based on assessment of chances to have a meaningful debate
Format: Oxford style debating
Fully automatic debate
No human intervention
Segments from a Live Debate (San Francisco, Feb 11th 2019)
Expert human debater: Mr. Harish Natarajan
Hundreds
leading newspapers
100 Million people reached
Millions
2.1 Billion social media impressions
https://www.youtube.com/watch?v=m3u-1yttrVw&t=2469s
https://www.youtube.com/watch?v=7pHaNMdWGsk&t=1383s
Current Publications Highlight Various Aspects of the System
Lippi and Torroni, IJCAI, 2015
Al-Khatib et al, NAACL 2016; Wachsmuth et al, Argument-Mining Workshop, 2017, …
Stab and Gurevych, EMNLP 2014; Stab et al, NAACL 2018, …
Recent reviews
Five years of argument mining: a data-driven analysis, Cabrio and Villata, IJCAI, 2018
Argumentation Mining, Stede and Schneider, Synthesis Lectures on HLT, 2018
Argument Mining: A Survey, Lawrence and Reed, CL, 2019
Context Dependent Claim Detection, 2014.
Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection, 2015.
Wikipedia Claim/Evidence Labeled Data - Labeling Process
Select Wikipedia Articles
Find Claim Candidates per Article
Find Candidate Evidence per Claim
5 In-house Annotators Per Stage
Exhaustive annotation
Wikipedia Claim/Evidence Labeled Data - Results
58 Controversial Topics selected from Debatabase
547 relevant Wikipedia articles carefully labeled by in-house team
E.g., Ban the sale of
2.6K Claims & 4.5K Evidence that support/contest the claims
Evidence length varies from
Three types of
Anecdot
Pre-defined train/dev/test split
System Design for Argument Mining
Topic: We should subsidize preschool
Topic Analysis
Document Level IR
Claim Detection
Evidence Detection
address the topic and are likely to contain argumentative text segments
carefully designed features
Mining, Shnarch et al., EMNLP 2017
Cor
AAAI 2020.
Sentence Level (SL) strategy, vs. Document Level used before
ALE
~240 train/dev topics & ~100 test topics
~200,000 sentences carefully annotated for train/dev → Retrospective Labeling Paradigm
~10,000,000,000 Sentences - Reporting results over a massive corpus
Closer than ever to a working solution
Main Distinction from Prev. Work
System Architecture
Controversial Topic → Queries → Massive Corpus (~10B Sentences) → Retrieved Sentences → Ranking Model (BERT) → High-precision Evidence Set
argumentative sentences
Topic terms
Evidence connectors
Sentiment lexicon
NER
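The query ingredients listed above can be combined into a simple sentence filter. The following is a minimal sketch, not the system's actual query language: the toy lexicons, the `matches_query` function, and the ordering rule (topic term, then evidence connector, then sentiment word) are all assumptions made for illustration. NER is left out to keep the block dependency-free.

```python
import re

# Hypothetical toy lexicons; the lists used by the real system are not given in the slides.
EVIDENCE_CONNECTORS = {"that", "because", "according"}
SENTIMENT_LEXICON = {"beneficial", "harmful", "improves", "reduces", "risk"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_query(sentence, topic_terms):
    """Return True if a topic term is later followed by an evidence
    connector, which is in turn later followed by a sentiment word."""
    topic_terms = {term.lower() for term in topic_terms}
    stage = 0  # 0: need topic term, 1: need connector, 2: need sentiment word
    for token in tokenize(sentence):
        if stage == 0 and token in topic_terms:
            stage = 1
        elif stage == 1 and token in EVIDENCE_CONNECTORS:
            stage = 2
        elif stage == 2 and token in SENTIMENT_LEXICON:
            return True
    return False


if __name__ == "__main__":
    sentence = "Studies of preschool show that early education improves outcomes."
    print(matches_query(sentence, ["preschool"]))
```

A filter of this kind only narrows the corpus to candidate sentences; in the architecture above, ranking the candidates is left to the BERT model.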
Iteratively Collected Labeled-Data
evidence type per topic
al, EMNLP 2015
Retrospective Labeling Paradigm
An infrastructure that supports quick dynamic experiments and monitors annotation quality
Collecting labeled data poses a two-fold challenge -
Annot
simple guidelines, careful mon
BTW - Kappa of ~0.4 is actually quite good
Developing corpus-wide argument mining poses another challenge
~2,000 new prediction
Assoc
Retrospective labeling of top predictions is a natural and effective solution
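One round of retrospective labeling can be sketched as follows: rank the model's predictions, skip sentences that already have a label, and send only the top remaining ones to annotators. This is a minimal sketch under stated assumptions; the function name `select_for_labeling`, the `(sentence, score)` pair format, and the batch size `k` are hypothetical and do not come from the slides.

```python
def select_for_labeling(scored_sentences, already_labeled, k):
    """Return up to k highest-scoring sentences that have no label yet.

    scored_sentences: iterable of (sentence, score) pairs from the current model.
    already_labeled: set or dict of sentences labeled in earlier rounds.
    """
    ranked = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    picked = []
    for sentence, _score in ranked:
        if len(picked) >= k:
            break
        if sentence not in already_labeled:
            picked.append(sentence)
    return picked


if __name__ == "__main__":
    predictions = [("s1", 0.95), ("s2", 0.90), ("s3", 0.40), ("s4", 0.85)]
    labeled = {"s1": True}  # labeled in a previous round
    print(select_for_labeling(predictions, labeled, k=2))
```

Because only the top predictions of each new model are annotated, the labeling effort stays bounded even when the corpus itself is far too large to label exhaustively.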
How to Collect Labeled Data?
Why Evidence Detection is Hard?
Motion: Blood donation should be mandatory
According to studies, blood donors are 88 percent less likely to suffer a heart attack…
Statistics … show that students are the main blood donors contributing about 80 percent…
REJECTED CONFIRMED
Results
Results by various BERT models over a massive corpus of ~10B sentences
Baseline: BlendNet, attention-based bidirectional LSTM model [Shnarch et al. (2018)]
[Figure: Macro-Average Precision vs. Number of candidates]
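Macro-average precision as a function of the number of candidates can be computed roughly as below: take the top k ranked candidates per topic, compute precision per topic, then average over topics so that every topic counts equally. This is a hedged sketch; the function names and the input format (a dict from topic to a ranked list of 0/1 labels) are assumptions, as is the choice to divide by the number of available candidates when a topic has fewer than k.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of confirmed (label 1) items among the top k ranked candidates."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def macro_average_precision(labels_by_topic, k):
    """Average precision@k over topics, giving each topic equal weight."""
    if not labels_by_topic:
        return 0.0
    per_topic = [precision_at_k(labels, k) for labels in labels_by_topic.values()]
    return sum(per_topic) / len(per_topic)


if __name__ == "__main__":
    # 1 = candidate confirmed as evidence, 0 = rejected; lists are ranked by model score.
    labels_by_topic = {
        "topic A": [1, 1, 0, 0],
        "topic B": [1, 0, 0, 0],
    }
    print(macro_average_precision(labels_by_topic, k=2))
```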
High precision
Wide coverage with diverse evidence (highly similar sentences are removed)
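The slides do not say how highly similar sentences are removed. One simple way to do it, shown here purely as an illustrative assumption, is a greedy pass over the ranked list that drops any sentence whose token-set Jaccard similarity to an already kept sentence reaches a threshold. The function names and the 0.8 threshold are hypothetical.

```python
import re


def token_set(sentence):
    """Lowercased set of word tokens in the sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(ranked_sentences, threshold=0.8):
    """Keep sentences in rank order, skipping any that are too similar
    to a sentence that was already kept."""
    kept, kept_sets = [], []
    for sentence in ranked_sentences:
        tokens = token_set(sentence)
        if all(jaccard(tokens, other) < threshold for other in kept_sets):
            kept.append(sentence)
            kept_sets.append(tokens)
    return kept


if __name__ == "__main__":
    ranked = [
        "Blood donors are less likely to suffer a heart attack.",
        "Blood donors are less likely to suffer a heart attack!",
        "Students are the main blood donors.",
    ]
    for sentence in remove_near_duplicates(ranked):
        print(sentence)
```

Since the list is processed in rank order, the highest-ranked member of each group of near-duplicates is the one that survives.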
Modeling human dilemmas
controversy and
principled arguments
Listening comprehension
long continuous spoken language
Data-driven speech writing and delivery
Argument retrieval is the first step to build such a system
Getting the stance wrong means you support your opponent…
Drifting from the topic - from Physical Education to Sex Education and back…
The system is only as good as its corpus → … global warming will lead malaria virus to creep into hilly areas…
Document level IR
Corpus: Wikipedia
Exhaustive labelling
ces
LR + Rich features
Sentence level IR
Very Large Corpus: 400 million articles (50 times larger than Wikipedia)
Attention-based Bi-LSTM with weak supervision
Sentence level IR
Flexible query
Very large corpus
Retrospective labelling
Computational Argumentation
Social NLP
Discourse and Pragmatics
Dialogue System
Natural Language Generation
Text Summarization
is emerging as an interesting research area
keyword in the list of topics in recent *ACL conferences