Argument Retrieval in Project Debater - Yufang Hou, IBM Research
SLIDE 1

Argument Retrieval in Project Debater

Yufang Hou

IBM Research Europe, Dublin

SLIDE 2

IBM Research: History of Grand Challenges

- 1997: First computer to defeat a world champion in chess (Deep Blue)
- 2011: First computer to defeat the best human Jeopardy! players (Watson)
- 2019: First computer to successfully debate champion debaters (Project Debater)

SLIDE 3

Segments from a Live Debate (San Francisco, Feb 11th, 2019)
Expert human debater: Mr. Harish Natarajan

Motion: We should subsidize preschool
- Selected from the test set based on an assessment of the chances to have a meaningful debate

Format: Oxford-style debating
- Fully automatic debate
- No human intervention

SLIDE 4

Project Debater: Media Exposure

- Hundreds of press articles in all leading newspapers
- 100 million people reached
- Millions of video views
- 2.1 billion social media impressions

SLIDE 5

- Full Live Debate, Feb 2019
  https://www.youtube.com/watch?v=m3u-1yttrVw&t=2469s
- "The Debater" Documentary
  https://www.youtube.com/watch?v=7pHaNMdWGsk&t=1383s

SLIDE 6

Outline

- System overview
- Argument retrieval in Project Debater
- Some retrospective thoughts

SLIDE 7

Current Publications Highlight Various Aspects of the System

SLIDE 8

Publications and datasets are available at https://www.research.ibm.com/artificial-intelligence/project-debater/research/

SLIDE 9

Outline

- System overview
- Argument retrieval in Project Debater
- Some retrospective thoughts

SLIDE 10

Related Work

- Lippi and Torroni, IJCAI, 2015
- Al-Khatib et al., NAACL 2016; Wachsmuth et al., Argument Mining Workshop, 2017, …
- Stab and Gurevych, EMNLP 2014; Stab et al., NAACL 2018, …
- Recent reviews:
  - Five Years of Argument Mining: a Data-driven Analysis, Cabrio and Villata, IJCAI, 2018
  - Argumentation Mining, Stede and Schneider, Synthesis Lectures on HLT, 2018
  - Argument Mining: A Survey, Lawrence and Reed, CL, 2019

SLIDE 11

Wikipedia Stage

- Context Dependent Claim Detection, Levy et al., COLING 2014
- Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection, Rinott et al., EMNLP 2015

SLIDE 12

Wikipedia Stage

Wikipedia Claim/Evidence Labeled Data - Labeling Process

1. Controversial Topic
2. Select Wikipedia Articles
3. Find Claim Candidates per Article
4. Confirm/Reject Each Claim Candidate
5. Find Candidate Evidence per Claim
6. Confirm/Reject Each Candidate Evidence

- 5 in-house annotators per stage
- Exhaustive annotation

SLIDE 13

Wikipedia Stage

Wikipedia Claim/Evidence Labeled Data - Results

- 58 controversial topics selected from Debatabase
- 547 relevant Wikipedia articles carefully labeled by the in-house team
  - E.g., Ban the Sale of Violent Video Games for Children
- 2.6K claims & 4.5K evidence that support/contest the claims
  - Evidence length varies from one sentence to a whole paragraph
  - Three types of evidence: Study, Expert, and Anecdotal
- Pre-defined train/dev/test split

SLIDE 14

Wikipedia Stage

System Design for Argument Mining

Pipeline: Topic (e.g., "We should subsidize preschool") → Topic Analysis → Document-Level IR → Claim Detection → Evidence Detection

- Retrieve documents that directly address the topic and are likely to contain argumentative text segments
- Simple logistic regression model with lots of carefully designed features
- GrASP: Rich Patterns for Argumentation Mining, Shnarch et al., EMNLP 2017
- Static train/dev/test datasets
- Moderate success over a range of test topics
- Only positive instances are annotated
- Limited coverage
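The claim-detection step above is described as a simple logistic regression over carefully designed features. A minimal sketch of that idea follows; the cue-word list, the three features, and the hand-set weights are illustrative assumptions for this example, not the actual feature set or model used in Project Debater:

```python
import math
import re

# Hypothetical cue words that often signal claims; illustrative only.
CLAIM_CUES = {"should", "must", "believe", "argue", "clearly", "therefore"}

def claim_features(sentence: str, topic_terms: set) -> list:
    """Extract a few hand-designed features for context-dependent claim detection."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return [0.0, 0.0, 0.0]
    overlap = len(topic_terms & set(tokens)) / len(topic_terms)  # topic relatedness
    cues = sum(t in CLAIM_CUES for t in tokens) / len(tokens)    # claim-cue density
    length = min(len(tokens) / 40.0, 1.0)                        # normalized length
    return [overlap, cues, length]

def score(features: list, weights: list, bias: float) -> float:
    """Logistic-regression score: sigmoid of a weighted feature sum."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy usage with hand-set weights (a trained model would learn these).
topic = {"preschool", "subsidize", "education"}
weights, bias = [3.0, 4.0, 1.0], -2.0
s1 = "We should subsidize preschool because early education clearly pays off."
s2 = "The building was painted blue last year."
print(score(claim_features(s1, topic), weights, bias))  # high score
print(score(claim_features(s2, topic), weights, bias))  # low score
```

In the real system the features were far richer (see the GrASP patterns cited above); the point here is only the overall shape: hand-crafted features fed into a linear classifier.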
SLIDE 15

VLC (Very Large Corpus) Stage

Corpus Wide Argument Mining - a Working Solution, Ein-Dor et al., AAAI 2020

SLIDE 16

VLC (Very Large Corpus) Stage

Main Distinction from Previous Work

- Sentence-Level (SL) strategy, vs. the Document-Level strategy used before
- Scale:
  - ~240 train/dev topics & ~100 test topics
  - ~200,000 sentences carefully annotated for train/dev → Retrospective Labeling Paradigm
  - ~10,000,000,000 sentences: reporting results over a massive corpus
- Closer than ever to a working solution

SLIDE 17

VLC (Very Large Corpus) Stage

System Architecture

Controversial Topic → Queries → Massive Corpus (~10B sentences) → Retrieved Sentences → Ranking Model (BERT) → High-Precision Evidence Set

- Support flexible patterns to retrieve argumentative sentences:
  - Topic terms
  - Evidence connectors
  - Sentiment lexicon
  - NER
- Retrieve 12,000 sentences per evidence type per topic
- Iteratively collected labeled data
  - Starting with the LR model from Rinott et al., EMNLP 2015
  - Retrospective Labeling Paradigm
  - An infrastructure that supports quick dynamic experiments and monitors annotation quality
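The flexible query patterns above (topic terms combined with evidence connectors) can be sketched as a simple corpus filter. The connector lexicon and function names below are hypothetical stand-ins; the actual queries also use sentiment lexicons and NER, and run against a ~10B-sentence index rather than a Python list:

```python
# Illustrative evidence connectors; the real lexicons are larger and refined.
CONNECTORS = ["according to", "studies show", "research indicates",
              "statistics show", "experts say"]

def matches_query(sentence: str, topic_terms: list) -> bool:
    """Flexible pattern: keep a sentence only if it mentions the topic
    AND contains an evidence connector (a proxy for argumentativeness)."""
    s = sentence.lower()
    has_topic = any(t in s for t in topic_terms)
    has_connector = any(c in s for c in CONNECTORS)
    return has_topic and has_connector

def retrieve(corpus: list, topic_terms: list, k: int = 12000) -> list:
    """Scan the corpus and return up to k matching candidate sentences
    (the slide mentions retrieving 12,000 per evidence type per topic)."""
    hits = []
    for sent in corpus:
        if matches_query(sent, topic_terms):
            hits.append(sent)
            if len(hits) >= k:
                break
    return hits

corpus = [
    "According to studies, blood donors are less likely to suffer a heart attack.",
    "Blood is red.",
    "Experts say preschool improves later outcomes.",
]
print(retrieve(corpus, ["blood", "donor"]))  # only the first sentence matches
```

The retrieved candidates are then re-ranked by the BERT model to produce the high-precision evidence set.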

SLIDE 18

VLC (Very Large Corpus) Stage

How to Collect Labeled Data?

- Collecting labeled data poses a two-fold challenge:
  - Low prior of positive examples
  - Annotation through the crowd requires expertise: simple guidelines, careful monitoring…
  - BTW, a Kappa of ~0.4 is actually quite good
- Developing corpus-wide argument mining poses another challenge:
  - Imagine ~2,000 new predictions every week… → Associated infrastructure is a must
- Retrospective labeling of top predictions is a natural and effective solution
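The retrospective labeling paradigm mentioned above can be sketched as a loop that annotates only the model's current top predictions each round, sidestepping the low prior of positive examples. Here `annotate` and `toy_score` are hypothetical stand-ins for crowd annotation and the current classifier's confidence:

```python
def annotate(sentence: str) -> bool:
    """Stand-in for crowd annotation; a real pipeline sends the sentence
    to human annotators and aggregates their judgments."""
    return "study" in sentence.lower()  # hypothetical oracle for this demo

def toy_score(sentence: str) -> float:
    """Hypothetical model confidence; in the real system this would be
    the current evidence classifier's score."""
    return len(sentence) / 100.0

def retrospective_round(unlabeled, labeled, score, top_k=2):
    """One round of retrospective labeling: annotate only the model's
    top-k predictions, move them to the labeled set, then retrain."""
    ranked = sorted(unlabeled, key=score, reverse=True)
    picked, rest = ranked[:top_k], ranked[top_k:]
    labeled = labeled + [(s, annotate(s)) for s in picked]
    return labeled, rest  # caller retrains the model on `labeled`

pool = ["A study found donors are healthier.", "Blood is red.", "Short."]
labeled, pool = retrospective_round(pool, [], toy_score)
```

Each round both audits the model's precision at the top of its ranking and grows the training set where it matters most, which is what makes the paradigm practical at the scale of thousands of new predictions per week.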

SLIDE 19

Why is Evidence Detection Hard?

Motion: Blood donation should be mandatory

Candidate evidence sentences (one confirmed, one rejected by annotators):
- "According to studies, blood donors are 88 percent less likely to suffer a heart attack…"
- "Statistics … show that students are the main blood donors, contributing about 80 percent…"

SLIDE 20

VLC (Very Large Corpus) Stage

Results

[Plot: Macro-Average Precision vs. Number of Candidates]

- Results by various BERT models over a massive corpus of ~10B sentences
- Baselines: BlendNet, an attention-based bidirectional LSTM model [Shnarch et al. (2018)]
- High precision
- Wide coverage with diverse evidence (highly similar sentences are removed)

SLIDE 21

Outline

- System overview
- Argument retrieval in Project Debater
- Some retrospective thoughts

SLIDE 22

Challenges to Consider While Developing a Live Debate System

Modeling human dilemmas
- Modeling the world of human controversy
- Enabling the system to suggest principled arguments

Listening comprehension
- Identify key claims hidden in long continuous spoken language
- Compare to personal assistants, which handle simple short commands

Data-driven speech writing and delivery
- Digest massive corpora
- Write a well-structured speech
- Deliver with clarity and purpose

Argument retrieval is the first step to build such a system.

SLIDE 23

The Problem: Many things need to succeed simultaneously and many things can go wrong…

SLIDE 24

Many things can go wrong… / Examples

- Getting the stance wrong means you support your opponent…
- Drifting from the topic: from Physical Education to Sex Education and back…
- The system is only as good as its corpus → "… global warming will lead the malaria virus to creep into hilly areas…"

SLIDE 25

Progress over Time / Improvement in Precision of Detecting Claims

Stage 1:
- Document-level IR
- Corpus: Wikipedia
- Exhaustive labelling of positive instances
- LR + rich features

Stage 2:
- Sentence-level IR
- Flexible query
- Very large corpus
- Attention-based Bi-LSTM with weak supervision
- Retrospective labelling

Stage 3:
- Sentence-level IR
- Very Large Corpus: 400 million articles (50 times larger than Wikipedia)
- Retrospective labelling
- BERT fine-tuning

SLIDE 26

Beyond Project Debater

Computational Argumentation
- Argument retrieval
- Argument Unit Identification
- Argument Relation Prediction
- Argument(ation) Quality
- Argument Generation

Social NLP
- Sentiment
- Persuasiveness
- Social bias
- Framing
- Fact verification

Discourse and Pragmatics
- Argumentative discourse
- Argumentative coherence

Dialogue Systems / Natural Language Generation / Text Summarization

- Computational argumentation is emerging as an interesting research area
- "Argument mining" is the new keyword in the list of topics at recent *ACL conferences

SLIDE 27

Thanks! Q&A