Learning and Generating Paraphrases From Twitter and Beyond (Wei Xu)



slide-1
SLIDE 1

Learning and Generating Paraphrases From Twitter and Beyond

Wei Xu

Guest Lecture @ Penn MT class April-2-2015

Computer and Information Science, University of Pennsylvania

slide-2
SLIDE 2

Research Overview

TACL 15, NAACL 15, TACL 14, ACL 14, ACL 13, BUCC 13, LASM 13, COLING 12, IJCNLP 11, EMNLP 11, ACL 06

Social Media, Paraphrase, Information Extraction

slide-3
SLIDE 3

Paraphrase

slide-4
SLIDE 4

Paraphrase

word: wealthy / rich

phrase: the king’s speech / His Majesty’s address

sentence:
… the forced resignation of the CEO of Boeing, Harry Stonecipher, for …
… after Boeing Co. Chief Executive Harry Stonecipher was ousted from …

slide-5
SLIDE 5

Application

Information Extraction

extract: end_job (Harry Stonecipher, Boeing)

… the forced resignation of the CEO of Boeing, Harry Stonecipher, for …
… after Boeing Co. Chief Executive Harry Stonecipher was ousted from …

Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman. “Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction” In ACL (2013)

slide-6
SLIDE 6

Application

Question Answering

Who is the CEO stepping down from Boeing?

match

… the forced resignation of the CEO of Boeing, Harry Stonecipher, for …
… after Boeing Co. Chief Executive Harry Stonecipher was ousted from …

slide-7
SLIDE 7

Application

Text Simplification

They are culturally akin to the coastal peoples of Papua New Guinea.
Their culture is like that of the coastal peoples of Papua New Guinea.

Wei Xu, Chris Callison-Burch. “Problems in Current Text Simplification Research: New Data Can Help” to appear in TACL (2015)
NSF EAGER: “Simplification as Machine Translation” (2014 ~ 2015)

slide-8
SLIDE 8

Application

Stylistic Rewriting

Palpatine: If you will not be turned, you will be destroyed!
→ If you will not be turn’d, you will be undone!

Luke: Father, please! Help me!
→ Father, I pray you! Help me!

Wei Xu, Alan Ritter, Bill Dolan, Ralph Grishman, Colin Cherry. “Paraphrasing for Style” In COLING (2012)

slide-9
SLIDE 9

Previous Work

Numerous publications on paraphrase identification, extraction, generation and various applications

But, primarily for formal language usage and well-edited text

slide-10
SLIDE 10

Previous Work

only a few hundred news agencies report big events using formal language

(Dolan, Quirk and Brockett, 2004; Dolan and Brockett, 2005; Brockett and Dolan, 2005)

slide-11
SLIDE 11

Twitter as a new resource

Wei Xu, Alan Ritter, Ralph Grishman. “A Preliminary Study of Tweet Summarization using Information Extraction” in LASM (2014)

slide-12
SLIDE 12

Twitter as a powerful resource

thousands of users talk about both big and micro events using formal, informal, erroneous language

Very diverse!

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-13
SLIDE 13

Enables new applications

pittsburgh pgh pittsburg pixburgh pit steelers against the steelers against pittsburgh

Wei Xu, Alan Ritter, Ralph Grishman. “Gathering and Generating Paraphrases from Twitter with Application to Normalization” In BUCC (2013)

Information Retrieval

?

slide-14
SLIDE 14

Enables new applications

Noisy Text Normalization

oscar nom’d doc → Oscar-nominated documentary
don’t want for → don’t wait for

Wei Xu, Joel Tetreault, Martin Chodorow, Ralph Grishman, Le Zhao. “Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models” In EMNLP (2011)

slide-15
SLIDE 15

Enables new applications

who wants to get a beer?

Human-computer Interaction

want to get a beer?
who else wants to get a beer?
who wants to go get a beer?
trying to get a beer?
who wants to buy a beer?
who else wants to get a beer?
… (21 different ways)

Wei Xu, Alan Ritter, Ralph Grishman. “Gathering and Generating Paraphrases from Twitter with Application to Normalization” In BUCC (2013)

slide-16
SLIDE 16

Enables new applications

Language Education

Aaaaaaaaand stephen curry is on fire
What a incredible performance from Stephen Curry

slide-17
SLIDE 17

Enables new applications

Sentiment Analysis

Wowsers to this nets bulls game
This nets vs bulls game is great
This Nets vs Bulls game is nuts
This Nets and Bulls game is a good game
this Nets vs Bulls game is too live
This NetsBulls series is intense
This netsbulls game is too good

slide-18
SLIDE 18

Learn Paraphrases

slide-19
SLIDE 19

Learn Paraphrases

identify parallel sentences automatically from Twitter’s big data stream

Mancini has been sacked by Manchester City
Mancini gets the boot from Man City
Yes!

WORLD OF JENKS IS ON AT 11
World of Jenks is my favorite show on tv
No!

slide-20
SLIDE 20

Early Attempts

  • 1242 tweet pairs, tracking celebrity & hashtags (Zanzotto, Pennacchiotti and Tsioutsiouliklis, 2011)
  • named entity + date (Xu, Ritter and Grishman, 2013)
  • bilingual posts (Ling, Dyer, Black and Trancoso, 2013)

slide-21
SLIDE 21

Design a Model

Train it on data

slide-22
SLIDE 22

A Challenge

Mancini has been sacked by Manchester City
Mancini gets the boot from Man City

very short, lexically divergent
(less word overlap, even in high-dimensional space)

slide-23
SLIDE 23

Design a Model

At-least-one-anchor Assumption

Two sentences about the same topic are paraphrases if and only if they contain at least one word pair that is a paraphrase anchor.

That boy Brook Lopez with a deep 3
brook lopez hit a 3
Yes!

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-24
SLIDE 24

Another Challenge

Not every word pair of similar meaning indicates sentence-level paraphrase.

Solution: a discriminative model using features at word-level

Iron Man 3 was brilliant fun
Iron Man 3 tonight see what this is like
No!

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)
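
The at-least-one-anchor assumption, and the reason it needs a learned word-pair classifier, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the tiny stopword list, the choice to drop the topic words, and above all `toy_is_anchor`, which simply tests for an identical content word. The model described in the talk instead learns which word pairs are anchors from word-level features.

```python
from typing import List, Tuple

# Hypothetical stopword list, only large enough for the two slide examples.
STOPWORDS = {"a", "the", "is", "was", "with", "to", "this", "that", "what"}


def content_tokens(sentence: str, topic: str) -> List[str]:
    """Lowercase, split on whitespace, and drop the words of the shared topic."""
    topic_words = set(topic.lower().split())
    return [w for w in sentence.lower().split() if w not in topic_words]


def word_pairs(s1: str, s2: str, topic: str) -> List[Tuple[str, str]]:
    """All cross-sentence word pairs: the candidate anchors."""
    return [(a, b) for a in content_tokens(s1, topic) for b in content_tokens(s2, topic)]


def toy_is_anchor(a: str, b: str) -> bool:
    """Stand-in for the learned word-pair classifier (an assumption)."""
    return a == b and a not in STOPWORDS


def is_paraphrase(s1: str, s2: str, topic: str) -> bool:
    """Deterministic OR over word pairs: one anchor is enough."""
    return any(toy_is_anchor(a, b) for a, b in word_pairs(s1, s2, topic))


yes_pair = is_paraphrase("That boy Brook Lopez with a deep 3", "brook lopez hit a 3", "Brook Lopez")
no_pair = is_paraphrase("Iron Man 3 was brilliant fun", "Iron Man 3 tonight see what this is like", "Iron Man 3")
```

With the topic words removed, the Brook Lopez pair still shares the anchor "3", while the Iron Man 3 pair shares nothing, which matches the Yes/No labels on the slides. A string-equality test like this would miss lexically divergent anchors such as "sacked" and "gets the boot", which is why the talk turns to a discriminative model.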

slide-25
SLIDE 25

Multi-instance Learning Paraphrase Model

[Figure: latent-variable model for one sentence pair. A sentence-level label Y (paraphrase / non-paraphrase) sits above word-pair latent variables Z1 … Z4, each described by word-level features.]

sentence pair:
Manti bout to be the next Junior Seau
Teo is the little new Junior Seau

word pairs and features:
manti | teo: diff_word, same_pos_nn, both_sig, …
be | is: same_stem, same_pos_be, not_both_sig, …
next | new: diff_word, same_pos_jj, both_sig, …
manti | little: diff_word, diff_pos_nn, diff_pos_jj, not_both_sig, …

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-26
SLIDE 26

[Mini Tutorial] Multi-instance Learning

Instead of labels on each individual instance, the learner only observes labels on bags of instances.

Positive Bags: a bag is labeled positive if there is at least one positive example.
Negative Bags: a bag is labeled negative if all the examples in it are negative.

(Dietterich et al., 1997)
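
The bag-labeling rule can be written down directly. The sketch below is a minimal illustration with hypothetical function names: `bag_label` applies the rule, and `consistent_assignments` enumerates which hidden instance labelings are compatible with an observed bag label, which is the ambiguity a multi-instance learner has to resolve.

```python
from itertools import product
from typing import List, Sequence, Tuple


def bag_label(instance_labels: Sequence[int]) -> int:
    """A bag is positive (1) if at least one instance is positive, else negative (0)."""
    return int(any(instance_labels))


def consistent_assignments(n_instances: int, observed_bag_label: int) -> List[Tuple[int, ...]]:
    """All 0/1 instance labelings that would produce the observed bag label."""
    return [
        z
        for z in product((0, 1), repeat=n_instances)
        if bag_label(z) == observed_bag_label
    ]


positive_options = consistent_assignments(3, 1)  # many labelings explain a positive bag
negative_options = consistent_assignments(3, 0)  # only the all-zero labeling explains a negative bag
```

A negative bag pins every instance label to 0, while a positive bag of three instances leaves seven possible labelings open.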

slide-27
SLIDE 27

[Mini Tutorial] Multi-instance Learning

Latent Variable Model

[Figure: positive bag. Bag label Y (observed) = 1; instance labels Z1, Z2, Z3 (latent) = ?, ?, 1. Features attach to the instances; constraints tie the instance labels to the bag label.]

A bag is labeled positive if there is at least one positive example.

slide-28
SLIDE 28

[Mini Tutorial] Multi-instance Learning

Latent Variable Model

[Figure: negative bag. Bag label Y (observed) = 0; instance labels Z1, Z2, Z3 (latent) = 0, 0, 0. Features attach to the instances; constraints tie the instance labels to the bag label.]

A bag is labeled negative if all the examples in it are negative.

slide-29
SLIDE 29

[Mini Tutorial] Multi-instance Learning

Distantly Supervised Information Extraction

  • 1. incomplete knowledge base problem
  • 2. distant supervision + human-labeled data
  • 3. IE + IR

[Figure: plate diagram over variables xi, zi, hi, yi (plates n, |xi|, |R|), spanning the mention level and the relation level]

Maria Pershina, Bonan Min, Wei Xu, Ralph Grishman. “Infusion of Labeled Data into Distant Supervision for Relation Extraction” In ACL (2014)
Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman. “Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction” In ACL (2013)
Wei Xu, Ralph Grishman, Le Zhao. “Passage Retrieval for Information Extraction using Distant Supervision” In IJCNLP (2011)
slide-30
SLIDE 30

[Recap] Multi-instance Learning Paraphrase Model

[Figure: the model diagram from slide 25, repeated as a recap: sentence-level label Y (paraphrase / non-paraphrase) over word-pair latent variables Z1 … Z4 with word-level features]

sentence pair:
Manti bout to be the next Junior Seau
Teo is the little new Junior Seau

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-31
SLIDE 31

Joint Word-Sentence Model

Model the assumption: sentence-level paraphrase is anchored by at least one word pair

[Figure: a sentence pair (S×S) with bag label Y (observed) sits above its word pairs (W×W) with instance labels Zi, Zj (latent); the two levels are connected by a deterministic OR]

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-32
SLIDE 32

Joint Word-Sentence Model

[Equation annotations: features; parameters; deterministic OR; jth word pair; ith sentence pair’s label (observed or to be predicted); latent labels for all word pairs in the ith sentence pair]

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)
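
The equation that these annotations label did not survive extraction. A hedged reconstruction, consistent with the labels and with the deterministic-OR assumption stated on the previous slide, might read as follows. The symbol names ($\mathbf{w}_i$ for the word pairs of the $i$th sentence pair, $\mathbf{z}_i$ for their latent labels, $y_i$ for the sentence-pair label, $\theta$ for the parameters, $f$ for the features) are assumptions rather than the slide's own notation.

```latex
P(\mathbf{z}_i, y_i \mid \mathbf{w}_i; \theta)
  \;\propto\; \prod_{j=1}^{m} \exp\bigl(\theta \cdot f(z_j, w_j)\bigr)
  \times \sigma(\mathbf{z}_i, y_i)

\sigma(\mathbf{z}_i, y_i) =
\begin{cases}
1 & \text{if } y_i = 1 \text{ and } \exists j : z_j = 1 \\
1 & \text{if } y_i = 0 \text{ and } \forall j : z_j = 0 \\
0 & \text{otherwise}
\end{cases}
```

Here $\sigma$ plays the role of the deterministic OR: it rules out any latent labeling that disagrees with the sentence-level label.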

slide-33
SLIDE 33

Learning Algorithm

Objective: learn the parameters that maximize likelihood over the training corpus

[Equation annotations: ith training sentence pair; all possible values of the latent variables]

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)
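
The objective itself was an image and is missing from the transcript. Under the assumption that $\mathbf{w}_i$ denotes the word pairs of the $i$th training sentence pair, $y_i$ its observed label, $\mathbf{z}_i$ the latent word-pair labels and $\theta$ the parameters (all names chosen here, not taken from the slide), maximizing likelihood while summing over all possible values of the latent variables would take this form:

```latex
\theta^{*}
  = \arg\max_{\theta} \prod_{i} P(y_i \mid \mathbf{w}_i; \theta)
  = \arg\max_{\theta} \prod_{i} \sum_{\mathbf{z}_i} P(\mathbf{z}_i, y_i \mid \mathbf{w}_i; \theta)
```

The inner sum ranges over every 0/1 labeling of the word pairs, which is what makes exact optimization expensive and motivates an approximate update.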
slide-34
SLIDE 34

Learning Algorithm

Perceptron-style Update: Viterbi approximation + online learning, O(# word pairs)

[Equation annotations: reward correct (conditioned on labels); penalize wrong (ignoring labels)]

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)
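
A perceptron-style update of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: features are binary and fire only when a word pair is labeled 1, each bag is a list of feature-name lists (one per word pair), and all function names are hypothetical. The "reward" side picks the best latent labeling that agrees with the gold sentence label, the "penalize" side picks the best labeling with the label ignored, and the weights move by the difference.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Bag = List[List[str]]  # one list of active feature names per word pair


def dot(theta: DefaultDict[str, float], feats: List[str]) -> float:
    return sum(theta[name] for name in feats)


def best_ignoring_label(theta: DefaultDict[str, float], bag: Bag) -> List[int]:
    """Best latent labels with no constraint: each word pair is decided independently."""
    return [1 if dot(theta, feats) > 0 else 0 for feats in bag]


def best_given_label(theta: DefaultDict[str, float], bag: Bag, y: int) -> List[int]:
    """Best latent labels that respect the deterministic OR with the gold label y."""
    if y == 0:
        return [0] * len(bag)
    z = best_ignoring_label(theta, bag)
    if not any(z):
        # A positive bag needs at least one anchor: switch on the highest-scoring pair.
        z[max(range(len(bag)), key=lambda j: dot(theta, bag[j]))] = 1
    return z


def predict(theta: DefaultDict[str, float], bag: Bag) -> int:
    return int(any(best_ignoring_label(theta, bag)))


def train(data: List[Tuple[Bag, int]], epochs: int = 5) -> DefaultDict[str, float]:
    theta: DefaultDict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for bag, y in data:
            z_wrong = best_ignoring_label(theta, bag)
            if int(any(z_wrong)) == y:
                continue  # prediction already agrees with the label
            z_right = best_given_label(theta, bag, y)
            for feats, zr, zw in zip(bag, z_right, z_wrong):
                for name in feats:
                    theta[name] += zr - zw  # reward correct, penalize wrong
    return theta


toy_data = [
    ([["same_stem"], ["diff_word", "diff_pos"]], 1),
    ([["diff_word", "diff_pos"]], 0),
]
theta = train(toy_data)
```

Both argmax steps touch each word pair a constant number of times, which is consistent with the O(# word pairs) cost stated on the slide.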

slide-35
SLIDE 35

Training Data

slide-36
SLIDE 36

Annotation

Crowdsourcing

(Courtesy: The Sheep Market by Aaron Koblin)

slide-37
SLIDE 37

Annotation

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)

Crowdsourcing

slide-38
SLIDE 38

A Problem

only 8% of sentence pairs about the same topic have similar meaning

hurts both quantity and quality

non-experts lower their bars

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)

slide-39
SLIDE 39

Sentence Selection

SumBasic Algorithm: 8% → 16%

[Figure: percentages of paraphrases per topic (Netflix, Jeff Green, Ryu, The Clippers, Reggie Miller), random sampling vs. with selection]

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)
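
SumBasic is a frequency-based sentence ranking method from summarization. The sketch below is a simplified variant, under the assumption that tweets are whitespace-tokenized and that the highest-scoring tweets are the ones kept for annotation. The function name `sumbasic_select` and the toy tweets are hypothetical, and the talk does not spell out its exact configuration.

```python
from collections import Counter
from typing import List


def sumbasic_select(sentences: List[str], k: int) -> List[int]:
    """Return the indices of up to k sentences, chosen greedily SumBasic-style."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    remaining = [i for i, toks in enumerate(tokenized) if toks]
    chosen: List[int] = []
    while remaining and len(chosen) < k:
        # Score each sentence by the average probability of its words.
        best = max(
            remaining,
            key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]),
        )
        chosen.append(best)
        remaining.remove(best)
        # Square the probability of covered words so later picks add new content.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return chosen


tweets = ["mancini sacked", "mancini sacked by city", "world of jenks"]
picked = sumbasic_select(tweets, 2)
```

In this toy run the first tweet wins on average word frequency; squaring the probabilities of "mancini" and "sacked" then pushes the near-duplicate second tweet below the third.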

slide-40
SLIDE 40

Topic Selection

Multi-Armed Bandits: 16% → 34%

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)
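
The slide does not say which bandit algorithm was used. As one possible illustration, the sketch below uses UCB1, treating each topic as an arm and a crowdsourced "paraphrase" judgement as a reward of 1. The class name, the simulated paraphrase rates and the round count are all made up for the example.

```python
import math
import random
from typing import List


class UCB1:
    """Upper-confidence-bound bandit over a fixed set of arms (here: topics)."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms
        self.totals = [0.0] * n_arms

    def select(self) -> int:
        # Try every arm once before trusting the estimates.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.totals[a] / self.counts[a]
            + math.sqrt(2 * math.log(t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += reward


def simulate(paraphrase_rates: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Spend `rounds` annotations; return how many each topic received."""
    rng = random.Random(seed)
    bandit = UCB1(len(paraphrase_rates))
    for _ in range(rounds):
        topic = bandit.select()
        reward = 1.0 if rng.random() < paraphrase_rates[topic] else 0.0
        bandit.update(topic, reward)
    return bandit.counts


annotations_per_topic = simulate([0.05, 0.30, 0.60], rounds=500)
```

The intent is that annotation budget drifts toward topics that keep yielding paraphrases, while the exploration term still gives other topics an occasional chance.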

slide-41
SLIDE 41

Twitter Paraphrase Dataset

18,762 sentence pairs labeled

cost only $200

1/3 paraphrase, 2/3 non-paraphrase (very balanced)

including a very broad range of paraphrases: synonyms, misspellings, slang, acronyms and colloquialisms

important but difficult to obtain

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)

slide-42
SLIDE 42

Performance

slide-43
SLIDE 43

Performance

state-of-the-art of paraphrase identification

System                     Precision   Recall
(Das & Smith, 2009)        62.9        63.2
(Guo & Diab, 2012)         52.5        65.5
(Ji & Eisenstein, 2013)    66.4        62.8
Our Model                  72.2        72.6
Human Upperbound           75.2        90.8

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-44
SLIDE 44

Performance

[Figure: the precision/recall bar chart from slide 43, with Our Model and (Ji & Eisenstein, 2013) highlighted]

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-45
SLIDE 45

Impact

SemEval 2015 shared task on “Paraphrase in Twitter”

19 + 1 teams participated

100+ research groups have requested the data since Nov 2014

our model:
paraphrase identification (0 or 1): rank 1
semantic similarity (0 ~ 1): rank 4

Wei Xu, Chris Callison-Burch, Bill Dolan. “SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)” In SemEval (2015)

slide-46
SLIDE 46

Innovations

Multi-instance Learning Paraphrase Model (MultiP)

That boy Brook Lopez with a deep 3
brook lopez hit a 3
Yes!

  • Twitter’s big data stream
  • potential beyond Twitter and English
  • joint sentence-word alignment
  • extensible latent variable model

(a lot of space for future work)

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

slide-47
SLIDE 47

Generate Paraphrases

slide-48
SLIDE 48

Extract Phrasal Paraphrases

Mancini has been sacked by Manchester City
Mancini gets the boot from Man City

align

Wei Xu, Alan Ritter, Ralph Grishman. “Gathering and Generating Paraphrases from Twitter with Application to Normalization” In BUCC (2013)

slide-49
SLIDE 49

Extract Phrasal Paraphrases

has been sacked by / gets the boot from
manchester city / man city
4 / for
4 / four
outta / out of
hostes / hostess

Wei Xu, Alan Ritter, Ralph Grishman. “Gathering and Generating Paraphrases from Twitter with Application to Normalization” In BUCC (2013)

slide-50
SLIDE 50

Text-to-text Generation

Hostes is going outta biz .

translate

Hostess is going out of business .

Wei Xu, Alan Ritter, Ralph Grishman. “Gathering and Generating Paraphrases from Twitter with Application to Normalization” In BUCC (2013)
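
The "translate" step can be pictured as a lookup in a table of extracted phrasal paraphrases. The sketch below is a deliberately minimal stand-in: it does greedy longest-match substitution with a hand-written table, whereas a statistical machine translation system would score competing phrase choices with translation and language models. The table entries echo the slide examples; the function name, the substitution direction and the lowercasing are assumptions.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical one-directional phrase table (noisy form -> standard form).
PHRASE_TABLE: Dict[Phrase, Phrase] = {
    ("hostes",): ("hostess",),
    ("outta",): ("out", "of"),
    ("biz",): ("business",),
    ("man", "city"): ("manchester", "city"),
}


def normalize(sentence: str, table: Dict[Phrase, Phrase] = PHRASE_TABLE, max_len: int = 3) -> str:
    """Rewrite a sentence left to right, preferring the longest matching source phrase."""
    words = sentence.lower().split()
    output: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            source = tuple(words[i : i + n])
            if source in table:
                output.extend(table[source])
                i += n
                break
        else:
            # No phrase starts here: copy the word unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


normalized = normalize("Hostes is going outta biz")
```

Ambiguous entries such as "4" (which the extracted table pairs with both "for" and "four") are left out on purpose, since choosing between them needs context that a plain lookup does not have.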

slide-51
SLIDE 51

Statistical Machine Translation

                                     Bilingual          Monolingual (= Paraphrase)
studied                              a lot              more recently
sensitive to error                   less               more
objective                            straightforward    sophisticated
has standard evaluation              yes                not quite yet
naturally available parallel text    less               more

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)

slide-52
SLIDE 52

Text-to-text Generation

complex → simple (Xu et al. 2015)
stylistic → plain (Xu et al. 2012)
noisy → standard (Xu et al. 2013)
erroneous → correct (Xu et al. 2011)
and more (future work) …

Wei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis, New York University. (2014)

slide-53
SLIDE 53

Prose to Sonnet

Wandering through rows of stalls examining workhorses and prize hogs may seem to … have been a strange way for a scientist to spend an afternoon, but there was a certain logic to it.

hogs may seem a bit strange through rows of stalls

[Rhyme] balls, falls, installs, walls, …

Quanze Chen, Chenyang Lei, Wei Xu, Ellie Pavlick and Chris Callison-Burch. “Poetry of the Crowd: A Human Computation Algorithm to Convert Prose into Rhyming Verse” In AAAI's HCOMP (2012)

slide-54
SLIDE 54

Text Simplification

state-of-the-art (since 2010)

NSF EAGER: “Simplification as Machine Translation” (2014 ~ 2015)

slide-55
SLIDE 55

Text Simplification

state-of-the-art (since 2010)
is suboptimal
is not all that simple

Wei Xu, Chris Callison-Burch. “Problems in Current Text Simplification Research: New Data Can Help” to appear in TACL (2015)
NSF EAGER: “Simplification as Machine Translation” (2014 ~ 2015)

slide-56
SLIDE 56
Main Contributions

  • Jointly model word-sentence via latent variables
  • Use Twitter as a powerful paraphrase resource
  • Systemize a framework for language generation
  • Right the direction of text simplification research

all code and data are available on my homepage: http://www.cis.upenn.edu/~xwe/

slide-57
SLIDE 57

The Ideal

slide-58
SLIDE 58

Collaborators

Chris Callison-Burch (UPenn)
Ralph Grishman (NYU)
Bill Dolan (MSR)
Alan Ritter (UW / OSU)
Raphael Hoffmann (UW / AI2 Incubator)
Joel Tetreault (ETS / Yahoo!)
Le Zhao (CMU / Google)
Maria Pershina (NYU)
Martin Chodorow (CUNY)
Colin Cherry (NRC)
Yangfeng Ji (GaTech)
Ellie Pavlick (UPenn)
Mingkun Gao (UPenn)
Quanze Chen (UPenn)

slide-59
SLIDE 59

Thank you

thanks thanking you appreciate it thnx thx tyvm thank you very much thanks a lot 3x say thanks am grateful wawwww thankkkkkkkkkkk you alotttttttttttt! thank u 4 ur time gratitude