Semantic Textual Similarity & more on Alignment - CMSC 723 / LING 723 / INST 725 - PowerPoint presentation


SLIDE 1

Semantic Textual Similarity & more on Alignment

CMSC 723 / LING 723 / INST 725
Marine Carpuat

marine@cs.umd.edu

SLIDE 2

2 topics today

  • P3 task: Semantic Textual Similarity
    – Including monolingual alignment

  • Beyond IBM word alignment
    – Synchronous CFGs

SLIDE 3

Semantic Textual Similarity

Series of tasks at the international workshop on semantic evaluations (SemEval), held since 2012: http://alt.qcri.org/semeval2017/task1/

SLIDE 4

What is Semantic Textual Similarity?

Semantic Similarity

[The slide buries the same greetings in filler text across several scripts: Arabic-script filler; Latin-alphabet filler containing “Welcome to my world, trust me you will never be disappointed” and “Yo! Come over here, you will be pleasantly surprised”; Russian for “Welcome to my world, trust me you will never be disappointed”; and Korean for roughly “Hello, I called you but it was no use… I hope you were enjoying your time”.]

What we want from STS:

  • Quantitative: graded similarity score, confidence score
  • Principled: interpretability, i.e. which semantic components/features led to the results (hopefully leading to a better understanding of semantics)

SLIDE 5

Why Semantic Textual Similarity?

  • Most NLP applications need some notion of semantic similarity to overcome brittleness and sparseness
  • Provides evaluation beyond surface text processing
  • A hub for semantic processing as a black box in applications beyond NLP
  • Lends itself to an extrinsic evaluation of scattered semantic components

SLIDE 6

What is STS?

  • The graded process by which two snippets of text (t1 and t2) are deemed semantically equivalent, i.e. bear the same meaning
  • An STS system will quantifiably inform us on how similar t1 and t2 are, resulting in a similarity score
  • An STS system will tell us why t1 and t2 are similar, giving a nuanced interpretation of similarity based on semantic components’ contributions
SLIDE 7

What is STS?

  • Word similarity has been relatively well studied
    – For example, according to WordNet:

      cord – smile          0.02
      rooster – voyage      0.04
      noon – string         0.04
      fruit – furnace       0.05
      ...
      hill – woodland       1.48
      car – journey         1.55
      cemetery – mound      1.69
      ...
      cemetery – graveyard  3.88
      automobile – car      3.92

    (lower in the list = more similar)

SLIDE 8

What is STS?

  • Fewer datasets for similarity between sentences

A forest is a large area where trees grow close together.
VS.
The coast is an area of land that is next to the sea.

[0.25]

SLIDE 9

What is STS?

  • Fewer datasets for similarity between sentences

A forest is a large area where trees grow close together.
VS.
Woodland is land with a lot of trees.

[2.51]

SLIDE 10

What is STS?

  • Fewer datasets for similarity between sentences

Once there was a Czar who had three lovely daughters.

VS.

There were three beautiful girls, whose father was a Czar.

[4.3]

SLIDE 11

Related tasks

  • Paraphrase detection
    – Are 2 sentences equivalent in meaning?

  • Textual Entailment
    – Does premise P entail hypothesis H?

  • STS provides graded similarity judgments

SLIDE 12

SLIDE 13

Annotation: crowd-sourcing

SLIDE 14

Annotation: crowd-sourcing

  • English annotation process
    – Pairs annotated in batches of 20
    – Annotators paid $1 per batch
    – 5 annotations per pair
    – Workers need to have the MTurk master qualification

  • Defining gold standard judgments
    – Median value of annotations
    – After filtering low-quality annotators (<0.80 correlation with leave-one-out gold & <0.20 Kappa)
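A minimal sketch of this aggregation, assuming annotations arrive as a dict of per-annotator score lists (the slide's additional Kappa filter is left out of the sketch):

```python
from statistics import median

def gold_scores(annotations, min_corr=0.80):
    """annotations: {annotator_name: [score for each pair]}.
    Drop annotators who correlate poorly with the leave-one-out gold,
    then return the median of the remaining scores for each pair."""
    def corr(xs, ys):
        # Pearson correlation between two equal-length score lists
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0

    kept = []
    for name, scores in annotations.items():
        # leave-one-out gold: per-pair median of everyone else's scores
        others = [s for n2, s in annotations.items() if n2 != name]
        loo_gold = [median(col) for col in zip(*others)]
        if corr(scores, loo_gold) >= min_corr:
            kept.append(scores)
    return [median(col) for col in zip(*kept)]
```

For instance, an annotator whose scores run against the per-pair leave-one-out medians is dropped before the final medians are taken.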

SLIDE 15

Diverse data sources

SLIDE 16

Evaluation: a shared task

Subset of 2016 results (Score: Pearson correlation)
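Pearson correlation, the score used in the shared task, can be computed directly; a minimal sketch with made-up gold and system scores (not actual 2016 results):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical gold judgments and system scores for five sentence pairs
gold = [0.25, 2.51, 4.30, 1.00, 3.75]
system = [0.50, 2.00, 4.10, 1.20, 3.50]
```

A system that tracks the gold judgments closely, as above, scores near 1.0; a system producing reversed rankings would score near -1.0.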

SLIDE 17

STS models: from word to sentence vectors

  • Can we perform STS by comparing sentence vector representations?
  • This approach works well for word-level similarity
  • But can we capture the meaning of a sentence in a single vector?

SLIDE 18

“Composing” by averaging

g(“shots fired at residence”) = 1/4 (v_shots + v_fired + v_at + v_residence)

i.e. the sentence vector is the average of the word vectors for “shots”, “fired”, “at”, and “residence”.

[Tai et al. 2015, Wieting et al. 2016]
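The averaging composition can be sketched with toy vectors; the 3-dimensional values below are invented for illustration:

```python
import math

# hypothetical 3-d word vectors, made up for this sketch
VEC = {
    "shots":     [0.9, 0.1, 0.0],
    "fired":     [0.8, 0.2, 0.1],
    "at":        [0.1, 0.1, 0.1],
    "residence": [0.2, 0.9, 0.1],
    "gunfire":   [0.9, 0.2, 0.0],
    "near":      [0.1, 0.2, 0.1],
    "home":      [0.2, 0.8, 0.2],
}

def g(sentence):
    """Sentence vector = average of its word vectors."""
    vecs = [VEC[w] for w in sentence.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))
```

With these vectors, cosine(g("shots fired at residence"), g("gunfire near home")) comes out high because the averaged sentence vectors point in similar directions.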

SLIDE 19

How can we induce word vectors for composition?

  • English paraphrases, e.g. “by our fellow members” ↔ “by our colleagues” [Wieting et al. 2016]
  • Bilingual sentence pairs, e.g. “Thus in fact … by our fellow members” ↔ “Así que podríamos … nuestra colega” [Hermann & Blunsom 2014]
  • Bilingual phrase pairs, e.g. “by our fellow member” ↔ “de nuestra colega”

SLIDE 20

STS models: monolingual alignment

SLIDE 21

One (of many) approaches to monolingual alignment

Idea:
  • Exploit not only similarity between words
  • But also similarity between their contexts

See Sultan et al. 2013, https://github.com/ma-sultan/
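A minimal illustration of this idea (not the actual Sultan et al. aligner, which also exploits dependency structure and a paraphrase database): score each candidate word pair by its word similarity plus the similarity of its neighbors, then link greedily.

```python
def align(s1, s2, sim):
    """Greedy word alignment: word similarity + context similarity."""
    scored = []
    for i, w1 in enumerate(s1):
        for j, w2 in enumerate(s2):
            # context score: similarity of the left and right neighbors
            ctx = [sim(s1[i + d], s2[j + d])
                   for d in (-1, 1)
                   if 0 <= i + d < len(s1) and 0 <= j + d < len(s2)]
            score = sim(w1, w2) + 0.5 * (sum(ctx) / len(ctx) if ctx else 0.0)
            scored.append((score, i, j))
    # accept best-scoring pairs first, at most one link per word
    links, used1, used2 = [], set(), set()
    for score, i, j in sorted(scored, reverse=True):
        if score > 0.5 and i not in used1 and j not in used2:
            links.append((i, j))
            used1.add(i)
            used2.add(j)
    return sorted(links)

# toy word similarity: exact match (a real system would use embeddings or PPDB)
exact = lambda a, b: 1.0 if a == b else 0.0
```

On "the cat sat" vs. "the cat slept", the exact-match similarity links "the" and "cat"; "sat"/"slept" get some context support but fall below the (arbitrary) threshold.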

SLIDE 22

2 topics today

  • P3 task: Semantic Textual Similarity
    – Including monolingual alignment

  • Beyond IBM word alignment
    – Synchronous CFGs

SLIDE 23

Aligning words & constituents

  • Alignment: mapping between spans of text in lang1 and spans of text in lang2
    – Sentences in document pairs
    – Words in sentence pairs
    – Syntactic constituents in sentence pairs

  • Today: 2 methods for aligning constituents
    – Parse and match
    – Biparse

SLIDE 24

Parse & Match

SLIDE 25

Parse(-Parse)-Match

  • Idea
    – Align spans that are consistent with existing structure

  • Pros
    – Builds on existing NLP tools

  • Cons
    – Assumes availability of lots of resources
    – Assumes that representations can be matched

SLIDE 26

Aligning words & constituents

2 methods for aligning constituents:

  • Parse and match
    – assumes existing parses and alignment

  • Biparse
    – alignment = structure

SLIDE 27

A “straw man” hypothesis: All languages have same grammar


SLIDE 31

The biparsing hypothesis: All languages have nearly the same grammar

SLIDE 33

Example for the biparsing hypothesis: All languages have nearly the same grammar

SLIDE 38

Dekai Wu and Pascale Fung, IJCNLP-2005, HKUST Human Language Technology Center

The biparsing hypothesis: All languages have nearly the same grammar

The same rules in four equivalent notations (straight rule / inverted rule):

  ITG shorthand:             VP → [ VV PP ]                     VP → ⟨ VV PP ⟩
  SDTG/SCFG notation:        VP → VV PP , VV PP                 VP → VV PP , PP VV
  Indexed SDTG/SCFG:         VP → VV(1) PP(2) , VV(1) PP(2)     VP → VV(1) PP(2) , PP(2) VV(1)
  Permuted SDTG/SCFG:        VP → VV PP ; 1 2                   VP → VV PP ; 2 1

SLIDE 39

Synchronous Context Free Grammars

  • Context-free grammars (CFG)
    – Common way of representing syntax in (monolingual) NLP

  • Synchronous context-free grammars (SCFG)
    – Generate pairs of strings
    – Align sentences by parsing them
    – Translate sentences by parsing them

  • Key algorithm: how to parse with SCFGs?
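To make "generate pairs of strings" concrete, here is a sketch of synchronous generation with a toy SCFG (the fat-cats grammar from the biparsing slides); nonterminals on the two sides are linked by a shared index:

```python
# toy SCFG: lhs -> (source RHS, target RHS)
# ("w", word) is a terminal; ("nt", symbol, index) links the two sides
RULES = {
    "A":  ([("w", "fat")], [("w", "gordos")]),
    "N":  ([("w", "cats")], [("w", "gatos")]),
    "VP": ([("w", "eats")], [("w", "comen")]),
    "NP": ([("nt", "A", 1), ("nt", "N", 2)],    # source order: A N
           [("nt", "N", 2), ("nt", "A", 1)]),   # target order inverted: N A
    "S":  ([("nt", "NP", 1), ("nt", "VP", 2)],
           [("nt", "NP", 1), ("nt", "VP", 2)]),
}

def generate(sym):
    """Expand `sym` on both sides at once, returning (source, target) word lists."""
    src_rhs, tgt_rhs = RULES[sym]
    sub = {}  # index -> (source words, target words) of each linked nonterminal
    src = []
    for item in src_rhs:
        if item[0] == "w":
            src.append(item[1])
        else:
            _, nt, idx = item
            sub[idx] = generate(nt)
            src += sub[idx][0]
    tgt = []
    for item in tgt_rhs:
        if item[0] == "w":
            tgt.append(item[1])
        else:
            tgt += sub[item[2]][1]
    return src, tgt
```

generate("S") yields the pair ("fat cats eats", "gatos gordos comen"): the inverted NP rule swaps the adjective and noun on the target side.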
SLIDE 40

SCFG trade-off

  • Expressiveness
    – SCFGs cannot represent all sentence pairs in all languages

  • Efficiency
    – SCFGs let us view alignment as parsing & benefit from a well-studied formalism

SLIDE 41

Synchronous parsing cannot represent all sentence pairs


SLIDE 44

A subclass of SCFGs: Inversion Transduction Grammars

  • ITGs are the subclass of SDTGs/SCFGs:
    – with only straight and inverted transduction rules
    – with only transduction rules of rank ≤ 2
    – with only transduction rules of rank ≤ 3
    (these three characterizations are equivalent)

  • ITGs are context-free (like SCFGs).

SLIDE 45

For length-4 phrases (or frames), ITGs can express 22 out of 24 permutations!
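The 22/24 count can be verified by brute force: a permutation is ITG-expressible iff it decomposes recursively into straight and inverted binary combinations. A sketch, using 0-based permutations:

```python
from itertools import permutations

def itg_expressible(p):
    """True if permutation p (a tuple over 0..n-1) decomposes into
    straight/inverted binary combinations, as in an ITG derivation."""
    n = len(p)
    if n <= 1:
        return True
    for k in range(1, n):
        left, right = p[:k], p[k:]
        if set(left) == set(range(k)):  # straight combination [ ]
            if itg_expressible(left) and itg_expressible(tuple(x - k for x in right)):
                return True
        if set(left) == set(range(n - k, n)):  # inverted combination < >
            if itg_expressible(tuple(x - (n - k) for x in left)) and itg_expressible(right):
                return True
    return False

count = sum(itg_expressible(p) for p in permutations(range(4)))
```

count comes out to 22; the two excluded permutations are (1, 3, 0, 2) and (2, 0, 3, 1), i.e. 2413 and 3142 in 1-based notation.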

SLIDE 46

ITGs enable efficient DP algorithms

[Wu 1995]

[Figure: a dynamic-programming chart over word positions e0 … e7 of one sentence and c0 … c6 of the other; the slides animate how chart cells are filled in span pair by span pair.]

SLIDE 53

Biparsing with CKY

  • Given the following SCFG:

    A → fat, gordos
    A → thin, delgados
    N → cats, gatos
    VP → eats, comen
    NP → A(1) N(2), N(2) A(1)
    S → NP(1) VP(2), NP(1) VP(2)

  • Let’s parse the sentence pair:

    fat cats eats / gatos gordos comen

Example by Matt Post (JHU)

SLIDE 54

Biparsing with CKY

A → fat, gordos
A → thin, delgados
N → cats, gatos
VP → eats, comen
NP → A(1) N(2), N(2) A(1)
S → NP(1) VP(2), NP(1) VP(2)

[Chart: columns indexed by the English words fat (1), cats (2), eats (3); rows by the Spanish words gatos (1), gordos (2), comen (3).]

Chart now enumerates pairs of spans

SLIDE 55

Biparsing with CKY

A → fat, gordos
A → thin, delgados
N → cats, gatos
VP → eats, comen
NP → A(1) N(2), N(2) A(1)
S → NP(1) VP(2), NP(1) VP(2)

[Chart after lexical rules: A ((1,1), (2,2)); N ((2,2), (1,1)); VP ((3,3), (3,3)), where each entry gives (English span, Spanish span).]

Apply lexical rules

SLIDE 56

Biparsing with CKY

A → fat, gordos
A → thin, delgados
N → cats, gatos
VP → eats, comen
NP → A(1) N(2), N(2) A(1)
S → NP(1) VP(2), NP(1) VP(2)

[Chart after binary rules: the inverted NP rule combines A and N into NP ((1,2), (1,2)); the straight S rule then combines NP and VP into S ((1,3), (1,3)).]

For each block, apply straight & inverted rules

SLIDE 57

Biparsing with CKY

Runtime: O(G N³ M³), where G is the grammar size and N and M are the lengths of the two sentences.
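The chart computation on the preceding slides can be sketched end to end. A minimal biparser for the toy grammar, using 0-based, half-open span pairs instead of the slides' 1-based word indices:

```python
def biparse(src, tgt, lexical, binary):
    """CKY biparsing: chart cells are keyed by a pair of spans
    (src_i, src_j, tgt_i, tgt_j), half-open on both sides."""
    n, m = len(src), len(tgt)
    chart = {}
    # lexical rules pair one source word with one target word
    for i in range(n):
        for j in range(m):
            for lhs in lexical.get((src[i], tgt[j]), []):
                chart.setdefault((i, i + 1, j, j + 1), set()).add(lhs)
    # build larger span pairs, smallest first on both sides
    for ssz in range(1, n + 1):
        for tsz in range(1, m + 1):
            for i in range(n - ssz + 1):
                for j in range(m - tsz + 1):
                    cell = chart.setdefault((i, i + ssz, j, j + tsz), set())
                    for k in range(i + 1, i + ssz):        # source split point
                        for l in range(j + 1, j + tsz):    # target split point
                            for lhs, b, c, inverted in binary:
                                if inverted:  # <B C>: target order swapped
                                    ok = (b in chart.get((i, k, l, j + tsz), ()) and
                                          c in chart.get((k, i + ssz, j, l), ()))
                                else:         # [B C]: same order on both sides
                                    ok = (b in chart.get((i, k, j, l), ()) and
                                          c in chart.get((k, i + ssz, l, j + tsz), ()))
                                if ok:
                                    cell.add(lhs)
    return chart

# the toy grammar from the slides
LEX = {("fat", "gordos"): ["A"], ("thin", "delgados"): ["A"],
       ("cats", "gatos"): ["N"], ("eats", "comen"): ["VP"]}
BIN = [("NP", "A", "N", True),     # NP -> A(1) N(2), N(2) A(1)  (inverted)
       ("S", "NP", "VP", False)]   # S -> NP(1) VP(2), NP(1) VP(2)

chart = biparse("fat cats eats".split(), "gatos gordos comen".split(), LEX, BIN)
```

The full-sentence cell contains S, recovering the derivation from the slides; the nested loops over the two spans and two split points make the O(G N³ M³) cost visible.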

SLIDE 58

Aligning words & constituents

2 different ways of looking at this problem:

  • parse-parse-match
    – assumes existing parses and alignment

  • biparse
    – alignment = structure