Semantic Textual Similarity & more on Alignment
CMSC 723 / LING 723 / INST 725 MARINE CARPUAT
marine@cs.umd.edu
2 topics today:
– P3 task: Semantic Textual Similarity, including monolingual alignment
– Beyond IBM word alignment
A series of shared tasks at the International Workshop on Semantic Evaluation (SemEval), run since 2012: http://alt.qcri.org/semeval2017/task1/
– Quantitative: graded similarity score, confidence score
– Principled
– Interpretable: which semantic components/features led to the results (hopefully this will lead to a better understanding of semantics)
– similarity to overcome brittleness and sparseness
– applications beyond NLP
– semantic components
– For example, according to WordNet:
cord / smile: 0.02
rooster / voyage: 0.04
noon / string: 0.04
fruit / furnace: 0.05
...
hill / woodland: 1.48
car / journey: 1.55
cemetery / mound: 1.69
...
cemetery / graveyard: 3.88
automobile / car: 3.92
More similar
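WordNet-style similarity can be illustrated with a toy is-a taxonomy. The taxonomy below is invented for illustration only (real systems query WordNet itself, e.g. through NLTK); path similarity is 1 / (1 + length of the shortest is-a path):

```python
# Toy is-a hierarchy (child -> parent). Invented for illustration;
# a real system would use WordNet's hypernym graph instead.
TAXONOMY = {
    "car": "vehicle", "automobile": "vehicle", "vehicle": "artifact",
    "artifact": "entity",
    "journey": "event", "event": "entity",
    "cemetery": "site", "graveyard": "site", "site": "location",
    "location": "entity",
}

def ancestors(word):
    """Chain from the word up to the root, the word itself included."""
    chain = [word]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain

def path_similarity(a, b):
    """1 / (1 + shortest is-a path length), as in WordNet path similarity."""
    pa, pb = ancestors(a), ancestors(b)
    dists = [i + pb.index(x) for i, x in enumerate(pa) if x in pb]
    return 1.0 / (1 + min(dists)) if dists else 0.0
```

On this toy hierarchy, path_similarity("cemetery", "graveyard") is higher than path_similarity("car", "journey"), matching the ordering of the human judgments above.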
– Pairs annotated in batches of 20
– Annotators paid $1 per batch
– 5 annotations per pair
– Workers need the MTurk Master qualification
– Gold score = median value of the annotations
– After filtering out low-quality annotators (<0.80 correlation with leave-one-out gold & <0.20 Kappa)
Subset of 2016 results (Score: Pearson correlation)
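Both the aggregation and the evaluation step are simple to sketch: gold scores are medians over the annotations per pair, and systems are scored by Pearson correlation against the gold scores. The function names below are mine, not from the official task kit:

```python
from statistics import median

def gold_score(annotations):
    """Aggregate one pair's crowd judgments (e.g. 5 of them) by median."""
    return median(annotations)

def pearson(x, y):
    """Pearson correlation between system scores x and gold scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)
```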
How can we represent the meaning of a sentence? Can a sentence be represented as a vector?
[Tai et al. 2015, Wieting et al. 2016]
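One of the simplest compositions studied in this line of work (Wieting et al. 2016) is word-vector averaging. The sketch below uses tiny made-up 3-dimensional embeddings purely for illustration; real models use learned embeddings with hundreds of dimensions:

```python
# Tiny made-up word vectors, for illustration only.
EMB = {
    "cats": [0.9, 0.1, 0.0], "felines": [0.85, 0.15, 0.0],
    "eat":  [0.0, 0.9, 0.1], "dine":    [0.05, 0.85, 0.1],
    "fly":  [0.0, 0.1, 0.9],
}

def sentence_vector(tokens):
    """Average the word vectors of a sentence (Wieting et al.-style)."""
    dim = len(next(iter(EMB.values())))
    vecs = [EMB[t] for t in tokens if t in EMB]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)
```

With these toy vectors, "cats eat" scores higher against its paraphrase "felines dine" than against the unrelated "cats fly".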
(Figure: sentence vectors y2, y3 for English paraphrases)
[Wieting et al. 2016]
"By our fellow members" / "By our colleagues"
Bilingual sentence pairs
[Hermann & Blunsom 2014]
"Thus in fact … by our fellow members" / "Así que podríamos … nuestra colega disputado"
Bilingual phrase pairs: "by our fellow member" / "de nuestra colega"
Idea: compute similarity
– between words
– between their contexts
See Sultan et al. 2013: https://github.com/ma-sultan/
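A drastically simplified version of that idea: score every word pair by exact-match word similarity plus Jaccard overlap of the neighboring words (the context), then link pairs greedily one-to-one. The 0.7/0.3 weights and the 0.5 threshold are arbitrary illustration values of mine, not Sultan et al.'s:

```python
# Greedy monolingual word aligner sketch, not Sultan et al.'s actual system.
def context(tokens, i):
    """Lowercased immediate neighbors of position i."""
    return {tokens[j].lower() for j in (i - 1, i + 1) if 0 <= j < len(tokens)}

def align(s, t, threshold=0.5):
    cands = []
    for i, w in enumerate(s):
        for j, v in enumerate(t):
            word_sim = 1.0 if w.lower() == v.lower() else 0.0
            cs, ct = context(s, i), context(t, j)
            ctx_sim = len(cs & ct) / len(cs | ct) if cs | ct else 0.0
            cands.append((0.7 * word_sim + 0.3 * ctx_sim, i, j))
    # Greedily link the best-scoring pairs, one-to-one.
    links, used_s, used_t = [], set(), set()
    for score, i, j in sorted(cands, reverse=True):
        if score >= threshold and i not in used_s and j not in used_t:
            links.append((i, j))
            used_s.add(i)
            used_t.add(j)
    return sorted(links)
```

For example, aligning "the cat sat" with "the cat slept" links the identical words but leaves "sat" / "slept" unlinked, since their score falls below the threshold.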
– Sentences in document pairs
– Words in sentence pairs
– Syntactic constituents in sentence pairs
– Parse and match – biparse
Dekai Wu and Pascale Fung, IJCNLP-2005 HKUST Human Language Technology Center
ITG shorthand:
  VP -> [ VV PP ]      VP -> ⟨ VV PP ⟩
SDTG/SCFG notation:
  VP -> VV PP , VV PP      VP -> VV PP , PP VV
Indexed SDTG/SCFG notation:
  VP -> VV(1) PP(2) , VV(1) PP(2)      VP -> VV(1) PP(2) , PP(2) VV(1)
Permuted SDTG/SCFG notation:
  VP -> VV PP ; 1 2      VP -> VV PP ; 2 1
– Common way of representing syntax in (monolingual) NLP
– Generate pairs of strings
– Align sentences by parsing them
– Translate sentences by parsing them
– with only straight and inverted transduction rules
– with only transduction rules of rank ≤ 2
– with only transduction rules of rank ≤ 3
All of these restricted forms are equivalent in expressive power.
For length-4 phrases (or frames), ITGs can express 22 out of 24 permutations!
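That 22-out-of-24 figure can be verified directly: a permutation is ITG-expressible iff it can be recursively split into two parts whose target positions each form a contiguous range (i.e. it can be built from straight and inverted binary rules). A brute-force check (my own sketch, not code from the lecture):

```python
from itertools import permutations

def contiguous(part):
    """True if the values cover one contiguous range of positions."""
    return max(part) - min(part) + 1 == len(part)

def itg_expressible(perm):
    """A permutation is ITG-expressible iff it splits recursively into
    two halves that each cover a contiguous target range."""
    if len(perm) <= 1:
        return True
    return any(
        contiguous(perm[:k]) and contiguous(perm[k:])
        and itg_expressible(perm[:k]) and itg_expressible(perm[k:])
        for k in range(1, len(perm))
    )

expressible = [p for p in permutations(range(1, 5)) if itg_expressible(p)]
# 22 of the 24 length-4 permutations; only (2,4,1,3) and (3,1,4,2) are excluded.
```

The two excluded permutations are exactly the "inside-out" reorderings that no binary straight/inverted bracketing can produce.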
ITGs enable efficient DP algorithms
[Wu 1995]
(Figure: DP chart over English positions e0-e7 and Chinese positions c0-c6)
A  -> fat, gordos
A  -> thin, delgados
N  -> cats, gatos
VP -> eat, comen
NP -> A(1) N(2), N(2) A(1)
S  -> NP(1) VP(2), NP(1) VP(2)
fat cats eat
gatos gordos comen
Example by Matt Post (JHU)
(Biparsing chart: English fat cats eat on one axis (positions 1-3), Spanish gatos gordos comen on the other (positions 1-3); initially empty)
(Chart after lexical matches: A((1,1),(2,2)), N((2,2),(1,1)), VP((3,3),(3,3)))
(Chart after the NP rule: A((1,1),(2,2)), N((2,2),(1,1)), NP((1,2),(1,2)), VP((3,3),(3,3)))
(Final chart: the S rule adds S((1,3),(1,3)), covering both complete sentences)
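The chart-filling process on these slides can be reproduced with a tiny fixed-point biparser over the example grammar. This is a naive sketch (0-indexed, end-exclusive spans, unlike the 1-indexed inclusive spans in the figures), not Wu's polynomial-time DP algorithm:

```python
# Mini ITG biparser for the "fat cats eat" / "gatos gordos comen" example.
e = "fat cats eat".split()
f = "gatos gordos comen".split()

LEXICON = {("fat", "gordos"): "A", ("thin", "delgados"): "A",
           ("cats", "gatos"): "N", ("eat", "comen"): "VP"}
# (parent, left child, right child, orientation)
RULES = [("NP", "A", "N", "inverted"), ("S", "NP", "VP", "straight")]

# Chart items are (label, e_start, e_end, f_start, f_end).
chart = {(LEXICON[(ew, fw)], i, i + 1, k, k + 1)
         for i, ew in enumerate(e) for k, fw in enumerate(f)
         if (ew, fw) in LEXICON}

changed = True
while changed:
    changed = False
    new = set()
    for (X, Y, Z, orient) in RULES:
        for (a, i, j, k, l) in chart:
            if a != Y:
                continue
            for (b, j2, j3, k2, l2) in chart:
                if b != Z or j2 != j:  # children must be adjacent in English
                    continue
                if orient == "straight" and k2 == l:    # same order in Spanish
                    new.add((X, i, j3, k, l2))
                elif orient == "inverted" and l2 == k:  # swapped order in Spanish
                    new.add((X, i, j3, k2, l))
    if not new <= chart:
        chart |= new
        changed = True
```

Running this derives NP over ("fat cats", "gatos gordos") via the inverted rule, then S over both full sentences via the straight rule, mirroring the chart states above.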