
Motivation

  • Good translation preserves the meaning of the sentence.
  • Neural MT learns to represent the sentence.

○ Is the representation “meaningful” in some sense?


Evaluating sentence representations

  • Evaluation through classification.
  • Evaluation through similarity.
  • Evaluation using paraphrases.
  • SentEval (Conneau et al., 2017)

○ prediction tasks for evaluating sentence embeddings
○ focus on semantics (recently, “linguistic” tasks added, too)

  • HyTER paraphrases (Dreyer and Marcu, 2014)

Evaluation through similarity

  • 7 similarity tasks: pairs of sentences + human judgement

○ with a training set, sentence similarity is predicted by regression,
○ without a training set, cosine similarity is used as the sentence similarity,
○ ultimately, the predicted sentence similarity is correlated with the gold truth.

  • In sum, we report them as “AvgSim”.

Example sentence pairs with human similarity judgements:

○ “I think it probably depends on your money.” / “It depends on your country.”
○ “Yes, you should mention your experience.” / “Yes, you should make a resume.” (2)
○ “Hope this is what you are looking for.” / “Is this the kind of thing you're looking for?” (4)
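The similarity protocol above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up 2-dimensional embeddings and judgements, not the SentEval implementation; `cosine` and `pearson` are hypothetical helper names:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    # Pearson correlation between predicted and gold similarities.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up sentence embeddings for three pairs and gold judgements:
pairs = [(([1.0, 0.0], [1.0, 0.1]), 4.8),
         (([1.0, 0.0], [0.5, 0.8]), 2.5),
         (([1.0, 0.0], [0.0, 1.0]), 0.3)]
pred = [cosine(u, v) for (u, v), _ in pairs]
gold = [g for _, g in pairs]
print(round(pearson(pred, gold), 3))
```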


Classification task

  • 1. Remove some points from the clusters.
  • 2. Train an LDA classifier with the remaining points.
  • 3. Classify the removed points back.
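A minimal sketch of this held-out classification procedure (not the paper's implementation): with a shared identity covariance, LDA reduces to assigning each point to the nearest class mean, which is the simplification used below. The clusters and the hold-out split are synthetic.

```python
import random

random.seed(0)

# Two synthetic 2-D clusters standing in for groups of sentence embeddings.
cluster_a = [(random.gauss(0.0, 0.5), random.gauss(0.0, 0.5)) for _ in range(50)]
cluster_b = [(random.gauss(3.0, 0.5), random.gauss(3.0, 0.5)) for _ in range(50)]

# 1. Remove some points from the clusters (hold them out).
held_out = [(p, 0) for p in cluster_a[:10]] + [(p, 1) for p in cluster_b[:10]]
train_a, train_b = cluster_a[10:], cluster_b[10:]

# 2. "Train" on the remaining points.  Full LDA fits a shared covariance;
#    with an identity covariance it reduces to nearest class mean.
def mean(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

mu = [mean(train_a), mean(train_b)]

def classify(p):
    d = [(p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in mu]
    return 0 if d[0] < d[1] else 1

# 3. Classify the removed points back and measure accuracy.
acc = sum(classify(p) == label for p, label in held_out) / len(held_out)
print(acc)
```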


Sequence-to-sequence with attention

  • Bahdanau et al. (2014)
  • α_ij: weight of the j-th encoder state for the i-th decoder state
  • no sentence embedding
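A stripped-down sketch of how the weights α_ij arise: each encoder state is scored against the current decoder state and the scores are normalised with a softmax. (Bahdanau et al. score with a small feed-forward network; the dot-product `score` below is a simplification to keep the example short, and the states are made up.)

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_weights(decoder_state, encoder_states, score):
    # alpha_ij: weight of the j-th encoder state for the i-th decoder state,
    # obtained by normalising alignment scores with a softmax.
    return softmax([score(decoder_state, h) for h in encoder_states])

# Hypothetical 2-d states and a dot-product score function.
dot = lambda s, h: sum(a * b for a, b in zip(s, h))
enc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
alphas = attention_weights([2.0, 0.0], enc, dot)
print(alphas)  # weights sum to 1; the first encoder state aligns best
```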

Multi-head inner attention

  • Liu et al. (2016), Li et al. (2016), Lin et al. (2017)
  • α_ij: weight of the j-th encoder state for the i-th column of M^T
  • concatenate columns of M^T → sentence embedding
  • linear projection of columns to control embedding size
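The steps above can be sketched in NumPy: each head puts a softmax over encoder positions, the heads' weighted sums form the columns of a matrix, and concatenating (or projecting) those columns gives the sentence embedding. All dimensions are hypothetical, and a single linear score layer stands in for the hidden tanh layer of Lin et al. (2017):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n encoder states of dimension d, r attention heads.
n, d, r = 5, 8, 4
H = rng.standard_normal((n, d))   # encoder states, one per row

# One score vector per head (simplified single linear layer).
W = rng.standard_normal((d, r))
scores = H @ W                    # (n, r): score of state j for head i

# Softmax over encoder positions, separately for each head:
# every column of A sums to 1.
A = np.exp(scores - scores.max(axis=0))
A = A / A.sum(axis=0)

# Each column of M = H^T A is one head's weighted sum of encoder states;
# concatenating the columns yields the sentence embedding.
M = H.T @ A                       # (d, r)
embedding = M.T.reshape(-1)       # size d * r

# A linear projection of the columns can control the embedding size.
P = rng.standard_normal((d, 3))
small = (P.T @ M).T.reshape(-1)   # size 3 * r
print(embedding.shape, small.shape)
```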


Proposed NMT architectures

  • ATTN-ATTN (compound attention): the decoder „selects“ components of the embedding
  • ATTN-CTX: the decoder operates on the entire embedding


Evaluated NMT models

  • model architectures:

○ FINAL, FINAL-CTX: no attention
○ AVGPOOL, MAXPOOL: pooling instead of attention
○ ATTN-CTX: inner attention, constant context vector
○ ATTN-ATTN: inner attention, decoder attention
○ TRF-ATTN-ATTN: Transformer with inner attention

  • translation from English (to Czech or German), evaluating embeddings of English (source) sentences

○ en→cs: CzEng 1.7 (Bojar et al., 2016)
○ en→de: Multi30K (Elliott et al., 2016; Helcl and Libovický, 2017)


Sample Results – translation quality en→cs

Model                                Heads  BLEU  Manual (> other)  Manual (≥ other)
ATTN („Bahdanau“)                      —    22.2       50.9              93.8
ATTN-ATTN (compound attention)         8    18.4       42.5              88.6
ATTN-ATTN                              4    17.1        —                 —
ATTN-CTX (inner attention + „Cho“)     4    16.1       31.7              77.9
FINAL-CTX („Cho“)                      —    15.5        —                 —
ATTN-ATTN                              1    14.8       27.3              71.7
FINAL („Sutskever“)                    —    10.8        —                 —

Selected models trained for translation from English to Czech. The embedding size is 1000 (except ATTN).


Sample Results – translation quality en→cs

BLEU is consistent with human evaluation.


Sample Results – translation quality en→cs

Attention in the encoder helps translation quality.


Sample Results – translation quality en→cs

More attention heads → better translation quality.


Sample Results – representation eval. en→cs

Model                Size  Heads  SentEval AvgAcc  SentEval AvgSim  Paraphrases (COCO class. accuracy)
InferSent            4096    —        81.7              0.70              31.58
GloVe bag-of-words    300    —        75.8              0.59              34.28
FINAL-CTX („Cho“)    1000    —        74.4              0.60              23.20
ATTN-ATTN            1000    1        73.4              0.54              21.54
ATTN-CTX             1000    4        72.2              0.45              14.60
ATTN-ATTN            1000    4        70.8              0.39              10.84
ATTN-ATTN            1000    8        70.0              0.36              10.24

Selected models trained for translation from English to Czech. InferSent and GloVe-BOW are trained on monolingual (English) data.


Sample Results – representation eval. en→cs

Baselines are hard to beat.


Sample Results – representation eval. en→cs

Attention harms the performance.


Sample Results – representation eval. en→cs

More heads → worse results.


Full Results – correlations

BLEU vs. other metrics: −0.57 ± 0.31 (en→cs), −0.36 ± 0.29 (en→de)
Pairwise average (except BLEU): 0.78 ± 0.32 (en→cs), 0.57 ± 0.23 (en→de)


Full Results – correlations excluding Transformer

BLEU vs. other metrics: −0.57 ± 0.31 (en→cs), −0.54 ± 0.27 (en→de)
Pairwise average (except BLEU): 0.78 ± 0.32 (en→cs), 0.62 ± 0.23 (en→de)


Compound attention interpretation

ATTN-ATTN en-cs model with 8 heads


Average attention weight by position

[figure: inner attention weight by relative position in the encoder]


Average attention weight by position

Heads divide the sentence equidistantly, not based on syntax or semantics.
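The averaging behind this plot can be sketched as follows (a hypothetical helper, not the paper's analysis code): bucket each token's inner-attention weight by its relative position in the sentence and average over sentences.

```python
import numpy as np

def head_profile(weights_per_sentence, head, n_bins=10):
    """Average inner-attention weight of one head by relative position.

    weights_per_sentence: list of (n_heads, length) arrays whose rows sum to 1.
    Returns a length-n_bins profile over relative positions.
    """
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for w in weights_per_sentence:
        length = w.shape[1]
        for j in range(length):
            b = min(int(j / length * n_bins), n_bins - 1)
            sums[b] += w[head, j]
            counts[b] += 1
    return sums / counts

# Synthetic check: a head that always attends to the middle of the sentence
# should produce a profile peaking in the central bin.
sentences = []
for length in (8, 12, 20):
    w = np.full((1, length), 0.5 / (length - 1))
    w[0, length // 2] = 0.5   # half the attention mass on the middle token
    sentences.append(w)
print(head_profile(sentences, head=0).round(3))
```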


Summary

  • Proposed NMT architecture combining the benefit of attention and one $&!#* vector representing the whole sentence.
  • Evaluated the obtained sentence embeddings using a wide range of “semantic” tasks.
  • The better the translation, the worse performance in “meaning” representation.
  • Heads divide sentence equidistantly, not logically.

Join our JNLE Special Issue on Sentence Representations:

http://ufal.mff.cuni.cz/jnle-on-sentence-representation


Bibliography

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. In ICLR.

Ondřej Bojar et al. 2016. CzEng 1.6: Enlarged Czech-English parallel corpus with processing tools dockered. In Text, Speech, and Dialogue (TSD), number 9924 in LNAI, pages 231–238.

Kyunghyun Cho, Bart van Merrienboer, Çaglar Gulçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. In EMNLP.

David L. Davies and Donald W. Bouldin. 1979. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1:224–227.

Desmond Elliott, Stella Frank, Khalil Sima’an, and Lucia Specia. 2016. Multi30k: Multilingual English-German image descriptions. CoRR, abs/1605.00459.

Jindřich Helcl and Jindřich Libovický. 2017. CUNI System for the WMT17 Multimodal Translation Task.


Bibliography

Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. In NIPS.

Markus Dreyer and Daniel Marcu. 2014. HyTER networks of selected OpenMT08/09 sentences. Linguistic Data Consortium. LDC2014T09.

Peng Li, Wei Li, Zhengyan He, Xuguang Wang, Ying Cao, Jie Zhou, and Wei Xu. 2016. Dataset and neural recurrent sequence labeling model for open-domain factoid question answering. CoRR, abs/1607.06275.

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312.

Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. CoRR, abs/1703.03130.

Yang Liu, Chengjie Sun, Lei Lin, and Xiaolong Wang. 2016. Learning natural language inference using bidirectional LSTM model and inner-attention. CoRR, abs/1605.09090.


Bibliography

Holger Schwenk and Matthijs Douze. 2017. Learning joint multilingual sentence representations with neural machine translation. CoRR, abs/1704.04154.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.