Cross-lingual language model pretraining
Alexis Conneau and Guillaume Lample, Facebook AI Research
Why learn cross-lingual representations?

[Figure: the same sentence in three languages: "This is great." / "C'est super." / "Das ist toll."]
… multilingual representations emerge from a single MLM trained on many languages.
Multilingual masked language modeling (MLM) pretraining
Similar to BERT, we pretrain a Transformer model with MLM, but on many languages:
Devlin et al. – BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) (+ mBERT)
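As an illustration, here is a minimal sketch of the masking step, assuming BERT's 15% masking rate and 80/10/10 replacement split; the toy vocabulary, sentences, and helper names are illustrative, not the paper's code:

```python
import random

MASK = "[MASK]"
# Toy stand-in for the shared multilingual (BPE) vocabulary.
VOCAB = ["this", "is", "great", ".", "c'", "est", "super", "das", "ist", "toll"]

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style masking: return (corrupted inputs, prediction targets)."""
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            targets.append(tok)                      # model must recover this token
            r = random.random()
            if r < 0.8:
                inputs.append(MASK)                  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.choice(VOCAB))  # 10%: replace with a random token
            else:
                inputs.append(tok)                   # 10%: keep the original token
        else:
            inputs.append(tok)
            targets.append(None)                     # no loss on unmasked positions
    return inputs, targets

# A single model and vocabulary see monolingual streams from every language.
for sentence in [["this", "is", "great", "."], ["c'", "est", "super", "."]]:
    print(mask_tokens(sentence))
```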
Multilingual MLM is unsupervised, but we leverage parallel data with TLM:
Translation language modeling (TLM) pretraining
… to encourage the model to leverage cross-lingual context when making predictions.
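Building on the mask_tokens() helper from the MLM sketch above, a TLM training example can be sketched as follows; the separator token and language tags are assumptions about the input layout, not the exact XLM format:

```python
def tlm_example(src_tokens, tgt_tokens, src_lang="en", tgt_lang="fr"):
    """Build one TLM example from a parallel sentence pair."""
    # Concatenate the pair so masked words in one language can be
    # predicted from their translation in the other.
    pair = src_tokens + ["</s>"] + tgt_tokens
    inputs, targets = mask_tokens(pair)  # mask tokens on *both* sides
    # Language tags mark which language each position belongs to
    # (fed to the model as language embeddings).
    langs = [src_lang] * (len(src_tokens) + 1) + [tgt_lang] * len(tgt_tokens)
    return inputs, targets, langs

print(tlm_example(["this", "is", "great", "."], ["c'", "est", "super", "."]))
```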
The pretrained encoder (XLM) is fine-tuned on the English XNLI(*) training data and then tested on all 15 languages.
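A minimal PyTorch sketch of this zero-shot recipe, with a tiny randomly initialized encoder standing in for the pretrained XLM weights (sizes, names, and the toy batches are assumptions for illustration):

```python
import torch
import torch.nn as nn

VOCAB_SIZE, DIM = 95000, 128  # toy sizes; stand-in for the pretrained encoder

class NLIClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # load pretrained weights here in practice
        self.head = nn.Linear(DIM, 3)  # entailment / neutral / contradiction

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))  # (batch, seq, dim)
        return self.head(h[:, 0])                # classify from the first position

model = NLIClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-5)

# Fine-tune on English XNLI only (toy random batch stands in for real data).
en_tokens = torch.randint(0, VOCAB_SIZE, (8, 16))
en_labels = torch.randint(0, 3, (8,))
loss = nn.functional.cross_entropy(model(en_tokens), en_labels)
loss.backward()
opt.step()

# Zero-shot transfer: the same model is evaluated on the other languages
# with no further training, relying on the shared cross-lingual encoder.
fr_tokens = torch.randint(0, VOCAB_SIZE, (8, 16))
predictions = model(fr_tokens).argmax(dim=-1)
```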
Average XNLI accuracy over the 15 languages (zero-shot cross-lingual classification):

  XLM (MLM+TLM)   75.1
  XLM (MLM)       71.5
  LASER           70.2
  mBERT           66.3
  XNLI baseline   65.6
(*) Conneau et al. – XNLI: Evaluating Cross-lingual Sentence Representations (EMNLP 2018)
Initialization is key in unsupervised MT to bootstrap the iterative back-translation (BT) process.
Embedding layer initialization is essential for neural unsupervised MT (*)
Initializing the full Transformer model significantly improves performance (+7 BLEU)
(*) Lample et al. – Phrase-based and neural unsupervised machine translation (EMNLP 2018)
Unsupervised MT results (BLEU):

  Supervised 2016 SOTA (Edinburgh)   36.2
  Full model pretrained (MLM)        34.3
  Full model pretrained (CLM)        30.5
  Embeddings pretrained              27.3
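To make the bootstrapping concrete, here is a schematic of the iterative back-translation loop that the pretrained initialization kick-starts; translate and train_on are hypothetical stand-ins for decoding and a supervised training step, not the authors' code:

```python
def unsupervised_mt(src2tgt, tgt2src, mono_src, mono_tgt, rounds=3):
    """Iterative back-translation between two translation models.

    Both models start from the same pretrained cross-lingual LM weights:
    with a poor initialization, the first synthetic translations are too
    noisy for the loop to take off.
    """
    for _ in range(rounds):
        # Back-translate target monolingual text into synthetic sources,
        # yielding (noisy source, clean target) pairs to train src -> tgt.
        synthetic_src = [tgt2src.translate(t) for t in mono_tgt]
        src2tgt.train_on(list(zip(synthetic_src, mono_tgt)))
        # Symmetric update for the opposite direction.
        synthetic_tgt = [src2tgt.translate(s) for s in mono_src]
        tgt2src.train_on(list(zip(mono_src, synthetic_tgt)))
    return src2tgt, tgt2src
```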
We also show the importance of pretraining for generation:
Pretraining the decoder improves the BLEU score.
MLM pretraining leads to the best BLEU score, especially when supervised data is small.

[Chart: BLEU (20-40) for "No pretraining", "Full model pretrained (CLM)", and "Full model pretrained (MLM)", each with and without back-translation.]
Code and models available at github.com/facebookresearch/XLM
Lample & Conneau – Cross-lingual Language Model Pretraining (NeurIPS 2019)