Transformer Ablation Studies



1. Transformer Ablation Studies
   Simon Will
   Institute of Formal and Applied Linguistics, Charles University
   Seminar: Statistical Machine Translation
   Instructor: Dr. Ondřej Bojar
   30 May 2019

2. Structure
   ▶ The Transformer
   ▶ Ablation Concerns
   ▶ Feed Forward Layers
   ▶ Positional Embeddings
   ▶ Self-Attention Keys and Queries

3. Idea
   ▶ The Transformer is successful and many variations exist (e.g. Ott et al. 2018; Dai et al. 2019)
   ▶ Difficult to know what the essentials are and what each part contributes
   → Train similar models differing in crucial points

4. Transformer (Vaswani et al. 2017)
   ▶ Encoder-decoder architecture based on attention
   ▶ No recurrence
   ▶ Constant in source and target “time” while training
   ▶ In inference, only constant in source “time”
   ▶ More easily parallelizable than RNN-based networks

5. Transformer Illustration
   Figure: Two-Layer Transformer (image from Alammar 2018)

6. Ablation Concerns
   Figure: Areas of Concern for this Project

7. Feed Forward Layers
   ▶ Contribution of the feed forward layers in encoder and decoder is not clear
   ▶ Is the attention enough?
   → Three configurations (sketched below):
     ▶ No encoder FF layer
     ▶ No decoder FF layer
     ▶ No encoder and no decoder FF layer
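To make the ablation concrete, here is a minimal PyTorch sketch of an encoder layer whose feed-forward sublayer can be switched off. The class name and the `use_ffn` flag are illustrative assumptions, not the code used for the actual experiments; dropout and masking are omitted for brevity.

```python
import torch
import torch.nn as nn

class AblatableEncoderLayer(nn.Module):
    """Transformer encoder layer whose feed-forward sublayer can be removed."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, use_ffn=True):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.use_ffn = use_ffn
        if use_ffn:
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # self-attention sublayer with residual connection and layer norm
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # feed-forward sublayer; dropped entirely in the "no FF layer" ablations
        if self.use_ffn:
            x = self.norm2(x + self.ffn(x))
        return x

# Example: batch of 2 sentences, 7 tokens each, with the FF sublayer ablated
layer = AblatableEncoderLayer(use_ffn=False)
out = layer(torch.randn(2, 7, 512))
```

The decoder-side configurations would drop the corresponding sublayer in the decoder layers in the same way.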

8. Ablation Concerns
   Figure: Areas of Concern for this Project

9. Ablation Concerns
   Figure: Areas of Concern for this Project

10. Positional Embeddings
    ▶ No recurrence → no information about the order of tokens
    → Add information via explicit positional embeddings
    ▶ Added to the word embedding vector
    ▶ Two types:
      ▶ Learned embeddings of absolute position (e.g. Gehring et al. 2017)
      ▶ Sinusoidal embeddings (used in Vaswani et al. 2017):
        PE(pos, 2i)   = sin(pos / 10000^(2i / d_key))
        PE(pos, 2i+1) = cos(pos / 10000^(2i / d_key))
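A small NumPy sketch of the sinusoidal embeddings defined above (assuming an even key dimensionality; the function name is only illustrative):

```python
import numpy as np

def sinusoidal_positional_embeddings(n_positions, d_key):
    """PE(pos, 2i)   = sin(pos / 10000**(2i / d_key))
       PE(pos, 2i+1) = cos(pos / 10000**(2i / d_key))   (Vaswani et al. 2017)"""
    pos = np.arange(n_positions)[:, None]            # (n_positions, 1)
    i = np.arange(d_key // 2)[None, :]               # (1, d_key // 2)
    angles = pos / np.power(10000.0, 2 * i / d_key)  # one frequency per dimension pair
    pe = np.empty((n_positions, d_key))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions
    return pe

# The embeddings are simply added to the word embeddings of the sentence:
# x = word_embeddings + sinusoidal_positional_embeddings(seq_len, d_key)
```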

11. Positional Embeddings Illustrated
    Figure: Illustrated positional embeddings for a 20-token sentence and key dimensionality 512 (taken from Alammar 2018)

12. Modifications
    ▶ Vary the “rainbow stretch” by introducing a stretching factor α:
      PE(pos, 2i) = sin(pos / (α · 10000)^(2i / d_key))
    ▶ Expectations:
      ▶ α too low: no positional information
      ▶ α too high: word embedding information destroyed
      ▶ An α other than 1 is optimal
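A sketch of the stretched variant. Note that the exact place where α enters the formula is reconstructed from the slide, so treat it as an assumption; α = 1.0 recovers the standard embeddings.

```python
import numpy as np

def stretched_positional_embeddings(n_positions, d_key, alpha=1.0):
    """Sinusoidal embeddings with the base scaled by a stretching factor:
    PE(pos, 2i) = sin(pos / (alpha * 10000)**(2i / d_key)).
    alpha = 1.0 gives the original Vaswani et al. (2017) embeddings."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_key // 2)[None, :]
    angles = pos / np.power(alpha * 10000.0, 2 * i / d_key)
    pe = np.empty((n_positions, d_key))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# alpha is the quantity varied in this ablation; alpha = 1.0 is the baseline.
```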

13. Self-Attention Keys and Queries
    ▶ Attention(Q, K, V) = softmax((Q W_Q)(K W_K)^T / √d_k) (V W_V)
    ▶ In the encoder, the source words are used for key and query generation with different matrices
    ▶ Modification: use the same matrix for both (sketched below)
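A minimal sketch of scaled dot-product attention with a hypothetical `share_qk` flag that reuses the query projection matrix for the keys; multi-head splitting is omitted.

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V, W_Q, W_K, W_V, share_qk=False):
    """softmax((Q W_Q)(K W_K)^T / sqrt(d_k)) (V W_V).
    With share_qk=True, W_Q is reused for the keys (the proposed modification);
    the flag is an illustrative assumption, not an existing library option."""
    d_k = W_K.shape[1]
    q = Q @ W_Q
    k = K @ (W_Q if share_qk else W_K)   # same matrix for keys and queries
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ (V @ W_V)

# Encoder self-attention: the source word representations x serve as Q, K and V.
x = torch.randn(5, 64)                   # 5 source tokens, d_model = 64
W = torch.randn(64, 64)
out = attention(x, x, x, W_Q=W, W_K=W, W_V=torch.randn(64, 64), share_qk=True)
```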

14. Experiment Design
    ▶ Run all basic configurations
    ▶ Combine well-performing modifications
    ▶ How to compare?
      ▶ BLEU score on the test set at the best dev set performance
      ▶ Whole learning curves (similar to Popel and Bojar 2018)
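A sketch of the first comparison criterion (test-set BLEU at the checkpoint with the best dev-set performance), assuming a hypothetical `translate(checkpoint, sentences)` decoding function and sacrebleu for scoring:

```python
import sacrebleu

def test_bleu_at_best_dev(checkpoints, translate, dev_src, dev_ref, test_src, test_ref):
    """Report test BLEU of the checkpoint that scores best on the dev set.
    `translate(checkpoint, sentences)` is a hypothetical decoding function
    returning a list of detokenized hypothesis strings."""
    def bleu(hypotheses, references):
        return sacrebleu.corpus_bleu(hypotheses, [references]).score

    best = max(checkpoints,
               key=lambda ckpt: bleu(translate(ckpt, dev_src), dev_ref))
    return bleu(translate(best, test_src), test_ref)
```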

15. Dataset
    ▶ Parallel image captions (Elliott et al. 2016)
    ▶ https://github.com/multi30k/dataset
    ▶ Short sentences
    ▶ Rather small (30k sentences)
      ▶ Good because fitting takes less than a day
      ▶ Bad because dev and test performance is far below train performance

16. Conclusion
    ▶ Experiments still pending
    ▶ Expecting to see mainly negative results
    ▶ Hopefully some positive ones

17. Nice translation by the system
    “eine junge frau hält eine blume , um sich an der blume zu halten .”
    “a young woman is holding a flower in order to hold on to the flower .”

18. References I
    Alammar, Jay (2018). The Illustrated Transformer. URL: http://jalammar.github.io/illustrated-transformer/.
    Dai, Zihang et al. (2019). “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”. In: CoRR abs/1901.02860. arXiv: 1901.02860. URL: http://arxiv.org/abs/1901.02860.
    Elliott, Desmond et al. (2016). “Multi30K: Multilingual English-German Image Descriptions”. In: Proceedings of the 5th Workshop on Vision and Language. Berlin, Germany: Association for Computational Linguistics, pp. 70–74. DOI: 10.18653/v1/W16-3210. URL: http://www.aclweb.org/anthology/W16-3210.
    Gehring, Jonas et al. (2017). “Convolutional Sequence to Sequence Learning”. In: CoRR abs/1705.03122. arXiv: 1705.03122. URL: http://arxiv.org/abs/1705.03122.
    Ott, Myle et al. (2018). “Scaling Neural Machine Translation”. In: CoRR abs/1806.00187. arXiv: 1806.00187. URL: http://arxiv.org/abs/1806.00187.

19. References II
    Popel, Martin and Ondrej Bojar (2018). “Training Tips for the Transformer Model”. In: CoRR abs/1804.00247. arXiv: 1804.00247. URL: http://arxiv.org/abs/1804.00247.
    Vaswani, Ashish et al. (2017). “Attention Is All You Need”. In: CoRR abs/1706.03762. arXiv: 1706.03762. URL: http://arxiv.org/abs/1706.03762.
