SLIDE 1

Universal transformers

Matus Zilinec SZI, November 29, 2018

SLIDE 2

Motivation

What do we want? Given a sequence x of inputs (x1, x2, ..., xn), predict a sequence y of outputs (y1, y2, ..., yn′).
Why would we want that?
◮ machine translation
◮ video captioning
◮ speech recognition
◮ generating music
◮ talking robots
◮ working with symbols (math)

SLIDE 3

Motivation

How do we do it?
◮ Let’s use a neural network!
◮ It kind of works, but is it ideal?
◮ Problem: dependencies in data
Recurrent neural networks!

SLIDE 4

Recurrent neural nets

◮ RNN allows loops in the network
◮ each timestep t, read an item x(t)
◮ update internal state h(t)

h(t) = f(h(t−1), x(t), θ)
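
A minimal sketch of this update rule in Python/NumPy; the tanh cell and the weight names (W_h, W_x, b) are illustrative assumptions, not from the slides:

    import numpy as np

    def rnn_step(h_prev, x_t, W_h, W_x, b):
        # one application of h(t) = f(h(t-1), x(t), theta); tanh cell chosen for illustration
        return np.tanh(W_h @ h_prev + W_x @ x_t + b)

    def rnn(x, h0, W_h, W_x, b):
        # x: sequence of input vectors, h0: initial internal state
        h = h0
        for x_t in x:                       # read one item per timestep
            h = rnn_step(h, x_t, W_h, W_x, b)
        return h                            # final internal state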

SLIDE 5

Encoder-decoder

◮ two RNNs, encoder and decoder
◮ compress meaning of x into a "thought" vector
◮ use the vector to generate y
◮ Problems: information loss, vanishing gradient, parallelization
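
A rough sketch of the idea, assuming generic recurrent updates (encoder_step, decoder_step and emit are illustrative stand-ins for the two RNNs and the output layer):

    def encode(x, h0, encoder_step):
        # run the encoder over x, keep only the final state
        h = h0
        for x_t in x:
            h = encoder_step(h, x_t)
        return h                            # the "thought" vector summarizing x

    def decode(thought, start_symbol, decoder_step, emit, max_len):
        # generate y symbol by symbol, conditioned only on the thought vector
        s, y_t, ys = thought, start_symbol, []
        for _ in range(max_len):
            s = decoder_step(s, y_t)
            y_t = emit(s)                   # pick the next output symbol from the state
            ys.append(y_t)
        return ys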

SLIDE 6

Attention

◮ every item of y depends on different items of x
◮ try to learn dependencies and focus attention

Dot-product attention: query q, keys ki, values vi

wi = ki · q
attention(q) = Σi wi · vi
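
A minimal NumPy sketch of this weighted sum (without the softmax normalization added on the next slide; the shapes are illustrative assumptions):

    import numpy as np

    def dot_product_attention(q, K, V):
        # q: (d,) query, K: (n, d) keys, V: (n, d_v) values
        w = K @ q                # w_i = k_i . q, one weight per input item
        return w @ V             # sum_i w_i * v_i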

SLIDE 7

Multi-head self-attention

◮ multi-head: focus on multiple places at once
◮ self-: update representation of x instead of comparing with y
◮ for each head, use different features from x, thus the weights

Attention(Q, K, V) = softmax(Q K^T / √d_k) V

head_i = Attention(H W_i^Q, H W_i^K, H W_i^V)

MultiHeadSelfAttention(H) = Concat(head_1, ..., head_k) W^O
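
A compact NumPy sketch of these formulas; the per-head weight lists and their shapes are illustrative assumptions:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        return softmax(Q @ K.T / np.sqrt(d_k)) @ V

    def multi_head_self_attention(H, W_Q, W_K, W_V, W_O):
        # H: (n, d_model); W_Q, W_K, W_V: one (d_model, d_k) matrix per head; W_O: (k*d_k, d_model)
        heads = [attention(H @ wq, H @ wk, H @ wv)
                 for wq, wk, wv in zip(W_Q, W_K, W_V)]
        return np.concatenate(heads, axis=-1) @ W_O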

SLIDE 8

Transformer

◮ encoder-decoder without recurrence, just attention
◮ generates N intermediate representations of x and y
◮ all timesteps can be processed at the same time

SLIDE 9

Universal transformer

◮ generalization of the transformer
◮ recurrent in depth, not width
◮ O(#steps) << O(input length)
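
A sketch of the depth recurrence, assuming ut_block is a single shared transformer layer (self-attention plus transition function) applied T times to all positions in parallel; the names and the timestep signal are illustrative assumptions:

    def universal_transformer_encoder(H, ut_block, timestep_signal, T):
        # H: (n, d_model) representations of every input symbol
        for t in range(T):                         # recurrence over depth, not over positions
            H = ut_block(H + timestep_signal(t))   # the same block (same weights) at every step
        return H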

SLIDE 10

Adaptive computation time (ACT)

◮ imagine simplifying equations ◮ different inputs - different difficulty ◮ dynamically adjust number of computational steps for each symbol in the input ◮ When should we stop? ◮ predict ”pondering value” - probability of stopping ◮ stop computation for symbol when (Pshouldstop > threshold) ∨ (step > N) ◮ pondering value trained jointly with the transformer
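
A simplified per-symbol halting loop following the stopping rule above; the ponder network, the block, and the default threshold are illustrative stand-ins, not the paper's exact ACT formulation:

    import numpy as np

    def act_universal_transformer(H, ut_block, ponder, threshold=0.99, N=10):
        # H: (n, d_model); ponder(H) returns one stopping probability per symbol
        halted = np.zeros(H.shape[0], dtype=bool)
        for step in range(N):
            H_new = ut_block(H)
            # halted symbols keep their previous state, the rest are updated
            H = np.where(halted[:, None], H, H_new)
            p_stop = ponder(H)                     # "pondering value" per symbol
            halted |= p_stop > threshold
            if halted.all():
                break
        return H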

SLIDE 11

Results

bAbI Question-Answering
◮ read a story and answer questions about the characters
[Results table omitted: average error and number of failed tasks (> 5% error) out of 20, in parentheses; lower is better in both cases]

SLIDE 12

Ponder time

SLIDE 13

Results

Learning to Execute
◮ just like progtest
◮ algorithmic, memorization and evaluation tasks

SLIDE 14

Results

Machine translation
◮ UT with ACT not mentioned
◮ seems to perform worse so far

SLIDE 15

Another talk about Universal Transformer: Let’s Talk ML, TH:A-1347, 11:00