Adaptive Adversarial Multi-task Representation Learning
Yuren Mao¹  Weiwei Liu²  Xuemin Lin¹
1. University of New South Wales, Australia.  2. Wuhan University, China.
ICML 2020
[Figure: AMTRL architecture. Shared layers feed task-specific layers (Task 1 … Task T) and, via a gradient reversal layer, a task discriminator. Forward propagation computes the task losses and the min-max adversarial loss; in backward propagation the task-specific layers receive ∂L(θ_sh, θ_t)/∂θ_t, while the shared layers receive ∂L(θ_sh, θ_t)/∂θ_sh and the reversed discriminator gradient −∂L_adv(θ_sh)/∂θ_sh.]
Overview:
- AMTRL algorithm and a PAC bound on L_D(h) − L_S(h): the number of tasks does not matter for the dominant terms, and the T-dependent term is negligible.
- Task relatedness for AMTRL, measured through the discriminator.
- Adaptive AMTRL: an augmented Lagrangian method combined with a task weighting strategy, giving better performance.
Adversarial Multi-task Representation Learning (AMTRL) has achieved success in various applications, ranging from sentiment analysis to question answering systems.
AMTRL solves the min-max problem

min_h L(h, λ) = L_S(h) + λ L_adv,

where the empirical loss and the loss of the adversarial module are

L_S(h) = (1/(nT)) Σ_{t=1}^{T} Σ_{i=1}^{n} ℓ^t( f^t(g(x_i^t)), y_i^t ),

L_adv = max_Φ (1/(nT)) Σ_{t=1}^{T} Σ_{i=1}^{n} e_t Φ(g(x_i^t)).
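The two losses can be made concrete with a small numerical sketch. The squared-error task loss and the linear-softmax discriminator used here are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def amtrl_losses(reps, preds, targets, W, b):
    """Compute L_S and L_adv for T tasks with n samples each.
    reps[t]: (n, d) shared representations g(x_i^t) for task t.
    preds[t], targets[t]: (n,) task outputs f^t(g(x_i^t)) and labels y_i^t.
    (W, b): parameters of an assumed linear-softmax discriminator Phi."""
    T, n = len(reps), reps[0].shape[0]
    # Empirical loss L_S: squared error stands in for the task losses l^t.
    L_S = sum(((p - y) ** 2).sum() for p, y in zip(preds, targets)) / (n * T)
    # Adversarial loss L_adv: average probability the discriminator assigns
    # to the true task identity e_t (the inner max is over Phi's parameters).
    L_adv = sum(softmax(r @ W + b)[:, t].sum() for t, r in enumerate(reps)) / (n * T)
    return L_S, L_adv
```

In training, the shared representation g is updated to *decrease* L_adv (via the gradient reversal layer) while Φ is updated to increase it.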
Adaptive AMTRL aims to minimize the task-averaged empirical risk while enforcing the representations of all tasks to share an identical distribution. We formulate this as a constrained optimization problem
min_h L_S(h)   s.t.   L_adv − c = 0,

and propose to solve it with an augmented Lagrangian method:

min_h (1/T) L_S(h) + λ(L_adv − c) + (r/2)(L_adv − c)².
λ and r are updated during the training process.
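One plausible schedule for updating the multiplier λ and the penalty r after each primal minimization: these are the textbook augmented-Lagrangian updates, assumed here rather than quoted from the paper.

```python
def augmented_lagrangian_step(lam, r, L_adv, c, r_growth=2.0, r_max=1e4):
    """One dual update after minimizing
    L_S/T + lam*(L_adv - c) + (r/2)*(L_adv - c)**2 over h.
    The geometric growth schedule for r is a common heuristic."""
    violation = L_adv - c
    lam = lam + r * violation      # gradient-ascent step on the multiplier
    r = min(r * r_growth, r_max)   # tighten the penalty, capped at r_max
    return lam, r
```

λ moves in the direction of the constraint violation and r grows until the constraint L_adv = c is satisfied.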
[Figure: (a) three 2-d Gaussian distributions; (b) the discriminator softmax(WX + b); (c) relatedness changing curve.]
Relatedness between task i and task j:

R_ij = min{ [ Σ_{n=1}^{N} e_j Φ(g(x_n^i)) + e_i Φ(g(x_n^j)) ] / [ Σ_{n=1}^{N} e_i Φ(g(x_n^i)) + e_j Φ(g(x_n^j)) ], 1 }.

Relatedness matrix:

R = [ R_11 R_12 ⋯ R_1T ; R_21 R_22 ⋯ R_2T ; ⋮ ; R_T1 R_T2 ⋯ R_TT ].
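Reading e_t Φ(·) as the probability the discriminator assigns to task t, R_ij can be computed directly from discriminator outputs; a sketch under that assumption:

```python
import numpy as np

def relatedness(probs_i, probs_j, i, j):
    """R_ij from discriminator outputs; probs_i, probs_j are (N, T) arrays
    whose rows are Phi(g(x_n)) for samples of task i and task j."""
    # Numerator: e_j Phi on task-i samples plus e_i Phi on task-j samples.
    cross = probs_i[:, j].sum() + probs_j[:, i].sum()
    # Denominator: probability mass on the correct task identities.
    own = probs_i[:, i].sum() + probs_j[:, j].sum()
    return min(cross / own, 1.0)
```

When the two tasks' representations are indistinguishable the discriminator outputs are symmetric and R_ij = 1; well-separated representations drive R_ij toward 0.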
In multi-task learning, tasks regularize each other, which can improve generalization. The weight assigned to each task controls the strength of this regularization. This paper proposes a weighting strategy for AMTRL based on the proposed task relatedness:

w = (1 / (1R1ᵀ)) · 1R,

where 1 is a 1×T vector of all ones and R is the relatedness matrix. Combining the augmented Lagrangian method with the weighting strategy, the optimization problem becomes

min_h (1/T) Σ_{t=1}^{T} w_t L_{S_t}(f^t ∘ g) + λ(L_adv − c) + (r/2)(L_adv − c)².
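The weight vector is just the column sums of R normalized by their total; a minimal numpy sketch:

```python
import numpy as np

def task_weights(R):
    """w = (1R) / (1R1'): normalize the column sums of the relatedness
    matrix so the task weights sum to one."""
    ones = np.ones(R.shape[0])
    col_sums = ones @ R                  # the 1xT row vector 1R
    return col_sums / (ones @ R @ ones)  # scalar normalizer 1R1'
```

Tasks with higher total relatedness to the others receive larger weights.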
Assuming the representations of all tasks share an identical distribution, we have the following generalization error bound:

L_D(h) − L_S(h) ≤ c₁ρ G_a(G*(X₁))/n + c₂ Q sup_{g∈G*} ‖g(X₁)‖/√n + √( ln(1/δ) / (2nT) ).

The first two terms bound the generalization error and do not depend on the number of tasks; the last term is negligible.
Experiments: Sentiment Analysis and Topic Classification.
Mean relatedness of task t:

R̄_t = (1/T) Σ_{k=1}^{T} R_{tk}.
Sentiment Analysis.
Relative error:

er_rel = (1/T) Σ_{t=1}^{T} er_t^MTL / er_t^STL.

[Figure: error rate for the task 'appeal'.]
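The relative-error metric is a one-liner; note that the per-task subscript on er^MTL is an assumption, since the extraction dropped it:

```python
def relative_error(er_mtl, er_stl):
    """er_rel = (1/T) * sum_t er_t^MTL / er_t^STL.
    Values below 1 mean the multi-task model beats
    single-task learning on average."""
    return sum(m / s for m, s in zip(er_mtl, er_stl)) / len(er_mtl)
```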