Adaptive Adversarial Multi-task Representation Learning (Yuren Mao, ICML 2020) – PowerPoint PPT Presentation



  1. ICML 2020 | WHU, China. Adaptive Adversarial Multi-task Representation Learning. Yuren Mao¹, Weiwei Liu², Xuemin Lin¹. 1. University of New South Wales, Australia. 2. Wuhan University, China.

  2. Overview: the Adaptive AMTRL (Adversarial Multi-task Representation Learning) algorithm. [Architecture figure: an input feeds shared layers, which feed task-specific layers for Task 1 … Task T; a discriminator is attached to the shared layers through a gradient reversal layer, so forward propagation computes the task losses and the discriminator score while backward propagation reverses the discriminator gradient (min–max training). Side panels: (a) three 2-d Gaussian distributions, (b) discriminator, (c) relatedness changing curve.] Key ingredients: an augmented Lagrangian formulation of AMTRL, a discriminator-based task relatedness measure, a relatedness-based weighting strategy for better performance, and a PAC bound
$L_D(h) - L_S(h) \le \frac{c_1 \rho\, G_a(G^*(X^1))}{\sqrt{n}} + \frac{c_2 Q \sup_{g \in G^*} \lVert g(X^1) \rVert}{n} + \sqrt{\frac{9 \ln(2/\delta)}{2nT}},$
under which the generalization error is negligible and the number of tasks does not matter.

  3. Content
  • Adversarial Multi-task Representation Learning (AMTRL)
  • Adaptive AMTRL
  • PAC Bound and Analysis
  • Experiments

  4. Adversarial Multi-task Representation Learning. Adversarial Multi-task Representation Learning (AMTRL) has achieved success in various applications, ranging from sentiment analysis to question answering systems. AMTRL solves
$\min_h L(h, \lambda) = L_S(h) + \lambda L_{adv}$
Empirical loss:
$L_S(h) = \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} \ell_t\big(f_t(g(x_i^t)), y_i^t\big)$
Loss of the adversarial module:
$L_{adv} = \max_{\Phi} \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} e_t\, \Phi\big(g(x_i^t)\big)$
[Architecture figure: shared layers g feed task-specific layers f_1 … f_T; a discriminator Φ plays a min–max game against the shared representation through a gradient reversal layer, which negates the discriminator gradient before it reaches the shared layers.]
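Since the deck leans on it, here is a minimal PyTorch sketch of the gradient reversal layer from the diagram. `GradReverse` and `grad_reverse` are my own illustrative names; the authors' implementation may differ.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the
    backward pass, so the shared encoder is trained to fool the task
    discriminator while the discriminator itself is trained normally."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the shared layers.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    # Insert between the shared representation g(x) and the discriminator.
    return GradReverse.apply(x, lambd)
```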

  5. Adaptive AMTRL. AMTRL aims to minimize the task-averaged empirical risk while enforcing the representations of all tasks to share an identical distribution. We formulate it as a constrained optimization problem
$\min_h L_S(h) \quad \text{s.t.} \quad L_{adv} - c = 0,$
and propose to solve the problem with an augmented Lagrangian method:
$\min_h L_S(h) + \lambda (L_{adv} - c) + \frac{r}{2} (L_{adv} - c)^2,$
where the multiplier λ and the penalty coefficient r are updated during the training process.
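A minimal sketch of one training step under this augmented Lagrangian objective, assuming PyTorch and a dual-ascent update for λ (a standard choice; the slide does not specify the exact update rules for λ and r, and all names here are illustrative).

```python
import torch

def adaptive_amtrl_step(task_loss, adv_loss, lam, r, c, optimizer, lr_lam=0.01):
    """One step on L_S + lam*(L_adv - c) + (r/2)*(L_adv - c)^2.
    `task_loss` and `adv_loss` are scalar tensors from the current batch;
    `lam` and `r` are plain floats maintained outside the graph."""
    violation = adv_loss - c
    objective = task_loss + lam * violation + 0.5 * r * violation ** 2

    optimizer.zero_grad()
    objective.backward()
    optimizer.step()

    # Dual ascent: push the multiplier toward enforcing L_adv = c.
    lam = lam + lr_lam * violation.item()
    return lam
```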

  6. Relatedness for AMTRL. Relatedness between task i and task j:
$R_{ij} = \min\left\{ \frac{\sum_{n=1}^{N} e_j\, \Phi(g(x_n^i)) + e_i\, \Phi(g(x_n^j))}{\sum_{n=1}^{N} e_i\, \Phi(g(x_n^i)) + e_j\, \Phi(g(x_n^j))},\; 1 \right\}$
Relatedness matrix:
$R = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1T} \\ R_{21} & R_{22} & \cdots & R_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ R_{T1} & R_{T2} & \cdots & R_{TT} \end{bmatrix}$
[Illustration: (a) three 2-d Gaussian distributions, (b) discriminator, (c) relatedness changing curve.]
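A sketch of how this relatedness matrix could be computed from discriminator outputs, assuming `phi(x)` returns the discriminator's per-task probabilities for one representation x; names and data layout are my own, not the authors' code.

```python
import numpy as np

def relatedness_matrix(phi, reps):
    """R[i, j]: ratio of the discriminator's cross-task scores to its
    own-task scores, clipped at 1 (the slide's definition).
    `reps[t]` is a list of task t's shared representations."""
    T = len(reps)
    R = np.zeros((T, T))
    for i in range(T):
        for j in range(T):
            num = sum(phi(x)[j] for x in reps[i]) + sum(phi(x)[i] for x in reps[j])
            den = sum(phi(x)[i] for x in reps[i]) + sum(phi(x)[j] for x in reps[j])
            R[i, j] = min(num / den, 1.0)
    return R
```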

  7. Adaptive AMTRL. In multi-task learning, tasks regularize each other, which can improve the generalization of some tasks; the weight of each task controls the strength of this regularization. This paper proposes a weighting strategy for AMTRL based on the proposed task relatedness:
$w = \frac{1}{\mathbf{1} R \mathbf{1}'} \mathbf{1} R,$
where $\mathbf{1}$ is a 1×T vector of all ones and R is the relatedness matrix. Combining the augmented Lagrangian method with the weighting strategy, the optimization objective of our adaptive AMTRL method is (a weight-computation sketch follows below)
$\min_h \frac{1}{T} \sum_{t=1}^{T} w_t L_{S_t}(f_t \circ g) + \lambda (L_{adv} - c) + \frac{r}{2} (L_{adv} - c)^2.$
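As referenced above, a small sketch of the weight computation w = (1R)/(1R1'), assuming R is the T×T relatedness matrix from the previous slide; this normalization is my reading of the formula.

```python
import numpy as np

def task_weights(R):
    """Each task's weight is its summed relatedness to all tasks,
    normalized so that the weights sum to 1."""
    scores = R.sum(axis=0)        # the vector 1 R
    return scores / scores.sum()  # divide by the scalar 1 R 1'
```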

  8. PAC Bound and Analysis. Assuming the representations of all tasks share an identical distribution, we have the following generalization error bound:
$L_D(h) - L_S(h) \le \frac{c_1 \rho\, G_a(G^*(X^1))}{\sqrt{n}} + \frac{c_2 Q \sup_{g \in G^*} \lVert g(X^1) \rVert}{n} + \sqrt{\frac{9 \ln(2/\delta)}{2nT}}$
The first two terms depend only on the per-task sample size n; the last term is negligible, so the number of tasks does not matter.
  • The generalization error bound for AMTRL is tighter than that for MTRL.
  • The number of tasks only slightly influences the generalization bound of AMTRL.
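To make the T-dependence explicit, here is the scaling of the bound term by term (a sketch: the grouping follows my reconstruction of the garbled slide formula, not a verified transcription of the paper).

```latex
\[
  L_D(h) - L_S(h) \;\le\;
  \underbrace{\frac{c_1 \rho\, G_a(G^*(X^1))}{\sqrt{n}}}_{O(1)\text{ in } T}
  \;+\;
  \underbrace{\frac{c_2\, Q \sup_{g \in G^*} \lVert g(X^1) \rVert}{n}}_{O(1)\text{ in } T}
  \;+\;
  \underbrace{\sqrt{\frac{9 \ln(2/\delta)}{2nT}}}_{O(1/\sqrt{T}) \;\to\; 0}
\]
% Only the confidence term shrinks as the number of tasks T grows; the
% capacity terms depend on the per-task sample size n alone, matching the
% slide's claim that "the number of tasks does not matter".
```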

  9. Experiments – Relatedness Evolution. Datasets: sentiment analysis and topic classification. For each task t we track the mean relatedness to all tasks,
$\bar{R}_t = \frac{1}{T} \sum_{k=1}^{T} R_{tk}.$
[Figures: relatedness changing curves for sentiment analysis and for topic classification.]

  10. Experiments – Classification Accuracy. Datasets: sentiment analysis and topic classification. [Tables/figures: classification accuracy on sentiment analysis and on topic classification.]

  11. Experiments – Influence of the Number of Tasks. Dataset: sentiment analysis. Relative error:
$er_{rel} = \frac{er_{MTL}}{\frac{1}{T} \sum_{t=1}^{T} er_t^{STL}}$
i.e., the multi-task error rate divided by the average single-task error rate. [Figure: error rate for the task 'appeal' as the number of tasks grows.]
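A tiny sketch of this relative-error metric; the numbers in the usage comment are hypothetical, not results from the paper.

```python
import numpy as np

def relative_error(er_mtl, er_stl):
    """er_rel = er_MTL / mean(er_STL): multi-task error rate relative to
    the average single-task error rate over the T tasks."""
    return er_mtl / np.mean(er_stl)

# e.g. relative_error(0.12, [0.15, 0.18, 0.14]) -> ~0.77 (values < 1 mean
# multi-task learning beats the single-task average)
```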

  12. THANK YOU
