Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation



  1. Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation
     Xiang Jiang 1,2, Qicheng Lao 1,4, Stan Matwin 1,3, Mohammad Havaei 1
     1 Imagia; 2 Dalhousie University; 3 Polish Academy of Sciences; 4 Mila, Université de Montréal
     June 13, 2020

  2. Introduction: Unsupervised Domain Adaptation (UDA)
     The setup of UDA:
     - an observed variable $X$;
     - a labeling function $f$, with labels $Y = f(X)$;
     - a domain variable $D$ (e.g., which scanner produced a medical image).
     Given a labeled source domain $\mathcal{D}_S = \{(x_i, f_S(x_i))\}_{i=1}^{n}$ and an unlabeled target domain $\mathcal{D}_T = \{x_j\}_{j=1}^{m}$, the goal is to learn $p(y \mid x)$ on the target, under the assumption that the labeling functions agree: $f_S = f_T$.
     [Figure: medical-imaging example — images $X$ acquired by different scanners (domain $D$), disease labels $Y$ to predict.]

  3-6. Related Work
     Adversarial domain-discriminator based approaches [Ganin et al., 2016]:
     $$\min_{\theta} \; \mathcal{L}(\mathcal{D}_S) + \lambda \,\mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) \quad (1)$$
     $$\max_{f} \; \mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) \quad (2)$$
     Limitation: aligning the marginals does not align the class-conditionals,
     $$p_S(x) = p_T(x) \;\not\Rightarrow\; p_S(x \mid y) = p_T(x \mid y).$$

     Prototype-based class-conditioned explicit alignment [Luo et al., 2017, Xie et al., 2018]:
     $$\min_{\theta} \; \mathcal{L}(\mathcal{D}_S) + \lambda_1 \,\mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) + \lambda_2 \,\mathcal{L}_{\text{explicit}} \quad (3)$$
     $$\max_{f} \; \mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) \quad (4)$$
     where
     $$\mathcal{L}_{\text{explicit}} = \mathbb{E}\big[\, \| c_S^{\,j} - c_T^{\,j} \| \,\big] \quad (5)$$
     $$c_S^{\,j} = \frac{1}{N_j} \sum_{(x_i, y_i) \in \mathcal{D}_S} \mathbb{1}\{y_i = j\}\, f_{\phi}(x_i) \quad (6)$$
     Limitation: explicitly optimizing model parameters on pseudo-labels accumulates pseudo-label errors.
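To make Eqs. (5)-(6) concrete, here is a minimal PyTorch-style sketch of the prototype loss. The names (`feats_s`, `pseudo_t`, etc.) and the Euclidean prototype distance are illustrative assumptions, not the exact formulation of [Luo et al., 2017, Xie et al., 2018]:

```python
import torch

def class_prototypes(feats, labels, num_classes):
    """Eq. (6): c^j = (1/N_j) * sum_i 1{y_i = j} f_phi(x_i)."""
    protos = feats.new_zeros(num_classes, feats.size(1))
    for j in range(num_classes):
        mask = labels == j
        if mask.any():  # classes absent from the batch keep a zero prototype
            protos[j] = feats[mask].mean(dim=0)
    return protos

def explicit_alignment_loss(feats_s, labels_s, feats_t, pseudo_t, num_classes):
    """Eq. (5): expected distance between source and target class prototypes.
    Target prototypes rest on pseudo-labels -- the point where pseudo-label
    errors feed directly into the gradient and can accumulate."""
    c_s = class_prototypes(feats_s, labels_s, num_classes)
    c_t = class_prototypes(feats_t, pseudo_t, num_classes)
    return (c_s - c_t).norm(dim=1).mean()
```

The sketch makes the stated limitation visible: wrong pseudo-labels shift the target prototypes, and the loss then pulls features toward the wrong class centers.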

  7. Motivations
     - Applied motivation
     - Theoretical motivation

  8. Applied Motivation
     Challenges for applying UDA in real-world applications [Tan et al., 2019]:
     - within-domain class imbalance;
     - between-domain class-distribution shift, a.k.a. prior probability shift.
     [Figure: illustration of class imbalance within each domain and label-distribution shift between domains.]

  9. Theoretical Motivation: Empirical Domain Divergence
     Definition ([Ben-David et al., 2010]). The $\mathcal{H}\Delta\mathcal{H}$-divergence between two domains is
     $$d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) = 2 \sup_{h, h' \in \mathcal{H}} \big|\, \mathbb{E}_{\mathcal{D}_T}[h \neq h'] - \mathbb{E}_{\mathcal{D}_S}[h \neq h'] \,\big|. \quad (7)$$
     Definition (minibatch-based empirical domain discrepancy). Let $B_S \subseteq U_S$ and $B_T \subseteq U_T$ be minibatches with $|B_S| = |B_T|$. The empirical estimate of $d_{\mathcal{H}\Delta\mathcal{H}}$ over the minibatches is
     $$\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(B_S, B_T) = \sup_{h, h' \in \mathcal{H}} \left| \frac{1}{|B_T|} \sum_{x \in B_T} \mathbb{1}[h(x) \neq h'(x)] - \frac{1}{|B_S|} \sum_{x \in B_S} \mathbb{1}[h(x) \neq h'(x)] \right|. \quad (8)$$
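A small numerical sketch of Eq. (8), assuming a toy finite hypothesis class of random linear classifiers; the construction is hypothetical, chosen only so the supremum is computable by enumeration:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_hdh(batch_s, batch_t, hypotheses):
    """Eq. (8): sup over pairs (h, h') of the absolute gap in
    disagreement rates between the target and source minibatches."""
    best = 0.0
    for i, h in enumerate(hypotheses):
        for h2 in hypotheses[i + 1:]:
            gap = abs(np.mean(h(batch_t) != h2(batch_t))
                      - np.mean(h(batch_s) != h2(batch_s)))
            best = max(best, gap)
    return best

# toy setup: 2-D inputs, hypotheses are random linear classifiers
def make_h(w):
    return lambda x: (x @ w > 0).astype(int)

hypotheses = [make_h(rng.normal(size=2)) for _ in range(20)]
batch_s = rng.normal(loc=0.0, size=(64, 2))
batch_t = rng.normal(loc=1.5, size=(64, 2))
print(empirical_hdh(batch_s, batch_t, hypotheses))  # larger gap => less aligned
```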

  10. Theoretical Motivation: The Decomposition
      Theorem (decomposition of $\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(B_S, B_T)$). Define three disjoint sets on the label space: the shared classes $\mathcal{Y}^{C} := \mathcal{Y}_S \cap \mathcal{Y}_T$, and the domain-specific classes $\mathcal{Y}^{\bar{C}}_S := \mathcal{Y}_S - \mathcal{Y}^{C}$ and $\mathcal{Y}^{\bar{C}}_T := \mathcal{Y}_T - \mathcal{Y}^{C}$. Correspondingly define disjoint sets on the input space: $B_S^{C} := \{x \in B_S \mid y \in \mathcal{Y}^{C}\}$, $B_S^{\bar{C}} := \{x \in B_S \mid y \notin \mathcal{Y}^{C}\}$, $B_T^{C} := \{x \in B_T \mid y \in \mathcal{Y}^{C}\}$, and $B_T^{\bar{C}} := \{x \in B_T \mid y \notin \mathcal{Y}^{C}\}$. The empirical divergence $\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(B_S, B_T)$ then decomposes as
      $$\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(B_S, B_T) = \sup_{h, h' \in \mathcal{H}} \big|\, \xi^{C}(h, h') + \xi^{\bar{C}}(h, h') \,\big|, \quad (9)$$
      where
      $$\xi^{C}(h, h') = \frac{1}{|B_T|} \sum_{x \in B_T^{C}} \mathbb{1}[h(x) \neq h'(x)] - \frac{1}{|B_S|} \sum_{x \in B_S^{C}} \mathbb{1}[h(x) \neq h'(x)], \quad (10)$$
      $$\xi^{\bar{C}}(h, h') = \frac{1}{|B_T|} \sum_{x \in B_T^{\bar{C}}} \mathbb{1}[h(x) \neq h'(x)] - \frac{1}{|B_S|} \sum_{x \in B_S^{\bar{C}}} \mathbb{1}[h(x) \neq h'(x)]. \quad (11)$$
      $\xi^{C}$ is the class-aligned term and $\xi^{\bar{C}}$ the class-misaligned term.
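Continuing the toy setup above, a sketch of the decomposition in Eqs. (9)-(11): each minibatch is split by whether a sample's (pseudo-)label lies in the shared label set. The label arrays `ys`, `yt` and the `shared` set are assumed inputs, and the normalization by the full batch sizes follows the reconstruction of Eqs. (10)-(11) above:

```python
import numpy as np

def xi_term(sub_t, sub_s, n_t, n_s, h, h2):
    """One term of Eqs. (10)-(11): disagreement mass of (h, h') on a subset
    of each minibatch, normalized by the *full* batch sizes n_t and n_s."""
    dis_t = np.sum(h(sub_t) != h2(sub_t)) / n_t if len(sub_t) else 0.0
    dis_s = np.sum(h(sub_s) != h2(sub_s)) / n_s if len(sub_s) else 0.0
    return dis_t - dis_s

def decomposed_hdh(xs, ys, xt, yt, shared, hypotheses):
    """Eq. (9): sup over (h, h') of |xi_C + xi_Cbar|, where each batch is
    split into shared-class (C) and domain-specific (Cbar) subsets."""
    c_s, c_t = np.isin(ys, shared), np.isin(yt, shared)
    best = 0.0
    for i, h in enumerate(hypotheses):
        for h2 in hypotheses[i + 1:]:
            xi_c = xi_term(xt[c_t], xs[c_s], len(xt), len(xs), h, h2)
            xi_cb = xi_term(xt[~c_t], xs[~c_s], len(xt), len(xs), h, h2)
            best = max(best, abs(xi_c + xi_cb))
    return best  # equals empirical_hdh(xs, xt, hypotheses), per Eq. (9)
```

Because $B^{C}$ and $B^{\bar{C}}$ partition each minibatch, the two terms sum back to the quantity inside Eq. (8), which is exactly what the theorem states.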

  11. Theoretical Motivation: The Domain-Discriminator Shortcut
      [Figure: with class-misaligned minibatch pairs (e.g., source class 3 vs. target class 6), the domain discriminator can separate domains from class identity alone; with class-aligned pairs (e.g., class 4 vs. class 4), this shortcut disappears.]
      Remark (the domain-discriminator shortcut). Let $f_c$ be a classifier that maps $x$ to a class label $y_c$, and let $f_d$ be a domain discriminator that maps $x$ to a binary domain label $y_d$. For the empirical class-misaligned divergence $\xi^{\bar{C}}(h, h')$ with samples $x \in B_S^{\bar{C}} \cup B_T^{\bar{C}}$, there exists a domain-discriminator shortcut function
      $$f_d(x) = \begin{cases} 1 & f_c(x) \in \mathcal{Y}^{\bar{C}}_S \\ 0 & f_c(x) \in \mathcal{Y}^{\bar{C}}_T \end{cases} \quad (12)$$
      such that the domain label is determined solely by the domain-specific class labels. The shortcut is more pronounced under class imbalance and class-distribution shift.
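A minimal sketch of the shortcut in Eq. (12); `make_shortcut_discriminator`, `classes_source_only`, and `classes_target_only` are hypothetical names standing in for the domain-specific label sets $\mathcal{Y}^{\bar{C}}_S$ and $\mathcal{Y}^{\bar{C}}_T$:

```python
def make_shortcut_discriminator(f_c, classes_source_only, classes_target_only):
    """Eq. (12): a 'domain discriminator' that never uses domain information.
    On class-misaligned samples it reads the domain straight off the predicted
    class, so driving down domain confusion on such minibatches need not align
    the class-conditional feature distributions at all."""
    def f_d(x):
        y_c = f_c(x)
        if y_c in classes_source_only:
            return 1   # looks like source
        if y_c in classes_target_only:
            return 0   # looks like target
        return None    # shared class: the shortcut offers no signal
    return f_d
```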

  12. Proposed Approach
      [Figure: four-panel overview (a)-(d) — data and pseudo-labels, class-conditioned sampling, implicit alignment through the classifier, domain-invariant representations.]
      - For $p_S(x)$: sample $x \sim p_S(x \mid y)\, p(y)$ according to the alignment distribution $p(y)$.
      - For $p_T(x)$: sample a class-aligned minibatch $x \sim p_T(x \mid \hat{y})\, p(y)$ using the identical $p(y)$, with the help of pseudo-labels $\hat{y}_T$ (see the sketch after the algorithm on the next slide).

  13. Proposed Approach: Sampling Algorithm
      Input: source dataset $S = \{(x_i, y_i)\}_{i=1}^{N}$, target dataset $T = \{x_i\}_{i=1}^{M}$, label space $\mathcal{Y}$, label alignment distribution $p(y)$, classifier $f_c(\cdot\,; \theta)$
      while not converged do
          # predict pseudo-labels for $T$
          $\hat{T} \leftarrow \{(x_i, \hat{y}_i)\}_{i=1}^{M}$, where $x_i \in T$ and $\hat{y}_i = f_c(x_i; \theta)$
          # sample $N$ unique classes from the label space
          # (the slide reuses $N$ here for the number of classes per minibatch)
          $\bar{\mathcal{Y}} \leftarrow$ draw $N$ classes from $\mathcal{Y}$ according to $p(y)$
          # sample $K$ examples conditioned on each $y_j \in \bar{\mathcal{Y}}$
          for $y_j$ in $\bar{\mathcal{Y}}$ do
              $(X'_S, Y'_S) \leftarrow$ draw $K$ samples from $S$ via $p_S(x \mid y = y_j)$
              $X'_T \leftarrow$ draw $K$ samples from $\hat{T}$ via $p_T(x \mid \hat{y} = y_j)$
          end for
          # domain-adaptation training on this minibatch
          train on minibatch $(X'_S, Y'_S, X'_T)$
      end while
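A minimal NumPy sketch of one pass of this sampling loop, assuming examples are pre-grouped by (ground-truth or pseudo-) label. `sample_aligned_minibatch` and its parameter names are illustrative, and skipping classes that currently have no pseudo-labeled target examples is an assumption rather than the paper's exact rule:

```python
import numpy as np

def sample_aligned_minibatch(src_by_class, tgt_by_class, p_y, n_classes, k, rng):
    """Draw n_classes classes from the alignment distribution p(y), then
    k source examples and k pseudo-labeled target examples per class, so
    both halves of the minibatch share the same class distribution.

    src_by_class: dict class -> list of source examples (ground-truth labels)
    tgt_by_class: dict class -> list of target examples, grouped by the
                  current pseudo-labels (refreshed as f_c improves)
    p_y:          alignment distribution over classes 0..C-1 (e.g. uniform)
    """
    classes = rng.choice(len(p_y), size=n_classes, replace=False, p=p_y)
    xs, ys, xt = [], [], []
    for j in classes:
        # p_S(x | y = j): uniform over source examples of class j
        pool_s = src_by_class[j]
        for i in rng.integers(0, len(pool_s), size=k):
            xs.append(pool_s[i])
            ys.append(j)
        # p_T(x | y_hat = j): uniform over target examples pseudo-labeled j
        pool_t = tgt_by_class.get(j, [])
        if pool_t:
            xt.extend(pool_t[i] for i in rng.integers(0, len(pool_t), size=k))
    return xs, ys, xt

# usage sketch: refresh pseudo-labels, then draw a class-aligned minibatch
# xs, ys, xt = sample_aligned_minibatch(src_by_class, tgt_by_class,
#                                       p_y=np.full(C, 1 / C),
#                                       n_classes=8, k=4,
#                                       rng=np.random.default_rng(0))
```

Note that the alignment is implicit: pseudo-labels only steer which examples enter the minibatch; no loss term is optimized against them directly.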

  14-17. Advantages of the Proposed Approach
      1. Minimizes the class-misaligned divergence $\xi^{\bar{C}}(h, h')$, providing a more reliable empirical estimate of the domain divergence;
      2. Provides balanced training across all classes;
      3. Removes the need to explicitly optimize model parameters on pseudo-labels;
      4. Simple to implement and orthogonal to different domain-discrepancy measures (e.g., DANN and MDD).
