 
Introduction Related works Explanations of VAT objective function New methodologies improving VAT Experiments
Understanding deeply and improving VAT
Dongha Kim and Yongchan Choi
Speaker: Dongha Kim
Department of Statistics, Seoul National University, South Korea
July 5, 2018
1 Introduction
2 Related works
3 Explanations of VAT objective function
4 New methodologies improving VAT
5 Experiments
Introduction
• Deep learning suffers from a lack of labels, since labeling is done manually and is costly in both money and time.
• Many methods have been proposed to deal with the lack of labels by exploiting unlabeled data as well as labeled data to learn an optimal classifier (Weston et al., 2012; Rasmus et al., 2015; Kingma et al., 2014).
• Recently, two powerful methods have been proposed: the VAT method (Miyato et al., 2015, 2017) and the bad GAN method (Dai et al., 2017).
Introduction
• VAT is an efficient and powerful method, but its learning procedure is rather unstable, and it is still not clear why VAT also works well in the semi-supervised case.
• The bad GAN method has a clear principle and state-of-the-art prediction power, but it needs additional architectures, which leads to heavy computational costs. It is therefore infeasible to apply it to very large datasets.
Our contributions
• We give a clear explanation of why VAT works well in semi-supervised learning.
• Based on our findings, we propose some simple and powerful techniques to improve VAT.
• In particular, we adopt the main idea of bad GAN, which generates bad samples using a bad generator, and apply it to VAT without any additional architectures.
• With these methods, we achieve results superior to other approaches, especially VAT, in both prediction power and efficiency.
Adversarial training (AT, Goodfellow et al. (2014))
Smooth the model by using adversarial perturbations.
• $p(\cdot \mid x; \theta)$: a conditional distribution given by a deep architecture parametrized by $\theta$.
• The regularization term is the following:
\[ L^{AT}(\theta; x, y, \epsilon) = KL\left[ h(y),\, p(\cdot \mid x + r_{advr}; \theta) \right], \qquad r_{advr} = \operatorname*{argmax}_{r;\, \|r\|_2 \le \epsilon} KL\left[ h(y),\, p(\cdot \mid x + r; \theta) \right], \]
where $h(y)$ is the one-hot vector of $y$, whose entries are all 0 except for the index corresponding to label $y$.
• The final objective function of AT is as follows:
\[ E_{(x,y) \sim \mathcal{L}^{tr}}\left[ -\log p(y \mid x; \theta) \right] + E_{(x,y) \sim \mathcal{L}^{tr}}\left[ L^{AT}(\theta; x, y, \epsilon) \right], \]
where $\mathcal{L}^{tr}$ is the labeled data and $\epsilon > 0$ is a hyperparameter.
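The AT regularizer above can be illustrated on a toy model. The sketch below is not from the slides: it assumes a linear-softmax classifier (parameters `W`, `b` are hypothetical) and replaces the exact argmax over the ball by the usual linearized approximation $r \approx \epsilon\, g / \|g\|_2$, where $g$ is the input gradient of the loss.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    """p(.|x; theta) for a toy linear-softmax model (an assumption), theta = (W, b)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def at_loss(W, b, x, y):
    """KL[h(y), p(.|x; theta)], which equals -log p(y|x; theta) for one-hot h(y)."""
    return -math.log(predict(W, b, x)[y])

def approx_r_advr(W, b, x, y, eps):
    """Linearized approximation of r_advr: eps * g / ||g||_2 with g = grad_x of the loss."""
    p = predict(W, b, x)
    diff = [p[k] - (1.0 if k == y else 0.0) for k in range(len(p))]
    # For logits W x + b, grad_x[-log p(y|x)] = W^T (p - h(y)).
    g = [sum(W[k][j] * diff[k] for k in range(len(W))) for j in range(len(x))]
    norm = math.sqrt(sum(v * v for v in g))
    return [eps * v / norm for v in g] if norm > 0 else [0.0] * len(x)

# Hypothetical two-class example.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x, y, eps = [0.5, 0.0], 0, 0.3

r = approx_r_advr(W, b, x, y, eps)
clean_loss = at_loss(W, b, x, y)
adv_loss = at_loss(W, b, [xi + ri for xi, ri in zip(x, r)], y)
r_norm = math.sqrt(sum(v * v for v in r))
```

Here `adv_loss` plays the role of $L^{AT}$ for one labeled point; for a deep network the gradient would come from backpropagation rather than the closed form used in this toy case.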
Virtual adversarial training (VAT, Miyato et al. (2017))
VAT inherits the key idea of AT.
• VAT simply substitutes $p(\cdot \mid x; \hat{\theta}^{cur})$ for $h(y)$, and this substitution makes VAT applicable to the semi-supervised case.
• The regularization term of VAT is the following:
\[ L^{VAT}(\theta; \hat{\theta}^{cur}, x, \epsilon) = KL\left[ p(\cdot \mid x; \hat{\theta}^{cur}),\, p(\cdot \mid x + r_{advr}; \theta) \right], \qquad r_{advr} = \operatorname*{argmax}_{r;\, \|r\|_2 \le \epsilon} KL\left[ p(\cdot \mid x; \hat{\theta}^{cur}),\, p(\cdot \mid x + r; \theta) \right], \]
where $\hat{\theta}^{cur}$ is the current parameter estimate, which is treated as a constant, and $p(\cdot \mid x; \hat{\theta}^{cur})$ is the current conditional distribution.
• The final objective function of VAT is as follows:
\[ E_{(x,y) \sim \mathcal{L}^{tr}}\left[ -\log p(y \mid x; \theta) \right] + E_{x \sim \mathcal{U}^{tr}}\left[ L^{VAT}(\theta; \hat{\theta}^{cur}, x, \epsilon) \right], \]
where $\mathcal{U}^{tr}$ is the unlabeled data.
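The slides do not say how $r_{advr}$ is computed. A minimal sketch, assuming the one-step power-iteration approximation commonly used with VAT and a toy linear-softmax model evaluated at $\theta = \hat{\theta}^{cur}$ (the model, `W`, `b`, `xi`, and `seed` are all illustrative assumptions), could look like this:

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    """p(.|x; theta) for a toy linear-softmax model (an assumption)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)])

def kl(q, p):
    """KL[q, p] for discrete distributions."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v] if n > 0 else list(v)

def vat_perturbation(W, b, x, eps, xi=1e-3, seed=0):
    """One power-iteration step approximating r_advr (no label needed)."""
    rng = random.Random(seed)
    q = predict(W, b, x)  # p(.|x; theta_cur), treated as a constant
    d = unit([rng.gauss(0.0, 1.0) for _ in x])  # random unit direction
    p = predict(W, b, [xj + xi * dj for xj, dj in zip(x, d)])
    # The gradient of KL[q, p(.|x + r)] vanishes at r = 0, so it is evaluated
    # at r = xi * d. For logits W x + b it equals W^T (p(.|x + r) - q).
    g = [sum(W[k][j] * (p[k] - q[k]) for k in range(len(W))) for j in range(len(x))]
    return [eps * c for c in unit(g)]

# Hypothetical three-class example on one unlabeled point.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x, eps = [0.2, -0.1], 0.5

r = vat_perturbation(W, b, x, eps)
q = predict(W, b, x)
vat_loss = kl(q, predict(W, b, [xj + rj for xj, rj in zip(x, r)]))
r_norm = math.sqrt(sum(c * c for c in r))
```

Because the target distribution is the model's own prediction, the same computation applies to points from $\mathcal{U}^{tr}$, which is what distinguishes it from the AT case.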
Virtual adversarial training (VAT, Miyato et al. (2017))
Remark
• Note that $p(\cdot \mid x; \hat{\theta}^{cur})$ is a constant vector, so we can rewrite the regularization term as follows:
\[ L^{VAT}(\theta; \hat{\theta}^{cur}, x, \epsilon) = -\sum_{k=1}^{K} p(k \mid x; \hat{\theta}^{cur}) \log p(k \mid x + r_{advr}; \theta) + C, \]
which is, up to the constant $C$, the cross-entropy between $p(\cdot \mid x; \hat{\theta}^{cur})$ and $p(\cdot \mid x + r_{advr}; \theta)$.
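The remark can be checked numerically. In the sketch below the two distributions `q` and `p` are made-up stand-ins for $p(\cdot \mid x; \hat{\theta}^{cur})$ and $p(\cdot \mid x + r_{advr}; \theta)$, and the constant is $C = \sum_k q_k \log q_k$, which does not depend on $\theta$.

```python
import math

def kl(q, p):
    """KL[q, p] for discrete distributions with positive entries."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p))

def cross_entropy(q, p):
    """Cross-entropy -sum_k q_k log p_k."""
    return -sum(qk * math.log(pk) for qk, pk in zip(q, p))

q = [0.7, 0.2, 0.1]  # stand-in for p(.|x; theta_cur), held constant
p = [0.5, 0.3, 0.2]  # stand-in for p(.|x + r_advr; theta)

C = sum(qk * math.log(qk) for qk in q)  # negative entropy of q, free of theta
gap = abs(kl(q, p) - (cross_entropy(q, p) + C))
```

Since `C` is fixed once `q` is fixed, minimizing the KL term over $\theta$ and minimizing the cross-entropy over $\theta$ give the same minimizer.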
bad GAN approach (Dai et al., 2017)
• The bad GAN approach trains a good discriminator with a bad generator, which generates samples in low-density regions of the support.
• This approach trains a discriminator $p(\cdot \mid x; \theta)$ and a bad generator $p_G(\cdot \mid \eta)$ simultaneously, each with its own objective function.
• To train $p_G(\cdot \mid \eta)$, we need a pre-trained density estimation model, for instance PixelCNN++ (Salimans et al., 2017).
• To train the discriminator, we treat the $K$-class classification problem as a $(K+1)$-class classification problem, where the $(K+1)$-th class is an artificial label for bad samples generated by the bad generator.
bad GAN approach (Dai et al., 2017)
• The objective function of the discriminator is as follows:
\[ E_{(x,y) \sim \mathcal{L}^{tr}}\left[ -\log p(y \mid x; \theta, y \le K) \right] + E_{x \sim \mathcal{U}^{tr}}\left[ -\log \left\{ \sum_{k=1}^{K} p(k \mid x; \theta) \right\} \right] + E_{x \sim \mathcal{G}(\hat{\eta}^{cur})}\left[ -\log p(K+1 \mid x; \theta) \right], \]
where $\mathcal{G}(\hat{\eta}^{cur})$ is data generated by the currently estimated generator $p_G(\cdot \mid \hat{\eta}^{cur})$.
Notations
• $\mathcal{L}^{tr} = \{(x_i^l, y_i)\}_{i=1}^{n}$: labeled data ($x \in R^p$ and $y \in \{1, ..., K\}$).
• $\mathcal{U}^{tr} = \{x_j^u\}_{j=1}^{m}$: unlabeled data.
• $y(x)$: ground-truth label of an input $x$. (Of course, $y(x_i^l) = y_i$.)
• We can partition the unlabeled data as follows: $\mathcal{U}^{tr} = \cup_{k=1}^{K} \mathcal{U}_k^{tr}$, where $\mathcal{U}_k^{tr} = \{x : x \in \mathcal{U}^{tr}, y(x) = k\}$.
Definition 1.
We say a tuple $(x, x')$ is $\epsilon$-connected iff $d(x, x') < \epsilon$, where $d(\cdot, \cdot)$ is the Euclidean distance. A set $X$ is called $\epsilon$-connected iff for all $x, x' \in X$ there exists a path $(x, x_1, ..., x_q, x')$ such that $(x, x_1), (x_1, x_2), ..., (x_{q-1}, x_q), (x_q, x')$ are all $\epsilon$-connected.
Notations
• With Definition 1, we can partition $\mathcal{U}_k^{tr}$ as a disjoint union of clusters as follows:
\[ \mathcal{U}_k^{tr} = \cup_{l=1}^{n(\epsilon, k)} \mathcal{U}_{k,l}^{tr}(\epsilon), \]
where $\mathcal{U}_{k,l}^{tr}(\epsilon)$ is $\epsilon$-connected for all $l$,
\[ d\left( \mathcal{U}_{k,l}^{tr}(\epsilon),\, \mathcal{U}_{k,l'}^{tr}(\epsilon) \right) = \min_{x \in \mathcal{U}_{k,l}^{tr},\, x' \in \mathcal{U}_{k,l'}^{tr}} d(x, x') \ge \epsilon \quad \text{for all } l \ne l', \]
and $n(\epsilon, k)$ is the number of clusters of $\mathcal{U}_k^{tr}$.
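Definition 1 and the cluster partition can be made concrete with a small sketch. It assumes that the intermediate points of a path are taken from the set itself (the slides do not state this explicitly) and uses a union-find structure; the function name and the sample points are hypothetical.

```python
import math

def eps_clusters(points, eps):
    """Split points into eps-connected clusters; returns lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # A pair is eps-connected iff its Euclidean distance is < eps.
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical one-dimensional points from a single class.
points = [(0.0,), (0.5,), (1.0,), (3.0,), (3.4,)]
clusters = eps_clusters(points, eps=0.6)
```

In this example the first three points chain together through steps of length 0.5, while the gap of 2.0 to the last two points exceeds `eps`, so two clusters result.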
Main theorem
Assume there exists $\epsilon > 0$ s.t.
1. $d\left( \mathcal{U}_{k,l}^{tr}(\epsilon),\, \mathcal{U}_{k',l'}^{tr}(\epsilon) \right) \ge 2\epsilon$ for all $k \ne k'$,
2. for every $\mathcal{U}_{k,l}^{tr}(\epsilon)$, there exists at least one $(x, y) \in \mathcal{L}^{tr}$ with the same label s.t. $d(x, \mathcal{U}_{k,l}^{tr}) < \epsilon$.
Also assume that there exists a classifier $f : R^p \to \{1, ..., K\}$ s.t.
3. $f(x) = y$ for all $(x, y) \in \mathcal{L}^{tr}$, and $f(x) = f(x')$ for all $x' \in B(x, \epsilon)$, $x \in \mathcal{U}^{tr}$.
Then $f$ classifies the unlabeled set perfectly, that is, $f(x) = y(x)$ for all $x \in \mathcal{U}^{tr}$.
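One way to see why the theorem is plausible: condition 3 forces $f$ to be constant along any chain of unlabeled points with steps shorter than $\epsilon$, and condition 2 pins that constant to the correct label. The sketch below illustrates this intuition with a breadth-first label propagation on made-up data; it is an illustration only, not the authors' proof, and all names in it are hypothetical.

```python
import math
from collections import deque

def propagate_labels(labeled, unlabeled, eps):
    """Spread labels from labeled points along chains of steps shorter than eps."""
    labels = [None] * len(unlabeled)
    queue = deque(labeled)  # items are (point, label) pairs
    while queue:
        x, y = queue.popleft()
        for j, u in enumerate(unlabeled):
            # Local constancy on B(u, eps) forces u to share the label of x.
            if labels[j] is None and math.dist(x, u) < eps:
                labels[j] = y
                queue.append((u, y))
    return labels

# Hypothetical data: two well-separated classes, one labeled point near each cluster.
unlabeled = [(0.0,), (0.4,), (0.8,), (3.0,), (3.4,)]
true_labels = [0, 0, 0, 1, 1]
labeled = [((0.2,), 0), ((3.2,), 1)]

predicted = propagate_labels(labeled, unlabeled, eps=0.5)
```

The point at 0.8 is farther than `eps` from the labeled point at 0.2 and is reached only through the unlabeled point at 0.4, which is where the $\epsilon$-connectedness of each cluster is used.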
Derivation of VAT loss function
• Let $f(x; \theta) = \operatorname*{argmax}_{k=1,...,K} p(k \mid x; \theta)$.
• We focus on finding an optimal $\theta$ satisfying condition 3 of the main theorem by using a suitable objective function.
• The most plausible candidate may be one using the indicator function:
\[ E_{(x,y) \sim \mathcal{L}^{tr}}\left[ I\left( f(x; \theta) \ne y \right) \right] + E_{x \sim \mathcal{U}^{tr}}\left[ I\left( f(x; \theta) \ne f(x'; \theta) \text{ for } \forall x' \in B(x, \epsilon) \right) \right] \qquad (1) \]
• $\hat{\theta}$ achieves value 0 $\iff$ $f(\cdot; \hat{\theta})$ satisfies condition 3.
• Two problems arise in minimizing the objective function (1):
1. The indicator function cannot be optimized directly because it is discontinuous.
2. It is infeasible to search over all $x' \in B(x, \epsilon)$ in order to calculate the second term of (1).