SLIDE 1

Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning (AAAI-2020)

Reading Group Dec. 11, 2019

Suman Saha (postdoc), Computer Vision Lab @ ETH Zurich

SLIDE 2

Motivation: DNN Activation Functions

• Learning activation functions to improve deep neural networks (DNNs) [1]

• Parameters in the linear components (W and b) are learned from data

• While nonlinearities are predefined, e.g. sigmoid, tanh or ReLU etc.

• Assumption – an arbitrarily complex function can be approximated using any of these common nonlinear functions

• In practice, the choice of nonlinearity affects:

→ the learning dynamics

→ network expressive power

[1] Agostinelli, Forest, et al. "Learning activation functions to improve deep neural networks." arXiv preprint arXiv:1412.6830 (2014).

SLIDE 3

Motivation: Choice of NN Nonlinearity

• Active research area – design activation functions that enable fast training of DNNs

Vanishing Gradient Problem

• Derivative of a Sigmoid Function ranges between 0 and 0.25

Weight Update

• For a DNN with more layers, the gradients tend to vanish more in the lower layers
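The vanishing-gradient point above can be checked numerically: the sigmoid derivative σ'(x) = σ(x)(1 − σ(x)) peaks at 0.25, so backpropagating through n sigmoid layers scales gradients by at most 0.25 per layer. A minimal numpy sketch (not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0 with value exactly 0.25
xs = np.linspace(-10, 10, 10001)
print(sigmoid_grad(xs).max())  # -> 0.25

# Upper bound on gradient magnitude after n sigmoid layers: 0.25 ** n
for n in (1, 5, 10):
    print(n, 0.25 ** n)
```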

SLIDE 4

Motivation: Choice of NN Nonlinearity

• The rectified linear activation function (ReLU) does not saturate like sigmoidal functions

• helps to overcome the vanishing gradient problem

• Maxout activation (Goodfellow et al., 2013) – computes the maximum of a set of linear functions

• Springenberg & Riedmiller (2013) replaced the max function with a probabilistic max function

• Gulcehre et al. (2014) explored an activation function that replaces the max function with an LP norm

Other recent activation functions
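A maxout unit, as described above, returns the maximum over k affine functions of its input. A minimal numpy sketch; the sizes (k = 3 pieces, input dimension 4) are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def maxout(x, W, b):
    """Maxout activation: maximum over k linear functions of the input.

    x: (d,) input vector
    W: (k, d) weights of the k linear pieces
    b: (k,)  biases of the k linear pieces
    """
    return np.max(W @ x + b)

# Illustrative sizes: input dim d = 4, k = 3 linear pieces
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

# Output equals the largest of the three affine responses
print(maxout(x, W, b))
```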

SLIDE 5

Motivation

• The type of activation function can have a significant impact on learning

• One way to explore the space of possible functions is to learn the activation function during training (Agostinelli et al., 2014)

SLIDE 6

Adaptive Piecewise Linear (APL) units

• Activation function as a sum of hinge-shaped functions, resulting in a piecewise linear activation function:

h(x) = max(0, x) + Σ_{s=1..S} a^s · max(0, −x + b^s)   (Eq. 1)

• S (the number of hinges) is a hyperparameter set in advance

• a^s and b^s are the learnable parameters, where

• the a^s variables control the slopes of the linear segments

• the b^s variables determine the locations of the hinges

SLIDE 7

Adaptive Piecewise Linear (APL) units

• Fig. 1 shows example APL functions for S = 1

• For large enough S, APL can approximate arbitrarily complex continuous functions

• The first term in Eq. (1) is ReLU

• When x < 0, the derivative of ReLU is 0, resulting in dead neurons

• Leaky ReLU addresses the dead-neuron problem, e.g. leaky ReLU may have y = 0.1x when x < 0

Figure 1: Sample activation functions obtained from changing the parameters. Notice that figure b shows that the activation function can also be non-convex. Figure 2
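The APL unit of Eq. (1) takes only a few lines of numpy; the hinge parameters a, b and the choice S = 2 below are illustrative, not values from the paper:

```python
import numpy as np

def apl(x, a, b):
    """Adaptive Piecewise Linear unit (Agostinelli et al., 2014):
    h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s])
    """
    x = np.asarray(x, dtype=float)
    out = np.maximum(0.0, x)                    # ReLU term
    for a_s, b_s in zip(a, b):                  # S hinge-shaped terms
        out += a_s * np.maximum(0.0, -x + b_s)
    return out

# S = 2 hinges with illustrative "learnable" parameters
a = [0.2, -0.5]
b = [1.0, -1.0]

print(apl(0.0, a, b))   # -> 0.2 (only the hinge at b = 1 is active)
print(apl(2.0, a, b))   # -> 2.0 (both hinges inactive; pure ReLU)
```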

SLIDE 8

TAAN (Task Adaptive Activation Network)

• Proposed approach = hard-sharing + learnable task-specific activation functions

• All tasks can share their weights and biases on the hidden layers

• More scalable than the soft-sharing methods, where the number of network components is proportional to the number of tasks

Categories of deep learning MTL: (a) Hard-sharing; (b) Soft-sharing; (c) Task Adaptive Activation Network (proposed model); (d) Inner Structure of Adaptive Activation Layer.

SLIDE 9

TAAN

• For a task t, given the input from either the previous layer or the data input, the output of the l-th AAL (Adaptive Activation Layer) is defined by applying the task-specific activation to a shared linear transform

• Weight and bias parameters are shared across tasks

(c) Task Adaptive Activation Network (proposed model); (d) Inner Structure of Adaptive Activation Layer.

• The task-specific activation function for task t and layer l is defined as a linear combination of shared basis functions

• Recall from slides 6 and 7: M (the number of hinges) is a hyperparameter set in advance

• The coordinate vector denotes the coordinates of the basis functions
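A minimal sketch of an Adaptive Activation Layer: linear weights are shared across tasks, while each task keeps its own coordinate vector over a set of APL-style basis functions. All shapes, the basis choice, and the initialization are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def basis(x):
    """Return the M = 3 basis responses for input x (elementwise)."""
    return np.stack([
        np.maximum(0.0, x),          # ReLU basis
        np.maximum(0.0, -x + 1.0),   # hinge at b = 1
        np.maximum(0.0, -x - 1.0),   # hinge at b = -1
    ])                               # shape (M, ...)

class AAL:
    """Adaptive Activation Layer: shared W, b; per-task coordinates."""

    def __init__(self, d_in, d_out, n_tasks, m=3):
        self.W = rng.normal(0, 0.1, size=(d_out, d_in))  # shared weights
        self.b = np.zeros(d_out)                          # shared bias
        # Coordinate matrix: one length-M vector per task
        self.alpha = rng.normal(0, 0.1, size=(n_tasks, m))

    def forward(self, x, task):
        z = self.W @ x + self.b        # shared linear transform
        B = basis(z)                   # (M, d_out) basis responses
        return self.alpha[task] @ B    # task-specific combination

layer = AAL(d_in=4, d_out=8, n_tasks=2)
x = rng.normal(size=4)
y0 = layer.forward(x, task=0)   # task 0's activation
y1 = layer.forward(x, task=1)   # task 1's (same W, b; different alpha)
print(y0.shape, np.allclose(y0, y1))  # distinct outputs, shared weights
```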

SLIDE 10

TAAN

• Each task has its own coordinate vector

• There is a coordinate matrix attached to each AAL hidden layer

• The coordinate matrices of the hidden layers control the level of network sharing among multiple tasks

• For instance, if tasks 1 and 2 have more shared knowledge at the 1st hidden layer and have higher similarity, then their coordinate vectors are more similar

• On the other hand, if tasks 1 and 2 share less knowledge at the 2nd hidden layer, their activation functions are more diverse

• During the training phase, the coordinate matrices of all hidden layers are optimized to extract both the shared and task-specific knowledge from data

SLIDE 11

Metrics for Activation Functions

• In order to understand how TAAN captures the relationships of multiple tasks, we need a metric to measure the difference/similarity between two activation functions

• As the basis functions of APL units are unbounded and not orthonormal, the coordinate vectors do not reveal much about the functions

• Besides, the commonly used norm and inner product are infinite almost everywhere, so it is impossible to use them as metrics

• Authors redefine a finite inner product and norm by assuming the input is a random variable with a Gaussian distribution
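Under a Gaussian input assumption, a function-space inner product ⟨f, g⟩ = E_{x~N(0,1)}[f(x)·g(x)] is finite for piecewise linear functions and can be estimated by sampling. This Monte-Carlo sketch illustrates the idea only; the paper derives closed-form expressions rather than sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(200_000)   # samples of the Gaussian input

def inner(f, g):
    """Monte-Carlo estimate of <f, g> = E_{x~N(0,1)}[f(x) g(x)]."""
    return np.mean(f(X) * g(X))

def cosine(f, g):
    """Cosine similarity of two activation functions under this metric."""
    return inner(f, g) / np.sqrt(inner(f, f) * inner(g, g))

relu = lambda x: np.maximum(0.0, x)
leaky = lambda x: np.where(x > 0, x, 0.1 * x)

print(cosine(relu, relu))    # 1.0 (up to floating point)
print(cosine(relu, leaky))   # close to 1: the functions are similar
```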
slide-12
SLIDE 12

Functional Regularization

• For each layer of TAAN, the coordinate matrix can be learned directly from the training data

• As the tasks in MTL are generally considered to be related, it is reasonable to encourage sharing more than splitting

• This insight is incorporated into TAAN by introducing regularization terms during training

• Authors propose two functional regularization methods to further enhance the performance of TAAN

SLIDE 13

Functional Regularization

Baseline: Trace-Norm

• The first regularization hypothesis is that the matrix is low-rank, as the tasks in MTL often have high correlation

• Thus, authors introduce a regularization term based on the trace norm, whose definition involves the square root of a matrix

Functional regularization by cosine similarity

• The similarity of two task-specific activation functions can be defined by the cosine similarity, computed as their inner product divided by the product of their norms (under the redefined metric)

SLIDE 14

Functional Regularization

Functional regularization by distance

• Given the coordinate matrix for the l-th layer of the network, authors compress the distance function Eq. (3) between task-specific activation functions into the following regularization term

• The training loss of a TAAN with L task-specific activation layers becomes the sum of the task losses plus the summed regularization terms, where c is the regularization coefficient
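The regularized objective has the shape: total loss = per-task losses + c × (summed pairwise distances between task activation functions, layer by layer). The sketch below uses squared differences between coordinate vectors as a stand-in for the paper's functional distance; all values are illustrative:

```python
import numpy as np

def functional_regularizer(alphas):
    """Sum of pairwise squared distances between the task coordinate
    vectors of one layer (stand-in for the paper's functional distance).

    alphas: (n_tasks, M) coordinate matrix of one AAL layer
    """
    n = alphas.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sum((alphas[i] - alphas[j]) ** 2)
    return total

def taan_loss(task_losses, coordinate_matrices, c):
    """Training loss: task losses plus c * regularization over all layers."""
    reg = sum(functional_regularizer(A) for A in coordinate_matrices)
    return sum(task_losses) + c * reg

# Two tasks, two AAL layers, M = 3 basis functions
A1 = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])   # identical -> no penalty
A2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # different -> penalized
print(taan_loss([0.3, 0.5], [A1, A2], c=0.1))  # -> 0.8 + 0.1 * 2.0 = 1.0
```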


SLIDE 16

Experiments
Multi-Domain Multi-Label Classification

• Conduct experiments on Youtube-8M, a large dataset that consists of over 6.1 million Youtube videos. Each video has multiple labels from a vocabulary of 3862 topical entities, which can be further grouped into 24 top-level categories.

• To create an MTL experiment, authors consider each top-level category as a specific domain.

• For each domain, they have to define a multi-label classifier to recognize various attributes of the data

SLIDE 17

Experiments
Multi-Domain Multi-Label Classification

• The task IDs and their corresponding domains

SLIDE 18

Experiments
Multi-Domain Multi-Label Classification

SLIDE 19

Experiments
Multi-Domain Multi-Label Classification

SLIDE 20

Experiments
Visualization

• TAAN is able to capture the complicated knowledge sharing for the tasks on the Youtube-8M dataset. For instance, domain "Food & Drink" shares all the hidden layers with domain "Home & Garden". TAAN also discovers that the domains "Food & Drink" and "Internet & Telecom" are the most unrelated, as the distances between their activation functions are always high.

Distance matrices of the activation functions in TAAN. Light colors denote less similarity.

SLIDE 21

Questions?

SLIDE 22

Thank you for your attention!

SLIDE 23

References