Image Captioning (Ambar Pal, IIIT Delhi)

slide-1
SLIDE 1

Image Captioning

Ambar Pal, IIIT Delhi

May 23, 2016

slide-2
SLIDE 2

Problem Description - 1

In novel object discovery, we want to describe objects in the image that are not present in the paired training data.

Example text from the slide figure: "Cat"; "Dogs Running in the Park"; "Lots of fruits on the table"; "A Cat is sitting on the table".

slide-3
SLIDE 3

Problem Description - 2

A harder version of the problem.

Example text from the slide figure: "Dogs Running in the Park"; "Lots of fruits on the table"; "An object like a dog is sitting on the table".

slide-4
SLIDE 4

Previous Approaches - 1

This is a work by Vinyals et al. (2015), "Show and Tell: A Neural Image Caption Generator".

Image taken from http://arxiv.org/abs/1411.4555

slide-5
SLIDE 5

Previous Approaches - 1

  • Generate a fixed-length vector representation of the image, say V_i. This representation is obtained by extracting the features from a hidden layer of a pretrained CNN.
  • Convert the caption to a vector representation, say V_c.
  • V_i is fed to a Long Short-Term Memory RNN (LSTM) as auxiliary input.
  • V_c is then fed, element by element, to the RNN to train it.

slide-6
SLIDE 6

Previous Approaches - 2

This is a work by Fang et al. (2015), "From Captions to Visual Concepts and Back".

Image taken from https://arxiv.org/abs/1411.4952

slide-7
SLIDE 7

Previous Approaches - 2

Word Detection Stage

They filter out the most common words in the training set and then use noisy-OR Multiple Instance Learning (MIL), taking a sliding window on the image as the bags, to determine the region for each word in each image. The need for bounding boxes is eliminated by using the representation generated by a CNN trained on ImageNet as the base of the MIL.

Image taken from https://arxiv.org/abs/1411.4952
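
The noisy-OR rule used in the word detection stage can be illustrated with a minimal sketch. It assumes each region of an image already has a probability of showing a given word (the numbers below are made up, and the function name `noisy_or` is hypothetical), and it combines them so that the image counts as positive if at least one region does.

```python
from math import prod

def noisy_or(region_probs):
    """Probability that at least one region in the bag shows the word.

    Computes 1 - prod(1 - p_j) over the per-region probabilities p_j.
    """
    return 1.0 - prod(1.0 - p for p in region_probs)

# Hypothetical per-region probabilities for the word "cat" in one image.
bag = [0.1, 0.2, 0.9]
print(noisy_or(bag))
```

A single confident region is enough to push the image-level probability close to 1, which is why no bounding-box annotation is needed.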

slide-8
SLIDE 8

Previous Approaches - 2

Sentence Generation Stage

For each image, given the suggested probability of each word for the image, they use a Maximum Entropy Language Model (LM) to predict the best sentences fitting the words. The features used for the LM are specified.

Ranking Generated Sentences

They then rank the sentences generated.
slide-9
SLIDE 9
Motivation

  • Ideally, we would like to completely eliminate the need for captions at training time.
  • We would want a model that takes as input only what each object is called and in what context an object is described.
  • Using these pieces of information, it should be able to generate coherent sentences that describe images.
  • The prerequisites are a good way to learn the representations of images and a good Language Model.
  • Though DCC doesn't eliminate the need for paired image-text data, it is a step in this direction.
slide-10
SLIDE 10

DCC

The Deep Compositional Captioner (DCC) is a work by Hendricks et al. (2016).

Image taken from https://arxiv.org/abs/1511.05284

slide-11
SLIDE 11

Lexical Classifier

  • Its job is to map images to multiple labels in the form of "Visual Concepts".
  • A visual concept may be an object (table) or an abstract form (beautiful).
  • The visual concepts are mined from text corpora using previous work in the area of part-of-speech identification.
  • A pretrained DNN is then taken, for example AlexNet, and the probability layer is removed.
  • A cross-entropy layer is added at the end of the net to handle multiple labels.
  • The resulting image feature vector F_i, of size 1 x m, is used later.

slide-12
SLIDE 12

Language Model

It consists of an embedding layer, an LSTM, and a word prediction layer.

Embedding Layer

Input is a one-hot word vector representation, <0, 0, ..., 0, 1, 0, ..., 0> [1 x m]. The embedding layer has a matrix M of size [m x n], and it computes w_i * M to give a [1 x n] word vector representation.
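
A minimal sketch of the embedding computation, assuming a toy vocabulary of m = 3 words and n = 2 embedding dimensions. The helper names `one_hot` and `embed` and the values in M are hypothetical. It shows that multiplying a one-hot vector by M just selects one row of M.

```python
def one_hot(index, m):
    """Return a length-m one-hot vector with a 1 at position `index`."""
    return [1 if k == index else 0 for k in range(m)]

def embed(w, M):
    """Compute the row vector w * M, where w is [1 x m] and M is [m x n]."""
    n = len(M[0])
    return [sum(w[k] * M[k][j] for k in range(len(w))) for j in range(n)]

# Hypothetical embedding matrix: 3 words, 2 dimensions each.
M = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
w = one_hot(1, 3)
print(embed(w, M))  # the same values as row 1 of M
```

In practice the product is usually replaced by a direct row lookup, since all other terms are multiplied by zero.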
slide-13
SLIDE 13

Language Model

1. The word is fed into an embedding layer to get a vector representation.
2. This vector representation is then fed as input to an LSTM that produces an output vector.

Image taken from https://arxiv.org/abs/1511.05284

slide-14
SLIDE 14

Language Model

3. This output vector is concatenated with the word embedding to give the Language Feature F_l.
4. F_l is then input to a fully connected layer to predict the next word in the sequence.

They also enforce a constraint that the same word cannot be predicted twice in a row.

Image taken from https://arxiv.org/abs/1511.05284
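
Steps 3 and 4 and the no-repeat constraint can be sketched as follows. This is only an illustration: the slides do not say how the constraint is enforced, so skipping the previous word at prediction time is an assumption, and the function name `next_word`, the weight matrix `W`, and all numbers are hypothetical.

```python
def next_word(lstm_output, word_embedding, W, vocab, previous_word):
    """Build F_l, score every word, and return the best non-repeated word."""
    # Step 3: concatenate the LSTM output with the word embedding.
    f_l = lstm_output + word_embedding
    # Step 4: a fully connected layer, one row of W per vocabulary word.
    scores = [sum(x * w for x, w in zip(f_l, row)) for row in W]
    ranked = sorted(range(len(vocab)), key=lambda k: scores[k], reverse=True)
    for k in ranked:
        # Assumed constraint handling: never return the previous word again.
        if vocab[k] != previous_word:
            return vocab[k]
    return None

vocab = ["a", "cat", "sits"]
W = [[1.0, 0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0]]
print(next_word([1.0, 0.0], [0.0, 1.0], W, vocab, previous_word="a"))
```

Here "a" has the highest score, but because it was the previous word the sketch falls back to the next best candidate.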

slide-15
SLIDE 15

Caption Model

Image taken from https://arxiv.org/abs/1511.05284

slide-16
SLIDE 16

Caption Model

The caption model combines the image features and the language features to output the probability of each concept in the image.

Image taken from https://arxiv.org/abs/1511.05284

slide-17
SLIDE 17

Transferring Object Information

We are transferring information to novel objects, i.e. words that do not appear in the paired training set.

Image taken from https://arxiv.org/abs/1511.05284

slide-18
SLIDE 18

Direct Transfer

In this form of transfer, used for Wi, we directly copy the weights from a semantically similar word that is already known.

1. Copy Wi[:, sheep] to Wi[:, alpaca]
2. Set Wi[alpaca, alpaca] to Wi[sheep, sheep]
3. Set Wi[sheep, alpaca] and Wi[alpaca, sheep] to 0

Image taken from https://arxiv.org/abs/1511.05284
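
The three direct-transfer steps can be written out on a toy weight matrix. This is a sketch under assumptions: the matrix is stored as a list of rows, `index` maps each word to its row and column, and the function name `direct_transfer` and all values are hypothetical.

```python
def direct_transfer(W, index, known, new):
    """Copy the weights of a known word to a new word, in place."""
    k, n = index[known], index[new]
    for row in W:          # 1. copy column `known` into column `new`
        row[n] = row[k]
    W[n][n] = W[k][k]      # 2. the new word gets the known word's self-weight
    W[k][n] = 0.0          # 3. zero the cross-terms between the two words
    W[n][k] = 0.0
    return W

index = {"grass": 0, "sheep": 1, "alpaca": 2}
Wi = [[0.9, 0.2, 0.0],
      [0.3, 0.8, 0.0],
      [0.1, 0.4, 0.0]]
direct_transfer(Wi, index, "sheep", "alpaca")
for row in Wi:
    print(row)
```

After the call, "alpaca" relates to "grass" the way "sheep" does, while the two animals no longer predict each other.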
slide-19
SLIDE 19

Delta Transfer

Transfers weights to a new word based on how the weights for the old word change when training on paired image-text data.

1. Calculate the change in the weights of sheep.
2. Use this change to generate the caption model weights for alpaca.

Image taken from https://arxiv.org/abs/1511.05284
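
A minimal sketch of the two delta-transfer steps, assuming the weights of each word are available both before paired training (`lm_weights`) and after it (`caption_weights`, known words only). These names, the function `delta_transfer`, and the numbers are hypothetical.

```python
def delta_transfer(lm_weights, caption_weights, known, new):
    """Return caption-model weights for `new` from the change seen for `known`."""
    # Step 1: how much did the known word's weights change during paired training?
    delta = [c - l for c, l in zip(caption_weights[known], lm_weights[known])]
    # Step 2: apply the same change to the new word's pre-training weights.
    return [l + d for l, d in zip(lm_weights[new], delta)]

lm = {"sheep": [1.0, 2.0], "alpaca": [0.5, 1.5]}
cap = {"sheep": [1.5, 1.0]}
cap["alpaca"] = delta_transfer(lm, cap, "sheep", "alpaca")
print(cap["alpaca"])  # [1.0, 0.5]
```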

slide-20
SLIDE 20

Metrics - BLEU

Input: N images, each with one candidate sentence to be compared against a reference sentence set R.

N-grams: the set of all ordered sequences of N adjacent words.

Example: Candidate: "A cat is sitting on a chair"

1-grams (unigrams): A, cat, is, sitting, on, a, chair
2-grams (bigrams): A cat, cat is, is sitting, sitting on, on a, a chair
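
The n-gram extraction in the example can be reproduced with a short sketch. The function name `ngrams` is hypothetical and splitting on whitespace is an assumed tokenization.

```python
def ngrams(sentence, n):
    """Return all runs of n adjacent words, in order, as tuples."""
    words = sentence.split()
    return [tuple(words[k:k + n]) for k in range(len(words) - n + 1)]

candidate = "A cat is sitting on a chair"
print(ngrams(candidate, 1))
print(ngrams(candidate, 2))
```

A sentence with k words has k unigrams and k - 1 bigrams, matching the 7 and 6 entries listed above.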

slide-21
SLIDE 21

Metrics - BLEU

Modified N-gram precision (P)

The clipping is done to ensure that one doesn't get a better score with "the the the the the".

Score for the entire corpus
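
Clipping can be sketched for a single candidate as follows. This is a per-sentence illustration with assumed whitespace tokenization and hypothetical function names; the corpus-level score sums the clipped counts and the total counts over all candidates before dividing.

```python
from collections import Counter

def ngram_counts(sentence, n):
    """Count the n-grams of a lower-cased, whitespace-tokenized sentence."""
    words = sentence.lower().split()
    return Counter(tuple(words[k:k + n]) for k in range(len(words) - n + 1))

def modified_precision(candidate, references, n):
    """Clip each candidate n-gram count by its maximum count in any reference."""
    cand = ngram_counts(candidate, n)
    max_ref = Counter()
    for ref in references:
        max_ref |= ngram_counts(ref, n)  # element-wise maximum over references
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

print(modified_precision("the the the the the", ["the cat is on the mat"], 1))  # 0.4
```

Without clipping the candidate would score 5/5; with clipping only the two occurrences of "the" found in the reference are credited.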

slide-22
SLIDE 22

Metrics - BLEU

Combining the n-gram precisions

It is seen that the modified n-gram precision decays exponentially with n. Hence a weighted average of the logs of the modified n-gram precisions is taken.

Image taken from http://www.aclweb.org/anthology/P02-1040.pdf

slide-23
SLIDE 23

Metrics - BLEU

Sentence Length Penalty

The scheme up to now can still be fooled by very short phrases.

Unigram Precision = 1, Bigram Precision = 1

Image taken from http://www.aclweb.org/anthology/P02-1040.pdf

slide-24
SLIDE 24

Metrics - BLEU

Sentence Length Penalty

Best Match Length = Closest Reference Length
r = Effective Reference Length
c = Candidate Length

Brevity Penalty

In BLEU-4, N = 4

Image taken from http://www.aclweb.org/anthology/P02-1040.pdf
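
The brevity penalty and the final combination can be sketched as below, following the definitions in the cited BLEU paper: BP is 1 when c > r and exp(1 - r/c) otherwise, and the score is BP times the exponential of the weighted log precisions. Uniform weights 1/N are assumed, and the function names are hypothetical.

```python
from math import exp, log

def brevity_penalty(c, r):
    """Return 1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if c == 0:
        return 0.0
    return 1.0 if c > r else exp(1.0 - r / c)

def bleu(precisions, c, r):
    """Combine modified n-gram precisions p_1..p_N with uniform weights 1/N."""
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; the score is taken as zero here
    log_mean = sum(log(p) for p in precisions) / len(precisions)
    return brevity_penalty(c, r) * exp(log_mean)

# Hypothetical BLEU-4 inputs: four precisions, candidate length 9, reference length 10.
print(bleu([0.8, 0.6, 0.4, 0.3], c=9, r=10))
```

A very short candidate with perfect precisions is still penalized, because r/c grows as c shrinks.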

slide-25
SLIDE 25

Metrics - METEOR

Alignment stages: in its first stage, METEOR aligns the candidate to the reference in several steps:

1. Exact Matching
2. Plural Matching
3. Synonym Matching

Different orderings can affect the alignment; the above ordering is followed in general.

After mapping through all these stages, METEOR computes the largest subset of unigrams that can be matched. If there is more than one such mapping, METEOR selects the mapping that minimizes the number of crossings.

slide-26
SLIDE 26

Metrics - METEOR

Unigram Precision (P)

Ratio of the number of unigrams in the caption that are mapped (to unigrams in the reference) to the total number of unigrams in the caption.

E.g.
Caption: This is a cat
Reference: A dog is chasing a cat in the field
P = 3/4

slide-27
SLIDE 27

Metrics - METEOR

Unigram Recall (R)

Ratio of the number of unigrams in the caption that are mapped (to unigrams in the reference) to the total number of unigrams in the reference.

E.g.
Caption: This is a cat
Reference: A dog is chasing a cat in the field
R = 3/9

F Mean
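
The precision and recall of the example can be checked with a short sketch. It uses only exact, case-insensitive matching, and for F Mean it assumes the original METEOR formulation, Fmean = 10PR / (R + 9P), since the slide's formula did not survive extraction. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def unigram_matches(candidate, reference):
    """Count exact, case-insensitive unigram matches (each word used once)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    return sum((cand & ref).values())

def precision_recall_fmean(candidate, reference):
    """Return (P, R, Fmean) as exact fractions for non-empty sentences."""
    m = unigram_matches(candidate, reference)
    p = Fraction(m, len(candidate.split()))
    r = Fraction(m, len(reference.split()))
    fmean = 10 * p * r / (r + 9 * p) if m else Fraction(0)
    return p, r, fmean

p, r, fmean = precision_recall_fmean(
    "This is a cat", "A dog is chasing a cat in the field")
print(p, r, fmean)  # 3/4 1/3 6/17
```

The recall prints as 1/3 because 3/9 is reduced; Fmean weights recall far more heavily than precision.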

slide-28
SLIDE 28

Metrics - METEOR

Penalty for broken matches

Adjacent unigrams that are mapped are grouped into a chunk.

Example:
Caption: This is a cat
Reference: A dog is chasing a cat in the field
Penalty = 0.5 * (2/3)^3
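
The penalty in the example, 0.5 * (chunks / matched unigrams)^3, can be reproduced from an explicit alignment. In this sketch the alignment is given by hand as 0-based (caption position, reference position) pairs, which is an assumption about how the mapping is represented; the function names are hypothetical.

```python
from fractions import Fraction

def count_chunks(alignment):
    """Count runs of matches that are adjacent in both sentences.

    `alignment` is a list of (caption_index, reference_index) pairs,
    sorted by caption index.
    """
    chunks = 0
    previous = None
    for pair in alignment:
        if previous is None or pair != (previous[0] + 1, previous[1] + 1):
            chunks += 1  # this match does not continue the previous chunk
        previous = pair
    return chunks

def fragmentation_penalty(alignment):
    """Return 0.5 * (chunks / matches) ** 3 as an exact fraction."""
    matches = len(alignment)
    if matches == 0:
        return Fraction(0)
    return Fraction(1, 2) * Fraction(count_chunks(alignment), matches) ** 3

# "This is a cat" vs "A dog is chasing a cat in the field":
# is -> is, a -> the second "a", cat -> cat.
alignment = [(1, 2), (2, 4), (3, 5)]
print(count_chunks(alignment), fragmentation_penalty(alignment))  # 2 4/27
```

The two chunks are "is" and "a cat"; mapping "a" to the first "a" of the reference would give three chunks and a larger penalty.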

slide-29
SLIDE 29

Metrics - F1

We consider only those sentences which have at least one novel word.

FP: a word appears in a sentence it shouldn't appear in
FN: a word doesn't appear in a sentence it should appear in
TP: a word appears in a sentence it should appear in

Equation taken from https://en.wikipedia.org/wiki/F1_score
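
A minimal sketch of the F1 computation for one novel word. It assumes each sentence is summarized by two booleans: whether the word was generated and whether it should have been. The function names and the sample data are hypothetical.

```python
def confusion_counts(predicted, expected):
    """Count TP, FP and FN over parallel lists of booleans."""
    tp = sum(1 for p, e in zip(predicted, expected) if p and e)
    fp = sum(1 for p, e in zip(predicted, expected) if p and not e)
    fn = sum(1 for p, e in zip(predicted, expected) if not p and e)
    return tp, fp, fn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Did the word "alpaca" appear in each generated sentence, and should it have?
predicted = [True, True, False, True]
expected = [True, False, True, True]
print(f1_score(*confusion_counts(predicted, expected)))
```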

slide-30
SLIDE 30

Other Metrics

Table taken from http://mscoco.org/dataset/#captions-leaderboard

slide-31
SLIDE 31

Training DCC

Training the Lexical Classifier

  • For objects in the train set, COCO captions are used.
  • For objects not in the train set, ImageNet image descriptions are used.
  • The CNN is fine-tuned from VGG-16.

Training the Language Model

1. MSCOCO train set
2. Other corpora, e.g. Flickr1M, Flickr30k, Pascal-1k and ImageCLEF-2012
3. British National Corpus (BNC) and Wikipedia

slide-32
SLIDE 32

Training DCC

Training the Caption Model

For direct transfer, Wi and Wl are trained on image-caption data while freezing all other weights, and then the weights are transferred in the multimodal unit.

For delta transfer, Wl is first frozen and Wi is trained. Then both are jointly trained.

slide-33
SLIDE 33

Results

Table taken from https://arxiv.org/abs/1511.05284

slide-34
SLIDE 34

Results

Table taken from https://arxiv.org/abs/1511.05284

slide-35
SLIDE 35

Results

Image taken from https://arxiv.org/abs/1511.05284

slide-36
SLIDE 36

Results

Image taken from https://arxiv.org/abs/1511.05284

slide-37
SLIDE 37

Errors

Image taken from https://arxiv.org/abs/1511.05284

slide-38
SLIDE 38

Errors

Image taken from https://arxiv.org/abs/1511.05284

slide-39
SLIDE 39

Errors

Image taken from https://arxiv.org/abs/1511.05284

slide-40
SLIDE 40

Errors

Image taken from https://arxiv.org/abs/1511.05284

slide-41
SLIDE 41

Errors

Image taken from https://arxiv.org/abs/1511.05284

slide-42
SLIDE 42

Errors

Image taken from https://arxiv.org/abs/1511.05284

slide-43
SLIDE 43

Fin.

Thanks