Representing and Explaining New Concepts with Minimal Supervision - Zeynep Akata - PowerPoint PPT Presentation



slide-1
SLIDE 1

Representing and Explaining New Concepts with Minimal Supervision (Yeni Kavramları En Az Denetim ile Temsil Etme ve Açıklama)
Zeynep Akata

Bilim Akademisi - Bilkent University Machine Learning (Yapay Öğrenme) Summer School 2020, 30 June 2020

1

slide-2
SLIDE 2

Outline

Generalized Low-Shot Learning with Side-Information
Generating Natural Language Explanations for Visual Decisions
Summary and Future Work

2


slide-4
SLIDE 4

Data Distribution in Large-Scale Datasets

Akata et al. TPAMI’14

[Figure: long-tailed data distribution in large-scale datasets; axes: number of images vs. number of classes]

4

slide-5
SLIDE 5

Learning via Explanation

Lombrozo TICS’16

5


slide-10
SLIDE 10

Attributes as Explanations

Lampert et al. CVPR’09

class attributes (assumed column order: black-white, gray, has tail, lives on land, lives in water, small):
zebra: black-white, has tail, lives on land, small → [1 0 1 1 0 1]
whale: gray, has tail, lives in water, big → [0 1 1 0 1 0]

6
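The attribute table above turns class membership into a shared binary code, so a class can be recognized purely from its attribute signature. A minimal numpy sketch of that idea; the detector scores and the column order are illustrative assumptions, not from the paper:

```python
import numpy as np

# Class-attribute matrix from the slide; assumed column order:
# [black-white, gray, has tail, lives on land, lives in water, small]
classes = ["zebra", "whale"]
phi = np.array([
    [1, 0, 1, 1, 0, 1],  # zebra: black-white, has tail, lives on land, small
    [0, 1, 1, 0, 1, 0],  # whale: gray, has tail, lives in water, big
])

def predict(attr_scores):
    """Assign the class whose attribute signature best matches the
    per-attribute scores an attribute detector predicted for an image."""
    return classes[int(np.argmax(phi @ attr_scores))]

# Hypothetical detector outputs for a zebra-like test image:
print(predict(np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.6])))  # -> zebra
```

The same matching works for a class never seen in training, as long as its attribute row is known.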


slide-13
SLIDE 13

Generalized Zero-Shot Learning

images and their attributes:
black-white, has tail, lives on land, small
gray, has tail, lives in water, big
black-white, no tail, lives on land, medium
white, has tail, lives on land, tiny

7

slide-14
SLIDE 14

Multimodal Embeddings

Akata et al. CVPR’13 & TPAMI’16

[Figure: multimodal embedding pipeline mapping IMAGES → IMAGE FEATURES → CLASS ATTRIBUTES → CLASS LABELS, with example classes zebra and whale and attributes white and black]

8


slide-17
SLIDE 17

Multimodal Embeddings

Akata et al. CVPR’13 & TPAMI’16

S = {(x, y, ϕ(y)) | x ∈ X, y ∈ Ys, ϕ(y) ∈ C} and U = {(y, ϕ(y)) | y ∈ Yu, ϕ(y) ∈ C}

Learn f : X → Y by minimizing the regularized empirical risk:

(1/N) Σ_{n=1}^{N} L(y_n, f(x_n; W)) + Ω(W)

where L(·) is the loss function and Ω(·) the regularization term, using the pairwise ranking loss:

L(x_n, y_n; W) = Σ_{y ∈ Ys} [Δ(y_n, y) + F(x_n, y; W) − F(x_n, y_n; W)]_+

with the compatibility function F(x, y; W) = θ(x)^T W ϕ(y)

9
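The objective above can be sketched directly: a bilinear compatibility F(x, y; W) = θ(x)ᵀ W ϕ(y) scored with the pairwise ranking loss. The dimensions and random weights below are illustrative stand-ins for trained feature extractors, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_attr, n_cls = 4, 6, 3          # illustrative sizes

W = rng.normal(size=(d_img, d_attr))    # joint embedding matrix
phi = rng.normal(size=(n_cls, d_attr))  # class embeddings phi(y), one row per class

def F(theta_x, y):
    """Compatibility F(x, y; W) = theta(x)^T W phi(y)."""
    return theta_x @ W @ phi[y]

def ranking_loss(theta_x, y_true, margin=1.0):
    """Pairwise ranking loss: sum over wrong labels y of
    [Delta(y_true, y) + F(x, y; W) - F(x, y_true; W)]_+ , with Delta = margin."""
    loss = 0.0
    for y in range(n_cls):
        if y != y_true:
            loss += max(0.0, margin + F(theta_x, y) - F(theta_x, y_true))
    return loss

theta_x = rng.normal(size=d_img)        # image feature theta(x)
print(ranking_loss(theta_x, y_true=0) >= 0.0)  # True: hinge terms are non-negative
```

Minimizing this loss over W pushes the correct class to score higher than every other class by the margin.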

slide-18
SLIDE 18

Benchmark Example Datasets

10

slide-19
SLIDE 19

Benchmark Results

Xian et al. CVPR 2017

Method                  CUB (u / s / H)        AWA (u / s / H)
Supervised Learning     –    / 82.1 / –        –    / 96.2 / –
Multimodal Embeddings   23.7 / 62.8 / 34.4     16.8 / 76.1 / 27.5

u/s: per-class accuracy on unseen/seen classes,
acc_{Yu/s} = (1/|Yu/s|) Σ_{c=1}^{|Yu/s|} (# correct in c / # samples in c),
and H = 2 · acc_{Ys} · acc_{Yu} / (acc_{Ys} + acc_{Yu})

11
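The H column follows directly from the two formulas on the slide. A small sketch: `per_class_acc` is a literal reading of the per-class accuracy, and the harmonic mean reproduces the multimodal-embedding row on CUB:

```python
import numpy as np

def per_class_acc(y_true, y_pred, classes):
    """acc_Y = (1/|Y|) * sum over classes c of (# correct in c / # samples in c)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))

def harmonic_mean(acc_s, acc_u):
    """H = 2 * acc_s * acc_u / (acc_s + acc_u): high only when BOTH are high."""
    return 2 * acc_s * acc_u / (acc_s + acc_u)

# The CUB multimodal-embeddings row: u = 23.7, s = 62.8 give H = 34.4.
print(round(harmonic_mean(62.8, 23.7), 1))  # -> 34.4
```

The harmonic mean penalizes the common failure mode where seen-class accuracy is high but unseen-class accuracy collapses.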


slide-21
SLIDE 21

How to Tackle the Missing Data Problem?

Labels are difficult to obtain, attributes require expert knowledge Proposed solution: Free text to image synthesis!

12

slide-22
SLIDE 22

Detailed Visual Descriptions as Side Information

Reed et al. CVPR’16

"The bird has a white underbelly, black feathers in the wings, a large wingspan, and a white beak."
"This bird has distinctive-looking brown and white stripes all over its body, and its brown tail sticks up."
"This swimming bird has a black crown with a large white strip on its head, and yellow eyes."
"This flower has a central white blossom surrounded by large pointed red petals which are veined and leaflike."
"Light purple petals with orange and black middle green leaves."
"This flower is yellow and orange in color, with petals that are ruffled along the edges."

13

slide-23
SLIDE 23

Deep Representations of Text

Reed et al. CVPR’16

Example caption: "The beak is yellow and pointed and the wings are blue."

[Figure: convolutional vs. sequential encoding of the caption]

14

slide-24
SLIDE 24

GAN1 Conditioned on Text

Reed et al. ICML’16 & NIPS’16

[Figure: text-conditional GAN. The caption "This flower has small, round violet petals with a dark purple center" is encoded as φ(t); the Generator Network takes z ~ N(0,1) together with φ(t) and produces x := G(z, φ(t)); the Discriminator Network scores D(x', φ(t)) against the same embedding.]

1Generative Adversarial Networks [Goodfellow et al. NIPS’14]

15
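Conditioning here amounts to concatenating the text embedding φ(t) to the generator's noise input and to the discriminator's input. A toy forward-pass sketch with random linear maps standing in for the real networks; all shapes and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_text, d_img = 8, 16, 32

# Random linear maps standing in for the generator / discriminator networks.
W_g = rng.normal(size=(d_z + d_text, d_img))
W_d = rng.normal(size=(d_img + d_text,))

def G(z, phi_t):
    """Generator: synthesize an image vector from noise z conditioned on phi(t)."""
    return np.tanh(np.concatenate([z, phi_t]) @ W_g)

def D(x, phi_t):
    """Discriminator: score an image vector against the same text embedding."""
    return float(np.concatenate([x, phi_t]) @ W_d)

phi_t = rng.normal(size=d_text)   # text embedding phi(t) of the caption
z = rng.normal(size=d_z)          # z ~ N(0, 1)
x_fake = G(z, phi_t)
print(x_fake.shape)               # (32,)
```

Because the discriminator also sees φ(t), it can reject images that are realistic but do not match the caption.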

slide-25
SLIDE 25

Text to Image Synthesis Results

‘Blue bird with black beak’ → ‘Red bird with black beak’
‘Small blue bird with black wings’ → ‘Small yellow bird with black wings’
‘This bird is bright.’ → ‘This bird is dark.’
‘This bird is completely red with black wings’
‘A small sized bird that has a cream belly and a short pointed bill’
‘This is a yellow bird. The wings are bright blue’

16


slide-27
SLIDE 27

Generalized Zero-Shot Learning with Synthesized Images

CUB                     u      s      H
Only real data          23.7   62.8   34.4
With generated images   23.8   48.5   31.9

This is no better than having no generated images at all!

17


slide-29
SLIDE 29

f-CLSWGAN for Text to Image Feature Synthesis

Xian et al. CVPR’18

[Figure: f-CLSWGAN. Side information (attributes such as "head color: red, back color: black, crown color: red, wing shape: short", or the description "This is a small bird with a brown head and a yellow belly.") conditions the generator G(z, a), z ~ N(0, 1), which synthesizes CNN features for seen and unseen classes directly in the ResNet feature space, alongside real image features extracted by the CNN.]

S = {(x, y, ϕ(y)) | x ∈ X, y ∈ Ys, ϕ(y) ∈ C} and U = {(x̃, y, ϕ(y)) | x̃ = G(z, ϕ(y)), y ∈ Yu, ϕ(y) ∈ C}: combine to train a classifier

18
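The key move in feature synthesis is that generated unseen-class features and real seen-class features are pooled into one training set for an ordinary classifier. A toy sketch with a linear stand-in for the trained generator and a nearest-centroid classifier (the paper trains a softmax classifier; everything here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # toy dimension: feature dim == attribute dim here

# One seen class with real features, one unseen class known only by attributes.
phi_seen, phi_unseen = np.ones(d), -np.ones(d)

def G_feat(z, phi):
    """Toy stand-in for the trained conditional generator: x~ = G(z, phi(y))."""
    return phi + 0.1 * z

# Real seen-class features and synthesized unseen-class features ...
X_seen = np.stack([phi_seen + 0.1 * rng.normal(size=d) for _ in range(50)])
X_unseen = np.stack([G_feat(rng.normal(size=d), phi_unseen) for _ in range(50)])

# ... pooled into one training set for an ordinary (here nearest-centroid) classifier.
centroids = np.stack([X_seen.mean(axis=0), X_unseen.mean(axis=0)])

def classify(x):
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(classify(G_feat(rng.normal(size=d), phi_unseen)))  # -> 1, the unseen class
```

Once the unseen class has synthesized features, generalized zero-shot learning reduces to standard supervised classification.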


slide-31
SLIDE 31

Generalized Zero-Shot Learning with Synthesized Image Features

CUB                                   u      s      H
Only real data                        23.7   62.8   34.4
With generated images                 23.8   48.5   31.9
With generated features (f-CLSWGAN)   43.7   57.7   49.7

19


slide-34
SLIDE 34

CADA-VAE for Text to Latent Feature Synthesis

Schönfeld et al. CVPR’19

[Figure: CADA-VAE. An image VAE (encoder E1, decoder D1) and an attribute VAE (encoder E2, decoder D2) are trained with cross-alignment and distribution-alignment losses; example attributes: red head, pink belly, brown wings, gray beak.]

S = {(z, y, c) | z ∈ Z1, y ∈ Ys, c ∈ C} and U = {(z, y, c) | z ∈ Z2, y ∈ Yu, c ∈ C}: combine to train a classifier (Z1: latents of image features, Z2: latents of class embeddings)

20
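The cross-alignment idea can be sketched with linear stand-ins for the encoders and decoders: each modality is decoded from the OTHER modality's latent code, which pushes E1 and E2 toward a shared latent space. The basic VAE and distribution-alignment terms are omitted, and all weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 8, 4  # illustrative sizes

# Linear stand-ins for the image VAE (E1, D1) and attribute VAE (E2, D2).
E1, D1 = rng.normal(size=(d_x, d_z)), rng.normal(size=(d_z, d_x))
E2, D2 = rng.normal(size=(d_x, d_z)), rng.normal(size=(d_z, d_x))

def cross_reconstruction_loss(x_img, x_attr):
    """Decode each modality from the other modality's latent code; minimizing
    this aligns the two encoders in one shared latent space."""
    loss_img = np.abs(x_img - (x_attr @ E2) @ D1).sum()   # attr latent -> image decoder
    loss_attr = np.abs(x_attr - (x_img @ E1) @ D2).sum()  # image latent -> attr decoder
    return float(loss_img + loss_attr)

x_img, x_attr = rng.normal(size=d_x), rng.normal(size=d_x)
print(cross_reconstruction_loss(x_img, x_attr) >= 0.0)  # True
```

After alignment, latents encoded from unseen-class attributes live in the same space as latents encoded from images, so a classifier trained on both transfers across modalities.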

slide-35
SLIDE 35

Generalized Zero-Shot Learning with Latent Features

CUB                                   u      s      H
Only real data                        23.7   62.8   34.4
With generated images                 23.8   48.5   31.9
With generated features (f-CLSWGAN)   43.7   57.7   49.7
With generated features (CADA-VAE)    63.6   51.6   52.4

21


slide-38
SLIDE 38

f-VAEGAN-D2 for Text to Image Feature Synthesis

Xian et al. CVPR’19

[Figure: f-VAEGAN-D2 on a "Cape May Warbler" example. An Encoder (E) and Decoder/Generator (G) form the f-VAE for seen feature reconstruction; Discriminator1 (D1) drives novel feature generation (f-WGAN); Discriminator2 (D2) adds transductive learning on unlabeled features.]

S = {(xs, y, c(ys)) | xs ∈ X, y ∈ Ys, c(ys) ∈ C} and U = {(x̂u, y, c(yu)) | x̂u = G(z, c(yu)), y ∈ Yu, c(yu) ∈ C}: combine to train a classifier

22

slide-39
SLIDE 39

Generalized Zero-Shot Learning with Synthesized Image Features

CUB                                      u      s      H
Only real data                           23.7   62.8   34.4
With generated images                    23.8   48.5   31.9
With generated features (f-CLSWGAN)      43.7   57.7   49.7
With generated features (CADA-VAE)       63.6   51.6   52.4
With generated features (f-VAEGAN-D2)    63.2   75.6   68.9

23


slide-41
SLIDE 41

Generalized Few-Shot Learning Results

[Figure: CUB, harmonic mean (30 to 65) vs. number of training samples per class (1, 2, 5, 10); curves: CADA-VAE, f-VAEGAN-D2-ind, Softmax]

24


slide-43
SLIDE 43

f-VAEGAN-D2 for Text to Image Feature Synthesis

Xian et al. CVPR’19

[Figure: f-VAEGAN-D2 on a "Cape May Warbler" example, transductive setting. Encoder (E) and Decoder/Generator (G) form the f-VAE (seen feature reconstruction); Discriminator1 (D1) the f-WGAN (novel feature generation); Discriminator2 (D2) performs transductive learning on unlabeled unseen-class features.]

25

slide-44
SLIDE 44

Generalized Zero-Shot Learning with Synthesized Image Features

CUB                                          u      s      H
Only real data                               23.7   62.8   34.4
With generated images                        23.8   48.5   31.9
With generated features (f-CLSWGAN)          43.7   57.7   49.7
With generated features (CADA-VAE)           63.6   51.6   52.4
With generated features (f-VAEGAN-D2)        63.2   75.6   68.9
With generated features (f-VAEGAN-D2 tran)   73.8   81.4   77.3

26

slide-45
SLIDE 45

Generalized Few-Shot Learning Results

[Figure: CUB, harmonic mean (30 to 65) vs. number of training samples per class (1, 2, 5, 10); curves: f-VAEGAN-D2-tran, CADA-VAE, f-VAEGAN-D2-ind, Softmax]

27

slide-46
SLIDE 46

Conclusions

Language complements visual information

  • 1. Provides an intuitive interface for the model
  • 2. Strong and generalizable: any-shot image classification
  • 3. Guides generative models for learning representations

Akata et al. IEEE CVPR 2013, 2015, 2016, TPAMI 2014, 2016; Reed et al. IEEE CVPR 2016 & ICML 2016 & NIPS 2016; Xian et al. IEEE CVPR 2016, 2017, 2018, 2019a, 2019b; Schönfeld et al. IEEE CVPR 2019; Dutta and Akata IEEE CVPR 2019

28

slide-47
SLIDE 47

Outline

Generalized Low-Shot Learning with Side-Information
Generating Natural Language Explanations for Visual Decisions
Summary and Future Work

29

slide-48
SLIDE 48

Human Machine Communication: Visual Question Answering

30


slide-54
SLIDE 54

Human Machine Communication: Visual Question Answering

Q: What type of bird is this?
A: It is a Cardinal because it is a red bird with a red beak and a black face.
Q: Why not a Vermilion Flycatcher?
A: It is not a Vermilion Flycatcher because it does not have black wings.

30


slide-58
SLIDE 58

Grounding Visual Explanations

Hendricks et al. ECCV’16 & ECCV’18

Explanation Sampler

[Figure: the Explanation Sampler proposes candidate explanations, e.g. "This red bird has a red beak and a black face." and "This red bird has a black beak and a black face."; an attribute chunker extracts noun phrases (red bird, red beak / black beak, black face), the Explanation Grounder localizes them in the image, and the Phrase-Critic scores the candidates (e.g. 2.05 vs. 1.02).]

31

slide-59
SLIDE 59

Generating Visual Explanations Results

This is a Downy Woodpecker because...
D (definition): this bird has a white breast, black wings and a red spot on its head.
E (explanation): this is a white bird with a black wing and a black and white striped head.
E (explanation): this is a black and white bird with a red spot on its crown.

Correct & Predicted: Laysan Albatross
Explanation: ...this bird has a white head and breast with a long hooked bill.
Correct: Laysan Albatross, Predicted: Cactus Wren
Explanation: ...this is a brown and white spotted bird with a long pointed beak.

Cactus Wren definition: ...this bird has a long thin beak with a brown body and black spotted feathers.
Laysan Albatross definition: ...this bird has a white head and breast, a grey back and wing feathers, and an orange beak.

32



slide-62
SLIDE 62

Grounding Visual Explanations and Counterfactuals

This is a Red Winged Blackbird because... this is a black bird with a red spot on its wingbars. (Score: -11.29) this is a black bird with a red wing and a pointy black beak.
This is a Red Faced Cormorant because... this is a black bird with a long neck and a red cheek patch. (Score: -10.22) this is a black bird with a red cheek patch and a long white beak.
This is a White Breasted Nuthatch because... this is a white bird with a black crown and a black eye. (Score: -13.20) this bird has a speckled belly and breast with a short pointy bill.

Counterfactuals: contrasting explanations are intuitive and informative.
This bird is a Crested Auklet because this is a black bird with a small orange beak, and it is not a Red Faced Cormorant because it does not have a long flat bill.
This bird is a Parakeet Auklet because this is a black bird with a white belly and small feet, and it is not a Horned Grebe because it does not have red eyes.
This bird is a Least Auklet because this is a black and white spotted bird with a small beak, and it is not a Belted Kingfisher because it does not have a long pointy bill.

33

slide-63
SLIDE 63

Textual Explanations for Self Driving Vehicles

Kim et al. ECCV’18

The car heads down the road because traffic is moving at a steady pace.
The car is slowing because it is approaching a stop sign.
The car is stopped because the car in front of it is stopped.

34

slide-64
SLIDE 64

Modeling Conceptual Understanding

Rodriguez et al. NeurIPS’19

Image reference game between agents with variations in the understanding of the world

[Figure: image reference game; candidate concepts: Round, Red]

35



slide-70
SLIDE 70

Modeling Conceptual Understanding

Rodriguez et al. NeurIPS’19

[Figure: a Speaker describes a target image (e.g. "Red beak") to a color-blind Listener; the Listener answers "It's image ..." and a reward of +1 or -1 is issued. The Speaker maintains an Agent Embedding of the Listener built from attributes such as red beak, cone beak and yellow feet.]

  • Speaker adapts to the listener by incorporating information after each game

36
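The adaptation step can be caricatured as the speaker restricting itself to attributes the listener is known to understand, then picking one that discriminates the target. The images and attribute sets below are a made-up toy instance of the game, not data from the paper:

```python
# Toy image reference game: every image is a set of attributes; the listener
# is color-blind, so the speaker's model of it contains no color attributes.
images = {
    "A": {"red beak", "cone beak"},
    "B": {"red beak", "yellow feet"},
}
listener_known = {"cone beak", "yellow feet"}  # speaker's embedding of the listener

def choose_message(target, images, listener_known):
    """Pick a target attribute the listener understands and no distractor shares."""
    others = set().union(*(attrs for key, attrs in images.items() if key != target))
    for attr in sorted(images[target] & listener_known):
        if attr not in others:
            return attr
    return None  # no discriminative, understandable attribute exists

print(choose_message("A", images, listener_known))  # -> cone beak
```

"Red beak" would discriminate nothing here: the listener cannot see color, and both images have beak attributes in common, so the adapted speaker says "cone beak" instead.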

slide-71
SLIDE 71

Modeling Conceptual Understanding Results

Rodriguez et al. NeurIPS’19

37


slide-75
SLIDE 75

Modeling Conceptual Understanding Qualitative Results

38


slide-78
SLIDE 78

Conclusions

Generating visual/textual explanations

  • 1. A means for model interpretation: necessary to improve deep models
  • 2. Important criteria to trust deep models: through explanations
  • 3. A step towards effective human-machine communication

Hendricks et al. ECCV 2016 & ECCV 2018, Park et al. IEEE CVPR 2018, Kim et al. ECCV 2018, Rodriguez et al. NeurIPS 2019

39

slide-79
SLIDE 79

Outline

Generalized Low-Shot Learning with Side-Information
Generating Natural Language Explanations for Visual Decisions
Summary and Future Work

40


slide-82
SLIDE 82

Summary

  • 1. Multi-modal Joint Embeddings tackle lack of visual data

[Akata et al. CVPR’13, CVPR’15, CVPR’16 & TPAMI’14, TPAMI’16]

  • 2. Vision and Language complement each other for generating novel concepts

[Reed et al. CVPR’16 & ICML’16 & NIPS’16, Xian et al. CVPR’16, CVPR’17, CVPR’18, CVPR’19a & CVPR’19b, Schönfeld et al. CVPR’19, Dutta and Akata CVPR’19]
  • 3. Developing explainable deep models is important for user acceptance

[Hendricks et al. ECCV’16 & ECCV’18, Park et al. CVPR’18, Kim et al. ECCV’18, Rodriguez et al. NeurIPS’19]

41


slide-87
SLIDE 87

Future of Deeply Explainable Artificial Intelligence

Scenario: The car is driving down an empty road. A ball is on the side of the road. The car should slow down. The ball is in the middle of the road. There is a child running towards the ball. The car should stop and not hit the child.

User: What happened?
AI: I was driving down an empty road. I decided to slow down as a ball appeared on the right. I saw a child running towards the ball, so I decided to stop.
User: What would have happened if you did not stop?
AI: If there was an impact, the child would have gotten hurt.

42

slide-88
SLIDE 88

Akata, Z., Perronnin, F., Harchaoui, Z., and Schmid, C. (2014). Good practice in large-scale learning for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Akata, Z., Perronnin, F., Harchaoui, Z., and Schmid, C. (2016). Label-embedding for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Akata, Z., Reed, S., Walter, D., Lee, H., and Schiele, B. (2015). Evaluation of output embeddings for fine-grained image classification. In IEEE Computer Vision and Pattern Recognition (CVPR).
Corona, R., Alaniz, S., and Akata, Z. (2019). Modeling conceptual understanding in image reference games. In Neural Information Processing Systems (NeurIPS).
Hendricks, L.-A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., and Darrell, T. (2016). Generating visual explanations. In European Conference on Computer Vision (ECCV).
Hendricks, L. A., Hu, R., Darrell, T., and Akata, Z. (2018). Grounding visual explanations. In European Conference on Computer Vision (ECCV).
Kim, J., Rohrbach, A., Darrell, T., Canny, J., and Akata, Z. (2018). Textual explanations for self driving vehicles. In European Conference on Computer Vision (ECCV).
Reed, S., Akata, Z., Lee, H., and Schiele, B. (2016a). Learning deep representations of fine-grained visual descriptions. In IEEE Computer Vision and Pattern Recognition (CVPR).

43

slide-89
SLIDE 89

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. (2016b). Generative adversarial text to image synthesis. In International Conference on Machine Learning (ICML).
Schoenfeld, E., Ebrahimi, S., Sinha, S., Darrell, T., and Akata, Z. (2019). Generalized zero- and few-shot learning via aligned variational autoencoders. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Xian, Y., Lampert, C., Schiele, B., and Akata, Z. (2018a). Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Xian, Y., Lorenz, T., Schiele, B., and Akata, Z. (2018b). Feature generating networks for zero-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Xian, Y., Sharma, S., Schiele, B., and Akata, Z. (2019). f-VAEGAN-D2: A feature generating framework for any-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

44

slide-90
SLIDE 90

Thank you!

45