Sharing is Caring in the Land of The Long Tail
Samy Bengio
2
Real life setting
“Real problems rarely come packaged as 1M images uniformly belonging to a set of 1000 classes…”
3
The long tail
A well-known phenomenon where a small number of generic objects/entities/words appear very often and most others appear more rarely.
engines,
documents,
4
[Figure: Frequency of words in Wikipedia. Axes: Words (examples marked: the, anyways, trickiest, h-plane) against Frequency on a log scale from 10 to 1e+09.]
5
follows a long tail distribution?
classes to be well trained.
we have seen only once or even never?
models share/learn a joint representation.
from semantically similar but richer classes.
6
and labels
In this talk, I will cover the following ideas:
I will NOT cover the following important issues:
7
[Figure: labels (Obama, Eiffel Tower, Shark, Dolphin, Lion, ...) placed in a 100-dimensional embedding space.]
Learn to embed images & labels to optimize top-ranked items.
Wsabie: J. Weston et al, ECML 2010, IJCAI 2011
8
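Label i (one-hot input, e.g. 000100) is embedded by W; image x (real values) is embedded by V.
$\mathrm{sim}(i, x) = \langle W_i, V x \rangle$
Triplet loss: $\mathrm{sim}(\text{dolphin}, x) > \mathrm{sim}(\text{obama}, x) + 1$ for an image x of a dolphin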
Trained by stochastic gradient descent and smart sampling of negative examples
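The training recipe on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the Wsabie implementation: the function names (`triplet_step`, `embed_image`), the learning rate, and the toy data are all assumptions, and the negative sampling here simply draws random labels until one violates the margin, which is a simplified stand-in for the "smart sampling" the slide mentions.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def embed_image(V, x):
    # V is a list of rows (d x n); returns the d-dimensional vector V x.
    return [dot(row, x) for row in V]

def sim(W, V, label, x):
    # sim(i, x) = <W_i, V x>
    return dot(W[label], embed_image(V, x))

def triplet_step(W, V, x, pos, lr=0.1, margin=1.0, rng=random):
    """One SGD step on max(0, margin - sim(pos, x) + sim(neg, x)).

    Returns the hinge loss before the update (0.0 if no violating
    negative label was sampled). W and V are updated in place.
    """
    num_labels = len(W)
    z = embed_image(V, x)
    s_pos = dot(W[pos], z)
    neg = None
    # Assumed sampling scheme: try random labels until one violates the margin.
    for _ in range(num_labels):
        cand = rng.randrange(num_labels)
        if cand != pos and margin - s_pos + dot(W[cand], z) > 0:
            neg = cand
            break
    if neg is None:
        return 0.0
    loss = margin - s_pos + dot(W[neg], z)
    # Gradients: dL/dW_pos = -z, dL/dW_neg = +z, dL/dV = outer(W_neg - W_pos, x).
    diff = [wn - wp for wn, wp in zip(W[neg], W[pos])]
    W[pos] = [w + lr * zk for w, zk in zip(W[pos], z)]
    W[neg] = [w - lr * zk for w, zk in zip(W[neg], z)]
    for k in range(len(V)):
        V[k] = [v - lr * diff[k] * xj for v, xj in zip(V[k], x)]
    return loss

# Toy run (hypothetical data): two labels, one image whose true label is 0.
rng = random.Random(0)
W = [[0.0, 0.0], [0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]
for _ in range(200):
    triplet_step(W, V, x, pos=0, rng=rng)
gap = sim(W, V, 0, x) - sim(W, V, 1, x)
```

After the loop, `gap` should be at least the margin, meaning the correct label outranks the other one by 1 or more for this image.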
9
Method             ImageNet 2010          Web
                   prec@1    prec@10      prec@1    prec@10
approx kNN         1.55%     0.41%        0.30%     0.34%
One-vs-Rest        2.27%     1.02%        0.52%     0.29%
Wsabie             4.03%     1.48%        1.03%     0.44%
Ensemble Wsabies   10.03%    3.02%
ImageNet 2010: 16000 labels and 4M images
Web: 109000 labels and 16M images
10
Label: Nearest Neighbors
barack obama: barak obama, obama, barack, barrack obama, bow wow
david beckham: beckham, david beckam, alessandro del piero, del piero
santa: santa claus, papa noel, pere noel, santa clause, joyeux noel
dolphin: delphin, dauphin, whale, delfin, delfini, baleine, blue whale
cows: cattle, shire, dairy cows, kuh, horse, cow, shire horse, kone
rose: rosen, hibiscus, rose flower, rosa, roze, pink rose, red rose
eiffel tower: eiffel, tour eiffel, la tour eiffel, big ben, paris, blue mosque
ipod: i pod, ipod nano, apple ipod, ipod apple, new ipod
f18: f 18, eurofighter, f14, fighter jet, tomcat, mig 21, f 16
11
delfini, orca, dolphin, mar, delfin, dauphin, whale, cancun, killer whale, sea world
blue whale, whale shark, great white shark, underwater, white shark, shark, manta ray, dolphin, requin, blue shark, diving
barrack obama, barak obama, barack hussein
eiffel, paris by night, la tour eiffel, tour eiffel, eiffel tower, las vegas strip, eifel, tokyo tower, eifel tower
Learn dense embedding vectors from an unannotated text corpus, e.g. Wikipedia
http://code.google.com/p/word2vec Tomas Mikolov, Kai Chen, Greg Corrado, Jeff Dean (ICLR 2013)
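[Figure: skip-gram training on the sentence “an exceptionally large male tiger shark can grow up to”, with matrices labeled E and W; example vocabulary entries: bull shark, tuna, Obama, tiger shark, wing chair.]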
13
tiger shark: bull shark, blacktip shark, shark, sandbar shark, dusky shark, blue shark, requiem shark, great white shark, lemon shark
car: cars, muscle car, sports car, compact car, autocar, automobile, pickup truck, racing car, passenger car, dealership
[Figure: t-SNE visualization of ImageNet labels, with clusters labeled transportation, dogs, birds, musical instruments, aquatic life, insects, animals, clothing, food, reptiles.]
14
15
[Figure: word-vector relationships between the pairs Berlin / Germany and Rome / Italy, and between big / bigger and hot / hotter.]
17
[Figure: deep network diagram with Input, Layer 1, ..., Layer 7.]
But what about the long tail of classes? What about using our semantic embeddings for that?
18
p(Lion|x) p(Apple|x) p(Orange|x) p(Tiger|x) p(Bear|x)
19
$f(x) = p(\text{Lion} \mid x)\, s(\text{Lion}) + p(\text{Apple} \mid x)\, s(\text{Apple}) + p(\text{Orange} \mid x)\, s(\text{Orange}) + p(\text{Tiger} \mid x)\, s(\text{Tiger}) + p(\text{Bear} \mid x)\, s(\text{Bear})$
$s(y)$ = embedding position of y, from Skip-Gram for instance.
Do a nearest neighbor search around f(x) to find the corresponding label.
20
In practice, consider the average of only a few labels:
$\mathrm{top}(T) = \{\, i \mid p(y_i \mid x) \text{ is among the top } T \text{ probabilities} \,\}$
$f(x) = \frac{1}{Z} \sum_{i \in \mathrm{top}(T)} p(y_i \mid x)\, s(y_i)$
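As a rough illustration of the top-T average, the sketch below combines the embeddings of the T most probable training labels and then looks for the nearest candidate label. Everything concrete here is an assumption made for the example: the function names, the 2-dimensional toy vectors, the use of cosine similarity for the nearest neighbor search, and the choice of Z as the sum of the top-T probabilities.

```python
import math

def conse_embed(probs, label_vecs, top_t):
    """f(x) = (1/Z) * sum_{i in top(T)} p(y_i|x) * s(y_i)."""
    top = sorted(probs, key=probs.get, reverse=True)[:top_t]
    z = sum(probs[y] for y in top)  # assumed normalizer: sum of the top-T probabilities
    dim = len(label_vecs[top[0]])
    f = [0.0] * dim
    for y in top:
        for k in range(dim):
            f[k] += probs[y] * label_vecs[y][k] / z
    return f

def cosine(a, b):
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / (na * nb)

def nearest_label(f, candidate_vecs):
    # Nearest neighbor search around f(x) among candidate label embeddings.
    return max(candidate_vecs, key=lambda y: cosine(f, candidate_vecs[y]))

# Hypothetical classifier output and toy 2-D label embeddings.
probs = {"lion": 0.5, "tiger": 0.4, "apple": 0.05, "orange": 0.03, "bear": 0.02}
seen = {"lion": [1.0, 0.0], "tiger": [0.8, 0.6], "apple": [-1.0, 0.1],
        "orange": [-0.9, 0.3], "bear": [0.2, -1.0]}
unseen = {"liger": [0.9, 0.3], "grapefruit": [-1.0, 0.2]}

f = conse_embed(probs, seen, top_t=2)
prediction = nearest_label(f, unseen)
```

With these toy numbers the combined point lies between "lion" and "tiger", so the unseen label "liger" is the nearest candidate even though the classifier never saw it.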
Training 2-hops 3-hops
ILSVRC 2012 images from 1,000 classes
classes.
22
23
24
Softmax Logistic GoogLeNet model
25
[Figure: label relations among Corgi, Puppy, Dog, and Cat: exclusion, hierarchical, and overlap.]
26
[Figure: a visual model assigns scores 0.9, 0.8, 0.9, 0.1 to Corgi, Puppy, Dog, Cat; joint inference combines them with a knowledge graph of hierarchical and exclusion relations over the same labels.]
Hierarchy and Exclusion (HEX) Graph [Deng et al, ECCV 2014]
27
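Input scores: $x \in \mathbb{R}^n$
Binary label vector: $y \in \{0,1\}^n$
$P(y \mid x) = \frac{1}{Z(x)} \prod_i \phi_i(x_i, y_i) \prod_{i,j} \psi_{i,j}(y_i, y_j)$
All illegal configurations have probability zero.
Unary (same as logistic regression): $\phi_i(x_i, y_i) = \begin{cases} \mathrm{sigmoid}(x_i) & \text{if } y_i = 1 \\ 1 - \mathrm{sigmoid}(x_i) & \text{if } y_i = 0 \end{cases}$
Pairwise (set illegal configurations to zero): $\psi_{i,j}(y_i, y_j) = \begin{cases} 0 & \text{if } (y_i, y_j) \text{ violates constraints} \\ 1 & \text{otherwise} \end{cases}$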
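To make the unary and pairwise potentials concrete, here is a small brute-force sketch that enumerates every binary label vector, drops the illegal ones, and normalizes what remains. It is only meant for a handful of labels, since enumeration is exponential in n. The function names, the way constraints are encoded as index pairs, and the input scores are assumptions for the example, not taken from the HEX paper.

```python
import math
from itertools import product

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def is_legal(y, hierarchy, exclusion):
    # hierarchy: (parent, child) index pairs; a child cannot be on with its parent off.
    for parent, child in hierarchy:
        if y[child] == 1 and y[parent] == 0:
            return False
    # exclusion: (a, b) index pairs; the two labels cannot both be on.
    for a, b in exclusion:
        if y[a] == 1 and y[b] == 1:
            return False
    return True

def hex_distribution(x, hierarchy, exclusion):
    """Return {y: P(y|x)} over all legal binary label vectors y."""
    scores = {}
    for y in product((0, 1), repeat=len(x)):
        if not is_legal(y, hierarchy, exclusion):
            continue  # pairwise potential is 0, so the configuration has probability zero
        score = 1.0
        for xi, yi in zip(x, y):
            s = sigmoid(xi)
            score *= s if yi == 1 else 1.0 - s  # unary potential
        scores[y] = score
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def marginal(dist, i):
    return sum(p for y, p in dist.items() if y[i] == 1)

# Hypothetical example. Labels: 0 = Dog, 1 = Corgi, 2 = Puppy, 3 = Cat.
hierarchy = [(0, 1), (0, 2)]   # Corgi and Puppy are kinds of Dog
exclusion = [(0, 3)]           # Dog and Cat exclude each other
dist = hex_distribution([2.0, 1.5, 0.5, -2.0], hierarchy, exclusion)
p_dog, p_corgi = marginal(dist, 0), marginal(dist, 1)
```

Because every legal configuration with Corgi on also has Dog on, the marginal for Dog can never fall below the marginal for Corgi, which is the kind of consistency independent logistic outputs do not guarantee.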
28
[Figure: the label hierarchy (Animal, Dog, Corgi, Husky, ...) under three labelings: original ILSVRC 2012 (leaf labels), training (“weakened” labels, obtained by relabeling), and test.]
29
Top 1 accuracy (top 5 accuracy)
30
involving multi-word descriptions, or captions?
31
sitting on top of a stove top
sitting on top
top of a stove.
[Figure: Vision (deep CNN) followed by Language (generating RNN).]
A group of people shopping at an
There are many vegetables at the fruit stand.
[Vinyals et al, CVPR 2015]
32
$\log p(S \mid I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1})$
$\theta^\star = \arg\max_\theta \sum_{(I,S)} \log p(S \mid I; \theta)$
given the image:
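The chain-rule decomposition of the caption log-probability is just a sum of per-step log-probabilities. The sketch below shows the arithmetic with a hand-written table standing in for the model: `step_probs` and its numbers are hypothetical, and a real system would obtain each distribution from the RNN conditioned on the image and the previous words.

```python
import math

def caption_log_prob(step_probs, caption):
    """log p(S|I) = sum_t log p(S_t | I, S_0, ..., S_{t-1}).

    step_probs[t] maps each word to its probability at step t, already
    conditioned on the image and on the words before position t.
    """
    return sum(math.log(step_probs[t][word]) for t, word in enumerate(caption))

# Hypothetical per-step distributions for one image.
step_probs = [
    {"a": 0.5, "the": 0.5},
    {"dog": 0.25, "cat": 0.75},
    {"<end>": 1.0},
]
log_p = caption_log_prob(step_probs, ["a", "dog", "<end>"])
```

Here `log_p` equals log(0.5) + log(0.25) + log(1.0) = log(0.125); training would adjust the model parameters to raise this quantity summed over all (image, caption) pairs.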
33
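[Figure: a convolutional neural net produces an image embedding that is fed to a recurrent neural net; the recurrent net takes word 1, ..., word N as input and outputs P(word 1), P(word 2), ..., P(<end>).]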
Two dogs play in the grass.
Two hockey players are fighting over the puck.
A skateboarder does a trick
A little girl in a pink hat is blowing bubbles.
A herd of elephants walking across a dry grass field.
A group of young people playing a game of frisbee.
A close up of a cat laying
A red motorcycle parked on the side of the road.
A dog is jumping to catch a frisbee.
A yellow school bus parked in a parking lot.
A person riding a motorcycle on a dirt road.
A refrigerator filled with lots of food and drinks.
Describes without errors
Describes with minor errors
Somewhat related to the image
Unrelated to the image
34
Human: A blue and black dress ... No! I see white and gold!
Our model: A close up of a vase with flowers.
36
37
38
[Figure: exponential decay, inverse sigmoid decay, and linear decay curves; axis ticks from 0.1 to 1 and from 200 to 1000.]
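The three decay curves named in the plot can be written as simple functions of the training step i. This sketch assumes they are the usual scheduled-sampling schedules, where the returned value is the probability of feeding the ground-truth previous word instead of the model's own prediction. The parameter values (`k`, `c`, `eps_min`) are illustrative guesses, not the ones used to draw the plot.

```python
import math

def linear_decay(i, k=1.0, c=0.001, eps_min=0.0):
    # eps_i = max(eps_min, k - c * i)
    return max(eps_min, k - c * i)

def exponential_decay(i, k=0.995):
    # eps_i = k ** i, with k < 1
    return k ** i

def inverse_sigmoid_decay(i, k=100.0):
    # eps_i = k / (k + exp(i / k)), with k >= 1
    return k / (k + math.exp(i / k))

# Sample each schedule every 100 steps over the first 1000 steps.
steps = range(0, 1001, 100)
curves = {
    "linear": [linear_decay(i) for i in steps],
    "exponential": [exponential_decay(i) for i in steps],
    "inverse sigmoid": [inverse_sigmoid_decay(i) for i in steps],
}
```

All three start near 1 and fall over the 1000 steps; the inverse sigmoid stays high longest before dropping, while the exponential falls fastest at the start.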
39
interesting tasks.
generalize thanks to “rich” classes.
represent classes with zero training examples.
text and images, but also for complete sentences.
characters; good for long tail words.
41
New one year immersion program in deep learning research
resources