Sharing is Caring in the Land of The Long Tail
Samy Bengio


SLIDE 1

Sharing is Caring in the Land of The Long Tail

Samy Bengio

SLIDE 2

Real life setting

“Real problems rarely come packaged as 1M images uniformly belonging to a set of 1000 classes…”

SLIDE 3

The long tail

  • Well-known phenomenon where a small number of generic objects/entities/words appear very often and most others appear more rarely.
  • Also known as Zipf's law, power law, or Pareto distribution.
  • The web is littered with this kind of distribution:
    • the frequency of each unique query on search engines,
    • the occurrences of each unique word in text documents,
    • etc.
SLIDE 4

Example of a long tail

[Log-log plot of word frequencies in Wikipedia: frequent words such as “the” at the head; rare words such as “anyways”, “trickiest”, and “h-plane” in the tail.]

SLIDE 5

Representation sharing

  • How do we design a classifier or a ranker when data follows a long tail distribution?
  • If we train one model per class, it is hard for poor classes to be well trained.
  • How come we humans are able to recognize objects we have seen only once or even never?
  • Most likely answer: representation sharing: all class models share/learn a joint representation.
  • Poor classes can then benefit from knowledge learned from semantically similar but richer classes.
  • Extreme case: zero-shot setting!
SLIDE 6

Outline

In this talk, I will cover the following ideas:

  • Wsabie: a joint embedding space of images and labels
  • The many facets of text embeddings
  • Zero-shot setting through embeddings
  • Incorporate Knowledge Graph constraints
  • Use of a language model

I will NOT cover the following important issues:

  • Prediction time issues for extreme classification
  • Memory issues
SLIDE 7

Wsabie

[Diagram: a 100-dimensional embedding space containing both images and labels such as Obama, Eiffel Tower, Shark, Dolphin, Lion, ...]

Learn to embed images & labels to optimize top-ranked items.

Wsabie: J. Weston et al., ECML 2010, IJCAI 2011

SLIDE 8

Wsabie: summary

Label i (a one-hot vector, e.g. 000100) is embedded via W; image x (real values) is embedded via V; similarity is sim(i, x) = <W_i, V_x>.

Triplet loss: sim([dolphin image], dolphin) > sim([dolphin image], obama) + 1

Trained by stochastic gradient descent and smart sampling of negative examples.
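The triplet objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the WARP-weighted implementation from the paper: the dimensions, random initialization, and variable names (`V`, `W`, `sim`, `triplet_loss`) are chosen for clarity only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the 100-d space from the slides).
d_img, d_emb, n_labels = 8, 4, 5

V = rng.normal(size=(d_emb, d_img))    # image embedding matrix
W = rng.normal(size=(n_labels, d_emb)) # one embedding row per label

def sim(label, x):
    """sim(i, x) = <W_i, V_x> as on the slide."""
    return W[label] @ (V @ x)

def triplet_loss(x, pos, neg, margin=1.0):
    """Hinge loss: pushes sim(pos, x) above sim(neg, x) + margin."""
    return max(0.0, margin - sim(pos, x) + sim(neg, x))

x = rng.normal(size=d_img)
loss = triplet_loss(x, pos=0, neg=1)  # non-negative by construction
```

In training, `neg` would be drawn by the slide's "smart sampling" of negative labels, and gradients of this loss would update both `W` and `V`.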

SLIDE 9

Wsabie: experiments - results

Method                  ImageNet 2010          Web
                        prec@1    prec@10      prec@1    prec@10
approx kNN              1.55%     0.41%        0.30%     0.34%
One-vs-Rest             2.27%     1.02%        0.52%     0.29%
Wsabie                  4.03%     1.48%        1.03%     0.44%
Ensemble of 10 Wsabies  10.03%    3.02%

ImageNet 2010: 16,000 labels and 4M images. Web: 109,000 labels and 16M images.

SLIDE 10

Wsabie: embeddings

Label          Nearest Neighbors
barack obama   barak obama, obama, barack, barrack obama, bow wow
david beckham  beckham, david beckam, alessandro del piero, del piero
santa          santa claus, papa noel, pere noel, santa clause, joyeux noel
dolphin        delphin, dauphin, whale, delfin, delfini, baleine, blue whale
cows           cattle, shire, dairy cows, kuh, horse, cow, shire horse, kone
rose           rosen, hibiscus, rose flower, rosa, roze, pink rose, red rose
eiffel tower   eiffel, tour eiffel, la tour eiffel, big ben, paris, blue mosque
ipod           i pod, ipod nano, apple ipod, ipod apple, new ipod
f18            f 18, eurofighter, f14, fighter jet, tomcat, mig 21, f 16

SLIDE 11

Wsabie: annotations

delfini, orca, dolphin, mar, delfin, dauphin, whale, cancun, killer whale, sea world

blue whale, whale shark, great white shark, underwater, white shark, shark, manta ray, dolphin, requin, blue shark, diving

barrack obama, barak obama, barack hussein obama, barack obama, james marsden, jay z, obama, nelly, falco, barack

eiffel, paris by night, la tour eiffel, tour eiffel, eiffel tower, las vegas strip, eifel, tokyo tower, eifel tower

SLIDE 12

“Why not an embedding of text only?”
SLIDE 13

Skip-Gram (Word2Vec)

Learn dense embedding vectors from an unannotated text corpus, e.g. Wikipedia.

[Diagram: from the sentence “an exceptionally large male tiger shark can grow up to”, words are embedded (E) and used to predict nearby words (W); candidate terms shown include bull shark, tuna, tiger shark, Obama, wing chair.]

http://code.google.com/p/word2vec Tomas Mikolov, Kai Chen, Greg Corrado, Jeff Dean (ICLR 2013)
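The skip-gram objective trains word embeddings to predict the words surrounding each center word. A minimal sketch of how (center, context) training pairs are generated from the slide's example sentence, with a hypothetical `skipgram_pairs` helper:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs for the skip-gram objective."""
    pairs = []
    for i, center in enumerate(tokens):
        # Every word within `window` positions of the center is a context word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "an exceptionally large male tiger shark can grow up to".split()
pairs = skipgram_pairs(sentence, window=2)
```

The actual word2vec tool then learns embeddings so that these pairs get high predicted probability (via hierarchical softmax or negative sampling).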

SLIDE 14

Skip-Gram Wikipedia

[t-SNE visualization of ImageNet labels; skip-gram trained on Wikipedia, 155K terms. Example clusters: sharks (tiger shark, bull shark, blacktip shark, shark, oceanic whitetip shark, sandbar shark, dusky shark, blue shark, requiem shark, great white shark, lemon shark) and cars (car, cars, muscle car, sports car, compact car, autocar, automobile, pickup truck, racing car, passenger car, dealership). Broader regions: transportation, dogs, birds, musical instruments, aquatic life, insects, animals, clothing, food, reptiles.]

SLIDE 15

Embeddings are powerful

E(Rome) - E(Italy) + E(Germany) ≈ E(Berlin)
E(hotter) - E(hot) + E(big) ≈ E(bigger)

[Diagram: word pairs Berlin-Germany, Rome-Italy and bigger-big, hotter-hot related by parallel vector offsets.]
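The analogy arithmetic above can be illustrated with hand-crafted toy vectors. Real skip-gram embeddings are learned from text; the two dimensions here (roughly "is-capital" and "country id") are purely an assumption for the sake of the example.

```python
import numpy as np

# Toy vectors, NOT learned embeddings: axis 0 ~ "is a capital city",
# axis 1 ~ an arbitrary country identifier.
E = {
    "Italy":   np.array([0.0, 1.0]),
    "Rome":    np.array([1.0, 1.0]),
    "Germany": np.array([0.0, 2.0]),
    "Berlin":  np.array([1.0, 2.0]),
    "France":  np.array([0.0, 3.0]),
    "Paris":   np.array([1.0, 3.0]),
}

# E(Rome) - E(Italy) + E(Germany) should land near E(Berlin).
query = E["Rome"] - E["Italy"] + E["Germany"]

# Nearest neighbour search, excluding the three input words.
candidates = [w for w in E if w not in ("Rome", "Italy", "Germany")]
answer = min(candidates, key=lambda w: np.linalg.norm(E[w] - query))
```

With learned embeddings the same nearest-neighbour query recovers analogies like hot : hotter :: big : bigger.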

SLIDE 16

Let’s go back to images!

SLIDE 17

Deep convolutional models for images

[Diagram: a deep convolutional network mapping the input through Layer 1 ... Layer 7.]

But what about the long tail of classes? What about using our semantic embeddings for that?

SLIDE 18

ConSE: Convex Combination of Semantic Embeddings [Norouzi et al, ICLR’2014]

[Diagram: a classifier producing p(Lion|x), p(Apple|x), p(Orange|x), p(Tiger|x), p(Bear|x) for an input image x.]

SLIDE 19

ConSE: Convex Combination of Semantic Embeddings

f(x) = Σ_i p(y_i|x) s(y_i)

where s(y) is the embedding position of y, from Skip-Gram for instance:

f(x) = p(Lion|x)s(Lion) + p(Apple|x)s(Apple) + p(Orange|x)s(Orange) + p(Tiger|x)s(Tiger) + p(Bear|x)s(Bear)

Do a nearest neighbor search around f(x) to find the corresponding label.

SLIDE 20

ConSE(T): Convex Combination of Semantic Embeddings

In practice, consider the average of only a few labels:

top(T) = {i | p(y_i|x) is among the top T probabilities}

f(x) = (1/Z) Σ_{i ∈ top(T)} p(y_i|x) s(y_i)
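The ConSE(T) combination and the nearest-neighbour prediction step fit in a few lines. This is a sketch with random stand-ins: `s` plays the role of skip-gram label embeddings and `p` the role of classifier outputs; the sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, d = 6, 4

s = rng.normal(size=(n_classes, d))  # s(y_i): label embeddings (e.g. skip-gram)
p = rng.random(n_classes)
p /= p.sum()                         # p(y_i|x): classifier output probabilities

def conse(p, s, T):
    """ConSE(T): convex combination of the top-T label embeddings."""
    top = np.argsort(p)[-T:]         # indices of the T largest probabilities
    # 1/Z renormalises over the kept labels.
    return (p[top, None] * s[top]).sum(axis=0) / p[top].sum()

f_x = conse(p, s, T=3)

# Zero-shot step: the predicted label is the nearest embedding to f(x),
# searched over a label set that may include classes never seen in training.
pred = int(np.argmin(np.linalg.norm(s - f_x, axis=1)))
```

With T equal to the number of classes this reduces to the full convex combination from the previous slide.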

SLIDE 21

ConSE(T): experiments on ImageNet

  • Model trained with 1.2M ILSVRC 2012 images from 1,000 classes.
  • Evaluated on images from the same classes.
  • Results are measured as hit@k.

[Figure legend: Training, 2-hops, 3-hops label sets.]
SLIDE 22

ConSE(T): experiments

[Results figure.]

SLIDE 23

Knowledge Graph

SLIDE 24

Multiclass Classifiers

[Diagram: a GoogLeNet model with a softmax or logistic output layer.]

SLIDE 25

Object labels have rich relations

[Diagram: labels Dog, Cat, Corgi, Puppy with Exclusion (Dog vs. Cat), Hierarchical (Corgi is a Dog), and Overlap (Puppy overlaps the others) relations.]

SLIDE 26

Visual Model + Knowledge Graph

[Diagram: a visual model outputs scores, e.g. Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1; a knowledge graph encodes Exclusion and Hierarchical relations among Corgi, Puppy, Dog, Cat; joint inference combines both.]

Hierarchy and Exclusion (HEX) Graph [Deng et al, ECCV 2014]

SLIDE 27

HEX Classification Model

Input scores x ∈ R^n, binary label vector y ∈ {0,1}^n.

Pr(y|x) = (1/Z(x)) Π_i φ_i(x_i, y_i) Π_{i,j} ψ_{i,j}(y_i, y_j)

Unary (same as logistic regression):

φ_i(x_i, y_i) = sigmoid(x_i) if y_i = 1, and 1 - sigmoid(x_i) if y_i = 0

Pairwise (set illegal configurations to zero):

ψ_{i,j}(y_i, y_j) = 0 if (y_i, y_j) violates the constraints, 1 otherwise

All illegal configurations have probability zero.
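For a handful of labels the HEX distribution can be computed by brute force, which makes the model easy to see. A sketch using the slide's Dog/Cat/Corgi example; the edge lists and function names are illustrative, and real HEX inference uses efficient message passing rather than enumeration.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

labels = ["dog", "cat", "corgi"]
subsumption = [("corgi", "dog")]  # hierarchy: corgi implies dog
exclusion = [("dog", "cat")]      # exclusion: not both dog and cat

def legal(y):
    """y maps label -> 0/1; illegal configurations get probability zero."""
    for child, parent in subsumption:
        if y[child] == 1 and y[parent] == 0:
            return False
    for a, b in exclusion:
        if y[a] == 1 and y[b] == 1:
            return False
    return True

def hex_probs(x):
    """Brute-force Pr(y|x) over all legal binary label vectors."""
    scores = {}
    for bits in itertools.product([0, 1], repeat=len(labels)):
        y = dict(zip(labels, bits))
        if not legal(y):
            continue
        # Unary factors: sigmoid(x_i) if y_i = 1, else 1 - sigmoid(x_i).
        phi = 1.0
        for l in labels:
            phi *= sigmoid(x[l]) if y[l] else 1.0 - sigmoid(x[l])
        scores[bits] = phi
    Z = sum(scores.values())  # Z(x) sums over legal configurations only
    return {y: v / Z for y, v in scores.items()}

probs = hex_probs({"dog": 0.9, "cat": 0.1, "corgi": 0.8})
```

Configurations such as (dog=1, cat=1) or (corgi=1, dog=0) simply never appear in the normalised distribution.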

SLIDE 28

Exp: Learning with weak labels

  • ILSVRC 2012: “relabel” or “weaken” a portion of fine-grained leaf labels to basic level labels.
  • Evaluate on fine-grained recognition.

[Diagram: original ILSVRC 2012 leaf labels (e.g. Husky, Corgi, under Dog, under Animal) are relabeled to basic-level labels such as Dog for training; the test set keeps the leaf labels.]

SLIDE 29

Exp: Learning with weak labels

  • ILSVRC 2012: “relabel” or “weaken” a portion of fine-grained leaf labels to basic level labels.
  • Evaluate on fine-grained recognition.
  • Consistently outperforms baselines.

[Results table: top-1 accuracy (top-5 accuracy).]

SLIDE 30

What about textual descriptions?

  • We have considered the long tail of objects.
  • What about more complex descriptions, involving multi-word descriptions, or captions?
  • We can use language models to help.
SLIDE 31

Neural Image Caption Generator

Vision (Deep CNN) → Language (Generating RNN)

Example outputs:

  1. Two pizzas sitting on top of a stove top oven.
  2. A pizza sitting on top of a pan on top of a stove.

A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.

[Vinyals et al, CVPR 2015]

SLIDE 32

NIC: objective

  • Let I be an image (pixels).
  • Let S be the corresponding sentence (sequence of words).
  • Likelihood of producing the right sentence given the image:

log p(S|I) = Σ_{t=0}^{N} log p(S_t | I, S_0, ..., S_{t-1})

  • We maximize the likelihood of producing the right sentence given the image:

θ* = arg max_θ Σ_{(I,S)} log p(S|I; θ)
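The chain-rule decomposition above is just a sum of per-word log probabilities. A sketch of computing log p(S|I) for one caption, where `fake_decoder_step` is a stand-in for the real image-conditioned RNN decoder (it returns an arbitrary distribution, since the model itself is not part of this example):

```python
import numpy as np

vocab = ["<start>", "two", "pizzas", "on", "a", "stove", "<end>"]
caption = ["two", "pizzas", "on", "a", "stove", "<end>"]

rng = np.random.default_rng(0)

def fake_decoder_step(prefix):
    """Stand-in for p(S_t | I, S_0..S_{t-1}); ignores the prefix here."""
    logits = rng.normal(size=len(vocab))
    return np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary

log_p = 0.0
prefix = ["<start>"]
for word in caption:
    p = fake_decoder_step(prefix)
    log_p += np.log(p[vocab.index(word)])  # add log p(S_t | I, S_0..S_{t-1})
    prefix.append(word)
# log_p is the caption log-likelihood log p(S|I) maximised during training.
```

Training sums this quantity over all (I, S) pairs and follows its gradient with respect to the CNN and RNN parameters.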

SLIDE 33

NIC: model

[Diagram: a Convolutional Neural Net embeds the image; a Recurrent Neural Net then emits P(word 1), P(word 2), ..., P(<end>), fed with word 1 ... word N.]

SLIDE 34

Examples

[Figure: sample captions rated from “Describes without errors” to “Describes with minor errors”, “Somewhat related to the image”, and “Unrelated to the image”:]

  • Two dogs play in the grass.
  • Two hockey players are fighting over the puck.
  • A skateboarder does a trick on a ramp.
  • A little girl in a pink hat is blowing bubbles.
  • A herd of elephants walking across a dry grass field.
  • A group of young people playing a game of frisbee.
  • A close up of a cat laying on a couch.
  • A red motorcycle parked on the side of the road.
  • A dog is jumping to catch a frisbee.
  • A yellow school bus parked in a parking lot.
  • A person riding a motorcycle on a dirt road.
  • A refrigerator filled with lots of food and drinks.

SLIDE 35

It doesn’t always work…

Human: A blue and black dress ... No! I see white and gold!
Our model: A close up of a vase with flowers.

SLIDE 36

Scheduled Sampling [NIPS 2015]

SLIDE 37

Scheduled Sampling

SLIDE 38

Scheduled Sampling

[Plot: sampling probability (0.1 to 1.0) vs. training step (200 to 1000) for the exponential decay, inverse sigmoid decay, and linear decay schedules.]
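The three schedules in the plot set ε_i, the probability of feeding the true previous token (rather than the model's own sample) at training step i. A sketch with illustrative constants; the specific values of k, c, and ε_0 are assumptions, tuned per task in practice.

```python
import math

def exponential_decay(i, k=0.995):
    """epsilon_i = k^i, with k < 1."""
    return k ** i

def inverse_sigmoid_decay(i, k=100.0):
    """epsilon_i = k / (k + exp(i / k)); stays near 1 early, then drops."""
    return k / (k + math.exp(i / k))

def linear_decay(i, eps0=1.0, c=0.001, eps_min=0.0):
    """epsilon_i = max(eps_min, eps0 - c*i)."""
    return max(eps_min, eps0 - c * i)

# All three start near 1 (mostly teacher forcing) and decay toward 0
# (mostly sampling from the model), matching the curves on the slide.
for i in (0, 500, 1000):
    for schedule in (exponential_decay, inverse_sigmoid_decay, linear_decay):
        assert 0.0 <= schedule(i) <= 1.0
```

At each decoding step during training, a coin flip with probability ε_i decides whether the ground-truth or the sampled previous word is fed to the RNN.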

SLIDE 39

Conclusions

  • The long tail problem happens in most of the interesting tasks.
  • Sharing approaches can help “poor” classes to generalize thanks to “rich” classes.
  • At the extreme, semantic embeddings can represent classes with zero training examples.
  • Sharing approaches are interesting not only for text and images, but also for complete sentences.
  • Other recent approaches: represent text using characters; good for long tail words.
  • … but never underestimate the long tail …
SLIDE 40

Open source machine learning: tensorflow.org

SLIDE 41

Google Brain Residency Programme

New one-year immersion program in deep learning research

  • Learn to conduct deep learning research w/ experts in our team
  • Fixed one-year employment with salary, benefits, ...
  • Goal after one year is to have conducted several research projects
  • Interesting problems, TensorFlow, and access to computational resources
  • Apply before January 15, 2016.

For more information: g.co/brainresidency
Contact us: brain-residency@google.com