

SLIDE 1

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

Jimmy Lei Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov ICCV 2015

Presenter: Fartash Faghri

SLIDE 2

Zero-shot Learning

  • Classify images of an unseen class given semantically or visually similar classes at training time.
  • Shared knowledge between classes can be given in various forms, such as attributes or class descriptions.

Antol et al. [1]

SLIDE 3

Contributions

  • The main contribution is the convolutional classifier. The rest of the contributions are shared with [2].
  • Predicts visual classes using a text corpus, in particular an encyclopedia corpus. This overcomes the difficulty of hand-crafted attributes.
  • The key difference from the most related work is that image and text features are transformed into a joint embedding space.

SLIDE 4

Classifier

  • Image feature vectors:
  • Text feature vectors:
  • A linear classifier:
  • Image transformation:
  • Text transformation:
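
The bullets above outline a pipeline: a class description is turned into the weights of a linear classifier, which then scores a transformed image feature vector. A minimal sketch of that idea follows. It assumes each transformation is a single linear map; the names `W_t`, `W_v`, `predict_classifier_weights`, `embed_image` and `zero_shot_predict` are hypothetical and are not the authors' notation or architecture.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_classifier_weights(text_features, W_t):
    """Text transformation (assumed linear): description -> classifier weights."""
    return matvec(W_t, text_features)


def embed_image(image_features, W_v):
    """Image transformation (assumed linear): image features -> joint space."""
    return matvec(W_v, image_features)


def class_score(image_features, text_features, W_v, W_t):
    """Linear classifier whose weights are predicted from the class text."""
    weights = predict_classifier_weights(text_features, W_t)
    return dot(weights, embed_image(image_features, W_v))


def zero_shot_predict(image_features, class_texts, W_v, W_t):
    """Pick the class whose text-predicted classifier scores the image highest."""
    scores = {
        name: class_score(image_features, text, W_v, W_t)
        for name, text in class_texts.items()
    }
    return max(scores, key=scores.get)
```

Because the classifier weights come from text alone, a class with no training images can still be scored as long as a description of it is available.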
SLIDE 5

Convolutional Classifier

  • Text can describe attributes (low-level) or objects (high-level).
  • Classifier on fully connected features:
  • Classifier on convolutional features:
  • Joint classifier:
  • is a global pooling function.
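
One way to read the joint classifier is sketched below: a filter predicted from text is applied to the convolutional feature maps, the response map is reduced to a single number by a global pooling function, and that number is added to the fully connected score. The sketch assumes a 1x1 filter (one weight per channel), average pooling, and a simple sum for the joint score; the slide does not fix these choices, and all function names are hypothetical.

```python
def global_average_pool(response_map):
    """Global pooling (assumed: mean) that maps an H x W response to a scalar."""
    values = [value for row in response_map for value in row]
    return sum(values) / len(values)


def conv_score(feature_maps, filter_weights):
    """Apply a text-predicted 1x1 filter across channels, then pool globally.

    feature_maps: list of channels, each an H x W list of lists.
    filter_weights: one weight per channel (assumed to be predicted from text).
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    response = [
        [
            sum(w * channel[i][j] for w, channel in zip(filter_weights, feature_maps))
            for j in range(width)
        ]
        for i in range(height)
    ]
    return global_average_pool(response)


def joint_score(fc_score, conv_score_value):
    """Joint classifier (assumed: sum of the fc and convolutional scores)."""
    return fc_score + conv_score_value
```

The fully connected term captures object-level evidence, while the pooled convolutional term can respond to local, attribute-like patterns anywhere in the image.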
SLIDE 6

Learning

  • Binary Cross Entropy:
  • Hinge Loss:
  • Euclidean Distance between the transformed image features and the transformed text features
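
The three objectives named above can be written down in their standard textbook forms. The sketch below is illustrative only: the label convention (1 for a matching image/class pair, 0 otherwise), the unit margin, and the function names are assumptions, not details taken from the slides.

```python
import math


def binary_cross_entropy(score, label):
    """Cross entropy between label (0 or 1) and sigmoid(score).

    Uses the numerically stable form max(s, 0) - s * y + log(1 + exp(-|s|)).
    """
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def hinge_loss(score, label, margin=1.0):
    """Hinge loss with the 0/1 label mapped to -1/+1 (margin is an assumption)."""
    sign = 1.0 if label == 1 else -1.0
    return max(0.0, margin - sign * score)


def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cross entropy and hinge loss both act on the classifier score, whereas the distance acts directly on the two embeddings.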
SLIDE 7

Loss Comparison

Produced by WolframAlpha

SLIDE 8

Experiments

  • DA: the model is similar to the hinge loss form.
  • DA+GP: in this model, multiple text descriptions can be given for a class; the GP part gives p(c|t), a prior.
  • fc baseline feat.: features from [2], HOG, GIST, etc.
  • ROC: true positive rate vs. false positive rate.
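
For reference, an ROC curve can be traced by sweeping a threshold over the classifier scores and recording the false positive rate and true positive rate at each step; the area under it (AUC) summarizes the curve in one number. The sketch below is a generic implementation, not the evaluation code behind the reported results, and it assumes binary 0/1 labels with at least one example of each.

```python
def roc_curve(scores, labels):
    """Return (false positive rate, true positive rate) points.

    The threshold sweeps from the highest score to the lowest. Assumes labels
    are 0 or 1 and that both values occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for index, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last example of a group of tied scores.
        if index + 1 == len(ranked) or ranked[index + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An AUC of 1.0 means every positive is scored above every negative, while 0.5 corresponds to a random ranking.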
SLIDE 9

Results

SLIDE 10

Results (cont.)

SLIDE 11
SLIDE 12

References

  • [1] Antol, Stanislaw, C. Lawrence Zitnick, and Devi Parikh. "Zero-shot learning via visual abstraction." European Conference on Computer Vision. Springer International Publishing, 2014.
  • [2] Elhoseiny, Mohamed, Babak Saleh, and Ahmed Elgammal. "Write a classifier: Zero-shot learning using purely textual descriptions." Proceedings of the IEEE International Conference on Computer Vision. 2013.