
Deep Learning in Computer Vision (CSC2523) Reading List

Bid for papers: Tue, Jan 26, 11.59pm, 2016
Reviews due: every Monday (one day before class), 11.59pm

1 Bid on papers NOW

Below is a list of papers we’ll be reading in the course. You are expected to present one paper. We’ll have a bidding system. Please submit a ranked list of papers you’d like to present here:

https://docs.google.com/forms/d/1UqtYUESRonNjX5mjXF3xGb5bvbsmrXIrq6t0gMbCkhM/viewform?usp=send_form

Use the numbers from this document to refer to papers. Ranking more than one paper is better, since I’ll just go lower down your list in case there is too much interest in one paper. If you don’t submit a preference list, I’ll do a random assignment. You are not expected to read all papers in order to bid. Just browse through them and decide which topics, types of approaches, etc., appeal to you more. Note that the list contains a small subset of all available literature on deep learning, and new papers are constantly being published. If you know of an interesting or newer paper that is not on the list, please suggest it through the link above.

2 Paper presentation

Each presentation should be 10 to 20 minutes long, depending on the paper (some papers are easier to explain than others). Time your presentation so that you don’t run over. Each presentation will be followed by a 5 to 10 minute discussion by everyone in the class. You can present with slides or walk through the paper itself on the projector. You can use existing visualizations, or even a few existing slides (if available), as long as you reference them properly (include a reference on each slide where you borrowed text, a visualization, or a slide). The structure of the presentation should be roughly as follows. You are free to choose your own flow if it suits the paper better.

  • High-level overview, motivation, problem definition, contributions
  • Overview of the technical approach
  • Overview of the experimental evaluation
  • Strengths/weaknesses of the paper (approach, evaluation)

Showing a demo (or some additional results) is a great addition (of course not possible for all papers).

3 Reviewing

A rough guideline for writing your paper reviews:

  • Short summary of the paper
  • List main contributions
  • List positive and negative points with a short discussion
  • How strong is the evaluation? Are there some experiments missing?

You do not need to write a novel; keep the review short and concise.

4 Reading List

Click on the title of the paper to access it.

  • 1. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L Yuille ICLR, Nov 2015 Project page: https://bitbucket.org/deeplab/deeplab-public/ Topic(s): Semantic segmentation Presentation date: Jan 19, presenter: Shenlong

  • 2. Highway Networks

Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber (arXiv:1505.00387), Nov 2015 Project page: http://people.idsia.ch/~rupesh/very_deep_learning/index.html Topic(s): Very deep networks Presentation date: Jan 26, presenter: Renjie

  • 3. Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (arXiv:1512.03385), Dec 2015 Topic(s): Very deep CNNs Presentation date: Jan 26, presenter: Renjie

  • 4. Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik CVPR, 2014 Project page: https://github.com/rbgirshick/rcnn Topic(s): Object detection Presentation date: Jan 26, presenter: Kaustav

  • 5. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (arXiv:1506.01497), June 2015 Code (Matlab): https://github.com/ShaoqingRen/faster_rcnn Code (Python): https://github.com/rbgirshick/py-faster-rcnn Topic(s): Object detection Presentation date: Jan 26, presenter: Kaustav

  • 6. DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf CVPR, June 2014 Topic(s): Face verification

  • 7. PANDA: Pose Aligned Networks for Deep Attribute Modeling

Ning Zhang, Manohar Paluri, Marc’Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev CVPR, June 2014 Code: https://github.com/facebook/pose-aligned-deep-networks Topic(s): Attribute prediction

  • 8. Computing the Stereo Matching Cost with a Convolutional Neural Network

Jure Žbontar, Yann LeCun (arXiv:1409.4326), Sep 2014 Code: https://github.com/jzbontar/mc-cnn Topic(s): Stereo estimation

  • 9. FlowNet: Learning Optical Flow with Convolutional Networks

Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox (arXiv:1504.06852), April 2015 Code: http://lmb.informatik.uni-freiburg.de/resources/software.php Topic(s): Flow estimation

  • 10. Visual Tracking with Fully Convolutional Networks

Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu ICCV, 2015 Topic(s): Tracking

  • 11. Two-Stream Convolutional Networks for Action Recognition in Videos

Karen Simonyan, Andrew Zisserman NIPS, Dec 2014 Topic(s): Action recognition

  • 12. Dense Optical Flow Prediction from a Static Image

Jacob Walker, Abhinav Gupta, Martial Hebert ICCV, 2015 Topic(s): Flow prediction from a monocular image

  • 13. Designing Deep Networks for Surface Normal Estimation

Xiaolong Wang, David Fouhey, Abhinav Gupta CVPR, 2015 Topic(s): Surface estimation from a monocular image

  • 14. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

David Eigen, Christian Puhrsch, Rob Fergus NIPS, 2014 Project page: http://www.cs.nyu.edu/~deigen/depth/


Presented together with: Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus ICCV, 2015 Project page: http://www.cs.nyu.edu/~deigen/dnl/ Topic(s): Depth estimation from a monocular image

  • 15. Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik ECCV, 2014 Project page: https://github.com/s-gupta/rcnn-depth Topic(s): Object detection and segmentation in RGB-D

  • 16. Aligning 3D Models to RGB-D Images of Cluttered Scenes

Saurabh Gupta, Pablo Arbeláez, Ross Girshick, Jitendra Malik CVPR, 2015 Topic(s): Aligning CAD models in RGB-D

  • 17. Monocular Object Instance Segmentation and Depth Ordering with CNNs

Ziyu Zhang, Alexander G. Schwing, Sanja Fidler, Raquel Urtasun ICCV, 2015 Topic(s): Class-instance segmentation

  • 18. Where to Buy It: Matching Street Clothing Photos in Online Shops

M. Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg ICCV, 2015 Project page: http://www.tamaraberg.com/street2shop/ Topic(s): Instance recognition

  • 19. Where are they looking?

Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba NIPS, 2015 Project page: http://gazefollow.csail.mit.edu/index.html Topic(s): Gaze prediction

  • 20. DeepStereo: Learning to Predict New Views from the World’s Imagery

John Flynn, Ivan Neulander, James Philbin, Noah Snavely arXiv:1506.06825, June 2015 Topic(s): View synthesis

  • 21. Learning to Generate Chairs, Tables and Cars with Convolutional Networks

Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox arXiv:1411.5928, Nov 2014 Code: http://lmb.informatik.uni-freiburg.de/resources/software.php Topic(s): Image generation

  • 22. Learning to Deblur

Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf arXiv:1411.5928, Nov 2014 Topic(s): De-blurring

  • 23. Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy ICLR, 2015 Blog: http://karpathy.github.io/2015/03/30/breaking-convnets/

https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture

Read together with: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images Anh Nguyen, Jason Yosinski, Jeff Clune CVPR, 2015 Topic(s): Fooling neural nets, adversarial training

  • 24. Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov ICCV, Dec 2015 Topic(s): Zero-shot learning of visual models from text

  • 25. A Neural Algorithm of Artistic Style

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge (arXiv:1508.06576), Aug 2015 Code: https://github.com/jcjohnson/neural-style Topic(s): Changing the style of images

  • 26. We Are Humor Beings: Understanding and Predicting Visual Humor

Arjun Chandrasekaran, Ashwin K Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh (arXiv:1512.04407), Dec 2015 Topic(s): Humor (doesn’t have much of NN flavor, but fun)

  • 27. Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

Andrew J.R. Simpson, Gerard Roma, Mark D. Plumbley (arXiv:1504.04658), April 2015 Topic(s): Blind-deconvolution with NNs

  • 28. Fast Algorithms for Convolutional Neural Networks

Andrew Lavin, Scott Gray (arXiv:1509.09308), Sep 2015 Topic(s): Improving the speed of CNNs

  • 29. Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean NIPS, 2013 Code: https://code.google.com/p/word2vec/ Topic(s): Word2vec (vector representation of words)

  • 30. Skip-Thought Vectors

Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler NIPS, 2015 Code: https://github.com/ryankiros/skip-thoughts Code (story-telling): https://github.com/ryankiros/neural-storyteller Topic(s): Sent2vec (vector representation of sentences)

  • 31. Order-Embeddings of Images and Language

Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun arXiv:1511.06361, Nov 2015 Project page: https://github.com/ivendrov/order-embedding Topic(s): Semantic representations of text and images

  • 32. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel TACL, 2015 Project page: https://github.com/ryankiros/visual-semantic-embedding Topic(s): Image captioning

  • 33. DenseCap: Fully Convolutional Localization Networks for Dense Captioning

Justin Johnson, Andrej Karpathy, Li Fei-Fei (arXiv:1511.07571), Nov 2015 Topic(s): Image captioning

  • 34. Generating Images from Captions with Attention

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov (arXiv:1511.02793), Nov 2015 Topic(s): Image generation via sentences

  • 35. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler ICCV, 2015 Project page: http://www.cs.toronto.edu/~mbweb/ Topic(s): Video-text alignment

  • 36. Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

Mateusz Malinowski, Marcus Rohrbach, Mario Fritz ICCV, 2015 Topic(s): Visual question-answering

  • 37. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Mengye Ren, Ryan Kiros, Richard Zemel (arXiv:1505.02074), May 2015 Topic(s): Visual question-answering

  • 38. MovieQA: Understanding Stories in Movies through Question-Answering

Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler (arXiv:1512.02902), Dec 2015 Project: http://movieqa.cs.toronto.edu/home/ Topic(s): Movie question-answering

  • 39. Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le NIPS, Dec 2014 Topic(s): Machine translation

  • 40. Unsupervised Learning of Video Representations using LSTMs

Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov (arXiv:1502.04681), Feb 2015 Topic(s): Video representations

  • 41. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework

R. Xu, C. Xiong, W. Chen, and J. J. Corso AAAI, 2015 Topic(s): Joint visual and text representations

  • 42. Robobarista: Learning to Manipulate Novel Objects via Deep Multimodal Embedding

Jaeyong Sung, Seok Hyun Jin, Ian Lenz, Ashutosh Saxena ISRR, 2015 Topic(s): Joint visual and text representations

  • 43. Visualizing and Understanding Recurrent Networks

Andrej Karpathy, Justin Johnson, Li Fei-Fei (arXiv:1506.02078), June 2015 Code: https://github.com/karpathy/char-rnn Blog: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Topic(s): RNN Visualization

  • 44. Hierarchical Neural Network Generative Models for Movie Dialogues

Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau (arXiv:1507.04808), Nov 2015 Topic(s): Dialogue generation

  • 45. Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

Hongyuan Mei, Mohit Bansal, Matthew R. Walter (arXiv:1506.04089), June 2015 Topic(s): Navigation with Text

  • 46. Learning to Execute

Wojciech Zaremba, Ilya Sutskever arXiv:1410.4615, Oct 2014 Topic(s): NN to evaluate short programs

  • 47. Learning to Discover Efficient Mathematical Identities

Wojciech Zaremba, Karol Kurach, Rob Fergus NIPS, 2014 Topic(s): Discovery of efficient mathematical identities

  • 48. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, Soumith Chintala (arXiv:1511.06434), Jan 2016 Topic(s): Image generation

  • 49. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus (arXiv:1506.05751), June 2015 Topic(s): Image generation

  • 50. Neural GPUs Learn Algorithms

Lukasz Kaiser, Ilya Sutskever (arXiv:1511.08228), Jan 2016 Topic(s): Learning algorithms

  • 51. Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller (arXiv:1312.5602), Dec 2013 Topic(s): Reinforcement learning

  • 52. Giraffe: Using Deep Reinforcement Learning to Play Chess

Matthew Lai (arXiv:1509.01549), Sep 2015 Topic(s): Reinforcement learning

  • 53. End-to-End Training of Deep Visuomotor Policies

Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel (arXiv:1504.00702), April 2015 Topic(s): Reinforcement learning

  • 54. Scalable Bayesian Optimization Using Deep Neural Networks

Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat, Ryan P. Adams (arXiv:1502.05700), Feb 2015 Topic(s): Bayesian optimization with NNs