CNN wrapup and Visual attributes Thurs April 26 Kristen Grauman - PDF document

CS 376: Computer Vision - lecture 26 4/26/2018
CNN wrapup and Visual attributes
Thurs April 26
Kristen Grauman
UT Austin

Last time
• Evaluation
• Scoring an object detector
• Scoring a multi-class recognition system
• Spatial pyramid match kernel
• (Deep) Neural networks

Today
• Convolutional neural networks
• Attributes

Learning a Hierarchy of Feature Extractors
• Each layer of hierarchy extracts features from output of previous layer
• All the way from pixels → classifier
• Layers have (nearly) the same structure
• Train all layers jointly
[Figure: Image/video pixels → Layer 1 → Layer 2 → Layer 3 → Simple classifier → Labels]
Slide: Rob Fergus

Significant recent impact on the field
Big labeled datasets, deep learning, GPU technology
[Figure: plot of ImageNet top-5 error (%)]
Slide credit: Dinesh Jayaraman

Convolutional Neural Networks (CNN, ConvNet, DCN)
• CNN = a multi-layer neural network with
  – Local connectivity:
    • Neurons in a layer are only connected to a small region of the layer before it
  – Share weight parameters across spatial positions:
    • Learning shift-invariant filter kernels
Image credit: A. Karpathy
Jia-Bin Huang and Derek Hoiem, UIUC
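To make the effect of local connectivity and weight sharing concrete, the sketch below counts parameters for a fully connected layer and for a convolutional layer on the same input. The layer sizes (a 32x32x3 input, 100 output units or filters, 5x5 kernels) and the function names are hypothetical, chosen only for illustration; they are not taken from the lecture.

```python
def fully_connected_params(in_h, in_w, in_c, out_units):
    """Weights plus biases when every output unit sees every input value."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases for a conv layer.

    Each output channel has one small filter (local connectivity) that is
    reused at every spatial position (weight sharing), so the count does
    not depend on the image height or width.
    """
    return kernel_h * kernel_w * in_c * out_c + out_c


# Hypothetical sizes, for illustration only.
fc = fully_connected_params(32, 32, 3, 100)  # 32*32*3*100 + 100 = 307300
conv = conv_params(5, 5, 3, 100)             # 5*5*3*100 + 100 = 7600
print(fc, conv)
```

Under these assumed sizes the convolutional layer needs roughly 40x fewer parameters, and the gap grows with image size because the convolutional count is independent of it.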

LeNet [LeCun et al. 1998]
Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
LeNet-1 from 1993
Jia-Bin Huang and Derek Hoiem, UIUC

Convolution
• Weighted moving sum
[Figure: Input → Feature Activation Map]
slide credit: S. Lazebnik

Convolutional Neural Networks
[Figure: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps]
slide credit: S. Lazebnik
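The "weighted moving sum" can be sketched in a few lines of plain Python. This is a minimal illustration, assuming stride 1 and no padding; the function name `correlate2d_valid` and the toy image and filter are made up for this example. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image`, taking a weighted sum at each position.

    Returns a feature map of size (H - kh + 1) x (W - kw + 1):
    stride 1, no padding, kernel not flipped.
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


# Toy input: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge = [[-1, 1]]  # hypothetical horizontal-difference filter
feature_map = correlate2d_valid(image, edge)
print(feature_map)  # responds only where the intensity steps from 0 to 1
```

In a CNN the filter weights are learned rather than hand-chosen, but the sliding weighted sum is the same operation.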

Convolutional Neural Networks
[Figure: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps; Convolution step shown as Input → Feature Map]
slide credit: S. Lazebnik

Convolutional Neural Networks
[Figure: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps; Non-linearity step shown as Rectified Linear Unit (ReLU)]
slide credit: S. Lazebnik

Convolutional Neural Networks
[Figure: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps; Spatial pooling step shown as Max pooling]
• Max-pooling: a non-linear down-sampling
• Provide translation invariance
slide credit: S. Lazebnik
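The non-linearity and pooling stages can be sketched as follows. This is a minimal illustration assuming non-overlapping 2x2 pooling windows; the function names and the toy feature map are invented for the example.

```python
def relu(fmap):
    """Rectified Linear Unit: replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    H, W = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, W - size + 1, size)
        ]
        for i in range(0, H - size + 1, size)
    ]


fmap = [
    [-1.0, 2.0, 0.0, -3.0],
    [4.0, -5.0, 1.0, 0.5],
    [0.0, 0.0, -2.0, -1.0],
    [-6.0, 3.0, -0.5, -4.0],
]
pooled = max_pool(relu(fmap))
print(pooled)  # 4x4 map down-sampled to 2x2

# Small shifts inside one pooling window do not change the output.
a = [[1.0, 0.0], [0.0, 0.0]]
b = [[0.0, 0.0], [0.0, 1.0]]
print(max_pool(a) == max_pool(b))
```

The last two lines illustrate the limited form of translation invariance that pooling provides: the invariance holds only for shifts that stay within a pooling window.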

Convolutional Neural Networks
[Figure: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps]
slide credit: S. Lazebnik

SIFT Descriptor
Lowe [IJCV 2004]
[Figure: Image Pixels → Apply oriented filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector]
slide credit: R. Fergus

Spatial Pyramid Matching
Lazebnik, Schmid, Ponce [CVPR 2006]
[Figure: SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier]
slide credit: R. Fergus

Visualizing what was learned
• What do the learned filters look like?
[Figure: Typical first layer filters]

Individual Neuron Activation
RCNN [Girshick et al. CVPR 2014]
Jia-Bin Huang and Derek Hoiem, UIUC

Individual Neuron Activation
RCNN [Girshick et al. CVPR 2014]
Jia-Bin Huang and Derek Hoiem, UIUC

https://www.wired.com/2012/06/google-x-neural-network/

Application: ImageNet
• ~14 million labeled images, 20k classes
• Images gathered from Internet
• Human labels via Amazon Turk
[Deng et al. CVPR 2009]
Slide: R. Fergus
https://sites.google.com/site/deeplearningcvpr2014

AlexNet
• Similar framework to LeCun’98 but:
  • Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
  • More data (10^6 vs. 10^3 images)
  • GPU implementation (50x speedup over CPU)
  • Trained on two GPUs for a week
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
Jia-Bin Huang and Derek Hoiem, UIUC

ImageNet Classification Challenge
[Figure: challenge results, AlexNet marked]
http://image-net.org/challenges/talks/2016/ILSVRC2016_10_09_clsloc.pdf

Industry Deployment
• Used in Facebook, Google, Microsoft
• Image Recognition, Speech Recognition, ….
• Fast at test time
Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14
Slide: R. Fergus
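Results on the ImageNet classification challenge are commonly reported as top-5 error: a prediction counts as correct if the true label is among the five highest-scoring classes. The sketch below shows one way to compute it; the function name and the tiny score table (3 images, 6 classes) are made up for illustration.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k top-scoring classes."""
    errors = 0
    for class_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(labels)


# Made-up classifier scores: one row per image, one column per class.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.11, 0.04],
    [0.30, 0.12, 0.11, 0.20, 0.22, 0.05],
    [0.02, 0.40, 0.30, 0.10, 0.11, 0.07],
]
labels = [1, 5, 2]  # true class index of each image

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print(top1, top5)
```

With these toy numbers the top-5 error is lower than the top-1 error, since the third image's true class is ranked second rather than first.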

Recap so far
• Neural networks / multi-layer perceptrons
  – View of neural networks as learning hierarchy of features
• Convolutional neural networks
  – Architecture of network accounts for image structure
  – “End-to-end” recognition from pixels
  – Together with big (labeled) data and lots of computation → major success on benchmarks, image classification and beyond

Beyond classification
• Detection
• Segmentation
• Regression
• Pose estimation
• Matching patches
• Synthesis
and many more…
Jia-Bin Huang and Derek Hoiem, UIUC

R-CNN: Regions with CNN features
• Trained on ImageNet classification
• Finetune CNN on PASCAL
RCNN [Girshick et al. CVPR 2014]
Jia-Bin Huang and Derek Hoiem, UIUC

CNN for Regression
DeepPose [Toshev and Szegedy CVPR 2014]
Jia-Bin Huang and Derek Hoiem, UIUC

Today
• Convolutional neural networks
• Attributes

What are visual attributes?
• Mid-level semantic properties shared by objects
• Human-understandable and machine-detectable
[Figure: example attributes: high heel, flat, metallic, brown, red, has-ornaments, four-legged, indoors, outdoors]
o Material, Appearance, Function/affordance, Parts…
o Adjectives
o Statements about visual concepts
[Oliva et al. 2001, Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Branson et al. 2010, Parikh & Grauman 2011, …]

Examples: Binary Attributes
Facial properties
“Smiling Asian Men With Glasses”
Kumar et al. 2008

Examples: Binary Attributes
Object parts and shapes
Farhadi et al. 2009

Examples: Binary Attributes
Shopping descriptors
Berg et al. 2010

Attributes for search and recognition
Language-based attributes give humans a way to
o Teach novel categories with description
o Communicate search queries
o Give feedback in interactive search
o Assist in interactive recognition
Slide credit: Kristen Grauman

Why attributes?
• Why would a robot need to recognize a scene?
  Can I walk around here? Is this walkable?
Slide credit: Devi Parikh

Why attributes?
• Why would a robot need to recognize an object?
  How hard should I grip this? Is it brittle?
Slide credit: Devi Parikh

Why attributes?
• How do people naturally describe visual concepts?
  Image search: “I want elegant silver sandals with high heels”
  Semantic “teaching”: “Zebras have stripes.”
Slide credit: Devi Parikh

Relative attributes
Idea: represent visual comparisons between classes, images, and their properties.
[Figure: two images, each with a "Bright" property, related by "Brighter than"]
[Parikh & Grauman, ICCV 2011]

How to teach relative visual concepts?
How much is the person smiling?
[Figure: face images, each with candidate ratings 1 to 4]

How to teach relative visual concepts?
How much is the person smiling?
[Figure: face images, each with candidate ratings 1 to 4]

How to teach relative visual concepts?
[Figure: images compared on a scale from "Less" to "More"]

Learning relative attributes
For each attribute, use ordered image pairs to train a ranking function:
Ranking function = …, Image features
[Parikh & Grauman, ICCV 2011; Joachims 2002]

Learning relative attributes
Max-margin learning to rank formulation
[Figure labels: rank margin, w_m, image, relative attribute score]
Joachims, KDD 2002
Slide credit: Devi Parikh

Relating images
Rather than simply label images with their properties,
[Figure: images labeled "Not bright", "Smiling", "Not natural"]
[Parikh & Grauman, ICCV 2011]
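A rough sketch of learning a ranking function from ordered image pairs follows. It assumes a linear ranking function (score = w · x) and minimizes a pairwise hinge loss with plain subgradient descent. This is a simplification for illustration, not the solver or the full formulation of the cited papers (for instance, it ignores pairs marked as having similar attribute strength). The feature vectors, pair list, and hyperparameters are all made up.

```python
def score(w, x):
    """Linear relative-attribute score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranker(features, ordered_pairs, epochs=200, lr=0.1, reg=0.01):
    """Learn w so that score(x_i) exceeds score(x_j) by a margin for each pair.

    ordered_pairs holds (i, j) meaning image i shows MORE of the attribute
    than image j. Minimizes reg/2 * ||w||^2 plus the pairwise hinge loss
    max(0, 1 - w.(x_i - x_j)) by subgradient descent.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for i, j in ordered_pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = score(w, diff)
            for d in range(dim):
                grad = reg * w[d]
                if margin < 1.0:  # pair violates the rank margin
                    grad -= diff[d]
                w[d] -= lr * grad
    return w


# Hypothetical 2-D image features for four images.
features = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5], [0.9, 0.1]]
# Supervision: image 1 > image 0, image 2 > image 1, image 3 > image 2.
ordered_pairs = [(1, 0), (2, 1), (3, 2)]

w = train_ranker(features, ordered_pairs)
scores = [score(w, x) for x in features]
print(scores)  # scores increase with attribute strength
```

Once trained, the function assigns a real-valued attribute strength to any image, so images never seen in a training pair can also be compared.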

Relating images
Now we can compare images by attribute’s “strength”
[Figure: images ordered along the attributes bright, smiling, natural]
[Parikh & Grauman, ICCV 2011]

Interactive visual search
[Figure: loop between Feedback and Results]
• Iteratively refine the set of retrieved images based on user feedback on results so far
• Potential to communicate more precisely the desired visual content
Slide credit: Adriana Kovashka

How is interactive search done today?
Keywords + binary relevance feedback
[Figure: results for the query “black high heels” marked relevant / irrelevant]
• Traditional binary feedback is imprecise
• Coarse communication between user and system
[Rui et al. 1998, Zhou et al. 2003, Tong & Chang 2001, Cox et al. 2000, Ferecatu & Geman 2007, …]
