16-824: Visual Learning and Recognition. Many slides from A. Farhadi, A. Efros. (PowerPoint presentation)
SLIDE 1

16-824:Visual Learning and Recognition

Many slides from A. Farhadi, A. Efros

SLIDE 2

Course Information

  • Time: Monday, Wednesday 1:30-2:50
  • Location: NSH 1305
  • Office Hours: Email me for appointments
  • Contact: abhinavg@cs, EDSH 213
  • Website: http://graphics.cs.cmu.edu/courses/16-824/2016_spring/

SLIDE 3
People - Instructor

  • Abhinav Gupta
  • Ph.D. 2009, University of Maryland

SLIDE 4

SLIDE 5
People

  • Abhinav Gupta
  • Ph.D. 2009, University of Maryland
  • Postdoctoral Fellow, Carnegie Mellon University, 2009-11

SLIDE 6

Blocks World Revisited: an original image is parsed into a 3D parse graph of sky, ground, and blocks, each labeled with a confidence (high / medium probability) and relations such as supported, in-front, and above.

All results and code: http://www.cs.cmu.edu/~abhinavg/blocksworld

SLIDE 7

SLIDE 8

People

  • David Fouhey
  • Ph.D. Student, Robotics Institute
SLIDE 9

Input Image and Surface Connection Graph

SLIDE 10

SLIDE 11

SLIDE 12

People

  • David Fouhey
  • Ph.D. Student, Robotics Institute
  • Research Interests:
– 3D Scene Understanding
– Understanding Humans

SLIDE 13
People - TA

  • Xiaolong Wang
  • Ph.D. Student, Robotics Institute
  • Working with me
  • Research Interests:
– Learning Visual Representation via ConvNets
– Representing actions via ConvNets

SLIDE 14
People - TA

  • Rohit Girdhar
  • MS Student, Robotics Institute
  • Working with me
  • Research Interests:
– 3D Understanding
– Affordances

SLIDE 15

What is this course about?

16-824: Learning-based Methods in Vision

SLIDE 16

What is the goal of Computer Vision?

Systems that can “understand” Visual Data

SLIDE 17

Understanding Visual Data

SLIDE 18

Understanding Visual Data

SLIDE 19

Understanding Visual Data

SLIDE 20

What does it mean to understand?

SLIDE 21

The Vision Story Begins…

“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking.”

  • - David Marr, Vision (1982)

Slide Credit: Alyosha Efros

SLIDE 22

Vision: a split personality

“What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is.”

Answer #1: pixel of brightness 243 at position (124,54)… and depth 0.7 meters
Answer #2: looks like the flat, sittable surface of the couch

Which do we want? Is the difference just a matter of scale, or is there some fundamental difference?

SLIDE 23

Measurement vs. Perception

SLIDE 24

Brightness: Measurement vs. Perception

Slide Credit: Alyosha Efros

SLIDE 25

Brightness: Measurement vs. Perception

Proof!

Slide Credit: Alyosha Efros

SLIDE 26

Measurement: Length

Müller-Lyer Illusion
http://www.michaelbach.de/ot/sze_muelue/index.html

Slide Credit: Alyosha Efros

SLIDE 27

  • Measurement: capturing physical quantities like pixel brightness, depth, etc.
  • Perception/Understanding: a high-level representation that captures the semantic structure of the scene and its constituent objects.
– Subjective: depends on task and agent
– The intersection of what you see and what you believe (prior knowledge)

SLIDE 28

Vision as Measurement Device

  • Real-time stereo on Mars
  • Structure from Motion
  • Physics-based Vision
  • Virtualized Reality

Slide Credit: Alyosha Efros

SLIDE 29

…but why do we care about perception?

The goals of computer vision (what + where) are in terms of what humans care about.

SLIDE 30

So what do humans care about?

SLIDE 31

“Living Room”: Image Classification / Scene Recognition

SLIDE 32

Couch Table

Object Detection

SLIDE 33

Couch Table

Object Segmentation/Categorization

SLIDE 34

3D Understanding

SLIDE 35

Can Sit / Can Walk / Can Move / Can Push

Functional Understanding

SLIDE 36

Pose Estimation

SLIDE 37

Activity Recognition: What is he doing?

SLIDE 38

Why are these problems hard?

SLIDE 39

Challenges 1: viewpoint variation

Michelangelo 1475-1564

slide by Fei-Fei, Fergus & Torralba

SLIDE 40

Challenges 2: illumination

slide credit: S. Ullman

SLIDE 41

Challenges 3: occlusion

Magritte, 1957

slide by Fei-Fei, Fergus & Torralba

SLIDE 42

Challenges 4: scale

slide by Fei-Fei, Fergus & Torralba

SLIDE 43

Challenges 5: deformation

Xu, Beihong 1943

slide by Fei-Fei, Fergus & Torralba

SLIDE 44

Challenges 6: background clutter

Klimt, 1913

slide by Fei-Fei, Fergus & Torralba

SLIDE 45

Challenges 7: object intra-class variation

slide by Fei-Fei, Fergus & Torralba

SLIDE 46

Challenges 8: local ambiguity

slide by Fei-Fei, Fergus & Torralba

SLIDE 47

Challenges 9: the world behind the image

Slide Credit: Alyosha Efros

SLIDE 48

Ill-posed

  • Example: recovering 3D geometry from a single 2D projection
  • Infinite number of possible solutions!

from [Sinha and Adelson 1993]
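To make the ill-posedness concrete, here is a minimal pinhole-projection sketch (the function name and the numeric points are hypothetical, chosen only for illustration): two distinct 3D points on the same viewing ray land on identical image coordinates, so a single 2D projection cannot determine depth.

```python
import numpy as np

def project(point_3d, f=1.0):
    """Pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z). Depth Z is divided out."""
    X, Y, Z = point_3d
    return np.array([f * X / Z, f * Y / Z])

# Two different 3D points along the same viewing ray...
near = np.array([1.0, 2.0, 4.0])
far = near * 2.5  # any positive scaling of the ray gives another valid scene

# ...project to the same image coordinates: the depth information is gone.
same_pixel = np.allclose(project(near), project(far))
```

Every positive scale factor yields another 3D scene consistent with the same image, which is exactly the "infinite number of possible solutions" on the slide.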

SLIDE 49

How do we solve it?

SLIDE 50

Data to the Rescue!

SLIDE 51
  • Data to build observation models.
  • Data to build priors about the visual world.
  • Use the models and prior information to infer.

Machine-Learning!
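The recipe above (an observation model plus a prior, combined by inference) is Bayes' rule at its smallest. A toy sketch with made-up numbers, deciding whether a bright image region is sky:

```python
# All numbers are hypothetical, purely for illustration.
prior = {"sky": 0.3, "not_sky": 0.7}       # prior about the visual world (from data)
likelihood = {"sky": 0.8, "not_sky": 0.2}  # observation model: P(bright | label)

# Bayes' rule: posterior proportional to likelihood * prior, then normalize.
unnormalized = {k: likelihood[k] * prior[k] for k in prior}
Z = sum(unnormalized.values())
posterior = {k: v / Z for k, v in unnormalized.items()}
# Here the weak prior for "sky" is overridden by the strong observation model.
```

Both ingredients are learned from data, which is why the slide ends with "Machine Learning".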

SLIDE 52

In this course, we will:

Take a few baby steps…

SLIDE 53

Data / Tasks / Learning

SLIDE 54

Technical Challenges

SLIDE 55

Technical Challenges

SLIDE 56

SLIDE 57

SLIDE 58

SLIDE 59

SLIDE 60

What to expect in the class?

SLIDE 61

Graphical Models

Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.

SLIDE 62

Learning as a tool to exploit big data, build prior models, etc.

Not to formulate the problem in a complicated manner…

SLIDE 63

But that said…

  • We will still look at the learning methods that give state-of-the-art performance on these tasks.
  • For example, most of the focus this year will be on deep learning: Convolutional Neural Networks (CNNs).
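For a flavor of what a CNN layer computes, here is a minimal valid-mode 2D convolution (technically cross-correlation, as CNN frameworks implement it) in plain NumPy. This is an illustrative sketch, not course code; the toy image and edge filter are invented for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the core operation of a CNN layer."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter responds only at the dark-to-bright boundary.
img = np.zeros((5, 5))
img[:, 2:] = 1.0                 # left half dark, right half bright
edge = np.array([[-1.0, 1.0]])   # simple horizontal-difference kernel
response = conv2d(img, edge)     # nonzero only where the boundary sits
```

A CNN stacks many such learned filters with nonlinearities, learning the kernels from data instead of hand-designing them.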

SLIDE 64

Is this a research course?

  • One year ago: YES!
  • But times have changed: computer vision is a hot topic in industry now.
  • 2012: resurgence of deep networks (CNNs)

SLIDE 65

2014 – Deep Learning is Everywhere

  • Google, Facebook, Baidu, Apple
– Strong deep learning groups hiring everywhere
– Beyond research: development (Image Search, Automated Driving)
  • Startups sold every day
– Vision Factory, EuVision, Flutter…

Come back to this in the next class!

SLIDE 66

Course Outline

SLIDE 67

Goals

  • Read some interesting papers together
– Learn something new: both you and us!
  • Get up to speed on a big chunk of vision research
– Understand 70% of CVPR papers!
  • Use learning-based vision in your own work
  • Learn how to speak
  • Learn how to think critically about papers
SLIDE 68

Course Organization

  • Requirements:
  • 1. Class Participation (15%)
– Keep an annotated bibliography
– Post on the class blog before each class
– Ask questions / debate / fight / be involved!
  • 2. Presentation (20%)
  • 3. Project (25%)
  • 4. Assignments (2 × 20%)
SLIDE 69

Class Participation

  • Keep an annotated bibliography of the papers you read (always a good idea!). The format is up to you, but at a minimum it needs to have:
– A summary of key points
– A few interesting insights, “aha moments”, keen observations, etc.
– Weaknesses of the approach, unanswered questions, and areas of further investigation or improvement
  • Submit a comment on the class blog
– Ask a question, answer a question, post your thoughts, praise, criticism, start a discussion, etc.

SLIDE 70

Presentation

1. Pick a topic from the list
2. Understand it as if you were the author
– If there is code, understand the code completely
3. Prepare an amazing 15-minute presentation
– Discuss it with me/David at least 5 days before the presentation

SLIDE 71

Class Assignments

Two assignments to get you familiar with deep learning.

Toolboxes:
  • Caffe
  • Torch

Fine-tuning and learning from scratch.
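To illustrate the fine-tuning idea at toy scale (a NumPy sketch with invented data, not the actual Caffe/Torch assignment code): a "pretrained" feature extractor is kept frozen, and only a new final classifier is trained on the target task.

```python
import numpy as np

rng = np.random.default_rng(0)

W_frozen = rng.normal(size=(8, 16))        # stand-in for pretrained layers

def features(x):
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return np.maximum(x @ W_frozen, 0)

X = rng.normal(size=(100, 8))              # toy target-task inputs
y = (X[:, 0] > 0).astype(float)            # toy binary labels

w = np.zeros(16)                           # only this layer is learned
for _ in range(500):                       # logistic regression by gradient descent
    p = 1 / (1 + np.exp(-features(X) @ w))
    w -= 0.1 * features(X).T @ (p - y) / len(y)

train_acc = np.mean((features(X) @ w > 0) == (y > 0.5))
```

Learning from scratch would instead update `W_frozen` as well, which demands far more data; that trade-off is what the two assignments explore.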

SLIDE 72

Class Project

An opportunity to work on the crazy idea your advisor would not let you do! (Groups of 2-3)

Merit criteria:
1. Craziness (the more different it sounds, the better)
2. Amount of work/results
3. Report/presentation

Failure or success carries no points! An idea with interesting failure results is a successful project!

SLIDE 73

End of Semester Awards

  • We will vote for:
– Best Project
– Best Presentation

SLIDE 74

Logistics

  • Waitlist: class size restricted to 51 students
  • Talk to me after class!