Object vision (Chapter 4), Lecture 9, Jonathan Pillow, Sensation & Perception (PowerPoint presentation)



slide-1
SLIDE 1

Object vision (Chapter 4)

Lecture 9 Jonathan Pillow Sensation & Perception (PSY 345 / NEU 325) 
 Princeton University, Fall 2017

1

slide-2
SLIDE 2

Introduction What do you see?

2

slide-3
SLIDE 3

Introduction What do you see?

3

slide-4
SLIDE 4

Introduction What do you see?

4

slide-5
SLIDE 5

Introduction

How did you recognize that all 3 images were of houses? How did you know that the 1st and 3rd images showed the same house? This is the problem of object recognition, which is solved in visual areas beyond V1.

5

slide-6
SLIDE 6

house-detector receptive field?

Unfortunately, we still have no idea how to solve this problem. It is not easy to see how to build receptive fields for houses the way LGN receptive fields were combined to make V1 receptive fields!

6

slide-7
SLIDE 7

Viewpoint Dependence

View-dependent (template-based) model: a model that will only recognize particular views of an object

  • Problem: needs a neuron (or “template”) for every possible view of the object
  • quickly run out of neurons!

e.g., a “house” template

7
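The template idea is easy to state computationally. A minimal sketch (assuming NumPy; the toy “house” pattern and the `match_template` helper are illustrative, not from the lecture) slides one stored view over the image and scores each position by normalized cross-correlation. A shifted copy of the pattern matches perfectly, but a rotated or rescaled view would need its own template, which is exactly the “run out of neurons” problem:

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image; return the best-matching
    top-left position, scored by normalized cross-correlation."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.linalg.norm(t)
    best_score, best_pos = -np.inf, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r+th, c:c+tw]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * tnorm
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# toy "house" template: a bright square on a dark background
template = np.zeros((4, 4)); template[1:3, 1:3] = 1.0
# the same pattern hidden in a larger image, shifted to (4, 5)
image = np.zeros((10, 10)); image[5:7, 6:8] = 1.0
pos, score = match_template(image, template)  # pos == (4, 5), score ~ 1.0
```

Translation invariance comes free from the sliding search; every other transformation (rotation, scale, lighting) would multiply the number of templates needed.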

slide-8
SLIDE 8

Middle Vision

Middle vision: the stage after basic features have been extracted and before object recognition and scene understanding

  • Involves perception of edges and surfaces
  • Determines which regions of an image should be grouped together into objects

8

slide-9
SLIDE 9

Finding edges

  • How do you find the edges of objects?
  • Cells in primary visual cortex have small receptive fields
  • How do you know which edges go together and which ones don’t?

9

slide-10
SLIDE 10

Middle Vision Computer-based edge detectors are not as good as humans

  • Sometimes computers find too many edges
  • “Edge detection” is another failed theory (along with Fourier analysis!) of what V1 does

10

slide-11
SLIDE 11

Middle Vision Computer-based edge detectors are not as good as humans

  • Sometimes computers find too few edges

11
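The computer edge detectors referred to on these two slides are typically gradient-based. A minimal hand-rolled Sobel-style sketch (assuming NumPy; the function and threshold are illustrative assumptions) makes the failure mode concrete: set the threshold too low and noise yields too many edges, too high and faint contours yield too few.

```python
import numpy as np

def sobel_edges(image, threshold):
    """Gradient-magnitude edge map: convolve with Sobel kernels,
    then keep pixels whose gradient magnitude exceeds the threshold."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for r in range(1, h - 1):          # skip the 1-pixel border
        for c in range(1, w - 1):
            patch = image[r-1:r+2, c-1:c+2]
            gx[r, c] = (patch * kx).sum()
            gy[r, c] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    return mag > threshold

# a clean vertical step edge: dark left half, bright right half
img = np.zeros((5, 6)); img[:, 3:] = 1.0
edges = sobel_edges(img, threshold=1.0)
```

Even on this ideal image the edge comes out two pixels wide; on real photographs, picking one threshold that keeps object boundaries while rejecting texture is where these detectors fall short of human vision.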

slide-12
SLIDE 12

Figure 4.5 This “house” outline is constructed from illusory contours

“Kanizsa Figure” illusory contour: a contour that is perceived even though no luminance edge is present

12

slide-13
SLIDE 13

Gestalt Principles

  • Gestalt: In German, “form” or “whole”
  • Gestalt psychology: “The whole is greater than the sum of its parts.”
  • Opposed to other schools of thought (e.g., structuralism) that emphasize the basic elements of perception

structuralists:

  • perception is built up from “atoms” of sensation (color, orientation)
  • challenged by cases where perception seems to go beyond the information available (e.g., illusory contours)

13

slide-14
SLIDE 14

Gestalt Principles

Gestalt grouping rules: a set of rules that describe when elements in an image will appear to group together

14

slide-15
SLIDE 15

Gestalt Principles Good continuation: A Gestalt grouping rule stating that two elements will tend to group together if they lie on the same contour

15

slide-16
SLIDE 16

Gestalt Principles Good continuation: A Gestalt grouping rule stating that two elements will tend to group together if they lie on the same contour

16

slide-17
SLIDE 17

Gestalt Principles Gestalt grouping principles: § Similarity § Proximity

17
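Proximity grouping can be sketched as clustering. In this hypothetical `group_by_proximity` helper (invented for illustration, not from the lecture), any two dots closer than a cutoff distance are linked, and linked dots chain together into groups, mirroring how nearby elements cohere perceptually:

```python
def group_by_proximity(points, max_dist):
    """Gestalt-style proximity grouping: points closer than max_dist
    end up in the same group (single-linkage via union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# two clusters of dots separated by a gap
pts = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
# group_by_proximity(pts, 2) -> [[0, 1, 2], [3, 4]]
```

Raising the cutoff merges everything into one group, the algorithmic analogue of dots appearing to group once they are brought close enough together.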

slide-18
SLIDE 18

Gestalt Principles: dynamic grouping principles

  • Common fate: Elements that move in the same direction tend to group together
  • Synchrony: Elements that change at the same time tend to group together

(See online demonstration: http://sites.sinauer.com/wolfe4e/wa04.01.html)

18

slide-19
SLIDE 19

Figure/Ground Segregation: Face/Vase Illusion “ambiguous figure”

19

slide-20
SLIDE 20

Gestalt Principles

Gestalt figure–ground assignment principles:

  • Surroundedness: The surrounding region is likely to be ground
  • Size: The smaller region is likely to be figure
  • Symmetry: A symmetrical region tends to be seen as figure
  • Parallelism: Regions with parallel contours tend to be seen as figure
  • Extremal edges: If the edges of an object are shaded such that they seem to recede into the distance, that region tends to be seen as figure

20

slide-21
SLIDE 21
  • Accidental viewpoint: a viewing position that produces a regularity in the visual image that is not present in the world
  • The visual system will not adopt interpretations that assume an accidental viewpoint!

21

slide-22
SLIDE 22
  • Non-accidental (“typical”) viewpoint: the interpretation won’t change if you move the camera a little bit

22

slide-23
SLIDE 23

Accidental Viewpoints

  • Believable 3-D figure:

23

slide-24
SLIDE 24

Accidental Viewpoints

  • Unbelievable figure: you could build a 3D object that would lead to this 2D image, but you would need to take the picture from a very specific viewpoint

24

slide-25
SLIDE 25

Impossible triangle (Perth, Australia)

25

slide-26
SLIDE 26

Impossible triangle (Perth, Australia)

26

slide-27
SLIDE 27

Accidental Viewpoints in street art

27

slide-28 through slide-36
SLIDES 28–36

(Image-only slides: further street-art examples of accidental viewpoints.)

slide-37
SLIDE 37

...one more argument against Naive Realism:

West Vancouver

Speed Bumps of the Future: Children

37

slide-38
SLIDE 38

Speed Bumps of the Future: Children

“the girl’s elongated form appears to rise from the ground as cars approach, reaching 3D realism at around 100 feet, and then returning to 2D distortion once cars pass that ideal viewing distance. Its designers created the image to give drivers who travel at the street’s recommended 18 miles per hour (30 km per hour) enough time to stop before hitting Pavement Patty–acknowledging the spectacle before they continue to safely roll over her.”
  (Joseph Calamia, Discover magazine blog)

“It’s a static image. If a driver can’t respond to this appropriately, that person shouldn’t be driving….”
  (David Duane, BCAA Traffic Safety Foundation)

http://tinyurl.com/358r46p

38

slide-39
SLIDE 39

Nonaccidental features: features that do not depend on the exact (or accidental) viewing position of the observer

  • T junctions: indicate occlusion
  • Y junctions: indicate corners facing the observer
  • Arrow junctions: indicate corners facing away from the observer
  • these features are still present if the object is shifted, scaled, or rotated by a small amount

39

slide-40
SLIDE 40

Problems with view-invariant theories: object recognition is not completely viewpoint-invariant!

Viewpoint affects object recognition: the farther an object is rotated away from a learned view, the longer it takes to recognize (“greebles” study, 1998)

40

slide-41
SLIDE 41

Viewpoint-invariance in the nervous system

Quiroga et al. (2005): single-electrode recordings in humans! Inferotemporal (IT) cortex

  • high selectivity to particular people / things, independent of viewpoint
  • e.g., the “Jennifer Aniston neuron”

41

slide-42
SLIDE 42

Face Recognition

42

slide-43
SLIDE 43

Face Recognition: not entirely viewpoint-invariant!

43

slide-44
SLIDE 44

Conclusion:

  • object recognition is somewhat but not entirely viewpoint invariant
  • observers do seem to store certain preferred views of objects

Makes sense from an evolutionary standpoint: we generate representations that are as invariant as we need them to be for practical applications

44

slide-45
SLIDE 45

Two facts that constrain any model of object recognition in the visual system

45

slide-46
SLIDE 46
1. Visual processing is divided into two cortical streams leaving V1:

  • Dorsal stream (“where” pathway)
  • Ventral stream (“what” pathway)
  • Separate pathways for “what” and “where” information

46

slide-47
SLIDE 47
2. Object recognition is fast (100–200 ms).

Suggests operation of a feed-forward process. (Still debated, but it’s agreed there’s not much time for feedback.)

Feed-forward process: a computation carried out one neural step after another, without need for feedback from a later stage

(5 frames/s)

47
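“Feed-forward” just means each stage transforms its input and passes the result onward, with no signals returning to earlier stages. A minimal sketch of such a stack (assuming NumPy; the weights are made up for illustration, not a model of cortex):

```python
import numpy as np

def relu(x):
    # simple nonlinearity applied at each stage
    return np.maximum(0, x)

def feed_forward(x, layers):
    """One pass through a stack of (weights, bias) stages: each stage's
    output is the next stage's input; there are no feedback connections."""
    for w, b in layers:
        x = relu(w @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),   # stage 1: 3 -> 4 units
          (rng.standard_normal((2, 4)), np.zeros(2))]   # stage 2: 4 -> 2 units
out = feed_forward(np.ones(3), layers)
```

The speed argument on this slide follows from the structure: with each neural stage taking roughly 10 ms, a 100–200 ms recognition time leaves room for only a handful of sequential stages and little time for iterative feedback loops.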

slide-48
SLIDE 48

Models of Object Recognition: the Pandemonium model

  • Oliver Selfridge’s (1959) simple model of letter recognition
  • Perceptual committee made up of “demons”
  • Demons loosely represent neurons
  • Each level is a different brain area
  • Pandemonium simulation: http://sites.sinauer.com/wolfe4e/wa04.02.html

48
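A toy version of the committee idea (the stroke features below are invented for illustration and are not Selfridge’s original feature list): feature demons report which strokes are present, each cognitive demon “shouts” in proportion to how many of its letter’s features it sees, and the decision demon picks the loudest shout.

```python
# Toy letter -> stroke-feature sets (an assumption for illustration).
LETTER_FEATURES = {
    "A": {"oblique_left", "oblique_right", "horizontal"},
    "H": {"vertical_left", "vertical_right", "horizontal"},
    "T": {"vertical_mid", "horizontal_top"},
}

def decision_demon(image_features):
    """Each cognitive demon's shout = fraction of its features present;
    the decision demon returns the loudest letter plus all shout levels."""
    shouts = {
        letter: len(feats & image_features) / len(feats)
        for letter, feats in LETTER_FEATURES.items()
    }
    return max(shouts, key=shouts.get), shouts

# feature demons have detected two verticals and a crossbar
letter, shouts = decision_demon({"vertical_left", "vertical_right", "horizontal"})
# letter == "H"; "A" shouts weakly (shared crossbar), "T" stays silent
```

The appeal of the scheme is graceful degradation: a noisy input still produces a ranking of shouts rather than an all-or-nothing match, much like the graded responses of real neurons.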

slide-49 through slide-51
SLIDES 49–51

Models of Object Recognition

(Image-only slides continuing “Models of Object Recognition”.)

slide-52
SLIDE 52
Models of Object Recognition

  • Hierarchical “constructive” models of perception: explicit description of how parts are combined to form a representation of the whole
  • Metaphor: “committees” forming consensus from a group of specialized members
  • perception results from the consensus that emerges

52

slide-53
SLIDE 53

Rapid progress in “deep learning” methods for object recognition & scene understanding

“Researchers Announce Advance in Image-Recognition Software” (NY Times, Nov 2014)
http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html

53

slide-54
SLIDE 54

Human: “A group of men playing Frisbee in the park.” Computer model: “A group of young people playing a game of Frisbee.”

Captioned by Human and by Google’s Experimental Program

54

slide-55
SLIDE 55

Human: “Three different types of pizza on top of a stove.” Computer: “A pizza sitting on top of a pan on top of a stove.”

Captioned by Human and by Google’s Experimental Program

55

slide-56
SLIDE 56

Human: “Elephants of mixed ages standing in a muddy landscape.” Computer: “A herd of elephants walking across a dry grass field.”

Captioned by Human and by Google’s Experimental Program

56

slide-57
SLIDE 57

Human: “A green monster kite soaring in a sunny sky.” Computer: “A man flying through the air while riding a snowboard.”

Captioned by Human and by Google’s Experimental Program

57