Perceptive Context. Trevor Darrell, Vision Interface Group, MIT CSAIL. (PowerPoint presentation)



SLIDE 1

Perceptive Context

Trevor Darrell Vision Interface Group MIT CSAIL

SLIDE 2

Perceptive Context

Awareness of the User -- Visual Conversation Cues: Interfaces (kiosks, agents, robots…) are currently blind to users…machines should be aware of presence, pose, expression, and non-verbal dialog cues…

Awareness of the Environment -- Perceptive Devices: Mobile devices (cellphones, PDAs, laptops) bring computing and communications with us wherever we go, but they are blind to their environment…they should be able to see things of interest in the environment just as we do…

SLIDE 3

Today

  • Visually aware conversational interfaces (“read my body language!”)
  • head modeling and pose estimation
  • articulated body tracking
  • Mobile devices that can see their environment (“what’s that thing there?”)
  • mobile location specification
  • image-based mobile web browsing
SLIDE 4

Head modeling and pose tracking

SLIDE 5

3D Head Pose Tracker

[Diagram: a stereo camera provides intensity and range images; rigid stereo motion estimation aligns the current frame to a reference frame]
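The rigid stereo motion estimation named on this slide can be sketched as a least-squares rigid alignment of corresponding 3D points from the reference and current frames (the classic SVD/Kabsch solution). This is an illustrative sketch under that assumption, not the tracker's published algorithm; `rigid_motion` and the given-correspondences setup are hypothetical.

```python
import numpy as np

def rigid_motion(P, Q):
    """Least-squares rigid transform (R, t) aligning points P to Q.

    P, Q: (N, 3) arrays of corresponding 3D points (e.g. from the
    reference and current stereo frames).  Returns R (3x3) and t (3,)
    such that Q ~= P @ R.T + t.  Standard Kabsch/SVD solution."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)      # centroids
    H = (P - cP).T @ (Q - cQ)                    # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Composing this frame-to-frame (or frame-to-reference, as the slide suggests) gives the rigid head pose trajectory.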

SLIDE 6

Face aware interfaces

  • Agent should know when it’s being attended to
  • Turn-taking discourse cues: who is talking to whom?
  • Model attention of user
  • Agreement: head nod and shake gestures
  • Grounding: shared physical reference
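The attention-gated speech recognition shown in the SAM demo frames below could be sketched like this: assume the pose tracker reports head yaw/pitch relative to the agent, and toggle ASR with a small dwell counter to avoid flicker. The class name, angular threshold, and dwell length are all assumptions, not values from the talk.

```python
import math

# Assumed parameters (not from the slides).
ATTENTION_CONE_DEG = 15.0   # gaze within 15 degrees of the agent counts as "looking"
DWELL_FRAMES = 5            # frames of sustained change before toggling ASR

class GazeGatedASR:
    """Hypothetical sketch of the SAM demo's gating: speech recognition
    is enabled only while the user faces the agent."""

    def __init__(self):
        self.asr_on = False
        self.count = 0

    def update(self, yaw_deg, pitch_deg):
        facing = math.hypot(yaw_deg, pitch_deg) < ATTENTION_CONE_DEG
        if facing == self.asr_on:
            self.count = 0                # gaze agrees with current state
        else:
            self.count += 1               # sustained disagreement...
            if self.count >= DWELL_FRAMES:
                self.asr_on = facing      # ...commits the toggle
                self.count = 0
        return self.asr_on
```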
SLIDE 7

Face cursor

SLIDE 8

Subject not looking at SAM: ASR turned off. [Still: SAM, pose tracker]

Face-responsive agent

SLIDE 9

Subject looking at SAM: ASR turned on. [Still: SAM, pose tracker]

Face-responsive agent

SLIDE 10

Subject not looking at SAM: ASR turned off. [Still: SAM, pose tracker]

Face-responsive agent

SLIDE 11

Subject looking at SAM: ASR turned on. [Still: SAM, pose tracker]

Face-responsive agent

SLIDE 12

Subject looking at SAM: ASR turned on. [Still: SAM, pose tracker]

Face-responsive agent

  • General conversational turn-taking
  • Agreement (Nod/Shake)
  • Grounding / Object reference…
SLIDE 13

[Figure panels: room, range, foreground, plan view]

Room tracking for Location Context

Location is an important cue for pervasive computing applications…

  • Location context should provide a finer scale cue than room-ID, but more abstract than 3-space position and orientation.
  • Regions (“zones”) should be learned from observing actual user behavior.

SLIDE 14

[Figure panels: room, range, foreground, plan view]

Learning activity zones

[Diagram: motion clustering produces activity zones; the zone map is formed from observing user behavior]
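The motion-clustering step could be approximated as k-means over tracked plan-view floor positions. This is a toy stand-in, not necessarily the method the UbiComp'03 system actually used; `learn_zones` is a hypothetical helper.

```python
import numpy as np

def learn_zones(tracks, k, iters=50, seed=0):
    """Cluster plan-view floor positions into k activity zones.

    tracks: (N, 2) array of (x, y) positions accumulated from the person
    tracker.  Plain k-means, initialized from random observations.
    Returns (zone centers, per-observation zone labels)."""
    rng = np.random.default_rng(seed)
    centers = tracks[rng.choice(len(tracks), size=k, replace=False)]
    for _ in range(iters):
        # assign each observation to its nearest zone center
        d = np.linalg.norm(tracks[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned observations
        for j in range(k):
            pts = tracks[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels
```

At runtime, the zone containing the user's current position would then select the application context.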
SLIDE 15

[Figure panels: room, range, foreground, plan view]

Using activity zones

[Diagram: activity zones; the current zone (e.g. “zone 4 prefs”) determines application context]

[Koile, Darrell, et al., UbiComp 2003]

SLIDE 16

Articulated pose sensing

SLIDE 17

Model-based Approach

[Diagram: ICP with articulation constraint aligns the articulated model to the depth image]

  • 1. Find closest points
  • 2. Update poses
  • 3. Constrain…
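Steps 1 and 2 above can be sketched for a single rigid part; the articulated tracker runs this per body part and then projects the per-part poses onto the joint constraints (step 3, omitted here). A toy sketch with brute-force matching; `icp_step` is an assumed name, not the system's API.

```python
import numpy as np

def icp_step(model, scene):
    """One ICP iteration for a single rigid part (illustrative sketch).

    1. find the closest scene point for each model point (brute force);
    2. solve the least-squares rigid update by SVD (Kabsch).
    model, scene: (N, 3) and (M, 3) point arrays."""
    d = np.linalg.norm(model[:, None, :] - scene[None], axis=2)
    matched = scene[d.argmin(axis=1)]           # step 1: correspondences
    cm, cs = model.mean(axis=0), matched.mean(axis=0)
    U, _, Vt = np.linalg.svd((model - cm).T @ (matched - cs))
    sign = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T  # step 2: rigid update
    t = cs - R @ cm
    return model @ R.T + t
```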
SLIDE 18

Interactive Wall

SLIDE 19

Multimodal studio

SLIDE 20

Articulated Pose from a single image?

The model-based approach is difficult with more impoverished observations:

  • contours
  • edge features
  • texture
  • (noisy stereo…)

Hard to fit a single image reliably! This motivates an example-based learning paradigm.

SLIDE 21

Example-based matching

  • Match 2-D features against large corpus of 2-D to 3-D example mappings
  • Fast hashing for approximate nearest neighbor search
  • Feature selection using paired classification problem
  • Data collection: use motion capture data, or exploit synthetic (but realistic) models
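The fast-hashing idea above can be illustrated with plain random-hyperplane locality-sensitive hashing. Parameter-sensitive hashing additionally selects hash functions (via the paired classification problem) so that bucket collisions correlate with closeness in pose space; that selection step is omitted in this sketch, and all function names are assumptions.

```python
import numpy as np

def hash_keys(X, planes):
    """Binary codes from random hyperplanes (plain LSH).
    X: (N, D) feature vectors; planes: (B, D) random normals."""
    bits = (X @ planes.T) > 0
    return [tuple(row) for row in bits]

def build_table(X, planes):
    """Index each example by its hash key."""
    table = {}
    for i, key in enumerate(hash_keys(X, planes)):
        table.setdefault(key, []).append(i)
    return table

def query(x, X, planes, table):
    """Approximate nearest neighbor: scan only the query's bucket,
    falling back to brute force when the bucket is empty."""
    key = hash_keys(x[None, :], planes)[0]
    cand = list(table.get(key, range(len(X))))
    d = np.linalg.norm(X[cand] - x, axis=1)
    return cand[int(d.argmin())]
```

In the pose application, the retrieved example's stored 3-D pose parameters would serve as the estimate for the 2-D query features.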

SLIDE 22

Parameter-sensitive hashing

SLIDE 23

2D->3D with Parameter-sensitive hashing

SLIDE 24

Today

  • Visually aware conversational interfaces -- read my body language!
  • head modeling and pose estimation
  • articulated body tracking
  • Mobile devices that can see their environment -- what’s that thing there?
  • mobile location specification
  • image-based mobile web browsing
SLIDE 25

Physical awareness

How can a device be aware of what the user is looking at?

SLIDE 26

Physical awareness

Asking a friend, “What’s this?”

[Diagram: the user photographs the MIT dome and asks a human expert, “What is this?”]

SLIDE 27

Instead, use a CBIR (Content-based Image Retrieval) system:

[Diagram: the user sends “What is this?” with a photo to the CBIR system, which answers with a URL, http://mit.edu/..]

IDeixis

SLIDE 28
  • Use image (or video) query to database.
  • For place recognition, many current matching methods can be successful:
  • PCA
  • Global orientation histograms [Torralba et al.]
  • Local features (affine-invariant detectors/descriptors [Schmid], SIFT [Lowe], etc.)

…where to get the database?

CBIR: Content-based Image Retrieval
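A toy version of the global-descriptor idea: compare gradient-orientation histograms by histogram intersection. This is a minimal sketch in the spirit of the methods cited above, not Torralba et al.'s actual descriptor; the function names are assumptions.

```python
import numpy as np

def orientation_hist(img, bins=8):
    """Global gradient-orientation histogram of a grayscale image,
    weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                    # orientation in [-pi, pi]
    h, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    s = h.sum()
    return h / s if s else h

def best_match(query_img, database):
    """Index of the database image whose histogram overlaps the
    query's the most (histogram intersection)."""
    q = orientation_hist(query_img)
    scores = [np.minimum(q, orientation_hist(d)).sum() for d in database]
    return int(np.argmax(scores))
```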

SLIDE 29

The Web

  • Many location images can be found on the web
SLIDE 30

First Prototype

  • 1. Take an image
  • 2. Send image via MMS
  • 3. View search result (matching location images)
  • 4. Browse a relevant webpage
SLIDE 31

Images -> keywords (-> images)

  • Hard to compile an image database of entire web!
  • But given matches in subset of web:
  • Extract salient keywords
  • Keyword-based image search
  • Apply content-based filter to keyword-matched pages
  • And/or allow direct keyword search
  • Weighted term/bigram frequency sufficient for early experiments…
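The weighted term/bigram frequency scheme could look like this minimal sketch: count unigrams and bigrams over the text of the image-matched pages, drop stopwords, and keep the most frequent. The stopword list and equal per-page weighting are assumptions; the real system may weight pages by match score.

```python
import re
from collections import Counter

# Assumed minimal stopword list (not from the talk).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for", "at"}

def salient_keywords(pages, top_n=5):
    """Most frequent terms and bigrams across a list of page texts,
    after stopword removal."""
    counts = Counter()
    for text in pages:
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOPWORDS]
        counts.update(words)
        counts.update(" ".join(b) for b in zip(words, words[1:]))
    return [term for term, _ in counts.most_common(top_n)]
```

The returned keywords can then seed a keyword-based image search, with a content-based filter applied to the keyword-matched pages.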

SLIDE 32

Bootstrap image web search

[Diagram: bootstrap loop between a bootstrap image database, CBIR matching, extracted keywords (e.g. “Eiffel Tower”), and keyword search over the web, with CBIR filtering of the returned results (steps 1-4)]
SLIDE 33

Advantages

  • Recognizing distant location (by taking photo)
  • Infrastructure free (by using the web)
  • Large-scale image-based web search (by bootstrapping keywords)
  • With advances in segmentation, can apply to many other object recognition problems: mobile signs, appliances, product packaging

SLIDE 34

Visual Interfaces and Devices

Interfaces (kiosks, agents, robots…) are currently blind to users…machines should be aware of presence, pose, expression, and non-verbal dialog cues…

Mobile devices (cellphones, PDAs, laptops) bring computing and communications with us wherever we go, but they are blind to their environment…they should be able to see things of interest in the environment just as we do…

SLIDE 35

Acknowledgements

David Demirdjian, Kimberlie Koile, Louis Morency, Greg Shakhnarovich, Mike Siracusa, Konrad Tollmar, Tom Yeh, and many others…

SLIDE 36

END