Sensory Integration Module Vision Seeing and Hearing Events - - PDF document


Seeing, Hearing, and Touching: Putting It All Together

Sensory Integration Module

  • Seeing and Hearing Events (Fisher)
  • Touching, Seeing, and Hearing (MacLean)
  • Integrating Applications: Tight Coupling & Physical Metaphors (MacLean)
  • Integrating Applications: Designing for Intimacy (Fels)

The morning talks gave a perspective on how vision science can be used to inform the design of visually complex interfaces such as those used in information visualization systems. The second half of the course looks at intersensory interactions and how they can inform the move to multimodal environments. These environments combine visual, auditory, and haptic displays with a richer set of inputs from users, including speech, gesture, and biopotentials.

Seeing and Hearing Events (Fisher) 2

Moving to multimodality

[Diagram labels: Vision; Hearing; Force & tactile feedback; Virtual interaction model]

Psychophysics of vision, sound, and touch will change when environment is multimodal

Force display technology works by using mechanical actuators to apply forces to the user. By simulating the physics of the user’s virtual world, we can compute these forces in real time and then send them to the actuators so that the user feels them.
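
As a rough illustration of the compute-then-actuate loop described above, the sketch below renders a one-dimensional virtual wall with a spring-like penalty force. The wall position, the stiffness value, and the names `wall_force` and `simulate` are illustrative assumptions, not part of the course material; a real force display would run a loop like this at a high, fixed update rate and pass each result to its actuators.

```python
def wall_force(position_m, wall_m=0.0, stiffness_n_per_m=500.0):
    """Return the force (N) pushing the user's probe out of a virtual wall.

    The wall occupies all positions below `wall_m`. Outside the wall the
    force is zero; inside, it grows linearly with penetration depth
    (a spring model, chosen here only for simplicity).
    """
    penetration = wall_m - position_m
    if penetration <= 0.0:
        return 0.0
    return stiffness_n_per_m * penetration


def simulate(positions_m):
    """Compute the force command for each sampled probe position."""
    return [wall_force(p) for p in positions_m]


if __name__ == "__main__":
    # Probe moves from free space (positive) into the wall (negative).
    samples = [0.02, 0.0, -0.01]
    for p, f in zip(samples, simulate(samples)):
        print(f"position={p:+.3f} m  force={f:.1f} N")
```

The spring model is only one possible choice; it is used here because the force depends on nothing but the current position, which keeps the per-sample computation trivial.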

Intersensory Interactions

  • Intro and metacognitive gap
  • Integrating Cognitive Science in design
  • Cognitive Architecture

– Modularity and multimodal interaction

  • Information hiding-- conflict resolution
  • Cognitive impenetrability
  • Performance differences between modules
  • Recalibration

– Spatial indexes in complex environments

  • Multimodal cue matching within modules

We begin with a justification for an increased role for theory in the design of these more complex interfaces. I will argue that the combination of a large design space and the structural inability of humans to introspect at the level of sensory and attentional processes makes conventional design techniques inadequate in these situations. This is followed by a brief discussion of the challenges of taking information from Psychology, Kinesiology, and other disciplines that fall under the broad banner of Cognitive Science into account when designing interactive applications.

Vision systems to multimodality

  • Ron: Vision systems and subsystems

– Pre-attentive vision (gist, layout, events)
– Attention (grab ~5 objects for processing)
– Combine for “virtual representation”

  • Extend system concept to modalities

– Some are similar across modalities
– Some are multimodal
– Some are task-dependent

Extending the visual perception studies described by Ron and applied by Tamara to multimodal interaction is conceptually simple, since vision is composed of separate channels that can be thought of as modalities. The move to multimodality is similar to the move to multiple channels, or systems in vision.


Extending to complex worlds

  • Lab studies: few events, visual or auditory
  • In contrast to multimodal interfaces

– Virtual worlds
– Augmented reality
– Ubiquitous computing

  • How are multiple multimodal events dealt with in the brain?

The previous studies looked at relatively simple environments by our standards (but complex from the standpoint of psychophysics!). How can we extend these methods to more complex environments?

Some systems are multimodal

Example: Cross-modal speech system

  • Reduces cognitive load

– Fast, effortless information processing

  • Near-optimal information integration between cues and sensory modalities

– Fuzzy logic cue integration
– Bayesian categorization

Modularity of processing has some advantages: fast, effortless processing of multiple sensory channels. It comes at the cost of a lack of cognitive control and a lack of access to early-stage representations.
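
One way to make “fuzzy logic cue integration” concrete is the multiplicative rule used in fuzzy-logical models of perception: each modality contributes a degree of support between 0 and 1 for a candidate category, the supports are multiplied, and the product is normalised across the alternatives. The sketch below assumes only two response alternatives and uses the hypothetical function name `integrate_two_cues`; it illustrates that family of models and is not necessarily the specific model used in the work described here.

```python
def integrate_two_cues(auditory_support, visual_support):
    """Probability of choosing category A over B given two cue supports.

    Each argument is the degree (0..1) to which that cue supports A;
    support for B is taken to be the complement of each value.
    """
    for_a = auditory_support * visual_support
    for_b = (1.0 - auditory_support) * (1.0 - visual_support)
    total = for_a + for_b
    if total == 0.0:
        # One cue fully supports A and the other fully supports B:
        # the rule gives no basis for a choice.
        return 0.5
    return for_a / total
```

Under this rule an uninformative cue (support 0.5) leaves the other cue's value unchanged, while two cues that agree reinforce each other beyond either value alone.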

Illusory conjunctions occur in artificial multimodal environments

Example: Movie theatre

  • The McGurk effect (face influences sound)

– Dubbed movie

  • The ventriloquist effect (vision captures sound location)

– Sound seems to come from actor

Another area where we have conducted research deals with a basic question in multimodal perception: how do the different sensory channels decide what stimuli get “matched” between vision, hearing, and touch? And, once matched, how are they integrated and resolved into a multimodal or transmodal percept? If, as we have maintained, much of our perceptual processing takes place in systems, stimulus matching and fusion have to take place in multiple systems in parallel. So how do they come up with the same answer?

Rensink

Feedback from higher-level areas allows a small number of proto-objects to be stabilized. [Note: These links may be related to (or even the same as?) the FINSTs in Brian Fisher’s talk]


Study: Pointing to sounds

  • Cognitive location good
  • Pointing shows visual capture

– Aware of visual and auditory locations, but point to visual

  • No effect of phoneme/viseme fit on ventriloquism
  • Slow recalibration of auditory space if offset constant

[Figure labels: “Ba”; What was sound? Where was source? (point or describe)]

Looking at the impact of this on perception in complex display environments, with multiple visual and auditory events that contain errors in location and category fit, generates some counter-intuitive findings. Motor performance is typically found to be less sensitive to illusions than cognitive processes; however, that does not seem to be the case for auditory localization. It seems that the dorsal system has a greater drive to combine visual and auditory events, leading to a greater tolerance for location errors for reaching than for voice measures. In pointing to visual targets in the presence of a visual distractor, we found few illusory errors when users were not allowed to see their hands (such as when using a head-mounted display), but closed-loop pointing had errors similar to those observed with vocal interactions. Giving them a visible cursor on the screen hurt performance, and adding a lag to the response of the cursor actually aided performance.

Our interpretation: 2 systems at work

  • Different multimodal systems solve feature assignment problem differently

– Motor system: high visual dominance
– Cognitive system: low visual dominance
– Phoneme/viseme mismatch doesn’t help

  • Vision can recalibrate spatial sound “map”


Attentional pointers in systems?

[Diagram labels: Cognitive processing; Action (motor space); Auditory localization; Visual localization]

If we look at the impact of these attentional tokens, or FINSTs, in multimodal perception in rich sensory environments, we can see two views of how they might work. The naïve view is that the information from two events is “tagged” by a FINST and is reassembled in cognition after the sensory processes have done their work.


Displays and multimodal perception

  • Immersive environments must display a complex multimodal world

– Virtual Reality must provide entire world
– Augmented Reality must blend with real world

  • Multimodal displays have errors

– Location of events is not precise (esp. in depth)
– Timing is not precise
– Graphics can be low-fidelity

  • What will be the impact of these errors on users?

Logically, we would assume that correspondence in time and location is critical, and that goodness-of-fit (how plausible the match is conceptually) might play a role as well. However, these are precisely the areas where errors in scene rendering, stereo conflicts, and poor synchrony might cause errors.

Disadvantage: Each module must solve feature assignment problem.

  • Modules can’t accept information from other modules: Information encapsulation
  • Different modules should have access to a different set of matching cues.
  • Illusory conjunctions can occur in multimodal environments:

– Phoneme perception: The McGurk effect
– Auditory localization: The ventriloquist effect

Another area where we have conducted research deals with a basic question in multimodal perception: how do the different sensory channels decide what stimuli get “matched” between vision, hearing, and touch? And, once matched, how are they integrated and resolved into a multimodal or transmodal percept? If, as we have maintained, much of our perceptual processing takes place in modules, stimulus matching and fusion have to take place in multiple modules in parallel. So how do they come up with the same answer?

Study: Impact of display errors on multimodal perception

  • Immersive environments typically have display errors

– Location of events is not precise
– Timing is not precise
– Graphics can be low-fidelity

  • As immersive environments add sound and touch, what will be the impact of these errors?

Logically, we would assume that correspondence in time and location is critical, and that goodness-of-fit (how plausible the match is conceptually) might play a role as well. However, these are precisely the areas where errors in scene rendering, stereo conflicts, and poor synchrony might cause errors.

Recalibration by pairing (Epstein, 1975)

  • Individual senses adapt to display
  • Sensory modalities calibrate each other: haptics, vision, sound

– Observed actions calibrate visual space (space constancy)
– Vision calibrates hearing for the location of a multimodal event
– Sound calibrates vision for the time of a multimodal event

  • Result is an after-effect: remapping of auditory (visual, haptic) space

One promising theory examines how the basic characteristics of space and time are compared between modalities in order to calibrate them against one another. This fundamentally depends on the consistencies of events in the real world. How will virtual worlds affect this process?
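
To make the idea of gradual cross-modal recalibration concrete, here is a toy sketch in which the auditory spatial map drifts toward a constant visual offset over repeated paired trials. The incremental update rule, the learning rate, and the name `recalibrate` are all assumptions made for illustration; they are not taken from Epstein (1975) or from the studies described here.

```python
def recalibrate(offset_deg, n_trials, rate=0.05):
    """Track the shift in the auditory spatial map over paired trials.

    `offset_deg` is the constant visual-minus-auditory location offset.
    On every trial the map shift moves a fraction `rate` of the remaining
    distance toward that offset. Returns the shift after each trial.
    """
    shift = 0.0
    history = []
    for _ in range(n_trials):
        shift += rate * (offset_deg - shift)
        history.append(shift)
    return history


if __name__ == "__main__":
    shifts = recalibrate(offset_deg=10.0, n_trials=100)
    print(f"shift after 1 trial: {shifts[0]:.2f} deg")
    print(f"shift after 100 trials: {shifts[-1]:.2f} deg")
```

In this toy model, the shift that remains once the visual stimulus is removed plays the role of the after-effect: sounds presented alone would still be localized with the accumulated offset.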


Impact of information encapsulation

  • Multimodal environment with errors in timing and location
  • The same event might give rise to a single multimodal construct in one task, and two unimodal events for another.

– Vary location of visual and auditory phonemes in a simple teleconferencing-style video display
– Vary information carried by using synthetic speech stimuli (5 levels).

We can intentionally vary these errors to see how observers respond.

Ventriloquism meets the McGurk effect.

  • Vary location of visual and auditory phonemes in a simple teleconferencing-style video display
  • Vary information carried by using synthetic speech stimuli (5 levels).
  • Subjects report sound location and syllable heard.
  • Analyses included testing a variety of mathematical models of information integration by fitting free parameters with STEPIT.

We intentionally introduce errors in category fit by using ambiguous synthetic speech, and errors in location by using multiple loudspeakers hidden behind a curved screen, so we can vary space and fit at will. This is a much more complex set of stimuli than is typically used in perception research, but our interests are in rich sensory environments, not psychophysics and sensory primitives. The data that this kind of test produces are difficult to analyze with conventional Psychology statistics such as ANOVA. Instead, we use a model-fitting analysis.
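
The model-fitting step can be sketched as follows: pick a candidate integration model, then search for the free-parameter values that minimise the deviation between predicted and observed response proportions. STEPIT itself is not reproduced here; a crude grid search over a single free parameter stands in for it. The multiplicative model, the five auditory support levels, and the “observed” proportions are all made up for illustration.

```python
import math


def predict(a, v):
    """Multiplicative integration of auditory (a) and visual (v) support."""
    num = a * v
    return num / (num + (1.0 - a) * (1.0 - v))


def rmsd(observed, predicted):
    """Root-mean-square deviation between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_visual_support(auditory_levels, observed, steps=99):
    """Grid-search the single free parameter v that minimises RMSD."""
    best_v, best_err = None, float("inf")
    for i in range(1, steps + 1):
        v = i / (steps + 1)  # candidate values 0.01 .. 0.99
        err = rmsd(observed, [predict(a, v) for a in auditory_levels])
        if err < best_err:
            best_v, best_err = v, err
    return best_v, best_err


# Made-up data: five auditory levels and one subject's response
# proportions, generated here from the model itself with v = 0.7.
LEVELS = [0.1, 0.3, 0.5, 0.7, 0.9]
OBSERVED = [predict(a, 0.7) for a in LEVELS]

if __name__ == "__main__":
    v, err = fit_visual_support(LEVELS, OBSERVED)
    print(f"best-fitting visual support: {v:.2f} (RMSD {err:.4f})")
```

Fitting each subject separately in this way yields one set of parameter values per person, and competing models can be compared by how small a deviation each achieves on the same data.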

Use of mathematical modeling tools allows us to address

  • Sensory input from a number of channels simultaneously
  • How stimuli from multiple channels are matched and partitioned into mental representations
  • How information from multiple senses is integrated to give rise to trans-modal mental events

Modeling lets us extend our experiments to more realistic situations.

Results:

  • Visual capture of auditory source location, resulting in a shifting of unimodal auditory location estimation (ventriloquism after-effect).
  • No effect of location difference on phoneme perception as measured by statistical or modeling tests.
  • No correlation between errors in the two tasks (i.e. subjects could not selectively attend to the auditory phoneme on trials when visual capture failed).
  • Overall, modularity of phoneme perception is supported.

The results supported our hypotheses, but also showed promise in another way-- each user who ran in the test generated an individual set of values that describe their unique set of weightings of the perceptual stimuli within the context of the more complex environment and task.


Changes in task interact with modules

  • 2 visual systems: “ventral stream” for recognition and “dorsal stream” for action.
  • Where vs how
  • Different impact of illusions
  • Lesion data

One set of modules that is difficult to understand conceptually comes from the neuroanatomy of vision. Research has shown that there are separate brain areas that support motor performance and scene understanding. In mammals, a phylogenetically older dorsal visual system deals primarily with motor activities. It is a relatively inaccurate system, but it is robust to changes in head, eye, and body positions, and has access to proprioceptive information to aid in coordinating perception with action. A second, ventral visual system deals with small-field operations that require superior precision. While it allows for fine discrimination between stimuli, it sacrifices the proprioceptive input, the eye, head, and body position information, and the ability to coordinate unseen body parts with visual information.

Functional Neuroanatomy of perception for action.

2 visual systems: “ventral stream” for cognition and “dorsal stream” for motor performance.

2 visual systems lesion evidence

lesion                       | performance deficits                                     | spared abilities
V1 (blindsight)              | detection and identification                             | pointing
Ventrolateral occipital (DF) | identification, shape recognition, object orientation    | object manipulation (orientation matching, grip scaling)
Posterior parietal (RV)      | object manipulation (orientation matching, grip scaling) | identification, shape recognition, object orientation

Evidence from brain-damaged patients supports this dissociation in humans.

2 visual system illusions

stimuli                     | deficits                                   | spared abilities
Titchener circles           | size report                                | grip scaling
displacement during saccade | detection of displacement, location report | pointing
Moving or off-centre frame  | induced motion, location report            | pointing


Study: Pointing in large displays (Po)

  • Tell me where the target is
  • Point with no feedback
  • Point with visual feedback (cursor)
  • Point with delayed visual feedback


Findings

1. Can you tell if a target is on the left or right?

  • 3 out of 7 males, 7 out of 7 females made errors

2. Can you point to it with no visual feedback?

  • 6 out of 10 who failed #1 were correct

3. Are you better with a (simulated) laser pointer?

  • Out of 6 who point accurately in 2, all fail

4. Will pointing accuracy be affected if visible pointer lags pointing?

  • 3 of the 6 who failed #3 succeed

All results predicted by 2 visual systems hypothesis

Research with videoconferencing and abstract displays

  • Targeting sound: cognitive better than motor

– Subjects aware of visual and auditory locations, but point to visual

  • Targeting vision with context: Less feedback is better

– Pointing with no visual feedback better
– Lagged cursor better than unlagged


Interpreting pointing studies

  • Pointing studies counterintuitive, but predicted by response characteristics of neurons in dorsal/ventral systems to visual and auditory stimuli

See our Smart Graphics 03 paper for more on this study.


Extending to complex worlds

  • Previous studies in simple worlds, with a few visual and auditory events
  • Multimodal environments are complex

– Virtual worlds
– Augmented reality
– Ubiquitous computing

  • How are multiple multimodal events dealt with in the cognitive architecture?

The previous studies looked at relatively simple environments by our standards (but complex from the standpoint of psychophysics!). How can we extend these methods to more complex environments?

Indexical cognition (Pylyshyn)


Mental representations of complex worlds

  • Cognitive architecture perspective requires that links be established between lower level perceptual qualities and cognitive symbols, i.e. a pointer, called a FINST.
  • FINSTing allows us to interact with perceptual objects and events without the need for mental images per se.
  • Symbolic representation + pointers makes different predictions than intuitive picture-in-the-head

As Ron demonstrated for you earlier, our mental models of complex scenes are not as complete as we think. He suggested that much of what we think of as our mental representation is actually in the world, and we sample from it in real time as needed. The FINST hypothesis is one theory about how we might do that.

Indexical cognition (Pylyshyn)

According to this theory, we have a limited number of places in the scene that receive a high level of processing. The rest of the scene is processed to a much more limited extent, and if a change is masked as in Ron’s demos, it will go unnoticed.


Naïve view of FINSTs in Cognitive Arch

[Diagram labels: Phoneme perception; Voice recognition; Auditory localization; Cognitive processing; Action (motor space); FINSTs]

If we look at the impact of these attentional tokens, or FINSTs, in multimodal perception in rich sensory environments, we can see two views of how they might work. The naïve view is that the information from two events is “tagged” by a FINST and is reassembled in cognition after the sensory processes have done their work.

Another view of FINSTs

[Diagram labels: Phoneme perception; Voice recognition; Auditory localization; Cognitive processing; Action (motor space); FINSTs]

An alternative view would avoid the assembly process, and simply use the labels.

Multimodal representations are virtual

All modalities store little info in memory: instead they take up information as needed

– Vision: attention, eye, head and body movements change view
– Haptics: active exploration of space with hands
– Hearing: uses body and head movements to localize sound and improve quality

Mental representations of complex environments

  • Cognitive architecture perspective requires that links be established between lower level perceptual qualities and cognitive symbols, i.e. a pointer, called a FINST.
  • FINSTing allows us to interact with perceptual objects and events without the need for mental images per se.
  • Symbolic representation + pointers makes different predictions than intuitive picture-in-the-head
  • Coping with spatial transformations in complex data spaces

I will conclude with a review of key aspects of the talk, and then ask for questions.


More about FINSTs

  • FINSTs Link mind & perceptual world

– Visual routines: (collinear, inside, subitizing)
– History of an object
– Object-centred, “sticky”
– Drawn to salient changes: onsets, luminance increments, oddballs
– Finite number ~ 4-7
– FINSTs + ANCHORs for motor behaviour

Whichever model is true, there are some repercussions to FINSTing an object.

More about ANCHORs

  • ANCHORs link mind & action

– Remembered locations for eye movements
– Direct interaction with items off the retina
– Fast, robust motor performance by action routines
– Affordances for action

A second mechanism that we will not be able to spend much time on is a pointer in motor space called an ANCHOR. This mediates skilled motor performance by offloading much of the task to low-level perceptuo-motor routines.

Multimodal events support adaptation

  • Individual senses adapt to display
  • Modalities use multimodal events for cross-calibration

– Observed actions calibrate visual space
– Vision calibrates sound location
– Sound calibrates vision for time

  • Result includes after-effect: a remapping of perceptual space

(Epstein, 1975)

One promising theory examines how the basic characteristics of space and time are compared between modalities in order to calibrate them against one another. This fundamentally depends on the consistencies of events in the real world. How will virtual worlds affect this process?

Research question: Role of focal attention?

Are attentional resources shared between senses?

  • Will adding sound and haptics impact visual attention?

– Or, will it offload processing from vision?

  • Does a shift in one modality cause complementary attention shifts in others?
  • Does recalibration require attention?

As the need grows for interfaces that make extremely efficient use of limited perceptual resources, sharing of attention becomes something we need to understand better. There has been quite a bit of study of attentional distribution within vision, less with audition, and virtually none with touch. Even less studied is attention as shared among senses. If, for example, we plan to offload the visual sense by delivering information haptically, we had better know whether this transfer of work will actually reduce the total attention required or make the situation even worse. A group at UBC is working on this problem right now.


Research Topic: Pointers for action?

  • Attentional pointers link mind and world
  • Do “action pointers” link mind & muscles?

– Remembered locations for eye movements
– Direct interaction with items off the retina
– Fast, robust motor performance by action routines

A second mechanism that we will not be able to spend much time on is a pointer in motor space called an ANCHOR. This mediates skilled motor performance by offloading much of the task to low-level perceptuo-motor routines.

Research Topic: Individual differences

  • Perceptual rules are the same
  • Impacts differ over time and for individuals

– e.g. sensitivity to stereo depth & spatial sound cues
– Ability to adapt to new cue combinations

  • Perceptual customization may help

– For individuals: “personal equation” for interaction
– In real time, through attentive computing

The “personal equation” was an invention of Friedrich Bessel, who died in 1846. Modern astronomy of precision is essentially Bessel’s creation. In astronomy, the personal equation is the amount by which a measurement made by a particular individual differs from a standard (usually the mean of other observers’ measurements). It is essentially a fudge factor that compensates for the characteristic deviations of a particular individual. This controls for the constant part of measurement error (between-subject error), leaving trial-by-trial measurement errors (within-subject error). The concept of a personal equation was an important component of Wundt’s Psychophysics. A personal equation of interaction can be thought of as solving the personal equation for the individual: instead of modifying the measurement to better match objective reality, we modify reality (or its simulation) to better fit the individual’s perceptual, attentive, and cognitive characteristics.
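
The arithmetic behind the astronomical personal equation is simple enough to sketch. The example below assumes the standard is the mean of the other observers' measurements, as described above; the function names and the sample values are hypothetical.

```python
from statistics import mean


def personal_equation(own, others):
    """Constant offset of one observer relative to the standard.

    `own` holds this observer's measurements of a quantity; `others`
    holds the other observers' measurements of the same quantity,
    whose mean serves as the standard.
    """
    return mean(own) - mean(others)


def corrected(own, offset):
    """Subtract the constant offset, leaving trial-by-trial error."""
    return [x - offset for x in own]


if __name__ == "__main__":
    own = [10.3, 10.1, 10.2]      # made-up measurements
    others = [10.0, 9.9, 10.1]    # made-up measurements
    offset = personal_equation(own, others)
    print(f"personal equation: {offset:+.2f}")
    print("corrected:", [round(x, 2) for x in corrected(own, offset)])
```

A personal equation of interaction would apply an offset of this kind in the opposite direction: rather than correcting the observer's reports, the display itself would be adjusted for that individual.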


Module disadvantages

  • Coordination

– Distortions in location, timing, and category-relevant information may lead to the formation of conflicting representations in different modules.

  • Processing inflexibility

– Errors and conflicts within a module can create errors and increase cognitive load. (CRT flicker example)

  • Information hiding

– Cognitive impenetrability of modules makes it difficult for operators to determine the reasons for their poor performance.


Future challenges

  • Perception, cognition, & action in multimodal environments with many events and actors
  • Applications in entertainment, cognition, communication
  • Blend of virtual and real spaces… with seams

– Are the rules consistent?
– Can users shift between them?
– Can frames support rule shifts?

Thus, large screen, multimodal and virtual environments, augmented reality, ubiquitous computing etc. pose difficult problems for designers.


Opportunities for creative design

  • Environments: Affordances for exploration

– Spatial cognition, human space constancy theory

  • Support for creative & logical thinking

– Problem solving, embodied cognition models

  • Media-based communication & collaboration

– Metacognition, distributed cognition

  • Experience (Kansei) engineering: Moving beyond usability

What to expect in the next talk

  • More on haptics
  • Other senses

– Neuromuscular, GSR, heart rate, brain, other biopotentials

  • Applications

– Displays, input, & sensing technologies
– Design examples
– Virtual environments

  • Communicating human experience: information, emotion, environment

– Intimacy and embodiment
– Sources of aesthetics