SLIDE 1

Daily Activity Recognition Combining Gaze Motion and Visual Features

Yuki Shiga, Takumi Toyama, Yuzuko Utsumi, Andreas Dengel, Koichi Kise

SLIDE 2

Outline

  • Introduction
  • Proposed Method
  • Experiment
  • Conclusion

SLIDE 3

Outline

  • Introduction
  • Proposed Method
  • Experiment
  • Conclusion

SLIDE 4

Focus

  • Activity recognition draws public attention
  • We focus on vision-based and gaze motion-based methods
  • These methods deal with activities that involve eye movements

[Diagram: gaze motion and vision]

SLIDE 5

Eye Tracker

  • An eye tracker is useful for recognizing activities that involve eye movements
  • It records a scene video as well as the gaze position data

[Figure: scene image and gaze position (where the user fixates)]

SLIDE 6

Related Works

  • Gaze motion-based activity recognition:
    Bulling et al., "Eye movement analysis for activity recognition using electrooculography." [1]
  • Vision-based activity recognition:
    Hipiny and Mayol-Cuevas, "Recognising Egocentric Activities from Gaze Regions with Multiple-Voting Bag of Words." [2]

Each of these works uses only a single modality (gaze motion or vision).

[1] Bulling, Andreas, Ward, Jamie A., Gellersen, Hans, and Tröster, Gerhard. Eye movement analysis for activity recognition using electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4), 2011, 741-753.
[2] Hipiny, I. M. and Mayol-Cuevas, W. Recognising Egocentric Activities from Gaze Regions with Multiple-Voting Bag of Words. Technical Report CSTR-12-003, 2012.

SLIDE 7

Purpose

An activity can be expressed by "how the eyes move" and also by "what the eyes see". We use both the vision-based and the gaze motion-based modality for activity recognition.

SLIDE 8

Purpose

  • Propose a method combining the gaze motion-based method and the vision-based method
  • Verify the hypothesis: combining vision and gaze motion improves the recognition of activities that involve eye movements

SLIDE 9

Outline

  • Introduction
  • Proposed Method
  • Experiment
  • Conclusion

SLIDE 10

Overview

[Pipeline diagram: the eye tracker records gaze points and scene images → a gaze motion feature and a visual feature are extracted → each feature is fed to its own classifier → the two classifier outputs are fused → result]

SLIDE 11

Overview

[Pipeline diagram: the eye tracker records gaze points and scene images → a gaze motion feature and a visual feature are extracted → each feature is fed to its own classifier → the two classifier outputs are fused → result]

SLIDE 12

Gaze Motion Feature

  • The method proposed by Bulling et al. [1]
  • Gaze data is split into fixations and saccades
  • Each saccade is converted into a symbol representing its size and direction, e.g. "R r r r R L L r r r R R"
  • Statistical features are computed from the symbol sequence with an n-gram method (see the sketch below)
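To make the feature concrete, here is a minimal Python sketch of the symbol-string idea, assuming the gaze data has already been segmented into saccade displacements. The four-letter alphabet, the single size threshold, and the helper names are illustrative assumptions; the original encoding by Bulling et al. [1] uses a richer alphabet.

```python
from collections import Counter

def encode_saccade(dx, dy, large_threshold=100):
    """Map one saccade displacement (pixels) to a direction/size symbol.
    The single threshold and the 4-letter alphabet are assumptions."""
    if abs(dx) >= abs(dy):
        symbol = "r" if dx >= 0 else "l"   # dominant horizontal movement
    else:
        symbol = "d" if dy >= 0 else "u"   # dominant vertical movement
    if max(abs(dx), abs(dy)) >= large_threshold:
        symbol = symbol.upper()            # upper case = large saccade
    return symbol

def ngram_counts(symbols, n=2):
    """Statistical feature: counts of n-grams over the symbol sequence."""
    grams = ["".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    return Counter(grams)

# Example: saccade displacements between consecutive fixations.
saccades = [(120, 5), (30, -2), (25, 1), (-140, 3), (-20, 0), (35, 2)]
symbols = [encode_saccade(dx, dy) for dx, dy in saccades]
print("".join(symbols))            # "RrrLlr"
print(ngram_counts(symbols, n=2))  # e.g. Counter({'Rr': 1, 'rr': 1, ...})
```

A fixed-length feature vector can then be formed by counting a fixed vocabulary of n-grams over a window of gaze samples.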

SLIDE 13

Overview

[Pipeline diagram: the eye tracker records gaze points and scene images → a gaze motion feature and a visual feature are extracted → each feature is fed to its own classifier → the two classifier outputs are fused → result]

SLIDE 14

Visual Feature

Crop a region around the gaze point to remove irrelevant regions

SLIDE 15

Visual Feature

Crop a region around the gaze point to remove irrelevant regions
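As a rough sketch of this step, the following Python/NumPy snippet crops a fixed-size window centred on the gaze point. The 300 × 300 window size is taken from the experiment conditions on a later slide; clamping the window to the image borders is an assumption.

```python
import numpy as np

def crop_around_gaze(frame, gaze_x, gaze_y, size=300):
    """Crop a size x size window centered on the gaze point,
    clamping the window so it stays inside the image (assumption)."""
    h, w = frame.shape[:2]
    half = size // 2
    x0 = int(np.clip(gaze_x - half, 0, max(w - size, 0)))
    y0 = int(np.clip(gaze_y - half, 0, max(h - size, 0)))
    return frame[y0:y0 + size, x0:x0 + size]

# Example with the scene camera resolution used in the experiments (1280 x 960).
frame = np.zeros((960, 1280, 3), dtype=np.uint8)
patch = crop_around_gaze(frame, gaze_x=640, gaze_y=480)
print(patch.shape)  # (300, 300, 3)
```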

SLIDE 16

Local Feature Extraction

  • Interest points are obtained by dense sampling
  • Local features (PCA-SIFT) are extracted from each point (a sketch follows below)
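A possible implementation sketch with OpenCV and scikit-learn is shown below. OpenCV does not ship PCA-SIFT, so this sketch approximates it by computing standard SIFT descriptors at densely sampled grid points and projecting them with PCA; the grid step and the number of PCA components are assumptions, not values from the paper.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA

def dense_keypoints(image, step=16, size=16):
    """Interest points by dense sampling: a regular grid of keypoints."""
    h, w = image.shape[:2]
    return [cv2.KeyPoint(float(x), float(y), size)
            for y in range(step // 2, h, step)
            for x in range(step // 2, w, step)]

def dense_sift(gray):
    """One 128-D SIFT descriptor per grid point."""
    sift = cv2.SIFT_create()
    _, desc = sift.compute(gray, dense_keypoints(gray))
    return desc

# PCA-SIFT stand-in: reduce the 128-D descriptors with PCA (36 dims assumed).
gray = cv2.cvtColor(np.random.randint(0, 256, (300, 300, 3), np.uint8),
                    cv2.COLOR_BGR2GRAY)
descriptors = dense_sift(gray)
pca = PCA(n_components=36).fit(descriptors)
local_features = pca.transform(descriptors)
print(local_features.shape)  # (num_grid_points, 36)
```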

SLIDE 17

Convert to Global Feature

  • Learning images: local features are clustered with k-means into k centroids (visual words)
  • Test image: each local feature is assigned to its nearest visual word by nearest neighbor search
  • The resulting word occurrences form the global feature (see the sketch below)
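The conversion can be sketched with scikit-learn as below. The vocabulary size k and the histogram normalization are assumptions; the slide itself only specifies k-means clustering and nearest-neighbor assignment.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_visual_words(train_descriptors, k=200):
    """Cluster local features from the learning images into k centroids
    (visual words); k=200 is an assumed vocabulary size."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

def global_feature(descriptors, vocabulary):
    """Assign each local feature to its nearest visual word and build a
    normalized occurrence histogram as the global feature."""
    words = vocabulary.predict(descriptors)           # nearest-neighbor search
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Example with random descriptors standing in for PCA-SIFT features.
rng = np.random.default_rng(0)
vocabulary = learn_visual_words(rng.random((5000, 36)))
feature = global_feature(rng.random((361, 36)), vocabulary)
print(feature.shape)  # (200,)
```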

SLIDE 18

Overview

[Pipeline diagram: the eye tracker records gaze points and scene images → a gaze motion feature and a visual feature are extracted → each feature is fed to its own classifier → the two classifier outputs are fused → result]

SLIDE 19

Classifier

  • SVM with probability estimation (see the sketch below)
  • Two classifiers are trained, one for visual features and one for gaze motion features

[Figure: labeled feature vectors (Read, Write, Type, ...) used for learning]
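A sketch of this step with scikit-learn follows. SVC(probability=True) gives the per-class probability estimates (via Platt scaling) that the fusion step needs; the RBF kernel, the feature dimensions, and the toy data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

ACTIVITIES = ["Watch", "Write", "Read", "Type", "Chat", "Walk"]

# Toy data standing in for the real gaze motion and visual feature vectors.
rng = np.random.default_rng(0)
X_gaze, X_visual = rng.random((120, 50)), rng.random((120, 200))
y = rng.integers(0, len(ACTIVITIES), 120)      # activity labels

# One SVM per modality, with probability estimation enabled.
clf_gaze = SVC(kernel="rbf", probability=True).fit(X_gaze, y)
clf_visual = SVC(kernel="rbf", probability=True).fit(X_visual, y)

p_gaze = clf_gaze.predict_proba(rng.random((1, 50)))
p_visual = clf_visual.predict_proba(rng.random((1, 200)))
print(p_gaze.shape, p_visual.shape)  # each (1, 6): one probability per activity
```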
SLIDE 20

Classifier

[Figure: a test feature vector is passed to the trained classifier (classes Read, Write, Type, ...)]

SLIDE 21

Classifier

[Figure: the classifier outputs a probability for each class (Read, Write, Type, ...)]

SLIDE 22

Overview

[Pipeline diagram: the eye tracker records gaze points and scene images → a gaze motion feature and a visual feature are extracted → each feature is fed to its own classifier → the two classifier outputs are fused → result]

SLIDE 23

Fusion

[Bar charts: class probabilities (Read, Type, Write, ...) from the gaze motion classifier and from the vision classifier]

SLIDE 24

Fusion

  • The probability from gaze motion and the probability from vision are combined by taking their average (see the sketch below)

[Bar charts: probability from gaze motion, probability from vision, and the combined (averaged) probability over the classes Type, Write, Read, ...]
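The fusion step reduces to averaging the two probability vectors and picking the most likely class, as in this small sketch (the example probabilities are made up for illustration).

```python
import numpy as np

ACTIVITIES = ["Watch", "Write", "Read", "Type", "Chat", "Walk"]

def fuse_by_average(p_gaze, p_visual):
    """Combine the class probabilities of the two classifiers by averaging
    and return the recognized activity."""
    p_combined = (np.asarray(p_gaze) + np.asarray(p_visual)) / 2.0
    return ACTIVITIES[int(np.argmax(p_combined))], p_combined

p_gaze   = [0.10, 0.05, 0.50, 0.20, 0.10, 0.05]   # from the gaze motion SVM
p_visual = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]   # from the visual SVM
label, p_combined = fuse_by_average(p_gaze, p_visual)
print(label)       # Read
print(p_combined)  # [0.075 0.075 0.45 0.25 0.1 0.05]
```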

SLIDE 25

Outline

  • Introduction
  • Proposed Method
  • Experiment
  • Conclusion

SLIDE 26

Experiments

  • Baseline: whether the combined method performs better than the individual vision-based and gaze motion-based methods
  • Cross-scene: whether the combined method still performs well when the target objects differ between training and test data
  • Cross-user: whether the combined method still performs well when the test data contains a person not present in the training data

                Target Objects / Environments    User
  Baseline      Same                             Same
  Cross-scene   Different                        Same
  Cross-user    Same                             Different

SLIDE 27

Conditions of All Experiments

  • Sampling rate of the eye tracker: 30 Hz
  • Resolution of the scene camera: 1280 × 960 pixels
  • Visual features are extracted from 300 × 300 pixels around gaze points
  • Gaze motion features are extracted from 700 gaze samples

SLIDE 28

Activity List

  • Watch a video
  • Write text
  • Read text
  • Type text
  • Have a chat
  • Walk

SLIDE 29

Baseline Experiment

[Dataset grid: activities (Watch a video, Write text, Read text, Type text, Have a chat, Walk) × Scene 1-4]

  • 1 person
  • Contains 4 different scenes
  • The dataset was divided into 2 parts
SLIDE 30

Baseline Experiment

[Bar chart: Accuracy (%) per activity (Watch, Write, Read, Type, Chat, Walk, Avg.) for the gaze motion, visual, and proposed methods]

  • The accuracy of the proposed method was the best
SLIDE 31

Cross-scene Experiment

[Dataset grid: activities (Watch a video, Write text, Read text, Type text, Have a chat, Walk) × Scene 1-4]

  • 3 people
SLIDE 32

Cross-scene Experiment

[Dataset grid: activities × Scene 1-4; one scene is left out as test data]

  • 3 people
  • Leave-one-out cross validation over the scenes (see the sketch below)
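The leave-one-out split over scenes can be sketched with scikit-learn's LeaveOneGroupOut, using the scene ID as the group label; the toy data, classifier, and feature dimension are assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Toy data: each sample is tagged with the scene it was recorded in (1..4).
rng = np.random.default_rng(0)
X = rng.random((80, 50))                 # feature vectors
y = rng.integers(0, 6, 80)               # activity labels
scenes = np.repeat([1, 2, 3, 4], 20)     # scene ID per sample

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=scenes):
    clf = SVC(kernel="rbf", probability=True).fit(X[train_idx], y[train_idx])
    acc = clf.score(X[test_idx], y[test_idx])
    print(f"test scene {scenes[test_idx][0]}: accuracy {acc:.2f}")
```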

SLIDE 33

Cross-scene Experiment

[Bar chart: Accuracy (%) per activity (Watch, Write, Read, Type, Chat, Walk, Avg.) for Proposed (Baseline) vs. Proposed (Cross-scene)]

  • The recognition rate in Cross-scene is lower than in Baseline
SLIDE 34

Cross-scene Experiment

[Bar charts: Accuracy (%) per activity for Gaze motion (Baseline) vs. Gaze motion (Cross-scene), and for Visual (Baseline) vs. Visual (Cross-scene)]

  • Both recognition rates dropped
  • Gaze motion also depends on the targets or the environment
SLIDE 35

Cross-user Experiment

[Dataset grid: activities (Watch a video, Write text, Read text, Type text, Have a chat, Walk) × Scene 1-2, recorded for 7 people]

  • 1 person: test data; the remaining 6 people: training data

SLIDE 36

Cross-user Experiment

[Bar chart: Accuracy (%) per activity (Watch, Write, Read, Type, Chat, Walk, Avg.) for Proposed (Baseline) vs. Proposed (Cross-user)]

  • The recognition rate in Cross-user is lower than in Baseline
SLIDE 37

Cross-user Experiment

[Bar chart: Accuracy (%) per activity for Gaze motion (Baseline) vs. Gaze motion (Cross-user)]

  • Gaze motions differ between people
  • Gaze motions of the "Read" activity are similar across different people

SLIDE 38

Outline

  • Introduction
  • Proposed Method
  • Experiment
  • Conclusion

SLIDE 39

Conclusion

  • Combined the gaze motion feature and the visual feature to recognize daily activities that involve eye movements
  • The experimental results show that recognition accuracy is higher when the vision-based and gaze motion-based methods are combined

SLIDE 40

Daily Activity Recognition Combining Gaze Motion and Visual Features

Yuki Shiga, Takumi Toyama, Yuzuko Utsumi, Andreas Dengel, Koichi Kise

SLIDE 41

Cross-user Experiment

[Bar chart: Accuracy (%) per activity (Watch, Write, Read, Type, Chat, Walk, Avg.) for Visual (Baseline) vs. Visual (Cross-user)]