The experiments for “You-Do, I-Learn” (Presenter: Wenguang Mao)



SLIDE 1

The experiments for “You-Do, I-Learn”

Presenter: Wenguang Mao
Instructor: Kristen Grauman
Author of the paper: Dima Damen

SLIDE 2

Recap of the Paper

[Diagram labels: Gaze point position, Gaze area appearance, Gaze attention, Neighbor frames, Clustering (twice), TRO, MOI]

SLIDE 3

Experiment Setup

  • Dataset: Bristol Egocentric Object Interactions Dataset
SLIDE 4

Experiment Setup

  • Dataset: Bristol Egocentric Object Interactions Dataset
  • Egocentric videos at 6 locations
  • Gaze point on each frame
  • Gaze positions in 3D space
  • Gaze fixation on each frame
  • Ground truth positions of TROs
  • 3D map for each location, 3D positions of the camera for each frame, ……
  • Code: VLFeat, MATLAB toolboxes, and programs I wrote myself
SLIDE 5

Why We Need Gaze Info

  • Given an egocentric image, which part of the image do you think I am focusing on?

  • Center of image?
  • Blue point: center of image
  • Red point: gaze point
SLIDE 6

Why We Need Gaze Info

  • The distance between the center and the gaze point

(a) Desk (b) Door

SLIDE 7

Why We Need Gaze Info

  • The distance between the center and the gaze point

(a) Desk (b) Door

The center of the image is not a good approximation of the gaze point
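
The comparison on this slide can be sketched as a per-frame Euclidean distance between the image center and the recorded gaze point. This is a minimal sketch, not the presenter's code: the function name, the 640x480 resolution, and the gaze coordinates are all hypothetical.

```python
import math

def center_gaze_distances(gaze_points, width, height):
    """Return the pixel distance from the image center to each gaze point."""
    cx, cy = width / 2.0, height / 2.0
    return [math.hypot(gx - cx, gy - cy) for gx, gy in gaze_points]

# Hypothetical gaze points (in pixels) for a 640x480 video.
gaze = [(320, 240), (620, 240), (320, 40)]
distances = center_gaze_distances(gaze, 640, 480)
mean_distance = sum(distances) / len(distances)
```

A large mean distance relative to the image size would support the claim that the center is a poor stand-in for the gaze point.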

SLIDE 8

Why We Need Gaze Info

  • The distance between the center and the gaze point (during gaze fixation)

(a) Desk (b) Door

SLIDE 9

Why We Need Gaze Info

  • The distance between the center and the gaze point (during gaze fixation)

(a) Desk (b) Door

The center of the image is not a good approximation of the gaze point, even during attention periods

SLIDE 10

How Gaze Fixation Helps

  • Do you think there is any TRO in the video clips?
  • Red dot: gaze point
SLIDE 11

How Gaze Fixation Helps

  • Do you think there is any TRO in the video clips?
  • Red dot: gaze point

Gaze fixation helps identify a TRO

SLIDE 12

How Gaze Fixation Helps

  • Do you think there is any TRO in the video clips?
  • Red dot: gaze point
SLIDE 13

How Gaze Fixation Helps

  • Do you think there is any TRO in the video clips?
  • Red dot: gaze point

Gaze fixation alone is far from enough to find TROs

SLIDE 14

How 3D Positions of Gaze Help

  • Blue circles: 3D positions of gazes in a video
  • Red cross: ground truth positions of TRO

(a) Without gaze fixation filtering (b) With gaze fixation filtering

SLIDE 15

How 3D Positions of Gaze Help

  • Blue circles: 3D positions of gazes in a video
  • Red cross: ground truth positions of TRO

(a) Without gaze fixation filtering (b) With gaze fixation filtering

3D gaze positions are very helpful for identifying TROs
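
The "gaze fixation filtering" compared in the two panels can be sketched as dropping every 3D gaze position recorded outside a fixation. This is a minimal sketch, assuming the dataset's per-frame fixation annotation is available as a list of booleans; the function name and sample values are hypothetical.

```python
def filter_by_fixation(gaze_3d, is_fixation):
    """Keep only the 3D gaze positions recorded during a fixation."""
    return [p for p, fixed in zip(gaze_3d, is_fixation) if fixed]

# Hypothetical 3D gaze positions and per-frame fixation flags.
gaze_3d = [(0.1, 0.2, 1.0), (2.5, 0.3, 1.1), (0.1, 0.2, 1.0), (0.9, 1.4, 0.7)]
is_fixation = [True, False, True, False]
fixated = filter_by_fixation(gaze_3d, is_fixation)
```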

SLIDE 16

Clustering for Gaze 3D positions

  • Right number of clusters (kmeans)
  • Yellow square: cluster center

(a) Without gaze fixation filtering (b) With gaze fixation filtering

SLIDE 17

Clustering for Gaze 3D positions

  • Right number of clusters (kmeans)
  • Yellow square: cluster center

(a) Without gaze fixation filtering (b) With gaze fixation filtering

Given the right number of TROs, they can be easily identified using 3D gaze positions
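
For reference, k-means on 3D gaze positions can be sketched in a few lines. The slides used MATLAB's kmeans; the version below is an illustrative pure-Python Lloyd's algorithm with a deterministic farthest-point initialization (an assumption made so the example is reproducible), run on hypothetical points.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's k-means on coordinate tuples; returns (centers, labels)."""
    # Deterministic farthest-point initialization (an assumption; any seeding works).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical 3D gaze positions around two objects.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 1.0), (5.1, 5.0, 1.0), (5.0, 5.1, 1.0)]
centers, labels = kmeans(points, k=2)
```

The returned centers play the role of the yellow squares in the figures.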

SLIDE 18

Clustering for Gaze 3D positions

  • Too few clusters
  • Yellow square: cluster center

(a) Without gaze fixation filtering (b) With gaze fixation filtering

SLIDE 19

Clustering for Gaze 3D positions

  • Too few clusters
  • Yellow square: cluster center

(a) Without gaze fixation filtering (b) With gaze fixation filtering

Underestimating the number gives low precision and low recall for identifying TROs

SLIDE 20

Clustering for Gaze 3D positions

  • Too many clusters
  • Yellow square: cluster center

(a) Without gaze fixation filtering (b) With gaze fixation filtering

SLIDE 21

Clustering for Gaze 3D positions

  • Too many clusters
  • Yellow square: cluster center

(a) Without gaze fixation filtering (b) With gaze fixation filtering

Overestimating the number gives high recall but low precision
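
One way to make the precision and recall statements concrete is to score cluster centers against the ground-truth TRO positions. The slides do not state the matching rule, so the sketch below assumes one: a center is correct if it lies within a hypothetical `radius` of some TRO, and a TRO is found if some center lies within `radius` of it.

```python
import math

def precision_recall(centers, tro_positions, radius):
    """Score cluster centers against ground-truth TRO positions (assumed matching rule)."""
    correct = sum(any(math.dist(c, t) <= radius for t in tro_positions) for c in centers)
    found = sum(any(math.dist(c, t) <= radius for c in centers) for t in tro_positions)
    precision = correct / len(centers) if centers else 0.0
    recall = found / len(tro_positions) if tro_positions else 0.0
    return precision, recall

# Hypothetical ground truth: two TROs.
tros = [(0.0, 0.0, 0.0), (5.0, 5.0, 1.0)]
# Too few clusters: one center lands between the two TROs.
under = precision_recall([(2.5, 2.5, 0.5)], tros, radius=0.5)
# Too many clusters: both TROs are hit, plus one spurious center.
over = precision_recall([(0.0, 0.0, 0.0), (5.0, 5.0, 1.0), (2.5, 2.5, 0.5)], tros, radius=0.5)
```

In this toy case, the merged center of the under-estimated run matches neither TRO, while the over-estimated run finds both TROs at the cost of one spurious center.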

SLIDE 22

Spectral Clustering

  • Right number of clusters

(a) kmeans (b) spectral

SLIDE 23

Spectral Clustering

  • Right number of clusters

(a) kmeans (b) spectral

Same as k-means

SLIDE 24

Spectral Clustering

  • Too few clusters

(a) kmeans (b) spectral

SLIDE 25

Spectral Clustering

  • Too few clusters

(a) kmeans (b) spectral

Same as k-means

SLIDE 26

Spectral Clustering

  • Too many clusters

(a) kmeans (b) spectral

SLIDE 27

Spectral Clustering

  • Too many clusters

(a) kmeans (b) spectral

Outperforms k-means: high precision and high recall.
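
The slides do not say which spectral clustering variant was used. As an illustration only, the sketch below assumes a common formulation: a Gaussian affinity matrix, the symmetric normalized graph Laplacian, and k-means on the row-normalized eigenvector embedding. The `sigma` value, helper names, and sample points are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def spectral_embedding(points, k, sigma=1.0):
    """Embed points with the k smallest eigenvectors of the normalized Laplacian."""
    x = np.asarray(points, dtype=float)
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    embedding = vectors[:, :k]
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    return embedding / np.maximum(norms, 1e-12)

def kmeans_labels(rows, k, iterations=50):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [rows[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(rows - c, axis=1) for c in centers], axis=0)
        centers.append(rows[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iterations):
        gaps = np.linalg.norm(rows[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = rows[labels == c].mean(axis=0)
    return labels

# Hypothetical 3D gaze positions around two objects.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 1.0), (5.1, 5.0, 1.0), (5.0, 5.1, 1.0)]
labels = kmeans_labels(spectral_embedding(points, k=2), k=2)
```

This toy example only shows the mechanics; it does not reproduce the comparison with k-means reported on the slides.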

SLIDE 28

What Is the Limitation of Gaze Positions?

  • Can we use only 3D gaze positions?
  • No, because a TRO can move
  • How can we solve this problem?
  • Appearance
SLIDE 29

Appearance

  • How HoG features represent an image
SLIDE 30

Appearance

  • How HoG features represent an image

HoG is good at describing boundaries
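
To see why HoG responds to boundaries, consider a simplified single-cell version: a magnitude-weighted histogram of unsigned gradient orientations. This is only a sketch of the core idea (full HoG, as in VLFeat, adds multiple cells, interpolation, and block normalization); the function name and the toy patch are hypothetical.

```python
import math

def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # central difference in x
            gy = patch[r + 1][c] - patch[r - 1][c]  # central difference in y
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist

# A 6x6 patch with a vertical boundary: dark left half, bright right half.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
hist = orientation_histogram(patch)
```

All of the gradient energy falls into the first bin (horizontal gradient direction), so the histogram encodes the presence and orientation of the boundary while flat regions contribute nothing.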

SLIDE 31

Identify TROs based on Appearance

  • Extract HoG from the region near the gaze point for each frame
  • Generate BoW representation for each frame
  • Perform clustering on frames
  • Use the frame closest to the center to represent each cluster
  • Compare the appearance of center frames with the ground truth
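
The bag-of-words and representative-frame steps listed above can be sketched as follows. This is a minimal illustration, not the presenter's MATLAB code: it assumes local descriptors and a codebook are already available, and the function names, the 2D toy descriptors, and the cluster membership are hypothetical.

```python
import math

def bow_histogram(descriptors, codebook):
    """L1-normalized count of the nearest codebook word for each descriptor."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        word = min(range(len(codebook)), key=lambda w: math.dist(d, codebook[w]))
        hist[word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def representative_frame(histograms, member_ids):
    """Index of the member frame whose histogram is closest to the cluster mean."""
    members = [histograms[i] for i in member_ids]
    center = [sum(col) / len(members) for col in zip(*members)]
    return min(member_ids, key=lambda i: math.dist(histograms[i], center))

# Hypothetical two-word codebook and per-frame local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
frames = [
    [(0.1, 0.0), (0.0, 0.1), (0.9, 1.0)],
    [(0.0, 0.0), (0.1, 0.1)],
    [(1.0, 1.0), (0.9, 0.9)],
]
histograms = [bow_histogram(f, codebook) for f in frames]
# Suppose a clustering step grouped all three frames together.
best = representative_frame(histograms, [0, 1, 2])
```

The frame selected this way is the one that would be compared visually against the ground-truth objects.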
SLIDE 32

Appearance

  • Five TROs around the desk

charger tape box screwdriver socket

SLIDE 33

Results

Success (box) Success (tape) Duplicated (box) Success (charger) Failure

SLIDE 34

Results

Success (box) Success (tape) Duplicated (box) Success (charger) Failure

Two TROs are missing; appearance is not as effective as position

SLIDE 35

Using Neighbor frames

Success (box) Success (charger) Success (tape) Success (driver) Failure

SLIDE 36

Using Neighbor frames

Success (box) Success (charger) Success (tape) Success (driver) Failure

One TRO is missing; using neighbor frames improves performance
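
The slides do not spell out how neighbor frames enter the representation. One plausible reading, shown here purely as an assumption, is temporal pooling: each frame's histogram is averaged with those of its neighbors before clustering. The function name and the `window` parameter are hypothetical.

```python
def pool_neighbors(histograms, window=1):
    """Average each frame's histogram with up to `window` frames on either side."""
    pooled = []
    for i in range(len(histograms)):
        lo, hi = max(0, i - window), min(len(histograms), i + window + 1)
        group = histograms[lo:hi]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

# Hypothetical per-frame histograms with one noisy frame in the middle.
histograms = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pooled = pool_neighbors(histograms, window=1)
```

Pooling of this kind smooths out single-frame noise, which is one way neighbor frames could help the clustering.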

SLIDE 37

Over-Estimating No. of Clusters

Failure Success (charger) Success (box) Duplicated (box) Duplicated (box) Success (tape) Success (driver) Duplicated (driver)

SLIDE 38

Over-Estimating No. of Clusters

Failure Success (charger) Success (box) Duplicated (box) Duplicated (box) Success (tape) Success (driver) Duplicated (driver)

One TRO is missing; over-estimating helps identify more TROs

SLIDE 39

Also Using Neighbor frames

Failure Success (charger) Duplicated (box) Success (box) Success (tape) Success (driver) Success (socket) Duplicated (socket)

SLIDE 40

Also Using Neighbor frames

Failure Success (charger) Duplicated (box) Success (box) Success (tape) Success (driver) Success (socket) Duplicated (socket)

All TROs are found

SLIDE 41

Conclusion

  • Gaze information is important and necessary for egocentric videos; the center of the image is not a good approximation of the gaze point
  • Gaze fixation helps identify TROs, but it is not enough on its own
  • 3D gaze positions give rich information about TROs, but the clustering method and the estimate of the number of TROs are critical
  • Use spectral clustering and do not worry about overestimating
  • Appearance is another important feature for identifying TROs
  • Using neighbor frames improves performance
  • Over-estimating the number of TROs helps reduce false negatives