Temporal Segmentation of Egocentric Videos. Yair Poleg, Chetan Arora, Shmuel Peleg. CVPR 2014 - PowerPoint PPT Presentation



SLIDE 1

Temporal Segmentation of Egocentric Videos

Yair Poleg Chetan Arora Shmuel Peleg CVPR 2014

Presenter: Hsin-Ping Huang

SLIDE 2
Egocentric Video

  • Browsing long unstructured videos is time consuming!
  • Video examples: Policeman, UN Inspectors in Syria, Google Glass

SLIDE 3

Video credit: HUJI EgoSeg Dataset

SLIDE 4

Related Work

  • Understanding objects and activities
  • Unsupervised segmentation

– Clustering: no semantic meaning
– Hard to generalize
– Short-term (seconds) vs. long-term (minutes/hours)

[Fathi et al., ICCV 2011] [Ryoo et al., CVPR 2013] [Kitani et al., CVPR 2011]

SLIDE 5

Related Work

Story-Driven Summarization

[Lu et al., CVPR 2013]

SLIDE 6

Contribution

  • Temporally segment the video into a hierarchy of motion classes
  • Detect fixations of the wearer’s gaze
SLIDE 7

Difficulty

  • Two sources of information

– Motion of the wearer
– Objects and activities

  • Hard to find ego-motion

– Head rotation
– Depth variations
– Dynamic objects

[Figures: feature tracking, optical flow. Image credit: Voodoo Camera Tracker (top)]

SLIDE 8

Classification of Wearer’s Motion

SLIDE 9

Instantaneous Displacement (ID)

  • Compute the ID at patches

[Figure: instantaneous displacement of one patch during forward motion; motion detector]
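The per-patch displacement step can be sketched as below; a minimal sketch, assuming a dense optical-flow field as input and a fixed coarse grid (the paper computes flow per patch directly, so the averaging here is a stand-in):

```python
import numpy as np

def patch_displacements(flow, grid=(10, 5)):
    """Average a dense optical-flow field over a coarse grid of patches.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements.
    Returns a (grid_rows, grid_cols, 2) array: one instantaneous
    displacement (ID) per patch. Grid size is an assumed value.
    """
    h, w, _ = flow.shape
    rows, cols = grid
    out = np.zeros((rows, cols, 2))
    for i in range(rows):
        for j in range(cols):
            # Slice out this patch and average its flow vectors.
            patch = flow[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols]
            out[i, j] = patch.reshape(-1, 2).mean(axis=0)
    return out
```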

SLIDE 10

Cumulative Displacement (CD)

  • Compute the CD by integrating the ID

  • Outside scene: expanding curve
  • Inside scene: horizontal expanding curve

[Plot labels: horizontal, right of focus, left of focus]
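Integrating the ID into a CD is a running sum over time; a one-line sketch, assuming per-frame displacements stacked along the first axis:

```python
import numpy as np

def cumulative_displacement(instantaneous):
    """Integrate per-frame instantaneous displacements (ID) over time.

    instantaneous: (T, 2) array of per-frame (dx, dy) for one patch.
    Returns the (T, 2) running sum, so long-term trends (e.g. the
    steadily expanding curves of forward motion) dominate the
    short-term oscillations caused by head motion.
    """
    return np.cumsum(instantaneous, axis=0)
```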

SLIDE 11

Motion Vector and Radial Projection Response

Focus of expansion

  • Compute motion vectors as the slopes of smoothed CDs
  • Compute radial projection response
  • Video

[Figure: angle between a patch’s motion vector and the ray from the focus of expansion, compared with threshold φ]
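The two bullets above can be sketched as follows; the smoothing window, the slope fit, and the cosine-based response are assumptions standing in for the paper's exact formulation:

```python
import numpy as np

def motion_vector(cd, window=30):
    """Slope of a smoothed cumulative-displacement curve, per axis.

    cd: (T, 2) cumulative displacement of one patch.
    window: smoothing window in frames (assumed value).
    Returns (slope_x, slope_y): the patch's motion vector.
    """
    kernel = np.ones(window) / window
    sx = np.convolve(cd[:, 0], kernel, mode='valid')
    sy = np.convolve(cd[:, 1], kernel, mode='valid')
    t = np.arange(len(sx))
    # Fit a line to the smoothed curve; the slope is the motion vector.
    return np.polyfit(t, sx, 1)[0], np.polyfit(t, sy, 1)[0]

def radial_projection_response(vectors, centers, foe):
    """Mean cosine between patch motion vectors and the rays from an
    assumed focus of expansion (FOE); close to 1 when motion is
    radially outwards, as in forward walking.
    """
    rays = (centers - foe)
    rays = rays / np.linalg.norm(rays, axis=1, keepdims=True)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return float(np.mean(np.sum(rays * v, axis=1)))
```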

SLIDE 12

Video credit: Shmuel Peleg

SLIDE 13

Motion Vector and Radial Projection Response

  • Walking: large instantaneous displacement vectors; motion vectors point radially outwards; high radial projection response
  • Standing: mixed displacements (head motion); low radial projection response
  • Riding Bus: small displacements (global motion); low radial projection response in the outside region

SLIDE 14

Feature

  • AVG of top/bottom 6% motion vectors
  • DIFF of top/bottom 6% motion vectors
  • AVG of motion vectors
  • Motion vectors
  • # of successful flow computations
  • AVG and SD of instantaneous displacements
  • Radial projection response
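A hedged sketch of assembling the scalar features listed above into one vector (the 6% fraction follows the slide; the ordering, the per-frame aggregation, and the omission of the raw motion-vector entries are assumptions):

```python
import numpy as np

def frame_features(motion_vectors, displacements, flow_ok,
                   radial_response, frac=0.06):
    """Build a feature vector for one temporal block.

    motion_vectors: (P,) magnitudes of per-patch motion vectors.
    displacements:  (P,) instantaneous displacement magnitudes.
    flow_ok:        (P,) booleans, True where flow computation succeeded.
    radial_response: scalar radial projection response.
    (The full feature set also includes the raw motion vectors.)
    """
    k = max(1, int(frac * len(motion_vectors)))
    srt = np.sort(motion_vectors)
    top, bot = srt[-k:].mean(), srt[:k].mean()
    return np.array([
        (top + bot) / 2,          # AVG of top/bottom 6% motion vectors
        top - bot,                # DIFF of top/bottom 6% motion vectors
        motion_vectors.mean(),    # AVG of motion vectors
        flow_ok.sum(),            # number of successful flow computations
        displacements.mean(),     # AVG of instantaneous displacements
        displacements.std(),      # SD of instantaneous displacements
        radial_response,          # radial projection response
    ])
```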
SLIDE 15
Classifier

  • Train an SVM classifier for each binary classification task in the proposed class hierarchy
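Classification then walks the hierarchy, applying one binary decision per internal node. A minimal sketch; the node names are hypothetical, and toy threshold rules stand in for the trained SVMs:

```python
def classify_hierarchy(x, tree, classifiers):
    """Walk a binary class hierarchy from the root to a leaf.

    tree: {node: (left_child, right_child)}; leaves have no entry.
    classifiers: {node: f(x) -> bool}; True means take the left branch.
    Returns the leaf label reached.
    """
    node = 'root'
    while node in tree:
        left, right = tree[node]
        node = left if classifiers[node](x) else right
    return node

# Toy example: a two-level hierarchy over a scalar "speed" feature.
# Node names and thresholds are illustrative, not the paper's.
tree = {'root': ('Stationary', 'InTransit'),
        'InTransit': ('Walking', 'Wheels')}
clf = {'root': lambda s: s < 1.0,
       'InTransit': lambda s: s < 5.0}
```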

SLIDE 16

Detecting Period of Gaze Fixation

SLIDE 17

Gaze

Cumulative Displacement

[Plot: original vs. smoothed CD curve, showing left motion and right motion]

SLIDE 18

Cumulative Difference

  • Compute the cumulative difference between the original and smoothed CD curves, separately for its positive and negative parts
  • Motion detector: peaks higher than 1 standard deviation become gaze hypotheses
  • Gaze: hypotheses passing the > 80% threshold
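A minimal sketch of this step, assuming a 1-D CD curve, a fixed smoothing window, and a mean-plus-one-sigma peak rule (all assumed details; the 80% acceptance stage is not shown):

```python
import numpy as np

def cumulative_difference(cd, window=30):
    """Cumulative positive and negative differences between the original
    and smoothed CD curves. Window size is an assumed value.
    """
    kernel = np.ones(window) / window
    smooth = np.convolve(cd, kernel, mode='same')
    d = cd - smooth
    pos = np.cumsum(np.clip(d, 0, None))   # integrated positive part
    neg = np.cumsum(np.clip(-d, 0, None))  # integrated negative part
    return pos, neg

def gaze_hypotheses(signal, k_sigma=1.0):
    """Flag samples exceeding the mean by k_sigma standard deviations,
    a stand-in for the 'higher peaks' threshold on the slide."""
    return signal > signal.mean() + k_sigma * signal.std()
```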

SLIDE 19

Experiment

SLIDE 20

Dataset

  • > 65 hours of egocentric video
  • Manually annotated as one of the leaf classes
  • Video
SLIDE 21

Video credit: HUJI EgoSeg Dataset

SLIDE 22

Classification of Wearer’s Motion

  • Leaf-node accuracy and inner-node accuracy (average: 70%, best: 97%)
  • Hardest pairs: Sitting vs. Standing, Bus vs. Standing

SLIDE 23
Detecting Period of Gaze Fixation

  • Valid gaze fixation: a head fixation > 5 seconds
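The 5-second validity rule amounts to filtering fixation runs by duration; a sketch, assuming a per-frame boolean fixation mask and a 30 fps frame rate (the fps value is an assumption):

```python
def valid_fixations(mask, fps=30, min_seconds=5):
    """Keep only head-fixation runs longer than min_seconds.

    mask: iterable of booleans, one per frame (True = fixating).
    Returns a list of (start_frame, end_frame) half-open intervals.
    """
    runs, start = [], None
    # Append a sentinel False so a run at the very end is closed.
    for i, m in enumerate(list(mask) + [False]):
        if m and start is None:
            start = i                       # run begins
        elif not m and start is not None:
            if i - start > min_seconds * fps:
                runs.append((start, i))     # long enough: keep it
            start = None
    return runs
```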

SLIDE 24

Conclusion

SLIDE 25
Weakness

  • Mixed features from adjacent activities
      – e.g., short-term sitting while riding

SLIDE 26
Weakness

  • Mixed activities
      – Waiting in line = Standing + Walking
      – Riding an open train = Open or Riding?
      – Standing while coming into the station = Static or Box?
  • Ambiguity in gaze fixation
      – A left and right turn in quick succession
      – A person turning in place

SLIDE 27

Strength

  • Simple, efficient and robust
  • Use only the recorded video
  • Make no assumptions on the scene structure
  • Focus on long-term activities to prevent over-segmentation of the video

SLIDE 28

Extension

  • Use bilateral filter to find long-term trends
  • Use a regularization framework like MRF on the classification results

  • Handle the ambiguity in gaze fixation
  • Combine with external sources such as GPS and inertial sensors

  • Generalize to detect short-term activities
  • Aid video summarization