Detecting Activities of Daily Living in First-person Camera Views - PowerPoint PPT Presentation

SLIDE 1
Detecting Activities of Daily Living in First-person Camera Views

Hamed Pirsiavash, Deva Ramanan Computer Science Department, UC Irvine

SLIDE 2

Motivation

A sample video of Activities of Daily Living

SLIDE 3

Applications

Tele-rehabilitation

  • Kopp et al., Archives of Physical Medicine and Rehabilitation, 1997.
  • Catz et al., Spinal Cord, 1997.

Long-term at-home monitoring

SLIDE 4

Applications

Life-logging

  • Gemmell et al., “MyLifeBits: a personal database for everything,” Communications of the ACM, 2006.
  • Hodges et al., “SenseCam: A retrospective memory aid,” UbiComp, 2006.

So far, mostly “write-only” memory! This is the right time for the computer vision community to get involved.

SLIDE 5

Related work: action recognition

There are quite a few video benchmarks for action recognition. Collecting interesting but natural video is surprisingly hard, and it is difficult to define action categories outside the “sports” domain.

Benchmarks: KTH (ICPR’04), UCF Sports (CVPR’08), UCF YouTube (CVPR’08), Hollywood (CVPR’09), Olympic Sports (BMVC’10), VIRAT (CVPR’11)

SLIDE 6

Wearable ADL detection

It is easy to collect natural data

SLIDE 7

Wearable ADL detection

ADL actions are derived from the medical literature on patient rehabilitation. It is easy to collect natural data.

SLIDE 8
Outline

  • Challenges
    – What features to use?
    – Appearance model
    – Temporal model
  • Our model
    – “Active” vs “passive” objects
    – Temporal pyramid
  • Dataset
  • Experiments

SLIDE 9-10

Challenges

What features to use?

Feature spectrum: low-level features (weak semantics) to high-level features (strong semantics).

  • Low-level: space-time interest points (Laptev, IJCV’05)
  • High-level: human pose, object-centric features

Difficulties of pose:
  • Detectors are not accurate enough
  • Not useful in first-person camera views

SLIDE 11-12

Challenges

Occlusion / Functional state

“Classic” data vs. wearable data (example images)

SLIDE 13-14

Challenges

Long-scale temporal structure

“Classic” data: boxing. Wearable data: making tea.

Timeline of making tea: start boiling water → do other things (while waiting) → pour in cup → drink tea

It is difficult for HMMs to capture such long-term temporal dependencies.

SLIDE 15
Outline

  • Challenges
    – What features to use?
    – Appearance model
    – Temporal model
  • Our model
    – “Active” vs “passive” objects
    – Temporal pyramid
  • Dataset
  • Experiments

SLIDE 16-17

“Passive” vs “active” objects

Passive / Active (example images)

SLIDE 18

“Passive” vs “active” objects

Distinguishing passive from active objects gives:
  • Better object detection (visual phrases, CVPR’11)
  • Better features for action classification (active vs. passive)

SLIDE 19-20

Appearance feature: bag of objects

Pipeline: video clip → bag of detected objects → SVM classifier

The histogram keeps separate bins for active and passive instances of each object category (e.g. active fridge, active stove, passive fridge), rather than a single bin per category (fridge, TV, stove).
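Concretely, a minimal sketch of this feature in Python, assuming per-clip detection lists are already available (the object subset, toy clips, and labels below are illustrative, not the paper’s code or data):

```python
import numpy as np
from sklearn.svm import LinearSVC

OBJECTS = ["fridge", "tv", "stove"]  # illustrative subset of the 24 categories
STATES = ["active", "passive"]

def bag_of_objects(detections):
    """detections: (object, state) pairs pooled over all frames of a clip.
    Returns a normalized histogram with one bin per (state, object) pair,
    so an active fridge and a passive fridge land in different bins."""
    hist = np.zeros(len(STATES) * len(OBJECTS))
    for name, state in detections:
        hist[STATES.index(state) * len(OBJECTS) + OBJECTS.index(name)] += 1
    return hist / max(hist.sum(), 1.0)  # normalize away clip length

# Two toy clips, just to exercise the pipeline end to end.
clips = [
    [("fridge", "active"), ("stove", "active"), ("fridge", "passive")],
    [("tv", "active"), ("tv", "active"), ("fridge", "passive")],
]
labels = ["making tea", "watching tv"]
clf = LinearSVC().fit(np.stack([bag_of_objects(c) for c in clips]), labels)
```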

SLIDE 21

Temporal pyramid

Inspired by “Spatial Pyramid” (CVPR’06) and “Pyramid Match Kernels” (ICCV’05): coarse-to-fine correspondence matching with a multi-layer pyramid.

Pipeline: video clip → temporal pyramid descriptor (histograms pooled over nested temporal segments: the whole clip, its halves, and so on along the time axis) → SVM classifier
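A minimal sketch of such a descriptor, assuming one bag-of-objects histogram per frame as input; the number of levels and the plain sum-pooling are assumptions in the spirit of spatial pyramids, not necessarily the paper’s exact weighting:

```python
import numpy as np

def temporal_pyramid(frame_hists, levels=3):
    """frame_hists: (T, D) array, one object histogram per frame.
    Level l splits the clip into 2**l equal temporal segments and pools
    one histogram per segment; all segments are concatenated, giving
    D * (2**levels - 1) dimensions (7 * D for levels=3)."""
    frame_hists = np.asarray(frame_hists, dtype=float)
    T = max(len(frame_hists), 1)
    parts = []
    for l in range(levels):
        for segment in np.array_split(frame_hists, 2 ** l):
            parts.append(segment.sum(axis=0) / T)  # normalize by clip length
    return np.concatenate(parts)

# Example: a 100-frame clip with 6-bin per-frame histograms -> 42-D descriptor
descriptor = temporal_pyramid(np.random.rand(100, 6))
assert descriptor.shape == (42,)
```

The resulting descriptor feeds the same SVM classifier as the flat bag-of-objects feature; the finer levels reward matches that occur at roughly the same point in the clip.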

SLIDE 22
Outline

  • Challenges
    – What features to use?
    – Appearance model
    – Temporal model
  • Our model
    – “Active” vs “passive” objects
    – Temporal pyramid
  • Dataset
  • Experiments

SLIDE 23

Sample video with annotations

SLIDE 24

Wearable ADL data collection

  • 20 persons
  • 20 different apartments
  • 10 hours of HD video
  • 170-degree viewing angle
  • Annotated (see the illustrative record after this list):
    – Actions
    – Object bounding boxes
    – Active/passive objects
    – Object IDs
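For illustration only, one frame’s annotations might be organized like the record below; the field names and values are hypothetical, not the dataset’s actual schema:

```python
# Hypothetical annotation record; the fields mirror the annotation types
# listed above, but this schema is an assumption, not the released format.
annotation = {
    "frame": 1042,
    "action": "making tea",             # action label covering this frame
    "objects": [
        {"id": 7,                       # object ID, stable across frames
         "category": "kettle",
         "bbox": [412, 180, 590, 330],  # [x1, y1, x2, y2] in pixels
         "state": "active"},            # active vs. passive annotation
    ],
}
```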

Prior work:

  • Lee et al., CVPR’12
  • Fathi et al., CVPR’11, CVPR’12
  • Kitani et al., CVPR’11
  • Ren et al., CVPR’10
SLIDE 25

Average object locations

Average location maps for active and passive objects (figure panels labeled Active / Passive)

Active objects tend to appear on the right-hand side and closer to the camera.

  – Right-handed people are dominant
  – We cannot mirror-flip images in training

SLIDE 26
Outline

  • Challenges
    – What features to use?
    – Appearance model
    – Temporal model
  • Our model
    – “Active” vs “passive” objects
    – Temporal pyramid
  • Dataset
  • Experiments

SLIDE 27

Experiments

  • Our model (high-level features): object-centric features over 24 object categories
  • Baseline (low-level features): space-time interest points (STIP), Laptev et al., BMVC’09

SLIDE 28-29

Accuracy on 18 action categories

  • Our model: 40.6%
  • STIP baseline: 22.8%
SLIDE 30-34

Classification accuracy

  • The temporal model helps
  • Our object-centric features outperform STIP
  • Visual phrases improve accuracy
  • Ideal object detectors double the performance

Results on temporally continuous video and the taxonomy loss are included in the paper.

SLIDE 35-37

Summary

Data and code will be available soon!

SLIDE 38

Thanks!