Much Ado About Time: Exhaustive Annotation of Temporal Data


SLIDE 1

Much Ado About Time: Exhaustive Annotation of Temporal Data

Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

SLIDE 2

MUCH ADO ABOUT TIME: EXHAUSTIVE ANNOTATION OF TEMPORAL DATA

HTTP://ALLENAI.ORG/PLATO/CHARADES/

Datasets drive computer vision progress

Caltech 101 [Fei-Fei ’04]

Algorithms: [Berg ’05], [Grauman ’05], [Zhang ’06], [Lazebnik ’06], [Jain ’08], [Boiman ’08], [Yang ’09], [Maji ’09], [Wang ’10], [Zhou ’10], [Feng ’11], [Jiang ’11], …

PASCAL VOC [Everingham ’07]

Algorithms: [Chum ’07], [Felzenszwalb ’08], [Wang ’09], [Harzallah ’09], [Bourdev ’09], [Vedaldi ’09], [Lin ’09], [Lampert ’09], [Carreira ’10], [Wang ’10], [Song ’11], [vanDeSande ’11], …

ImageNet [Deng ’09]

Algorithms: [Deng ’10], [Sanchez ’11], [Lin ’11], [Krizhevsky ’12], [Zeiler ’13], [Wang ’13], [Sermanet ’13], [Simonyan ’14], [Lin ’14], [Girshick ’14], [Szegedy ’14], [He ’15], …

[Figure axes: Dataset scale and complexity; Computer vision capabilities]

Need: (1) Dense, detailed, multi-label annotations (2) Large-scale annotated video datasets

SLIDE 3

Multi-label video annotation

Labels: opens book, puts book on shelf, walks, turns on stove, eats, sits down, sneezes

[Figure: grid of labels against videos, with cells marked +]

Scale: 100-200 labels, 10,000 videos

SLIDE 4

Multi-label video annotation

Labels: opens book, puts book on shelf, walks, turns on stove, eats, sits down, sneezes

[Figure: grid of labels against videos, with some cells marked + and others marked ?]

SLIDE 5

Multi-label video annotation

Labels: opens book, puts book on shelf, walks, turns on stove, eats, sits down, sneezes

[Figure: grid of labels against videos, with seven cells marked ? and the remaining cells marked +]

SLIDE 6

Which interface is better?

One-label: ☐ Opens book (repeat N times for N labels). Expect better annotation accuracy.

vs

All-labels: ☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove ☐ Eats ☐ Sits down … Expect better annotation time.

SLIDE 7

Which interface is better?

Data: 140 videos, each ~30 secs long

Labels: 52 human actions

Charades dataset of [Sigurdsson ECCV 2016]

Experiment on Amazon Mechanical Turk

One-label: ☐ Opens book (repeat N times for N labels)

All-labels: ☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove …

Time: Many-labels is better

Accuracy: Few-labels is better

[Miller PsychologyReview 1956]
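
As a rough way to see how differently the two interfaces scale on the pilot data above (140 videos, 52 actions), the number of annotation tasks each one generates can be counted as in the sketch below. The helper `tasks_needed` and its `labels_per_task` parameter are illustrative assumptions, not part of the original experiment. The sketch only counts tasks; it says nothing about time per task or accuracy.

```python
import math


def tasks_needed(num_videos: int, num_labels: int, labels_per_task: int) -> int:
    """Count annotation tasks when each task shows one video and one batch of labels.

    Hypothetical helper for illustration: assumes every video is shown once
    per batch of labels, with no repeated annotation rounds.
    """
    if labels_per_task < 1:
        raise ValueError("labels_per_task must be at least 1")
    batches_per_video = math.ceil(num_labels / labels_per_task)
    return num_videos * batches_per_video


NUM_VIDEOS = 140  # pilot data size stated on the slide
NUM_LABELS = 52   # number of human actions stated on the slide

# One-label interface: one question per (video, label) pair.
one_label = tasks_needed(NUM_VIDEOS, NUM_LABELS, labels_per_task=1)

# All-labels interface: every label is shown with the video at once.
all_labels = tasks_needed(NUM_VIDEOS, NUM_LABELS, labels_per_task=NUM_LABELS)

print(one_label, all_labels)  # 7280 140
```

Under these assumptions the one-label interface needs N times as many tasks as the all-labels interface, which is why the second is expected to win on time while the first is expected to win on accuracy.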

SLIDE 8

Improving annotation time

Many-labels is better

  • Play video at 2x speed [Lasecki UIST 2014]
  • Consistency in the few-labels setting: ask the same worker about the same actions for multiple videos => 13.6% reduction in annotation time
  • Semantic hierarchy of labels [Deng CHI 2014]

Worker 1: ☐ Opens book ☐ Opens book ☐ Opens book

vs

Worker 1: ☐ Opens book ☐ Walks ☐ Sits down

SLIDE 9

Improving recall

  • Video summary: request a 20-word description of the video => no effect on recall, 40% slower
  • Forced response: request a yes/no response for every label => actually drops recall! (annoys workers?)
  • Consensus annotation: rely on multiple rounds of annotation with different workers => recall improves from 58.0% to 83.3% with 3 rounds

Many-labels

☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove ☐ Eats ☐ Sits down ☐ Sneezes ☐ Picks up a cup ☐ Holds a dish …

[Krishna CHI 2016]

Few-labels is better

SLIDE 10

Bringing it all together

Data: 1,815 videos, each ~30 secs long, 2x speed

Labels: 157 human actions, organized into a hierarchy with 52 high-level actions

Charades dataset of [Sigurdsson ECCV 2016]

Experiments on Amazon Mechanical Turk

Label is positive if >= 1 worker marks it as positive

[Figure: two panels, Recall and Precision, each plotted against Cumulative time [min], comparing the Many-label interface (26) and the Few-label interface (5); points are marked at the 1st round, 3 rounds, and 7 rounds. A further axis label is cut off: "Average time to ann…"]
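
The consensus rule stated on this slide (a label is positive if at least one worker marks it) amounts to taking the union of the labels selected across rounds. The sketch below shows that rule together with recall and precision against a reference set. The function names and the toy annotations are hypothetical and are not taken from the Charades experiments.

```python
from typing import List, Set


def consensus_labels(rounds: List[Set[str]]) -> Set[str]:
    """Union rule: a label is positive if at least one round marked it."""
    merged: Set[str] = set()
    for marked in rounds:
        merged |= marked
    return merged


def recall(predicted: Set[str], truth: Set[str]) -> float:
    """Fraction of true labels that were found."""
    return len(predicted & truth) / len(truth) if truth else 1.0


def precision(predicted: Set[str], truth: Set[str]) -> float:
    """Fraction of marked labels that are true."""
    return len(predicted & truth) / len(predicted) if predicted else 1.0


# Hypothetical toy data for a single video (not from the real dataset).
truth = {"opens book", "walks", "sits down", "eats"}
rounds = [
    {"opens book", "walks"},    # round 1, worker A
    {"walks", "sits down"},     # round 2, worker B
    {"opens book", "sneezes"},  # round 3, worker C
]

# Recall and precision after 1, 2, and 3 rounds of annotation.
results = []
for k in range(1, len(rounds) + 1):
    merged = consensus_labels(rounds[:k])
    results.append((k, recall(merged, truth), precision(merged, truth)))

for k, r, p in results:
    print(f"rounds={k} recall={r:.2f} precision={p:.2f}")
```

With a union rule, recall can only stay the same or rise as rounds are added, while precision can fall when a later worker marks a label that is not present, as in the third toy round here.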

SLIDE 11

Conclusions

  • Quantitative analysis of multi-label video annotation
  • Many-labels interface is better than the few-labels interface
  • Annotation of 157 human actions on 9,848 videos (incl. temporal extent)

[Figure panels: Actions; Video (3x speed)]

Download dataset at http://allenai.org/plato/charades