TRECVID-2016 Concept Localization: Overview (PowerPoint presentation by George Awad, National Institute of Standards and Technology and Dakota Consulting, Inc)


SLIDE 1

TRECVID-2016 Concept Localization: Overview

George Awad, National Institute of Standards and Technology; Dakota Consulting, Inc


TRECVID 2016

SLIDE 2
  • Goal
  • Make concept detection more precise in time and space than the current shot-level evaluation.
  • Encourage context-independent concept design to increase reusability.
  • Task set-up
  • For each of the 10 new test concepts, NIST provided a set of ≈1000 shots.
  • Any shot may or may not contain the target concept.
  • Task
  • For each I-frame within the shot that contains the target, return the x,y coordinates of the upper-left and lower-right (UL, LR) vertices of a bounding rectangle containing all of the target concept and as little more as possible.
  • Systems were allowed to submit more than one bounding box per I-frame, but only the one with the maximum F-score was scored.
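The per-frame scoring rule above can be sketched in a few lines. This is an illustrative sketch only, not NIST's actual scoring code: it assumes axis-aligned boxes given as (x1, y1, x2, y2) pixel coordinates for the UL and LR vertices, and treats areas as continuous rather than counting discrete pixels.

```python
# Illustrative sketch (not NIST's scorer): pick the submitted box with the
# maximum F-score against the ground-truth box for one I-frame.
# A box is (x1, y1, x2, y2): upper-left (UL) and lower-right (LR) vertices.

def box_area(b):
    x1, y1, x2, y2 = b
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection_area(a, b):
    # Overlap rectangle; degenerate overlaps yield area 0.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def pixel_f_score(pred, gt):
    inter = intersection_area(pred, gt)
    if inter == 0:
        return 0.0
    precision = inter / box_area(pred)  # fraction of predicted area that is correct
    recall = inter / box_area(gt)       # fraction of ground-truth area recovered
    return 2 * precision * recall / (precision + recall)

def best_box(candidates, gt):
    # Only the candidate with the maximum F-score counts toward the score.
    return max(candidates, key=lambda b: pixel_f_score(b, gt))
```

For example, against a ground truth of (0, 0, 10, 10), the tight box (0, 0, 10, 10) beats the smaller (2, 2, 8, 8), which has perfect precision but only 0.36 recall.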


SLIDE 3

10 new evaluated concepts

Non-action concepts: Animal, Boy, Baby, Skier
New action concepts: Bicycling, Dancing, Instrumental_musician, Running, Sitting_down, Explosion_fire

SLIDE 4

NIST Evaluation framework

  • Testing data
  • IACC.2.A-C (600 h of video, used between 2013 and 2015 in the semantic indexing task).
  • About 1000 shots per concept were sampled from the ground truth (true-positive (TP) clips: max = 300, avg = 178, min = 12).
  • A total of 9 587 shots and 2 205 140 I-frames were distributed to systems.
  • Human assessors were given all the I-frames (55 789 images in total) of all TP shots to create the ground truth, drawing a bounding box around the concept where it exists.
  • Human assessors had to watch the video clips of the images to verify the concepts.


SLIDE 5

Evaluation metrics

  • Temporal localization: precision, recall, and F-score based on the judged I-frames.
  • Spatial localization: precision, recall, and F-score based on the located pixels representing the concept.
  • Precision, recall, and F-score for temporal and spatial localization are averaged across all I-frames for each concept and for each run.
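The temporal metric is ordinary set-based precision/recall over I-frames. A minimal sketch, assuming per-shot sets of I-frame ids (names and data layout are illustrative, not NIST's actual evaluation code):

```python
# Illustrative sketch of the temporal localization metric: compare the set of
# I-frames a system returned against the set the assessors judged to contain
# the concept. Not NIST's actual scorer; inputs are hypothetical frame-id sets.

def temporal_prf(returned, true_frames):
    """Precision, recall, and F-score over judged I-frames."""
    tp = len(returned & true_frames)              # correctly returned frames
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(true_frames) if true_frames else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall > 0 else 0.0)
    return precision, recall, f

def mean_score(scores):
    # Per-run results are means across all judged items for a concept.
    return sum(scores) / len(scores)
```

For example, returning frames {1, 2, 3, 4} when the true frames are {3, 4, 5, 6} yields precision = recall = F-score = 0.5. The spatial metric has the same form, with located pixels in place of I-frames.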


SLIDE 6

Participants (Finishers: 3 out of 21)

  • 3 teams submitted 11 runs
  • TokyoTech (4 runs)
  • Tokyo Institute of Technology
  • NII_Hitachi_UIT (3 runs)
  • National Institute of Informatics; Hitachi, Ltd; University of Information Technology
  • UTS_CMU_D2DCRC (4 runs)
  • University of Technology, Sydney; Carnegie Mellon University; D2DCRC


SLIDE 7

Temporal localization results by run (sorted by F-score)

[Chart: mean per run across all concepts; I-frame F-score, I-frame precision, and I-frame recall, on a 0 to 1 scale]


SLIDE 8

Temporal localization results: comparison with previous years

[Charts: mean per run across all concepts for 2013, 2014, 2015, and 2016]

2016 (mainly actions) >> 2013 & 2014 (mainly objects). ONLY TP shots were given to systems to localize.

SLIDE 9

Spatial Localization results by run (sorted by F-score)

[Chart: mean per run across all concepts; mean pixel F-score, precision, and recall, on a 0 to 1 scale]


Harder than temporal localization


SLIDE 10

Spatial localization results: comparison with previous years

[Charts: mean per run across all concepts for 2013, 2014, 2015, and 2016]

2016 (actions) > 2013 (objects); 2016 (actions) ~ 2014 (objects). ONLY TP shots were given to systems to localize.

SLIDE 11

Results per concept (top 10 runs)

[Box plots: F-score per concept for the top 10 runs with the median, temporal localization (left, F-score) and spatial localization (right, mean F-score)]

Most concepts perform better in temporal than in spatial localization. Results for the same concepts show a lot of resemblance across runs.


SLIDE 12

Results per concept across all runs

[Scatter plots: recall vs. precision per concept across all runs, temporal localization (left) and spatial localization (right, mean values); highlighted concepts include baby, Instrumental_musician, and bicycling]

Submitted bounding boxes approximate the size of the ground-truth boxes and overlap with them; many systems are good at finding the real box sizes.

Many systems submitted a lot of non-target I-frames, while few found a good balance.

SLIDE 13

General Observations

  • Consistent observations in the last 4 years:
    ✓ Temporal localization is easier than spatial localization.
    ✓ Systems report approximate ground-truth box sizes.
  • Performance on action/dynamic concepts is higher than on the object concepts tested in 2013 and 2014.
  • Assessment of action/dynamic concepts proved challenging for the human assessors in many cases.
  • Lower finishing percentage of teams compared to signups.


SLIDE 14

Next team talks

  • TokyoTech
  • UTS_CMU_D2DCRC
