SLIDE 1

TRECVID 2017
AD-HOC VIDEO SEARCH TASK: OVERVIEW

Georges Quénot, Laboratoire d'Informatique de Grenoble
George Awad, Dakota Consulting, Inc. / National Institute of Standards and Technology

Disclaimer

The identification of any commercial product or trade name does not imply endorsement or recommendation by the National Institute of Standards and Technology.

SLIDE 2

Table of contents

  • Task Definition
  • Video Data
  • Topics (Queries)
  • Participating teams
  • Evaluation & results
  • General observations

SLIDE 3

Ad-hoc Video Search Task Definition

  • Goal: promote progress in content-based retrieval based on end-user ad-hoc queries that include persons, objects, locations, activities, and their combinations.
  • Task: Given a test collection, a query, and a master shot boundary reference, return a ranked list of at most 1000 shots (out of 335,944) which best satisfy the need (a minimal sketch of producing such a ranked list follows this list).
  • Testing data: 4593 Internet Archive videos (IACC.3), 600 total hours, with video durations between 6.5 and 9.5 minutes.
  • Development data: ≈1400 hours of previous IACC data used between 2010 and 2015, with concept annotations.
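To make the ranked-list requirement concrete, here is a minimal Python sketch. The per-shot scores and shot ids are invented, and the official submission format is the one defined in the TRECVID guidelines, not shown here.

```python
def top_shots(scores, limit=1000):
    """Rank candidate shots for one query and keep at most `limit` of them,
    as the task requires (at most 1000 of the 335,944 master shots)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]

# Hypothetical relevance scores produced by a retrieval system:
print(top_shots({"shot1_1": 0.42, "shot1_2": 0.97, "shot2_5": 0.10}))
# [('shot1_2', 0.97), ('shot1_1', 0.42), ('shot2_5', 0.1)]
```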

SLIDE 4

Query Development

  • Test videos were viewed by 10 human assessors hired by the National Institute of Standards and Technology (NIST).
  • A 4-facet description of the different scenes was used (where applicable); a toy example of flattening the facets into a query follows this list:
    • Who: concrete objects and beings (kinds of persons, animals, things)
    • What: what are the objects and/or beings doing? (generic actions, conditions/states)
    • Where: locale, site, place, geographic or architectural setting
    • When: time of day, season
  • In total, assessors watched ≈35% of the IACC.3 videos.
  • 90 candidate queries were chosen from the human-written descriptions, to be used between 2016 and 2018.
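As a toy illustration only (assessors wrote free-form descriptions; the dictionary layout and the flattening function below are invented, while the facet values come from one of the actual TV2017 queries):

```python
scene = {
    "who": "one or more people",
    "what": "swimming",
    "where": "in a swimming pool",
    "when": None,  # facet omitted when not applicable
}

def to_query(facets):
    # Concatenate the applicable facets into a "Find shots of ..." query.
    parts = [facets[k] for k in ("who", "what", "where", "when") if facets[k]]
    return "Find shots of " + " ".join(parts)

print(to_query(scene))
# Find shots of one or more people swimming in a swimming pool
```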

SLIDE 5

TV2017 Queries by complexity

  • Person + Action + Object + Location
    • Find shots of one or more people eating food at a table indoors
    • Find shots of one or more people driving snowmobiles in the snow
    • Find shots of a man sitting down on a couch in a room
    • Find shots of a person talking behind a podium wearing a suit outdoors during daytime
    • Find shots of a person standing in front of a brick building or wall
  • Person + Action + Location
    • Find shots of children playing in a playground
    • Find shots of one or more people swimming in a swimming pool
    • Find shots of a crowd of people attending a football game in a stadium
    • Find shots of an adult person running in a city street

SLIDE 6

TV2017 Queries by complexity

  • Person + Action/state + Object
    • Find shots of a person riding a horse including horse-drawn carts
    • Find shots of a person wearing any kind of hat
    • Find shots of a person talking on a cell phone
    • Find shots of a person holding or operating a TV or movie camera
    • Find shots of a person holding or opening a briefcase
    • Find shots of a person wearing a blue shirt
    • Find shots of a person holding, throwing or playing with a balloon
    • Find shots of a person wearing a scarf
    • Find shots of a person holding, opening, closing or handing over a box
  • Person + Action
    • Find shots of a person communicating using sign language
    • Find shots of a child or group of children dancing
    • Find shots of people marching in a parade
    • Find shots of a male person falling down

SLIDE 7

TV2017 Queries by complexity

  • Person + Object + Location
    • Find shots of a man and woman inside a car
  • Person + Location
    • Find shots of a chef or cook in a kitchen
    • Find shots of a blond female indoors
  • Person + Object
    • Find shots of a person with a gun visible
  • Object + Location
    • Find shots of a map indoors
  • Object
    • Find shots of vegetables and/or fruits
    • Find shots of a newspaper
    • Find shots of at least two planes, both visible

SLIDE 8


Training and run types

Four training data types:

✓ A – used only IACC training data (0 runs)
✓ D – used any other training data (40 runs)
✓ E – used only training data collected automatically using only the query text (12 runs)
✓ F – used only training data collected automatically using a query built manually from the given query text (0 runs)

Two run submission types:

Manually-assisted (M): query built manually (19 runs)
Fully automatic (F): system uses the official query directly (33 runs)

SLIDE 9


Finishers: 10 out of 20

Team              M  F  Organization
INF               -  4  Renmin University; Shandong Normal University; Chongqing University of Posts and Telecommunications; Carnegie Mellon University
kobe_nict_siegen  3  -  Kobe University, Japan; Center for Information and Neural Networks, National Institute of Information and Communications Technology (NICT), Japan; Pattern Recognition Group, University of Siegen, Germany
ITI_CERTH         -  4  Information Technologies Institute, Centre for Research and Technology Hellas
ITEC_UNIKLU       4  4  Klagenfurt University
NII_Hitachi_UIT   -  5  National Institute of Informatics, Japan (NII); Hitachi, Ltd.; University of Information Technology, VNU-HCM, Vietnam (HCM-UIT)
MediaMill         -  4  University of Amsterdam
Waseda_Meisei     4  4  Waseda University; Meisei University
VIREO             4  4  City University of Hong Kong
EURECOM           -  4  EURECOM
FIU_UM            4  -  Florida International University, University of Miami

SLIDE 10


Evaluation

Each query is assumed to be binary: the target is either absent or present in each master reference shot. NIST sampled the ranked pools and judged the top results from all submissions. Metric: inferred average precision per query; runs are compared in terms of mean inferred average precision across the 30 queries.

SLIDE 11


Mean Extended Inferred Average Precision (XInfAP)

Two pools were created for each query and sampled as follows:

✓ Top pool (ranks 1 to 150) sampled at 100%
✓ Bottom pool (ranks 151 to 1000) sampled at 2.5%
✓ Percentage of sampled and judged clips from ranks 151 to 1000 across all runs and topics: min = 2%, max = 64.4%, mean = 29%

Judgment process: one assessor per query; the assessor watched the complete shot while listening to the audio. infAP was calculated over the judged and unjudged parts of the pool using the sample_eval tool (a toy illustration of the underlying idea follows the counts below).

30 queries; 89,435 total judgments; 9611 total hits: 7209 at ranks 1-100, 2013 at ranks 101-150, and 389 at ranks 151-1000 (hit counts above, and partly far above, TV2016).
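The official scores come from NIST's sample_eval tool. As a rough illustration only of the idea behind inferred AP (estimating precision from a partially judged, sampled pool), here is a simplified sketch; it is not the xinfAP estimator, which additionally reweights by the per-stratum sampling rates described above.

```python
def inferred_ap_sketch(ranked, judgments, est_total_relevant):
    """Toy inferred-AP estimate for one run on one query.

    ranked: shot ids in system rank order.
    judgments: shot_id -> True/False for the sampled, judged shots;
        unjudged shots are simply absent.
    est_total_relevant: estimate of R, e.g. judged hits scaled up by
        the inverse sampling rate of each pool.

    Precision at each judged relevant shot is estimated from the judged
    shots ranked at or above it; with 100% judging this reduces to
    exact average precision.
    """
    total = 0.0
    judged_seen = rel_seen = 0
    for shot in ranked:
        verdict = judgments.get(shot)  # None = not sampled for judging
        if verdict is None:
            continue
        judged_seen += 1
        if verdict:
            rel_seen += 1
            total += rel_seen / judged_seen  # estimated precision here
    return total / max(est_total_relevant, 1)

# Tiny example: five ranked shots, two of them unjudged.
print(inferred_ap_sketch(
    ["s1", "s2", "s3", "s4", "s5"],
    {"s1": True, "s3": False, "s5": True},
    est_total_relevant=2))  # ~0.833
```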

SLIDE 12


Inferred frequency of hits varies by query

[Chart: inferred hits per query (topics 531-559), with a reference line at 1% of the test shots.]

SLIDE 13


Total true shots contributed uniquely by team

[Bar chart: number of true shots contributed uniquely by each team.]

SLIDE 14


2017 run submission scores (19 manually-assisted runs)

[Chart: mean Inf. AP per manually-assisted run.]

Median = 0.12 (>> TV2016: 0.04); Max = 0.216 (>> TV2016: 0.177).

SLIDE 15


2017 run submission scores (33 fully automatic runs)

[Chart: mean Inf. AP per fully automatic run, in plotted order: MediaMill.17_1, MediaMill.17_2, MediaMill.17_4, Waseda_Meisei.17_1, MediaMill.17_3, Waseda_Meisei.17_4, Waseda_Meisei.17_3, Waseda_Meisei.17_2, VIREO.17_2, VIREO.17_4, VIREO.17_3, ITI_CERTH.17_3, EURECOM.17_3, VIREO.17_1, ITI_CERTH.17_4, ITI_CERTH.17_1, EURECOM.17_1, EURECOM.17_2, ITI_CERTH.17_2, NII_Hitachi_UIT.17_1, NII_Hitachi_UIT.17_2, ITEC_UNIKLU.17_4, ITEC_UNIKLU.17_3, INF.17_2, ITEC_UNIKLU.17_2, NII_Hitachi_UIT.17_5, INF.17_1, NII_Hitachi_UIT.17_3, ITEC_UNIKLU.17_1, INF.17_3, EURECOM.17_4, INF.17_4, NII_Hitachi_UIT.17_4.]

Median = 0.092 (> TV2016: 0.024); Max = 0.206 (>> TV2016: 0.054).

SLIDE 16


Top 10 infAP scores by query (Fully automatic)

[Chart: top 10 Inf. AP scores and the median per topic (531-559), fully automatic runs. Annotated topics: people driving snowmobiles in snow; chef or cook in kitchen; person wearing any kind of hat; person standing in front of brick building or wall; adult running in city street; person holding, opening, closing or handing over a box; male person falling down.]

SLIDE 17


Top 10 infAP scores by query (Manually-assisted)

[Chart: top 10 Inf. AP scores and the median per query (531-559), manually-assisted runs, with an example of a query where the manual run improved over the automatic one.]

SLIDE 18

Which topics were easy or difficult overall?

Top 10 Easy (sorted by count of runs with InfAP >= 0.7):
  • a person wearing any kind of hat
  • a chef or cook in a kitchen
  • one or more people driving snowmobiles in the snow
  • one or more people swimming in a swimming pool
  • a man and woman inside a car
  • a crowd of people attending a football game in a stadium
  • a newspaper
  • a person communicating using sign language
  • a person wearing a scarf
  • a person riding a horse including horse-drawn carts

Top 10 Hard (sorted by count of runs with InfAP < 0.7):
  • an adult person running in a city street
  • person standing in front of a brick building or wall
  • person holding, opening, closing or handing over a box
  • a male person falling down
  • child or group of children dancing
  • children playing in a playground
  • person talking on a cell phone
  • person holding or opening a briefcase
  • one or more people eating food at a table indoors
  • person talking behind a podium wearing a suit outdoors during daytime


More action and dynamics in the hard queries.

SLIDE 19


Statistically significant differences among the top 10 “M” runs (randomization test, p < 0.05)

Run                    Mean Inf. AP
D_Waseda_Meisei.17_1   0.216  +
D_Waseda_Meisei.17_3   0.207  +
D_Waseda_Meisei.17_2   0.204  +
D_Waseda_Meisei.17_4   0.189  +
D_VIREO.17_4           0.164  !
D_VIREO.17_2           0.164  !
D_FIU_UM.17_2          0.147  #
D_FIU_UM.17_4          0.145  #
D_VIREO.17_1           0.124  *
D_VIREO.17_3           0.120  *

D_Waseda_Meisei.17_1
  ➢ D_VIREO.17_4
      ➢ D_VIREO.17_1
      ➢ D_VIREO.17_3
  ➢ D_VIREO.17_2
      ➢ D_VIREO.17_1
      ➢ D_VIREO.17_3
  ➢ D_FIU_UM.17_2
  ➢ D_FIU_UM.17_4

D_Waseda_Meisei.17_3
  ➢ D_VIREO.17_4
      ➢ D_VIREO.17_1
      ➢ D_VIREO.17_3
  ➢ D_VIREO.17_2
      ➢ D_VIREO.17_1
      ➢ D_VIREO.17_3
  ➢ D_FIU_UM.17_2
  ➢ D_FIU_UM.17_4

D_Waseda_Meisei.17_2
  ➢ D_VIREO.17_1
  ➢ D_VIREO.17_3
  ➢ D_FIU_UM.17_2
  ➢ D_FIU_UM.17_4

D_Waseda_Meisei.17_4
  ➢ D_VIREO.17_1
  ➢ D_VIREO.17_3
  ➢ D_FIU_UM.17_4

+ ! # * : no significant difference among the runs sharing the same symbol.
➢ Runs higher in the hierarchy are significantly better than the runs indented below them.
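For reference, here is a minimal sketch of a paired randomization test on per-topic scores. The per-topic Inf. AP values below are placeholders, and the exact NIST procedure may differ in details such as the number of permutations.

```python
import random

def randomization_test(ap_a, ap_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-topic scores.

    Under the null hypothesis the two runs are exchangeable, so each
    per-topic difference keeps or flips its sign with probability 1/2.
    Returns the fraction of sign-flipped worlds whose mean difference
    is at least as extreme as the observed one (the p-value).
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(trials):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    return extreme / trials

# Placeholder per-topic Inf. AP values for two hypothetical runs:
run_a = [0.30, 0.25, 0.40, 0.10, 0.35]
run_b = [0.20, 0.22, 0.28, 0.05, 0.30]
print(randomization_test(run_a, run_b))  # p-value; < 0.05 => significant
```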

SLIDE 20


Statistically significant differences among the top 10 “F” runs (randomization test, p < 0.05)

Run                    Mean Inf. AP
D_MediaMill.17_1       0.206  +
D_MediaMill.17_2       0.205  +
D_MediaMill.17_4       0.177
D_Waseda_Meisei.17_1   0.159
D_MediaMill.17_3       0.150
D_Waseda_Meisei.17_4   0.143  #
D_Waseda_Meisei.17_3   0.141  #
D_Waseda_Meisei.17_2   0.125
D_VIREO.17_2           0.120  *
D_VIREO.17_4           0.116  *
D_VIREO.17_3           0.116  *

D_MediaMill.17_1
  ➢ D_MediaMill.17_4
  ➢ D_VIREO.17_2
  ➢ D_VIREO.17_3
  ➢ D_VIREO.17_4
  ➢ D_Waseda_Meisei.17_1
  ➢ D_Waseda_Meisei.17_2
  ➢ D_Waseda_Meisei.17_3
  ➢ D_Waseda_Meisei.17_4

D_MediaMill.17_2
  ➢ D_MediaMill.17_4
  ➢ D_VIREO.17_2
  ➢ D_VIREO.17_3
  ➢ D_VIREO.17_4
  ➢ D_Waseda_Meisei.17_1
  ➢ D_Waseda_Meisei.17_2
  ➢ D_Waseda_Meisei.17_3
  ➢ D_Waseda_Meisei.17_4

+ # * : no significant difference among the runs sharing the same symbol.
➢ Runs higher in the hierarchy are significantly better than the runs indented below them.

SLIDE 21

Good and fast


Processing time vs. Inf. AP (“M” runs), across all topics and runs

[Scatter plot: processing time in seconds (1-100, log scale) vs. Inf. AP, with Waseda_Meisei and kobe_nict_siegen runs labeled.]

SLIDE 22

Good and fast


Processing time vs. Inf. AP (“F” runs), across all topics and runs

[Scatter plot: processing time in seconds (1-1000, log scale) vs. Inf. AP, with VIREO and NII_Hitachi_UIT runs labeled.]

SLIDE 23


2017 Main Approaches

  • Concept bank with automatic or manual mapping of query terms to concepts (a minimal sketch follows this list)
  • Combination of concept scores via Boolean operators
  • Work on query understanding
  • Rectified Linear Score Normalization
  • Use of video-to-text techniques on shots
  • Query expansion / term matching techniques
  • Use of a unified text-image vector space
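A minimal sketch of the first approach (concept bank plus soft Boolean fusion). The concept names, scores, and mapping below are invented; real systems map query terms via embeddings or manual rules, and schemes such as FIU_UM's Rectified Linear Score Normalization are defined in the respective papers, not reproduced here.

```python
import numpy as np

# Invented concept bank: detector scores per shot (rows) and concept (cols).
concepts = ["person", "hat", "kitchen", "snow"]
rng = np.random.default_rng(0)
scores = rng.random((6, len(concepts)))  # 6 shots x 4 concept detectors

def rank_shots(query_terms, scores, concepts, top_k=1000):
    """Map query terms to concepts by exact string match (a stand-in for
    embedding- or rule-based mapping) and fuse the selected detector
    scores with a product, i.e. a soft Boolean AND."""
    idx = [concepts.index(t) for t in query_terms if t in concepts]
    if not idx:
        raise ValueError("no query term maps to a concept in the bank")
    fused = scores[:, idx].prod(axis=1)  # soft AND over selected concepts
    order = np.argsort(-fused)[:top_k]   # best shots first
    return [(int(i), float(fused[i])) for i in order]

# "Find shots of a person wearing any kind of hat" -> {person, hat}
print(rank_shots(["person", "hat"], scores, concepts))
```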
SLIDE 24


2017 Observations

  • Ad-hoc search is more difficult than simple concept-based tagging.
  • Max and median scores are better than TV2016 for both M and F runs.
  • Manually-assisted runs performed slightly better than fully automatic ones.
  • Most systems are not real-time (and slower systems were not necessarily more effective).
  • Some systems reported a processing time of 0 (or didn't measure it!).
  • No runs of types A or F were submitted, compared with 40 D runs and 12 E runs.
SLIDE 25


Continued at MMM2018

  • 10 Ad-Hoc Video Search (AVS) tasks: 5 are a random subset of the 30 AVS tasks of TRECVID 2017, and 5 will be chosen directly by human judges as a surprise. Each AVS task has several (possibly many) target shots that should be found.
  • 10 Known-Item Search (KIS) tasks, selected completely at random on site. Each KIS task has only one single 20-second target segment.
  • Registration for the task is now closed.
SLIDE 26


9:20 - 11:40: Ad-hoc Video Search

  • 9:40 - 10:00, Query understanding is key for zero-example video search (MediaMill - University of Amsterdam)
  • 10:00 - 10:20, Waseda_Meisei at TRECVID 2017: Ad-hoc video search (Waseda_Meisei - Waseda University; Meisei University)
  • 10:20 - 10:40, Break with refreshments
  • 10:40 - 11:00, FIU-UM@TRECVID 2017: Rectified Linear Score Normalization and Weighted Integration for Ad-hoc Video Search (FIU_UM - Florida International University, University of Miami)
  • 11:00 - 11:20, Interactive Video Search at VBS (ITEC_UNIKLU - Institute of Information Technology, Klagenfurt University)
  • 11:20 - 11:40, AVS discussion
SLIDE 27


2017 Questions

  • Were the task and queries realistic enough?
  • Do we need to change/add/remove anything from the task in 2018?
  • Is there any specific reason why systems did not submit any “F” runs (training data collected automatically using a query built manually from the given query text)?
  • Did any team run their 2017 system on the TV2016 topics, or their 2016 system on this year's topics?
  • Should we consider a new dataset in 2019 to continue working on Ad-hoc search (e.g., YouTube, Vimeo, etc.)?