TRECVID 2016
Video to Text Description NEW Showcase / Pilot Task(s)
Alan Smeaton, Dublin City University; Marc Ritter, Technical University Chemnitz; George Awad, NIST / Dakota Consulting, Inc.
Goals and Motivations
Measure how well an automatic system can:
- describe a video in natural language
- match high-level textual descriptions to video content

Motivations:
- Video summarization
- Supporting search and browsing
- Accessibility: video description for the blind
- Video event prediction
Data:
- 2000 URLs of Twitter Vine videos
- 2 sets (A and B) of text descriptions for each of the 2000 videos
Annotation observations:
- Bad video quality
- A lot of simple scenes/events with repeating plain descriptions
- A lot of complex scenes containing too many events to be described
- Clips sometimes appear too short for a convenient description
- Audio track relevant for description, but not used, to avoid semantic distractions
- Non-English text overlays/subtitles hard to understand
- Cultural differences in reception of events/scene content
- Finding a neutral scene description is a challenging task
- Well-known people in videos may have (inappropriately) influenced the descriptions of scenes
- Specifying the time of day is frequently impossible for indoor shots
- Description quality suffers from long annotation hours
- Some offline Vines were detected
- A lot of Vines with redundant or even identical content
Annotation statistics per annotator:

UID   | # annotations | mean (sec) | max (sec) | min (sec) | total time (hh:mm:ss)
      | 700           | 62.16      | 239.00    | 40.00     | 12:06:12
1     | 500           | 84.00      | 455.00    | 13.00     | 11:40:04
2     | 500           | 56.84      | 499.00    | 09.00     | 07:53:38
3     | 500           | 81.12      | 491.00    | 12.00     | 11:16:00
4     | 500           | 234.62     | 499.00    | 33.00     | 32:35:09
5     | 500           | 165.38     | 493.00    | 30.00     | 22:58:12
6     | 500           | 57.06      | 333.00    | 10.00     | 07:55:32
7     | 500           | 64.11      | 495.00    | 12.00     | 08:54:15
8     | 200           | 82.14      | 552.00    | 68.00     | 04:33:47
total | 4400          | 98.60      | 552.00    | 09.00     | 119:52:49
Example A vs. B descriptions:

A                                                               | B
a dog jumping onto a couch                                      | a dog runs against a couch indoors at daytime
in the daytime, a driver let the steering wheel of car and slip | street moving car and use the slide on cargo area
an asian woman turns her head                                   | an asian young woman is yelling at another
a woman sings outdoors                                          | a woman walks through a floor at daytime
a person floating in a wind tunnel                              | a person dances in the air in a wind tunnel
Participants:

Team                  | Matching & Ranking | Description Generation
DCU                   | ✓                  | ✓
INF(ormedia)          | ✓                  | ✓
MediaMill (AMS)       | ✓                  | ✓
NII (Japan + Vietnam) | ✓                  | ✓
Sheffield_UETLahore   | ✓                  | ✓
VIREO (CUHK)          | ✓                  |
Etter Solutions       | ✓                  |
Matching & Ranking: a total of 46 runs submitted. Description Generation: a total of 16 runs submitted.
[Task illustration: 2000 videos, each with one type-A and one type-B description (e.g. "Person reading newspaper outdoors at daytime", "Three men running in the street at daytime", "Person playing golf outdoors in the field", "Two men looking at laptop in an office"); systems match the 2000 descriptions of each type against the videos.]
[Figure: Mean Inverted Rank of all submitted runs (approx. 0.02–0.12), grouped by team: MediaMill, VIREO, Etter, DCU, INF(ormedia), NII, Sheffield.]
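The matching runs are scored by Mean Inverted Rank: for each query the score is the reciprocal of the rank at which the correct match appears, averaged over all queries. A minimal sketch follows; the function name, the dictionary layout, and the zero-score convention for missing matches are assumptions for illustration, not the official evaluation code.

```python
def mean_inverted_rank(ranked_lists, correct):
    """ranked_lists: {video_id: [description_id, ...] in ranked order}
    correct: {video_id: ground-truth description_id}
    A match missing from a submitted list contributes 0 (an assumption)."""
    total = 0.0
    for vid, ranking in ranked_lists.items():
        gt = correct[vid]
        if gt in ranking:
            total += 1.0 / (ranking.index(gt) + 1)  # rank is 1-based
    return total / len(ranked_lists)

# hypothetical toy run: v1's match found at rank 2, v2's match missed
runs = {"v1": ["d3", "d1", "d2"], "v2": ["d2", "d9"]}
truth = {"v1": "d1", "v2": "d7"}
print(mean_inverted_rank(runs, truth))  # → 0.25
```

With this convention, a run that always places the correct match first scores 1.0, so higher is better.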
[Figure: the same Mean Inverted Rank chart, colored per team: 'B' runs seem to be doing better than 'A' runs.]
[Figure: histogram of matches not found, binned by the number of runs (1–10) that missed each match.]
5 runs didn't find any of 805 matches.
[Figures: number of matches by rank (1–100) for sets A and B: very similar rank distributions.]
[Figure: rank achieved per run (log scale, 1–10,000) for videos #626, #1816, #1339, #1244, #1006, #527, #1201, #1387, #1271, #324.]
[Figure: the same chart highlighting the best-matched videos: #1387 (Top 3), #1271 (Top 2), #324 (Top 1).]
#1271: a woman and a man are kissing each other
#1387: a dog imitating a baby by crawling on the floor in a living room
#324: a dog is licking its nose
[Figure: rank achieved per run (log scale, 1–10,000) for videos #220, #732, #1171, #481, #1124, #579, #754, #443, #1309, #1090.]
[Figure: the same chart highlighting videos #220, #732, #1171.]
#1171: 3 balls hover in front of a man
#220: 2 soccer players are playing rock-paper-scissors
#732: a person wearing a costume and holding a chainsaw
[Figure: rank achieved per run (log scale, 1–10,000) for videos #1128, #40, #374, #752, #955, #777, #1366, #1747, #387, #761.]
[Figure: the same chart highlighting videos #1747, #387, #761.]
#761: White guy playing the guitar in a room
#387: An Asian young man sitting is eating something yellow
#1747: a man sitting in a room is giving baby something to drink and it starts laughing
[Figure: rank achieved per run (log scale, 1–10,000) for videos #1460, #674, #79, #345, #1475, #605, #665, #414, #1060, #144.]
[Figure: the same chart highlighting videos #414, #1060, #144.]
#144: A man touches his chin in a tv show
#1060: A man piggybacking another man outdoors
#414: a woman is following a man walking on the street at daytime trying to talk with him
Description Generation: given a video, generate a textual description answering Who? What? Where? When? (e.g. “a dog is licking its nose”)
[Figure: BLEU scores of submitted runs (approx. 0.005–0.025).]
[Figure: min, max, and median BLEU score per team: INF(ormedia), Sheffield, NII, MediaMill, DCU (range 0–1.2).]
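BLEU scores the n-gram overlap between a generated caption and a reference. A minimal single-reference, sentence-level sketch follows: clipped n-gram precision for n = 1..4 with uniform weights, times a brevity penalty. The smoothing constant is an assumption added to avoid log(0); the official evaluation used standard MT tooling, not this code.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU sketch: clipped n-gram precision
    (n = 1..max_n, uniform weights) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        # crude smoothing (an assumption) so a zero overlap doesn't zero the score
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An identical candidate and reference score 1.0; dropping or changing words lowers the score quickly, which is why sentence-level BLEU on short captions tends to be small.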
[Figure: METEOR scores of submitted runs (approx. 0.05–0.3).]
[Figure: min, max, and median METEOR score per team: INF(ormedia), Sheffield, NII, MediaMill, DCU (range 0–1.2).]
[Figure: min, max, and median STS score per team: MediaMill (A), INF (A), Sheffield_UET (A), NII (A), DCU (A) (range 0–1.2).]
'A' runs seem to be doing better than 'B' runs.
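The STS metric rates the semantic similarity of a generated description against the ground truth. As a crude illustration only, a bag-of-words cosine similarity is sketched below; this is purely a surface-overlap proxy and not the STS system used in the evaluation, which draws on semantic resources.

```python
import math
from collections import Counter

def cosine_similarity(s1, s2):
    """Crude lexical proxy for semantic textual similarity:
    cosine over bag-of-words counts. Real STS systems also score
    paraphrases that share no surface words; this sketch cannot."""
    a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The limitation is the point: "a dog runs" vs. "a canine sprints" scores 0 here, while a semantic metric would score them close, which is exactly why STS complements BLEU and METEOR.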
'B' descriptions did better than 'A' in matching & ranking, while 'A' did better than 'B' in the semantic similarity.
participants did)
MT metrics or semantic similarity? Which metric measures real system performance in a realistic application?
ImageNet, YouTube2Text, MSVD, … Some trained with AMT annotations (MSR-VTT-10k has 10,000 videos, 41.2 hours, and 20 annotations each!)
annotations on each?