TRECVID-2005: Search Task
Alan Smeaton, Dublin City University & Tzveta Ianeva, NIST
TRECVID 2005 2
Search Task Definition
- Given a test collection, a multimedia statement of
information need (topic) and a common shot boundary reference, return a ranked list of at most 1,000 shots which best satisfy the need;
- Goal: promote progress in content-based retrieval from
digital video via open, metrics-based evaluation;
- Many thanks to
n Christian Petersohn (Fraunhofer Institute) for the master shot reference
n DCU team for formatting and selecting keyframes
n Jonathan Lasko for the shot boundary truth data creation
n CMU & Randy Paul for getting a government contractor to provide MT/ASR
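The ranked lists described above are scored with (mean) average precision. As an illustrative sketch of how a TRECVID-style submission of at most 1,000 shots per topic would be evaluated (not NIST's actual trec_eval code):

```python
def average_precision(ranked_shots, relevant_shots, cutoff=1000):
    """AP for one topic: sum the precision observed at each rank where
    a relevant shot appears, divided by the total number of relevant
    shots (TRECVID caps submissions at 1,000 shots per topic)."""
    hits, precision_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots[:cutoff], start=1):
        if shot in relevant_shots:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_shots) if relevant_shots else 0.0

def mean_average_precision(per_topic_ap):
    """MAP: the mean of per-topic average precision over all topics."""
    return sum(per_topic_ap) / len(per_topic_ap)

# Relevant shots retrieved at ranks 1 and 3: AP = (1/1 + 2/3) / 2
ap = average_precision(["s9", "s4", "s7"], {"s9", "s7"})
```

Relevant shots that are never retrieved simply contribute nothing to the numerator, which is why AP rewards both precision and recall of the ranked list.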
Search Task Definition
- NIST created topics based on a number of basic search
types: generic/specific and person/thing/event where there are multiple relevant shots coming from more than one video;
- Videos were viewed by NIST personnel (with sound turned
off), notes were taken on content, and candidate topics emerged and
were chosen;
- Interactive search participants were asked to have their
subjects complete pre, post-topic and post-search questionnaires;
- Each result for a topic can come from only 1 user search,
but the same searcher does not need to be used for all topics in a run.
Overarching Goals
- Previous TRECVids showed a huge benefit from using text
(ASR, closed captions, video OCR);
- TRECVid 2005 data is (deliberately) text-noisy, with video
from English-language, Arabic & Chinese broadcasts;
- Text is derived from speech recognition followed by machine
translation, thus poorer quality than previously?
- Net outcome is that the task is harder, with more emphasis on
the visual and less on text?
2005: Search task participants (20, up from 16)
n Bilkent University (Turkey)
n Carnegie Mellon University (USA)
n Columbia University (USA)
n Dublin City University (Ireland)
n Fudan University (China)
n FX Palo Alto Laboratory (USA)
n Helsinki University of Technology (Finland)
n IBM (USA)
n Imperial College London (UK)
n Language Computer Corporation (LCC) (USA)
n Lowlands Team (CWI, Twente, U. of Amsterdam) (Netherlands)
n MediaMill Team (Univ. of Amsterdam and TNO) (Netherlands)
n National University of Singapore (NUS) (Singapore)
n Queen Mary University of London (UK)
n SCHEMA-Univ. Bremen Team (EU)
n Tsinghua University (China)
n University of Central Florida / University of Modena (USA/Italy)
n University of Iowa (USA)
n University of North Carolina (USA)
n University of Oulu / MediaTeam (Finland)
Search Types: Automatic, Manual and Interactive
Number of runs:
n 42 automatic (up from 23)
n 26 manual (down from 52)
n 44 interactive (down from 61)
24 Topics [number of image, video examples and relevant found]
- 149. Find shots of Condoleeza Rice [3, 6, 116]
- 150. Find shots of Iyad Allawi, the former prime minister of Iraq [3, 6, 13]
- 151. Find shots of Omar Karami, the former prime minister of Lebanon [2, 5, 301]
- 152. Find shots of Hu Jintao, president of the People’s Republic of China [2, 9, 498]
- 153. Find shots of Tony Blair. [2, 4, 42]
- 154. Find shots of Mahmoud Abbas, also known as Abu Mazen, prime minister of the Palestinian Authority. [2, 9, 93]
- 155. Find shots of a graphic map of Iraq, location of Baghdad marked – not a weather map [4, 10, 54]
- 156. Find shots of tennis players on the court – both players visible at the same time [2, 4, 55]
- 157. Find shots of people shaking hands [4, 10, 470]
- 158. Find shots of a helicopter in flight [2, 8, 63]
- 159. Find shots of George Bush entering or leaving a vehicle (e.g., car, van, airplane, helicopter, etc.), he and vehicle both visible at the same time [2, 7, 29]
- 160. Find shots of something (e.g., vehicle, aircraft, building, etc.) on fire with flames and smoke visible [2, 9, 169]
24 Topics [number of image, video examples and relevant found]
- 161. Find shots of people with banners or signs [2, 6, 1245]
- 162. Find shots of one or more people entering or leaving a building [5, 8, 385]
- 163. Find shots of a meeting with a large table and more than two people [2, 5, 1160]
- 164. Find shots of a ship or boat [3, 7, 214]
- 165. Find shots of basketball players on the court [2, 8, 254]
- 166. Find shots of one or more palm trees [2, 6, 253]
- 167. Find shots of an airplane taking off [2, 5, 19]
- 168. Find shots of a road with one or more cars [2, 5, 1087]
- 169. Find shots of one or more tanks or other military vehicles [3, 8, 493]
- 170. Find shots of tall building (with more than 5 floors above the ground) [3, 6, 543]
- 171. Find shots of a goal being made in a soccer match [1, 7, 49]
- 172. Find shots of an office setting, i.e., one or more desks/tables and one or more
computers and one or more people [3, 8, 790]
Some statistics
- 2005:
n Number of shots in test collection: 45,765
n Relevant shots found: 8,395 (~18.3%)
- 2004:
n Number of shots in test collection: 33,367
n Relevant shots found: 1,800 (~5.4%)
- 2003:
n Number of shots in test collection: 32,318
n Relevant shots found: 2,114 (~6.5%)
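The percentages above follow directly from the counts on this slide; a quick check:

```python
# Test-collection sizes and relevant-shot counts from this slide
stats = {2005: (45765, 8395), 2004: (33367, 1800), 2003: (32318, 2114)}

for year in sorted(stats):
    shots, relevant = stats[year]
    print(f"{year}: {relevant}/{shots} = {relevant / shots:.1%} relevant")
```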
[Bar chart: number of unique, relevant shots per site (CMU, Columbia, Fudan, Imperial College London, Lowlands, NUS, Oulu, HUT, SCHEMA-Univ. Bremen, Tsinghua, IBM, U. of Central Florida / U. of Modena, U. of Iowa, MediaMill, DCU, Queen Mary)]
2005: 16 sites contributed one or more unique, relevant shots (8 last year)
[Table: unique relevant shots contributed per topic (149–172, with total relevant per topic) by each of 16 groups: CMU, Columbia, Fudan, Imperial College, Lowlands, NUS, Oulu, HUT, SCHEMA-U. Bremen, Tsinghua, IBM, U. of Central Florida / U. of Modena, U. of Iowa, DCU, Queen Mary, MediaMill]
2005: Relevant shots contributed uniquely per topic by team
Topics 161, 163, 168 have 1000+ relevant; 170 and 172 have 500+
2005: Interactive runs - top 10 MAP (of 49)
(mean elapsed time for all == ~15 mins/topic)
- B_2_UvA-MM_1
- A_2_CMU.MotoX_6
- B_2_CMU_Mon_1
- A_2_CMU.Snowboarding_S
- A_1_FXPAL1LCN_2
- A_1_FXPAL0LN_1
- A_1_FXPAL4LC_5
- B_2_UvA-MM_4
- B_2_UvA-MM_2
- A_1_FXPAL2RAN_3
2004: Interactive runs - top 10 MAP (of 62)
(mean elapsed time for all == ~15 mins/topic)
- B_2_UvA-MM_1
- C_2_CMU1I_1
- A_1_FXPAL_2_5
- A_1_FXPAL_1_4
- A_1_FXPAL_3_6
- A_2_IBM.Interactive_2_ARC_7
- A_2_IBM.Interactive_1_ARC_1
- A_1_FXPAL_1_7
- A_1_FXPAL_2_8
- A_1_FXPAL_3_9
DATA IS DIFFERENT SYSTEMS ARE DIFFERENT ONLY THE METRICS ARE THE SAME
2005: Manual runs - top 10 MAP (of 26)
(mean human effort (mins) / topic)
- M_A_2_CMU.Manu.ExpECA.QC04CR.PU_5 (15)
- M_A_2_CMU.Manu.ExpE.QC05U_7 (15)
- M_A_2_PicSOM-M3_2 (0.93)
- M_A_2_FD_MM_BC_1 (11.1)
- M_A_2_OUMT_M7TE_7 (5.06)
- M_A_2_OUMT_M6TS_6 (5.02)
- M_A_2_PicSOM-M2_4 (0.87)
- M_A_2_FD_AOH_LR_ONLINE_3 (11.1)
- M_A_1_OUMT_M5T_5 (5.01)
- M_A_1_dcu_manual_text_img_6 (3)
2004: Manual runs - top 10 MAP (of 52)
(mean human effort (mins) / topic)
- A_2_NUSVID2004_1 (3.92)
- A_2_NUSVID2004_2 (3.93)
- A_2_NUSVID2004_3 (3.71)
- A_2_IBM.Manual_ARC_5 (15)
- A_1_IBM.SpeechBaseline_ARC_9 (15)
- C_2_CMU09M_9 (15)
- C_2_CMU07M_7 (15)
- A_2_NUSVID2004_4 (3.76)
- C_2_CMU05M_5 (15)
- A_2_LL-M-dyn-sel-ASR-RR_5 (3)
DATA IS DIFFERENT SYSTEMS ARE DIFFERENT ONLY THE METRICS ARE THE SAME
2005: Automatic runs (pilot) - top 10 MAP (of 23)
(mean elapsed time (mins) / topic)
- F_B_2_NUS_PRIS_1 (0.55)
- F_A_2_TJW_VM_4 (15)
- F_A_2_TJW_TVM_2 (15)
- F_A_2_TJW_V_3 (15)
- F_B_2_NUS_PRIS_2 (0.56)
- F_A_2_TJW_TV_5 (15)
- F_A_2_NUS_PRIS_3 (0.3)
- F_C_2_ColumbiaA2_5 (15)
- F_B_2_UvA-MM_6 (0.7)
- F_A_2_PicSOM-F2_3 (0.14)
2004: Automatic runs (pilot) - top 10 MAP (of 23)
(mean elapsed time (mins) / topic)
- C_2_DCU_AUTOLM6_6 (3.43)
- C_2_DCU_AUTOLM7_7 (3.12)
- C_2_CMUS5A_S (15)
- A_2_LL-F-stat-allvidim-ASR-RR_S (3)
- A_2_LL-F-stat-allimvid-ASR-RR_S (3)
- A_2_LL-F-dyn-allvidim-ASR-RR_S (3)
- A_2_LL-F-dyn-allimvid-ASR-RR_S (3)
- C_2_DCU_AUTOLM1_1 (0.007)
- A_1_LL-F-ASR-full_S (0?)
- C_2_DCU_AUTOLM3_3 (0.008)
DATA IS DIFFERENT SYSTEMS ARE DIFFERENT ONLY THE METRICS ARE THE SAME
2005: Text-only versus Text-plus searches by group (using only common training data)
[Bar charts: text-only vs. text-plus MAP by group. Automatic searches: LCC, Lowlands, NUS, HUT, Tsinghua, IBM, U. Iowa. Manual searches: Fudan, Lowlands, Oulu, HUT]
2005: Mean avg. precision by topic
[Chart: mean average precision per topic (149–172), showing max and median for interactive, manual and automatic runs; annotated topics: tennis players (156), Tony Blair (153), soccer match goal (171), people entering/leaving a building (162)]
2005: Interactive runs’ median average precision by topic
[Bar chart: interactive median AP per topic, ranging from 0.56 (topic 156) down to 0.013 (topic 162)]
156: Tennis players on the court – both players visible at the same time 153: Tony Blair 171: Goal being made in a soccer match 149: Condoleeza Rice 151: Omar Karami
2005: Manual runs’ median average precision by topic
[Bar chart: manual median AP per topic, ranging from 0.255 (topic 151) down to 0.002 (topic 167)]
151: Omar Karami, the former PM of Lebanon 152: Hu Jintao, President of the People’s Republic of China 153: Tony Blair 171: goal being made in a soccer match 164: ship or boat
2005: Automatic runs’ median average precision by topic
[Bar chart: automatic median AP per topic, ranging from 0.166 (topic 171) down to ~0.001]
171: Goal being made in a soccer match 151: Omar Karami, the former PM of Lebanon 153: Tony Blair 152: Hu Jintao 164: ship or boat
2005: Mean average precision (interactive max) vs total number relevant
[Scatter plot: mean average precision (interactive max) vs. total number of relevant shots per topic]
Who did what?
- Speaker slots to follow:
n Carnegie Mellon University
n IBM Research
n MediaMill (University of Amsterdam and TNO)
n National University of Singapore
n University of Oulu/MediaTeam
- No papers from:
n Bilkent University
n QMUL
n SCHEMA - University of Bremen
- Demos?
- Posters?
Columbia University
- Interactive search tool developed with
n Text search, CBIR search, story segmentation and story-level browsing, 39 visual concepts from LSCOM-Lite, near-duplicate detection, query-class dependent weights and cue-X re-ranking;
- Manual run with
n Text, CBIR and visual concepts;
- Automatic runs with
n Query-class dependent weights of some of the above;
Interactive Manual Automatic
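Query-class dependent weighting, as used here, picks a different set of modality weights per query class before fusing scores. A minimal sketch; the class names and weight values below are invented for illustration, not Columbia's learned parameters:

```python
# Hypothetical query classes mapped to (text, image, concept) weights;
# a real system learns these per class, so the numbers are made up.
CLASS_WEIGHTS = {
    "named_person": (0.7, 0.1, 0.2),   # text dominates for named people
    "sports":       (0.3, 0.3, 0.4),   # visual concepts matter more
    "general":      (0.4, 0.3, 0.3),
}

def fuse(query_class, text_score, image_score, concept_score):
    """Weighted linear fusion with weights chosen by query class."""
    wt, wi, wc = CLASS_WEIGHTS.get(query_class, CLASS_WEIGHTS["general"])
    return wt * text_score + wi * image_score + wc * concept_score
```

For example, a "named_person" query leans on the text score, while an unrecognised query class falls back to the "general" weights.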
Dublin City University
- Interactive search used a DiamondTouch collaborative
tabletop interface from MERL for text- and image-based video searching;
- 2 versions
n Increased each user’s awareness of the other user, thus encouraging collaboration;
n A “leave me alone” mode supporting efficient solo searching;
- Aim was to explore user-user collaborative search;
- Findings are that group awareness benefits retrieval;
- Also did manual and automatic runs - exploring text-only vs.
text+image searching;
Interactive Manual Automatic
Fudan University
- Submitted manual runs and explored multi-modal
fusion;
- Found that relation-expression fusion performed better than
linear fusion using a variety of retrieval modalities:
n Text;
n 14 visual concepts;
n Pseudo-relevance feedback;
n Logistic regression
- Also explored training weights online vs. training
weights offline
Interactive Manual Automatic
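The linear-fusion baseline mentioned above is typically a weighted sum of per-modality scores after score normalisation. A generic sketch (min-max normalisation is one common choice, not necessarily Fudan's):

```python
def minmax(scores):
    """Min-max normalise a shot -> score map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                  # avoid division by zero
    return {shot: (v - lo) / span for shot, v in scores.items()}

def linear_fusion(modality_scores, weights):
    """Weighted sum of normalised scores; returns shots best-first."""
    fused = {}
    for scores, w in zip(modality_scores, weights):
        for shot, v in minmax(scores).items():
            fused[shot] = fused.get(shot, 0.0) + w * v
    return sorted(fused, key=fused.get, reverse=True)

text = {"a": 2.0, "b": 1.0, "c": 0.0}       # e.g. ASR text scores
visual = {"a": 0.0, "b": 1.0, "c": 0.5}     # e.g. visual-concept scores
ranking = linear_fusion([text, visual], [0.7, 0.3])
```

Normalising first matters because raw scores from different modalities (language-model likelihoods vs. image distances) live on incomparable scales.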
FX Palo Alto Laboratory
- Participated in interactive search;
- Enhanced the 2004 system for efficient browsing
and enhanced visualisation, by adding 29 concepts/semantic features;
- Story-level browsing, keyframe thumbnails, text
dialogue overlays, story timelines;
- Query is text and/or image;
- Text-only search is as good as text+others
(because the browser and visualisation are very strong?);
Interactive Manual Automatic
Helsinki University of Technology
- Automatic, manual and interactive runs;
- Addressed text-only vs. text+multimodal
querying;
- Multi-modal better than text-only!
- Interactive search used relevance feedback only,
with no “search” or shot browsing, so very dynamic user control;
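Relevance-feedback loops of this kind usually follow the classic Rocchio scheme: re-weight the query vector toward shots the user marks relevant and away from those marked non-relevant. A generic sketch under that assumption, not HUT's exact update rule:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update: alpha*q + beta*centroid(relevant)
    - gamma*centroid(nonrelevant), over plain feature vectors."""
    def centroid(vectors):
        n = len(vectors)
        return [sum(dim) / n for dim in zip(*vectors)]

    new_q = [alpha * q for q in query]
    for weight, vectors in ((beta, relevant), (-gamma, nonrelevant)):
        if vectors:                       # skip empty feedback sets
            c = centroid(vectors)
            new_q = [q + weight * ci for q, ci in zip(new_q, c)]
    return new_q
```

Each feedback round replaces the query with `rocchio(query, marked_good, marked_bad)` and re-ranks the collection, which is how a system can iterate without an explicit "search" box.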
Interactive Manual Automatic
Imperial College London
- Content-based search + NNk browsing in a 2D
GUI map browser;
- Enhanced 2004 system with new kind of
relevance feedback;
- Text-based search, content-based search with
relevance feedback and temporal browsing integrated into a unified interface;
- Emphasis on supporting user task;
Interactive Manual Automatic
Language Computer Corporation (LCC)
- Participated in automatic search;
- Used combinations of ASR text search (language
modelling), image features, high-level features, alone and in combination;
- Image features used blobs;
- Text search alone was best-performing
Interactive Manual Automatic
Lowlands (CWI, Twente, U. of Amsterdam)
- Manual and automatic search runs;
- Visual and text searching
- Weibull models and Gaussian mixture models for
visual features, language modeling for text;
- No clear results differentiation;
- First steps towards developing parameterised
search engines for each;
Interactive Manual Automatic
Tsinghua University
- Three search modes: text, image matching based on
region matching, and concept matching;
- Concept/feature recognition approach based on
their HLF submissions;
- Explore latent relationship (LSA) between (ASR)
text and visual features and concepts;
- Tried each of these alone and in combinations
using score fusion and query-type-specific weighting (2 query types);
- Conclusion is that combinations work best;
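The LSA step mentioned above can be sketched with a truncated SVD of a joint term/concept-by-shot matrix; the matrix below is a toy example with invented numbers, not Tsinghua's data or implementation:

```python
import numpy as np

# Toy term/concept-by-shot matrix: rows are ASR terms or visual
# concepts, columns are shots (numbers purely illustrative).
X = np.array([[2., 0., 1., 0.],
              [1., 0., 1., 0.],
              [0., 3., 0., 2.],
              [0., 1., 0., 1.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                         # keep top-k latent dims
shot_vectors = (np.diag(s[:k]) @ Vt[:k]).T    # one row per shot

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Shots 0 and 2 share vocabulary, so they land close together in the latent space even though no single raw term links them to shots 1 and 3; that is the latent relationship LSA is meant to expose between text and visual evidence.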
Interactive Manual Automatic
University of Central Florida
- UCF first time in search task;
- PEGASUS system, web-based, interactive, used
ASR, OCR, keyframe global histograms and high-level features;
- Submitted ASR-only & multi-modal runs;
- Multi-modal better than ASR-only;
Interactive Manual Automatic
University of Iowa
- Automatic search runs;
- Text-only vs. text+image features;
n Keyframe-keyframe pixel distances;
n Text + colour information;
n Text + texture information;
n Text + edge information;
- Text-only was best - could they have combined the visual
features?
Interactive Manual Automatic
University of North Carolina
- Investigate the effects of providing context and
interactivity in a retrieval system, supporting the browsing of search result sets;
n Basic Google-like video search;
n Enhanced with shot context browsing;
n Further enhanced with interactive feedback, e.g. mouseover gives enlarged keyframes;
- For both performance and user perceptions, the
Context+Interactive system was superior - higher recall, precision the same;
Interactive Manual Automatic
Observations
- We’re still getting “Lots of variation, interesting shot
browsing interfaces, mixture of interactive & manual”, and additionally automatic runs;
- Top performances on all 3 search types are up, even with
more difficult data, but data is different and systems are different … did anybody run a 2004 system on the 2005 data?
- Some leveraged the structured nature of B/News;
- Many did automatic search & fewer did interactive search -
because it’s easier (no users)?
- Most common issue explored was the best combination of
text vs. image search vs. concept/features;
- Search participants are the “regulars” plus new groups, some