

SLIDE 1

TRECVID-2005: Search Task

Alan Smeaton Dublin City University & Tzveta Ianeva NIST

SLIDE 2

Search Task Definition

  • Given a test collection, a multimedia statement of information need (topic) and a common shot boundary reference, return a ranked list of at most 1,000 shots which best satisfy the need;
  • Goal: promote progress in content-based retrieval from digital video via open, metrics-based evaluation;
  • Many thanks to:
    – Christian Petersohn (Fraunhofer Institute) for the master shot reference;
    – the DCU team for formatting and selecting keyframes;
    – Jonathan Lasko for the shot boundary truth data creation;
    – CMU & Randy Paul for getting a government contractor to provide MT/ASR.

SLIDE 3

Search Task Definition

  • NIST created topics based on a number of basic search types: generic/specific and person/thing/event, where there are multiple relevant shots coming from more than one video;
  • Videos were viewed by NIST personnel (with sound turned off), notes were taken on content, and candidate topics emerged and were chosen;
  • Interactive search participants were asked to have their subjects complete pre-, post-topic and post-search questionnaires;
  • Each result for a topic can come from only 1 user search, but the same searcher does not need to be used for all topics in a run.

SLIDE 4

Overarching Goals

  • Previous TRECVids show huge benefit from using text (ASR, closed captions, video OCR);
  • TRECVid 2005 data is (deliberately) text-noisy, with video from English-language, Arabic and Chinese broadcasts;
  • Text is derived from speech recognition and then machine translation, thus poorer quality than previously?
  • Net outcome is that the task is harder, with more emphasis on visual and less on text?

SLIDE 5

2005: Search task participants (20, up from 16)

  • Bilkent University (Turkey)
  • Carnegie Mellon University (USA)
  • Columbia University (USA)
  • Dublin City University (Ireland)
  • Fudan University (China)
  • FX Palo Alto Laboratory (USA)
  • Helsinki University of Technology (Finland)
  • IBM (USA)
  • Imperial College London (UK)
  • Language Computer Corporation (LCC) (USA)
  • Lowlands Team (CWI, Twente, U. of Amsterdam) (Netherlands)
  • MediaMill Team (Univ. of Amsterdam and TNO) (Netherlands)
  • National University of Singapore (NUS) (Singapore)
  • Queen Mary University of London (UK)
  • SCHEMA-Univ. Bremen Team (EU)
  • Tsinghua University (China)
  • University of Central Florida / University of Modena (USA, Italy)
  • University of Iowa (USA)
  • University of North Carolina (USA)
  • University of Oulu / MediaTeam (Finland)

SLIDE 6

Search Types: Automatic, Manual and Interactive

Number of runs:
  • 42 automatic (up from 23)
  • 26 manual (down from 52)
  • 44 interactive (down from 61)

SLIDE 7

24 Topics [number of image examples, video examples, and relevant shots found]

  • 149. Find shots of Condoleezza Rice [3, 6, 116]
  • 150. Find shots of Iyad Allawi, the former prime minister of Iraq [3, 6, 13]
  • 151. Find shots of Omar Karami, the former prime minister of Lebanon [2, 5, 301]
  • 152. Find shots of Hu Jintao, president of the People’s Republic of China [2, 9, 498]
  • 153. Find shots of Tony Blair [2, 4, 42]
  • 154. Find shots of Mahmoud Abbas, also known as Abu Mazen, prime minister of the Palestinian Authority [2, 9, 93]
  • 155. Find shots of a graphic map of Iraq, location of Baghdad marked – not a weather map [4, 10, 54]
  • 156. Find shots of tennis players on the court – both players visible at the same time [2, 4, 55]
  • 157. Find shots of people shaking hands [4, 10, 470]
  • 158. Find shots of a helicopter in flight [2, 8, 63]
  • 159. Find shots of George Bush entering or leaving a vehicle (e.g., car, van, airplane, helicopter, etc.), he and the vehicle both visible at the same time [2, 7, 29]
  • 160. Find shots of something (e.g., vehicle, aircraft, building, etc.) on fire with flames and smoke visible [2, 9, 169]

SLIDE 8

24 Topics [number of image examples, video examples, and relevant shots found]

  • 161. Find shots of people with banners or signs [2, 6, 1245]
  • 162. Find shots of one or more people entering or leaving a building [5, 8, 385]
  • 163. Find shots of a meeting with a large table and more than two people [2, 5, 1160]
  • 164. Find shots of a ship or boat [3, 7, 214]
  • 165. Find shots of basketball players on the court [2, 8, 254]
  • 166. Find shots of one or more palm trees [2, 6, 253]
  • 167. Find shots of an airplane taking off [2, 5, 19]
  • 168. Find shots of a road with one or more cars [2, 5, 1087]
  • 169. Find shots of one or more tanks or other military vehicles [3, 8, 493]
  • 170. Find shots of a tall building (with more than 5 floors above the ground) [3, 6, 543]
  • 171. Find shots of a goal being made in a soccer match [1, 7, 49]
  • 172. Find shots of an office setting, i.e., one or more desks/tables and one or more computers and one or more people [3, 8, 790]

SLIDE 9

Some statistics

  • 2005:
    – Number of shots in test collection: 45,765
    – Relevant shots found: 8,395 (~18.3%)
  • 2004:
    – Number of shots in test collection: 33,367
    – Relevant shots found: 1,800 (~5.4%)
  • 2003:
    – Number of shots in test collection: 32,318
    – Relevant shots found: 2,114 (~6.5%)
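As a quick check, the percentages are simply relevant shots found as a fraction of the test collection; a minimal computation:

```python
# Relevant shots found as a fraction of the test collection, per year.
for year, found, total in [(2005, 8395, 45765), (2004, 1800, 33367), (2003, 2114, 32318)]:
    print(f"{year}: {found}/{total} = {100 * found / total:.1f}%")
# Prints 18.3%, 5.4% and 6.5%, matching the figures above.
```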

SLIDE 10

[Bar chart: number of unique, relevant shots contributed by each group – CMU, Columbia University, Fudan University, Imperial College London, Lowlands team, National University of Singapore, University of Oulu, Helsinki University of Technology, SCHEMA-Univ. Bremen, Tsinghua University, IBM, U. of Central Florida / U. of Modena, University of Iowa, MediaMill team, Dublin City University, Queen Mary University of London]

2005: 16 sites contributed one or more unique, relevant shots (8 last year)

SLIDE 11

2005: Relevant shots contributed uniquely per topic by team

[Table: number of unique true shots per topic (columns: topics 149–172, each with its total relevant in parentheses) contributed by each group (rows: CMU, Columbia Univ., Fudan Univ., Imperial College, Lowlands team, NUS, Univ. of Oulu, Helsinki U. of Technology, SCHEMA-U. Bremen, Tsinghua Univ., IBM, U. of Central Florida / U. of Modena, Univ. of Iowa, Dublin City Univ., Queen Mary Univ., MediaMill team); individual cell values not recoverable]

Topics 161, 163, 168 have 1000+ relevant shots; 170, 172 have 500+.

SLIDE 12

2005: Interactive runs - top 10 MAP (of 49)

(mean elapsed time for all == ~15 mins/topic)

  • B_2_UvA-MM_1
  • A_2_CMU.MotoX_6
  • B_2_CMU_Mon_1
  • A_2_CMU.Snowboarding_S
  • A_1_FXPAL1LCN_2
  • A_1_FXPAL0LN_1
  • A_1_FXPAL4LC_5
  • B_2_UvA-MM_4
  • B_2_UvA-MM_2
  • A_1_FXPAL2RAN_3
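For reference, the MAP figures on these result slides are the mean, over the 24 topics, of per-topic average precision. A minimal sketch of the standard AP computation for one topic's ranked shot list (the official scoring came from NIST's evaluation tooling; this is just the textbook definition):

```python
def average_precision(ranked_shots, relevant, total_relevant):
    """AP for one topic: mean of precision@k over ranks k holding a relevant shot."""
    hits, precision_sum = 0, 0.0
    for k, shot in enumerate(ranked_shots[:1000], start=1):  # runs capped at 1,000 shots
        if shot in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / total_relevant if total_relevant else 0.0

# Relevant shots at ranks 1 and 3, with 4 relevant shots overall:
ap = average_precision(["s1", "s2", "s3", "s4", "s5"], {"s1", "s3"}, 4)
# (1/1 + 2/3) / 4 ≈ 0.417
```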

SLIDE 13

2004: Interactive runs - top 10 MAP (of 62)

(mean elapsed time for all == ~15 mins/topic)

  • B_2_UvA-MM_1
  • C_2_CMU1I_1
  • A_1_FXPAL_2_5
  • A_1_FXPAL_1_4
  • A_1_FXPAL_3_6
  • A_2_IBM.Interactive_2_ARC_7
  • A_2_IBM.Interactive_1_ARC_1
  • A_1_FXPAL_1_7
  • A_1_FXPAL_2_8
  • A_1_FXPAL_3_9

DATA IS DIFFERENT, SYSTEMS ARE DIFFERENT, ONLY THE METRICS ARE THE SAME

SLIDE 14

2005: Manual runs - top 10 MAP (of 26)

(mean human effort (mins) / topic)

  • M_A_2_CMU.Manu.ExpECA.QC04CR.PU_5 (15)
  • M_A_2_CMU.Manu.ExpE.QC05U_7 (15)
  • M_A_2_PicSOM-M3_2 (0.93)
  • M_A_2_FD_MM_BC_1 (11.1)
  • M_A_2_OUMT_M7TE_7 (5.06)
  • M_A_2_OUMT_M6TS_6 (5.02)
  • M_A_2_PicSOM-M2_4 (0.87)
  • M_A_2_FD_AOH_LR_ONLINE_3 (11.1)
  • M_A_1_OUMT_M5T_5 (5.01)
  • M_A_1_dcu_manual_text_img_6 (3)

SLIDE 15

2004: Manual runs - top 10 MAP (of 52)

(mean human effort (mins) / topic)

  • A_2_NUSVID2004_1 (3.92)
  • A_2_NUSVID2004_2 (3.93)
  • A_2_NUSVID2004_3 (3.71)
  • A_2_IBM.Manual_ARC_5 (15)
  • A_1_IBM.SpeechBaseline_ARC_9 (15)
  • C_2_CMU09M_9 (15)
  • C_2_CMU07M_7 (15)
  • A_2_NUSVID2004_4 (3.76)
  • C_2_CMU05M_5 (15)
  • A_2_LL-M-dyn-sel-ASR-RR_5 (3)

DATA IS DIFFERENT, SYSTEMS ARE DIFFERENT, ONLY THE METRICS ARE THE SAME

SLIDE 16

2005: Automatic runs (pilot) - top 10 MAP (of 23)

(mean elapsed time (mins) / topic)

  • F_B_2_NUS_PRIS_1 (0.55)
  • F_A_2_TJW_VM_4 (15)
  • F_A_2_TJW_TVM_2 (15)
  • F_A_2_TJW_V_3 (15)
  • F_B_2_NUS_PRIS_2 (0.56)
  • F_A_2_TJW_TV_5 (15)
  • F_A_2_NUS_PRIS_3 (0.3)
  • F_C_2_ColumbiaA2_5 (15)
  • F_B_2_UvA-MM_6 (0.7)
  • F_A_2_PicSOM-F2_3 (0.14)

SLIDE 17

2004: Automatic runs (pilot) - top 10 MAP (of 23)

(mean elapsed time (mins) / topic)

  • C_2_DCU_AUTOLM6_6 (3.43)
  • C_2_DCU_AUTOLM7_7 (3.12)
  • C_2_CMUS5A_S (15)
  • A_2_LL-F-stat-allvidim-ASR-RR_S (3)
  • A_2_LL-F-stat-allimvid-ASR-RR_S (3)
  • A_2_LL-F-dyn-allvidim-ASR-RR_S (3)
  • A_2_LL-F-dyn-allimvid-ASR-RR_S (3)
  • C_2_DCU_AUTOLM1_1 (0.007)
  • A_1_LL-F-ASR-full_S (0?)
  • C_2_DCU_AUTOLM3_3 (0.008)

DATA IS DIFFERENT, SYSTEMS ARE DIFFERENT, ONLY THE METRICS ARE THE SAME

SLIDE 18

2005: Text-only versus Text-plus searches by group (using only common training data)

[Two bar charts of MAP for "text only" vs. "text plus" runs per group. Automatic searches: LCC, Lowlands, NUS, HUT, Tsinghua, IBM, U. Iowa. Manual searches: Fudan, Lowlands, Oulu, HUT.]

SLIDE 19

2005: Mean avg. precision by topic

[Chart: mean average precision by topic (149–172), plotting interactive, manual and automatic max and median per topic. Callouts highlight tennis players, Tony Blair, soccer match goal, and people entering/leaving a building.]

SLIDE 20

2005: Interactive runs’ median average precision by topic

[Bar chart: interactive median AP by topic, descending]
156 (0.560), 153 (0.546), 171 (0.486), 149 (0.405), 151 (0.389), 165 (0.339), 154 (0.336), 155 (0.286), 158 (0.275), 150 (0.274), 152 (0.270), 164 (0.258), 159 (0.195), 161 (0.138), 163 (0.098), 168 (0.097), 169 (0.096), 167 (0.074), 166 (0.067), 157 (0.067), 170 (0.065), 160 (0.057), 172 (0.044), 162 (0.013)

Top topics: 156 (tennis players on the court, both players visible at the same time), 153 (Tony Blair), 171 (goal being made in a soccer match), 149 (Condoleezza Rice), 151 (Omar Karami)

SLIDE 21

2005: Manual runs’ median average precision by topic

[Bar chart: manual median AP by topic, descending]
151 (0.255), 152 (0.200), 153 (0.153), 171 (0.128), 164 (0.076), 154 (0.070), 161 (0.056), 165 (0.053), 156 (0.048), 158 (0.040), 149 (0.037), 169 (0.032), 168 (0.029), 155 (0.020), 150 (0.016), 163 (0.015), 160 (0.013), 172 (0.009), 159 (0.007), 170 (0.005), 157 (0.004), 166 (0.004), 162 (0.002), 167 (0.002)

Top topics: 151 (Omar Karami, the former PM of Lebanon), 152 (Hu Jintao, President of the People’s Republic of China), 153 (Tony Blair), 171 (goal being made in a soccer match), 164 (ship or boat)

SLIDE 22

2005: Automatic runs’ median average precision by topic

[Bar chart: automatic median AP by topic, descending from 0.166 to about 0.001; topic order: 171, 151, 153, 152, 164, 154, 168, 156, 149, 158, 169, 161, 165, 163, 150, 172, 160, 166, 170, 157, 155, 162, 167, 159]

Top topics: 171 (goal being made in a soccer match), 151 (Omar Karami, the former PM of Lebanon), 153 (Tony Blair), 152 (Hu Jintao), 164 (ship or boat)

SLIDE 23

2005: Mean average precision (interactive max) vs total number relevant

[Scatter plot: mean average precision (interactive max, y-axis 0.1–1.0) vs. total number of relevant shots per topic (x-axis, roughly 50–1,250)]

SLIDE 24

Who did what?

  • Speaker slots to follow:
    – Carnegie Mellon University
    – IBM Research
    – MediaMill (University of Amsterdam and TNO)
    – National University of Singapore
    – University of Oulu/MediaTeam
  • No papers from:
    – Bilkent University
    – QMUL
    – SCHEMA - University of Bremen
  • Demos?
  • Posters?
SLIDE 25

Columbia University

  • Interactive search tool developed with:
    – text search, CBIR search, story segmentation and story-level browsing, 39 visual concepts from LSCOM-Lite, near-duplicate detection, query-class dependent weights and cue-X re-ranking;
  • Manual run with:
    – text, CBIR and visual concepts;
  • Automatic runs with:
    – query-class dependent weights of some of the above (see the sketch below);
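A minimal sketch of query-class-dependent weighting as a general technique: classify the topic, then let the class pick the weights used to fuse per-modality scores. The class names, toy classifier and weights below are illustrative assumptions, not Columbia's actual values.

```python
# Query-class-dependent fusion: weights depend on the (hypothetical) class
# assigned to the topic. All names and numbers here are illustrative.
QUERY_CLASS_WEIGHTS = {
    "named-person": (0.7, 0.1, 0.2),  # (text, visual-example, concept) weights
    "sports":       (0.2, 0.4, 0.4),
    "general":      (0.4, 0.3, 0.3),
}

def classify_topic(topic_text: str) -> str:
    """Toy rule-based classifier; a real system would be more principled."""
    words = topic_text.lower().split()
    if any(w in words for w in ("tennis", "soccer", "basketball")):
        return "sports"
    if any(w[0].isupper() for w in topic_text.split()[3:]):  # crude proper-name check
        return "named-person"
    return "general"

def fused_score(topic_text, text_s, visual_s, concept_s):
    wt, wv, wc = QUERY_CLASS_WEIGHTS[classify_topic(topic_text)]
    return wt * text_s + wv * visual_s + wc * concept_s
```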

SLIDE 26

Dublin City University

  • Interactive search used a DiamondTouch collaborative tabletop interface from MERL for text- and image-based video searching;
  • 2 versions:
    – one increases users’ awareness of each other, thus promoting collaboration;
    – one is more like “leave me alone” support for efficient solo searching;
  • Aim was to explore user-user collaborative search;
  • Findings are that group awareness benefits retrieval;
  • Also did manual and automatic runs, exploring text-only vs. text+image searching;

SLIDE 27

Fudan University

  • Submitted manual runs and explored multi-modal fusion;
  • Found that relation expression fusion works better than linear fusion, using a variety of retrieval modalities:
    – text;
    – 14 visual concepts;
    – pseudo relevance feedback;
    – logistic regression;
  • Also explored training weights online vs. training weights offline (see the fusion sketch below);
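As context for the fusion discussion, a minimal sketch of learning fusion weights offline with logistic regression, one generic way to combine modality scores; the data below is synthetic, and Fudan's relation expression fusion is a more elaborate scheme than this.

```python
# Learn offline fusion weights over per-modality scores with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Per-shot (text, visual) scores for training topics, with relevance labels
# from past assessments. Values here are made up for illustration.
X_train = np.array([[0.9, 0.2], [0.1, 0.1], [0.4, 0.8], [0.05, 0.3]])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# At search time, rank test shots by the fused relevance probability.
X_test = np.array([[0.7, 0.5], [0.2, 0.9]])
fused = model.predict_proba(X_test)[:, 1]
ranking = np.argsort(-fused)
```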

SLIDE 28

FX Palo Alto Laboratory

  • Participated in interactive search;
  • Enhanced the 2004 system for efficient browsing and enhanced visualisation by adding 29 concepts/semantic features;
  • Story-level browsing, keyframe thumbnails, text dialogue overlays, story timelines;
  • Query is text and/or image;
  • Text-only search is as good as text+others (because the browser and visualisation are very strong?);

SLIDE 29

Helsinki University of Technology

  • Automatic, manual and interactive runs;
  • Addressed text-only vs. text+multimodal querying;
  • Multimodal better than text-only!
  • Interactive search used relevance feedback only, with no explicit "search" or shot browsing, so very dynamic user control (see the feedback-loop sketch below);
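A minimal sketch of a relevance-feedback-only loop in the general spirit described here, using a Rocchio-style query update over shot feature vectors; this is a generic illustration under assumed vector features, not HUT's actual PicSOM machinery (which is based on self-organising maps).

```python
import numpy as np

def feedback_round(query_vec, features, relevant_ids, nonrelevant_ids,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward shots marked relevant, away from non-relevant ones."""
    rel = features[relevant_ids].mean(axis=0) if relevant_ids else 0.0
    non = features[nonrelevant_ids].mean(axis=0) if nonrelevant_ids else 0.0
    return alpha * query_vec + beta * rel - gamma * non

# Each round: rank all shots by similarity to query_vec, show the top few,
# collect the user's judgements, update the query, and repeat.
features = np.random.rand(100, 16)  # one (assumed) feature vector per shot
query = feedback_round(features[0], features, [3, 7], [5])
```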

SLIDE 30

Imperial College London

  • Content-based search + NNk browsing in a 2D GUI map browser;
  • Enhanced the 2004 system with a new kind of relevance feedback;
  • Text-based search, content-based search with relevance feedback, and temporal browsing integrated into a unified interface;
  • Emphasis on supporting the user’s task;

SLIDE 31

Language Computer Corporation (LCC)

  • Participated in automatic search;
  • Used ASR text search (language modelling), image features and high-level features, alone and in combination (see the language-model sketch below);
  • Image features used blobs;
  • Text search alone was best-performing;
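A minimal sketch of query-likelihood language-model scoring over a shot's ASR text, with Jelinek-Mercer smoothing; this is the textbook form of "language modelling" retrieval, not necessarily LCC's exact variant.

```python
import math
from collections import Counter

def lm_score(query_terms, doc_terms, collection_counts, collection_len, lam=0.8):
    """log P(query | shot ASR text), smoothed with collection statistics."""
    doc = Counter(doc_terms)
    dlen = max(len(doc_terms), 1)
    score = 0.0
    for t in query_terms:
        p_doc = doc[t] / dlen
        p_coll = collection_counts.get(t, 0) / collection_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll + 1e-12)
    return score

# Toy usage: score one shot's ASR text against a two-term query.
collection = ["iraq", "map", "weather", "iraq"]
s = lm_score(["iraq", "map"], ["iraq", "baghdad", "map"],
             Counter(collection), len(collection))
```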

SLIDE 32

Lowlands (CWI, Twente, U. of Amsterdam)

  • Manual and automatic search runs;
  • Visual and text searching;
  • Weibull models and Gaussian mixture models for visual features, language modelling for text (see the GMM sketch below);
  • No clear differentiation in results;
  • First steps towards developing parameterised search engines for each;
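A minimal sketch of the Gaussian-mixture side of this: fit a GMM to the visual features of a topic's example images, then rank test keyframes by likelihood under that model. Feature dimensions and data below are placeholder assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
example_features = rng.normal(size=(50, 8))     # features of topic example images
keyframe_features = rng.normal(size=(1000, 8))  # features of all test keyframes

# Model the topic's visual appearance as a mixture of Gaussians.
gmm = GaussianMixture(n_components=4, random_state=0).fit(example_features)

# Rank keyframes by log-likelihood under the topic's model.
loglik = gmm.score_samples(keyframe_features)
ranking = np.argsort(-loglik)[:1000]  # at most 1,000 shots per the task definition
```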

SLIDE 33

Tsinghua University

  • Three search modes: text, image matching based on region matching, and concept matching;
  • Concept/feature recognition approach based on their HLF submissions;
  • Explored the latent relationship (LSA) between (ASR) text and visual features and concepts (see the sketch below);
  • Tried each of these alone and in combinations, using score fusion and query-type-specific (2 types) weighting;
  • Conclusion is that combinations work best;
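A minimal sketch of how LSA can tie ASR terms and visual concepts together: build a joint term/concept-by-shot matrix, take a truncated SVD, and compare everything in the shared latent space. The matrix contents and dimensions below are illustrative assumptions, not Tsinghua's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
# Rows: vocabulary terms plus visual concepts; columns: shots.
# (e.g. 500 ASR terms + 39 concepts over 2,000 shots, all assumed sizes)
X = rng.random((500 + 39, 2000))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 50                                # truncation rank
row_embeddings = U[:, :k] * s[:k]     # latent vectors for terms and concepts
shot_embeddings = Vt[:k].T            # latent vectors for shots

# A query's terms can now be compared (e.g. by cosine similarity)
# with shots, or with visual concepts, in the same latent space.
```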

SLIDE 34

University of Central Florida

  • UCF’s first time in the search task;
  • PEGASUS system: web-based, interactive; used ASR, OCR, keyframe global histograms and high-level features;
  • Submitted ASR-only & multi-modal runs;
  • Multi-modal better than ASR-only;

SLIDE 35

University of Iowa

  • Automatic search runs;
  • Text-only vs. text+image features (see the keyframe-distance sketch below):
    – keyframe-keyframe pixel distances;
    – text + colour information;
    – text + texture information;
    – text + edge information;
  • Text-only was best - could the visual features have been combined?
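A minimal sketch of one keyframe-to-keyframe distance of the kind listed above: an L1 distance between global colour histograms. The bin count and colour space are assumptions for illustration, not Iowa's exact features.

```python
import numpy as np

def colour_histogram(image_rgb: np.ndarray, bins: int = 8) -> np.ndarray:
    """Global RGB histogram over all pixels, normalised to sum to 1."""
    hist, _ = np.histogramdd(
        image_rgb.reshape(-1, 3), bins=(bins, bins, bins), range=[(0, 256)] * 3
    )
    h = hist.ravel()
    return h / h.sum()

def keyframe_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """L1 distance between the two keyframes' colour histograms."""
    return float(np.abs(colour_histogram(img_a) - colour_histogram(img_b)).sum())

# Toy usage on two random "keyframes":
img_a = np.random.randint(0, 256, (120, 160, 3))
img_b = np.random.randint(0, 256, (120, 160, 3))
d = keyframe_distance(img_a, img_b)
```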

SLIDE 36

University of North Carolina

  • Investigate the effects of providing context and

interactivity in a retrieval system, supporting the browsing of search result sets;

n Basic Google-like video search n Enhanced with shot context browsing; n Further enhanced with interactive feedback, eg mouseover gives enlarged keyframes;

  • For both performance and user perceptions, the

Context+Interactive system was superior - higher recall, precision the same;

Interactive Manual Automatic

SLIDE 37

Observations

  • We’re still getting “Lots of variation, interesting shot

browsing interfaces, mixture of interactive & manual”, and additionally automatic runs;

  • Top performances on all 3 search types are up, even with

more difficult data, but data is different, systems are different … anybody run 2004 system on 2005 data ?

  • Some leveraged the structured nature of B/News;
  • Many did automatic search & fewer did interactive search -

because its easier (no users) ?

  • Most common issue explored was the best combination of

text vs. image search vs. concept/features;

  • Search participants are the “regulars” plus new groups, some

bigger, some smaller;