MediaEval
multimedia benchmarking initiative
Gareth Jones Dublin City University Ireland Martha Larson TU Delft The Netherlands
Overview
– What is MediaEval?
– MediaEval task selection
– MediaEval 2011
– MediaEval 2012

What is MediaEval?
– Multimedia content analysis
– Social network structures
– User-contributed tags
PetaMedia Network of Excellence: Peer-to-peer Tagged Media
– Proposed tasks included in pre-selection questionnaire.
– use scenario
– research questions
– accessible dataset – creative commons
– realistic groundtruthing process – crowdsourcing
– “champions” willing to be coordinators
to localise images and video, allowing them to be anchored to real-world locations. Currently most online images and videos are not labeled with their location.
coordinates to Flickr videos using one or more of: Flickr metadata, visual content, audio content, social information.
predominantly English.
Participants may use audio and visual data, and any available image metadata. Allowed to submit up to 5 runs, including at most one incorporating a gazetteer and/or one run using additional crawled material from
Evaluated based on the number of videos placed within 1 km, 10 km, 100 km, 1,000 km and 10,000 km of the groundtruth.
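The distance-based evaluation above can be sketched as follows; this is a minimal illustration (function names are my own), assuming the standard Haversine great-circle distance is used to compare predicted and groundtruth coordinates:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_at_thresholds(predicted, groundtruth,
                           thresholds=(1, 10, 100, 1000, 10000)):
    """Count predictions falling within each distance threshold (km)
    of the corresponding groundtruth location."""
    counts = {t: 0 for t in thresholds}
    for (plat, plon), (glat, glon) in zip(predicted, groundtruth):
        d = haversine_km(plat, plon, glat, glon)
        for t in thresholds:
            if d <= t:
                counts[t] += 1
    return counts
```

A run that places a video exactly on its groundtruth location scores within every threshold; a placement a few hundred km off counts only toward the 1,000 km and 10,000 km bands.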
Vanessa Murdock, Yahoo! Research Adam Rae, Yahoo! Research Pascal Kelm, TU Berlin Pavel Serdyukov, Yandex
available to communities in the developing world via audio input and output to mobile devices. The small amount of training data available for these languages poses challenges for speech recognition.
an audio content query for poorly resourced languages. This task is particularly intended for speech researchers in the area of spoken term detection.
Hindi, Gujarati and Telugu. Each of the ca. 400 data items is an 8 kHz audio file, 4 to 30 secs in length.
Nitendra Rajput, IBM Research India Florian Metze, CMU
the management of movie databases. The company seeks to help users choose appropriate content. A particular use case involves helping families to choose movies suitable for their children.
portions of movies containing violent material. Violence is defined as “physical violence or accident resulting in human pain or injury”.
purchased by the participants): 12 used for training, 3 for testing.
Each violent event labeled with start and end frame.
content and subtitles. They may not use any data external to the collection.
combining false alarms and misses
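A metric combining false alarms and misses is typically a weighted sum of the miss rate and the false-alarm rate; a minimal sketch follows, with the note that the weights and exact formulation here are illustrative assumptions, not the official task definition:

```python
def detection_cost(n_misses, n_false_alarms, n_targets, n_nontargets,
                   c_miss=1.0, c_fa=1.0):
    """Weighted combination of miss rate and false-alarm rate.
    c_miss and c_fa are hypothetical weights; the official task
    weighting may differ."""
    p_miss = n_misses / n_targets if n_targets else 0.0
    p_fa = n_false_alarms / n_nontargets if n_nontargets else 0.0
    return c_miss * p_miss + c_fa * p_fa
```

Lower is better: a system that misses 10% of violent segments and falsely flags 10% of non-violent ones scores 0.2 under equal weights.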
Mohammad Soleymani, Univ. Geneva Claire-Helene Demarty, Technicolor Guillaume Gravier, IRISA
browsing video, especially true of semi-professional user generated content (SPUG).
derived from speech, audio, visual content or associated text
video data harvested from blip.tv Creative Commons internet video.
speech recognition transcripts provided by LIMSI and Vocapia Research; automatically extracted shot boundaries and keyframes.
and one including metadata. Evaluated using MAP.
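Mean average precision (MAP) averages, over all queries, the precision measured at each rank where a relevant item appears. A minimal sketch (names are illustrative):

```python
def average_precision(ranked_ids, relevant):
    """AP for one query: mean of precision values at each rank
    where a relevant item is retrieved, divided by the total
    number of relevant items."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranking [a, b, c] with relevant items {a, c}, AP = (1/1 + 2/3) / 2 ≈ 0.833.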
Martha Larson, TU-Delft Sebastian Schmiedeke, TU-Berlin Christoph Kofler, TU-Delft Isabelle Ferrané, Université Paul Sabatier
content contains much potentially interesting material. However, this is only useful if relevant information can be found.
participants are required to automatically identify relevant jump-in points into the video based on the combination of modalities
– what speakers are accomplishing by speaking. Five acts chosen: ‘apology’, ‘definition’, ‘opinion’, ‘promise’ and ‘warning’.
Workers located a section of video containing one of the speech acts that they would wish to share, and then formed a long-form description and a short-form query that they would use to re-find a jump-in point to begin playback.
Roeland Ordelman, University of Twente and B&G Maria Eskevich, Dublin City University
made available in different forms. People generally think in terms of events, rather than scattered separate items.
to either a specific social event or an event class of interest. The events are planned by people, attended by people, and the social media covering them are captured by people.
Amsterdam, Barcelona, London, Paris and Rome.
Rome (Italy) in the test collection. For each event provide all photos associated with it. – Must be soccer matches, not someone with a ball or a picture of a football stadium.
named Paradiso (in Amsterdam, NL) and in the Parc del Forum (in Barcelona, Spain). For each event provide all photos associated with it.
in other runs.
Raphael Troncy, Eurecom Vasileios Mezaris, ITI CERTH
– Unofficial satellite of ACM Multimedia 2010
– Potentially a workshop at ECCV 2012
– Published by CEUR.WS: http://ceur-ws.org/Vol-807/
PetaMedia
MediaEval 2010 results were presented at ACM ICMR 2011 in a special session entitled “Automatic Tagging and Geo-Tagging in Video Collections and Communities”.
– Please propose a task!
– Please complete the questionnaire!
http://www.multimediaeval.org