SLIDE 1

MediaEval

multimedia benchmarking initiative

Gareth Jones, Dublin City University, Ireland
Martha Larson, TU Delft, The Netherlands

SLIDE 2
Overview

  • What is MediaEval?
  • MediaEval task selection
  • MediaEval 2011
  • MediaEval 2012

SLIDE 3

What is MediaEval?

  • … a multimedia benchmarking initiative.
  • … evaluates new algorithms for multimedia access and retrieval.
  • … emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
  • … innovates new tasks and techniques focusing on the human and social aspects of multimedia content.

http://www.multimediaeval.org

SLIDE 4

What is MediaEval?

  • MediaEval originates in, and follows on from, the VideoCLEF track at CLEF 2008 and CLEF 2009.
  • Established as an independent benchmarking initiative in 2010.
  • Supported in 2010 and 2011 by the PetaMedia EC Network of Excellence, and from 2012 by the Cubrik project.

SLIDE 5

What is MediaEval?

  • Follows the standard annual retrieval benchmarking cycle.
  • Looks to real-world use scenarios for tasks.
  • Where possible, selects tasks with industrial relevance.
  • Also draws inspiration from the "PetaMedia Triple Synergy":
    – Multimedia content analysis
    – Social network structures
    – User-contributed tags

PetaMedia Network of Excellence: Peer-to-peer Tagged Media

SLIDE 6

MediaEval task selection

  • Community-based task selection process.
    – Proposed tasks are included in a pre-selection questionnaire.
  • Tasks must have:
    – a use scenario
    – research questions
    – an accessible dataset (Creative Commons)
    – a realistic groundtruthing process (e.g. crowdsourcing)
    – "champions" willing to act as coordinators
  • Selected tasks must have a minimum of 5 registered participants to run.

SLIDE 7

Participating Teams

SLIDE 8

MediaEval Tasks 2011

  • Placing task (6)
  • Spoken Web Search task (5)
  • Affect task (6)
  • Genre tagging task (10)
  • Rich Speech Retrieval task (5)
  • Social event detection task (7)
SLIDE 9

Placing Task

  • Motivation: Knowing their geographical location helps users to localise images and video, allowing them to be anchored to real-world locations. Currently most online images and videos are not labeled with their location.
  • Task: Automatically assign geo-coordinates to Flickr videos using one or more of: Flickr metadata, visual content, audio content, social information.
  • Data: Creative Commons Flickr data, predominantly English.

SLIDE 10

Placing Task

  • Data: Creative Commons Flickr video data, predominantly English. Participants may use audio and visual data, and any available image metadata. Participants may submit up to 5 runs, including at most one run incorporating a gazetteer and/or one run using additional crawled material from outside the collection.
  • Evaluation: Geo-coordinates from Flickr are used as groundtruth. Runs are evaluated on the number of videos placed within 1 km, 10 km, 100 km, 1,000 km and 10,000 km of the groundtruth location (a scoring sketch follows the organizer list).

  • Organizers:
    – Vanessa Murdock, Yahoo! Research
    – Adam Rae, Yahoo! Research
    – Pascal Kelm, TU Berlin
    – Pavel Serdyukov, Yandex
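The slides do not include the official scoring script, but the distance-threshold evaluation can be illustrated with a short sketch. This is a minimal version, assuming predictions and groundtruth are (latitude, longitude) pairs; the function names are illustrative, not the task's official tooling.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def accuracy_at_radii(predicted, groundtruth, radii_km=(1, 10, 100, 1000, 10000)):
    """Fraction of videos placed within each radius of the groundtruth location."""
    dists = [haversine_km(p[0], p[1], g[0], g[1])
             for p, g in zip(predicted, groundtruth)]
    return {r: sum(d <= r for d in dists) / len(dists) for r in radii_km}
```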

SLIDE 11

Spoken Web Search

  • Motivation: The Spoken Web aspires to make information available to communities in the developing world via audio input and output on mobile devices. The small amount of training data available for these languages poses challenges for speech recognition.
  • Task: Search FOR audio content WITHIN audio content USING an audio content query, for poorly resourced languages. This task is particularly intended for speech researchers in the area of spoken term detection (an illustrative baseline sketch follows the organizer list).
  • Data: Audio from four Indian languages: English, Hindi, Gujarati and Telugu. Each of the ca. 400 data items is an 8 kHz audio file 4–30 seconds in length.

  • Organizers:
    – Nitendra Rajput, IBM Research India
    – Florian Metze, CMU
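The slides do not prescribe an approach, but a common baseline for this kind of query-by-example spoken term detection is dynamic time warping (DTW) over acoustic features. A minimal sketch, assuming the query and each collection item are already represented as per-frame feature matrices (e.g. MFCCs); all names here are illustrative.

```python
import numpy as np

def dtw_cost(query, utterance):
    """Length-normalised DTW alignment cost between two feature
    sequences of shape (frames, dims); lower cost = closer match."""
    q, u = len(query), len(utterance)
    acc = np.full((q + 1, u + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, q + 1):
        for j in range(1, u + 1):
            d = np.linalg.norm(query[i - 1] - utterance[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[q, u] / (q + u)

def rank_by_query(query, collection):
    """Rank collection items (feature matrices) by DTW cost to the spoken query."""
    scores = [(dtw_cost(query, feats), idx) for idx, feats in enumerate(collection)]
    return sorted(scores)
```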

SLIDE 12

Affect Task: Violent Scene Detection

  • Motivation: Technicolor is developing services which support the management of movie databases. The company seeks to help users choose appropriate content. A particular use case involves helping families to choose movies suitable for their children.
  • Task: Deploy multimodal features to automatically detect portions of movies containing violent material. Violence is defined as "physical violence or accident resulting in human pain or injury".
  • Data: A set of ca. 15 Hollywood movies (which must be purchased by the participants): 12 used for training, 3 for testing.

SLIDE 13

Affect Task: Violent Scene Detection

  • Groundtruth: 7 human assessors located each segment containing one violent action or several overlapping actions. Each violent event is labeled with its start and end frame.
  • Features: Participants may use any features extracted from the content and subtitles. They may not use any data external to the collection.
  • Evaluation: A detection cost function combining false alarms and misses (a sketch follows the organizer list).

  • Organizers:
    – Mohammad Soleymani, Univ. Geneva
    – Claire-Helene Demarty, Technicolor
    – Guillaume Gravier, IRISA
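The slides name a detection cost function combining false alarms and misses but do not give its parameters. The sketch below shows the generic form of such a measure; the weights are placeholder assumptions, not the task's official values.

```python
def detection_cost(false_alarms, misses, n_negatives, n_positives,
                   cost_fa=1.0, cost_miss=10.0):
    """Weighted sum of the false-alarm rate and the miss rate.
    cost_fa and cost_miss are illustrative placeholders only."""
    p_fa = false_alarms / n_negatives    # share of non-violent units flagged
    p_miss = misses / n_positives        # share of violent units missed
    return cost_fa * p_fa + cost_miss * p_miss
```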


SLIDE 14

Genre Tagging

  • Motivation: Genre tags can provide valuable information for browsing video; this is especially true of semi-professional user generated content (SPUG).
  • Task: Automatically assign genre tags to video using features derived from speech, audio, visual content, associated text or social information.
  • Data: Around 350 hours of video data harvested from blip.tv Creative Commons internet video.
  • The data is accompanied by automatic speech recognition transcripts provided by LIMSI and Vocapia Research, and by automatically extracted shot boundaries and keyframes.

SLIDE 15

Genre Tagging

  • Tag Assignment: Participants are required to predict one of 26 tags. Groundtruth tags were collected directly using the blip.tv API.
  • Runs: 5 runs submitted, one based only on ASR transcripts and one including metadata. Evaluated using Mean Average Precision (MAP); a sketch follows the organizer list.

  • Organizers:
    – Martha Larson, TU Delft
    – Sebastian Schmiedeke, TU Berlin
    – Christoph Kofler, TU Delft
    – Isabelle Ferrané, Université Paul Sabatier
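For reference, MAP can be computed as below: a minimal sketch assuming each query (here, each genre tag) yields a ranked list of binary relevance flags; the function names are illustrative.

```python
def average_precision(ranked_rels):
    """AP for one ranked list of 0/1 relevance flags."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: mean of per-query average precision."""
    return sum(average_precision(r) for r in runs) / len(runs)

# e.g. mean_average_precision([[1, 0, 1], [0, 1]]) -> mean of 0.833 and 0.5
```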

SLIDE 16

Rich Speech Retrieval

  • Motivation: Emerging semi-professional user generated content contains much potentially interesting material. However, it is only useful if relevant information can be found.
  • Task: Given a set of queries and a video collection, participants are required to automatically identify relevant jump-in points into the video based on a combination of modalities.
  • Data: Same blip.tv collection used for the Genre Tagging task.
  • Features: Can be derived from speech, audio, visual content or metadata.
SLIDE 17

Rich Speech Retrieval

  • Speech Act: Treats speech content in terms of 'illocutionary acts' – what speakers are accomplishing by speaking. Five acts were chosen: 'apology', 'definition', 'opinion', 'promise' and 'warning'.
  • Search Topics: Formed by crowdsourcing with Amazon Mechanical Turk. Workers located a section of video containing one of the speech acts that they would wish to share, and then formed a long-form description and a short-form query that they would use to re-find a jump-in point to begin playback.
  • Evaluation: Evaluated as a known-item search task using mean Generalized Average Precision (mGAP); a sketch follows the organizer list.
  • Organizers:
    – Roeland Ordelman, University of Twente and B&G
    – Maria Eskevich, Dublin City University
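The slides do not define mGAP; roughly, Generalized Average Precision credits a result according to how close its jump-in point lands to the groundtruth point, rather than requiring an exact match. A rough sketch under that assumption only; the linear distance discount and penalty value below are illustrative, not the official task definition.

```python
def gap(ranked_jump_ins, true_jump_in, penalty_per_sec=0.01):
    """Score for one known-item query: the first result earning a
    non-zero distance-discounted reward contributes reward / rank.
    The linear discount and penalty value are assumptions."""
    for rank, t in enumerate(ranked_jump_ins, start=1):
        reward = max(0.0, 1.0 - penalty_per_sec * abs(t - true_jump_in))
        if reward > 0.0:
            return reward / rank
    return 0.0

def mgap(all_results, all_truths):
    """mGAP: mean of per-query GAP scores."""
    return sum(gap(r, t) for r, t in zip(all_results, all_truths)) / len(all_truths)
```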


SLIDE 18

Social Event Detection Task

  • Motivation: Much content related to social events is being made available in different forms. People generally think in terms of events, rather than scattered separate items.
  • Task: Discover events and detect media items that are related to either a specific social event or an event-class of interest. The events are planned by people, attended by people, and the social media related to them are captured by people.
  • Data: A set of 73,645 photos collected using the Flickr API: all geo-tagged images that were available for 5 cities: Amsterdam, Barcelona, London, Paris and Rome.

SLIDE 19

Social Event Detection Task

  • Challenge 1: Find all soccer events taking place in Barcelona (Spain) and Rome (Italy) in the test collection. For each event, provide all photos associated with it.
    – These must be soccer matches, not e.g. someone with a ball or a picture of a football stadium.
  • Challenge 2: Find all events that took place in May 2009 in the venue named Paradiso (in Amsterdam, NL) and in the Parc del Forum (in Barcelona, Spain). For each event, provide all photos associated with it.
  • A baseline run using only metadata is required; visual features can be used in other runs.
  • Evaluation: F-score and Normalised Mutual Information (a sketch follows the organizer list).
  • Organizers:
    – Raphael Troncy, Eurecom
    – Vasileios Mezaris, ITI CERTH
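Both measures are standard and available off the shelf. A minimal sketch using scikit-learn; the toy labels are illustrative values only, not task data.

```python
from sklearn.metrics import f1_score, normalized_mutual_info_score

# Toy example: event-cluster labels per photo (illustrative values only).
groundtruth_events = [0, 0, 1, 2, 2, 2]
predicted_events   = [0, 0, 1, 1, 2, 2]
# NMI compares the predicted event clustering against the groundtruth one.
nmi = normalized_mutual_info_score(groundtruth_events, predicted_events)

# F-score on per-photo decisions: is the photo part of a target event?
groundtruth_relevant = [1, 1, 0, 1, 1, 1]
predicted_relevant   = [1, 1, 0, 0, 1, 1]
f1 = f1_score(groundtruth_relevant, predicted_relevant)

print(f"NMI = {nmi:.3f}, F1 = {f1:.3f}")
```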

SLIDE 20

MediaEval 2011 Workshop


  • Held at Santa Croce in Fossabanda – a medieval convent in Pisa, Italy – 1st–2nd September 2011.
  • Official workshop of InterSpeech 2011.
    – Unofficial satellite of ACM Multimedia 2010
    – Potentially a workshop at ECCV 2012
  • Nearly 60 registered participants – up from around 25 in 2010.
  • 39 2-page working notes papers (13 in 2010).
    – Published by CEUR-WS: http://ceur-ws.org/Vol-807/

SLIDE 21

MediaEval 2011 Workshop


SLIDE 22

MediaEval 2011 Workshop


SLIDE 23

MediaEval Project Support

  • Genre Tagging Task: PetaMedia
  • Rich Speech Retrieval Task: AXES and IISSCOS, with support from PetaMedia
  • Affect Task (Violent Scenes Detection): PetaMedia and Quaero
  • Social Event Detection Task: PetaMedia, Glocal, WeKnowIt, Chorus+
  • Placing Task: Glocal, with support from PetaMedia


SLIDE 24

Special Session

MediaEval 2010 results were presented at ACM ICMR 2011 in a special session entitled "Automatic Tagging and Geo-Tagging in Video Collections and Communities".


SLIDE 25

MediaEval 2012


  • Informal presentation of task proposals at the MediaEval 2011 workshop.
  • Call for formal task proposals in late 2011.
    – Please propose a task!
  • MediaEval 2012 online task selection questionnaire in early 2012.
    – Please complete the questionnaire!
  • Tasks announced and Call for Participation in spring 2012.

  • Task participation summer 2012
  • MediaEval 2012 workshop - September/October
SLIDE 26

MediaEval


Thank You! Questions?

http://www.multimediaeval.org