SLIDE 1

Cross Language Image Retrieval ImageCLEF 2010

Henning Müller 1, Theodora Tsikrika 2, Steven Bedrick 3, Barbara Caputo 4, Henrik Christensen 5, Marco Fornoni 4, Mark Huiskes 6, Jayashree Kalpathy-Cramer 3, Jana Kludas 7, Stefanie Nowak 8, Adrian Popescu 9, Andrzej Pronobis 10

1 University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland 2 Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

3 Oregon Health and Science University (OHSU), Portland, OR, USA

4 Idiap Research Institute, Martigny, Switzerland 5 Georgia Institute of Technology, Atlanta, USA 6 Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands

7 University of Geneva, Switzerland

8 Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany

9 Institut Telecom/Telecom Bretagne, Brest, France

10 Centre for Autonomous Systems, The Royal Institute of Technology, Stockholm, Sweden

SLIDE 2

ImageCLEF History

SLIDE 3

ImageCLEF 2010

  • General overview
    • news, participation, management
  • Tasks
    • Medical Image Retrieval
    • Wikipedia Retrieval
    • Photo Annotation
    • Robot Vision
  • Conclusions
SLIDE 4

News - ImageCLEF Book!

ImageCLEF: Experimental Evaluation in Visual Information Retrieval

The Information Retrieval Series, Vol. 32

Müller, H.; Clough, P.; Deselaers, Th.; Caputo, B. (Eds.)

1st edition, 2010, 495 pages

Contents

  • Basic concepts (6 chapters)
    • history, datasets, topic development, relevance assessments, evaluation, fusion approaches
  • Task reports (7 chapters)
  • Participants' reports (11 chapters)
  • External perspectives on ImageCLEF (3 chapters)

SLIDE 5

News - ImageCLEF @ ICPR!

  • ImageCLEF contest @ ICPR 2010
    • ICPR: major event in pattern recognition (Aug 2010)
    • ImageCLEF contest: Oct 2009 - April 2010
    • ImageCLEF 2009 test collections
  • 4 tasks
    • photo annotation
    • robot vision
    • information fusion for medical image retrieval
    • interactive photo retrieval (showcase event)
  • 76 registrations, 30 submitted results, 14 presented
    • half had not previously participated in ImageCLEF
  • Largest contest at ICPR!
SLIDE 6

News - ImageCLEF 2010

  • Medical Image Retrieval
    • new subtask: modality detection
    • larger image collection, more case-based topics
  • Wikipedia Retrieval
    • new, larger image collection
    • multilingual annotations and topics
    • Wikipedia articles containing the images provided
  • Photo Annotation
    • new concepts added
    • crowdsourcing for image annotation
    • multi-modal approaches
  • Robot Vision
    • new image collection
    • unknown places as a category
SLIDE 7

Participation

  • Total:
    • 2010: 112 groups registered, 47 submitted results
    • 2009: 84 groups registered, 40 submitted results
  • Tasks:
    • Medical Image Retrieval: 16 groups
    • Wikipedia Retrieval: 13 groups
    • Photo Annotation: 17 groups
    • Robot Vision: 7 groups
SLIDE 8

ImageCLEF Management

  • Online management system for participants
    • registration, collection access, result submission
  • ImageCLEF web site
    • unique access point to all information on tasks & events
    • access to test collections from previous years
    • use of a content-management system so that all 15 organisers can edit directly
    • much appreciated!!
    • 2,000-3,000 unique visits per month
    • >10,000 page views
    • very international access

http://www.imageclef.org/

SLIDE 9

Medical Image Retrieval Task

SLIDE 10

Tasks proposed

  • Modality detection task (a minimal classification sketch follows this list)
    • purely visual task, training set with modalities given
    • one of seven modalities had to be assigned to all images
  • Image-based retrieval task
    • clear information need for a single image
    • topics based on a survey, 3 languages, example images
  • Case-based retrieval task
    • full case description from a teaching file as example, without the diagnosis, including several image examples
    • unit of retrieval is a complete case or article, closer to clinical routine
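
As a rough illustration of what the modality detection subtask involves, a minimal sketch using a grey-level histogram feature and a k-NN classifier; both the feature and the classifier are illustrative assumptions, not any participant's method.

    # Sketch: assign one of the seven modalities to each image from a
    # simple grey-level histogram feature. Feature and classifier are
    # illustrative assumptions only.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def histogram_feature(image, bins=32):
        """Normalised grey-level histogram of a 2-D uint8 image array."""
        hist, _ = np.histogram(image, bins=bins, range=(0, 255))
        return hist / max(hist.sum(), 1)

    def train_modality_classifier(train_images, train_labels, k=5):
        features = np.array([histogram_feature(img) for img in train_images])
        return KNeighborsClassifier(n_neighbors=k).fit(features, train_labels)

    # Usage: clf = train_modality_classifier(images, labels)
    #        prediction = clf.predict([histogram_feature(test_image)])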

SLIDE 11

Setup

  • Database with journal articles and 77,506 images
    • very good annotations
    • all in English
  • Image-based topics generated from a survey among clinicians using a retrieval system
    • OHSU, Portland, OR
    • selection based on available images
  • Case-based topics used a teaching file as source
  • Relevance judgements performed by clinicians in Portland, OR, USA
    • double judgements to control behaviour and compare ambiguity
    • several sets of qrels, but the ranking remains stable
SLIDE 12

Participation

  • 51 registrations, 16 groups submitting results
  • AUEB (Greece)
  • Bioingenium (Colombia)*
  • Computer Aided Medical Procedures (Germany)*
  • Gigabioinformatics (Belgium)*
  • IRIT (France)
  • ISSR (Egypt)
  • ITI, NIH (USA)
  • MedGIFT (Switzerland)
  • OHSU (USA)
  • RitsMIP (Japan)*
  • Sierre, HES-SO (Switzerland)
  • SINAI (Spain)
  • UAIC (Romania)*
  • UESTC (China)*
  • UIUC-IBM (USA)*
  • Xerox (France)*
  • *=new groups
  • Fusion task at ICPR with another five participants
SLIDE 13

Example of a case-based topic

Immunocompromised female patient who received an allogeneic bone marrow transplantation for acute myeloid leukemia. The chest X-ray shows a left retroclavicular opacity. On CT images, a ground glass infiltrate surrounds the round opacity. CT1 shows a substantial nodular alveolar infiltrate with a peripheral anterior air crescent. CT2, taken after 6 months of antifungal treatment, shows a residual pulmonary cavity with thickened walls.

SLIDE 14

Results

  • Modality detection task (purely visual) was very popular
    • performance of over 90% is very high
    • CT and MRI are often confused (a hard distinction)
  • Text-based retrieval is much better than visual retrieval for both image-based and case-based topics
    • the difference is smaller for the case-based topics
    • more research on visual techniques has to be fostered
  • Early precision can be improved using visual techniques
  • Fusion of visual and textual retrieval remains hard but can improve performance (see the sketch after this list)
    • fusion works really well when different systems are used
  • Interaction and relevance feedback are rarely used
    • the ICPR session on interactive retrieval was even cancelled
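
To make the fusion point concrete, a minimal late-fusion sketch in the CombSUM style with min-max score normalisation; the weights and function names are illustrative assumptions, not any participant's implementation.

    # Late fusion of a textual and a visual run (CombSUM with min-max
    # normalisation). Each run maps document ids to retrieval scores;
    # the 0.7/0.3 weighting is an illustrative assumption.
    def minmax_normalise(run):
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {doc: (score - lo) / span for doc, score in run.items()}

    def combsum(run_text, run_visual, w_text=0.7, w_visual=0.3):
        text, visual = minmax_normalise(run_text), minmax_normalise(run_visual)
        docs = set(text) | set(visual)
        fused = {d: w_text * text.get(d, 0.0) + w_visual * visual.get(d, 0.0)
                 for d in docs}
        return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)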
SLIDE 15

Wikipedia Retrieval Task

SLIDE 16

Wikipedia Retrieval Task

  • History:
    • 2008-2009: wikipediaMM task @ ImageCLEF
    • 2006-2007: MM track @ INEX
  • Description:
    • ad-hoc image retrieval
    • collection of Wikipedia images
      • large-scale
      • heterogeneous
      • user-generated multilingual annotations
    • diverse multimedia information needs
  • Aim:
    • investigate mono-media and multi-modal retrieval approaches
    • focus on fusion/combination of evidence from different modalities
    • attract researchers from both text and visual retrieval communities
    • support participation through provision of appropriate resources
SLIDE 17

Wikipedia Retrieval Collection

  • Image collection
    • 237,434 Wikipedia images
    • wide variety, global scope
  • Annotations
    • user-generated
    • highly heterogeneous, varying length, noisy
    • semi-structured
    • multilingual (English, German, French)
      • 10% of images with annotations in 3 languages
      • 24% of images with annotations in 2 languages
      • 62% of images with annotations in 1 language
      • 4% of images with annotations in an unidentified language or no annotations
  • Wikipedia articles containing the images in the collection
  • Low-level features provided by CEA-LIST
    • cime: border/interior classification algorithm
    • telp: texture + colour
    • bag of visual words (see the sketch after this list)
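
As a reminder of how bag-of-visual-words features are generally built (an illustrative sketch, not CEA-LIST's actual pipeline): cluster local descriptors into a visual vocabulary, then describe each image by a histogram of its nearest visual words.

    # Illustrative bag-of-visual-words sketch (not CEA-LIST's pipeline).
    # descriptors_per_image: list of (n_i x d) arrays of local descriptors
    # (e.g. SIFT), assumed to be extracted elsewhere.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(descriptors_per_image, n_words=1000):
        all_descriptors = np.vstack(descriptors_per_image)
        return KMeans(n_clusters=n_words, n_init=4).fit(all_descriptors)

    def bovw_histogram(vocabulary, descriptors):
        words = vocabulary.predict(descriptors)
        hist = np.bincount(words, minlength=vocabulary.n_clusters)
        return hist / max(hist.sum(), 1)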
SLIDE 18-21

Wikipedia Retrieval Collection (slides 18-21: example images from the collection and their annotations; no further text)

SLIDE 22

Wikipedia Retrieval Topics

  • range from easy (e.g. 'postage stamps') to difficult, highly semantic topics (e.g. 'paintings related to cubism')
  • challenging for current state-of-the-art retrieval algorithms

Number of topics: 70
Average # of images per topic: 1.68
Average # of terms per topic: 2.7
Average # of relevant images per topic: 252.3

<topic>
  <number>68</number>
  <title xml:lang="en">historic castle</title>
  <title xml:lang="de">historisches schloss</title>
  <title xml:lang="fr">château fort historique</title>
  <image>3691767116_caa1648fee.jpg</image>
  <image>4155315506_545e3dc590.jpg</image>
  <narrative>We like to find pictures of historic castles. The castle should be of the robust, well-fortified kind. Palaces and chateaus are not relevant.</narrative>
</topic>
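
For reference, a minimal sketch that reads such a topic file with Python's standard ElementTree; the file name is hypothetical.

    # Parse one Wikipedia Retrieval topic (file name is hypothetical).
    import xml.etree.ElementTree as ET

    topic = ET.parse("topic_68.xml").getroot()
    number = topic.findtext("number").strip()
    # One title per language, keyed by its xml:lang attribute.
    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
    titles = {t.get(XML_LANG): t.text.strip() for t in topic.findall("title")}
    images = [img.text.strip() for img in topic.findall("image")]
    narrative = topic.findtext("narrative").strip()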

SLIDE 23

Wikipedia Retrieval Participation

  • 43 groups registered
  • 13 groups submitted a total of 127 runs

  • 48 textual, 7 visual, 72 mixed
  • 41 monolingual, 79 multilingual
  • 23 relevance feedback, 18 query expansion, 1 QE + RF

SLIDE 24

Wikipedia Retrieval Results

Conclusions:

  • best performing run: a multi-modal, multi-lingual approach
  • 8 groups submitted both mono-media and multi-modal runs
    • for 4 groups, multi-modal runs outperform mono-media runs
    • combination of modalities remains a challenge
  • many (successful) query/document expansion submissions
  • topics with named entities are easier and benefit from textual approaches
  • topics with semantic interpretation and visual variation are more difficult
SLIDE 25

Photo Annotation Task

SLIDE 26

Task Description

  • Automated annotation of 93 visual concepts in photos
  • Flickr photos selected by interestingness (MIR Flickr set):
    • training set: 8,000 photos + Flickr user tags + EXIF data + ground truth
    • test set: 10,000 photos + Flickr user tags + EXIF data
  • 3 configurations:
    • textual information (EXIF tags, Flickr user tags)
    • visual information (photos)
    • multi-modal information (all)
  • Evaluation (a sketch of the example-based measure follows this list):
    • Average Precision (AP)
    • Example-based F-measure (F-ex)
    • Ontology Score with Flickr Context Similarity (OS-FCS)
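
To make the example-based measure concrete: F-ex averages a per-photo F1 between the predicted and ground-truth concept sets. A minimal sketch under that standard definition; the organisers' exact variant may handle edge cases differently.

    # Example-based F-measure: mean per-image F1 between predicted and
    # ground-truth concept sets (standard definition; the official
    # variant may differ in edge cases).
    def f_example(predicted_sets, truth_sets):
        scores = []
        for pred, truth in zip(predicted_sets, truth_sets):
            if not pred and not truth:
                scores.append(1.0)  # nothing to predict, nothing predicted
                continue
            tp = len(pred & truth)
            p = tp / len(pred) if pred else 0.0
            r = tp / len(truth) if truth else 0.0
            scores.append(2 * p * r / (p + r) if p + r else 0.0)
        return sum(scores) / len(scores)

    # e.g. f_example([{"Sky", "Trees"}], [{"Sky", "River"}]) -> 0.5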
SLIDE 27

Amazon Mechanical Turk

  • Online marketplace that distributes mini-jobs to a crowd of workers (turkers)
  • A requester asks workers to complete a Human Intelligence Task (HIT)
  • Guidelines for turkers
  • Annotation of 41 new concepts on MTurk
  • Design of 4 HIT templates
  • Majority vote over 3 opinions (see the sketch after this list)
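
Aggregating the three opinions reduces to a per-(image, concept) majority vote; with three boolean votes there is always a strict majority. A minimal sketch:

    # Majority vote over the three turker judgements for one
    # (image, concept) pair.
    from collections import Counter

    def majority_vote(votes):
        return Counter(votes).most_common(1)[0][0]

    # e.g. majority_vote([True, False, True]) -> True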
SLIDE 28

Example Image & Metadata

User Tags:

  • berlin
  • germany
  • Travel

EXIF Tags:

  • Date and Time (Original):

2008:03:14 10:27:50

  • Focal Length: 6.3 mm
  • Color Space: sRGB
  • Pixel X-Dimension: 800
  • Pixel Y-Dimension: 450

… GT (ground truth): Family Friends, Building, Autumn, Outdoor, Plants, Trees, Sky, Clouds, Water, River, Day, Neutral Illumination, Partly Blurred, Small Group, Travel, Visual Arts, natural, cute, female, male, Adult

SLIDE 29

Participation

  • 54 groups registered
  • 41 groups signed the licence agreement
  • 17 groups participated
    • 6 new teams
  • Submission of 63 runs:
    • 45 visual runs
    • 2 textual runs
    • 16 multi-modal runs
SLIDE 30

Results: Visual

TEAM       RANK   MAP    RANK   F-ex   RANK   OS-FCS
ISIS          1   0.407     1   0.680     8   0.601
XRCE          6   0.390     6   0.639     1   0.645
LEAR          9   0.364    15   0.582    28   0.387
HHI          11   0.350     7   0.634     2   0.640
IJS          15   0.334    14   0.596    10   0.595
BPACAD       20   0.283    30   0.428    27   0.439
Romania      21   0.259    22   0.531    15   0.562
INSUNHIT     23   0.237    38   0.209    31   0.372
LSIS         24   0.234    23   0.530    19   0.536
LIG          30   0.225    27   0.477    20   0.530
MEIJI        31   0.222    18   0.559    34   0.363
WROCLAW      34   0.189    26   0.482    30   0.379
MLKD         40   0.177    37   0.224    37   0.359
UPMC         42   0.148    43   0.174    40   0.348
CEALIST      43   0.147    29   0.451    26   0.458

  • ISIS and XRCE have close results in the concept-based evaluation: 1.7% difference
  • bigger gap in the example-based evaluation: 4.1% and 4.4%

SLIDE 31

Results: Textual & Multi-modal

Multi-modal:

TEAM    RANK   MAP    RANK   F-ex   RANK   OS-FCS
XRCE       1   0.455     1   0.655     1   0.657
LEAR       3   0.437     3   0.602     5   0.411
MEIJI      6   0.326     6   0.573     3   0.428
CNRS       9   0.296     9   0.351     4   0.421
MLKD      14   0.235    13   0.257    12   0.379
UPMC      15   0.182    15   0.186    15   0.351

Textual:

TEAM    RANK   MAP    RANK   F-ex   RANK   OS-FCS
MLKD       1   0.234     1   0.260     1   0.368
DCU        2   0.228     2   0.178     2   0.304

  • Textual: close results in the concept-based evaluation, significant differences in the example-based evaluation
  • Multi-modal: significant differences for the OS-FCS measure

SLIDE 32

Conclusions

  • Strong participation with 17 teams from 11 countries
  • Best runs: average annotation performance per concept of 0.48 MAP
  • Multi-modal runs always outperformed visual or textual runs for teams that submitted in several configurations

             MAP    F-ex   OS-FCS
Multi-modal  0.455  0.66   0.66
Visual       0.407  0.68   0.65
Textual      0.234  0.26   0.37

SLIDE 33

Robot Vision Task

SLIDE 34

Robot Vision - Aim

  • Aim: visual place recognition for robot topological localization
  • Platform: Canon VC-C4 camera, SICK LMS200 laser scanner

SLIDE 35

Robot Vision - Data

  • Sequences of images acquired using a mobile robot platform
  • Divided into training / validation / testing sequences
  • Training/validation sequences acquired within rooms and functional areas of an office environment
  • Testing sequence: additional new rooms in the same building
  • Images labeled with the room ID based on the robot position
SLIDE 36

Robot Vision - Data

SLIDE 37

Robot Vision - Task

  • Task
    • determine the topological location of a robot for each image in a single unlabeled test image sequence
  • Participants asked to:
    • classify correctly known rooms/functional areas
    • detect new rooms, not seen during training
  • Training/validation:
    • 11 types of rooms/areas
    • 6th/5th floor of the Computer Science Department at KTH, Stockholm
    • under cloudy weather
  • Test: 14 types of rooms/areas on the 7th floor of the same building in similar conditions
  • Two sub-tasks:
    • obligatory - classify each image independently
    • optional - exploit the continuity of the image sequence
  • Score based on the number of correctly classified images (see the sketch after this list)
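
A minimal sketch of that score for the obligatory sub-task, simply counting correctly classified images; treating unseen rooms as a distinct 'unknown' label is an assumption here, and the official formula may additionally penalise misclassifications.

    # Sketch: one point per correctly classified image. The 'unknown'
    # label for rooms not seen in training, and the absence of an error
    # penalty, are assumptions; the official formula may differ.
    def score_run(predicted_rooms, true_rooms):
        return sum(pred == true for pred, true in zip(predicted_rooms, true_rooms))

    # e.g. score_run(["kitchen", "unknown"], ["kitchen", "lab"]) -> 1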
SLIDE 38

Robot Vision - Participants

  • 7 groups submitted results
    • CVG: Computer Vision and Geometry laboratory, ETH Zurich, Switzerland
    • Idiap-MULTI: Idiap Research Institute, Martigny, Switzerland
    • NUDT: Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, China
    • Centro Gustavo Stefanini, La Spezia, Italy
    • CAOR, France
    • DYNILSIS: Univ. Sud Toulon Var, France
    • UAIC2010: Faculty of Computer Science, Al. I. Cuza University, Romania

SLIDE 39

Robot Vision - Results

  • 55 runs submitted in total
    • 42 obligatory runs
    • 13 runs submitted to the optional task
SLIDE 40

Robot Vision - Conclusions

  • The 3 editions of the Robot Vision task have attracted considerable attention
  • An interesting complement to the existing tasks
  • Diverse and original approaches to the place recognition problem
    • local-feature-based approaches dominate
  • Future:
    • introducing new challenges (categorization)
    • adding new sources of information about the environment

SLIDE 41

ImageCLEF 2010 Conclusions

  • Increasing interest and participation!
  • Larger-scale collections
  • Multi-linguality reinforced
  • Multi-modal approaches becoming more effective, but fusion remains a challenge
  • Several ideas for next year!
    • What do you expect?
    • What are our ideas?
    • What data are available?
    • Fill in the survey: www.imageclef.org/survey
SLIDE 42

Thank you!