SLIDE 1

AXES KIS/INS Interactive 2011

System Overview and Evaluation

Kevin McGuinness

Dublin City University

Robin Aly

University of Twente

SLIDE 2

Overview

• System Overview
• User interface
• System design
• Experiments
• Future

SLIDE 3

System Overview

• Web browser-based user interface
• Search using:
  • Text
  • Images (visual similarity)
  • Concepts
• Text search on metadata and ASR:
  • Apache Lucene 3.1.2
  • Five metadata fields: title, description, keywords, subject, uploader (see the index sketch below)

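The deck's text search runs on Apache Lucene 3.1.2 (Java). As a hedged stand-in only, the sketch below builds the same five-field metadata schema and a multi-field query with the pure-Python Whoosh library; the document contents are placeholders, not collection data.

    import os

    from whoosh.fields import ID, TEXT, Schema
    from whoosh.index import create_in
    from whoosh.qparser import MultifieldParser

    FIELDS = ["title", "description", "keywords", "subject", "uploader"]
    schema = Schema(uid=ID(stored=True, unique=True), **{f: TEXT for f in FIELDS})

    os.makedirs("metadata_index", exist_ok=True)
    ix = create_in("metadata_index", schema)
    with ix.writer() as writer:
        # Placeholder document; real entries would come from the collection metadata.
        writer.add_document(uid="video_017039", title="harbour scenes",
                            description="boats at night", keywords="boat night",
                            subject="rushes", uploader="bbc")

    # Query all five metadata fields at once, as the slide describes.
    with ix.searcher() as searcher:
        query = MultifieldParser(FIELDS, ix.schema).parse("boat")
        for hit in searcher.search(query, limit=10):
            print(hit["uid"], hit.score)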
SLIDE 4

System Overview

• Visual Concepts
  • 10 concepts: faces, female face, airplane, boat/ship, cityscape, singing, gender, nighttime, demonstration, playing instrument
  • Subset of 5 used for INS
  • Pyramid histogram of visual words (PHOW) descriptor
  • Dense grid of VQ SIFT features at multiple resolutions
  • Ranked using a non-linear χ² SVM
  • Trained using the PEGASOS stochastic gradient descent algorithm (vlfeat implementation); see the sketch below
  • Train on 100K frames in ~2 mins
  • Classify 100K frames in ~1 min

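As a rough illustration of the ranking recipe above (not the AXES code, which uses vlfeat), this sketch uses scikit-learn stand-ins: AdditiveChi2Sampler approximates the χ² kernel with an explicit feature map, and SGDClassifier with hinge loss trains a PEGASOS-style linear SVM. The PHOW histograms and concept labels are random placeholders.

    import numpy as np
    from sklearn.kernel_approximation import AdditiveChi2Sampler
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    X = rng.random((1000, 300))    # placeholder PHOW histograms, one row per frame
    y = rng.integers(0, 2, 1000)   # placeholder labels for one concept (e.g. "faces")

    # Explicit feature map approximating the additive chi-squared kernel,
    # so a fast linear SVM can stand in for the non-linear one.
    chi2 = AdditiveChi2Sampler(sample_steps=2)
    X_mapped = chi2.fit_transform(X)

    # Hinge loss + stochastic gradient descent is the PEGASOS-style scheme.
    svm = SGDClassifier(loss="hinge", alpha=1e-4)
    svm.fit(X_mapped, y)

    # Rank all frames by signed distance to the decision boundary.
    scores = svm.decision_function(X_mapped)
    ranking = np.argsort(-scores)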
SLIDE 5

System Overview

• Visual Similarity Search
  • Web service that accepts a URL and returns a list of visually similar images
  • Based on "Video Google"
  • Hessian-affine interest points
  • SIFT descriptors quantized to visual words
  • Text retrieval methods on visual words (see the sketch below)
  • Search 100K frames in < 1 sec

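A minimal sketch of that last idea (illustrative Python, not the Oxford service), assuming frames have already been reduced to lists of visual-word ids: a plain inverted index plus tf-idf scoring, exactly as in text retrieval.

    import math
    from collections import Counter, defaultdict

    def build_index(frames):
        """frames: dict mapping frame id -> list of visual-word ids."""
        inverted, df = defaultdict(list), Counter()
        for fid, words in frames.items():
            for word, count in Counter(words).items():
                inverted[word].append((fid, count))
                df[word] += 1
        return inverted, df, len(frames)

    def search(query_words, inverted, df, n_frames, k=10):
        """Rank frames by the tf-idf dot product with the query image."""
        scores = Counter()
        for word, q_count in Counter(query_words).items():
            idf = math.log(n_frames / (1 + df[word]))
            for fid, count in inverted.get(word, []):
                scores[fid] += (q_count * idf) * (count * idf)
        return scores.most_common(k)

    inverted, df, n = build_index({"f1": [3, 3, 7], "f2": [7, 9], "f3": [3, 9, 9]})
    print(search([3, 7], inverted, df, n))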
SLIDE 6

System Overview

• Fusion of results
  • Simple weighted combination of results from text ASR search, text metadata search, visual concept search, and image similarity search
  • All scores (text, concepts, similarity) normalized to [0, 1] by dividing by the maximum score
  • Active concepts equally weighted
  • Text, concept, and similarity scores equally weighted (see the sketch below)

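A minimal sketch of the fusion rule as stated on the slide, with hypothetical score dictionaries: max-normalize each modality, average the active concepts, then combine the three modality scores with equal weights.

    def max_normalize(scores):
        """Map raw scores (dict: shot id -> score) into [0, 1] by the max."""
        m = max(scores.values(), default=0) or 1.0
        return {shot: s / m for shot, s in scores.items()}

    def fuse(text, concept_lists, similarity):
        """Equal-weight fusion; concept_lists holds one score dict per active concept."""
        text, similarity = max_normalize(text), max_normalize(similarity)
        concept_lists = [max_normalize(c) for c in concept_lists]
        shots = set(text) | set(similarity) | {s for c in concept_lists for s in c}
        fused = {}
        for shot in shots:
            concept = (sum(c.get(shot, 0.0) for c in concept_lists) / len(concept_lists)
                       if concept_lists else 0.0)
            fused[shot] = (text.get(shot, 0.0) + concept + similarity.get(shot, 0.0)) / 3
        return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)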
SLIDE 7

User Interface

• Same user interface used for both KIS and INS tasks
• Web browser-based (Google Chrome only)
• Heavy emphasis on drag-and-drop
  • Drag to save shots
  • Drag to add shots to visual similarity search

SLIDE 8

SLIDE 9

[Screenshot: Query Area, Similarity Search, Saved Shots, Results, Timer]

SLIDE 10

SLIDE 11

Video Demo

SLIDE 12

System Design

[Diagram: UI ↔ Middleware ↔ LIMAS]

SLIDE 13

System Design

[Diagram: UI ↔ Middleware ↔ LIMAS]

Responsibilities:

  • Present tasks to user
  • Allow user to formulate query
  • Present results to user
  • Time experiments
  • Gather results

Technologies:

  • HTML5
  • CSS3
  • JavaScript
  • jQuery
  • AJAX
SLIDE 14

System Design

[Diagram: UI ↔ Middleware ↔ LIMAS]

Responsibilities:

  • Store topics, tasks, example images, etc. in a database
  • Assign topics to users
  • Mediate user queries
  • Collect saved shots and store them in the database
  • Log user actions
  • Communicate with KIS oracle

Technologies:

  • Python
  • Django
  • Apache/WSGI
  • SQLite 3
SLIDE 15

System Design

[Diagram: UI ↔ Middleware ↔ LIMAS]

Responsibilities:

  • Visual concept indexing and search
  • Text indexing and search
  • Communication with Oxford similarity search
  • Fusion of results

Technologies:

  • Java
  • Servlets
  • Tomcat
  • Apache Lucene
  • Hadoop/HBase
SLIDE 16

System Design

[Diagram: UI ↔ Middleware ↔ LIMAS]

Middleware components: Session Management, Search, Activity Logging

SLIDE 17

System Design

[Diagram: UI ↔ Middleware ↔ LIMAS]

LIMAS components: Search Index, fed by multiple Indexer Scripts

SLIDE 18

Communication

[Diagram: UI ↔ Middleware — AJAX HTTP POST request and results, both as JSON]

Request (JSON):

    {
      "action": "search",
      "text": "test",
      "concepts": "Faces:Positive",
      "images": "http://..9026.jpg",
      "startShot": 0,
      "endShot": 53
    }

Response (JSON):

    {
      "status": "OK",
      "resultCount": 1000,
      "startShot": 0,
      "endShot": 54,
      "shots": [
        {
          "uid": "bbc.rushes:video_017039/keyframe_001",
          "videoNumber": 17039,
          "shotNumber": 1,
          "shotId": "shot17039_1",
          "shotStartTimeSeconds": 0,
          "shotEndTimeSeconds": 19.278,
          "keyframeURL": "http://...",
          "thumbnailURL": "http://...",
          "videoUrls": {
            "mp4": "http://....mp4",
            "webm": "http://....webm"
          }
        },
        …
      ]
    }
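For illustration, the same request issued from Python with the requests library; the middleware URL here is an assumption, not something stated in the deck.

    import requests

    payload = {
        "action": "search",
        "text": "test",
        "concepts": "Faces:Positive",
        "startShot": 0,
        "endShot": 53,
    }
    # Hypothetical endpoint; the deck only specifies an AJAX HTTP POST to the middleware.
    response = requests.post("http://localhost:8000/search", json=payload)
    for shot in response.json()["shots"]:
        print(shot["shotId"], shot["keyframeURL"])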

SLIDE 19

Communication

[Diagram: Middleware ↔ LIMAS — HTTP GET request, results returned as JSON]

SLIDE 20

Communication

[Diagram: LIMAS ↔ Similarity Search Service — HTTP GET request, XML document returned]

SLIDE 21

Typical Interaction

[Diagram: UI ↔ Middleware ↔ LIMAS ↔ Similarity Search]

• User inputs query terms and images and clicks "Find"
• UI software sends an AJAX JSON HTTP POST request to the middleware
• Middleware logs the request to the database
• Middleware sends the request to the backend
• LIMAS issues the visual similarity search
• LIMAS performs the text search with Apache Lucene
• LIMAS fuses the results into a single result list
• LIMAS sends the result list in JSON format to the middleware
• Middleware logs the results to the database
• Middleware sends the results in JSON format to the UI
• UI generates HTML for the results and displays them to the user

(a middleware sketch follows below)

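A minimal sketch of the middleware steps in this flow as a Django view (Django and SQLite are named on slide 14, but this code, the LIMAS_URL, and the commented QueryLog model are hypothetical, not the AXES source):

    import json

    import requests
    from django.http import JsonResponse
    from django.views.decorators.csrf import csrf_exempt

    LIMAS_URL = "http://localhost:8080/limas/search"  # hypothetical LIMAS endpoint

    @csrf_exempt
    def search(request):
        """Log the query, forward it to LIMAS, log the results, return JSON."""
        query = json.loads(request.body)
        # QueryLog is a hypothetical model backed by the SQLite database:
        # QueryLog.objects.create(kind="query", body=request.body.decode())
        results = requests.post(LIMAS_URL, json=query).json()
        # QueryLog.objects.create(kind="results", body=json.dumps(results))
        return JsonResponse(results)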
SLIDE 22

Experiments

• NISV Hilversum, early September
• Known-item search
  • 14 media professionals
  • 10 topics each
  • 5 minutes per topic (1 hr total)
• Instance search
  • 30 media students from Washington state (varying age)
  • 6 topics each
  • 15 minutes per topic (1.5 hr total)

SLIDE 23

Experiments

• Before the experiment…
  • Participants briefed on purpose of experiment
  • Participants given short tutorial on UI
• After the experiment…
  • Participants given freeform feedback form to fill out

SLIDE 24

The experiment setting

SLIDE 25

KIS Experiments

• 4 runs submitted: AXES_DCU_[1-4]
  • Same interface and system for all runs
  • Different users
  • Each user was randomly assigned to a single run

SLIDE 26

INS Experiments

• 15 simultaneous users for INS experiments
  • Latin-square method
• Some technical issues during the experiments
• 4 runs ordered by the recall orientation of users
• Unfortunately, no other team participated

SLIDE 27

KIS Results

SLIDE 28

Evaluation (KIS)

[Chart: Number of correct results found by run]

SLIDE 29

Evaluation (KIS)

[Chart: Number of correct results found by run, with AXES runs highlighted]

SLIDE 30

Evaluation (KIS)

[Chart: Number of correct results found by run]
AXES best run: 11/25

SLIDE 31

Evaluation (KIS)

[Chart: Number of correct results found by run]
AXES worst run: 9/25

SLIDE 32

Evaluation (KIS)

[Chart: Number of correct results found by topic (topics 500–524)]
Everybody found 501 and 508

SLIDE 33

Evaluation (KIS)

[Chart: Number of correct results found by topic (topics 500–524)]
Everybody found 501 and 508
Nobody found 503, 505, 513, 515, 516, and 520

SLIDE 34

Evaluation (KIS)

[Chart: Mean time (minutes) to find the correct video by topic, AXES runs vs. others]
(Topics where the correct answer was not found by any AXES run are not shown)

SLIDE 35

Evaluation (KIS)

[Histogram: Time taken to find the correct video, all runs]
19/41 (46%) of the videos found were found in the first minute
31/41 (75%) of the videos found were found in the first 2.5 minutes

SLIDE 36

INS Results

SLIDE 37

Evaluation (INS)

run   precision   recall   MAP    bpref   rel     non-rel
1     0.74        0.36     0.33   0.34    26.40   8.68
2     0.73        0.28     0.26   0.27    20.80   5.60
3     0.81        0.26     0.25   0.25    18.76   3.12
4     0.81        0.21     0.21   0.21    14.76   2.68

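For reference, a minimal sketch of how a MAP column like this one is conventionally computed (the standard definition, not the TRECVid evaluation code): average precision per topic, then the mean over topics.

    def average_precision(ranking, relevant):
        """ranking: shot ids in ranked order; relevant: set of judged-relevant ids."""
        hits, precision_sum = 0, 0.0
        for rank, shot in enumerate(ranking, start=1):
            if shot in relevant:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant) if relevant else 0.0

    def mean_average_precision(topics):
        """topics: list of (ranking, relevant_set) pairs, one per topic."""
        return sum(average_precision(r, rel) for r, rel in topics) / len(topics)

    print(mean_average_precision([(["a", "b", "c"], {"a", "c"}),
                                  (["x", "y"], {"y"})]))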
SLIDE 38

Evaluation (INS)

• Per-topic comparison

SLIDE 39

Evaluation (INS)

[Charts: Per-topic result counts for runs 1–4 (topics 9023–9047)]

SLIDE 40

Evaluation Summary

• Large variation in user performance!
  • For KIS, a combined run containing our best-performing users would have found 16/25 videos
  • Only 5/25 topics were found by all of our users
• Large variation in topic difficulty
  • Six topics found by no submitted run
  • Two topics found by all submitted runs
  • One topic found by only one submitted run
• Similar results from the INS experiments

SLIDE 41

Feedback

• Users liked the UI design and the drag-and-drop interaction mechanism
• Participants would have preferred to be able to adjust the video size
• Professional users were unclear whether Boolean search could be used
• Participants would like the system to give better hints on why it judged a video to be relevant
  • Some remarked they did not know how the system worked and so could not learn to adjust their search strategy

SLIDE 42

Feedback

• Users seemed to enjoy the task and the system
• Lots of users said they wanted visual similarity search
  • Although visual similarity was used less in the KIS task
• People used the visual concepts
• Got some great feedback from users
  • An excellent resource for building future systems

SLIDE 43

Experiences

• Text is very important for KIS
  • If the metadata/ASR contained text that described the video, users usually found the correct one
  • If there was no good metadata or ASR matching the query topic, it was very hard to find the video using concepts and visual similarity alone

SLIDE 44

Conclusions

• Participation of AXES in the KIS & INS tasks
• Simple fusion approach over similarity, concepts, and ASR
• Known-item search
  • 14 media professionals participated
  • Median performance (MAP)
• Instance search
  • 30 media students from Washington participated
  • Only task participant in INS
• Users were positive about the possibilities

SLIDE 45

Future

• TRECVid 2012
  • Improve fusion
  • UI enhancements based on user feedback
  • Pre-clustering results by video

SLIDE 46

Questions?