M4 WP3 Multimodal integration: Progress report (Viper group)

SLIDE 1

M4 – WP3 Multimodal integration

Progress report

Viper group, Computer Vision and Multimedia Lab, University of Geneva, 30-01-03

SLIDE 2

S. Marchand-Maillet http://viper.unige.ch/ M4 meeting #3, Sheffield, UK, January 2003

Progress report

UniGE
  Information retrieval setup / extension
  Video data processing
  Information management framework

WP3:
  Issues
  Status – deliverable

SLIDE 3

Information retrieval setup (initial)

[Diagram of the initial setup. Labels: A/V/text input; Segmentation; Event definition; URLisation; SQL DB; Time codes; URLs; Characterisation; Feature definition; Feature files; GIFT indexing; Keyframes; Index file; GIFT; Text; QBE query; Text query; Interface; MRML; Query client]

SLIDE 4

Information retrieval setup (planned)

[Diagram of the planned setup. Labels: A/V/text input; Segmentation; Event definition; URLisation; SQL DB; Time codes; URLs; Characterisation; Feature definition; Feature files; GIFT indexing; Keyframes; Index file; Text; QBE query; Text query; Interface; MRML; Query client; Text; Audio query; GIFT]

SLIDE 5

Video processing (1)

OVAL: Video Access Library
  C++ Video Object Model
  Accepts plugins for specific formats
    MPEG-1: Dali from Cornell
    LibDV, « XML » video plugin
  Provides a generic API
    Open, Close, GetProp stream
    GetFrame(s)
    Specific (MPEG: getMV, getDCT)
  Does not accommodate image processing functionalities
    Use of Matlab MEX with persistent memory
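The plugin-based access layer listed above could be sketched roughly as follows. This is an illustration in Python only, not OVAL's actual C++ interface: the names `VideoPlugin`, `ListVideoPlugin`, `PLUGINS` and `open_video`, and the in-memory "list" format, are hypothetical. Only the operation names Open, Close, GetProp and GetFrame(s) come from the slide.

```python
from abc import ABC, abstractmethod


class VideoPlugin(ABC):
    """Generic per-format interface (hypothetical; mirrors Open/Close/GetProp/GetFrame(s))."""

    @abstractmethod
    def open(self, source): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def get_prop(self, name): ...

    @abstractmethod
    def get_frames(self, start, count=1): ...


class ListVideoPlugin(VideoPlugin):
    """Toy plugin: a 'video' is simply a Python list of frames."""

    def __init__(self):
        self._frames = None

    def open(self, source):
        self._frames = list(source)

    def close(self):
        self._frames = None

    def get_prop(self, name):
        props = {"frame_count": len(self._frames)}
        return props[name]

    def get_frames(self, start, count=1):
        return self._frames[start:start + count]


# Registry mapping a format name to the plugin class that handles it.
PLUGINS = {"list": ListVideoPlugin}


def open_video(fmt, source):
    """Pick the plugin registered for `fmt` and open `source` with it."""
    plugin = PLUGINS[fmt]()
    plugin.open(source)
    return plugin
```

Client code only sees the generic calls, so supporting another format would amount to registering another plugin class; format-specific extras (such as the MPEG motion vectors mentioned on the slide) would live on the corresponding plugin.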

SLIDE 6

Video Processing (2)

Video segmentation
  Classical techniques
  Based on spatio-temporal features (ongoing)
    Mixed colour/motion information
  Needs to be extended to event-based segmentation
    Integration of M4 features

Video characterisation
  Estimation on feature pattern model (motion)
  Support Vector Regression
    Non-linear Prediction of Chaotic Time Series using SVM, NNSP’97 (Mukherjee, Osuna, Girosi)
    Predicting Time Series with SVM, ICANN’97 (Müller, Smola, Schölkopf, Vapnik)

SLIDE 7

Video Similarity Measure

$S(V_1, V_2) = 1 - E_{V_1}(V_2)$

Problems:
  $S(V_1, V_1) \neq 0$
  $S(V_1, V_2) \neq S(V_2, V_1)$

Artificial symmetrisation:
  $D(V_1, V_2) = 0.5\,[S(V_1, V_2) + S(V_2, V_1)]$
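A minimal sketch of the symmetrisation step, under two assumptions that are not in the slides. First, the measure is read as S(V1, V2) = 1 − E, with E the prediction error on V2 of a model fitted on V1. Second, a first-order linear predictor stands in for the Support Vector Regression named on the previous slide, purely to keep the example self-contained. The function names and the one-dimensional feature sequences are hypothetical.

```python
from statistics import fmean


def fit_ar1(seq):
    """Least-squares fit of x[t+1] ~ a * x[t] + b on one feature sequence."""
    xs, ys = seq[:-1], seq[1:]
    mx, my = fmean(xs), fmean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var else 0.0
    return a, my - a * mx


def prediction_error(model, seq):
    """Mean squared one-step prediction error of `model` on `seq`."""
    a, b = model
    return fmean([(a * x + b - y) ** 2 for x, y in zip(seq[:-1], seq[1:])])


def similarity(v1, v2):
    """S(V1, V2) = 1 - E: fit on v1, measure the error on v2 (asymmetric)."""
    return 1.0 - prediction_error(fit_ar1(v1), v2)


def distance(v1, v2):
    """D(V1, V2) = 0.5 * [S(V1, V2) + S(V2, V1)], symmetric by construction."""
    return 0.5 * (similarity(v1, v2) + similarity(v2, v1))
```

Because the model is fitted on one sequence and evaluated on the other, swapping the arguments of `similarity` generally changes the result, which is the asymmetry the slide lists as a problem; `distance` averages both directions to remove it.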

SLIDE 8

Video Classification

Distance matrix computed with prediction error $D(V_i, V_j)$
  For all pairs of videos $(i, j)$ in the given database: $D_{i,j} = D(V_i, V_j)$

Curvilinear Component Analysis is applied to $D$
  ⇒ gives a 2-dimensional mapping of the feature space
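The pairwise step can be sketched as below. This covers only the construction of the matrix; the Curvilinear Component Analysis projection is not implemented here. `distance_matrix`, `toy_dist` and the sample data are hypothetical, and `toy_dist` is a placeholder for the prediction-error distance, not the measure used in the experiment.

```python
def distance_matrix(videos, dist):
    """Fill D[i][j] = dist(videos[i], videos[j]) for every pair, reusing symmetry."""
    n = len(videos)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # A symmetric distance only needs to be evaluated once per pair.
            matrix[i][j] = matrix[j][i] = dist(videos[i], videos[j])
    return matrix


def toy_dist(a, b):
    """Placeholder distance: absolute difference of the sequence means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


shots = [[0.0, 1.0], [2.0, 4.0], [1.0, 1.0]]
D = distance_matrix(shots, toy_dist)
```

The resulting square matrix is the kind of input a distance-based projection method takes; any symmetric distance function can be passed in place of `toy_dist`.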

SLIDE 9

Preliminary experiment

29 video shots containing mainly TV news and sports activities

SLIDE 10

SLIDE 11

Ongoing…

Text retrieval
  Inclusion within GIFT
  Multimodal embedding (visual + text query)
  Query expansion (e.g. using WordNet)

Event characterisation
  High-level model
  Feature-based inference
  ⇒ Characterisation of well-known events
  ⇒ Suitable for restricted contexts (M4)
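As an illustration of the query expansion item, here is a minimal sketch in which a small hand-written synonym table stands in for a WordNet lookup. The table contents and the name `expand_query` are hypothetical; a real setup would query WordNet instead.

```python
# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {
    "meeting": ["conference", "gathering"],
    "talk": ["presentation", "speech"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

Terms without an entry in the table pass through unchanged, so the expanded query is always a superset of the original one.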

SLIDE 12

Information management

MRML: going toward version 2.0
  More multimedia
  More like an XML protocol (as defined by W3C - XMLP)
  Truly multimedia / multimodal
  ⇒ Spec proposal release mid-Feb
  ⇒ Expected validation software: this summer

DEVA (Annotation model)
  Based on RDF and Dublin Core (XML)
  DAML+OIL (OWL) compatible
  Makes existing software available (Xerces, Jena, …)
  Allows multiple extensions (WordNet, …)
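To give an idea of what an RDF description using Dublin Core elements looks like, the sketch below builds one with Python's standard XML library. It does not reproduce the DEVA schema, which the slides do not show: the helper `describe`, the example URL and the field values are assumptions. Only the RDF and Dublin Core namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(resource_url, **dc_fields):
    """Build an rdf:RDF element with one rdf:Description carrying Dublin Core fields."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_url},
    )
    for name, value in dc_fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return root


# Hypothetical resource and values, for illustration only.
annotation = describe(
    "http://example.org/meeting1.mpg",
    title="Example meeting recording",
    date="2003-01-30",
)
print(ET.tostring(annotation, encoding="unicode"))
```

Since the result is plain RDF/XML, it can be parsed by generic XML and RDF tooling, which is the benefit the slide attributes to building on these standards.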

SLIDE 13

WP3: Initial work plan

SLIDE 14

WP3: Deliverables

D3.1: Report on baseline information access methods
  m12 (Feb 2003)
  Technical doc of the working system in place

D3.2: Report on methods for multimodal integration and NLP
  m24 (Feb 2004)
  Define an intuitive way of querying and retrieving meeting data

D3.3: Final report on multimodal information access
  m36 (Feb 2005)
  Technical doc of the meeting manager

SLIDE 15

D3.1

Gathered basic information
  Group-based

Template sent by next week
  Activity-based
  Description of what you can contribute in one field

Response by Feb 20th
  Fill in where you feel it is relevant

Edited by end of Feb
  Smoothed out gaps…

Sent to Steve by mid-March

SLIDE 16

WP3: Issues

Visual data is not usable alone
  Need for text transcripts
  Use of « external » data

Need for a common format for data exchange
  Annotation (explicit)
  Processing results

Increase collaboration
  Integration

SLIDE 17

WP3 breakdown

Year 1 (-> 03/2003)
  Emphasis on multimedia information processing and retrieval
  Image, Video: Visual + Motion
  Audio (speech), Text
  Framework: Architecture, integration

Year 2 (-> 03/2004)
  Emphasis on multimodal interaction (query processing)
  Information from text, speech (text?), gesture, ...
  Natural language processing

Year 3 (-> 03/2005)
  Emphasis on data summarisation
  Video, dialogs, documents

SLIDE 18

????

SLIDE 19

[Architecture diagram. Labels recoverable from the figure:]
  CBIR server, Client, TCP/IP
  Client-side components, each with an MRML layer and a socket: QBE query formulator (e.g. PHP interface); Existing tool with Tool plugin (e.g. GIMP plugin); Assessor (e.g. Viper evaluation script)
  Server-side components: Open socket; MRML; GIFT plugins (PluginX, PluginY, …); MRML logging; Multimedia feature storage
  Data: Multimedia data; http server; URL abstraction (temporary local copy)
  Online / Offline: Feature extraction; features
  Exchanges: Queries; Response; Relevance feedback

The framework