Slide 1

Towards Cross-Media Information Extraction

Thierry Declerck, DFKI Language Technology Lab. Presenting a collection of slides with contributions by Paul Buitelaar, Michael Sintek, Malte Kiesel (all DFKI), Manuel Alcantara (UAM), Jan Nemrava (VSE), David Sadlier (DCU) and others

Slide 2

FEAST Talk; 28.01.2009

K-Space (2006-2008)

K-Space Overview: A Network of Excellence in the 6th Framework Programme (see http://www.k-space.eu/)

  • K-Space -- Knowledge Space of Shared Technology and Integrative Research to Bridge the Semantic Gap
  • K-Space was a network of research teams from academia and industry conducting integrative research and dissemination activities in semantic inference for (semi-)automatic annotation and retrieval of multimedia content.
  • K-Space aimed at closing the gap between low-level content descriptions that can be computed automatically by machines and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media.
  • Our work in K-Space:
  • Adding Semantic Metadata to Audio-Video Material by Automatic Analysis of Complementary Sources
  • Cross-Media Ontologies
  • Cross-Media Knowledge Extraction
Slide 3

Adding Semantic Metadata to Audio-Video Material by Automatic Analysis of Complementary Sources

Data we deal with in K-Space and other projects

Slide 4

Concepts from visual Analysis

=> Sky, Sea, Sand, Person

Contribution of possibly associated text?

  • Named Entities
  • Event (walking, swimming)
  • Location (Greece?)
  • Date/Time
  • Background knowledge
  • Etc.

Picture: Courtesy of Yiannis Kompatsiaris and Stamatia Dasiopoulou

Slide 5

Still images and Text in Webpages (Esperonto)

Slide 6

Relevant Text Regions

Title of the document
Caption text: "Click on the image to enlarge" (a non-relevant item, to be filtered out, also on the basis of lexical properties of the words)
Content of the HTML "alt" attribute: 'VEGETABLE GARDEN WITH DONKEY'
Content of the HTML "src" attribute: http://www.spanisharts.com/reinasofia/miro/burro_lt.jpg
Abstract text
Running text

Slide 7

Linguistic Analysis of the Text Regions

  • "Alt" text: 'VEGETABLE GARDEN WITH DONKEY'

<NP HEAD="garden" PRE_MOD="vegetable"> <POST_MOD CAT="PP" HEAD="with" NP_COMP_HEAD="donkey"/> </NP>

  • Abstract/Running text: "…This picture depicts the rural landscape of Montroig …"

<SENT SUBJ="This picture" PRED="depicts" OBJ="the rural landscape of Montroig"/>

  • Detailed annotation of the direct object:

<NP HEAD="landscape" PRE_MOD="rural"> <POST_MOD CAT="PP" HEAD="of" NP_COMP_HEAD="Montroig"/> </NP>
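The NP annotations above can be sketched with a toy pattern. The project used a full shallow parser, so this hand-written rule (and the single 'PRE_MOD HEAD PP' shape it covers) is purely illustrative:

```python
import re

def annotate_np(phrase):
    """Annotate 'PRE_MOD HEAD with/of NP_COMP_HEAD' phrases in the slide's format."""
    m = re.match(r"(\w+)\s+(\w+)\s+(with|of)\s+(\w+)", phrase.lower())
    if not m:
        return None
    pre_mod, head, prep, comp = m.groups()
    # Emit the annotation in the slide's attribute style
    return (f'<NP HEAD="{head}" PRE_MOD="{pre_mod}">'
            f'<POST_MOD CAT="PP" HEAD="{prep}" NP_COMP_HEAD="{comp}"/>'
            f'</NP>')

print(annotate_np("VEGETABLE GARDEN WITH DONKEY"))
```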

Slide 8

The Semantic Annotation (1)

The Toy Artwork Ontology (schematized)

Object > Artwork > Painting [has_creator, has_name, has_subject, has_dimension, has_material, has_genre, has_date, ...]
Person > Artist > Painter [has_name, has_birth_date, part_of_artistic_movement, …]

Slide 9

The Semantic Annotation (2)

The Instantiation

Title: Vegetable garden with donkey
Creator: Miro
Date: 1918
Genre: naïve (if correctly extracted by some reasoning on the linguistically and semantically annotated text)
Subject: rural landscape of Montroig + garden and donkey (if the association between the title and the explanation given by the art expert can be grouped)
Dimension: 65x71
Material: Oil on canvas
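The instantiation can be sketched as a plain record; the field names follow the toy ontology on the previous slide, and the values are the extracted ones listed above:

```python
from dataclasses import dataclass

@dataclass
class Painting:
    """Toy-ontology Painting class with the slide's attributes."""
    title: str
    creator: str
    date: int
    genre: str
    subject: str
    dimension: str
    material: str

# The instance extracted from the web page
miro = Painting(
    title="Vegetable garden with donkey",
    creator="Miro",
    date=1918,
    genre="naive",
    subject="rural landscape of Montroig",
    dimension="65x71",
    material="Oil on canvas",
)
print(miro.creator, miro.date)
```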

Slide 10

TRECVid: Linguistic Analysis of Transcripts attached to Video

Language analysis can, to a certain extent, provide a structured analysis of the transcripts delivered with the TRECVid shots:

Identify the part of speech (POS) of words contained in the "FreeTextAnnotation" MPEG-7 tags of the shots
Identify Named Entities in those annotations and structure them
Identify phrasal structures in those annotations

Slide 11

Example of Transcript attached to Video

<VideoSegment id="shot6_68">
  <MediaTime>
    <MediaTimePoint>T00:10:41:5216F30000</MediaTimePoint>
    <MediaDuration>PT13S26416N30000F</MediaDuration>
  </MediaTime>
  <TextAnnotation confidence="0.500000">
    <FreeTextAnnotation>FACILITIES OFTEN BEYOND THE REACH OF THE AVERAGE FABRICATE ARE ARE MADE AVAILABLE THROUGH THE SERVICE INCLUDING PRESS EQUIPMENT CAPABLE OF HANDLING THE LARGEST ALUMINUM DRAWS EVER MADE PLUS A WORK FORCE OF SKILLED</FreeTextAnnotation>
  </TextAnnotation>
</VideoSegment>
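Pulling the transcript out of such a segment can be sketched as below. Namespaces are omitted here; real MPEG-7 files declare an MPEG-7 namespace that ElementTree would need to handle:

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free version of the segment shown above
MPEG7 = """<VideoSegment id="shot6_68">
  <TextAnnotation confidence="0.500000">
    <FreeTextAnnotation>FACILITIES OFTEN BEYOND THE REACH OF THE AVERAGE FABRICATE ARE ARE MADE AVAILABLE THROUGH THE SERVICE INCLUDING PRESS EQUIPMENT CAPABLE OF HANDLING THE LARGEST ALUMINUM DRAWS EVER MADE PLUS A WORK FORCE OF SKILLED</FreeTextAnnotation>
  </TextAnnotation>
</VideoSegment>"""

seg = ET.fromstring(MPEG7)
# The free-text annotation is the input to the linguistic analysis
text = seg.find("TextAnnotation/FreeTextAnnotation").text.strip()
print(seg.get("id"), text.split()[:3])
```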

Slide 12

Extracted Transcripts

shot6_28 THIS MOST REMARKABLE METAL USE ITS TREMENDOUS ABUNDANCE AS A RAW MATERIAL IN AMERICA THE RICHEST OF THESE COMMERCIAL GRADE ALUMINUM DEPOSITS ARE LOCATED IN THE CENTRAL REGION OF ARKANSAS ALTHOUGH TRACES OF ALUMINUM MAY BE FOUND IN ALMOST ANY SOIL ONLY THOSE PLAYS CONTAINING FIFTY OR SIXTY PERCENT ALUMINUM WAR AND KNOWN AS BLOCK SITE ARE MINED FOR COMMERCIAL FORGOT AND HERE.

shot6_45 INCREASING AMOUNT OF THIS A LUMINA IS BEING USED IN CHEMICAL PROCESSING IN SOIL CONDITIONERS AND ABRASIVE US AND MANY OTHER IN ORDER TO REDUCE THE LUMINA TO SOLID ALUMINUM IT'S TRANSFERRED TO.

shot6_80 THE USE OF ALUMINUM IS EQUALLY EFFECTIVE INSIDE AS WELL AS OUT FROM TABLE LAMP TO A COAST TO COAST CEILING ALUMINUM CONTRIBUTES TO A DECORATIVE MODERN TOUCH TO OFFICE AND HOME BECAUSE.

shot6_81 ALUMINUM REFLECTS UP TO NINETY FIVE PERCENT OF ALL RADIANT HEAT AND EFFECTIVELY STOPS MOISTURE IT FINDS EXTENSIVE YOU'LL SEE IN ALL TYPES OF INSULATION THESE SAME REFLECTIVE.

shot7_1 IN THE FIELD OF HOME APPLIANCES FOR DECORATIVE AS WELL AS FUNCTIONAL USES ALUMINUM HAS NO EQUAL.

……..

Slide 13

POS Tagging of Transcripts

shot6_28 THIS MOST REMARKABLE METAL USE ITS TREMENDOUS ABUNDANCE AS A RAW MATERIAL IN AMERICA THE RICHEST OF THESE COMMERCIAL GRADE ALUMINUM DEPOSITS ARE LOCATED IN THE CENTRAL REGION OF ARKANSAS ALTHOUGH TRACES OF ALUMINUM MAY BE FOUND IN ALMOST ANY SOIL ONLY THOSE PLAYS CONTAINING FIFTY OR SIXTY PERCENT ALUMINUM WAR AND KNOWN AS BLOCK SITE ARE MINED FOR COMMERCIAL FORGOT AND HERE.

<text>
<token id="1" pos="CARD">shot6_28</token>
<token id="2" pos="DT" lemma="this">THIS</token>
<token id="3" pos="JJS" lemma="most">MOST</token>
<token id="4" pos="JJ" lemma="remarkable" morph="3">REMARKABLE</token>
<token id="5" pos="NN" lemma="metal" morph="1">METAL</token>
<token id="6" pos="VB" lemma="use" morph="10">USE</token>
<token id="7" pos="PRP$" lemma="its">ITS</token>
<token id="8" pos="JJ" lemma="tremendous" morph="3">TREMENDOUS</token>
<token id="9" pos="NN" lemma="abundance" morph="1">ABUNDANCE</token>
<token id="10" pos="NNP" lemma="as" morph="1">AS</token>
<token id="11" pos="DT" lemma="a" morph="24">A</token>
<token id="12" pos="JJ" lemma="raw" morph="3">RAW</token>
<token id="13" pos="NN" lemma="material">MATERIAL</token>
<token id="14" pos="IN" lemma="in">IN</token>
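The token annotation format above can be approximated with a toy lookup tagger. The actual pipeline used a trained POS tagger; the lexicon, the NN fallback for unknown words and the omission of the morph attribute are simplifications:

```python
# Toy lexicon standing in for a trained tagger's model
TOY_LEXICON = {"this": "DT", "most": "JJS", "remarkable": "JJ",
               "metal": "NN", "use": "VB", "its": "PRP$"}

def tag_tokens(text):
    """Emit one <token> element per word, in the slide's annotation format."""
    tokens = []
    for i, word in enumerate(text.split(), start=1):
        lemma = word.lower()
        pos = TOY_LEXICON.get(lemma, "NN")  # default to NN for unknown words
        tokens.append(f'<token id="{i}" pos="{pos}" lemma="{lemma}">{word}</token>')
    return tokens

for t in tag_tokens("THIS MOST REMARKABLE METAL"):
    print(t)
```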

Slide 14

Problems with Transcripts and possible Remedies

  • Problems
  • Noise (context of recording)
  • Quality of the underlying ASR tools
  • Transcripts all in Capital letters
  • Lack of punctuation signs
  • Possible Remedies
  • Use of manually annotated speech corpora for improving POS tagging
  • Use of related textual sources for improving lexical coverage and syntactic boundaries
  • Use of related domain knowledge bases and metadata for improving lexical coverage, syntactic boundaries and semantic annotation

Slide 15

Metadata of a Broadcaster (MESH Project)

  • In the MESH project, Deutsche Welle provides data consisting of audio/video material and textual metadata. This is a very valuable data set, since the textual metadata also contains manually annotated scene descriptions.

  • This data set can be used for building a training corpus for the automated alignment of video, audio and text data.

Slide 16

The Metadata Labels

<DOC filename="0324000-3_Journal_ENG_F4001C_26122003_2000">
  <TYPE>Earthquake Iran</TYPE>
  <SERIES>Journal F: 4001 C</SERIES>
  <SEG sid="integer">
    <TITLE></TITLE>
    <DESCRIPTION></DESCRIPTION>
    <SCENES></SCENES>
    <KEYWORDS></KEYWORDS>
  </SEG>
</DOC>

Slide 17

The Title Tag

<TITLE> TdT: Erdbeben/Iran/Zerstörungen in Bam/Trümmer/Ruinen/Opfer </TITLE>

Extract: "Erdbeben" (keyword for the disaster ontology); location "Iran" (with NE detection). The role of the other terms is as yet unclear.
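The TITLE-tag extraction can be sketched as follows; the disaster-term and country sets below are stand-ins for the disaster ontology and the NE gazetteer used in the project:

```python
DISASTER_TERMS = {"Erdbeben"}   # stand-in for the disaster ontology
COUNTRIES = {"Iran"}            # stand-in for an NE gazetteer

def parse_title(title):
    """Strip the 'TdT:' prefix, split on '/', and look up term types."""
    body = title.split(":", 1)[1] if ":" in title else title
    terms = [t.strip() for t in body.split("/") if t.strip()]
    disasters = [t for t in terms if t in DISASTER_TERMS]
    locations = [t for t in terms if t in COUNTRIES]
    return terms, disasters, locations

terms, disasters, locations = parse_title(
    "TdT: Erdbeben/Iran/Zerstörungen in Bam/Trümmer/Ruinen/Opfer")
print(disasters, locations)
```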

Slide 18

The Description Tag

<DESCRIPTION> Ein schweres Erdbeben hat im Iran die Stadt Bam fast völlig zerstört. </DESCRIPTION>

Linguistic and semantic analysis: [Subj-NP Ein schweres <noun-disaster>Erdbeben</noun-disaster>] [V hat] [LOC-PP im <ne-country>Iran</ne-country>] [OBJ-NP die Stadt <ne-city>Bam</ne-city>] [ADV fast völlig] [V zerstört].

Extraction:
Who (causation): Erdbeben (earthquake)
What_action: zerstören (destroy)
What: Stadt Bam (city of Bam); here the system can infer that Bam is located in Iran.
Where: Iran

Slide 19

The Scenes Tag

<SCENES>
  <SCENE sid="1">Bam: Menschen sitzen zwischen Trümmern auf Boden</SCENE>
  <SCENE sid="2">verzweifelte Menschen sitzen am Strassenrand</SCENE>
  <SCENE sid="3">Schuttberge</SCENE>
  <SCENE sid="4">zerstörte Häuser</SCENE>
  <SCENE sid="5">rauchende Trümmer</SCENE>
</SCENES>

Descriptions of the sequences of images displayed. Extracting related entities: people within ruins, desperate people, destroyed houses, smoking ruins, etc. All those terms can be seen as "consequences of the earthquake". Important also: they provide a description of what is to be seen in the video.

Slide 20

The Keywords Tag

<KEYWORDS> Naher Osten: Iran; Erdbeben </KEYWORDS>

The pattern of the content of this tag allows us to infer that Iran is located in the Near East.
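The inference from the KEYWORDS pattern can be sketched as follows, assuming the "region: place; topic" reading described above:

```python
def parse_keywords(kw):
    """Read 'Region: Place; Topic...' and derive a located-in relation."""
    region, rest = [s.strip() for s in kw.split(":", 1)]
    items = [s.strip() for s in rest.split(";")]
    # The first item after the colon is read as a place located in the region
    return {"located_in": (items[0], region), "topics": items[1:]}

facts = parse_keywords("Naher Osten: Iran; Erdbeben")
print(facts)
```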

Slide 21

Text Regions, OCR, Speech and Still Images: Towards a cross-media grammar

In both cases the text is split up and distributed around an image, but with different referential properties.

Slide 22

Linguistic Knowledge Structures

Multiple layers and levels

Low-level linguistic features (tokenization, morphology, …)
Semantic properties of terms and phrases
Named Entities
Relation Extraction (incl. Grammatical Relations)
Semantic linking to domain ontologies
  Can involve several abstraction layers connected through reasoning/mapping processes
Semantic linking to other media analysis

Slide 23

Features for MM Analysis

Mostly physical features are extracted, also called low-level features (see MPEG-7 for an exhaustive listing, or http://gps-tsc.upc.es/imatge/_Philippe/Philippe.html for a very good introduction to MPEG-7)

<element name="MediaTime" type="mds:MediaTime" minOccurs="1" maxOccurs="1"/>
<element ref="ColorSpace" minOccurs="0" maxOccurs="1"/>
<element ref="ColorQuantization" minOccurs="0" maxOccurs="1"/>
<element ref="GofGopColorHistogram" minOccurs="0" maxOccurs="1"/>
<element ref="CameraMotion" minOccurs="0" maxOccurs="1"/>
<element ref="MotionActivity" minOccurs="0" maxOccurs="1"/>
<element ref="Mosaic" …

Slide 24

Cross-Media Ontologies

Slides from Michael Sintek, Paul Buitelaar and others

Slide 25

Semiotic Triangle

  • See (Ogden & Richards, 1923) - based on
  • Structural Linguistics (de Saussure, 1916)
  • philosophical work by Peirce (mostly 19th century)
Slide 26

Semiotic Triangle – the real world

... actual goalkeepers in the real world ...

Slide 27

Semiotic Triangle – concepts

... actual goalkeepers in the real world ...

Slide 28

Semiotic Triangle – words

... actual goalkeepers in the real world ... goalkeeper (EN) Torwart (DE) doelman (NL) ...

Slide 29

Semiotic Triangle – images

... actual goalkeepers in the real world ...

Slide 30

Features of Multilingual & MM Signs

Features of Words (Multilingual Signs), e.g.

  • Term: "goalkeeper"
  • Language: EN
  • Part-of-Speech: noun
  • Morphology: goal-keeper

Features of Images (Multimedia Signs), e.g.

  • Image-Segment: "Kahn.jpg"
  • Color-Set: #111111, …
  • Shape
  • Texture: keypatch-set …

Slide 31

Linguistic Knowledge Structures

Multiple layers and levels

Low-level linguistic features (tokenization, morphology, …)
Semantic properties of terms and phrases
Named Entities
Relation Extraction (incl. Grammatical Relations)
Semantic linking to domain ontologies
  Can involve several abstraction layers connected through reasoning/mapping processes
Semantic linking to other media analysis

Slide 32

Representing Features of Multilingual and Multimedia Signs

[Diagram: meta-classes rdfs:Class and feat:ClassWithFeats (feat:ClassWithFeats rdfs:subClassOf rdfs:Class); the classes :FootballPlayer, :Midfielder and :Defender, all of type feat:ClassWithFeats, are connected by rdfs:subClassOf. feat:lingFeat links each class to lf:LingFeat instances (e.g. lf:lang "de", lf:term "Mittelfeldspieler" for :Midfielder; lf:lang "de", lf:term "Abwehrspieler" for :Defender); feat:imgFeat links to if:ImgFeat instances (e.g. if:color "#111111", texture "&keypatchSet_223"). Legend: URI, rdf:type, property, instances.]

Slide 33

Features – Interacting Layers

[Diagram: interacting layers - an ontology connects Images, Other Media, English Text and German Text via content features and feature associations, with formal and informal links between the layers.]

Slide 34

Cross-Media Knowledge Extraction

Slide 35

KSpace Multimedia Data Set

Extend and align SmartWeb data set with related 2006 world cup A/V data

Video available in MPEG-7 (Audio transcripts of radio reports)

Use aligned multimedia data set to extract multimedia features

Image segmentation
Event and object recognition

Develop a cross-media ontology as a basis for developing semantic-based algorithms for cross-media OLP (ontology learning and population).
The SmartWeb Data Set is freely available.

Slide 36

Image Symbols

Slide 37

Image Symbols – Segments

SmartMedia (MPEG7) Ontology

Slide 38

Image Symbols – Segments and Concepts

SmartMedia (MPEG7) Ontology SportEvent Ontology

Slide 39

SmartMediaLing

Integrated ontology for the representation of text and image segments with their features - as information objects

Based on the DOLCE Ontology of Information Objects (OIO)

Includes:
SmartMedia ontology (based on MPEG-7) for the representation of image segments with features
LingInfo ontology (lexicon model) for the representation of multilingual terms with features

Slide 40

Information Objects

[Diagram: smartdns:information-object with its properties - smartdns:about; smartdns:ordered-by (not in SWINTO); edns:interprets (not in SWINTO); smartdns:realized-by; edns:agent (not in SWINTO; any segmentation or classification algorithm for text, image, A/V analysis); smartdns:description (Format: JPG, UTF-8, …; Lang: DE, EN, …; not correct in SWINTO for Media Modality: text, audio, video, …); smartoio:physical-realization (Media File: text, image, …); smartdns:expresses. smartoio:linguistic-object is-a smartdns:information-object; smartdolce:entity links to a domain class or property (semantics / interpretation). Namespaces: smartdolce = DOLCE; smartdns / edns = DOLCE-D&S; smartoio = DOLCE-OIO; sm (smartmedia) = MPEG-7; li (linginfo) = LingInfo.]

Slide 41

Image Segments with Features Information Objects

[Diagram: extends the previous information-object diagram with the media side - sm:SegmentDecomposition (result of segmentation), specialized into sm:TextDecomposition (result of text segmentation) and sm:VideoSegmentSpatioTemporalDecomposition (result of video segmentation); sm:TextSegment (segment in text, via sm:textSegment) and sm:StillRegion (segment in video, via sm:stillRegion); sm:VisualDescriptor (image features, via sm:visualDescriptor); smartdolce:perdurant models the segmentation process (smartdolce:participant-in; smartoio:identified-by and smartdolce:uses, both not in SWINTO). Namespaces as on the previous slide.]

Slide 42

Linguistic Segments with Features

[Diagram: extends the previous one with the linguistic side - li:LingInfo (linguistic realization of a domain class; similar: visual- / audio-information object), li:PhraseOrWordForm (linguistic features, via li:lingInfo) and li:MorphoSyntacticDecomposition (via linginfo:morphSyntacticDecomposition), all attached to sm:TextSegment. Namespaces as on the previous slides.]

Slide 43

[Diagram: the same cumulative diagram as on the previous slide, shown once more in full.]

Slide 44

COMM

Integrated ontology for the representation of image (and text) segments, their features and their analysis - as information objects. Based on DOLCE patterns:

Descriptions and Situations (DnS)
Ontology of Information Objects (OIO)

Includes:

MPEG-7 for image segments, features and analysis
(LingInfo for multilingual terms with features)

Slide 45

COMM Overview

Decomposition Pattern
Content and Annotation Pattern
Semantic Annotation Pattern
Algorithm and Digital Data Pattern

Slide 46

Events in Text and Image/Video

Costa Rica's Ronald Gomez escapes a tackle by Germany's Tim Borowski during the Germany v Costa Rica Group A match in Munich

Slide 47

Events in Text and Image/Video

Costa Rica's Ronald Gomez escapes a tackle by Germany's Tim Borowski

Slide 48

Events in Video

HEADER RED-CARD PARRY TACKLE SHOT

Slide 49

An Architecture for Mining Resources Complementary to Audio-Visual Streams

J. Nemrava, P. Buitelaar, N. Simou, D. Sadlier, V. Svátek, T. Declerck, A. Cobet, T. Sikora, N. O'Connor, V. Tzouvaras, H. Zeiner, J. Petrák

Slide 50

A/V Streams and Complementary Resources

Audio-Video Streams

Analysis of A/V stream captures low-level features using suitable detectors

Primary Complementary

Directly attached to the media: overlay text, spoken commentaries, …

Secondary Complementary

Independent from the media: written commentaries, summaries, analysis

Slide 51

Architecture Overview

[Architecture diagram]

Slide 52

Audio-Video Analysis

  • 6 available detectors
  • Crowd image
  • Speech-Band Audio Activity
  • On-Screen Graphics Tracking
  • Motion activity measure
  • Field Line orientation
  • Close-up
Slide 53

Primary Complementary Resources

  • Video track
  • Overlay text

Text region detection

  • Merging 16 frames to recognize static objects in the video

OCR to establish the match time
Use this information for synchronization between video and match time

  • Other textual information, such as player numbers, provides additional information from the primary resource

  • Audio track
  • Speech commentaries (not used yet)
Slide 54

Secondary Complementary Resources

Tabular

Basic Match Information: list of players, goals, cards, etc.
Meta Information: location, attendance, date, etc.

Slide 55

Secondary Complementary Resources

Unstructured

Several minute-by-minute sources
Text analysis for entity and event extraction using the SProUT IE system
Player actions
Player names
German and English

Ontology based IE tool

SProUT

‘A beautiful pass by Ruud Gullit set up the first Rijkaard header.’

Slide 56

SmartWeb Ontology

SProUT uses the SmartWeb football ontology for the extraction of:

Player actions (like Foul, Diving, etc.)
Goalkeeper actions (like Parry, DropKick, etc.)
Referee actions
Trainer actions
…

Slide 57

Alphabet of the FiRE Knowledge Base

Concepts = {Scoringopportunity Outofplay Handball Kick Scoregoal Cross Foul Clear Cornerkick Dribble Freekick Header Trap Shot Throw Pass Ballpossession Offside Charge Lob Challenge Booked Goalkeeperdive Block Save Substitution Tackle EndOfField MiddleField Other Crowd Motion CloseUp Audio}

Roles = {consistOf}

Individuals = {min0 sec20 sec40 sec60 min1 sec80 sec100 sec120 min2 sec140 sec160 sec180 min3 sec200 …}

Slide 58

FiRE Knowledge Base

〈 min1 : Kick ≥ 1 〉  # extracted from text
〈 min1 : Scoregoal ≥ 1 〉
〈 sec80 : Audio ≥ 0.06 〉  # extracted from A/V stream
〈 sec80 : Crowd ≥ 0.231 〉
〈 sec80 : Motion ≥ 0.060 〉
〈 sec80 : EndOfField ≥ 0.05 〉
〈 (min1 : sec60) : consistOf ≥ 1 〉  # relating temporal objects extracted from text and from the A/V stream
〈 (min1 : sec80) : consistOf ≥ 1 〉
〈 (min1 : sec100) : consistOf ≥ 1 〉
〈 (min1 : sec120) : consistOf ≥ 1 〉
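A toy in-memory version of such assertions is sketched below. FiRE itself is a fuzzy DL reasoner, so this dictionary lookup only illustrates the kind of query the knowledge base supports:

```python
# Each entry states a lower bound on the membership degree of a
# temporal individual in a concept (from text or from A/V detectors)
kb = {
    ("min1", "Kick"): 1.0,        # from text
    ("min1", "Scoregoal"): 1.0,
    ("sec80", "Audio"): 0.06,     # from A/V detectors
    ("sec80", "Crowd"): 0.231,
    ("sec80", "Motion"): 0.060,
}
# consistOf relates a minute to the seconds it contains
consist_of = {"min1": ["sec60", "sec80", "sec100", "sec120"]}

def seconds_with(minute, concept, threshold):
    """Seconds inside `minute` whose asserted bound for `concept`
    meets `threshold`."""
    return [s for s in consist_of.get(minute, [])
            if kb.get((s, concept), 0.0) >= threshold]

print(seconds_with("min1", "Crowd", 0.2))
```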

Slide 59

Cross-Media Ontology in FiRE Format

  • Subset of the SmartWeb football ontology enriched with A/V detectors. All classes are sets of temporal objects.

Slide 60

Cross-Media Extraction of Instances

  • Extraction from textual resources (by minute)
  • Entity and event instances
  • Extraction from A/V streams (by second)
  • Crowd image detector - value range ∈ [0,1]
  • Speech-Band Audio Activity - value range ∈ [0,1]
  • Motion activity measure - value range ∈ [0,1]
  • Close-up - value range ∈ [0,1]
  • Field Line orientation - value range ∈ [0,90]
  • Values of the first four A/V detectors are fuzzified over the range between a mean value and 1
  • Values of the Field Line detector are classified into three high-level classes (MidField, EndField, Other)
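Both post-processing steps can be sketched as follows; the linear membership function and the angle bin boundaries are assumptions for illustration, not the project's actual values:

```python
def fuzzify(value, mean):
    """Map the range [mean, 1] linearly onto [0, 1]; below the mean -> 0."""
    if value <= mean:
        return 0.0
    return (value - mean) / (1.0 - mean)

def field_line_class(angle, mid_max=30, end_min=60):
    """Bin a field-line orientation angle (degrees, 0..90) into the three
    high-level classes; the bin boundaries here are assumed values."""
    if angle <= mid_max:
        return "MidField"
    if angle >= end_min:
        return "EndField"
    return "Other"

print(fuzzify(0.75, 0.5), field_line_class(75))
```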

Slide 61

Query Examples

Slide 62

Cross-Media Features

  • Basic idea

Identify which video detectors are most prominent for which event class. For instance, for CORNERKICK the "end-zone" video detector should be significantly high.

  • Strategy

Analyze the distribution of video detectors over event classes
Identify significant detectors for each class
Feed the results back into the video event detection algorithm
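The first two steps of this strategy can be sketched as below, with invented toy observations (the event classes are from the slides; the detector names and values are made up):

```python
from collections import defaultdict

# Toy observations: (event_class, detector, detector_value)
observations = [
    ("CORNERKICK", "end_zone", 0.9), ("CORNERKICK", "end_zone", 0.8),
    ("CORNERKICK", "motion", 0.3),
    ("SHOT", "end_zone", 0.4), ("SHOT", "motion", 0.7),
]

def mean_by_class(obs):
    """Average each detector's value over the segments of each event class."""
    sums = defaultdict(lambda: [0.0, 0])
    for cls, det, val in obs:
        sums[(cls, det)][0] += val
        sums[(cls, det)][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

means = mean_by_class(observations)
print(means)
```

Detectors whose per-class mean stands out (like "end_zone" for CORNERKICK here) would then be selected as significant for that class.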

Slide 63

Cross-Media Features

  • The purpose of the cross-media descriptors is to capture the features and relations in multimodal data, so as to be able to retrieve complementary information when dealing with one of the data sources
  • The goal is to build a model to classify events in video independently from the text
  • Use of cross-media features in the event-type classification of video segments, by fuzzy reasoning with the FiRE inference engine

Slide 64

UI for Browsing and Querying

[UI diagram: Extracted Entities and Events (from tables and textual match summaries); a non-linear event and entity browser; the A/V stream; A/V feature analysis; complementary text resources (minute-by-minute reports, match reports)]
Slide 65

Relations between NLP and Multimedia

Before: separate sets of annotations (low-level features for MM content and linguistic-semantic features for text), put in relation via time codes (audio documents) or layout properties (web pages combining text and images).

Current work: merging low-level (LL) features and linguistic-semantic (LS) features via interoperable ontological descriptions of objects and events => knowledge-driven integration/interoperability of key semantic features in both fields.

Adding an ontology description on top of the linguistic descriptions and of the concepts resulting from image analysis; it shares/integrates concepts over the linguistic and the low-level features.

Slide 66

Conclusions and future Work

A model for the integration of audio/video and textual semantic features is being proposed at the upper level of domain ontologies.
Semantically annotated multimedia (including text) data sets are under construction. Those data sets can be used as training data for improving the automatic semantic annotation of audio/video material, also at the combined level of low-level MM and linguistic features.
Concepts to be proved/tested in real applications (soccer, natural disasters, etc.).
Continue work also with still images (see next slides).