Multimedia Indexation
Titus ZAHARIA, Pr.
Titus.Zaharia@telecom-sudparis.eu
Interactive multimedia: still image, video, audio, graphics (2D/3D, static or animated)
Terabytes of digital AV data [source: TREC, trec.nist.gov]
Having huge multimedia databases is useless without the necessary information search and retrieval tools.
Associate relevant descriptions (metadata) with the multimedia content, making it possible to retrieve the desired information in large databases. Typical example: textual indexing with keywords (the web and existing search engines).
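Keyword-based textual indexing is usually implemented with an inverted index that maps each keyword to the documents annotated with it. The following is a minimal Python sketch under that assumption; the annotation data and the names `build_inverted_index` and `search` are hypothetical and only illustrate the principle.

```python
from collections import defaultdict


def build_inverted_index(annotations):
    """Map each keyword to the set of document ids annotated with it.

    `annotations` is a hypothetical dict {doc_id: list of keywords}.
    """
    index = defaultdict(set)
    for doc_id, keywords in annotations.items():
        for word in keywords:
            index[word.lower()].add(doc_id)
    return index


def search(index, query_words):
    """Return the ids of the documents annotated with ALL query keywords."""
    sets = [index.get(w.lower(), set()) for w in query_words]
    return set.intersection(*sets) if sets else set()


annotations = {
    "img1": ["beach", "sunset"],
    "img2": ["beach", "football"],
    "vid1": ["football", "world cup"],
}
index = build_inverted_index(annotations)
print(sorted(search(index, ["beach"])))              # ['img1', 'img2']
print(sorted(search(index, ["football", "beach"])))  # ['img2']
```

The sketch also shows the limitation discussed next: a document is found only if someone chose exactly the query word to annotate it.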
Difficulty of finding appropriate words to describe an image/video: subjectivity; linguistic barriers; complex multimedia content with a polysemic character
Define descriptions intrinsically related to the content and to its perceptual characteristics. Descriptor: mathematical representation of an audio or video feature.
components defined in a data description language (e.g., XML, RDF, OWL…)
Audio attributes (primitives): speech, music, melody, timbre…
Visual attributes (primitives): color, shape, texture, motion
Example: a color histogram of a given image, p = (p_i), where p_i is the fraction of pixels falling in color bin i
[Figure: normalized color histogram, 100 bins]
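A color histogram can be computed by quantizing each pixel's color into a bin and counting. The sketch below assumes RGB pixels in 0..255 and a uniform quantization of each channel (4 levels per channel, i.e. 64 bins, by default); the slide's figure uses 100 bins, so the binning scheme here is an illustrative assumption, not the one used in the course.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized color histogram p = (p_i) of an RGB image.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Each channel is uniformly quantized into `bins_per_channel` levels,
    giving bins_per_channel**3 bins; p_i is the fraction of pixels in bin i.
    """
    n_bins = bins_per_channel ** 3
    counts = [0] * n_bins
    total = 0
    for r, g, b in pixels:
        qr = r * bins_per_channel // 256
        qg = g * bins_per_channel // 256
        qb = b * bins_per_channel // 256
        counts[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
        total += 1
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Three red pixels and one blue pixel, 2 levels per channel (8 bins).
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
h = color_histogram(pixels, bins_per_channel=2)
# h[4] == 0.75 (red bin), h[1] == 0.25 (blue bin)
```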
Compare images: define a similarity measure in the descriptor space.
Example: L_p distances between two histograms p¹ = (p¹_i) and p² = (p²_i):
$$ d_p(p^1, p^2) = \Big( \sum_i \big| p^1_i - p^2_i \big|^p \Big)^{1/p} $$
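The L_p distance above translates directly into code. A minimal sketch, assuming both histograms are plain lists with the same number of bins:

```python
def lp_distance(p1, p2, p=1):
    """d_p(p1, p2) = (sum_i |p1_i - p2_i|^p)^(1/p) between two histograms."""
    if len(p1) != len(p2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) ** p for a, b in zip(p1, p2)) ** (1.0 / p)


h1 = [0.5, 0.5, 0.0]
h2 = [0.0, 0.5, 0.5]
d1 = lp_distance(h1, h2, p=1)  # L1 distance: 1.0
d2 = lp_distance(h1, h2, p=2)  # L2 (Euclidean) distance: sqrt(0.5)
```

p = 1 and p = 2 are the usual choices; larger p gives more weight to the single bin with the largest difference.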
Description scheme: a more complex metadata structure that integrates descriptors and other description schemes.
What to describe?
characters with yellow clothes, world cup)
Spatial segmentation into objects
Adapted descriptions associated with each object
Huge and complex amount of information: a set of scenes and shots with heterogeneous content, multiple objects → need for structuring
Shot: temporal segment corresponding to a single, continuous camera take.
Scene: set of shots that are homogeneous w.r.t. a certain criterion, generally semantic.
Video object: spatial or spatio-temporal region of arbitrary shape corresponding to a semantically coherent entity.
[Figure: hierarchical decomposition of an audio-visual segment (labels AS1, AS2)]
Decompositions: temporal, media, spatio-temporal. Resulting elements: two audio-visual segments, two audio segments, one video segment, two moving regions.
Descriptors: textual annotation, mosaic, dominant motion, visual color/texture/shape descriptors.
Scene description: elements of spatio-temporal localization
MPEG-4 standard: the first standard to support scene descriptions that composite video objects of arbitrary shape, both natural and synthetic (graphics), in 2D/3D.
MPEG-4 scene: tree-based representation in which each node corresponds to an object.
BIFS (BInary Format for Scenes): binary version of VRML (Virtual Reality Modeling Language)
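The tree-based scene representation can be illustrated with a small scene graph in which each node places an object relative to its parent, and composition is a depth-first traversal. This is a hypothetical Python sketch (the `SceneNode` class, 2D offsets and node names are invented for illustration); it does not reproduce the actual BIFS node types.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """Hypothetical scene-graph node: an object placed relative to its parent."""
    name: str
    offset: tuple = (0.0, 0.0)  # 2D translation w.r.t. the parent node
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def compose(node, origin=(0.0, 0.0), out=None):
    """Depth-first traversal returning each object's absolute 2D position."""
    if out is None:
        out = {}
    pos = (origin[0] + node.offset[0], origin[1] + node.offset[1])
    out[node.name] = pos
    for child in node.children:
        compose(child, pos, out)
    return out


scene = SceneNode("scene")
scene.add(SceneNode("background video"))
group = scene.add(SceneNode("player group", offset=(100.0, 50.0)))
group.add(SceneNode("player (arbitrary-shape video object)", offset=(10.0, 5.0)))
group.add(SceneNode("score (synthetic 2D text)", offset=(0.0, -20.0)))
positions = compose(scene)
```

Moving the "player group" node moves both of its children, which is the practical benefit of the tree structure for composition.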
MPEG-4 scene description
composition and transmission
spatio-temporal locators)
MPEG-7 standard: indexation of multimedia content
Text
Shots Key-frames (still images) Spatio-temporal object Spatial object
Music Speech
Multiple descriptions associated with the same AV document
Multiple, parallel decompositions corresponding to different criteria and media
Make the various indexations interoperable and reusable
Support a wide range of media in different formats
Offer dedicated tools for visualization/navigation/annotation
Standardization → Interoperability
AV objects
Proprietary environments
AV objects AV objects
Multimedia Content Description Interface
Objectives: offer a standard for multimedia content description; support a wide range of potential applications
Feature extraction → Description (standardized) → Search engine
Develop the standard: ISO/IEC JTC1/SC29/WG11 (MPEG), ISO/IEC 15938
A set of descriptors (D)
(color, shape, motion, texture, audio...)
semantics of this representation
A set of description schemes (DS)
semantics of the relations between its components (Ds or DSs)
Descriptors Description schemes
A description language (DDL)
Encoding schemes
requirements related to compression efficiency, transmission, error resilience, scalability, universal access
Descriptors Description schemes Description language MPEG-7 description Coded description
Color: color space; color quantization; scalable color histogram (Haar transform representation); color structure (histogram of structuring elements); group-of-frames color (average, median or intersection histogram); dominant colors; spatial color distribution (DCT-based)
Texture: edge orientation histogram; homogeneous texture (energy representation by Gabor filtering); texture browsing (Tamura features)
Motion: parametric motion; trajectory; camera motion (full 3D camera model); motion activity
Localization: spatial localization (polygonal region); spatio-temporal localization (set of polygonal regions)
Shape: region shape (2D) (angular radial transform); contour shape (2D) (contour scale space); 3D shape (3D shape spectrum)
Other: face recognition (eigenfaces)
Scalability / geometry & size; independence w.r.t. pose
A shape descriptor needs to be invariant w.r.t. similarity transforms (Euclidean transforms & isotropic scaling)
(arbitrary topologies)
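Angular radial transform (ART): harmonic functions {f_{m,n}} defined over the unit disc, in polar coordinates (ρ, θ):
$$ f_{m,n}(\rho, \theta) = \frac{1}{2\pi}\, e^{jm\theta}\, R_n(\rho), \qquad R_n(\rho) = \begin{cases} 1, & n = 0 \\ 2\cos(\pi n \rho), & n \neq 0 \end{cases} $$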
[Figure: the first 36 harmonic functions f_{m,n}]
Object’s support function (normalized to the unit disc)
35 coefficients retained (c_{0,0} discarded), each uniformly quantized on 4 bits
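$$ c_{m,n} = \int_0^{2\pi} \int_0^1 f_{m,n}^*(\rho, \theta)\, I(\rho, \theta)\, \rho \, d\rho \, d\theta, \qquad m = 0, \dots, M-1, \quad n = 0, \dots, N-1 $$
where I(ρ, θ) is the object's support function and M = 12, N = 3 (36 functions).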
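The coefficient integral can be approximated on a binary mask by summing over the object's pixels: after mapping the object onto the unit disc, the area element ρ dρ dθ becomes dx dy / d_max². The Python sketch below follows the formulas above with M = 12 and N = 3. Two details are assumptions of this sketch, not taken from the slides: the magnitudes are normalized by |c_{0,0}| before quantization, and the normalized values are clipped to [0, 1] before the uniform 4-bit quantization.

```python
import cmath
import math

M, N = 12, 3  # angular and radial orders: 12 x 3 = 36 basis functions


def art_basis(m, n, rho, theta):
    """f_{m,n}(rho, theta) = (1/2pi) e^{j m theta} R_n(rho)."""
    radial = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
    return cmath.exp(1j * m * theta) * radial / (2.0 * math.pi)


def art_coefficients(mask):
    """ART coefficients c_{m,n} of a binary mask (list of rows of 0/1).

    The origin is placed at the centre of gravity and distances are divided
    by d_max (largest centre-to-pixel distance), so the object fits the unit
    disc; the integral then becomes a sum over the object's pixels.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty mask")
    gx = sum(x for x, _ in pts) / len(pts)
    gy = sum(y for _, y in pts) / len(pts)
    dmax = max(math.hypot(x - gx, y - gy) for x, y in pts) or 1.0
    coeffs = {}
    for m in range(M):
        for n in range(N):
            total = 0j
            for x, y in pts:
                rho = math.hypot(x - gx, y - gy) / dmax
                theta = math.atan2(y - gy, x - gx)
                total += art_basis(m, n, rho, theta).conjugate()
            coeffs[(m, n)] = total / dmax ** 2
    return coeffs


def art_descriptor(mask):
    """35 magnitudes |c_{m,n}| (c_{0,0} dropped), uniformly quantized on 4 bits.

    Normalizing by |c_{0,0}| and clipping to [0, 1] are assumptions.
    """
    c = art_coefficients(mask)
    ref = abs(c[(0, 0)])
    codes = []
    for m in range(M):
        for n in range(N):
            if (m, n) == (0, 0):
                continue
            v = min(abs(c[(m, n)]) / ref, 1.0)
            codes.append(min(15, int(v * 16)))
    return codes


# An L-shaped object and the same object rotated by 90 degrees.
mask = [[1 if (2 <= x <= 4 and 2 <= y <= 12) or (2 <= x <= 10 and 10 <= y <= 12) else 0
         for x in range(15)] for y in range(15)]
rotated = [list(row) for row in zip(*mask[::-1])]
a, b = art_coefficients(mask), art_coefficients(rotated)
```

For the two masks, the coefficient magnitudes agree up to rounding errors, which is the rotation invariance exploited by the descriptor.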
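Origin placed at the object's center of gravity
d_max: maximal distance between the origin and the object's pixels
Normalized polar coordinates: ρ = d / d_max ∈ [0, 1], where d is the distance of a pixel to the origin, and θ its polar angle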
The angular origin can vary from one pose to another: the normalization fixes the center and the scale d_max, but not the orientation.
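Descriptor: absolute values of the ART coefficients
Intrinsically invariant to rotations: rotating the object by an angle α multiplies c_{m,n} by e^{-jmα}, which leaves |c_{m,n}| unchanged
Captures perceptually meaningful features of the shape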
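The invariance argument can be checked numerically: rotating the object by α turns f(ρ, θ) into f(ρ, θ − α), and a change of variable in the integral gives c_{m,n} e^{-jmα}. The sketch below applies this phase change to a few coefficients and compares magnitude vectors with an L1 distance. The coefficient values are made up for illustration, and using the L1 distance on magnitudes as the matching measure is an assumption of this sketch.

```python
import cmath


def rotate_coefficients(coeffs, alpha):
    """ART coefficients of the shape rotated by alpha: c_{m,n} * e^{-j m alpha}."""
    return {(m, n): c * cmath.exp(-1j * m * alpha) for (m, n), c in coeffs.items()}


def art_distance(coeffs_a, coeffs_b):
    """L1 distance between the magnitude vectors |c_{m,n}| (rotation-invariant)."""
    keys = coeffs_a.keys() & coeffs_b.keys()
    return sum(abs(abs(coeffs_a[k]) - abs(coeffs_b[k])) for k in keys)


coeffs = {(0, 0): 1 + 0j, (1, 0): 0.3 + 0.4j, (2, 1): -0.2j}  # illustrative values
rotated = rotate_coefficients(coeffs, 0.7)
other = {(0, 0): 1 + 0j, (1, 0): 0.1 + 0j, (2, 1): 0.2 + 0j}
# art_distance(coeffs, rotated) is zero up to rounding; art_distance(coeffs, other) is about 0.4
```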
Describes shapes that can be represented by a single contour
(curvilinear abscissa)
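Contour parametrized by its normalized curvilinear abscissa:
$$ c(s) = \big(x(s), y(s)\big), \qquad s \in [0, 1], \qquad s = \frac{\text{length from the starting point}}{\text{total length of the curve}} $$
Curvature:
$$ k(s) = \frac{x'(s)\, y''(s) - x''(s)\, y'(s)}{\big(x'(s)^2 + y'(s)^2\big)^{3/2}} $$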
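On a sampled closed contour, the derivatives in the curvature formula can be approximated by periodic central differences. The formula gives the same curvature for any regular parametrization, so this sketch simply uses the sample index as the parameter; the finite-difference scheme is our choice, not something specified in the slides.

```python
import math


def contour_curvature(xs, ys):
    """k = (x'y'' - x''y') / (x'^2 + y'^2)^(3/2) at each point of a closed
    contour sampled as lists xs, ys (periodic central differences)."""
    n = len(xs)
    k = []
    for i in range(n):
        nxt = (i + 1) % n  # xs[i - 1] wraps around automatically for i = 0
        xp = (xs[nxt] - xs[i - 1]) / 2.0
        yp = (ys[nxt] - ys[i - 1]) / 2.0
        xpp = xs[nxt] - 2.0 * xs[i] + xs[i - 1]
        ypp = ys[nxt] - 2.0 * ys[i] + ys[i - 1]
        denom = (xp * xp + yp * yp) ** 1.5
        k.append((xp * ypp - xpp * yp) / denom if denom > 0 else 0.0)
    return k


# A circle of radius 5 traversed counterclockwise has curvature 1/5 everywhere.
R, n = 5.0, 200
xs = [R * math.cos(2 * math.pi * i / n) for i in range(n)]
ys = [R * math.sin(2 * math.pi * i / n) for i in range(n)]
k = contour_curvature(xs, ys)
```

The sign of k depends on the direction of traversal: traversing the same circle clockwise gives -1/5.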
Successive low-pass filtering of the contour: the initial shape evolves toward a convex (prototype) shape
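The successive low-pass filtering can be sketched as follows: repeatedly smooth the contour coordinates and record, at each step, where the curvature changes sign, until the curve is convex. This Python sketch uses a small circular kernel [1/4, 1/2, 1/4] applied iteratively as an assumed stand-in for Gaussian smoothing. The resulting (position, smoothing step) pairs are the kind of data a curvature scale space plot shows.

```python
import math


def smooth_closed(values):
    """One low-pass pass over a closed contour: circular kernel [1/4, 1/2, 1/4]."""
    n = len(values)
    return [0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[(i + 1) % n]
            for i in range(n)]


def curvature_signs(xs, ys):
    """Sign of x'y'' - x''y' (the sign of the curvature) at each contour point."""
    n = len(xs)
    signs = []
    for i in range(n):
        nxt = (i + 1) % n
        xp, yp = xs[nxt] - xs[i - 1], ys[nxt] - ys[i - 1]
        xpp = xs[nxt] - 2 * xs[i] + xs[i - 1]
        ypp = ys[nxt] - 2 * ys[i] + ys[i - 1]
        signs.append(1 if xp * ypp - xpp * yp >= 0 else -1)
    return signs


def css_zero_crossings(xs, ys, max_iter=10000):
    """Smooth until the contour is convex.

    Returns [(iteration, [indices of curvature zero crossings])]; a point
    keeps its index across iterations, so index / n approximates its
    curvilinear abscissa on the initial contour.
    """
    history = []
    for it in range(max_iter):
        signs = curvature_signs(xs, ys)
        crossings = [i for i in range(len(signs)) if signs[i] != signs[i - 1]]
        if not crossings:  # no inflection point left: convex (prototype) shape
            break
        history.append((it, crossings))
        xs, ys = smooth_closed(xs), smooth_closed(ys)
    return history


# Non-convex test shape r(t) = 1 + 0.3 cos(3t): three concave arcs, six inflections.
n = 120
ts = [2 * math.pi * i / n for i in range(n)]
xs = [(1 + 0.3 * math.cos(3 * t)) * math.cos(t) for t in ts]
ys = [(1 + 0.3 * math.cos(3 * t)) * math.sin(t) for t in ts]
history = css_zero_crossings(xs, ys)
```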
(shapes whose C and E values differ too much from those of the query are excluded from the search process)
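Circularity C and eccentricity E:
$$ C = \frac{\text{perimeter}^2}{\text{area}}, \qquad E = \sqrt{\frac{m_{20} + m_{02} + \sqrt{(m_{20} - m_{02})^2 + 4 m_{11}^2}}{m_{20} + m_{02} - \sqrt{(m_{20} - m_{02})^2 + 4 m_{11}^2}}} $$
$$ m_{ij} = \frac{1}{N} \sum_{s=1}^{N} x(s)^i \, y(s)^j, \qquad i, j \in \{0, 1, 2\} $$
with the N contour points (x(s), y(s)) expressed relative to the contour's center of gravity.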
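Both quantities can be computed from the sampled contour. In this sketch the polygon perimeter and the shoelace area stand in for the continuous perimeter and area, and the moments are taken about the centroid of the contour points (our reading of the formula above). For a circle, C approaches 4π and E = 1; for an ellipse with semi-axes a and b, E = a / b.

```python
import math


def circularity_eccentricity(xs, ys):
    """C = perimeter^2 / area and E from second-order moments of the contour points.

    Degenerate contours (all points on a line) make the denominator of E zero.
    """
    n = len(xs)
    perimeter = sum(math.hypot(xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i])
                    for i in range(n))
    # Shoelace formula for the area of the closed polygon.
    area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                   for i in range(n))) / 2.0
    gx, gy = sum(xs) / n, sum(ys) / n
    m20 = sum((x - gx) ** 2 for x in xs) / n
    m02 = sum((y - gy) ** 2 for y in ys) / n
    m11 = sum((x - gx) * (y - gy) for x, y in zip(xs, ys)) / n
    root = math.sqrt((m20 - m02) ** 2 + 4 * m11 ** 2)
    ecc = math.sqrt((m20 + m02 + root) / (m20 + m02 - root))
    return perimeter ** 2 / area, ecc


n = 360
ts = [2 * math.pi * i / n for i in range(n)]
circle = ([2 * math.cos(t) for t in ts], [2 * math.sin(t) for t in ts])
ellipse = ([3 * math.cos(t) for t in ts], [math.sin(t) for t in ts])
c_circle, e_circle = circularity_eccentricity(*circle)     # about 4*pi, 1.0
c_ellipse, e_ellipse = circularity_eccentricity(*ellipse)  # E about 3.0
```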
Field (number of bits): semantics
NumofPeaks (6): number of curvature peaks
GlobalCurvature (2 × 6): circularity and eccentricity of the initial curve
PrototypeCurvature (2 × 6): circularity and eccentricity of the smoothed convex curve
HighestPeakY (7): absolute value of the highest curvature peak
PeakX[] (6): position (curvilinear abscissa) of the curvature peaks
PeakY[] (3): absolute values of the curvature peaks
$$ M_{css} = \sum_{\text{matched peaks}} \Big( (\mathrm{xpeak}_i - \mathrm{xpeak}_j)^2 + (\mathrm{ypeak}_i - \mathrm{ypeak}_j)^2 \Big)^{1/2} \; + \sum_{\text{unmatched peaks}} \mathrm{ypeak}_i $$
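The cost combines a Euclidean distance for matched peak pairs with the heights of the peaks left unmatched. The sketch below implements that cost with a simplified greedy pairing: peaks are taken by decreasing height, and each is matched to the nearest unused peak of the other shape. The pairing rule is an assumption made for illustration and is not the standardized matching procedure. The x distance wraps around because x is a curvilinear abscissa on a closed contour.

```python
import math


def peak_distance(p, q):
    """Euclidean distance between peaks (x, y), with x wrapping on [0, 1)."""
    dx = abs(p[0] - q[0])
    dx = min(dx, 1.0 - dx)
    return math.hypot(dx, p[1] - q[1])


def css_matching_cost(peaks_a, peaks_b):
    """M_css = sum of distances over matched peaks + sum of y over unmatched peaks.

    Peaks are (x, y) tuples: x = curvilinear abscissa in [0, 1), y = peak height.
    Greedy pairing by decreasing height is a simplifying assumption.
    """
    remaining = sorted(peaks_b, key=lambda p: -p[1])
    cost = 0.0
    for peak in sorted(peaks_a, key=lambda p: -p[1]):
        if not remaining:
            cost += peak[1]  # unmatched peak of shape A
            continue
        best = min(remaining, key=lambda q: peak_distance(peak, q))
        cost += peak_distance(peak, best)
        remaining.remove(best)
    cost += sum(y for _, y in remaining)  # unmatched peaks of shape B
    return cost


cost_partial = css_matching_cost([(0.1, 0.5), (0.6, 0.3)], [(0.1, 0.5)])  # 0.3
cost_wrap = css_matching_cost([(0.95, 0.2)], [(0.05, 0.2)])              # about 0.1
```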
Beyond MPEG-7
Radon transform:
$$ G(s, \theta) = \iint_{\mathbb{R}^2} f(x, y)\, \delta(s - x\cos\theta - y\sin\theta)\, dx\, dy $$
Maps the spatial domain (x, y) into the parameter space (s, θ) characterizing the set of lines in the 2D plane
[Equation: f(x, y) expressed as a double integral of G(s, θ) over s and θ, with a complex exponential kernel in (x cos θ + y sin θ)]
defined over the spatial domain
set of lines with continuous parameters (s, θ)
computerized tomography
[Figure: a line with parameters (s, θ) in the spatial domain (x, y) with origin O, and the corresponding point in the line parameter space (s, θ)]
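Discretization of the line parameter space: a finite set Θ of N_θ orientations θ_j and a finite set S of N_s positions s_i, uniformly sampled over a range fixed by S_max
S_max: maximal object size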
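Discrete transform, defined for s ∈ S and θ ∈ Θ
For each pixel p = (x_p, y_p), the function constructs the set of lines passing through p with orientations in Θ:
$$ s(p, \theta) = x_p \cos\theta + y_p \sin\theta, \qquad \theta \in \Theta $$
and approximates each s(p, θ) by the closest value s_i(p, θ) in the discrete set S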
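The discrete construction above amounts to an accumulation: each pixel p adds its value f(p) to the bin (s_i(p, θ), θ) for every orientation θ in Θ. The following sketch makes several assumptions not stated in the slides: the origin is at the image center, Θ = {jπ / N_θ}, S consists of N_s uniform bins on [−S_max, S_max], and S_max is the image half-diagonal.

```python
import math


def discrete_radon(image, n_theta=8, n_s=16):
    """Accumulate pixel values f(x, y) into bins (s_i, theta_j).

    image: list of rows of numbers, origin taken at the image centre.
    Returns (G, thetas) with G[i][j] the accumulated value for (s_i, theta_j).
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    s_max = math.hypot(cx, cy) or 1.0
    thetas = [j * math.pi / n_theta for j in range(n_theta)]
    G = [[0.0] * n_theta for _ in range(n_s)]
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            if v == 0:
                continue
            xp, yp = x - cx, y - cy
            for j, t in enumerate(thetas):
                # Line through p with orientation t: s = x_p cos t + y_p sin t.
                s = xp * math.cos(t) + yp * math.sin(t)
                # The bin containing s is the one whose centre s_i is closest.
                i = int((s + s_max) / (2 * s_max) * n_s)
                G[min(max(i, 0), n_s - 1)][j] += v
    return G, thetas


# A horizontal segment of 9 pixels: for theta = pi/2 (j = 4) every pixel
# lies on the same line, so the whole mass falls into a single s bin.
img = [[1 if y == 2 else 0 for x in range(9)] for y in range(9)]
G, thetas = discrete_radon(img, n_theta=8, n_s=16)
```

Each column of G sums to the total mass of the image, since every pixel contributes exactly once per orientation.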
in rotation, vibration …)
Letter "Z"
temporally stable
Class labels: "zero" "Z" "F" "L" "T" "B" "K" "H" "U" "E" "Q" "I" "X" "O" "W" "A" "D" "G" "J" "M" "S" "V" "Y" "C" "N" "R"
Class labels: "Z" "F" "Neutral" "L" "T" "B" "K" "H" "U" "E" "Q" "I" "X" "O" "W" "A" "D" "G" "J" "M" "S" "V" "Y" "C" "N" "R"
Natural prototype
Letter "B"
Uncalibrated synthetic prototype
Letter "Z" Letter "F"
on natural prototypes
on calibrated synthetic prototypes
"I" "W"
K recognized as U: same 2D binary masks; 3D information. Confusion between B and "-": similar prototypes in 2D & 3D; contextual information.
B / hyphen
R recognized as G: same 2D binary masks; 3D information.
Similar prototypes