Multimedia Indexation Titus ZAHARIA, Pr. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Multimedia Indexation

Titus ZAHARIA, Pr.

Titus.Zaharia@telecom-sudparis.eu

slide-2
SLIDE 2

Terabytes of digital AV data

Interactive multimedia

  • Still image
  • Audio
  • Video
  • Graphics (2D/3D, static or animated)

Multimedia indexation

slide-3
SLIDE 3

Multimedia indexation

Interactive multimedia

[source: TREC, trec.nist.gov]

  • Having huge multimedia databases at one's disposal is useless without the necessary information search and retrieval tools

Terabytes of digital AV data
slide-4
SLIDE 4

Indexation: definition

  • Associate pertinent descriptions (metadata) with multimedia content, making it possible to retrieve the desired information from large databases
  • Typical example: textual indexing with keywords (the web and existing search engines)
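To illustrate keyword-based textual indexing, here is a minimal sketch of an inverted index in Python. The function names (`build_index`, `search`) and the sample annotations are hypothetical and chosen only for illustration; real search engines add stemming, ranking and much more.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of the documents containing every keyword of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical keyword annotations attached to three media items.
documents = {
    "img1": "sunset beach sea",
    "img2": "football world cup",
    "vid1": "beach football game",
}
index = build_index(documents)
print(search(index, "beach football"))  # ids annotated with both keywords
```

The sketch also shows the limitation discussed next: retrieval only works if the annotator and the user happen to choose the same words.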

slide-5
SLIDE 5

Textual Indexation: limitations

  • Difficulty in finding appropriate words to describe an image/video: subjectivity
  • Linguistic barriers
  • Complex multimedia content: polysemous character

slide-6
SLIDE 6

Content-based Indexation

  • Define descriptions intrinsically related to the content and to its perceptual characteristics
  • Descriptor: mathematical representation of an audio or visual feature
      • Syntax and semantics of its components defined in a data description language (e.g., XML, RDF, OWL…)

slide-7
SLIDE 7

Content-based Indexation

  • Audio attributes (primitives)
      • Speech
      • Music
      • Melody
      • Timbre…

slide-8
SLIDE 8

Content-based Indexation

  • Visual attributes (primitives)
      • Color
      • Shape
      • Texture
      • Motion

[Figure: histogram plot, horizontal axis from 1 to 100, vertical axis from 0.005 to 0.035]

slide-9
SLIDE 9

Content-based Indexation

  • Example: a color histogram of a given image

[Figure: color histogram, bin index i on the horizontal axis, bin value p_i on the vertical axis]

slide-10
SLIDE 10

Content-based Indexation

  • Define descriptions intrinsically related to the content and to its perceptual characteristics
  • Descriptor: mathematical representation of an audio or visual feature
  • Compare images: define a similarity measure in the descriptor space

slide-11
SLIDE 11

Content-based indexation

  • Example: Lp distances between histograms

[Figure: two histograms p^1 and p^2, each plotted against the bin index i]

$$d(p^1, p^2) = \sum_i \left| p^1_i - p^2_i \right|$$
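The histogram comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the images are hypothetical flat lists of gray levels in [0, 256), the function names are invented for the example, and the general L_p form (absolute differences raised to the power p, then the p-th root of the sum) is used, of which the L1 sum of absolute differences is the case p = 1.

```python
def histogram(values, n_bins, max_value=256):
    """Normalized histogram (p_i) of integer values in [0, max_value)."""
    counts = [0] * n_bins
    for v in values:
        counts[v * n_bins // max_value] += 1
    return [c / len(values) for c in counts]

def lp_distance(p1, p2, p=1):
    """L_p distance between two histograms with the same number of bins."""
    return sum(abs(a - b) ** p for a, b in zip(p1, p2)) ** (1.0 / p)

# Two hypothetical single-channel "images" given as flat lists of gray levels.
image_a = [0, 10, 200, 255]
image_b = [0, 130, 200, 255]
h_a = histogram(image_a, n_bins=2)
h_b = histogram(image_b, n_bins=2)
print(h_a, h_b)
print(lp_distance(h_a, h_b, p=1), lp_distance(h_a, h_b, p=2))
```

Because the histograms are normalized, images of different sizes can be compared directly.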

slide-12
SLIDE 12

Content-based indexation

  • Define descriptions intrinsically related to the content and to its perceptual characteristics
  • Description scheme: more complex metadata structure, integrating descriptors and other description schemes
      • Signature combination
      • Multimodal descriptions
      • Multi-granular/hierarchical descriptions

slide-13
SLIDE 13

Content-based Indexation

  • What to describe?
      • The whole image
      • Objects of interest (e.g., faces, characters with yellow clothes, world cup)
  • Spatial segmentation into objects of interest
  • Adapted descriptions associated with each object

slide-14
SLIDE 14

Video documents

  • Huge and complex amount of information
  • Set of scenes and shots with heterogeneous content
  • Multiple objects
  • Need for structuring
      • Temporal segmentation into shots/scenes
      • Spatio-temporal segmentation (objects)

slide-15
SLIDE 15

Video structuring

  • Shot: temporal segment corresponding to a single camera take
  • Scene: set of shots that are homogeneous w.r.t. a certain criterion, generally semantic
  • Video object: spatial or spatio-temporal region of arbitrary shape corresponding to a semantically coherent entity

slide-16
SLIDE 16

Video structuring: example

[Figure: hierarchical decomposition of an audio-visual segment. Labels: audio-visual segment; decomposition (temporal); decomposition (media); decomposition (spatio-temporal); two audio-visual segments (AS1, AS2); two audio segments; one video segment; two moving regions; descriptors: textual annotation, mosaic, dominant motion; descriptors: textual annotation, visual descriptors of color/texture/shape]

slide-17
SLIDE 17

Video structuring

  • Scene description: elements of spatio-temporal localisation
      • Time stamps
      • Support region descriptors
      • Hierarchical description schemes

slide-18
SLIDE 18

Video structuring

  • MPEG-4 standard: first to take into account scene descriptions with compositing of video objects of arbitrary shape, natural and synthetic (graphics)
  • 2D/3D MPEG-4 scene: tree-based representation, each node corresponding to an object
      • BIFS language (BInary Format for Scenes): binary version of VRML (Virtual Reality Modeling Language)

slide-19
SLIDE 19

Video structuring

  • MPEG-4 standard: first to take into account scene descriptions with compositing of video objects of arbitrary shape, natural and synthetic (graphics)
  • MPEG-4 scene description
      • Adapted to the objectives of video composition and transmission
      • Description too elementary (temporal and spatio-temporal locators)
  • MPEG-7 standard: indexation of multimedia content

slide-20
SLIDE 20

Description of video documents

[Figure: a video document and its descriptions. Labels: text; shots; key-frames (still images); spatio-temporal object; spatial object; music; speech]

slide-21
SLIDE 21

Description of video documents

  • Multiple descriptions associated with the same AV document
  • Multiple, parallel decompositions corresponding to different criteria and media
  • Make the various indexations interoperable and reusable
  • Support a large range of media in different formats
  • Offer dedicated tools for visualisation/navigation/annotation

slide-22
SLIDE 22

How to exchange?

  • Proprietary environments
  • Standardization
  • Interoperability

[Figure: three groups of AV objects]

slide-23
SLIDE 23

MPEG-7: Multimedia Content Description Interface

Objectives (N2861)

  • Offer a standard for multimedia content description
  • Support a large range of potential applications
  • Elaborate the standard ISO/IEC 15938 (ISO/IEC JTC1/SC29/WG11)

[Figure: feature extraction, description (standard), search engine]

slide-24
SLIDE 24

MPEG-7: Multimedia Content Description Interface

Standardized items

  • A set of descriptors (D)
      • A D is a representation of a feature (color, shape, motion, texture, audio...)
      • A D defines the syntax and the semantics of this representation
  • A set of description schemes (DS)
      • A DS specifies the structure and the semantics of the relations between its components (Ds or DSs)

[Figure: descriptors, description schemes]
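The relation between descriptors and description schemes can be pictured as a tree whose leaves are Ds and whose inner nodes are DSs. The Python sketch below is only an illustration of that idea: the class names, fields and sample values are hypothetical and do not reproduce the MPEG-7 syntax, which is defined in the standard's own description language.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Descriptor:
    """A D: the representation of one feature (color, shape, motion...)."""
    feature: str
    values: List[float]

@dataclass
class DescriptionScheme:
    """A DS: a named structure whose components are Ds or other DSs."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)

    def descriptors(self):
        """Collect every descriptor found anywhere below this scheme."""
        found = []
        for component in self.components:
            if isinstance(component, Descriptor):
                found.append(component)
            else:
                found.extend(component.descriptors())
        return found

# Hypothetical description of a video segment containing one moving region.
region = DescriptionScheme("MovingRegion", [Descriptor("shape", [0.2, 0.7])])
segment = DescriptionScheme("VideoSegment", [Descriptor("color", [0.5, 0.5]), region])
print([d.feature for d in segment.descriptors()])
```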

slide-25
SLIDE 25

MPEG-7: Multimedia Content Description Interface

Standardized items

  • A description language (DDL)
      • Express existing DSs and Ds
      • Create new DSs and Ds
      • Extend/modify existing DSs
  • Encoding schemes
      • A description is encoded in order to satisfy requirements related to compression efficiency, transmission, error resilience, scalability, universal access

[Figure: descriptors, description schemes, description language, MPEG-7 description, coded description]

slide-26
SLIDE 26

Bibliography

  • B.S. Manjunath, P. Salembier, T. Sikora, « The MPEG-7 Book », John Wiley & Sons, 2002
  • P. Gros, « L’indexation multimédia: description et recherche automatiques », Hermes, Lavoisier, Paris, 2007
  • A. Mostefaoui, F. Prêteux, V. Lecuire, J-M. Moureaux, « Gestion des données multimédias », Hermes, Lavoisier, Paris, 2004

slide-27
SLIDE 27

Shape descriptors

slide-28
SLIDE 28

MPEG-7 Visual descriptors

slide-29
SLIDE 29

MPEG-7: Visual descriptors

Color
  • Color space
  • Color quantization
  • Scalable color histogram (representation by Haar transform)
  • Color structure (histogram of structuring elements)
  • Group-of-frames color (average, median or intersection histogram)
  • Dominant colors
  • Spatial color distribution (based on the DCT)

Texture
  • Edge orientation histogram
  • Homogeneous texture (energy representation by Gabor filtering)
  • Texture browsing (Tamura features)

Motion
  • Parametric motion
  • Trajectory
  • Camera motion (complete model of a 3D camera)
  • Motion activity

Localization
  • Spatial localization (polygonal region)
  • Spatio-temporal localization (set of polygonal regions)

Shape
  • Region shape (2D) (angular radial transform)
  • Contour shape (2D) (contour scale space)
  • 3D shape (3D shape spectrum)

Others
  • Face recognition (eigenfaces)

slide-30
SLIDE 30

MPEG-7 Shape Descriptors

  • Region shape
  • Contour shape
  • Multiview DS
  • 3D Shape


slide-31
SLIDE 31

MPEG-7: shape descriptors

Shape: intuitive geometric properties

  • Scalability / geometry and size
  • Independence w.r.t. pose
  • A shape descriptor needs to be invariant w.r.t. similarity transforms (Euclidean transforms and isotropic scaling)

slide-32
SLIDE 32

Region-Based Shape

  • Describes complex shapes with multiple regions and holes (arbitrary topologies)
  • Angular Radial Transform (ART) descriptor
  • Decomposition of the shape region within a family of harmonic functions {f_{m,n}} defined over the unit disc

$$f_{m,n}(\rho, \theta) = \frac{1}{\pi} \cos(n \pi \rho)\, e^{j m \theta}$$

where $(\rho, \theta)$ are polar coordinates.

slide-33
SLIDE 33

Region-Based Shape

  • Harmonic basis functions

$$f_{m,n}(\rho, \theta) = \frac{1}{\pi} \cos(n \pi \rho)\, e^{j m \theta}$$

[Figure: first 36 harmonic functions f_{m,n}]

slide-34
SLIDE 34

Region-Based Shape

  • Descriptor definition: coefficients c_{m,n} of the decomposition

$$c_{m,n} = \int_0^{2\pi} \int_0^1 f_{m,n}(\rho, \theta)\, r(\rho, \theta)\, \rho \, d\rho \, d\theta, \quad \forall m = 0, 1, \ldots, M-1, \ \forall n = 0, 1, \ldots, N-1$$

where $r(\rho, \theta)$ is the object's support function (normalized to the unit disc).

  • 35 coefficients retained, uniformly quantized over 4 bits (c_{0,0} discarded)

slide-35
SLIDE 35

Region-Based Shape

  • Normalization to the unit disc
      • Place the coordinate system origin at the object's center of gravity
      • Compute the maximal distance d_max between the origin and the object's pixels
      • Express the object's support function in normalized polar coordinates

$$(x, y) \rightarrow (\rho, \theta) \rightarrow \left( \frac{\rho}{d_{max}}, \theta \right)$$

  • Scale and position invariance

slide-36
SLIDE 36

Region-Based Shape

  • Extrinsic normalization to the unit disc: drawbacks
      • Sensitivity to noise
      • Sensitivity to shape variability
      • Articulated shapes: the maximal distance d_max can vary from one pose to another
  • Spatial misalignments

slide-37
SLIDE 37

Region-Based Shape

  • Similarity measure: L1 distance between ART coefficients
  • Rotation invariance: not satisfied
  • Rotation-invariant similarity measure: L1 distance between absolute values of ART coefficients
  • The absolute values of the decomposition coefficients are intrinsically invariant to rotations
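The region-based pipeline (normalization to the unit disc, projection onto the harmonic functions, comparison of coefficient magnitudes) can be sketched as follows. This is a rough numerical approximation under stated assumptions, not the normative MPEG-7 extraction: the object is a hypothetical list of (x, y) pixels, the integral is replaced by a sum over pixels, the basis is taken as f_{m,n}(ρ, θ) = (1/π) cos(nπρ) e^{jmθ} with 12 angular and 3 radial orders (36 functions), and quantization is omitted.

```python
import cmath
import math

def art_coefficients(pixels, n_angular=12, n_radial=3):
    """Approximate c_{m,n} for an object given as a list of (x, y) pixels."""
    # Normalization to the unit disc: origin at the gravity center,
    # radii divided by the maximal distance d_max.
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    d_max = max(math.hypot(x - cx, y - cy) for x, y in pixels) or 1.0
    pixel_area = 1.0 / d_max ** 2  # area of one pixel after normalization
    coeffs = {}
    for m in range(n_angular):
        for n in range(n_radial):
            total = 0j
            for x, y in pixels:
                rho = math.hypot(x - cx, y - cy) / d_max
                theta = math.atan2(y - cy, x - cx)
                # f_{m,n}(rho, theta) = (1/pi) cos(n pi rho) exp(j m theta)
                total += math.cos(n * math.pi * rho) * cmath.exp(1j * m * theta) / math.pi
            coeffs[(m, n)] = total * pixel_area
    return coeffs

def art_distance(c1, c2):
    """L1 distance between coefficient magnitudes, c_{0,0} discarded."""
    return sum(abs(abs(c1[k]) - abs(c2[k])) for k in c1 if k != (0, 0))

# Hypothetical L-shaped object and the same object rotated by 90 degrees.
shape = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
rotated = [(-y, x) for x, y in shape]
print(art_distance(art_coefficients(shape), art_coefficients(rotated)))
```

A rotation by an angle α multiplies each coefficient by e^{jmα}, which leaves its magnitude unchanged; this is why the printed distance is zero up to rounding error.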

slide-38
SLIDE 38

Contour-Based Shape

  • Describes shapes that can be represented as a single contour
  • Captures characteristic shape features of an object or region based on its contour
  • Uses the so-called Curvature Scale-Space representation, which captures perceptually meaningful features of the shape

slide-39
SLIDE 39

Contour-Based Shape

  • Curve representation: normalized arc-length parameterization (curvilinear abscissa)

$$c(s) = (x(s), y(s))^t, \quad s \in [0, 1]$$

  • An origin point on the curve: s = 0

$$s = \frac{\text{length from origin}}{\text{total length of the curve}}$$

  • Local curvature

$$k(s) = \frac{x'(s)\, y''(s) - x''(s)\, y'(s)}{\left( x'(s)^2 + y'(s)^2 \right)^{3/2}}, \quad \forall s \in [0, 1]$$

[Figure: contour in the (x, y) plane with origin O and abscissa s]
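The local curvature k(s) = (x'(s) y''(s) − x''(s) y'(s)) / (x'(s)² + y'(s)²)^{3/2} can be estimated on a sampled contour with central finite differences. The sketch below assumes a closed contour whose samples are evenly spaced along the curve, so that the sample index is proportional to the normalized abscissa s; the function name and the test circle are hypothetical.

```python
import math

def curvature(points):
    """Curvature k(s) at each sample of a closed contour.

    points: list of (x, y) samples, assumed evenly spaced along the curve.
    """
    n = len(points)
    h = 1.0 / n  # step of the normalized abscissa s in [0, 1]
    result = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        # First derivatives x'(s), y'(s) by central differences.
        dx, dy = (x2 - x0) / (2 * h), (y2 - y0) / (2 * h)
        # Second derivatives x''(s), y''(s).
        ddx, ddy = (x2 - 2 * x1 + x0) / h ** 2, (y2 - 2 * y1 + y0) / h ** 2
        result.append((dx * ddy - ddx * dy) / (dx ** 2 + dy ** 2) ** 1.5)
    return result

# Hypothetical test contour: a circle of radius 2, whose curvature is 1/2.
circle = [(2 * math.cos(2 * math.pi * i / 200), 2 * math.sin(2 * math.pi * i / 200))
          for i in range(200)]
values = curvature(circle)
print(min(values), max(values))
```

With this sign convention a counter-clockwise contour has positive curvature on its convex parts and negative curvature on its concave parts.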

slide-40
SLIDE 40

Contour-Based Shape

  • Descriptor: stores the values and the corresponding abscissas of the most important curvature peaks
  • Robust descriptor extraction: scale-space analysis

[Figure: contour in the (x, y) plane with origin O, subjected to successive low-pass filtering]

slide-41
SLIDE 41

Contour-Based Shape

  • Robust descriptor extraction: scale-space analysis
  • Curvature analysis at different scales

[Figure: successive low-pass filtering, from the initial shape to a convex (prototype) shape]
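The effect of successive low-pass filtering can be reproduced with a small experiment. The sketch below is an illustration under assumptions that the slides do not state: the contour is smoothed with the circular kernel (0.25, 0.5, 0.25), and the sign of the curvature at each sample is taken from the cross product of consecutive edges. All names and the star-shaped test contour are hypothetical.

```python
import math

def smooth_closed(points):
    """One low-pass filtering pass of a closed contour, kernel (0.25, 0.5, 0.25)."""
    n = len(points)
    return [
        (0.25 * points[i - 1][0] + 0.5 * points[i][0] + 0.25 * points[(i + 1) % n][0],
         0.25 * points[i - 1][1] + 0.5 * points[i][1] + 0.25 * points[(i + 1) % n][1])
        for i in range(n)
    ]

def count_inflections(points):
    """Number of curvature sign changes along the closed contour."""
    n = len(points)
    signs = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        signs.append(cross > 0)
    return sum(signs[i] != signs[i - 1] for i in range(n))

def passes_until_convex(points, max_passes=10000):
    """Filter repeatedly until no curvature sign change remains."""
    passes = 0
    while count_inflections(points) > 0 and passes < max_passes:
        points = smooth_closed(points)
        passes += 1
    return passes, points

# Hypothetical star-shaped contour with five concave parts.
star = []
for i in range(64):
    phi = 2 * math.pi * i / 64
    r = 1 + 0.3 * math.cos(5 * phi)
    star.append((r * math.cos(phi), r * math.sin(phi)))

passes, prototype = passes_until_convex(star)
print(count_inflections(star), passes, count_inflections(prototype))
```

Recording how the inflection points move and disappear from one pass to the next is the idea behind the curvature analysis at different scales.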
slide-42
SLIDE 42

Contour-Based Shape

  • Robust descriptor extraction: scale space analysis


slide-43
SLIDE 43

Contour-Based Shape

  • Robust descriptor extraction: scale space analysis


slide-44
SLIDE 44

Contour-Based Shape

  • Two additional global shape descriptors included
      • Circularity

$$C = \frac{\text{perimeter}^2}{\text{area}}$$

      • Eccentricity

$$E = \sqrt{\frac{(m_{20} + m_{02}) + \sqrt{(m_{20} - m_{02})^2 + 4 m_{11}^2}}{(m_{20} + m_{02}) - \sqrt{(m_{20} - m_{02})^2 + 4 m_{11}^2}}}$$

$$m_{ij} = \frac{1}{N} \sum_{s=1}^{N} x(s)^i\, y(s)^j, \quad \forall i, j = 0, 1, 2$$

  • Ensure filtering for queries in large databases (exclusion from the search process of shapes with C and E too different from the query)
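Both global descriptors are cheap to compute, which is what makes them usable as a pre-filter. The sketch below is a hedged illustration: the contour is a hypothetical closed polygon, the area comes from the shoelace formula, the moments are taken relative to the centroid (an assumption made here so that E does not depend on position), and E is computed as the square root of the ratio (m20 + m02 + R) / (m20 + m02 − R) with R = sqrt((m20 − m02)² + 4 m11²). Degenerate contours, for which the denominator vanishes, are not handled.

```python
import math

def circularity(polygon):
    """C = perimeter^2 / area for a closed polygon given as (x, y) vertices."""
    n = len(polygon)
    perimeter = sum(math.dist(polygon[i - 1], polygon[i]) for i in range(n))
    # Shoelace formula for the enclosed area.
    area = abs(sum(polygon[i - 1][0] * polygon[i][1] - polygon[i][0] * polygon[i - 1][1]
                   for i in range(n))) / 2
    return perimeter ** 2 / area

def eccentricity(points):
    """E computed from the second-order moments of the contour samples."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Assumption: moments are taken relative to the centroid (cx, cy).
    m20 = sum((x - cx) ** 2 for x, _ in points) / n
    m02 = sum((y - cy) ** 2 for _, y in points) / n
    m11 = sum((x - cx) * (y - cy) for x, y in points) / n
    root = math.sqrt((m20 - m02) ** 2 + 4 * m11 ** 2)
    return math.sqrt((m20 + m02 + root) / (m20 + m02 - root))

# Hypothetical contour: the four corners of a 4 x 2 rectangle.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(circularity(rectangle), eccentricity(rectangle))
```

A square gives E = 1 and C = 16, while elongated shapes increase both values, so shapes whose C and E differ strongly from the query can be discarded before the more expensive peak matching.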

slide-45
SLIDE 45

Contour-Based Shape

  • Standardized description representation

Field               Number of bits   Semantics
NumofPeaks          6                Number of curvature peaks
GlobalCurvature     2 x 6            Circularity and eccentricity of the initial curve
PrototypeCurvature  2 x 6            Circularity and eccentricity of the smoothed convex curve
HighestPeakY        7                Absolute value of the highest curvature peak
PeakX[]             6                Position (curvilinear abscissa) of the curvature peaks
PeakY[]             3                Absolute values of the curvature peaks

slide-46
SLIDE 46

Contour-Based Shape

  • Similarity measure

$$M_{css} = \sum_{1} \left[ (xpeak[i] - xpeak[j])^2 + (ypeak[i] - ypeak[j])^2 \right] + \sum_{2} ypeak[i]^2$$

  • $\sum_1$: sum over the set of matched peaks
  • $\sum_2$: sum over the set of unmatched peaks
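Once the peaks of two contours have been paired, the cost is a plain sum: squared distances between matched peaks, plus the squared heights of the unmatched ones. The sketch below only evaluates that sum; the pairing procedure itself is not described here and is left out. The function name and the peak values are hypothetical.

```python
def css_cost(matched, unmatched):
    """Matching cost for peaks given as (xpeak, ypeak) tuples.

    matched: list of (peak_i, peak_j) pairs, one peak from each contour.
    unmatched: list of peaks left without a partner.
    """
    cost = sum((xi - xj) ** 2 + (yi - yj) ** 2 for (xi, yi), (xj, yj) in matched)
    cost += sum(y ** 2 for _, y in unmatched)
    return cost

# Hypothetical peaks; how the pairs were formed is outside this sketch.
matched = [((0.10, 0.50), (0.20, 0.40)), ((0.60, 0.30), (0.60, 0.30))]
unmatched = [(0.80, 0.30)]
print(css_cost(matched, unmatched))
```

Penalizing unmatched peaks by their height means that missing a prominent peak costs more than missing a small one.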

slide-47
SLIDE 47

Contour-Based Shape

  • Invariance issues
      • Intrinsically invariant to position and rotation
      • Extrinsic invariance to scale: bounding box normalization
slide-48
SLIDE 48

3D Shape descriptor


slide-49
SLIDE 49

2D Hough Transform (HT)

Beyond MPEG-7

  • Discrete version of the Radon Transform

$$G(s, \theta) = \iint_{\mathbb{R}^2} f(x, y)\, \delta(x \cos\theta + y \sin\theta - s)\, dx\, dy$$

  • f(x, y): 2D function to be analyzed
  • G(s, θ): Radon transform
slide-50
SLIDE 50

The Radon Transform (RT)

$$G(s, \theta) = \iint_{\mathbb{R}^2} f(x, y)\, \delta(x \cos\theta + y \sin\theta - s)\, dx\, dy$$

  • Integral transform which maps the spatial domain (x, y) into the parameter space (s, θ) characterizing the set of lines in the 2D plane

slide-51
SLIDE 51

The Radon Transform (RT)

  • Reversibility of the Radon Transform
  • Inverse Radon Transform: f(x, y) is recovered from G(s, θ)

[Equation: f(x, y) expressed as a double integral over s and θ of G(s, θ) with a complex exponential kernel in (x cos θ + y sin θ)]

  • Image reconstruction from projections

slide-52
SLIDE 52

The Radon Transform (RT)

  • Complete representation of a function f(x, y) defined over the spatial domain
  • Image reconstruction: requires an infinite number of projections along a set of lines with continuous parameters (s, θ)
  • Extensively used in medical imaging: computerized tomography
  • Discrete version: the Hough transform
slide-53
SLIDE 53

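2D Hough Transform (HT)

  • Discrete version of the Radon Transform
  • A line Δ in the 2D plane is parameterized by (s, θ), with θ ∈ [0, 2π) and s ∈ [0, +∞)

[Figure: a line Δ in the spatial domain (x, y), at distance s from the origin O and with orientation θ, and the corresponding point in the line parameter space (s, θ)]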

slide-54
SLIDE 54

2D Hough Transform (HT)

  • Quantization of the line parameter space (s, θ) into a finite set
  • Example: uniform quantization

$$\Theta = \{ \theta_j = j \, \Delta\theta, \ j = 0, \ldots, N_\theta - 1 \}, \quad \Delta\theta = \frac{2\pi}{N_\theta}$$

$$S = \{ s_i = i \, \Delta s, \ i = 0, \ldots, N_s - 1 \}, \quad \Delta s = \frac{S_{max}}{N_s}$$

S_max: maximal object size

slide-55
SLIDE 55

2D Hough Transform (HT)

  • Construct a discrete 2D accumulator HT(s, θ) defined for s ∈ S and θ ∈ Θ
  • Initialize HT(s, θ) to 0, for all s ∈ S and θ ∈ Θ
  • For each point p = (x_p, y_p) belonging to the object's support, construct the set of lines Δ_p(θ) passing through p and with orientations in Θ
  • For each line Δ_p(θ), compute its distance to the origin:

$$s(p, \theta) = x_p \cos\theta + y_p \sin\theta$$

  • If s(p, θ) is positive:
      • Quantize s(p, θ) to its closest value s_i(p, θ) in the discrete set S
      • Increment the accumulator: HT(s_i, θ)++

[Figure: point p = (x_p, y_p) and a line Δ_p(θ) through it, in the (x, y) plane with origin O]
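The accumulation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: θ is quantized uniformly over [0, 2π) in `n_theta` steps, s is quantized uniformly with step `s_max / n_s`, the distance of the line through p with orientation θ is s = x_p cos θ + y_p sin θ, and values falling outside the accumulator are ignored. The function name, parameter names and test points are hypothetical.

```python
import math

def hough_accumulator(points, n_theta, n_s, s_max):
    """Discrete accumulator HT[i][j] for distance bin i and orientation bin j."""
    d_theta = 2 * math.pi / n_theta
    d_s = s_max / n_s
    acc = [[0] * n_theta for _ in range(n_s)]  # initialize HT to 0
    for x, y in points:
        for j in range(n_theta):
            theta = j * d_theta
            # Distance to the origin of the line through (x, y) with orientation theta.
            s = x * math.cos(theta) + y * math.sin(theta)
            if s > 0:
                i = int(round(s / d_s))  # closest value in the discrete set S
                if i < n_s:
                    acc[i][j] += 1
    return acc

# Hypothetical object support: 11 points on the vertical line x = 3.
points = [(3, y) for y in range(-5, 6)]
acc = hough_accumulator(points, n_theta=8, n_s=10, s_max=10.0)
best = max((acc[i][j], i, j) for i in range(10) for j in range(8))
print(best)  # (votes, distance bin, orientation bin) of the dominant line
```

All the points of a line vote for the same (s, θ) cell, so dominant lines appear as peaks of the accumulator.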
slide-56
SLIDE 56

2D Hough Transform (HT)


  • Identification of the set of dominant lines in a binary image
slide-57
SLIDE 57

2D Hough Transform (HT)


  • Identification of the set of dominant lines in a binary image
slide-58
SLIDE 58

Shape descriptors

Titus ZAHARIA, Pr.

Titus.Zaharia@telecom-sudparis.eu

slide-59
SLIDE 59

2D Hough descriptor for sign recognition in LSF (French Sign Language)

Titus ZAHARIA

slide-60
SLIDE 60

Semantic structure of a sign

Problem statement

  • Number of hands used
  • Hand location
  • Hand configuration
  • Palm orientation
  • Finger direction
  • Action (upward, to the right, rotating, vibrating…)

slide-61
SLIDE 61

Dynamic structure of a sign

Proposed approach

Letter «Z»

  • Temporally stable configuration
  • Atlas of prototypes
slide-62
SLIDE 62

Natural prototypes

"zero" "Z" "F" "L" "T" "B" "K" "H" "U" "E" "Q" "I" "X" "O" "W" "A" "D" "G" "J" "M" "S" "V" "Y" "C" "N" "R"

slide-63
SLIDE 63

Synthetic prototypes

"Z" "F" "Neutral" "L" "T" "B" "K" "H" "U" "E" "Q" "I" "X" "O" "W" "A" "D" "G" "J" "M" "S" "V" "Y" "C" "N" "R"

slide-64
SLIDE 64

3D hand pose estimation

  • Segmentation by advanced MM
  • Shape descriptor: (t, θ, l) Hough transform

[Figure: (t, θ, l)-HT and (t, θ)-HT representations]

slide-65
SLIDE 65

Indexing by shape description

Hand configuration recognition

LETTER «B»

  • Natural prototype
  • Uncalibrated synthetic prototype

slide-66
SLIDE 66

Indexing by shape description

Hand configuration recognition

LETTER «Z» LETTER «F»

slide-67
SLIDE 67

Indexing by shape description

Hand configuration recognition

  • 97% recognition on the natural prototypes
  • 89% recognition on the calibrated synthetic prototypes

[Figure: prototypes "I" and "W"]

slide-68
SLIDE 68

Indexing by shape description

Hand configuration recognition

[Figure: recognition plot, horizontal axis from 1 to 117, vertical axis from 5 to 25, with the letters u, l, e, e, o, r, k, b]

  • K recognized as U: same 2D binary masks → 3D information
  • Confusion between B and "-" (hyphen): similar prototypes in 2D and 3D → contextual information

[Figure: prototypes of B and of the hyphen]

slide-69
SLIDE 69

Indexing by shape description

Hand configuration recognition

[Figure: recognition plot, horizontal axis from 1 to 166, vertical axis from 5 to 30, with the letters h, i, k, e, m, i, b, i, g, b, a, r]

  • R recognized as G: same 2D binary masks → 3D information
  • Confusion between B and "-" (hyphen): similar prototypes in 2D and 3D → contextual information
  • A recognized as I: similar prototypes

slide-70
SLIDE 70

2D Hough descriptor for sign recognition in LSF (French Sign Language)

Titus ZAHARIA