SLIDE 1

Multimedia Queries and Indexing

Prof Stefan Rüger
Multimedia and Information Systems
Knowledge Media Institute, The Open University
http://kmi.open.ac.uk/mmis

SLIDE 2

Multimedia queries and indexing

  • 1. What are multimedia queries?
  • 2. Fingerprinting
  • 3. Image search and indexing
  • 4. Evaluation
  • 5. Browsing, search and geography
SLIDE 3

New search types

Examples:

  • hum a tune and get a music piece
  • you roar and get a wildlife documentary
  • type “floods” and get BBC radio news

[Figure: query/document matrix pairing query modes (text, image, speech, sound, humming, motion, location) with document types (text, video, images, speech, music, sketches, multimedia); conventional text retrieval covers only the text-query/text-document cell]

SLIDE 4

Exercise

Organise yourselves into groups and discuss with your neighbours:

  • Two examples of different query/doc modes
  • How hard is this? Which techniques are involved?
  • One example combining different modes
SLIDE 5

Exercise

Discuss:

  • 2 examples
  • How hard is it?
  • 1 combination

[Figure: the query/document modality matrix from Slide 3, shown again for reference]

SLIDE 6

The semantic gap

What the computer sees: 1M pixels with a spatial colour distribution. What we see: faces & a vase-like object.

SLIDE 7

Polysemy

SLIDE 8

Multimedia queries and indexing

  • 1. What are multimedia queries?
  • 2. Fingerprinting
  • 3. Image search and indexing
  • Meta-data and piggy-back retrieval
  • Automated annotation
  • Content-based retrieval
  • 4. Evaluation
  • 5. Browsing, search and geography
SLIDE 9

Metadata

  • Dublin Core: simple common denominator; 15 elements such as title, creator, subject, description, …
  • METS: Metadata Encoding and Transmission Standard
  • MARC 21: MAchine Readable Cataloguing (harmonised)
  • MPEG-7: multimedia-specific metadata standard

SLIDE 10

MPEG-7

Moving Picture Experts Group “Multimedia Content Description Interface”

  • Not an encoding method like MPEG-1, MPEG-2 or MPEG-4!
  • Usually represented in XML format
  • Full MPEG-7 description is complex and comprehensive
  • Detailed Audiovisual Profile (DAVP)

[P Schallauer, W Bailer, G Thallinger, “A description infrastructure for audiovisual media processing systems based on MPEG-7”, Journal of Universal Knowledge Management, 2006]

SLIDE 11

MPEG-7 example

<Mpeg7 xsi:schemaLocation="urn:mpeg:mpeg7:schema:2004 ./davp-2005.xsd" ... >
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="AudioVisualType">
      <AudioVisual>
        <StructuralUnit href="urn:x-mpeg-7-pharos:cs:AudioVisualSegmentationCS:root"/>
        <MediaSourceDecomposition criteria="kmi image annotation segment">
          <StillRegion>
            <MediaLocator><MediaUri>http://...392099.jpg</MediaUri></MediaLocator>
            <StructuralUnit href="urn:x-mpeg-7-pharos:cs:SegmentationCS:image"/>
            <TextAnnotation type="urn:x-mpeg-7-pharos:cs:TextAnnotationCS:image:keyword:kmi:annotation_1"
                confidence="0.87">
              <FreeTextAnnotation>tree</FreeTextAnnotation>
            </TextAnnotation>
            <TextAnnotation type="urn:x-mpeg-7-pharos:cs:TextAnnotationCS:image:keyword:kmi:annotation_2"
                confidence="0.72">
              <FreeTextAnnotation>field</FreeTextAnnotation>
            </TextAnnotation>
          </StillRegion>
        </MediaSourceDecomposition>
      </AudioVisual>
    </MultimediaContent>
  </Description>
</Mpeg7>

SLIDE 12

MPEG-7 example

(same XML example as on Slide 11)

SLIDE 13

Digital libraries

Manage document repositories and their metadata.

Greenstone digital library suite, http://www.greenstone.org/

  • interface in 50+ languages (documented in 5)
  • knows metadata
  • understands multimedia
  • XML or text retrieval

SLIDE 14

Piggy-back retrieval

[Figure: the query/document modality matrix with the text column highlighted; piggy-back retrieval answers text queries against text associated with the multimedia documents]

SLIDE 15

Music to text

Pitch intervals between successive notes (0 +7 0 +2 0 -2 0 -2 0 -1 0 -2 0 +2 -4) are encoded as characters (Z G Z B Z b Z b Z a Z b Z B d), turning a melody into a text string that can be indexed with n-grams such as ZBZb, ZGZB, GZBZ.

[with Doraisamy, J of Intellig Inf Systems 21(1), 2003; Doraisamy PhD thesis 2004]
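The character code on the slide appears to map interval 0 to 'Z', interval +k to the k-th upper-case letter (+2 gives B, +7 gives G) and interval -k to the k-th lower-case letter (-1 gives a, -4 gives d). A minimal sketch under that inferred mapping:

```python
def interval_ngrams(pitches, n=4):
    """Encode a note sequence as interval characters and return overlapping n-grams.

    Assumed mapping (inferred from the slide): interval 0 -> 'Z',
    +k -> k-th upper-case letter, -k -> k-th lower-case letter.
    """
    chars = []
    for prev, cur in zip(pitches, pitches[1:]):
        k = cur - prev
        if k == 0:
            chars.append("Z")
        elif k > 0:
            chars.append(chr(ord("A") + k - 1))   # +2 -> 'B', +7 -> 'G'
        else:
            chars.append(chr(ord("a") - k - 1))   # -1 -> 'a', -2 -> 'b'
    s = "".join(chars)
    return [s[i:i + n] for i in range(len(s) - n + 1)]
```

Both a hummed query and the stored tunes then reduce to strings, so standard n-gram text retrieval machinery (inverted index, term weighting) applies directly to music search.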

SLIDE 16

[technology licensed by Imperial Innovations] [patent 2004] [finished PhD: Pickering] [with Wong and Pickering, CIVR 2003] [with Lal, DUC 2002] [Pickering: best UK CS student project 2000 – national prize]

SLIDE 17

SLIDE 18

SLIDE 19

SLIDE 20

Multimedia queries and indexing

  • 1. What are multimedia queries?
  • 2. Fingerprinting
  • 3. Image search and indexing
  • Meta-data and piggy-back retrieval
  • Automated annotation
  • Content-based retrieval
  • 4. Evaluation
  • 5. Browsing, search and geography
SLIDE 21

Automated annotation as machine translation

Just as statistical machine translation aligns “the beautiful sun” with “le soleil beau”, automated annotation aligns image regions with words such as water, grass, trees.

SLIDE 22

Automated annotation as machine learning

Probabilistic models:

  • maximum entropy models
  • models for joint and conditional probabilities
  • evidence combination with Support Vector Machines

[with Magalhães, SIGIR 2005] [with Yavlinsky and Schofield, CIVR 2005] [with Yavlinsky, Heesch and Pickering, ICASSP May 2004] [with Yavlinsky et al, CIVR 2005] [with Yavlinsky, SPIE 2007] [with Magalhães, CIVR 2007, best paper]

SLIDE 23

A simple Bayesian classifier

Use training images J with annotations w. P(w|I) is the probability of word w given an unseen image I; the model is fitted to the empirical distribution of pairs (w, J).

Eliezer S. Yudkowsky “An Intuitive Explanation of Bayes' Theorem” http://yudkowsky.net/rational/bayes
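As an illustration of the general idea (a sketch only, not the exact models cited on the slide): score each vocabulary word by kernel-weighted votes from annotated training images and normalise, giving an estimate of P(w|I):

```python
import math

def annotate(query_feature, training_set, vocabulary, bandwidth=1.0):
    """Estimate P(w|I) by kernel-weighted votes from annotated training images.

    training_set: list of (feature_vector, set_of_words) pairs.
    A sketch of the general approach; the published models are more elaborate.
    """
    scores = dict.fromkeys(vocabulary, 0.0)
    for feature, words in training_set:
        sq_dist = sum((a - b) ** 2 for a, b in zip(query_feature, feature))
        weight = math.exp(-sq_dist / (2 * bandwidth ** 2))  # Gaussian kernel
        for w in words:
            scores[w] += weight
    total = sum(scores.values()) or 1.0
    return {w: s / total for w, s in scores.items()}  # normalised to sum to 1
```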

SLIDE 24

Automated annotation

Automated: water buildings city sunset aerial

[Corel Gallery 380,000] [with Yavlinsky et al, CIVR 2005] [with Yavlinsky, SPIE 2007] [with Magalhães, CIVR 2007, best paper]

SLIDE 25

The good

door

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]

SLIDE 26

The bad

wave

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]

SLIDE 27

The ugly

iceberg

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]

SLIDE 28

Multimedia queries and indexing

  • 1. What are multimedia queries?
  • 2. Fingerprinting
  • 3. Image search and indexing
  • Meta-data and piggy-back retrieval
  • Automated annotation
  • Content-based retrieval
  • 4. Evaluation
  • 5. Browsing, search and geography
SLIDE 29

Why content-based?

Give examples where we remember details by

  • metadata?
  • context?
  • content (eg, “x” belongs to “y”)?

Metadata versus content-based: pro and con

SLIDE 30

Content-based retrieval: features and distances

[Figure: feature space with documents as points (x)]
SLIDE 31

Content-based retrieval: Architecture

SLIDE 32

Features

  • Visual: colour, texture, shape, edge detection, SIFT/SURF
  • Audio
  • Temporal

How to describe the features: for people? for computers?

SLIDE 33

Digital Images

SLIDE 34

Content of an image

145 173 201 253 245 245
153 151 213 251 247 247
181 159 225 255 255 255
165 149 173 141  93  97
167 185 157  79 109  97
121 187 161  97 117 115

SLIDE 35

Histogram

Grey values are quantised into 8 bins:

bin 1: 0–31, bin 2: 32–63, bin 3: 64–95, bin 4: 96–127, bin 5: 128–159, bin 6: 160–191, bin 7: 192–223, bin 8: 224–255

[Figure: normalised 8-bin histogram of the image on Slide 34]
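A minimal sketch of this computation on the pixel values from Slide 34 (bin index = value // 32):

```python
pixels = [145, 173, 201, 253, 245, 245,
          153, 151, 213, 251, 247, 247,
          181, 159, 225, 255, 255, 255,
          165, 149, 173, 141,  93,  97,
          167, 185, 157,  79, 109,  97,
          121, 187, 161,  97, 117, 115]

bins = [0] * 8
for v in pixels:
    bins[v // 32] += 1            # 0-31 -> bin 1 (index 0), ..., 224-255 -> bin 8 (index 7)

histogram = [b / len(pixels) for b in bins]   # normalised so the bins sum to 1
```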

SLIDE 36

Colour

  • a phenomenon of human perception
  • three-dimensional (RGB/CMY/HSB)
  • spectral colour: pure light of one wavelength

[Figure: the spectral colours by wavelength (nm), from blue through cyan, green and yellow to red]

SLIDE 37

Colour histogram

SLIDE 38

Exercise

Sketch a 3D colour histogram for

  R    G    B
  0    0    0   black
255    0    0   red
  0  255    0   green
  0    0  255   blue
  0  255  255   cyan
255    0  255   magenta
255  255    0   yellow
255  255  255   white
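One way to build such a histogram, sketched for 2 quantisation levels per channel so that the 8 bins are exactly the corner colours above:

```python
def colour_histogram(pixels, levels=2):
    """3D colour histogram: quantise each RGB channel into `levels` bins.

    With levels=2 the 8 bins correspond to the corner colours in the table.
    """
    step = 256 // levels
    bins = [0] * (levels ** 3)
    for r, g, b in pixels:
        idx = (min(r // step, levels - 1) * levels
               + min(g // step, levels - 1)) * levels + min(b // step, levels - 1)
        bins[idx] += 1
    n = len(pixels)
    return [c / n for c in bins]

print(colour_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255)]))  # two reds, one blue
```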

SLIDE 39

http://blog.xkcd.com/2010/05/03/color-survey-results/

SLIDE 40

HSB colour model

  • hue (0°–360°): spectral colour
  • saturation (0%–100%): spectral purity
  • brightness (0%–100%): energy or luminance
  • chromaticity = hue + saturation

SLIDE 41

HSB colour model

SLIDE 42

HSB model

Disadvantage: the hue coordinate is not continuous. 0° and 360° have the same meaning, but there is a huge difference in terms of numeric distance; example: red = (0°, 100%, 50%) = (360°, 100%, 50%).

Advantage: it is more natural to describe colour changes: “brighter blue”, “purer magenta”, etc.
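A common remedy for the wrap-around (not from the slide itself) is to compare hues on the circle rather than on the number line:

```python
def hue_distance(h1, h2):
    """Angular distance between two hues in degrees, in [0, 180]."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

print(hue_distance(0, 360))   # 0  -- the same red
print(hue_distance(350, 10))  # 20 -- small, as perception suggests
```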

SLIDE 43

Texture

coarseness, contrast, directionality

SLIDE 44

Texture histograms

Coarseness, contrast, directionality

[with Howarth, IEE Vision, Image & Signal Proc 15(6) 2004; Howarth PhD thesis]

SLIDE 45

Gabor filter

[Figure: Gabor filter bank responses for a query image, arranged by orientation and scale]

[with Howarth, CLEF 2004]

SLIDE 46

Shape Analysis

Shape = class of geometric objects invariant under

  • translation
  • scale (changes keeping the aspect ratio)
  • rotation

Descriptions can be information-preserving (for compression) or non-information-preserving (for retrieval), and boundary-based (ignore interior) or region-based (boundary + interior).

SLIDE 47

Perimeter and area

Perimeter: trace the boundary as a parameterised curve x(t), y(t), or count the boundary pixels. Area: count the pixels inside the region R.
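For a smooth closed boundary these are the standard formulas (a reconstruction from the usual definitions, not from the slide):

```latex
P = \oint \sqrt{\dot x(t)^2 + \dot y(t)^2}\,\mathrm{d}t ,
\qquad
A = \iint_R \mathrm{d}x\,\mathrm{d}y \;\approx\; \#\{\text{pixels inside } R\}
```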

SLIDE 48

Global vs local

Global histogram also matches polar bears, marble floors, …

SLIDE 49

Localisation

[Figure: localised 8-bin histogram; the image is split into a centre region (64%) and a border region (36%)]

SLIDE 50

Tiled Histograms

[Figure: the image divided into six tiles, each with its own normalised 8-bin histogram]
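A sketch of the tiling step for a greyscale image given as a list of rows; the 3×2 grid matching the six panels of the figure is an assumption:

```python
def tiled_histograms(image, tiles_x=3, tiles_y=2, n_bins=8):
    """Split a greyscale image (list of equal-length rows, values 0-255)
    into a grid of tiles and return one normalised histogram per tile."""
    h, w = len(image), len(image[0])
    result = []
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            bins = [0] * n_bins
            for y in range(ty * h // tiles_y, (ty + 1) * h // tiles_y):
                for x in range(tx * w // tiles_x, (tx + 1) * w // tiles_x):
                    bins[image[y][x] * n_bins // 256] += 1
            total = sum(bins) or 1
            result.append([b / total for b in bins])
    return result
```

Concatenating the per-tile histograms gives a feature vector that keeps some spatial layout, which a single global histogram discards.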

SLIDE 51

Segmentation

[Figure: the image segmented into foreground and background, each with its own 8-bin histogram]

SLIDE 52

Points of interest

Many points of interest means many feature vectors per image. Quantised feature vectors act as visual “words”, so the bag-of-words model from text retrieval applies; a sketch of the quantisation step follows.
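A sketch of that quantisation, assuming a codebook of cluster centres has already been learned (eg, by k-means over training descriptors):

```python
def bag_of_visual_words(descriptors, codebook):
    """Map each local descriptor to its nearest codebook centre ('visual word')
    and return the word-count histogram for the image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(d, codebook[i])))
        counts[nearest] += 1
    return counts   # index this like a term-frequency vector in text retrieval
```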

SLIDE 53

Video Segmentation

  • gradual transition detection (eg, fade): accumulate distances, long-range comparison
  • audio cues: silence and/or speaker change
  • motion detection and analysis: camera motion, zoom, object motion; MPEG provides some motion vectors

SLIDE 54

Movie processing

[Vlad Tanasescu: Anticipation, SciFi trailer]

SLIDE 55

Long range comparison

At time t, define a distance d_n(t):

  • compare frames t-n+i and t+i (i = 0, ..., n-1)
  • average their respective distances over i, ie d_n(t) = (1/n) Σ_{i=0}^{n-1} d(frame t-n+i, frame t+i)

A peak in d_n(t) is detected if d_n(t) > threshold and d_n(t) > d_n(s) for all neighbouring s.

Shot boundary = near-coincident peaks of d_16 and d_8.

[Figure: timeline with the two n-frame comparison windows either side of time t]
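A direct sketch of this detector; `dist` is whatever frame distance is in use, and the threshold and neighbourhood size are free parameters:

```python
def d_n(frames, t, n, dist):
    """Average distance between frames t-n+i and t+i for i = 0..n-1."""
    return sum(dist(frames[t - n + i], frames[t + i]) for i in range(n)) / n

def shot_boundaries(frames, dist, threshold, window=3):
    """Times where d_16 and d_8 both peak (near-coincident peaks = shot cut)."""
    cuts = []
    for t in range(16 + window, len(frames) - 16 - window):
        def is_peak(n):
            v = d_n(frames, t, n, dist)
            return v > threshold and all(
                v > d_n(frames, s, n, dist)
                for s in range(t - window, t + window + 1) if s != t)
        if is_peak(16) and is_peak(8):
            cuts.append(t)
    return cuts
```

Averaging over a window of n frames is what makes gradual transitions (fades, dissolves) visible: a single-frame difference d_1 barely moves during a slow fade, while d_16 accumulates the change.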

SLIDE 56

SLIDE 57

SLIDE 58

SLIDE 59

Exercise

Compute d_2(t), d_4(t), d_8(t), d_16(t) as a function of t for the following sequence of frames:

feature “values” f(t) for the frames: ..., 9, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, ...
the distance is assumed to be d(t,s) = |f(t) - f(s)|

(the slide annotates the first frame pairs as d_1(t) = 0, 0, 1, ...)

SLIDE 60

Features and distances

[Figure: feature space with documents as points (x), as on Slide 30]
SLIDE 61

Distances and similarities

Assumes coding of MM objects as data vectors.

  • distance measures: Euclidean, Manhattan
  • correlation measures: cosine similarity
  • histogram intersection for normalised histograms
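Sketches of these measures for feature vectors stored as plain Python lists:

```python
import math

def euclidean(x, y):                      # L2 distance
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):                      # L1 distance
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def histogram_intersection(p, q):         # for normalised histograms: 1 = identical
    return sum(min(a, b) for a, b in zip(p, q))
```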

SLIDE 62

L2 vs L1

SLIDE 63

p<1?

[Plot: mean average precision as a function of the Minkowski parameter p. What happens at p < 1?]

[with Howarth, ECIR 2005]
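For reference, the plot varies p in the Minkowski family (standard definition); for p < 1 the triangle inequality fails, so d_p is no longer a metric, but it can still be used to rank results:

```latex
d_p(x, y) = \Bigl(\sum_i |x_i - y_i|^p\Bigr)^{1/p}
```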

SLIDE 64

Other distance measures

  • Squared chord
  • Earth Mover's Distance
  • Chi squared distance
  • Kullback-Leibler divergence (not a true distance)
  • Ordinal distances (for string values)
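Standard definitions for three of these, for histograms p and q (the slide gives no formulas; the Earth Mover's Distance is the optimum of a transportation problem and has no comparably short closed form):

```latex
d_{\text{sc}}(p, q) = \sum_i \bigl(\sqrt{p_i} - \sqrt{q_i}\bigr)^2, \qquad
\chi^2(p, q) = \sum_i \frac{(p_i - q_i)^2}{p_i + q_i}, \qquad
D_{\text{KL}}(p \parallel q) = \sum_i p_i \log \frac{p_i}{q_i}
```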
SLIDE 65

Best distance?

Squared chord

[with Liu et al, AIRS 2008; with Hu et al, ICME 2008]

SLIDE 66

Content-Based Image Retrieval

Which features are best for searching? It depends on the information need:

  • Looking for sunset holiday pictures in your digital shoebox? Use colour histograms.
  • Want to build a wallpaper customer database? Use colour + texture.
  • Want to build a b/w sketch database for technical designs? Use shape descriptors.
  • Not sure which features are best for a query (eg, if you also have abstract features such as Fourier coefficients)? Deploy relevance feedback and let the system learn the relevant features for this query.

SLIDE 67

Exercise

Sketch a block diagram showing how you would implement a Multimedia Information Retrieval system for one of these scenarios:

  • 1. Browsing wallpaper patterns in a home decorator store
  • 2. Finding “interesting” photos in a personal collection of holiday snaps
  • 3. Managing industrial design pattern templates for a manufacturing company

Think about:

  • what types of features you might use
  • what the query would be
  • the user interface