EE 6882 Statistical Methods for Video Indexing and Analysis
Fall 2004
- Prof. Shih-Fu Chang
http://www.ee.columbia.edu/~sfchang
Lecture 1 - part B (9/7/04)
Run-through of a simple image search system
Color, Texture, distance metrics, and
2 EE6882-Chang
System," ACM Multimedia Conference, Boston, MA, Nov. 1996.
Magazine, Summer, Vol. 4 No. 3, pp.12-20, 1997.
QBIC system. In IEEE Computer, volume 38, pages 23-31, 1995.
William Equitz. Efficient and effective querying by image content. J. of Intelligent Information Systems, 3(3/4):231-262, July 1994. (QBIC System)
Transactions on Circuits and Systems for Video Technology, Volume: 11 Issue: 6 , Page(s): 696 -702, June 2001.
descriptors," IEEE Transactions on Circuits and Systems for Video Technology, Volume: 11 Issue: 6 , Page(s): 703 -715, June 2001.
Applications to Image Databases. Proceedings of the ICCV'98, Bombay, India, January 1998, pages 59-66.
3 EE6882-Chang
System components: User, User interface, Image thumbnails, Images & videos, Network, Query server, Image/video server, Index, Archive
What functionalities should each component have?
What are the bottlenecks of the system?
4 EE6882-Chang
Why visual features?
Manual annotation is tedious and insufficient
Computers cannot understand images
Comparison of visual features enables comparison of visual scenes
Need tools for organizing, filtering, and searching through large amounts of visual data
What visual features?
What is available in the data?
What features does the human visual system (HVS) use?
Color: suitable for color images
Texture: visual patterns, surface properties, cues for depth
Shape: boundaries of real world objects, edges
Motion: camera motion vs. object motion
5 EE6882-Chang
How to use visual features?
Extraction
Representation
Discrimination
Indexing
Considerations
Complexity
Invariance: rotation, scaling, cropping, occlusion, shift, etc.
Dimension
Subjective relevance
Distance metric
6 EE6882-Chang
Fundamental approach is from pattern recognition
Group pixels, process the group and generate a
Discrimination via (transform and ) feature vector
Multidimensional indexing of the feature vectors
Do this for color and texture
Build a content-based image retrieval system
7 EE6882-Chang
The Munsell System (1905)
Colors are arranged so that, as nearly as possible, the perceptual distance between adjacent colors is constant.
The Munsell Book of Color – color chips
The Natural Color System (NCS) – (1981)
Natural Color System Atlas – derived from 60,000
Colors are described by the relative amounts of basic colors:
The DIN system (1981)
The Coloroid system (1980-1987)
Optical Society of America (OSA) System (1981)
Hunter LAB System (1981)
8 EE6882-Chang
Advantages of Color Order Systems
Easy to understand, plus samples are available
Easy to use and compare colors side-by-side
Number and spacing of samples can be adapted to application
Disadvantages
Too many color order systems, can’t translate between them
Color comparison is only valid for required illuminant
User perception differs
Application to self-luminous colors (e.g., monitors and computer displays) is not easy
9 EE6882-Chang
What is COLOR?
the visible spectrum (from blue = 400 nm to red = 700 nm).
Response curves: β, γ, ρ
Examples:
λ = 500 nm: (β, γ, ρ) = (20, 40, 20)
B = 100: (β, γ, ρ) = (100, 5, 4)
G = 100: (β, γ, ρ) = (0, 100, 75)
R = 100: (β, γ, ρ) = (0, 0, 100)
[Oberle]
10 EE6882-Chang
Primaries P1(λ), P2(λ), P3(λ) with weights α1, α2, α3
HVS
Same response (β, γ, ρ)
E.g., use R, G, B as the primary colors P1, P2, P3
11 EE6882-Chang
Color Spaces
RGB – cube in Euclidean space
Standard representation used in color displays
Drawbacks
RGB basis not related to human color judgments
Intensity should form one of the dimensions of color
Important perceptual components of color are hue, brightness and saturation
$$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B}$$
12 EE6882-Chang
HSI-cone (cylindrical coordinates)
Opponent-Cartesian
YIQ-NTSC television standard
$$\begin{bmatrix} R-G \\ Bl-Y \\ W-Bk \end{bmatrix} = \begin{bmatrix} 1 & -2 & 1 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
13 EE6882-Chang
brightness varies along the
hue varies along the
saturation varies along the
14 EE6882-Chang
From Jain’s DIP book
15 EE6882-Chang
16 EE6882-Chang
How many colors to keep
IBM QBIC: 16M (RGB) → 4096 (RGB) → 64 (Munsell) colors
Columbia U. VisualSEEK: 16M (RGB) → 166 (HSV) colors (18 Hue, 3 Sat, 3 Val, 4 Gray)
Stricker and Orengo (Similarity of Color Images)
16M (RGB) → 16 hues, 4 val, 4 sat = 128 (HSV) colors
16M (RGB) → 8 hues, 2 val, 2 sat = 32 (HSV) colors
Swain and Ballard (Color Indexing)
16M (RGB) → 8 wb, 16 rg, 16 by = 2048 (OPP) colors
Independent quantization – each color dimension is
Joint quantization – color dimensions are quantized
17 EE6882-Chang
Feature extraction from color images
Choose GOOD color space
Quantize color space to reduce number of colors
Represent image color content using color
Feature vector IS the color histogram
[Figure: R, G, B channels of an RGB image; labels m, n]
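The quantize-then-count steps listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any system named in the lecture: it assumes 8-bit RGB pixels and independent uniform quantization that keeps the top `bits` bits of each channel (with `bits=4` this yields 4096 bins). The function names `quantize_rgb` and `color_histogram` are hypothetical.

```python
def quantize_rgb(pixel, bits=4):
    """Map an 8-bit (R, G, B) pixel to a single bin index.

    Assumption: uniform quantization that keeps the top `bits` bits
    of each channel, so there are 2**(3*bits) bins in total.
    """
    r, g, b = (c >> (8 - bits) for c in pixel)
    return (r << (2 * bits)) | (g << bits) | b


def color_histogram(pixels, bits=4):
    """Return a normalized color histogram (the feature vector)."""
    hist = [0.0] * (1 << (3 * bits))
    for p in pixels:
        hist[quantize_rgb(p, bits)] += 1.0
    total = float(len(pixels))
    # Normalize so that images of different sizes are comparable.
    return [h / total for h in hist] if total else hist


# Tiny hypothetical "image": two reddish pixels, one blue, one green.
image = [(255, 0, 0), (250, 5, 3), (0, 0, 255), (0, 255, 0)]
hist = color_histogram(image)
```

The two reddish pixels fall into the same bin, which is the point of quantization: small color differences are merged before counting.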
18 EE6882-Chang
Advantages of color histograms
Compact representation of color information
Global color distribution
Histogram distance metrics
Disadvantages
High dimensionality
No information about spatial positions of colors
19 EE6882-Chang
L1 distance
L2 distance
Histogram Intersection
Quadratic Distance
Other histograms
Edge histogram + total edge count
Texture
Issue: quality of edge, texture extraction, lighting (dark frame)
$$D_1(i, i+1) = \sum_j \left| H_i(j) - H_{i+1}(j) \right|$$
$$D_2(i, i+1) = \sum_j \left( H_i(j) - H_{i+1}(j) \right)^2$$
$$D_I(i, i+1) = 1 - \frac{\sum_j \min\left( H_i(j), H_{i+1}(j) \right)}{\min\left( \sum_j H_i(j), \sum_j H_{i+1}(j) \right)}$$
$$D_Q(i, i+1) = \sum_{j_1, j_2} \left( H_i(j_1) - H_{i+1}(j_1) \right) \alpha(j_1, j_2) \left( H_i(j_2) - H_{i+1}(j_2) \right)$$
$\alpha(j_1, j_2)$: correlation between colors $j_1$ and $j_2$, e.g., $1 - d$
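The four histogram distances named on this slide (L1, L2, histogram intersection, quadratic) can be sketched as follows. This is a minimal illustration under assumptions: histograms are plain Python lists of equal length, the L2 form is the sum of squared differences without a square root, and the similarity matrix `alpha` in the example is made up for illustration.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Sum of squared bin differences (no square root taken)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def intersection_distance(h1, h2):
    """One minus the normalized histogram intersection."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return 1.0 - overlap / min(sum(h1), sum(h2))


def quadratic_distance(h1, h2, alpha):
    """Quadratic form; alpha[j1][j2] weights cross-bin similarity."""
    diff = [a - b for a, b in zip(h1, h2)]
    n = len(diff)
    return sum(diff[j1] * alpha[j1][j2] * diff[j2]
               for j1 in range(n) for j2 in range(n))


h_a = [0.5, 0.5, 0.0]
h_b = [0.5, 0.0, 0.5]
# Hypothetical similarity matrix: bins 1 and 2 are "similar" colors.
alpha = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.5],
         [0.0, 0.5, 1.0]]
```

With the identity matrix as `alpha`, the quadratic distance reduces to the L2 form above; the off-diagonal terms make mass that moved to a similar color count as a smaller change.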
Regions: A  B  C  D  E
Color:   1  2  1  3  1
Size:    12 15 3  1  5
$$G_I = \langle (\alpha_1, \beta_1), \ldots, (\alpha_n, \beta_n) \rangle, \quad H_I = \langle \alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n \rangle$$
$$\Delta_G = \sum_{i=1}^{n} \left( |\alpha_i - \alpha_i'| + |\beta_i - \beta_i'| \right), \quad \Delta_H = \sum_{i=1}^{n} \left| (\alpha_i + \beta_i) - (\alpha_i' + \beta_i') \right|$$
$\Delta_G \ge \Delta_H$ by the triangle inequality
[Figure: color quantization → region segmentation → labeling of connected regions A to E]
Not just count of colors, also check adjacency
Color Co. Vector:
Color: 1  2  3
α:     17 15 0
β:     3  0  1
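The region-based count on this slide can be sketched as a connected-component pass over an already quantized image. This is a minimal illustration under assumptions: `labels` is a 2-D list of quantized color indices, regions use 8-connectivity, and a region is counted as coherent (α) when its size exceeds a threshold `tau`, otherwise as incoherent (β). The function name, the threshold rule, and the toy image are hypothetical.

```python
from collections import deque


def color_coherence_vector(labels, tau):
    """Return {color: (alpha, beta)} for a 2-D grid of color indices.

    alpha: pixels in connected regions larger than tau (coherent).
    beta:  pixels in regions of size tau or smaller (incoherent).
    """
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    ccv = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = labels[r][c]
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            # Breadth-first flood fill over the 8-connected neighbors.
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and labels[ny][nx] == color):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            alpha, beta = ccv.get(color, (0, 0))
            if size > tau:
                alpha += size
            else:
                beta += size
            ccv[color] = (alpha, beta)
    return ccv


# Hypothetical quantized image with three colors.
toy = [[1, 1, 1, 2, 2],
       [1, 1, 1, 2, 2],
       [3, 2, 2, 2, 1]]
ccv = color_coherence_vector(toy, tau=2)
```

Summing α and β per color recovers the ordinary color histogram, which is why the coherence-vector distance can never be smaller than the histogram distance.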
21 EE6882-Chang
Limitations of Euclidean Metric
Cannot distinguish classes: change features
Correlation between features, curved boundaries: use Mahalanobis distance
Distinctive subclasses: use clustering
Complex features: use better features
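$$D_{mah}^2(x_1, x_2) = (x_1 - x_2)^T C_x^{-1} (x_1 - x_2)$$
$$C_x = \begin{bmatrix} c(1,1) & c(1,2) & \cdots & c(1,d) \\ \vdots & \vdots & \ddots & \vdots \\ c(d,1) & c(d,2) & \cdots & c(d,d) \end{bmatrix} \quad \text{(covariance matrix)}$$
$$c(i,j) = \frac{1}{N-1} \sum_{k=1}^{N} \left( x_k(i) - m(i) \right) \left( x_k(j) - m(j) \right), \quad N: \text{number of samples}$$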
[Figure: scatter plots (axis $x_j$) for covariance $c = -s_i s_j$, $c = -\frac{1}{2} s_i s_j$, $c = 0$, $c = \frac{1}{2} s_i s_j$, and $c = s_i s_j$]
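$$C_x = [e_1 | e_2 | \cdots | e_d] \, \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \, [e_1 | e_2 | \cdots | e_d]^T$$
$$C_x^{-1} = [e_1 | e_2 | \cdots | e_d] \, \left( \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \right)^{-1} [e_1 | e_2 | \cdots | e_d]^T$$
$s_i$, $s_j$: standard deviations
Projects the data onto the eigenvectors, divides by the standard deviation along each eigen dimension, and computes the Euclidean distance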
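The Mahalanobis distance described here can be sketched without any numerical library. This is a minimal illustration under assumptions: the covariance is estimated with the 1/(N-1) sample formula, the matrix is nonsingular, and instead of forming the inverse explicitly the code solves a linear system by Gauss-Jordan elimination. All function names are hypothetical.

```python
def covariance(samples):
    """Sample covariance matrix with the 1/(N-1) normalization."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting.

    Assumption: `a` is square and nonsingular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(x1, x2, cov):
    """Squared Mahalanobis distance (x1 - x2)^T C^{-1} (x1 - x2)."""
    diff = [p - q for p, q in zip(x1, x2)]
    z = solve(cov, diff)  # z = C^{-1} diff
    return sum(p * q for p, q in zip(diff, z))


# Axis 0 has variance 4, axis 1 has variance 1, no correlation.
cov_diag = [[4.0, 0.0], [0.0, 1.0]]
d_along_wide_axis = mahalanobis_sq((2.0, 0.0), (0.0, 0.0), cov_diag)
d_along_narrow_axis = mahalanobis_sq((0.0, 2.0), (0.0, 0.0), cov_diag)
```

The same Euclidean offset of 2 counts as a smaller distance along the high-variance axis, which is the rescaling effect the slide describes.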
23 EE6882-Chang
Advantage of Mahalanobis metric
Accounts for scaling of coordinate axes
Invariant under linear transformation
Corrects for correlation
Provides curved as well as linear decision boundaries
Potential issue
Need enough training data to estimate the covariance matrix
Need d(d+1)/2 independent elements
$$\text{If } y = Ax \Rightarrow C_y = A C_x A^T, \quad D_y^2 = D_x^2$$
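[Figure: scatter plot with axes in km and cm]
[Figure: minimum-distance classifier: the input $x_i$ is compared with the class means $m_1, \ldots, m_c$ of classes $c_1, \ldots, c_c$, and a minimum selector outputs the selected class]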
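The minimum-selector classifier in this figure can be sketched as a nearest-class-mean rule. This is a minimal illustration under assumptions: squared Euclidean distance is used, and the class labels and mean vectors are made up for the example.

```python
def nearest_class(x, class_means):
    """Return the label whose mean vector is closest to x.

    Assumption: plain squared Euclidean distance; a Mahalanobis
    distance could be substituted for the inner function.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(class_means, key=lambda label: sq_dist(x, class_means[label]))


# Hypothetical class means in a 3-D feature space.
means = {"sky": (0.1, 0.2, 0.9), "grass": (0.1, 0.8, 0.2)}
label = nearest_class((0.2, 0.7, 0.1), means)
```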
24 EE6882-Chang
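$I$: set of suppliers
$J$: set of consumers
$c_{ij}$: cost of shipping a unit of supply from $i$ to $j$
$$\text{minimize } \sum_{i \in I} \sum_{j \in J} c_{ij} f_{ij}$$
subject to
$$f_{ij} \ge 0, \quad i \in I, \; j \in J \quad \text{(no reverse shipping)}$$
$$\sum_{i \in I} f_{ij} = y_j, \quad j \in J \quad \text{(satisfy each consumer's need/capacity)}$$
$$\sum_{j \in J} f_{ij} \le x_i, \quad i \in I \quad \text{(each supplier's limit)}$$
$$\sum_{j \in J} y_j \le \sum_{i \in I} x_i \quad \text{(feasibility)}$$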
25 EE6882-Chang
Efficient implementations exist (Simplex Method)
Also support partial matching (||I|| >< ||J||, e.g., histogram
If the masses of the two distributions are equal, then EMD is a true
Allow flexible structures, e.g., matching multiple regions in
Multiple regions in one image, each region represented by an individual feature vector
Region set: {R1, R2, R3}
Region set: {R1’, R2’, R3’, R4’}
Cij = dist(Ri, Rj’), which can also be based on EMD
26 EE6882-Chang
$$h = (h(1), h(2), \ldots, h(M)), \quad g = (g(1), g(2), \ldots, g(N)), \quad \text{assume } \sum_{j=1}^{N} g(j) \le \sum_{i=1}^{M} h(i)$$
$$EMD(h, g) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} C_{ij} f_{ij}}{\sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} C_{ij} f_{ij}}{\sum_{j=1}^{N} g(j)}$$
$h$: earth; $g$: holes (fill up each hole)
$C_{ij}$: distance between color $i$ in color space $h$ and color $j$ in color space $g$
$f_{ij}$: move $f_{ij}$ units of mass from $i$ in $h$ to $j$ in $g$
Normalization by the denominator term
Avoid bias toward low mass distributions
Experiment result [Rubner, Tomasi, Guibas '98]
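For intuition, the EMD has a closed form in one special case, sketched below. This is not the general transportation solver: it assumes two 1-D histograms over the same bins with equal total mass, and a ground distance equal to the number of bins between `i` and `j`. Under those assumptions the minimum work equals the sum of absolute running surpluses, and dividing by the total mass matches the normalization above. The function name `emd_1d` is hypothetical.

```python
def emd_1d(h, g):
    """EMD between two equal-mass 1-D histograms.

    Assumptions: same number of bins, equal total mass, and ground
    distance |i - j| between bins i and j.
    """
    if len(h) != len(g):
        raise ValueError("histograms must have the same number of bins")
    total_h, total_g = sum(h), sum(g)
    if abs(total_h - total_g) > 1e-9:
        raise ValueError("this shortcut requires equal total mass")
    work = 0.0
    carried = 0.0  # surplus earth carried over to the next bin
    for a, b in zip(h, g):
        carried += a - b
        work += abs(carried)
    return work / total_h  # normalize by the total flow


far = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
near = emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

Unlike a bin-by-bin distance, the result grows with how far the mass has to travel, not just with how many bins differ.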
27 EE6882-Chang
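$$\sum_i \sum_j f_{ij} \, d(p_i, q_j) \ge \left\| \sum_i \sum_j f_{ij} (p_i - q_j) \right\| = \left\| \sum_i \sum_j f_{ij} \, p_i - \sum_j \sum_i f_{ij} \, q_j \right\| = \left\| \sum_i x_i p_i - \sum_j y_j q_j \right\|$$
For color histogram
Color $i$ means color of the $i$th bin
$x_i$: histogram
$y_j$: histogram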
EMD > Distance between
28 EE6882-Chang
29 EE6882-Chang
What is texture?
Has structure or repetitious pattern, e.g., checkered
Has statistical pattern, e.g., grass, sand, rocks
Why texture?
Application to satellite images, medical images
Describes contents of real world images, e.g., clouds, fabrics, surfaces, wood, stone
Challenging issues
Rotation and scale invariance (3D)
Segmentation/extraction of texture regions from images
Texture in noise
30 EE6882-Chang
Fourier Domain Energy Distribution
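Angular features (directionality)
$$V_{\theta_1 \theta_2} = \iint |F(u, v)|^2 \, du \, dv, \quad \text{where } \theta_1 \le \tan^{-1}\!\left( \frac{v}{u} \right) \le \theta_2$$
Radial features (coarseness)
$$V_{r_1 r_2} = \iint |F(u, v)|^2 \, du \, dv, \quad \text{where } r_1^2 \le u^2 + v^2 < r_2^2$$
[Figure: angular wedge (angle $\phi$) and radial ring (radius $r$) in the $(\omega_x, \omega_y)$ frequency plane]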
31 EE6882-Chang
Co-occurrence Matrix - (image with m levels)
Popular early texture approach
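$$Q_{R(d,\theta)} = \begin{bmatrix} Q_{R(d,\theta)}(1,1) & \cdots & Q_{R(d,\theta)}(1,m) \\ \vdots & \ddots & \vdots \\ Q_{R(d,\theta)}(m,1) & \cdots & Q_{R(d,\theta)}(m,m) \end{bmatrix}$$
where $Q_{R(d,\theta)}(i, j)$ counts the pixel pairs with $I[x_1, y_1] = i$ and $I[x_2, y_2] = j$ and $y_2 = y_1 + d \sin(\theta)$ and $x_2 = x_1 + d \cos(\theta)$
$R(d, \theta)$: relation between pixels, e.g., 'top'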
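A co-occurrence matrix for one displacement can be sketched as a double loop over pixel pairs. This is a minimal illustration under assumptions: gray levels are integers indexed from 0 to m-1 (the slide indexes from 1), the displacement d·(cos θ, sin θ) is rounded to whole pixels, and pairs that fall outside the image are skipped. The function name `cooccurrence` is hypothetical.

```python
import math


def cooccurrence(image, levels, d=1, theta=0.0):
    """Count pixel pairs (i, j) separated by distance d at angle theta.

    image:  2-D list of integer gray levels in range(levels).
    Returns a levels x levels matrix of raw counts.
    """
    dx = int(round(d * math.cos(theta)))
    dy = int(round(d * math.sin(theta)))
    rows, cols = len(image), len(image[0])
    q = [[0] * levels for _ in range(levels)]
    for y1 in range(rows):
        for x1 in range(cols):
            y2, x2 = y1 + dy, x1 + dx
            # Skip pairs whose second pixel lies outside the image.
            if 0 <= y2 < rows and 0 <= x2 < cols:
                q[image[y1][x1]][image[y2][x2]] += 1
    return q


tiny = [[0, 0, 1],
        [1, 1, 0]]
q_horizontal = cooccurrence(tiny, levels=2, d=1, theta=0.0)
q_vertical = cooccurrence(tiny, levels=2, d=1, theta=math.pi / 2)
```

Different (d, θ) choices give different matrices for the same image, so a texture description typically combines several of them.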
32 EE6882-Chang
$$H(d, \theta) = - \sum_i \sum_j Q_{R(d,\theta)}(i, j) \log Q_{R(d,\theta)}(i, j)$$
$$C(d, \theta) = \frac{\sum_i \sum_j (i - \mu_x)(j - \mu_y) \, Q_{R(d,\theta)}(i, j)}{\sigma_x \sigma_y}$$
$$L(d, \theta) = \sum_i \sum_j \frac{1}{1 + (i - j)^2} \, Q_{R(d,\theta)}(i, j)$$
Measures
corresponds to a visual component.
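The measures H, C and L on this slide can be computed from a co-occurrence matrix as sketched below. This is a minimal illustration under assumptions: the names entropy, correlation and local homogeneity are my reading of the three symbols, the raw counts are first normalized to sum to 1, levels are indexed from 0, and the correlation is returned as 0.0 when a marginal has zero variance.

```python
import math


def normalize(q):
    """Turn raw co-occurrence counts into joint probabilities."""
    total = float(sum(sum(row) for row in q))
    return [[v / total for v in row] for row in q]


def entropy(p):
    """H = -sum p log p, skipping empty cells."""
    return -sum(v * math.log(v) for row in p for v in row if v > 0)


def correlation(p):
    """C = sum (i - mu_x)(j - mu_y) p(i, j) / (sigma_x sigma_y)."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    mu_x = sum(i * v for i, _, v in cells)
    mu_y = sum(j * v for _, j, v in cells)
    var_x = sum((i - mu_x) ** 2 * v for i, _, v in cells)
    var_y = sum((j - mu_y) ** 2 * v for _, j, v in cells)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: undefined correlation reported as 0
    cov = sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells)
    return cov / math.sqrt(var_x * var_y)


def local_homogeneity(p):
    """L = sum p(i, j) / (1 + (i - j)^2)."""
    return sum(v / (1.0 + (i - j) ** 2)
               for i, row in enumerate(p) for j, v in enumerate(row))


smooth = normalize([[2, 0], [0, 2]])   # neighbors always share a level
checker = normalize([[0, 2], [2, 0]])  # neighbors always differ
```

Both toy matrices have the same entropy, while correlation and local homogeneity separate the smooth pattern from the alternating one.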
33 EE6882-Chang
Non-Fourier type bases
Matched better to intuitive texture features
Examples of filters (total 12)
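$$\begin{bmatrix} -1 & -4 & -6 & -4 & -1 \\ -2 & -8 & -12 & -8 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & 8 & 12 & 8 & 2 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 & -2 & 0 & 1 \\ 2 & 0 & -4 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ -2 & 0 & 4 & 0 & -2 \\ -1 & 0 & 2 & 0 & -1 \end{bmatrix}$$
$$\begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ -4 & 16 & -24 & 16 & -4 \\ 6 & -24 & 36 & -24 & 6 \\ -4 & 16 & -24 & 16 & -4 \\ 1 & -4 & 6 & -4 & 1 \end{bmatrix}$$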
Measure energy of output from each filter
[Figure: image I → filters → energy measurement → 12 outputs]
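Measuring the energy of a filter output can be sketched as follows. This is a minimal illustration under assumptions: the 5×5 masks are built as outer products of short 1-D kernels, the kernel names `L5`, `E5`, `S5`, `R5` and their sign conventions follow the common Laws texture-energy convention rather than anything stated on the slide, and "energy" is taken to be the mean squared response over positions where the mask fits entirely inside the image.

```python
# 1-D kernels (assumed, following the usual Laws convention).
L5 = [1, 4, 6, 4, 1]      # level
E5 = [-1, -2, 0, 2, 1]    # edge
S5 = [-1, 0, 2, 0, -1]    # spot
R5 = [1, -4, 6, -4, 1]    # ripple


def outer(col, row):
    """Build a 2-D mask as the outer product of two 1-D kernels."""
    return [[c * r for r in row] for c in col]


def filter_energy(image, mask):
    """Mean squared mask response over all fully covered positions.

    Note: the mask is applied without flipping (correlation), which
    does not change the energy for these symmetric or antisymmetric
    kernels.
    """
    k = len(mask)
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(rows - k + 1):
        for x in range(cols - k + 1):
            resp = sum(mask[i][j] * image[y + i][x + j]
                       for i in range(k) for j in range(k))
            total += resp * resp
            count += 1
    return total / count if count else 0.0


flat = [[1] * 5 for _ in range(5)]  # constant image, no texture
edge_energy = filter_energy(flat, outer(E5, L5))
level_energy = filter_energy(flat, outer(L5, L5))
```

Running one such measurement per mask gives one number per filter, and those numbers together form the texture feature vector.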
34 EE6882-Chang
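$$A_k(x, y) = \frac{1}{2^{2k}} \sum_{i = x - 2^{k-1}}^{x + 2^{k-1} - 1} \; \sum_{j = y - 2^{k-1}}^{y + 2^{k-1} - 1} I(i, j)$$
$$E_{k,h}(x, y) = \left| A_k(x + 2^{k-1}, y) - A_k(x - 2^{k-1}, y) \right|$$
$S_{Best}(x, y) = 2^k$ pixels, with $k$ chosen to maximize $E_{k,h}$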
$$F_{CRS} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} S_{Best}(i, j)$$
38 EE6882-Chang
System components: User, User interface, Image thumbnails, Images & videos, Network, Query server, Image/video server, Index, Archive
What are the bottlenecks of the system?
What functionalities should each component have?
39 EE6882-Chang
A: ground truth in DB
B: returned result
C: overlap of A and B
Recall = C / A, Precision = C / B
As B ↑: Recall ↑, Precision ↓
# Image ID Rank Correct Rank 1 5 7 2 20 ... N
40 EE6882-Chang
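$A_k$: Detection; $B_k$: False Alarms; $C_k$: Misses; $D_k$: Correct Dismissals
N images, M benchmark queries, K returned results
$$V_n = \begin{cases} 1 & \text{"Relevant"} \\ 0 & \text{"Irrelevant"} \end{cases}$$
$$A_k = \sum_{n=0}^{k-1} V_n, \quad B_k = \sum_{n=0}^{k-1} (1 - V_n), \quad C_k = \sum_{n=0}^{N-1} V_n - A_k, \quad D_k = \sum_{n=0}^{N-1} (1 - V_n) - B_k$$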
41 EE6882-Chang
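Given size of the returned results K
[Figure: Venn diagram of the “Returned” set and the “Relevant Ground Truth” set, with regions $A_k$, $B_k$, $C_k$, $D_k$]
Recall $R_k$, precision $P_k$, fallout $F_k$:
$$R_k = \frac{A_k}{A_k + C_k}, \quad P_k = \frac{A_k}{A_k + B_k}, \quad F_k = \frac{B_k}{B_k + D_k}$$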
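The counts and ratios for a ranked result list can be sketched as below. This is a minimal illustration under assumptions: `relevance` holds the indicator V_n (1 relevant, 0 irrelevant) for all N images in ranked order, the first k entries are the returned set, and recall, precision and fallout use the standard ratios A/(A+C), A/(A+B) and B/(B+D). The function names are hypothetical.

```python
def retrieval_counts(relevance, k):
    """Return (A_k, B_k, C_k, D_k) for the top-k returned results.

    A_k: detections, B_k: false alarms,
    C_k: misses,     D_k: correct dismissals.
    """
    returned = relevance[:k]
    a = sum(returned)
    b = len(returned) - a
    c = sum(relevance) - a
    d = (len(relevance) - sum(relevance)) - b
    return a, b, c, d


def recall_precision_fallout(relevance, k):
    """Standard ratios; each is 0.0 when its denominator is zero."""
    a, b, c, d = retrieval_counts(relevance, k)
    recall = a / (a + c) if a + c else 0.0
    precision = a / (a + b) if a + b else 0.0
    fallout = b / (b + d) if b + d else 0.0
    return recall, precision, fallout


# Hypothetical ranked list over N = 8 images, 4 of them relevant.
ranked = [1, 0, 1, 1, 0, 0, 0, 1]
counts_at_4 = retrieval_counts(ranked, 4)
scores_at_4 = recall_precision_fallout(ranked, 4)
```

Sweeping k from 1 to N and recording the resulting pairs traces out the retrieval performance curves.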
42 EE6882-Chang
[Figure: performance curves ($P_k$ vs. $R_k$), $A_k$ vs. $B_k$, and $A_k$ vs. $F_k$]
1
− =
N n n k
k k
Ak (hit) Bk (false)