Diverse Particle Selection for High-Dimensional Inference in Graphical Models
Erik Sudderth
UC Irvine Computer Science
Collaborators:
- Particle Max-Product: Jason Pacheco, MIT
- Human Pose: Silvia Zuffi & Michael Black, MPI Tübingen
Probability Model Estimate
x5 x6 x7 x8 x4 x3 x1 x2 x9
PCA Shape
[ Zuffi et al., CVPR 2012 ]
FISCHLER AND ELSCHLAGER: PICTORIAL STRUCTURES
[Figure: Reference description of a face. (a) Schematic representation indicating components and their linkages (labels include LEFT EDGE, RIGHT EDGE, NOSE, MOUTH). (b) Reference description for left edge of face: VALUE(X) = (E+F+G+H) - (A+B+C+D). Note: VALUE(X) is the value assigned to the L(EV)A corresponding to the location X as a function of the intensities of locations A through H in the sensed scene. (c) Reference description for eye.]
[...] (noisy) face pictures using two references which included, but differed in, the nose/mouth definitions. In the first series, consisting of 90 experiments, there were 83 completely correct embeddings and 7 partially incorrect embeddings. The errors involved six experiments in which the nose/mouth complex was offset by three to four resolution cells from its ideal location, and one experiment in which both the eyes and the nose/mouth complex were improperly placed. In the second series, consisting of 45 experiments, the placement of the nose/mouth complex was judged incorrect in 3 experiments, while all the other components were always correctly embedded.
Analysis of the face experiments led to the following [...] embedding the hair, eyes, and sides of the face, precise placement of the nose/mouth complex based on strictly local evaluation was almost impossible in some of the noisy pictures due to loss of detail [e.g., see Fig. 4(b)]. With the attribute feature of the LEA not yet operational, and with the arbitrary decision to use binary (rather than multivalued) weights in the spring arrays for these experiments, the LEA restricted the feasible region over which an optimum value could be selected for embedding the nose/mouth complex, but did not bias the selection as would generally be the case. In the presence of heavy noise, the simple nose/mouth descriptions used in these experiments were not always adequate to produce a local optimum in the L(EV)A at or near the ideal embedding location. (A three-resolution cell deviation was considered an error.)
Image-Matching Experiments Using Terrain Scenes
Approximately 40 experiments have been performed using terrain scenes (including both aerial and ground scenes). The object in each case was to create a relatively simple description of some portion of the scene and then attempt to find the proper embedding of the description in the image (or some distorted or alternate view of the image).
The descriptions employed two basic types of components: 1) texture components, in which the "texture value" of a point was defined as a crude statistical function of the intensity values and gradients in some local region surrounding the point; and 2) shape components, which were defined by collections of "edge" points having specified gradients.
[...] relative to the computer-stored version of the photograph of the actual terrain segment as shown in Fig. 5(c). Each coherent piece in reference 5(a) is represented by several points enclosed by a dotted line. In this example, the points of each enclosure of the reference com-
Fischler & Elschlager, 1973
[...] local and global evaluation functions. The global evaluation function, associated with the relative positioning [...] previously, has strong syntactic controls on its form to permit its integration directly into the decision algorithm. This is important because the global evaluation produces the most severe combinatorial problems. A local evaluation function, associated with how well a given coherent piece is independently embedded, is easily changed from problem to problem (based on problem-dependent considerations) without requiring any change in the core algorithms [...] can be a (conventional) correlation function together with a pictorial reference component, or a procedure based on linguistic concepts together with a formal description of a reference component (see footnote 1), or even a series [...] the core algorithms provides a great deal of flexibility in making changes or improvements in the evaluation functions for a given problem, as well as when switching [...] separation, the performance of the algorithms (both local and global) can be independently evaluated in a direct and intuitively obvious manner. Such an evaluation then permits iterative improvement in performance [...] proposed embedding metric. Let the reference be composed [...] position of the ith component. Suppose there is a mechanism, either a computer program, or possibly a person, [...] ith component, outputs a numerical value l_i(x_i) that indicates how strongly the ith component fits at location x_i of the sensed scene. The smaller l_i(x_i), the better the fit.
[...] measure the presence of the ith component at a location in the sensed scene independent of any knowledge of the locations of the other components. That is, l_i(x_i) is a purely local and possibly imprecise measure of the presence of the ith component at location x_i. In addition to the purely local measure l_i, 1 ≤ i ≤ p, there are the following considerations: 1) how well the different components are situated in the required spatial relations to each other; and 2) how relative values [...] corresponding measured values in the sensed image (e.g., thicker and more greenish than the jth component). The extent to which the above specifications are not satisfied is reflected in the "stretching" of the springs between the corresponding components.
Footnote 1: Note that we are now further generalizing the concept of "component." It no longer has to be a rigid entity defined pictorially, but rather may be any information structure or decision procedure which can be used to define a real-valued function whose domain of definition is the set of all locations in the sensed image.
[...] with a two-dimensional vector (e.g., the components of the vector can be the row and column number of the location in the sensed scene). In that case, x_i - x_j (usual vector subtraction) is a vector pointing from x_j to x_i. [...] associated with the spring joining the ith and jth components. If there is no spring between these components, then g_ij is identically zero. If we set g_ij(x_i, x_j) = l_i(x_i) when i = j, and let X_i = {x_1, ..., x_i}, then the total cost of embedding p components at locations X_p is G(X_p):
\[ G(X_p) = \sum_{i=1}^{p} \sum_{j=1}^{i} g_{ij}(x_i, x_j) \qquad (1) \]
Expression (1) can also be written as
\[ G(X_p) = \sum_{i=1}^{p} h_i(X_i), \qquad h_i(X_i) = \sum_{j=1}^{i} g_{ij}(x_i, x_j) \qquad (2) \]
h_i(X_i) can be thought of as the cost of embedding the ith component at location x_i, given that the previous [...]
In this section of the paper, we will present computational procedures for locating a suitable embedding of [...] just presented. A discussion of dynamic programming [...] "linear embedding algorithm" (LEA)] in proper perspective [...] impractical) approach to solving the embedding problem [...] computationally feasible approximation to this restricted DP [...] theoretic interpretation is included to provide a better intuitive appreciation of the [...]
Let us assume that the sensed image, designated by the abbreviation SM, is composed of M resolution elements; while the reference, designated by the abbreviation RM, is composed of P pictorially defined components (coherent pieces) with a total of N = \sum_i n_i resolution elements, n_i being the number of resolution elements in the ith component.
[...] embedding is to select combinationally N resolution elements at a time from the SM, determine if each such selection satisfies the coherent [...]
Localize object by minimizing cost or energy defined by synthetic springs.
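The spring-based cost can be made concrete with a small sketch. It sums a local cost per part and a spring cost per connected pair, then finds the lowest-cost embedding by exhaustive search. The function names (`embedding_cost`, `best_embedding`), the toy "eye" and "nose" costs, and the 6 x 4 grid are all illustrative assumptions, not taken from the original paper or slides.

```python
import itertools

def embedding_cost(locations, local_costs, springs):
    """Total cost: sum of local costs l_i(x_i) plus spring costs g_ij(x_i, x_j)."""
    total = sum(l(x) for l, x in zip(local_costs, locations))
    for (i, j), g in springs.items():
        total += g(locations[i], locations[j])
    return total

def best_embedding(candidates, local_costs, springs):
    """Exhaustive search over all joint placements (only feasible for tiny problems)."""
    best = min(itertools.product(*candidates),
               key=lambda X: embedding_cost(X, local_costs, springs))
    return best, embedding_cost(best, local_costs, springs)

# Hypothetical toy problem: locations are (row, col) pairs on a 6 x 4 grid.
def l_eye(x):
    return abs(x[0] - 2) + abs(x[1] - 2)   # the eye detector prefers (2, 2)

def l_nose(x):
    return abs(x[0] - 5) + abs(x[1] - 2)   # the nose detector prefers (5, 2)

def spring(x_nose, x_eye):
    # The spring is relaxed when the nose sits 2 rows below the eye, same column.
    d_row, d_col = x_nose[0] - x_eye[0], x_nose[1] - x_eye[1]
    return (d_row - 2) ** 2 + d_col ** 2

grid = [(r, c) for r in range(6) for c in range(4)]
best, cost = best_embedding([grid, grid], [l_eye, l_nose], {(1, 0): spring})
```

In this toy case the two detectors disagree with the spring by one row, so the best embedding trades a little local cost against spring stretch. Exhaustive search scales exponentially in the number of parts, which is why dynamic programming is used in practice.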
[Figure: An example illustrating the operation of the linear embedding algorithm. z and y are the components of x; that is, x = (z, y). (a) The sensed image SM(z, y), listed for x = (z, y) in the order (1,4), (2,4), (3,4), (4,4), (1,3), (2,3), (3,3), (4,3), (1,2), (2,2), (3,2), (4,2), (1,1), (2,1), (3,1), (4,1) with values 5, 2, 8, 8, 7, 5, 1, 3, 8, 1, 5, 7, 4, 3, 2, 4. (b) The reference description, including the spring definitions. (c) Linear embedding algorithm: evaluation of g2, evaluation of g3, and evaluation of g4 = G.]
[Figure: Face experiment. Panels: Original picture. L(EV)A for nose. (Density at a point is proportional to probability that nose is present at that location.) Noisy picture (sensed scene) as used in experiment. Program output: HAIR WAS LOCATED AT (8,21); L/EDGE WAS LOCATED AT (17,11); R/EDGE WAS LOCATED AT (17,25); L/EYE WAS LOCATED AT (17,14); R/EYE WAS LOCATED AT (17,20); NOSE WAS LOCATED AT (21,16); MOUTH WAS LOCATED AT (23,16). (b) Incorrect embedding of nose under random noise.]
Localize object via MAP estimate in pairwise MRF with rigid geometry.
p(L | θ) =
\[ p(I \mid L, \theta) = p(I \mid L, u) \propto \prod_{i=1}^{n} p(I \mid l_i, u_i) \]
Felzenszwalb & Huttenlocher, 2005
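When part locations are discretized and the model is a chain (or, more generally, a tree), the MAP estimate can be found exactly by max-product dynamic programming. The following is a minimal sketch for a chain in the log domain (max-sum). The function name `map_chain` and the array layout are assumptions made for illustration, not the interface of any published code.

```python
import numpy as np

def map_chain(unary, pairwise):
    """Max-sum (log-domain max-product) on a chain-structured pairwise MRF.

    unary[i]    : array of shape (K_i,), log-potential of node i in each state
    pairwise[i] : array of shape (K_i, K_{i+1}), log-potential between nodes i and i+1
    Returns (MAP states, log-score of the MAP configuration).
    """
    score = np.asarray(unary[0], dtype=float)
    backptr = []
    for i in range(1, len(unary)):
        # cand[a, b]: best score of any prefix ending in state a, then moving to state b
        cand = score[:, None] + np.asarray(pairwise[i - 1], dtype=float)
        backptr.append(cand.argmax(axis=0))           # best predecessor for each b
        score = cand.max(axis=0) + np.asarray(unary[i], dtype=float)
    # Backtrack from the best final state to recover the full configuration.
    states = [int(score.argmax())]
    for bp in reversed(backptr):
        states.append(int(bp[states[-1]]))
    return states[::-1], float(score.max())
```

Each forward step is a matrix-vector style operation with a maximization in place of a sum, so the cost is quadratic in the number of states per node rather than exponential in the number of nodes.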
Shape Completion and Animation of People, Anguelov et al. 2004
Zuffi, Freifeld, & Black, CVPR 2012
von Mises distribution
Gaussian distribution
PCA model of part shape
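The von Mises distribution is the standard circular analogue of a Gaussian, with density exp(kappa * cos(theta - mu)) / (2 * pi * I0(kappa)). A minimal sketch of its log-density follows; how exactly the pose model applies it (for example, to relative angles between connected parts) is an assumption here, since the slide only names the distribution.

```python
import numpy as np

def von_mises_logpdf(theta, mu, kappa):
    """Log-density of a von Mises distribution on the circle.

    theta : angle(s) in radians
    mu    : mean direction in radians
    kappa : concentration (larger means more peaked; kappa = 0 is uniform)
    """
    # np.i0 is the modified Bessel function of the first kind, order 0.
    return kappa * np.cos(theta - mu) - np.log(2.0 * np.pi * np.i0(kappa))
```

Unlike a Gaussian on raw angles, this density is periodic, so an angle of 359 degrees is treated as close to 1 degree.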
Matrix-vector multiplication and discrete maximization.
Messages are functions with no analytic form. Nonlinear optimization.
Message Update: Message Update:
Head Upper Arm Lower Arm Torso
1 2 3 4 5 T
T-1
Particle max-product iteration:
1. Augment Particles (Proposal, Augmented Set; parts: Head, Upper Arms, Lower Arms, Torso)
2. Max-Product Update
3. Select Particles
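The three steps (augment, max-product update, select) can be sketched on a toy problem. The example below is an illustrative assumption throughout: a 1D chain with Gaussian log-potentials, random-walk proposals, and a simple "keep the top N by max-marginal" selection rule. It is not the pose model from the talk, and the selection step here is the plain top-N rule rather than the diverse selection.

```python
import numpy as np

def log_unary(x, y, sigma=1.0):
    return -0.5 * ((x - y) / sigma) ** 2

def log_pair(xa, xb, tau=1.0):
    # Pairwise log-potential between every particle in xa and every particle in xb.
    return -0.5 * ((xa[:, None] - xb[None, :]) / tau) ** 2

def max_marginals(particles, obs):
    """Log max-marginals of each particle, via max-sum on a chain."""
    n = len(particles)
    un = [log_unary(particles[i], obs[i]) for i in range(n)]
    fwd = [np.zeros_like(un[0])]
    for i in range(1, n):
        prev = fwd[i - 1] + un[i - 1]
        fwd.append((prev[:, None] + log_pair(particles[i - 1], particles[i])).max(axis=0))
    bwd = [np.zeros_like(un[-1])]
    for i in range(n - 2, -1, -1):
        nxt = bwd[0] + un[i + 1]
        bwd.insert(0, (log_pair(particles[i], particles[i + 1]) + nxt[None, :]).max(axis=1))
    return [un[i] + fwd[i] + bwd[i] for i in range(n)]

def pmp_step(particles, obs, n_keep, rng, step=0.5):
    # 1. Augment: add one random-walk proposal per existing particle.
    aug = [np.concatenate([p, p + step * rng.standard_normal(p.shape)]) for p in particles]
    # 2. Max-product update over the augmented particle sets.
    mm = max_marginals(aug, obs)
    # 3. Select: keep the n_keep particles with the highest max-marginal.
    return [a[np.argsort(-m)[:n_keep]] for a, m in zip(aug, mm)]

def run_pmp(obs, n_particles=10, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    particles = [rng.uniform(-5.0, 5.0, n_particles) for _ in obs]
    history = []
    for _ in range(n_iters):
        particles = pmp_step(particles, obs, n_particles, rng)
        # Best joint log-score representable by the current particle sets.
        history.append(float(max_marginals(particles, obs)[0].max()))
    estimate = [float(p[0]) for p in particles]  # top-ranked particle per node
    return estimate, history
```

Because the augmented set always contains the previous particles and the best joint configuration survives selection, the best log-score in `history` should not decrease from one iteration to the next.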
Chamfer Distance Likelihood Random Initialization
Colors
Example Runs
G-PMP: Trinh & McAllester 2009
T-PMP: Generalization of PatchMatch BP, Besse et al. 2012
Colors
Example Runs
LP: Linear Program relaxation
IP: Optimal solution by brute force
Greedy: Efficient approximation
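One plausible form of the greedy approximation can be sketched as follows. The idea assumed here is that a subset of particles is good if max-product messages computed from the subset stay close to messages computed from all particles. At each step the sketch finds the message entry with the largest remaining error (the margin) and adds the particle that attains the maximum for that entry. This is a hedged reading of the approach, not a confirmed reproduction of the authors' algorithm; the matrix layout and the name `greedy_diverse_select` are assumptions.

```python
import numpy as np

def greedy_diverse_select(M, n_select):
    """Greedily pick particles (columns) that preserve the max-product message.

    M[a, b] : nonnegative value contributed to message entry a by particle b,
              so the full message is m[a] = max_b M[a, b].
    Returns (selected column indices, largest remaining message error).
    """
    M = np.asarray(M, dtype=float)
    full = M.max(axis=1)                 # message using all particles
    approx = np.zeros(M.shape[0])        # message using selected particles only
    selected = []
    for _ in range(min(n_select, M.shape[1])):
        margin = full - approx           # per-entry message error
        a_star = int(np.argmax(margin))  # entry with the largest error
        if margin[a_star] <= 0:
            break                        # message already reproduced exactly
        b_star = int(np.argmax(M[a_star]))  # particle attaining the maximum there
        selected.append(b_star)
        approx = np.maximum(approx, M[:, b_star])
    return selected, float(np.max(full - approx))
```

For example, with `M = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2], [0.1, 0.1, 0.8]]`, the first pick is particle 0 (it fixes entry 0, which has the largest error) and the second is particle 2, after which the message is reproduced exactly. Particle 1 is never needed, even though a top-N rule might keep it over particle 2.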
[Figure panels: All Particles, Selected, Margin, Maximum]
Example Runs
Colors
[ Pacheco et al., ICML 2014 ]
Pose Error of MAP Estimate
G-PMP T-PMP D-PMP D/T-PMP
Log Probability of MAP Estimate
G-PMP T-PMP D-PMP D/T-PMP
Top 3 arm hypotheses: MAP estimate, 2nd and 3rd modes for upper arm (magenta, cyan), lower arm (green, ).
Detection
# Solutions
[ Park & Ramanan 2011 ] [ Pacheco, Zuffi, Black & Sudderth, ICML 2014 ]
D-PMP Particles
Colors
Mode Estimates
[ Pacheco, Zuffi, Black & Sudderth, ICML 2014 ]
Frame t Frame t+1
Data and Optical Flow
t t+1
Gradients: Encode object and motion boundaries via HOG / HOF.
Appearance: 2D histogram of A/B color channels in L*a*b*.
HOF HOG
Structural prior identical to DS.
Part Motion: Scale mixture captures heavy-tailed statistics of motion between frames.
Color
Extension of the Flowing Puppets model [Zuffi et al., 2013]
1 2 3
[Wainwright et al., 2005]
Left Right Disparity
1. Augment Particles
2. RMP Update
3. Select Diverse
Colors
D-PMP T-PMP
Colors
Both right arm hypotheses
Trp (W) Phe (F) Leu (L) Val (V)
Backbone Sidechain
Sidechains Backbone
300° 180° 60°
Rotamers
[Shapovalov & Dunbrack 2007] Truth Rotamers
x5 x6 x7 x8 x4 x3 x1 x2 x9
[ Image: Harder et al., BMC Bioinformatics 2010 ]
Atomic Interaction Rotamer Likelihood
1. Augment Particles
2. RMP Update
3. Select Diverse
Gradient Optimization
2 Rosetta Energy 3
Rotamer Proposal
1 Accept / Reject 3
[ Pacheco et al., ICML 2015 ]
Truth Rotamers Rosetta D-PMP
Penicillin Acylase Complex, Trp154 [Shapovalov & Dunbrack 2007]