1
play

1 The Solution Approach Recognition Framework Model Pictorial - PDF document

Pictorial Structures Efficient Matching of Pictorial Structures Collection of parts arranged in a deformable configuration Local appearance Part models Parts feature detection 01/30/2007 Global geometry


  1. Pictorial Structures Efficient Matching of Pictorial Structures “Collection of parts arranged in � a deformable configuration” Local appearance � Part models – Parts ≠ feature detection – 01/30/2007 Global geometry � Not necessarily fully connected – Pushkala Iyer graph Joint optimization � Combine appearance – and geometry without hard constraints � “Stretch and fit” � Qualitative Sparse representation + Computationally tractable (10 5 pixels � 10 1 -- 10 2 parts) + Generative representation of class History of related work + Avoid modeling global variability + Success in specific object recognition � Fischler and Elschlager original 1973 paper � Burl, Weber and Perona ECCV 1998 – Probabilistic formulation – Full joint Gaussian spatial model – Computational challenges led to feature-based � Felzenszwalb and Huttenlocher CVPR 2000 – Explicit revisiting of FE73 for trees – Probabilistic MAP estimates – Efficient algorithms using distance transforms - Throw away most image information - Parts need to be distinctive to separate from other classes The Matching Problem Matching Problem � Find the best placement of parts in an image – How well does each part match the image ? – How well do all they all fit together ? � Minimize a certain energy function 1

  2. The Solution Approach Recognition Framework Model Pictorial Structure model [EF73] � Graph Model G = (V, E) � Restrictions on relationships � – Parts are the vertices V = {v 1 , v 2 , … v n } Tree structure – Natural skeletal structure of many animate objects – If v i , v j are connected, then (v i , v j ) є E. – Dynamic programming – � Instance of a part in an image specified by location l . Pairwise relationships � – Position, Rotation, Scale for 2D parts. Broad range of objects – � Match cost function m i (I, l ) measures how well the Generalized Distance Transforms – part matches the image I when placed at location l . Globally best match of generic objects � FH2000 vs other approaches � Deformation cost function d ij ( l i , l j ) for every edge (v i , � Perona et al – central coordinate system, limited to one articulation point. – v j ) measures how well the locations l i of v i and l j of v j No hard decisions. – Valid configurations are not treated as being equally good. agree with the object model. – Model Framework Problem reduction � A configuration L = ( l 1 , l 2 , …, l n ) specifies a � Minimization of L* = arg min L ( ∑ (vi,vj) є E d ij ( l i , l j ) + ∑ vi є V m i (I, l i )) is O(m n ) location for each of the parts v i in V w.r.t the Where m is the number of discrete values for each l i and n is the – number of vertices in the graph. image. Markov Random Fields, Dynamic Contour Models (snakes). – � Best configuration is the configuration that � Restricted graphs reduce time complexity For first order snakes (chain) reduces to O(m 2 n) from O(m n ) minimizes the total cost: match cost of – Dynamic programming - is a method of solving problems exhibiting – individual parts + pair wise cost of the the properties of overlapping subproblems and optimal substructure in a way better than naïve methods.(Wikipedia) connected pairs of parts. Memoization and bottom-up approach. – � L* = arg min L ( ∑ (vi,vj) є E d ij ( l i , l j ) + ∑ vi є V m i (I, l i )) Tree structured graphs enable similar reduction to be achieved. – Problem Reduction Efficient Minimization O(m 2 n) algorithm is not practical – large number of possible locations � � Dynamic programming to find the for each part. Restriction on pairwise cost function dij yields a minimization algorithm configuration L* = ( l 1 *, …. l n *) that minimizes � that is O(mn). the cost. d ij ( l i , l j ) = || T ij ( l i ) – T ji ( l j ) || � d ij measures the degree of deformation. – � Computation involves n-1 functions, each of Is restricted to be a Norm. – A norm is a function which assigns a positive length or size to all vectors in – which specifying the best location of one part a vector space, other than the zero vector. (wikipedia) T ij and T ji should be invertible, together capture the ideal relative – w.r.t the possible locations of another part. configurations of parts v i and v j . T ij ( l i ) = T ji ( l j ) => l i and l j are ideal locations for v i and v j – T ji ( l j ) should be discretized in a grid. – 2

  3. Efficient Minimization Efficient Minimization The best location of a leaf node v j (6) � For non leaf vertices v j ≠ v r , if B c ( l j ) � can be computed as a function of the is known for every child v c є of C j , location of just its parent v i (5). then the best location of v j given its Only contribution of l j to the energy is � parent v i is d ij (l i ,l j ) + m j (I,l j ) – the contribution of the edge (5,6) and the position of 6. Best location of v j given location l i of � B j (l i ) = v i is min lj (d ij ( l i , l j ) + m j (I, l j ) + ∑ vc є Cj B c ( l j )) B j (l i ) = min lj (d ij (l i ,l j ) + m j (I, l j )) � Replacing min with arg-min yeilds Replacing min by argmin, we get the � the best location of v j as a function best location of v j as a function of the of l i . location l i of its parent v i . Efficient Minimization Algorithm � For the root node v r , if B c ( l r ) is � Recursive equations specify an algorithm. � For every leaf node, compute its best location as a function of known for every child v c є C r , the location of its parent. then the best location of the � For every non leaf node X, compute its best location as a root is function of the location of X’s parent, also taking into consideration the cost of placing X’s children (previous step). L r * = � Repeat until the best location of the root is calculated. arg min l r (m r (I, l r ) + � Now traverse the tree starting at the root, to find the optimum ∑ vc є Cr B c ( l j )) configuration. � O(nM) – n (# nodes) M (time to compute B j (l i ) and B’ j (l i )) Distance Transforms Generalized Distance Transforms D B (Z) = min w є B || z- w || where B is a subset of G A distance transform , also � � � D B (Z) = min w є G (|| z- w || + 1 B (w)) where 1 B (w) is an indicator function for membership in known as distance map or B. distance field , is a representation of a digital Algorithm (G.Borgefors) computes this in O(mD) time for a D dimensional grid. � image. (Wikipedia) - Two pass, local neighborhoods of 7x7 pixels used. Meijster, Roerdink & Hesselink: � The map supplies each pixel of � - Generic distance transform algorithm in linear time. the image with the distance to - 2 Phases, first columnwise, second rowwise, each phase 2 scans. the nearest obstacle pixel . A - Per row computation independent of per column computation, can be parallelized. most common type obstacle pixel is a boundary pixel in a D f (Z) = min w є G (|| z- w || + f(w)) � binary image. How that helps: � An example of a chessboard � distance transform on a binary Given d ij ( l i , l j ) = || T ij ( l i ) – T ji ( l j ) || � image. B j (l i ) = D f (T ij (l i )) where f(w) = m j (I, T ji -1 (w)) + ∑ T ji -1 (w)) � 3

  4. Computation Person Model D is 4 (x, y, rotation, scale) � Flexible revolute joints. � D[x,y, θ ,s] is initialized to the values of function f(w) � � Ideal relative orientation given by θ ij . D[x,y, θ ,s] = min(D[x, y, θ ,s], � � Deformation cost measures observed D[x-1, y, θ , s] + k x , deviation from ideal. D[x, y-1, θ , s] + k y , D[x, y, θ -1, s] + k θ , � Given the observed locations l i = ( θ i , s i , x i , y i ) D[x, y, θ , s-1] + k s ) and l j = ( θ j , s j , x j , y j ) D[x,y, θ ,s] = min(D[x, y, θ ,s], � θ |( θ j – θ i ) – θ ij | � d ij ( l i , l j ) = w ij D[x+1, y, θ , s] + k x , s |(log s j – log s i ) – log s ij | D[x, y+1, θ , s] + k y , + w ij D[x, y, θ +1, s] + k θ , x | x’ ij – x’ ji | + w ij D[x, y, θ , s+1] + k s ) y |y’ ij – y’ ji | + w ij Doesn’t consider periodic θ . � Special handling of boundary cases, additional passes. s ,w ij x ,w ij y and Small w ij � � Large w ij θ B j (l i ) computable in O(m) time. � Recognition Results Car Model � Flexible prismatic joints � d ij ( l i , l j ) = Infinity |( θ j – θ i )| s |(log s j – log s i ) – + w ij log s ij | x | x’ ij – x’ ji | + w ij y |y’ ij – y’ ji | + w ij Bayesian Formulation Summary � Best match by MAP estimation No decisions until the end. � No feature detection – L* = arg max L (Pr(L|I)) � Quality maps or likelihoods – No hard geometric constraints � Applying Bayes rule, � Deformation costs or priors Efficient algorithms. � L* = arg max L (Pr(I|L) Pr(L)) Dynamic programming critical – Not applicable to all problems, need good factorizations of geometry and appearance – � Prior information – from spring connections Good for categorical object recognition. � Qualitative descriptions of appearance – � Likelihood – approx product of match qualities for Factoring variability in appearance and geometry – Deals well with occlusion. individual parts. � In contrast to hard feature detection – Most applicable to 2D objects defined by relatively small number of parts. � To minimize the energy function, take the negative � � Unclear how to extend to large number of transformation parameters per part. logarithm: Explicit representation grows exponentially – No known way of using to index into model databases. � L* = arg min L ( ∑ (vi,vj) є E d ij ( l i , l j ) - ∑ vi є V ln g i (I, l i )) 4

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend