
6.869 Advances in Computer Vision: Model-based Vision



  1. 6.869 Advances in Computer Vision
     Prof. Bill Freeman
     Model-based vision
     • Hypothesize and test
     • Interpretation Trees
     • Alignment
     • Pose Clustering
     • Geometric Hashing
     Readings: F&P Ch 18.1-18.5

     Model-based Vision Topics:
     – Hypothesize and test
       • Interpretation Trees
       • Alignment
     – Interpretation trees
     – Hypothesis generation methods
       • Pose clustering
       • Invariances
       • Geometric hashing
     – Verification methods

     Object recognition as a function of time in computer vision research
     • ~1985: Picking identical parts from a pile
     • ~1995: Recognizing instances of textured objects
     • ~2005: Recognizing object classes, material properties
     Image sources:
     – dollarfifty.tripod.com/pho/004lg.jpg
     – http://www.fanuc.co.jp/en/product/robot/robotshow2003/image/m-16ib20_3dv_e.gif
     – http://images.google.com/imgres?imgurl=http://www.displayit-info.com/food/images/desserts/2131.JPG&imgrefurl=http://www.displayit-info.com/food/dessert6.html&h=504&w=501&sz=181&tbnid=FXJATGzVyA4J:&tbnh=128&tbnw=127&start=13&prev=/images%3Fq%3Dice%2Bcream%2Bsundae%26hl%3Den%26lr%3D%26sa%3DG

     Paths to computer vision research
     • Computer science
       – Tools: Binary numbers, Counting, Threshold tests, Graph cuts.
     • Electrical engineering, physics
       – Tools: Real numbers, Probabilities, Soft decisions, Belief propagation.
     • Both paths lead to computer vision.

     Approach
     • Given
       – CAD Models (with features)
       – Detected features in an image
     • Hypothesize and test recognition…
       – Guess
       – Render
       – Compare

     Hypothesize and Test Recognition
     • Hypothesize object identity and correspondence
       – Recover pose
       – Render object in camera
       – Compare to image
     • Issues
       – Where do the hypotheses come from?
       – How do we compare to image (verification)?

  2. Features?
     • Points
     • Lines
     • Conics
     • Other fitted curves
     • Regions (particularly the center of a region, etc.)
     • More descriptive local features (e.g. work by Schmid and Lowe): "…of intermediate complexity, which means that they are distinctive enough to determine likely matches in a large database of features, but are sufficiently local to be insensitive to clutter and occlusion". (Lowe, CVPR01)

     How to generate hypotheses?
     • Brute force
       – Construct a correspondence for all object features to every correctly sized subset of image points
       – Expensive search, which is also redundant.
       – L objects with N features
       – M features in image
       – $O(LM^N)$!

     Brute force method
     [Figure: L models (A, B, C, …) with N points each; an image with M points]
     Try all M image feature points for a model point, then try all M-1 remaining image feature points for another model point, then all M-2 for the next, etc.
     $M (M-1)(M-2) \cdots (M-N+1)$ for each of L models $= O(LM^N)$

     Ways around that combinatorial explosion
     • Add geometric constraints to prune search, leading to interpretation tree search
     • Try subsets of features (frame groups)…

     Frame groups
     • A group of features that can yield a camera hypothesis.
     • If you know the intrinsic parameters of your camera, then these are the set of features needed to specify the object's pose relative to the camera.
     • With a perspective camera model and known intrinsic camera parameters, some frame groups are:
       – 3 points
       – Trihedral vertex and a point (for scale)
       – Dihedral vertex and a point

     Adding constraints
     • Correspondences between image features and model features are not independent.
     • A small number of good correspondences yields a reliable pose estimate; the others must be consistent with this.
     • Generate hypotheses using small numbers of correspondences (e.g. triples of points for a calibrated perspective camera, etc.)
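To make the brute-force count concrete, here is a minimal sketch that evaluates the product M (M-1) … (M-N+1) for each of L models and compares it with the looser bound L·M^N. The function name and the sample values of L, M, and N are illustrative assumptions, not numbers from the lecture.

```python
from math import perm

def brute_force_hypotheses(num_models, image_points, model_points):
    """Count ordered assignments of distinct image points to the model points,
    summed over all models: L * M * (M-1) * ... * (M-N+1)."""
    return num_models * perm(image_points, model_points)

# Illustrative sizes (assumed): L = 3 models, M = 100 image points, N = 5 model points.
count = brute_force_hypotheses(num_models=3, image_points=100, model_points=5)
bound = 3 * 100 ** 5  # the O(L * M^N) bound quoted on the slide

print(f"exact count: {count:,}")
print(f"L * M^N:     {bound:,}")
```

Even at these small sizes the count is in the tens of billions, which is why the following slides turn to geometric constraints and frame groups.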

  3. Pose consistency / Alignment
     • Given known camera type in some unknown configuration (pose)
       – Hypothesize configuration from set of initial features
       – Backproject
       – Test

     Rendering an object into the image
     Perspective camera
     [Figure]

     Rendering an object into the image
     Affine camera. Rendering the ith 3-d point to its 2-d image position:
     $$p_i = \Pi A P_i$$
     General affine transformation and orthographic camera:
     $$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \Pi = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

     A frame group for an affine camera model
     Affine camera, with $p_i = \Pi A P_i$, $A$, and $\Pi$ as on the previous slide.
     Relating observed 2-d positions to 3-d model positions:
     $$\begin{pmatrix} p_{i0} \\ p_{i1} \end{pmatrix} = \begin{pmatrix} a_{00} P_{i0} + a_{01} P_{i1} + a_{02} P_{i2} + a_{03} P_{i3} \\ a_{10} P_{i0} + a_{11} P_{i1} + a_{12} P_{i2} + a_{13} P_{i3} \end{pmatrix}$$
     Need at least 4 points in general position to determine the affine camera parameters. (Note: only the first 2 rows of A contribute to the projection, so we only need to estimate them.)

     Alignment algorithm
     [Figure]
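The affine frame group can be sketched in code: four model-to-image correspondences give four linear equations per row of A, so the first two rows can be solved for directly and then used to backproject a further model point for testing. This is a minimal sketch under stated assumptions: the homogeneous coordinate P_i3 is taken to be 1, the helper names (`solve_linear`, `affine_camera_from_4_points`, `project`) and the sample points are hypothetical, and a plain Gaussian elimination stands in for whatever solver the lecture had in mind.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("points are not in general position")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def affine_camera_from_4_points(model_pts, image_pts):
    """Estimate the first two rows of A from four 3-d/2-d correspondences."""
    A = [[X, Y, Z, 1.0] for X, Y, Z in model_pts]      # homogeneous model points
    row0 = solve_linear(A, [u for u, _ in image_pts])  # a00..a03
    row1 = solve_linear(A, [v for _, v in image_pts])  # a10..a13
    return row0, row1

def project(rows, P):
    """Backproject a 3-d model point with the estimated two rows."""
    X, Y, Z = P
    return tuple(r[0] * X + r[1] * Y + r[2] * Z + r[3] for r in rows)

# Assumed ground truth, used only to synthesize image points for the example.
true_rows = ([0.8, -0.2, 0.1, 5.0], [0.3, 0.9, -0.4, -2.0])
model_pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]  # not coplanar
image_pts = [project(true_rows, P) for P in model_pts]

rows = affine_camera_from_4_points(model_pts, image_pts)
extra_point = (2.0, -1.0, 3.0)
print(project(rows, extra_point), project(true_rows, extra_point))
```

In an alignment loop, the backprojected position of each remaining model feature would be compared against the detected image features to accept or reject the hypothesis.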

  4. More than 1 object in image
     • Require same intrinsic camera parameters
       – Hypothesize and test for each object.

     Interpretation Trees
     • Tree of possible model-image feature assignments
     • Depth-first search
     • Prune when unary (binary, …) constraint violated
       – length
       – area
       – orientation
     • Nodes pair a model feature with an image feature: (a,1), (b,2), …
     • "Wild cards" handle spurious image features
     [A.M. Wallace. 1988.]
     http://faculty.washington.edu/cfolson/papers/pdf/icpr04.pdf

     • How does the hypothesize and test method fail?
       – False matches
       – Too many hypotheses to consider
     • To add robustness and efficiency, use other heuristics to select candidate object poses
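The interpretation-tree bullets can be illustrated with a small depth-first search. This sketch assumes point features and a single binary constraint (the distance between two matched image points must equal the distance between their model points); each tree level is one image feature, which is either paired with an unused model feature or given a wild card. The function name, the tolerance, and the test points are illustrative assumptions.

```python
from math import dist

def interpretation_tree(image_pts, model_pts, tol=0.01):
    """Return the largest consistent list of (image index, model index) pairs."""
    best = []

    def consistent(assign, i, m):
        # Binary constraint: pairwise distances must agree with the model.
        return all(
            abs(dist(image_pts[i], image_pts[j]) - dist(model_pts[m], model_pts[n])) <= tol
            for j, n in assign
        )

    def search(i, assign):
        nonlocal best
        # Prune: even matching every remaining image feature cannot beat best.
        if len(assign) + (len(image_pts) - i) <= len(best):
            return
        if i == len(image_pts):
            best = list(assign)
            return
        used = {m for _, m in assign}
        for m in range(len(model_pts)):
            if m not in used and consistent(assign, i, m):
                assign.append((i, m))
                search(i + 1, assign)
                assign.pop()
        search(i + 1, assign)  # wild card: treat image feature i as spurious

    search(0, [])
    return best

# Model: a 3-4-5 triangle. Image: the same triangle shifted, plus one spurious point.
model = [(0, 0), (3, 0), (0, 4)]
image = [(10, 10), (50, 50), (13, 10), (10, 14)]
matches = interpretation_tree(image, model)
print(matches)
```

Distance constraints alone cannot tell an object from its mirror image, so a real system would add further constraints (orientation, area) or verify the recovered pose.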

  5. Pose Clustering

     Pose clustering
     • Each model leads to many correct sets of correspondences, each of which has the same pose
     • Vote on object pose, in an accumulator array (per object)
     • This is a computer science approach to doing a more probabilistic thing: treating each set of feature observations as statistically independent and multiplying together their probabilities of occurrence to obtain a likelihood function.

     Two models used in an early pose clustering system
     [Figure]

     Pose clustering
     Problems
     – Clutter may lead to more votes than the target!
     – Difficult to pick the right bin size
     Confidence-weighted clustering
     – See where model frame group is reliable (visible!)
     – Downweight / discount votes from frame groups at poses where that frame group is unreliable…
     – Again, we can make this more precise in a probabilistic framework later.

     [Figure: Test image, with edge points marked]
     [Figure: pick feature pair; dark regions show reliable-pose-estimate views of those features over the viewing sphere]
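The voting idea can be sketched for a simplified case. The example below assumes 2-d point features and a rigid 2-d pose (rotation plus translation), so a pair of points acts as the frame group; every model pair is matched against every image pair of the same length, and each resulting pose votes in a quantized accumulator. The bin sizes, function names, and test data are assumptions chosen for illustration, and a `Counter` stands in for the accumulator array.

```python
from collections import Counter
from itertools import permutations
from math import atan2, cos, sin, pi, hypot

def pose_from_pair(m1, m2, p1, p2, length_tol=1e-6):
    """Rigid 2-d pose (theta, tx, ty) taking m1->p1 and m2->p2, or None."""
    len_m = hypot(m2[0] - m1[0], m2[1] - m1[1])
    len_p = hypot(p2[0] - p1[0], p2[1] - p1[1])
    if abs(len_m - len_p) > length_tol:
        return None  # a rigid motion preserves length
    theta = atan2(p2[1] - p1[1], p2[0] - p1[0]) - atan2(m2[1] - m1[1], m2[0] - m1[0])
    tx = p1[0] - (cos(theta) * m1[0] - sin(theta) * m1[1])
    ty = p1[1] - (sin(theta) * m1[0] + cos(theta) * m1[1])
    return theta % (2 * pi), tx, ty

def cluster_poses(model_pts, image_pts, angle_bin=pi / 18, trans_bin=1.0):
    """Accumulate one vote per consistent (model pair, image pair) hypothesis."""
    n_angle_bins = round(2 * pi / angle_bin)
    votes = Counter()
    for m1, m2 in permutations(model_pts, 2):
        for p1, p2 in permutations(image_pts, 2):
            pose = pose_from_pair(m1, m2, p1, p2)
            if pose is None:
                continue
            theta, tx, ty = pose
            key = (round(theta / angle_bin) % n_angle_bins,
                   round(tx / trans_bin), round(ty / trans_bin))
            votes[key] += 1
    return votes

# Model rotated by 90 degrees and shifted by (5, -2), plus two clutter points.
model = [(0, 0), (2, 0), (0, 1), (3, 2)]
image = [(5, -2), (5, 0), (4, -2), (3, 1), (20, 7), (-9, 13)]
votes = cluster_poses(model, image)
best_bin, best_votes = votes.most_common(1)[0]
print(best_bin, best_votes)
```

The hard quantization is exactly where the slide's "difficult to pick the right bin size" problem shows up: poses near a bin edge split their votes, and heavier clutter can fill a wrong bin.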

  6. [Figure: Image with edges of found models overlaid]
     [Figure: Detected airplanes, rerendered at their detected poses. (Note mis-estimated pose of plane on runway.)]

     A more recent pose/view clustering example
     • "Local feature view clustering for 3D object recognition", by David Lowe (see his web page for copy).
     • Schmid, Lowe incorporate "super-features", point features with robust local image descriptors

     Detecting 0.1% inliers among 99.9% outliers?
     • Example: David Lowe's SIFT-based Recognition system
     • Goal: recognize clusters of just 3 consistent features among 3000 feature match hypotheses
     • Approach
       – Vote for each potential match according to model ID and pose
       – Insert into multiple bins to allow for error in similarity approximation
       – Using a hash table instead of an array avoids need to form empty bins or predict array size
     [Lowe]

     Lowe's Model verification step
     • Examine all clusters with at least 3 features
     • Perform least-squares affine fit to model.
     • Discard outliers and perform top-down check for additional features.
     • Evaluate probability that match is correct
       – Use Bayesian model, with probability that features would arise by chance if object was not present
       – Takes account of object size in image, textured regions, model feature count in database, accuracy of fit (Lowe, CVPR 01)
     [Lowe]
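The hash-table voting step can be sketched as follows. Each match hypothesis is assumed to carry a model ID and a similarity pose (x, y, log2 scale, orientation in degrees); it is inserted into the two closest bins along each of the four pose dimensions, and any bin that collects at least 3 entries becomes a candidate cluster. The bin sizes, field layout, and sample matches are illustrative assumptions rather than values taken from the slides.

```python
from collections import defaultdict
from itertools import product
from math import floor

def two_closest_bins(value, bin_size):
    """Indices of the two bins whose centres lie nearest to value."""
    lower = floor(value / bin_size - 0.5)
    return (lower, lower + 1)

def vote(matches, loc_bin=32.0, scale_bin=1.0, orient_bin=30.0):
    """Insert each match into 2^4 = 16 bins of a hash table keyed by model and pose."""
    n_orient = round(360.0 / orient_bin)
    table = defaultdict(list)  # only bins that receive votes are ever created
    for match_id, (model_id, x, y, log_scale, orient) in enumerate(matches):
        choices = (
            two_closest_bins(x, loc_bin),
            two_closest_bins(y, loc_bin),
            two_closest_bins(log_scale, scale_bin),
            tuple(b % n_orient for b in two_closest_bins(orient % 360.0, orient_bin)),
        )
        for bins in product(*choices):
            table[(model_id,) + bins].append(match_id)
    return table

def candidate_clusters(table, min_votes=3):
    """Distinct sets of match ids that share a bin with at least min_votes entries."""
    return {frozenset(ids) for ids in table.values() if len(ids) >= min_votes}

# Three roughly consistent matches for "book" plus two outliers (assumed data).
matches = [
    ("book", 100.0, 50.0, 0.10, 40.0),
    ("book", 104.0, 52.0, 0.15, 43.0),
    ("book", 98.0, 49.0, 0.05, 38.0),
    ("mug", 400.0, 300.0, 2.00, 200.0),
    ("book", 10.0, 200.0, -1.50, 300.0),
]
clusters = candidate_clusters(vote(matches))
print(clusters)
```

Voting into neighbouring bins trades a larger table for tolerance to pose error near bin boundaries; each surviving cluster would then go on to the least-squares affine fit in the verification step.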

  7. Solution for affine parameters
     • Affine transform of [x,y] to [u,v]:
     • Rewrite to solve for transform parameters:
     [Lowe]

     Models for planar surfaces with SIFT keys:
     [Figure] [Lowe]

     3D Object Recognition
     • Extract outlines with background subtraction
     [Lowe]

     Planar recognition
     • Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
     • Affine fit approximates perspective projection
     • Only 3 points are needed for recognition
     [Lowe]

     3D Object Recognition
     • Only 3 keys are needed for recognition, so extra keys provide robustness
     • Affine model is no longer as accurate
     [Lowe]

     Recognition under occlusion
     [Figure] [Lowe]
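The equations on the "Solution for affine parameters" slide did not survive extraction, so the following is only a sketch of one standard way to carry out a least-squares affine fit. It assumes the transform is parameterized as u = m1·x + m2·y + tx and v = m3·x + m4·y + ty (the parameter names are an assumption), and it solves the normal equations for the u and v rows separately with a small Gaussian elimination.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("correspondences are degenerate (e.g. collinear)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_affine(model_xy, image_uv):
    """Least-squares affine fit; returns (m1, m2, m3, m4, tx, ty)."""
    if len(model_xy) < 3:
        raise ValueError("need at least 3 correspondences")
    normal = [[0.0] * 3 for _ in range(3)]  # sum of row^T row, with row = [x, y, 1]
    rhs_u = [0.0] * 3
    rhs_v = [0.0] * 3
    for (x, y), (u, v) in zip(model_xy, image_uv):
        row = (x, y, 1.0)
        for i in range(3):
            rhs_u[i] += row[i] * u
            rhs_v[i] += row[i] * v
            for j in range(3):
                normal[i][j] += row[i] * row[j]
    m1, m2, tx = solve_linear(normal, rhs_u)
    m3, m4, ty = solve_linear(normal, rhs_v)
    return (m1, m2, m3, m4, tx, ty)

# Assumed ground truth, used only to synthesize correspondences for the example.
true_params = (2.0, -1.0, 0.5, 3.0, 10.0, -4.0)
m1, m2, m3, m4, tx, ty = true_params
model_xy = [(0, 0), (1, 0), (0, 1), (2, 3)]
image_uv = [(m1 * x + m2 * y + tx, m3 * x + m4 * y + ty) for x, y in model_xy]

params = fit_affine(model_xy, image_uv)
print(params)
```

With exactly 3 non-collinear correspondences the fit is exact, which matches the slides' remark that only 3 keys are needed; extra keys overdetermine the system, and their residuals can be used to discard outliers.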
