
Visual Recognition and Search: overview lecture on recognition models (January 25, 2008)


  1. Today
• Some logistics
• Overview lecture on recognition models
• Discussion of bag-of-words and constellation model approaches

Visual Recognition and Search (January 25, 2008)

Schedule
• For the Feb 1 and Feb 8 presenters: materials are due by the upcoming Wednesday and Friday

Demo guidelines
• Implement/download code for a core idea in the paper and show us toy examples:
  – Experiment with different types of (mini) training/testing data sets
  – Evaluate sensitivity to parameter settings
  – Show (on a small scale) an example in practice that highlights a strength/weakness of the approach
• Aim for an illustrative example, not a full system

Timetable for presenters
• By the Wednesday the week before: email slides to me, schedule a time to meet
• Week of: refine slides, practice the presentation, know roughly how long each part requires
• Day of: send me the final slides as a PDF file

Demo presentation format
• Give the algorithm and the relevant technical details
• Describe the scope of the experiments and discuss
• Present the experiments and explain the rationale for the outcomes
• Conclude with a summary of the take-away messages

  2. Reviews
• Submit one review per week unless you are presenting (but read all assigned papers)
• Evaluation scale:
  – 0: none
  – 1 ("check –"): little effort/reflection
  – 2 ("check"): good review
  – 3 ("check +"): very good review

Possible levels of recognition
• Categories (e.g., building, butterfly)
• Specific objects (e.g., Tower Bridge, Bevo)
• Wild card: functional

Recognition questions
• How to represent a category or object
• How to perform the recognition (classification, detection) with that representation
• How to learn models and new categories/objects

Representations
• Model-based
• Appearance-based
• Parts + structure
• Multi-view
• Bag of features

Learning
• What defines a category/class?
• What distinguishes classes from one another?
• How to understand the connection between the real world and what we observe?
• What features are most informative?
• What can we do without human intervention?
• Does previous learning experience help learn the next category?

  3. Learning situations
• Varying levels of supervision:
  – Unsupervised
  – Image labels (e.g., "contains a motorbike")
  – Object centroid/bounding box
  – Segmented object
  – Manual correspondence (typically sub-optimal)

Inputs/outputs/assumptions
• What input is available?
  – Static grayscale image
  – 3D range data
  – Video sequence
  – Multiple calibrated cameras
  – Segmented or unsegmented data
  – CAD model
  – Labeled, unlabeled, or partially labeled data
• What is the goal?
  – Say yes/no as to whether an object is present in the image
  – Determine the pose of an object, e.g., for a robot to grasp it
  – Categorize all objects
  – Forced choice from a pool of categories
  – Bounding box on the object
  – Full segmentation
  – Build a model of an object category

Outline
• Overview of recognition background
  – Model-based
  – Appearance-based
  – Local feature-based
• Features and interest operators
• Bags of words
• Constellation models/part-based models

Model-based recognition
• Which image features correspond to which features on which object model in the "modelbase"?
• If enough features match, and they match well under a particular transformation for a given camera model, then:
  – Identify the object as being there
  – Estimate its pose relative to the camera

  4. Hypothesize and test: main idea
• Given a model of the object
• New image: hypothesize object identity and pose
• Render the object with the camera model
• Compare the rendering to the actual image: if they are close, it is a good hypothesis

How to form a hypothesis?
• Given a particular model object, we can estimate the correspondences between image and model features
• Use the correspondences to estimate the camera pose relative to the object coordinate frame

Brute force hypothesis generation
• For every possible model, try every possible subset of image points as matches for that model's points
• Say we have L objects, each with P features, and N features found in the image: the number of candidate matchings quickly becomes intractable

Generating hypotheses
• We want a good correspondence between model features and image features
• Brute force? Better:
  – Prune the search via geometric or relational constraints (interpretation tree)
  – Pose consistency: use small subsets of features to estimate a larger correspondence
  – Voting, pose clustering

Pose consistency / alignment
• Key idea: if we find good correspondences for a small set of features, it is easy to obtain correspondences for a much larger set
• Strategy:
  – Generate hypotheses using small numbers of correspondences (how many depends on the camera type)
  – Backproject: transform all model features into image features
  – Verify
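The gap between brute-force matching and small-subset alignment can be made concrete with a quick back-of-the-envelope count. The values of L, P, and N below are hypothetical, chosen only for illustration:

```python
from math import perm

# Hypothetical sizes (not from the lecture): L models with P features
# each, and N features detected in the image.
L, P, N = 10, 30, 100

# Brute force: for each model, try every ordered assignment of P image
# features to that model's P features.
brute_force = L * perm(N, P)

# Alignment: a 2d affine pose is fixed by 3 correspondences, so we only
# enumerate ordered triples of image features against model triples.
alignment = L * perm(N, 3) * perm(P, 3)

print(f"brute force: {brute_force:.2e} hypotheses")
print(f"3-pt align:  {alignment:.2e} hypotheses")
```

Even before pruning or voting, generating hypotheses from minimal correspondence sets cuts the search by dozens of orders of magnitude.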

  5. 2d affine mappings
• Say the camera is looking down perpendicularly on a planar surface
• We then have two coordinate systems (object and image), related by some affine mapping (rotation, scale, translation, shear)
• In non-homogeneous coordinates, an image point (u, v) relates to a model point (x, y) by

    [u]   [m1 m2] [x]   [tx]
    [v] = [m3 m4] [y] + [ty]

  where the 2x2 matrix captures scale, rotation, and shear, and (tx, ty) is the translation

Solving for the transformation parameters
• Given detected matches, e.g.
    P1: model [200, 100] <-> image [100, 60]
    P2: model [300, 200] <-> image [380, 120]
  rewrite the mapping in terms of the unknown parameters m1..m4, tx, ty
• Each correspondence gives two linear equations in the six unknowns, so 3+ correspondences determine the affine transformation

Alignment: backprojection
• Having solved for this transformation from some number of detected matches (3+ here), we can compute the (hypothesized) location of any other model point in image space

Alignment: verification
• Given the backprojected model in the image:
  – Check whether image edges coincide with the predicted model edges
  – May be more robust if we also require the edges to have the same orientation
  – Consider texture in the corresponding regions?

Similar ideas for camera models (3d -> 2d)
• Perspective camera: p_im = M P_w, with

    x_im = (M1 · P_w) / (M3 · P_w)
    y_im = (M2 · P_w) / (M3 · P_w)

  where M1, M2, M3 are the rows of the projection matrix M, in image and model coordinates respectively
• Simpler calibration is possible with simpler camera models
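A minimal sketch of the solve-then-backproject step, using synthetic correspondences generated from an assumed ground-truth affine map (the transform and the point coordinates below are hypothetical, not the slide's worked example):

```python
import numpy as np

# Assumed ground-truth 2x2 linear part and translation, used only to
# generate synthetic correspondences for the sketch.
M_true = np.array([[1.2, -0.3], [0.4, 0.9]])
t_true = np.array([15.0, -8.0])

model_pts = np.array([[200.0, 100.0], [300.0, 200.0], [250.0, 50.0]])
image_pts = model_pts @ M_true.T + t_true

# Each correspondence gives two linear equations in the six unknowns
# (m1, m2, tx, m3, m4, ty); three non-collinear matches determine them.
A, b = [], []
for (x, y), (u, v) in zip(model_pts, image_pts):
    A.append([x, y, 1, 0, 0, 0]); b.append(u)
    A.append([0, 0, 0, x, y, 1]); b.append(v)
p, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)

M_est = np.array([[p[0], p[1]], [p[3], p[4]]])
t_est = np.array([p[2], p[5]])

# Backprojection: predict where a *new* model point should appear in
# the image, to be checked against actual image evidence.
new_model_pt = np.array([180.0, 220.0])
predicted = M_est @ new_model_pt + t_est
print(predicted)
```

With noisy real detections, least squares over more than three correspondences plays the same role, and the predicted locations feed the verification step.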

  6. Alignment: verification
• Edge-based verification can be brittle

Pose clustering (voting)
• Narrow down the number of hypotheses to verify: identify those model poses that a lot of features agree on
  – Use each group's correspondences to estimate an object and pose
  – Vote for that object pose in an accumulator array (one array per object if we have multiple models)
(Computer Vision: A Modern Approach; slide by D.A. Forsyth)

Application: surgery planning
• To minimize damage by operation planning
• To reduce the number of operations by planning the surgery
• To remove only the affected tissue
• Problem:
  – Ensure that the model with the planned operations and the information about the affected tissue lines up with the patient
  – Display model information superimposed on the view of the patient
  – Big issue: coordinate alignment, as above

Segmentation
• (Figures: regions used to break a single MRI slice into model regions; segmented slices assembled into a 3d model. Figures by kind permission of Eric Grimson; http://www.ai.mit.edu/people/welg/welg.html)
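A toy sketch of pose clustering, where each small correspondence group yields a pose hypothesis and a coarse accumulator array collects votes. The pose parameterization (tx, ty, scale), noise levels, and bin sizes here are all assumptions for illustration:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Hypothetical pose hypotheses: 40 noisy estimates around one true
# pose, plus 20 outliers from wrong correspondence groups.
true_pose = np.array([105.0, 55.0, 1.6])          # (tx, ty, scale)
inliers = true_pose + rng.normal(0.0, [1.0, 1.0, 0.05], size=(40, 3))
outliers = rng.uniform([0, 0, 0.5], [300, 300, 3.0], size=(20, 3))
hypotheses = np.vstack([inliers, outliers])

# Accumulator array: quantize each hypothesis into a coarse pose bin
# and vote; poses many features agree on collect many votes.
bin_size = np.array([10.0, 10.0, 0.25])
bins = np.floor(hypotheses / bin_size).astype(int)
accumulator = Counter(map(tuple, bins))

best_bin, votes = accumulator.most_common(1)[0]
print(best_bin, votes)  # expect the bin around the true pose to win
```

Only the few poses in the winning bins then go through the expensive verification step, rather than every raw hypothesis.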

  7. (Figure: patient with the model superimposed. Note that the view of the model is registered to the patient's pose here. Figures by kind permission of Eric Grimson; http://www.ai.mit.edu/people/welg/welg.html)

Summary: model-based recognition
• Hypothesize and test: look for the object and pose that fit well with the image
  – Use good correspondences to designate hypotheses
  – Limit the verifications performed by voting
• Requires a model for the specific objects
  – Searching a modelbase
  – Registration tasks
• Requires camera model selection

Outline
• Overview of recognition background
  – Model-based
  – Appearance-based
  – Local feature-based
• Features and interest operators
• Bags of words
• Constellation models

Limits of model-based recognition?

  8. Global measure of appearance
• Represent the whole image by a single descriptor, e.g.:
  – Vector of pixel intensities
  – Grayscale / color histogram
  – Bank of responses to linear filters
(Slide credit: Stan Sclaroff, http://www.ai.mit.edu/courses/6.801/Fall2002/lect/lect24.pdf)

Learning with global representations
• In addition to sorting images based on nearness in feature space, we can learn classifiers over these global descriptors
(Slide credit: David Forsyth)
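A toy sketch of the first idea, grayscale histograms as a global appearance descriptor compared by nearness in feature space. The synthetic "images" below are stand-ins, not data from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def gray_histogram(img, bins=16):
    """Normalized intensity histogram: one global descriptor per image."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

# Two dark images and one bright image (uint8-style intensities).
dark_a = rng.integers(0, 80, size=(32, 32))
dark_b = rng.integers(0, 80, size=(32, 32))
bright = rng.integers(180, 256, size=(32, 32))

query = gray_histogram(dark_a)
# L1 distance between histograms; smaller means more similar appearance.
d_dark = np.abs(query - gray_histogram(dark_b)).sum()
d_bright = np.abs(query - gray_histogram(bright)).sum()
print(d_dark < d_bright)  # True: the dark images look globally alike
```

Histograms discard all spatial layout, which is exactly the strength (pose invariance) and the weakness (no notion of parts or structure) of global appearance measures.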
