Instance recognition, Thurs Oct 29 (10/28/2015)


  1. Last time
  • Depth from stereo: main idea is to triangulate from corresponding image points.
  • Epipolar geometry defined by two cameras
    – We’ve assumed known extrinsic parameters relating their poses
  • Epipolar constraint limits where points from one view will be imaged in the other
    – Makes search for correspondences quicker
  • To estimate depth
    – Limit search by epipolar constraint
    – Compute correspondences, incorporate matching preferences

  2. Virtual viewpoint video
  C. L. Zitnick et al., High-quality video view interpolation using a layered representation, SIGGRAPH 2004.
  http://research.microsoft.com/IVM/VVV/

  3. Review questions
  • What stereo rig yielded these epipolar lines? The epipole has the same coordinates in both images (e = e’), and points move along lines radiating from e: the “focus of expansion”. (Figure from Hartley & Zisserman)
  • When solving for stereo, when is it necessary to break the soft disparity gradient constraint?
  • What can cause a disparity value to be undefined?
  • What parameters relating the two cameras in the stereo rig must be known (or inferred) to compute depth?

  4. Today
  • Instance recognition
    – Indexing local features efficiently
    – Spatial verification models
  Recognizing or retrieving specific objects
  Example I: Visual search in feature films with a visually defined query (“Find this clock”, “Find this place”) in “Groundhog Day” [Ramis, 1993]. (Slide credit: J. Sivic)

  5. Recognizing or retrieving specific objects
  Example II: Search photos on the web for particular places: find these landmarks ...in these images and 1M more. (Slide credit: J. Sivic)

  6. Recall: matching local features
  • To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD).
  • Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance).
  Multi-view matching: matching two given views for depth vs. searching for a matching view for recognition.
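The "compare them all" baseline above can be sketched in a few lines. This is a minimal illustration, not the course's reference code; the function names and the tiny 2-D "descriptors" are hypothetical stand-ins for real 128-D SIFT vectors.

```python
def ssd(a, b):
    # Sum of squared differences between two descriptor vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_features(desc1, desc2, max_dist=None):
    # For each descriptor in image 1, take the single closest descriptor
    # in image 2; optionally reject matches above a distance threshold.
    matches = []
    for i, d1 in enumerate(desc1):
        j, d = min(((j, ssd(d1, d2)) for j, d2 in enumerate(desc2)),
                   key=lambda t: t[1])
        if max_dist is None or d <= max_dist:
            matches.append((i, j, d))
    return matches
```

This brute-force search is O(nm) descriptor comparisons, which is exactly the cost the indexing structures discussed next are meant to avoid.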

  7. Indexing local features
  • Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).

  8. Indexing local features
  • When we see close points in feature space, we have similar descriptors, which indicates similar local content (query image vs. database images in the descriptor’s feature space).
  • With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image?
  • Possible solutions:
    – Inverted file
    – Nearest neighbor data structures (kd-trees, hashing)
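To make the kd-tree option concrete, here is a minimal pure-Python sketch (assumed structure, not the lecture's code): build a tree by splitting on alternating axes, then answer nearest-neighbor queries while pruning branches that cannot beat the best distance found so far.

```python
import math

def build_kdtree(points, depth=0):
    # Recursively split the point set on alternating coordinate axes.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    # Depth-first search; best is a (point, distance) pair.
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[1]:
        best = (node["point"], d)
    diff = query[node["axis"]] - node["point"][node["axis"]]
    close, away = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(close, query, best)
    if abs(diff) < best[1]:  # hypersphere crosses the splitting plane
        best = nearest(away, query, best)
    return best
```

In practice the pruning loses effectiveness in high dimensions (128-D SIFT), which is why approximate variants and hashing are used instead of exact kd-tree search.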

  9. Indexing local features: inverted file index
  • For text documents, an efficient way to find all pages on which a word occurs is to use an index.
  • We want to find all images in which a feature occurs.
  • To use this idea, we’ll need to map our features to “visual words”.
  Visual words
  • Map high-dimensional descriptors to tokens/words by quantizing the feature space.
  • Quantize via clustering; let cluster centers be the prototype “words”.
  • Determine which word to assign to each new image region by finding the closest cluster center.
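The two steps above (cluster to get prototype words, then assign each new descriptor to its closest center) can be sketched as follows. This is a toy illustration with 2-D points standing in for SIFT descriptors, and a bare-bones Lloyd's k-means rather than the large-scale clustering used in practice; all names are hypothetical.

```python
import random

def quantize(descriptor, centers):
    # Assign a descriptor to the visual word whose cluster center is closest.
    best, best_d = None, float("inf")
    for word_id, c in enumerate(centers):
        d = sum((x - y) ** 2 for x, y in zip(descriptor, c))
        if d < best_d:
            best, best_d = word_id, d
    return best

def kmeans(points, k, iters=20, seed=0):
    # Lloyd's algorithm: alternate assignment and mean update.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[quantize(p, centers)].append(p)
        centers = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

The returned word id is what gets stored in the inverted file; the descriptor itself is no longer needed once quantized.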

  10. Visual words: main idea
  • Extract some local features from a number of images, e.g., SIFT descriptors: each point in descriptor space is 128-dimensional. (Slide credit: D. Nister, CVPR 2006)

  11. Visual words: main idea (continued; descriptor-space figures)

  12. Each point is a local descriptor, e.g. a SIFT vector.

  13. Visual words
  • Example: each group of patches belongs to the same visual word. (Figure from Sivic & Zisserman, ICCV 2003)
  • Also used for describing scenes and object categories for the sake of indexing or classification. (Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others)

  14. Visual words and textons
  • First explored for texture and material representations.
  • Texton = cluster center of filter responses over a collection of images.
  • Describe textures and materials based on the distribution of prototypical texture elements. (Leung & Malik 1999; Varma & Zisserman 2002)
  Recall: texture representation example. Statistics summarize patterns in small windows, e.g. mean gradients per window:

    Window   mean d/dx   mean d/dy
    #1       4           10
    #2       18          7
    ...
    #9       20          20

  Plotting each window in this space (dimension 1: mean d/dx value; dimension 2: mean d/dy value) separates windows with primarily vertical edges, primarily horizontal edges, both edge types, and small gradient in both directions.

  15. Visual vocabulary formation
  Issues:
  • Sampling strategy: where to extract features?
  • Clustering / quantization algorithm
  • Unsupervised vs. supervised
  • What corpus provides features (universal vocabulary?)
  • Vocabulary size, number of words
  Inverted file index
  • Database images are loaded into the index, mapping words to image numbers.

  16. Inverted file index
  • A new query image is mapped to the indices of database images that share a word. When will this give us a significant gain in efficiency?
  Instance recognition: remaining issues
  • How to summarize the content of an entire image? And gauge overall similarity?
  • How large should the vocabulary be? How to perform quantization efficiently?
  • Is having the same set of visual words enough to identify the object/scene? How to verify spatial agreement?
  • How to score the retrieval results?
  (Kristen Grauman)
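The inverted file described above can be sketched directly: map each visual word id to the set of database images containing it, then retrieve only images sharing at least one word with the query. The image ids and word ids below are illustrative, not from the lecture.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    # image_words: {image_id: list of visual word ids occurring in that image}
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidate_images(index, query_words):
    # Only images sharing at least one word with the query are candidates;
    # everything else is never touched, which is where the efficiency comes from.
    hits = set()
    for w in query_words:
        hits |= index.get(w, set())
    return hits
```

This answers the slide's question about when the gain is significant: the lookup helps when each word occurs in few images, i.e. when the vocabulary is large relative to the database, so the candidate set stays small.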

  17. Analogy to documents
  Two example documents, one on visual perception (the Hubel & Wiesel passage on how the retinal image undergoes a step-wise analysis in a system of nerve cells) and one on China's trade surplus, each reduce to a bag of their salient words: {sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel} vs. {China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value}. In the same way, an object image reduces to a bag of visual ‘words’. (ICCV 2005 short course, L. Fei-Fei)

  18. Bags of visual words
  • Summarize an entire image based on its distribution (histogram) of visual word occurrences.
  • Analogous to the bag of words representation commonly used for documents.
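Building that histogram is a one-liner once every local feature has been quantized to a word id; a minimal sketch (function name and toy word ids are hypothetical):

```python
from collections import Counter

def bow_histogram(word_ids, vocab_size):
    # Count occurrences of each visual word; the whole image becomes
    # a single vocab_size-dimensional vector, regardless of how many
    # local features it contained.
    counts = Counter(word_ids)
    return [counts.get(w, 0) for w in range(vocab_size)]
```

Note that all spatial layout is discarded at this point, which is why the lecture's later "spatial verification" step is needed.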

  19. Comparing bags of words
  • Rank frames by the normalized scalar product between their (possibly weighted) occurrence counts: a nearest neighbor search for similar images. Example: query histogram q = [1 8 1 4] vs. database histogram d_j = [5 1 1 0].
  • For a vocabulary of V words:

    sim(d_j, q) = (d_j · q) / (||d_j|| ||q||)
                = Σ_{i=1..V} d_j(i) q(i) / ( sqrt(Σ_{i=1..V} d_j(i)²) · sqrt(Σ_{i=1..V} q(i)²) )

  tf-idf weighting
  • Term frequency – inverse document frequency: describe a frame by the frequency of each word within it, downweighting words that appear often in the database. (Standard weighting for text retrieval.)
  • Weight for word i in document d:

    t_{i,d} = (n_{id} / n_d) · log(N / n_i)

    where n_{id} = number of occurrences of word i in document d, n_d = number of words in document d, n_i = number of documents in the whole database that word i occurs in, and N = total number of documents in the database.
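The scoring pipeline on this slide, tf-idf reweighting followed by a normalized scalar product, can be sketched as below. This is an illustrative implementation of the standard formulas, not code from the course; histograms are plain Python lists.

```python
import math

def tf_idf(histograms):
    # histograms: one raw word-count vector per database image.
    # Applies t_{i,d} = (n_id / n_d) * log(N / n_i) per word and document.
    N = len(histograms)
    V = len(histograms[0])
    n_i = [sum(1 for d in histograms if d[i] > 0) for i in range(V)]
    weighted = []
    for d in histograms:
        n_d = sum(d)
        weighted.append([(d[i] / n_d) * math.log(N / n_i[i]) if n_i[i] else 0.0
                         for i in range(V)])
    return weighted

def cosine_sim(q, d):
    # Normalized scalar product between query and database vectors.
    dot = sum(x * y for x, y in zip(q, d))
    nq = math.sqrt(sum(x * x for x in q))
    nd = math.sqrt(sum(x * x for x in d))
    return dot / (nq * nd) if nq and nd else 0.0
```

The log term zeroes out words that occur in every database image, matching the slide's goal of downweighting uninformative, frequent words.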
