EE 6882 Visual Search Engine
Lecture #6, Feb. 27, 2012


  1. Lecture #6 outline:
     • Object Search Using Local Features
     • Applications of Mobile Visual Search
     • Mid-Term Project

     Reading list:
     • Sivic, J. and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
     • Nister, D. and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR, 2006.
     • Chum, O., et al. Total recall: Automatic query expansion with a generative feature model for object retrieval. In ICCV, 2007.
     • Felix X. Yu, Rongrong Ji, Tongtao Zhang, Shih-Fu Chang. Active query sensing for mobile location search. In Proceedings of ACM International Conference on Multimedia (ACM MM), 2011.
     • Junfeng He, Tai-Hsu Lin, Jinyuan Feng, Shih-Fu Chang. Mobile product search with bag of hash bits. In Proceedings of ACM International Conference on Multimedia (ACM MM), demo paper, 2011.
     • Nokia. Nokia Point and Find. 2006. Available from: http://www.pointandfind.nokia.com.
     • Kooaba. Available from: http://www.kooaba.com.

  2. Local Appearance Descriptor (SIFT) [Lowe, ICCV 1999]
     • Compute the gradient in a local patch.
     • Build a histogram of oriented gradients over local grids, e.g., 4x4 grids and 8 directions -> 4x4x8 = 128 dimensions.
     • The descriptor is rotation-aligned and scale invariant.
     There are many other local features, e.g., SURF, HOG, BRIEF, MSER, STIP.

     Example of local feature matching: initial matches are found in descriptor space, then spatial consistency is required. (Slide credit: J. Sivic)
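The grid-of-orientation-histograms idea above can be sketched in a few lines. This is a simplified illustration, not Lowe's full SIFT (no keypoint detection, no rotation alignment, no trilinear interpolation); the function name and the plain 2D-list input format are assumptions for the example.

```python
import math

def grid_orientation_histogram(patch, grid=4, bins=8):
    """Sketch of a SIFT-like descriptor: histogram of gradient
    orientations over grid x grid cells -> grid*grid*bins dimensions.
    `patch` is a square 2D list of grayscale values (hypothetical input)."""
    n = len(patch)
    cell = n // grid
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]    # horizontal gradient
            dy = patch[y + 1][x] - patch[y - 1][x]    # vertical gradient
            mag = math.hypot(dx, dy)
            ang = math.atan2(dy, dx) % (2 * math.pi)  # orientation in [0, 2*pi)
            b = min(int(ang / (2 * math.pi) * bins), bins - 1)
            cy, cx = min(y // cell, grid - 1), min(x // cell, grid - 1)
            hist[(cy * grid + cx) * bins + b] += mag  # magnitude-weighted vote
    # L2-normalize so the descriptor is invariant to contrast changes
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

With the defaults this yields the 4x4x8 = 128-dimensional vector mentioned on the slide.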

  3. Application: Large Scale Mobile Visual Search (Ricoh HotPaper; mock-up by Mac Funamizu; digital video | multimedia lab)
     Mobile visual search pipeline:
     1. Take a picture.
     2.-3. Send the image or its extracted features to the server (e.g., via MMS).
     4. Perform feature matching with the database images.
     5. Send the results back to the phone.

  4. Application: particular object retrieval.
     Example I: visual search in feature films with a visually defined query, e.g., in "Groundhog Day" [Ramis, 1993]: "Find this clock", "Find this place". (Slide credit: J. Sivic)
     Example II: search photos on the web for particular places: find these landmarks ... in these images, and 1M more. (Slide credit: J. Sivic)

  5. Global vs. Local Feature Matching
     • Global: convert query and database images to global representations such as bags of words, then perform global matching.
     • Local: use each local feature as a query, search for matched local features in the database, rank images, then perform spatial verification.

     Outline of a local feature retrieval strategy:
     1. Compute affine covariant regions in each frame independently.
     2. "Represent" each region by an invariant descriptor vector.
     3. Finding corresponding regions is transformed into finding nearest-neighbour vectors.
     4. Rank retrieved frames by the number of corresponding regions.
     5. Verify retrieved frames based on spatial consistency.
     (Slide credit: J. Sivic)

  6. Bottleneck: nearest-neighbour matching over a gigantic database.
     Solve the following problem for every feature vector x_j in the query image:
         NN(x_j) = argmin_i || x_j - x_i ||
     where the x_i are the features in the database images.
     Nearest-neighbour matching is the major computational bottleneck:
     • Linear search performs d*n operations for n features in the database and d dimensions.
     • n may be as high as billions.
     • No exact methods are faster than linear search for d > 10.
     • Explore approximate methods (e.g., tree-based indexing).
     (Slide credit: J. Sivic)

     K-d tree construction (simple 2D example): the points are split recursively by axis-aligned lines l_1, l_2, ..., producing a binary tree whose nodes partition the space. (Slide credit: Anna Atramentov)
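The k-d tree construction and exact query with backtracking can be sketched as follows. This is a minimal illustration of the technique in the 2D example, not an optimized implementation; the dict-based node format is an assumption of this sketch.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Exact NN: descend to the query's bin, then backtrack into the far
    subtree whenever the splitting plane is closer than the best so far."""
    if node is None:
        return best
    dist = math.dist(query, node["point"])
    if best is None or dist < best[0]:
        best = (dist, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if abs(diff) < best[0]:   # plane closer than current best -> must backtrack
        best = nearest(far, query, best)
    return best
```

The backtracking step in `nearest` is exactly the exponential cost in high dimensions that the next slide addresses with approximate search.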

  7. K-d tree query (simple 2D example): descend the tree to the bin containing the query point q, then backtrack into neighbouring bins that may contain a closer point. (Slide credit: Anna Atramentov)

     Approximate nearest-neighbour k-d tree search. Issues:
     • Backtracking is needed to find the exact NN.
     • The cost grows exponentially as the dimension grows.
     • Remedy: limit the number of neighbouring bins to explore, and search the k-d tree bins in order of their distance from the query.
     (Slide credit: J. Sivic)
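The remedy described above (explore bins in order of distance from the query, with a cap on the number of bins visited) can be sketched with a priority queue. This is a simplified best-bin-first-style search, assuming the dict-based node format {point, axis, left, right} used in the earlier construction sketch; the function name and `max_checks` parameter are assumptions of this example.

```python
import heapq
import itertools
import math

def approx_nearest(node, query, max_checks=3):
    """Approximate NN: pop subtrees in order of their splitting-plane
    distance from the query, stopping after max_checks node visits
    instead of doing full backtracking."""
    counter = itertools.count()              # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(counter), node)]
    best = (math.inf, None)
    checks = 0
    while heap and checks < max_checks:
        plane_dist, _, n = heapq.heappop(heap)
        if n is None or plane_dist >= best[0]:
            continue                          # bin cannot beat the current best
        checks += 1
        d = math.dist(query, n["point"])
        if d < best[0]:
            best = (d, n["point"])
        diff = query[n["axis"]] - n["point"][n["axis"]]
        near, far = (n["left"], n["right"]) if diff < 0 else (n["right"], n["left"])
        heapq.heappush(heap, (0.0, next(counter), near))       # near side: enter for free
        heapq.heappush(heap, (abs(diff), next(counter), far))  # far side: pay plane distance
    return best
```

With a small `max_checks` the answer may be only approximately nearest, which is the accepted trade-off at this scale.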

  8. Alternative method: mapping local features to visual words. The 128-D feature space is clustered into a visual word vocabulary.

     Visual words, main idea: extract local features from a number of images; e.g., in SIFT descriptor space, each point is 128-dimensional. (Slide credit: D. Nister; K. Grauman, B. Leibe)

  9.-11. (Visual words, main idea, continued: illustrations of descriptor points being grouped into visual words. Slide credit: D. Nister; K. Grauman, B. Leibe)

  12. Visual words correspond to image patch patterns: corners, blobs, eyes, letters. [Sivic and Zisserman, "Video Google", 2006]

     Inverted file index for images comprised of visual words: for each word number, store the list of image numbers containing it.
     • Score each image by the number of common visual words (tentative correspondences).
     • But: this does not take into account the spatial layout of the regions.
     (Image credit: A. Zisserman; slide credit: J. Sivic; K. Grauman, B. Leibe)
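The inverted file and the common-word scoring above can be sketched as follows. This is a minimal illustration; the function names and the word-id/image-id data layout are assumptions of the example.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    """Map each visual word id to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def score_query(index, query_words):
    """Score each image by the number of query words it shares
    (tentative correspondences), ranked best-first."""
    scores = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            scores[image_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Note that, exactly as the slide warns, this score ignores the spatial layout of the regions; spatial verification comes later.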

  13. How to create visual words? Clustering / quantization methods:
     • k-means (the typical choice), agglomerative clustering, mean-shift, ...
     • Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies.
     • Vocabulary tree [Nister & Stewenius, CVPR 2006].
     (Slide credit: J. Sivic; K. Grauman, B. Leibe)

     Quantization using k-means. Overview:
     1. Initialize the cluster centres.
     2. Iterate:
        a. Find the nearest cluster centre for each data point (slow: O(N*K)).
        b. Re-compute each cluster centre as the centroid of its points.
     • K-means provably locally minimizes the sum of squared errors (SSE) between cluster centres and their points.
     • But: the quantizer depends on the initialization, and the nearest-neighbour search is the bottleneck.
     (Slide credit: J. Sivic)
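The two-step iteration above can be sketched directly; the inner loop makes the O(N*K) assignment cost visible. A minimal sketch, assuming 2D tuples as data points and a fixed iteration count rather than a convergence test:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to the nearest centre
    (the slow O(N*K) step), then recompute centres as cluster means."""
    rng = random.Random(seed)                  # fixed seed: result depends on init
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # nearest-centre search: N x K distances
            j = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[j].append(p)
        for i, cl in enumerate(clusters):      # centroid update
            if cl:
                centres[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centres
```

In the visual-word setting the points would be 128-D descriptors and K could be in the tens of thousands, which is why the assignment step needs the approximate search discussed next.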

  14. Approximate k-means: use approximate nearest-neighbour search (a randomized forest of k-d trees) to determine the closest cluster centre for each data point.
     • Original k-means complexity: O(N*K).
     • Approximate k-means complexity: O(N log K).
     • Can be scaled to very large K.
     (Slide credit: J. Sivic)

  15. Example: recognition with a vocabulary tree. Tree construction: the descriptor space is clustered hierarchically, k-means at each level. [Nister & Stewenius, CVPR'06] (Slide credit: David Nister; K. Grauman, B. Leibe)

     Vocabulary tree training, filling the tree: each database descriptor is pushed down the tree to its leaf. [Nister & Stewenius, CVPR'06] (Slide credit: David Nister; K. Grauman, B. Leibe)

  16.-17. (Vocabulary tree training, filling the tree, continued. [Nister & Stewenius, CVPR'06]; slide credit: David Nister; K. Grauman, B. Leibe)

  18. Vocabulary tree recognition, with verification based on spatial layout. [Nister & Stewenius, CVPR'06] (Slide credit: David Nister; K. Grauman, B. Leibe)

     The vocabulary tree can also be used to score images efficiently. Let q be the query's weighted visual-word vector and d_i the vector of database image i; rank the images by the distance between the normalized vectors,
         s(q, d_i) = || q / ||q|| - d_i / ||d_i|| ||,
     with the scores updated incrementally for every query visual word.
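The normalized-distance scoring above can be sketched with sparse dictionaries over visual words. A minimal sketch using the L1 norm (one of the norms evaluated by Nister & Stewenius); the function name and the counts/idf dictionary inputs are assumptions of this example.

```python
def voctree_score(query_counts, db_counts, idf):
    """Score a database image against a query: build tf-idf vectors over
    visual words, L1-normalize them, and return the L1 distance between
    them (lower = better match). `idf` maps word id -> weight."""
    def normalized(counts):
        vec = {w: n * idf.get(w, 0.0) for w, n in counts.items()}
        s = sum(abs(v) for v in vec.values()) or 1.0
        return {w: v / s for w, v in vec.items()}
    q, d = normalized(query_counts), normalized(db_counts)
    # ||q - d||_1, computed sparsely over the union of occurring words
    return sum(abs(q.get(w, 0.0) - d.get(w, 0.0)) for w in set(q) | set(d))
```

Identical word histograms score 0 and fully disjoint ones score 2, so the value can be updated incrementally word by word as the query descriptors come down the tree.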

  19. Vocabulary tree performance: evaluated on large databases, with indexing of up to 1M images; online recognition for a database of 50,000 CD covers, with retrieval in ~1 s. It is found experimentally that large vocabularies can be beneficial for recognition. [Nister & Stewenius, CVPR'06] (Slide credit: J. Sivic; K. Grauman, B. Leibe)

     Beyond bag of words: use the position and shape of the underlying features to improve retrieval quality. Both images have many matches; which is correct? (Slide credit: J. Sivic)

  20. Beyond bag of words: we can measure the spatial consistency between the query and each result to improve retrieval quality. Many spatially consistent matches indicate a correct result; few spatially consistent matches indicate an incorrect result. (Slide credit: J. Sivic)

     Extra bonus: spatial consistency also gives the localization of the object. (Slide credit: J. Sivic)

  21. Spatial verification:
     • Check consistency of the relative distance (shift).
     • Check consistency of the scale change.
     • Check consistency of the transformation (RANSAC).

     Feature-space outlier rejection: can we now compute the homography H from the (blue) tentatively matched points? No! There are still too many outliers. What can we do? (Slide credit: A. Efros)
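The first check above (consistency of the relative shift) makes a compact RANSAC illustration: hypothesise the shift implied by one correspondence, then count how many other matches agree. A minimal sketch with a pure-translation model rather than a full homography; the function name and the list-of-pairs match format are assumptions of this example.

```python
import math
import random

def ransac_shift(matches, tol=1.0, iters=100, seed=0):
    """RANSAC with a translation model: each iteration samples one
    correspondence, derives the shift it implies, and keeps the largest
    set of matches consistent with that shift.
    `matches` is a list of ((qx, qy), (dx, dy)) correspondences."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (qx, qy), (dx, dy) = rng.choice(matches)   # minimal sample: 1 match
        tx, ty = dx - qx, dy - qy                  # hypothesised shift
        inliers = [m for m in matches
                   if math.hypot(m[1][0] - m[0][0] - tx,
                                 m[1][1] - m[0][1] - ty) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

A full spatial verification would fit a similarity or homography model from larger minimal samples, but the outlier-rejection logic is the same.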
