
EE 6882 Visual Search Engine Lec. 1: Introduction



  1. EE 6882 Visual Search Engine, Lec. 1: Introduction (Jan. 23, 2012)

     Demos:
     • Photo copy search: TinEye
     • Web image search: Google Image
     • Mobile search: Google Goggles

     Topics of Interest
     • How is visual information represented?
     • How are images matched? How to handle distortion and occlusion?
     • How to handle a gigantic database? 36 billion photos are uploaded to Facebook per year.
     • Is semantic image tagging possible? How to combine multimodal information?
     • How to design search interfaces for multimedia, for different purposes: information, entertainment, networking?
     • How to present multimedia search results? Summarization and augmented reality.

  2. Visual Information Generation
     Illumination, scene, sensing device, image.

     Visual Representation and Features
     [Figure: camera imaging pipeline with labeled stages: image irradiance, lens, filter (mosaic of R, G, B samples), CCD sensor, additive noise, demosaicking, camera response function, DSP (white balance, contrast enhancement, etc.), image intensity.]
     S.-F. Chang, Columbia U.

  3. Image quality not always perfect
     Image quality variations (Navteq NYC Data):
     • Exposure
     • Shadow
     • Distance
     • Obstruction
     • Blur
     • Weather
     • Day/Night
     digital video | multimedia lab

     Visual Representation: Global Features
     • Texture: energy in filter banks
     • Shape: http://www.cs.princeton.edu/gfx/proj/shape/
     • Color
     [Figure: plot with vertical axis from 0 to 1 and horizontal axis from 0 to 100.]

  4. Local Features: Keypoint Localization
     • Keypoint properties:
       – Interesting content
       – Precise localization
       – Repeatable detection under variations of scale, rotation, etc.
     (Slide of K. Grauman)

     Example: Hessian Detector [Beaudet78]
     • Hessian determinant:
       $$\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$$
       $$\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$$
     • In Matlab: I_xx .* I_yy - (I_xy).^2
     (Slide of K. Grauman)
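
The Hessian determinant above can be sketched in NumPy as follows. This is a minimal illustration, assuming finite-difference derivatives via `np.gradient` and no pre-smoothing; the function name `hessian_determinant` and the synthetic blob are made up for the example and are not from the lecture.

```python
import numpy as np

def hessian_determinant(image):
    """Return det(Hessian(I)) = Ixx * Iyy - Ixy**2 at every pixel."""
    image = np.asarray(image, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    Iy, Ix = np.gradient(image)
    # Differentiate again to obtain the second derivatives.
    Ixy, Ixx = np.gradient(Ix)
    Iyy, _ = np.gradient(Iy)
    return Ixx * Iyy - Ixy ** 2

# Synthetic Gaussian blob (assumed test input): the response should peak at its centre.
ys, xs = np.mgrid[0:41, 0:41]
blob = np.exp(-((xs - 20) ** 2 + (ys - 20) ** 2) / (2 * 4.0 ** 2))
response = hessian_determinant(blob)
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)
```

A practical detector would typically smooth the image first and keep local maxima above a threshold; both steps are omitted here for brevity.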

  5. Local Appearance Descriptor (SIFT)
     • Compute gradients in a local patch.
     • Build histograms of oriented gradients over local grids, e.g., 2x4 or 4x4 grids and 8 directions -> 4x4x8 = 128 dimensions.
     • Scale invariant [Lowe, ICCV 1999]

     Image representation
     • Image content is transformed into local features (e.g., SIFT) that are invariant to geometric and photometric transformations.
     (K. Grauman, B. Leibe; slide: David Lowe)
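
The 4x4x8 = 128 layout can be illustrated with a simplified grid-of-histograms descriptor. This is only a sketch in the spirit of SIFT, not Lowe's algorithm: it assumes a fixed square patch and leaves out Gaussian weighting, dominant-orientation normalization, and interpolation between bins. The function name `grid_orientation_descriptor` is hypothetical.

```python
import numpy as np

def grid_orientation_descriptor(patch, grid=4, bins=8):
    """Concatenate per-cell gradient-orientation histograms (grid * grid * bins values)."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # Orientation in [0, 2*pi), quantised into `bins` directions.
    orientation = np.arctan2(gy, gx) % (2 * np.pi)
    bin_index = np.minimum((orientation / (2 * np.pi) * bins).astype(int), bins - 1)

    cell_h = patch.shape[0] // grid
    cell_w = patch.shape[1] // grid
    descriptor = np.zeros((grid, grid, bins))
    for r in range(grid):
        for c in range(grid):
            rows = slice(r * cell_h, (r + 1) * cell_h)
            cols = slice(c * cell_w, (c + 1) * cell_w)
            # Magnitude-weighted histogram of orientations inside this cell.
            descriptor[r, c] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )
    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

rng = np.random.default_rng(0)
patch = rng.random((16, 16))          # random stand-in for an image patch
descriptor = grid_orientation_descriptor(patch)
print(descriptor.shape)
```

Normalizing the vector to unit length gives some robustness to contrast changes, which is one of the photometric invariances the slide refers to.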

  6. Example
     • Initial matches
     • Spatial consistency required
     (Slide credit: J. Sivic)

     Match regions between frames using SIFT descriptors and spatial consistency
     • Region types: shape adapted regions, maximally stable regions
     • Multiple regions overcome the problem of partial occlusion
     (Slide credit: J. Sivic)

  7. Clustering of Image Patch Patterns
     Example clusters: corners, blobs, eyes, letters
     (Sivic and Zisserman, “Video Google”, 2006)

     From local features to Visual Words
     128-D feature space -> clustering -> visual word vocabulary
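
One way to build such a vocabulary is plain k-means over the descriptors. The sketch below assumes Lloyd's algorithm with random initialization and uses random vectors in place of real SIFT descriptors; `build_vocabulary` and its parameters are illustrative names, not part of the lecture.

```python
import numpy as np

def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm): returns k cluster centres ("visual words")."""
    rng = np.random.default_rng(seed)
    descriptors = np.asarray(descriptors, dtype=float)
    # Initialise the centres with k distinct descriptors chosen at random.
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iterations):
        # Assign each descriptor to its nearest centre.
        distances = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centre to the mean of its members (left unchanged if the cluster is empty).
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:
                centres[j] = members.mean(axis=0)
    return centres

rng = np.random.default_rng(1)
toy_descriptors = rng.random((500, 128))   # stand-in for 128-D local descriptors
vocabulary = build_vocabulary(toy_descriptors, k=10)
print(vocabulary.shape)
```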

  8. Represent Image as Bag of Words
     keypoint features -> clustering -> visual words -> BoW histogram

     Content Based Image Search
     • Demo: Object Retrieval
     • Demo 2: Flickr Image Search
     (Demos of Junfeng He)
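
Given a vocabulary, the bag-of-words histogram of an image is the normalized count of nearest visual words over its descriptors. The sketch below assumes nearest-neighbour (hard) assignment and uses a toy 2-D vocabulary so the result is easy to check by hand; the name `bow_histogram` is hypothetical.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantise descriptors to their nearest visual word; return normalised word counts."""
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    distances = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = distances.argmin(axis=1)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return counts / counts.sum()

# Toy 2-D vocabulary with three words, and five descriptors from one image.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[0.1, 0.2], [9.5, 0.3], [9.9, -0.2], [0.2, 9.7], [10.2, 0.1]])
histogram = bow_histogram(descriptors, vocabulary)
print(histogram)  # one descriptor near word 0, three near word 1, one near word 2
```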

  9. Application of Image Matching: Search Result Summary
     • Issue a text query
     • Get top 1000 results from web search engine
     • Find duplicate images, merge into clusters
     • Rank clusters (size? original rank?)
     • Explore history/trend
     (Slide of Lyndon Kennedy)

     Matching Reveals Image Provenance
     • Biggest clusters contain iconic images
     • Smallest clusters contain marginal images

  10. Scale Up: Find Similar Images over the Internet
      • Billions of images online form a dense sampling of the world.
      • For every image taken, similar-looking images are likely to be found.
      (80 Million Tiny Images, Torralba, Fergus & Freeman, PAMI 2008)

      IM2GPS: where is this photo taken? (Hays & Efros, 2008)
      [Figure: similar images and most likely locations.]

  11. IM2GPS: where is this photo taken? (Hays & Efros, 2008)
      [Figures: two further examples of similar images and most likely locations.]

  12. Images on Social Networks
      • Understanding social behaviors by media mining
      • Crandall et al., WWW 2009: 35 million Flickr photos, 300,000 users, photographer movement paths

      Indexing Gigantic Dataset
      • Exhaustive matching of every image is infeasible.
      • Use hierarchical clustering to speed up:
        – Reduce clustering complexity from $O(dk^2)$ to $O(d \log k)$ (d: feature dimension, k: number of clusters)
      • Each local feature is mapped to a path in the tree.
      • Each image is represented as a sub-tree plus the occurrence frequency of its nodes.
      • Each node is linked with an inverted file of images.
      • Similarity between query and database images = similarity between two sub-trees.
      (Nister and Stewenius ’06)
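
The inverted-file idea can be sketched on a flat vocabulary. This is a simplification of the tree-based scheme above: it assumes each image is already a list of visual word ids, ignores the hierarchy, and scores by histogram intersection rather than the weighting used by Nister and Stewenius. The image ids and word ids are made up.

```python
from collections import Counter, defaultdict

def build_inverted_file(image_words):
    """Map each visual word to {image_id: occurrence count}."""
    index = defaultdict(dict)
    for image_id, words in image_words.items():
        for word, count in Counter(words).items():
            index[word][image_id] = count
    return index

def search(index, query_words):
    """Score only images that share a word with the query (histogram intersection)."""
    scores = Counter()
    for word, query_count in Counter(query_words).items():
        for image_id, count in index.get(word, {}).items():
            scores[image_id] += min(query_count, count)
    return scores.most_common()

# Hypothetical database: image id -> visual word ids found in that image.
database = {
    "img_a": [3, 3, 7, 12],
    "img_b": [7, 8, 9],
    "img_c": [1, 2, 5],
}
index = build_inverted_file(database)
results = search(index, [3, 7, 7, 12])
print(results)
```

Because the lookup touches only the posting lists of words present in the query, images with no word in common (here `img_c`) are never visited, which is what makes exhaustive matching unnecessary.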

  13. Search over Billions: Scalability is a Big Issue
      • Similarity search: traditional tree-based methods (e.g., kd-tree) are not suitable in high dimensions because of backtracking.
      • Need accurate, sublinear solutions ($o(N)$, $O(\log N)$, $O(1)$).
      • Recent trends: hashing-based indexes
        – Random projection: Locality Sensitive Hashing (LSH) [Indyk & Motwani 98, Charikar 02]. With a random projection h drawn from N(0,1) and hash bits in {+1, -1}:
          $$P(h(x_1) = h(x_2)) = 1 - \frac{\cos^{-1}(x_1 \cdot x_2)}{\pi} = \mathrm{Sim}(x_1, x_2)$$
        – Principal projection: Spectral Hashing [Weiss et al. 08]
        – Restricted Boltzmann machines [Hinton et al. 06, Torralba et al. 08]
        – Kernel LSH [Kulis et al. 09 & Mu et al. 10]

      Beyond Tree Indexing: Locality Sensitive Hashing (LSH)
      • Choose a random projection.
      • Project points.
      • Points close in the original space remain close under the projection.
      • Unfortunately, the converse is not true.
      • Answer: use multiple quantized projections, which define a high-dimensional “grid”.
      (Slide credit: J. Sivic)
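
The collision probability of the random-projection hash can be checked numerically. The sketch below assumes unit-length inputs (so that $x_1 \cdot x_2$ is the cosine of the angle between them) and uses a large number of bits purely to make the empirical estimate stable; the dimensions and the angle are arbitrary choices for the example.

```python
import numpy as np

def random_projection_hash(points, projections):
    """One bit per projection: the sign of the dot product with a random direction."""
    return np.asarray(points) @ projections.T >= 0

rng = np.random.default_rng(0)
dim, n_bits = 16, 20000
projections = rng.standard_normal((n_bits, dim))  # entries drawn from N(0, 1)

# Two unit vectors separated by a known angle theta.
theta = np.pi / 3
x1 = np.zeros(dim)
x1[0] = 1.0
x2 = np.zeros(dim)
x2[0] = np.cos(theta)
x2[1] = np.sin(theta)

bits1 = random_projection_hash(x1, projections)
bits2 = random_projection_hash(x2, projections)
empirical = np.mean(bits1 == bits2)           # fraction of matching bits
predicted = 1 - np.arccos(x1 @ x2) / np.pi    # 1 - theta / pi
print(empirical, predicted)
```

In an index, a short code of a few such bits selects a bucket, and several independent codes (tables) are used so that near neighbours collide in at least one of them.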

  14. Probabilistic guarantee of finding true targets within ε distance range [Indyk & Motwani 98]
      (Slide of Sanjiv Kumar)

      Going to Higher Level: Text-based Search
      • The current system is still flawed, e.g., keyword: Manhattan Cruise

  15. Auto Image Tagging May Help Fill the Gap
      Inputs:
      • Audio-visual features
      • User social features
      • Camera/location info
      Output: rich semantic description based on content recognition.
      [Figure: the inputs feed statistical models that give +/- decisions for concepts such as Anchor, Snow, Soccer, ..., Building, Outdoor.]

      Machine Learning: Build Classifier
      • Find a separating hyperplane $w^T x + b = 0$, choosing $w$ to maximize the margin (example class: Airplane).
      • Decision function: $f(x) = \mathrm{sign}(w^T x + b)$
      • $w^T x_i + b > 0$ if label $y_i = +1$
      • $w^T x_i + b < 0$ if label $y_i = -1$
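
A linear max-margin classifier of this form can be sketched with sub-gradient descent on the regularized hinge loss. This is one simple way to approximate a soft-margin SVM, assumed here for illustration; the lecture does not say which solver is used, and the hyperparameters (`reg`, `lr`, `epochs`) and the toy 2-D data are arbitrary.

```python
import numpy as np

def train_linear_svm(X, y, reg=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the regularised hinge loss (a soft-margin linear SVM)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1   # points inside the margin or misclassified
        grad_w = reg * w - (y[violators][:, None] * X[violators]).sum(axis=0) / n
        grad_b = -y[violators].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def decision(X, w, b):
    """f(x) = sign(w^T x + b)."""
    return np.sign(X @ w + b)

# Two well-separated toy classes with labels +1 and -1.
rng = np.random.default_rng(0)
positives = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
negatives = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X = np.vstack([positives, negatives])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, b = train_linear_svm(X, y)
accuracy = np.mean(decision(X, w, b) == y)
print(w, b, accuracy)
```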

  16. TRECVID: Detection Examples
      • Top five classification results for: Classroom, Demonstration Or Protest, Cityscape, Airplane flying, Singing

      Object Localization (PASCAL VOC)
      Classes: bird, aeroplane, bicycle, boat, train, chair, person, cat, car, bottle, dining table, tv/monitor, horse, cow, dog, potted plant, sheep, motorbike, sofa, bus

  17. High-Level Multimedia Event Detection
      TRECVID 2010 MED events: Assembling a shelter; Batting a run-in; Making a cake
      [Figure: Examples 1 to 4 of the events.]
      • Need fusion of multimodal analysis: visual, audio, text, temporal

      Model Event Context
      Event: Batting a run in
      • Action concepts: Running, Walking
      • Scene concepts: Sky, Grass, Baseball Field
      • Audio concepts: Cheering, Clapping, Speech
      Understanding contexts is critical for event modeling.

  18. Classifiers Enable Concept-Level Search
      • Offline concept detection: visual concept classifiers (Anchor person, Person, Meeting, Military action, Vehicle, Road, Building, …)
      • Online search: a query such as Find “people talking” is answered from the classifier pool (Meeting, Person, Vehicle, Military action, Anchor person, Building, Road)

      Explore Concept Correlation: Semantic Diffusion via Graph
      • Individual classifier scores, e.g., Desert: 0.68; Sky: 0.60; Weapon: 0.38; Car: 0.43; Vehicle: 0.35; …
      • Concept correlation graph (correlation matrix) applied to the classifier $c_j$ scores
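
The offline/online split can be illustrated with a toy ranking function. This is a sketch under stated assumptions, not the system described in the lecture: detector scores are made-up numbers, the mapping from the text query “people talking” to the concepts `person` and `meeting` is assumed to be given, and shots are ranked by the mean score of the selected concepts.

```python
import numpy as np

concepts = ["person", "meeting", "vehicle", "road", "building"]

# Offline: detector scores per shot (rows) and concept (columns); made-up values.
shot_scores = np.array([
    [0.9, 0.8, 0.1, 0.1, 0.2],   # shot 0
    [0.2, 0.1, 0.9, 0.8, 0.3],   # shot 1
    [0.7, 0.3, 0.2, 0.1, 0.6],   # shot 2
])

def concept_search(query_concepts, shot_scores, concepts):
    """Rank shots by the mean detector score of the concepts selected for the query."""
    columns = [concepts.index(name) for name in query_concepts]
    relevance = shot_scores[:, columns].mean(axis=1)
    return [int(i) for i in np.argsort(-relevance)]

# Online: the query "people talking" is assumed to map to these two concepts.
ranking = concept_search(["person", "meeting"], shot_scores, concepts)
print(ranking)
```

Because the detectors run offline, answering a query only requires selecting columns and sorting, which keeps the online step cheap.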

  19. Adapting Graph Weights to New Domain
      • Example domains: Broadcast News, Documentary
      • The correlation model does not fit the new domain.
      • Need to adapt the correlation to the new test domain on the fly, via graph optimization (Jiang, Ngo, and Chang, ICCV09)
      [Figure: panels at Iteration 0, 4, 8, 12, 16, 20.]

      Columbia CuZero: 400+ classifiers
      Concept detection models: objects, people, location, scenes, events, etc.
      airplane, airplane_takeoff, airport_or_airfield, armed_person, building, car, cityscape, crowd, desert, dirt_gravel_road, entertainment, explosion_fire, forest, highway, hospital, insurgents, landscape, maps, military, military_base, military_personnel, mountain, nighttime, people-marching, person, powerplants, riot, river, road, rpg, shooting, smoke, tanks, urban, vegetation, vehicle, waterscape_waterfront, weapons, weather
      [Figure: Columbia runs, evaluation of 20 concepts at TRECVID 2008.]

  20. Demos: classifier-based search
      • Find lake front buildings in the park
      • Find person walking around building
      • Find a car on a road in a snowy condition

      When User in the Loop: Interactive Query Refinement
      • Query Formulation: Query Examples, Classifiers, Key Words
      • Query Processing: Feature Selection, Distance Metric, Ranking Model
      • Results: Relevance Feedback (shot, track, track interval), Feature/Attributes, Interaction Log
      • Online Update / Rerank results; Updated and New Classifiers handle novel data
