EE 6882 Visual Search Engine, Lec. 1: Introduction


SLIDE 1

EE 6882 Visual Search Engine

  • Lec. 1: Introduction
  • Jan. 23, 2012

Demos: TinEye (photo copy search), Google Image (web image search), Google Goggles (mobile search)

Topics of Interest

  • How is visual information represented?
  • How are images matched? How to handle distortion and occlusion?
  • How to handle gigantic databases? (36 billion photos uploaded to Facebook per year)
  • Possibility of semantic image tagging?
  • How to combine multimodal information?
  • How to design search interfaces for multimedia? For different purposes: information, entertainment, networking
  • How to present multimedia search results? Summarization and augmented reality

EE6882-Chang

SLIDE 2

Visual Information Generation

  illumination -> scene -> sensing device -> image

S.-F. Chang, Columbia U.

Visual Representation and Features

  Camera imaging pipeline (irradiance -> image intensity): Lens -> CCD Sensor (Bayer R/G/B color filter mosaic) -> Demosaicking Filter -> Camera Response Function -> Additive Noise -> DSP (white balance, contrast enhancement, etc.)

SLIDE 3

digital video | multimedia lab

Image quality not always perfect

  • Image quality variations: exposure, shadow, distance, obstruction, blur, weather, day/night (Navteq NYC data)

Visual Representation: Global Features

  • Color
  • Texture: energy in filter banks
  • Shape: http://www.cs.princeton.edu/gfx/proj/shape/

SLIDE 4

Local Features: Keypoint Localization

  • Keypoint properties:
    – Interesting content
    – Precise localization
    – Repeatable detection under variations of scale, rotation, etc.

(Slide of K. Grauman) S.-F. Chang, Columbia U.

Example: Hessian Detector [Beaudet78]

  • Hessian matrix of image intensities:

    Hessian(I) = [ Ixx  Ixy ; Ixy  Iyy ]

  • Keypoint response is the Hessian determinant:

    det(Hessian(I)) = Ixx * Iyy - Ixy^2

  • In Matlab: response = Ixx .* Iyy - Ixy .^ 2

(Slide of K. Grauman) S.-F. Chang, Columbia U.
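The determinant-of-Hessian response can be sketched in a few lines of NumPy. This is a minimal illustration (finite-difference derivatives, no scale selection or non-maximum suppression), not the full detector:

```python
import numpy as np

def hessian_response(image):
    """det(Hessian) = Ixx * Iyy - Ixy^2 at every pixel,
    using finite-difference second derivatives."""
    img = image.astype(float)
    Iy, Ix = np.gradient(img)        # first derivatives (axis 0 = y, axis 1 = x)
    Ixy, Ixx = np.gradient(Ix)       # second derivatives of Ix
    Iyy, _ = np.gradient(Iy)
    return Ixx * Iyy - Ixy ** 2

# Blob-like structures give a strong positive response:
y, x = np.mgrid[0:41, 0:61]
blob = np.exp(-((x - 30) ** 2 + (y - 20) ** 2) / (2 * 3.0 ** 2))
R = hessian_response(blob)
peak = np.unravel_index(R.argmax(), R.shape)   # near the blob center (20, 30)
```

Keypoints would then be taken as local maxima of `R` above a threshold.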

SLIDE 5

Local Appearance Descriptor (SIFT)

[Lowe, ICCV 1999]

  • Histogram of oriented gradients over local grids
  • e.g., 2x2 or 4x4 grids and 8 directions; with 4x4 grids, 4x4x8 = 128 dimensions
  • Scale invariant

S.-F. Chang, Columbia U.
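A toy version of such a grid-of-orientation-histograms descriptor can be written directly; this sketch ignores SIFT's Gaussian weighting, soft binning, and scale/rotation normalization:

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, bins=8):
    """Toy SIFT-style descriptor: an orientation histogram (weighted by
    gradient magnitude) for each cell of a grid x grid layout,
    concatenated into a grid*grid*bins vector (4*4*8 = 128)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)              # orientation in [0, 2*pi)
    h, w = patch.shape
    desc = []
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * h // grid, (i + 1) * h // grid),
                  slice(j * w // grid, (j + 1) * w // grid))
            hist, _ = np.histogram(ori[sl], bins=bins,
                                   range=(0, 2 * np.pi), weights=mag[sl])
            desc.append(hist)
    desc = np.concatenate(desc)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc            # L2-normalize
```

On a 16x16 patch with the default settings this yields the familiar 128-dimensional vector.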

Compute gradient in a local patch

(Slide of K. Grauman, B. Leibe)

Image representation

  • Image content is transformed into local features that are invariant to geometric and photometric transformations
  • Local features, e.g. SIFT

(Slide: David Lowe)

SLIDE 6

Example: match regions between frames using SIFT descriptors and spatial consistency

  • Initial matches -> spatial consistency required
  • Shape-adapted regions and maximally stable regions: using multiple region types overcomes the problem of partial occlusion

(Slide credit: J. Sivic)

SLIDE 7

Sivic and Zisserman, “Video Google”, 2006

Clustering of Image Patch Patterns

  • Recurring patterns: corners, blobs, eyes, letters
  • From local features to visual words: clustering in the 128-D feature space yields a visual word vocabulary

SLIDE 8

Represent Image as Bag of Words

  keypoint features -> clustering into visual words -> BoW histogram
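Building the vocabulary and the BoW histogram can be sketched with a toy k-means quantizer (a stand-in for the large-scale clustering used in practice; function names here are illustrative):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Toy k-means: cluster local descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            members = descriptors[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize each descriptor to its nearest visual word and count."""
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Two images can then be compared by a distance between their normalized BoW histograms.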

Content Based Image Search

  • Demo: Object Retrieval
  • Demo 2: Flickr Image Search (demos of Junfeng He)

S.-F. Chang, Columbia U.

SLIDE 9

Application of Image Matching: Search Result Summary

  1. Issue a text query
  2. Get top 1000 results from a web search engine
  3. Find duplicate images and merge them into clusters
  4. Rank clusters (by size? by original rank?)
  5. Explore history/trend

(Slide of Lyndon Kennedy)

Matching Reveals Image Provenance

  • Biggest clusters contain iconic images; smallest clusters contain marginal images

SLIDE 10

Scale Up: Find Similar Images over the Internet

  • Billions of images online serve as a dense sampling of the world
  • For every image taken, one is likely to find images that look alike (80 Million Tiny Images, Torralba, Fergus & Freeman, PAMI 2008)

IM2GPS: where is this photo taken? (Hays & Efros, 2008)

  • Retrieve similar images; infer the most likely locations


SLIDE 12

Images on Social Networks

  • Understanding social behaviors by media mining
  • Crandall et al., WWW 2009: 35 million Flickr photos, 300,000 users, photographer movement paths

Indexing Gigantic Datasets (Nister and Stewenius '06)

  • Exhaustive matching of every image is infeasible
  • Use hierarchical clustering (a vocabulary tree) to speed up
    – Reduces quantization complexity from O(d*k) for a flat vocabulary to O(d*log(k)) (d: feature dimension, k: number of clusters)
  • Each local feature is mapped to a path in the tree
  • Each image is represented as a sub-tree plus the occurrence frequency of its nodes
  • Each node is linked with an inverted file of images
  • Similarity between query and database images = similarity between the two sub-trees
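The mapping of a feature to a root-to-leaf path can be sketched as follows; the nested-dict tree layout is illustrative, not the paper's data structure:

```python
import numpy as np

def tree_quantize(feature, node):
    """Descend a vocabulary tree, picking the nearest child center at each
    level; returns the root-to-leaf path identifying the visual word.
    Cost is O(b*d) per level, i.e. O(d*log k) overall, versus O(d*k)
    for scanning a flat vocabulary of k words."""
    path = []
    while node is not None:
        dists = np.linalg.norm(node["centers"] - feature, axis=1)
        best = int(dists.argmin())
        path.append(best)
        node = node["children"][best]
    return tuple(path)

# A hand-built tree with branching factor 2 and depth 2 (4 leaf words):
tree = {
    "centers": np.array([[0.0, 0.0], [10.0, 10.0]]),
    "children": [
        {"centers": np.array([[-1.0, 0.0], [1.0, 0.0]]), "children": [None, None]},
        {"centers": np.array([[9.0, 10.0], [11.0, 10.0]]), "children": [None, None]},
    ],
}
```

An inverted file at each node would then list the database images whose features pass through it.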

SLIDE 13

Search over Billions: Scalability is a Big Issue

  • Similarity search: traditional tree-based methods (e.g., kd-tree) are not suitable in high dimensions because of backtracking
  • Need accurate, sublinear solutions: o(N), O(log(N)), O(1)
  • Recent trend: hashing-based indexing
    – Random projection: Locality Sensitive Hashing (LSH) [Indyk & Motwani 98, Charikar 02]
    – Principal projection: Spectral Hashing [Weiss et al. 08]
    – Restricted Boltzmann machines [Hinton et al. 06, Torralba et al. 08]
    – Kernel LSH [Kulis et al. 09, Mu et al. 10]

Beyond Tree Indexing: Locality Sensitive Hashing (LSH)

  • Choose a random projection h, with entries drawn from N(0,1), and project the points
  • Each hash bit keeps only the side of the hyperplane a point falls on, so P(h(x1) = h(x2)) = 1 - cos-1(x1·x2)/π = Sim(x1, x2)
  • Points close in the original space remain close under the projection; unfortunately, the converse is not true
  • Answer: use multiple quantized projections which define a high-dimensional "grid"

(Slide credit: J. Sivic)
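The collision-probability identity above can be checked empirically with a sign-random-projection hash; this is a minimal sketch, and real LSH indexes additionally group bits into multiple hash tables:

```python
import numpy as np

def lsh_bits(x, W):
    """One hash bit per random projection: the sign of W @ x."""
    return (W @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, n_bits = 64, 2000
W = rng.standard_normal((n_bits, d))          # rows ~ N(0,1) projections

x1 = rng.standard_normal(d); x1 /= np.linalg.norm(x1)
x2 = x1 + 0.05 * rng.standard_normal(d); x2 /= np.linalg.norm(x2)

empirical = (lsh_bits(x1, W) == lsh_bits(x2, W)).mean()
expected = 1 - np.arccos(np.clip(x1 @ x2, -1.0, 1.0)) / np.pi
# empirical approaches `expected` as the number of random bits grows
```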

slide-14
SLIDE 14

(Slide of Sanjiv Kumar)

Probabilistic guarantee of finding true targets within an ε distance range [Indyk & Motwani 98]

Going to Higher Level: Text-based Search

Current systems are still flawed, e.g., for the keyword query "Manhattan Cruise"

SLIDE 15

Auto Image Tagging May Help Fill the Gap

  • Inputs: audio-visual features, user social features, camera/location info, . . .
  • + statistical models -> rich semantic description based on content recognition (e.g., Anchor, Snow, Soccer, Building, Outdoor)

Machine Learning: Build Classifier (e.g., Airplane)

  • Find a separating hyperplane w (chosen to maximize the margin): wTx + b = 0
  • wTxi + b > 0 if label yi = +1; wTxi + b < 0 if label yi = -1
  • Decision function: f(x) = sign(wTx + b)
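A minimal sketch of learning such a linear decision function f(x) = sign(w·x + b); a perceptron update is used here as a simple stand-in for the max-margin (SVM) training the slide refers to:

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Learn a linear boundary w.x + b = 0 from labels y in {+1, -1}.
    (Perceptron updates; a stand-in for max-margin SVM training.)"""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified: nudge the hyperplane
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                  # converged on separable data
            break
    return w, b

def predict(X, w, b):
    """Decision function f(x) = sign(w.x + b)."""
    return np.sign(X @ w + b)
```

On linearly separable concept/non-concept features, training stops once every example satisfies yi(w·xi + b) > 0.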

SLIDE 16

TRECVID: Detection Examples

  • Top five classification results: Classroom, Demonstration or Protest, Cityscape, Airplane Flying, Singing

Object Localization (PASCAL VOC)

  • 20 classes: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor

SLIDE 17

High-Level Multimedia Event Detection

TRECVID 2010 MED events: assembling a shelter, batting a run in, making a cake (Examples 1-4)

Need fusion of multimodal analysis: visual, audio, text, temporal

Model Event Context, e.g. for "batting a run in":

  • Scene concepts: grass, baseball field, sky
  • Action concepts: running, walking
  • Audio concepts: cheering, clapping, speech

Understanding contexts is critical for event modeling.

SLIDE 18
Classifiers Enable Concept-Level Search

  • Offline concept detection: build a classifier pool of visual concept classifiers (anchor person, person, meeting, military action, vehicle, road, building, ...)
  • Online search: e.g., find "people talking"

Explore Concept Correlation: Semantic Diffusion via Graph

  • Individual classifier scores, e.g.: Desert 0.68; Sky 0.60; Weapon 0.38; Car 0.43; Vehicle 0.35; ...
  • Build a concept correlation graph (correlation matrix) and diffuse each classifier score cj over the graph so that related concepts reinforce each other
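One simple form of such score diffusion can be sketched as iterative smoothing over the correlation graph; this is illustrative only, and the actual semantic-diffusion formulation in the cited work differs:

```python
import numpy as np

def diffuse_scores(s0, W, alpha=0.3, iters=10):
    """Refine raw classifier scores s0 by smoothing them over a
    concept-correlation graph W: s <- (1-alpha)*s0 + alpha*(W_norm @ s),
    so a concept's score is pulled toward its correlated neighbors."""
    Wn = W / W.sum(axis=1, keepdims=True)   # row-normalize edge weights
    s = s0.astype(float).copy()
    for _ in range(iters):
        s = (1 - alpha) * s0 + alpha * (Wn @ s)
    return s
```

For example, a weak "Weapon" score would be raised when strongly correlated concepts such as "Desert" fire confidently.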

SLIDE 19

Adapting Graph Weights to a New Domain (Jiang, Ngo, and Chang, ICCV09)

  • A correlation model learned on broadcast news does not fit a new domain (e.g., documentary)
  • Need to adapt the correlation graph to the new test domain on the fly, via graph optimization (iterations 0, 4, 8, 12, 16, 20)

Columbia CuZero: 400+ concept detection models (objects, people, locations, scenes, events, etc.)

airplane, airplane_takeoff, airport_or_airfield, armed_person, building, car, cityscape, crowd, desert, dirt_gravel_road, entertainment, explosion_fire, forest, highway, hospital, insurgents, landscape, maps, military, military_base, military_personnel, mountain, nighttime, people-marching, person, powerplants, riot, river, road, rpg, shooting, smoke, tanks, urban, vegetation, vehicle, waterscape_waterfront, weapons, weather

Evaluation of 20 concepts at TRECVID 2008 (Columbia runs)

SLIDE 20

Demos: classifier-based search

  • Find lake-front buildings in the park
  • Find a person walking around a building
  • Find a car on a road in snowy conditions

When the User is in the Loop: Interactive Query Refinement

  1. Query Formulation: query examples, classifiers, key words
  2. Query Processing: feature selection, distance metric, ranking model -> results
  3. Online Update/Rerank: relevance feedback (shot, track, track interval), features/attributes, interaction log -> new classifiers, handle novel data -> updated results

SLIDE 21

Columbia TAG Interactive Image Search System

  • Demo: Rapid Image Annotation with User Interaction

Partial Active Tagging (Jiang, Chang, Loui, ICIP 06)

Instead of automatically tagging 100% of concepts, what if we ask users to help with 1-2 labels?

  • User labels: park, picnic
  • Automatically generated labels: people, tree, mountain, etc.

Example tags with confidences: Person: 79%; Clinton: 75%; US Flag: 80%; Podium: 70%; Give speech: 75%; Press conference: 65%

SLIDE 22

Active Tagging: Best Questions to Ask the User?

Examples of best questions: airplane, animal, boat, building, bus, car, chart, court, crowd, desert, entertainment, explosion_fire, face, flag_us, government_leader, map, meeting, military, mountain, natural_disaster, office, outdoor, people_marching, person, road, sky, snow, sports, urban, waterscape, etc.

User in the Loop: Relevance Feedback

  • Human-machine collaboration, like a 20-question game: humans and machines each do what they are best at [Branson et al., ECCV 2010]

SLIDE 23

Mobile Visual Search

  1. Take a picture
  2. Extract image features on the phone
  3. Send to the server via MMS
  4. Match features against database images
  5. Send the most similar images back

System-level issues:

  • Speed: feature extraction, transmitting features or images (up and down), searching large databases
  • Storage: features and codebooks
  • User interface: quality of captured images, visualization of search results

SLIDE 24

Mobile Challenge: Speed and Bandwidth

  • Speed still limited by bandwidth and power (Mobile Visual Search, Girod et al., SPM, 2011)

Columbia Mobile Product Search System based on Hashing (He, Lin, Feng, and Chang, ACM MM 2011)

Server:

  • 400,000 product images crawled from Amazon, eBay and Zappos
  • Hundreds of categories: shoes, clothes, electrical devices, groceries, kitchen supplies, movies, etc.

Speed:

  • Feature extraction: ~1 s
  • Transmission: 80 bits/feature
  • Server search: ~0.4 s
  • Download/display: 1-2 s

(Video demo: Mobile App Demo)

SLIDE 25

Add Interactive Tools on Mobile Devices

  • Interactive segmentation: the user helps the machine identify the point of interest

Mobile Location Search

  • 300,000 images of 50,000 locations in Manhattan
  • Collected by the NAVTEQ street view imaging system (see geographical distribution)

SLIDE 26

Challenge: how to guide the user to take a successful mobile query?

  • Which view will be the best query? For example, in mobile location search, or in mobile product search.

Solution: Active Query Sensing [Yu, Ji, Zhang, and Chang, ACM MM '11]

  • Guide the user to a more successful search angle (video demo: Mobile App Demo)

SLIDE 27

Mobile Augmented Reality

  • MIT Sixth Sense project (Pranav Mistry and Pattie Maes, MIT): mobile wearable computer, camera and projector, gesture interaction, visual recognition

EE 6882, Spring 2012

  • Course web site: http://www.ee.columbia.edu/~sfchang/course/vse
  • Instructor: Prof. Shih-Fu Chang (office hour: Monday 11-12, CEPSR 709)
  • Asst. Instructor: Dr. Rong-Rong Ji (office hour: Friday 2-4pm, CEPSR 707)
  • Staff assistants: Tongtao Zhang and Jinyuan Feng
  • Prerequisites: image processing or computer vision, pattern recognition, probability (a 15-min quiz)

SLIDE 28

Course Format

  • Required background: familiarity with image processing and pattern recognition. There will be a quiz.
  • Lectures + two hands-on homeworks (due 2/13, 2/27)
  • Mid-term project: review and experiment on topics of interest, 2 students per team; proposal due 3/5, narrated slides due 3/26; selected projects presented and discussed in class (3/26-4/9)
  • Final project: extension of the mid-term project encouraged, 2 students per team; proposal due 4/2, narrated slides due 4/30; selected projects presented and discussed in class (4/30-5/7)
  • Grading: class participation (20%), homework (20%), mid-term (20%), final (40%)
  • Everyone has a total "budget" of 4 days for late submissions. No other delayed submissions accepted.

Examples of Final Projects

  • Mobile visual search: feature extraction, quality enhancement, real-time systems
  • Mobile augmented reality
  • Image search for specific domains: products, patents/trademarks, roadside objects, landmarks, 3D objects
  • Hashing for search over million-scale datasets
  • Gesture recognition with depth sensors
  • Fast video copy detection
  • Search by sketch drawings
  • Multimedia summarization

SLIDE 29

Reading List

Many papers available at http://www.ee.columbia.edu/ln/dvmm/newPublication.htm/

  • Rui, Y., T.S. Huang, and S.-F. Chang. Image retrieval: current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, 1999, 10(4): 39-62.
  • Smeulders, A.W.M., et al. Content-Based Image Retrieval at the End of the Early Years. IEEE Trans. Pattern Anal. Mach. Intell., 2000, 22(12): 1349-1380.
  • Sivic, J. and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. ICCV, 2003.
  • Mikolajczyk, K. and C. Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 2005: 1615-1630.
  • Nister, D. and H. Stewenius. Scalable recognition with a vocabulary tree. CVPR, 2006.
  • Jiang, Y.-G., et al. Consumer Video Understanding: A Benchmark Database and An Evaluation of Human and Machine Performance. ACM ICMR, 2011.
  • Zavesky, E. and S.-F. Chang. CuZero: embracing the frontier of interactive visual search for informed users. ACM MIR, 2008.
  • Kennedy, L. and M. Naaman. Generating diverse and representative image search results for landmarks. ACM WWW, 2008.
  • Yu, F., R. Ji, and S.-F. Chang. Active Query Sensing for mobile location search. ACM MM, 2011.