Statistical Paradigm Many problems can be posed as pattern - PDF document

EE 6882 Overview of Statistical Models for Video Indexing Prof. Shih-Fu Chang Columbia University TA: Eric Zavesky Fall 2007, Lecture 4 Course web site: http:/ / www.ee.columbia.edu/ ~ sfchang/ course/ svia 1 Statistical Paradigm Many problems can be posed as pattern recognition � � Image classification: indoor vs. outdoor? Face? � shot boundary detection, story segmentation � Is the current point a boundary? Statistical models to handle uncertainty and provide flexibility � Image processing tools available � � E.g., homework # 1 Rich tools for learning and prediction � � See course web site Increasing data available � � NIST TREC Video: 300+ hours � Consumer and youtube videos 1

A Very High-Level Stat. Pattern Recog. Architecture (From Jain, Duin, and Mao, SPR Review, ’99) Important issues (1) � Image/video processing � What’s the adequate quality, resolution, etc? � Feature extraction � Color, texture, motion, region, shape, interest points, audio, speech, text, etc � Feature representation � Histogram, bag, graph etc � Invariance to scale, rotation, translation, view, illumination, … � How to reduce dimensions? 2

Important issues (2) Distance measurement � � How to measure similarity between images/videos? � L1, L2, Mahalanobis, Earth Mover’s distance, vector/graph matching Classification models � � Generative vs. discriminative � Multi-modal fusion, early fusion vs. late fusion � E.g., how to use joint audio-visual features to detect events (dancing, wedding…) Efficiency issues � � how to speed up training and testing processes? � How to rapidly build a model for new domains Validation and evaluation � � How to measure performance? � Are models generalizable to new domains? Three related problems � Retrieval, Ranking � Given a query image, find relevant ones � May apply rank threshold to decide relevance � Classification, categorization, detection � Given an image x, predict class label y � Clustering, grouping � Group images/videos into clusters of distinct attributes 3

� An example � News story segmentation using multi- modal, multi-scale features First Understand Data Types and Explore Unique Characteristics Percentage (32.0%) (8.8%) of content (c): multi-story in an anchor seg. (b) different anchor (a): regular anchor segment (21.3%) (15.0%) (f) sep. by music or anim. (d): conti. sports briefings (e): cont. short briefings (i): comm. after sports (g): weather report (h): anchor lead-in before comm. : story : weather : commercial : misc./animation : visual anchors 4

News Story Segmentation τ Objective: a story boundary at time ? � k τ = { shot boundaries or significant pauses} � k observation time τ − τ + τ k 1 k 1 k {video, audio} An anchor face? motion changes? change from music to speech? speech segment? {cue words} i appear {cue words} j appear Need to decide how to formulate features Challenge: diverse features Modality Raw Features Data Value Type text seg. score motion segment continuous music shot boundary point binary Video comm. face segment continuous commercial segment binary sigpas pause point continuous face pitch jump point continuous shot Audio significant pause point continuous musc./spch. disc. segment binary motion spch seg./rapidity segment continuous candidate point ASR cue terms point binary � One way is to use binary Text V-OCR cue terms point binary predicate: text seg. score point continuous combinatorial point binary Misc. if x > threshold, then predict sports segment binary segment boundary (b=1) 5

Example Predicates Predicates no raw feature set An anchor face segment just starts after the candidate point 1 Anchor Face A significant pause within the non-commercial section appears in the 2 Significant pause & surrounding observation window. non-commercial An audio pause with the duration larger than 2.0 second appears after 3 Pause the boundary point. The surrounding observation window has a significant pause with the 4 Significant pause pitch jump intensity larger than the normalized pitch threshold 1.0 and the pause duration larger than 0.5 second. A speech segment before the candidate point 5 Speech segment A speech segment starts in the surrounding observation window 6 Speech segment A commercial starts in 15 to 20 seconds after the candidate point. 7 Commercial A speech segment ends after the candidate point 8 Speech segment An anchor face segment occupies at least 10% of next window 9 Anchor face The surrounding observation window has a pause with the duration 10 Pause larger than 0.25 second. Collect Features from Training Samples One training sample b 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 f Each row represents one predicate i 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 � Face 2 0 1 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 3 0 0 0 0 0 1 0 1 1 0 1 0 1 0 1 1 0 1 0 � Motion 4 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 � Significant Pause 5 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 0 6 0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 � Speech segment 7 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 � Commercial 8 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 9 0 0 1 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 0 � Text segmentation score … 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 0 � ASR cue terms 6

Choose Model ∑ λ ⋅ f ( , ) x b 1 i i = q b x ( | ) e i Maximum entropy model λ Z ( ) x λ ∈ where f x b b ( , ), {0,1} i For example = = predicate f ' anchor face ' f ' significant pause ' 1 2 = if current observation: face YES pause = NO = = λ λ + λ q b ( YES x | ) e /( e e ) 1 1 2 = = λ λ + λ q b ( NO x | ) e /( e e ) 2 1 2 Classification: if q(b=YES|x) > 0.5, then predict YES. Background: Entropy m ∑ = − Entropy (bits) � H p log p i 2 i = i 1 Kullback-Leibler (K-L) Distance � � A measure of ‘distance’ between 2 distributions ∑ ( ) q ( x ) = D p ( x ), q ( x ) q ( x ) log KL p ( x ) x ∞ ∫ q ( x ) = or q ( x ) log dx ∞ p ( x ) - ( ) ( ) � ≥ = ⋅ = ⋅ D KL 0 , and 0 i ff p q � Not necessarily symmetric, may not satisfy triangular inequality 7

How to Determine the Weights in the Model? = T {( x b , )} q b x ( | ) Estimate from training data by � λ k k minimizing Kullback-Leibler divergence, defined as � p b x ( | ) ∑∑ = � � � p q D p q ( || ) p b x ( , )log λ q ( | ) b x λ � x b D p q ( || ) ∑∑ = − � + � p x b ( , )log q ( | ) b x constant( ) p λ x b λ Find to maximize the ‘entropy’ � i empirical ≡ ∑∑ � L ( q ) p x b ( , )log q b x ( | ) estimated distribution � λ λ p model from data x b λ Iteratively find � i ∑ ⎛ ⎞ � p x b f x b ( , ) ( , ) ′ 1 log λ = λ + Δ λ ⎜ ⎟ Δ λ = i x b , ∑ i i i ⎜ ⎟ i � M p x q b x f x b ( ) ( | ) ( , ) ⎝ ⎠ λ i x b , • The objective function is convex. So the iterative process can reach the optimum. The same model used to select features I nput: collection of candidate features, training � samples, and the desired model size Output: optimal subset of features and their � corresponding exponential weights q h Current model augmented with feature with � α weight , α h x b ( , ) e q b x ( | ) = q ( | ) b x α , h Z ( ) x α q Select the candidate which improves current model � the most, in each iteration; { } { } = � − � * Reduction of divergence h arg max sup D p q ( || ) D p q ( || ) α , h ∈ α h C { } { } = − arg max sup L ( q ) L ( ) q � α � p , h p ∈ α h C Increase of log-likelihood 8

Statistical Paradigm Many problems can be posed as pattern - PDF document

EE 6882 Overview of Statistical Models for Video Indexing Prof. Shih-Fu Chang Columbia University TA: Eric Zavesky Fall 2007, Lecture 4 Course web site: http:/ / www.ee.columbia.edu/ ~ sfchang/ course/ svia 1 Statistical Paradigm Many

Paradigm Shift: Moving from Vertical Paradigm Shift: Moving from Vertical Paradigm Shift:

Prolog Declarative/logic paradigm Functional paradigm No assignment statement

PARADIGM Erkin Otles CS 838 PARADIGM Approach We developed an approach called PARADIGM

Probabilistic Foundations of Statistical Network Analysis Chapter 5: Statistical modeling paradigm

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

ESG Criteria: ESG Criteria: ESG Criteria: ESG Criteria: New paradigm that will redefine the

Statistical graphics with Statistical graphics with ggplot2 ggplot2 Programming for Statistical

TOWARDS AN EXCITING NEW TOWARDS AN EXCITING NEW PARADIGM IN TEACHING PARADIGM IN TEACHING

Catch-up growth and productive capacities: A new development paradigm? development paradigm?

Technology: Changing the Genealogical Paradigm - 1 T echnology: T echnology: Shifting the

Paradigm Paradigm Network Services Summit Shepherd Center, Atlanta GA October 9 11 2011 October

New quality paradigm: New quality paradigm: Quality by Design Quality by Design ICH

A Rendezvous-based Paradigm A Rendezvous-based Paradigm for Analysis of Solicited and for

New Paradigm in Himalayan Foreland Exploration: New Paradigm in Himalayan Foreland Exploration:

Opportunistic Computing Opportunistic Computing : A New Paradigm : A New Paradigm for Scalable

Digital Signal Processing Solutions Digital Signal Processing Solutions SIGNAL PROCESSING

Spacetime finite element methods in biomedical applications Olaf Steinbach Institut f ur

Optimization of Decision Trees for TCP Performance RCA Marco Weiss advised by Simon Bauer,

P t rs r

Abelian varieties without algebraic geometry (revised slides) Everett W. Howe Center for

Architecting and Programming a Hardware-Incoherent Multiprocessor Cache Hierarchy Wooil Kim,

Estimating Treatment Effects in the Presence of Correlated Binary Outcomes and Contemporaneous

Embeddability and Decidability in the Turing Degrees Antonio Montalb an. University of

CSE 341 Lecture 21 delayed evaluation; thunks; streams slides created by Marty Stepp