EE 6882 Statistical Methods for Video Indexing and Analysis
Fall 2004
Prof. Shih-Fu Chang
http://www.ee.columbia.edu/~sfchang
Lecture 1 part A (9/8/04)
EE E6882 SVIA Lecture #1
Part I
- Introduction
- Course Syllabus
- Readings
  - A. Jain et al., "Statistical Pattern Recognition: A Review," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, Jan. 2000.
  - Gonzalez and Woods, Digital Image Processing, 2nd edition, Prentice Hall, 2001 (Chapter 12, Object Recognition).
  - Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989 (Chapter 9.14).
Part II
- Introduction of a simple image search system
- Image feature extraction
- Similarity matching, performance metrics
- Readings
  - J. R. Smith and S.-F. Chang, "Visually Searching the Web for Content," IEEE Multimedia Magazine, Summer, vol. 4, no. 3, pp. 12-20, 1997.
  - John R. Smith and Shih-Fu Chang, "VisualSEEk: a Fully Automated Content-Based Image Query System," in ACM Multimedia, Boston, MA, November 1996.
EE6882-Chang
Problems in Video Indexing and Analysis
- Indexing, search, and retrieval for images and videos
  - See Columbia's WebSEEk and EdSearch demos
  - Google image search?
  - "find video clips of basketball going through the hoop"
  - "find images containing shape shown in the sketch"
- Automatic annotation of visual content
  - (e.g., recognition of text, face, scene, vehicle, location, etc.)
- Automatic parsing of video programs into structures
  - (e.g., break videos into shots, scenes, and stories)
- Event detection
  - (e.g., sports events, human activities, meetings, medical, and other spatio-temporal patterns)
- Summary
  - e.g., topic clustering, highlight generation
  - See Columbia's sports highlight and news topic clustering demos
Examples of object recognition and structure parsing problems
- How to detect and recognize the characters and words? (Demo)
- How to detect the boundaries of programs, stories, and commercials?
[Figure: video timeline with segments labeled story, shot, and anchor shot.]
Statistical Paradigm
- Many problems can be posed as pattern recognition
  - (e.g., Matlab statistical classification demo)
- Statistical models to handle uncertainty and provide flexibility
- Rich tools for learning and prediction
- Image processing toolkits available
- Increasing benchmark data
  - (e.g., NIST TREC Video)
A Very High-Level Stat. Pattern Recog. Architecture
[Figure: architecture diagram, from Jain, Duin, and Mao, SPR Review, '99.]
Important issues
- Image/video pre-processing: quality, resolution, etc.
- Feature extraction
  - Color, texture, motion, shape, layout, regions, parts, etc.
- Feature representation
  - Discrete vs. continuous, vectorization, dimension
  - Invariance to scale, rotation, translation, ...
- Feature selection
  - PCA, MDS, Kernel PCA, etc.
- Classification models
  - Generative vs. discriminative
  - Multi-modal fusion, early fusion vs. late fusion
- Size of training/test data and manual supervision efforts
- Validation and evaluation processes
- Complexity
Some examples of feature representation
- Features determine the patterns and their separability
- E.g.,
  - Angular distance for closed shapes
  - Part features for iris flowers
Another example of feature
- Bankers Asso. font used on personal checks
- Use magnetic ink and reader to simplify segmentation
- Feature: the horizontal scan of the rate of increase/decrease of the character area
- Peaks and zeros are arranged to be located at the vertical grid lines, so they can be sampled accurately
- Patterns can be easily distinguished
Classification Paradigms
- Discriminative: a discriminant function f(x) defines the decision boundary; f(x) > 0 on one side and f(x) < 0 on the other.
  [Figure: scatter plot of "+" and "-" samples over axes x1 and x2, separated by a decision boundary.]
- Probabilistic: compare the likelihoods, P(x|C=1) > or < P(x|C=2), to answer C(x0) = ?
  [Figure: class-conditional likelihood curves for Class 1 and Class 2 over x (height, income, ...), with a query point x0.]
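The two decision rules on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the lecture: the class-conditional likelihoods are taken to be 1-D Gaussians, the discriminant is taken to be linear, and the names and numbers (CLASS_PARAMS, W, B) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution, used here as P(x | C)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_by_likelihood(x, params):
    """Probabilistic rule: pick the class whose likelihood P(x | C) is larger."""
    p1 = gaussian_pdf(x, *params[1])
    p2 = gaussian_pdf(x, *params[2])
    return 1 if p1 > p2 else 2


def discriminant(x, w, b):
    """Linear discriminant f(x) = w . x + b for a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_by_discriminant(x, w, b):
    """Discriminative rule: class 1 where f(x) > 0, class 2 otherwise."""
    return 1 if discriminant(x, w, b) > 0 else 2


# Hypothetical class-conditional models: (mean, std) for each class.
CLASS_PARAMS = {1: (0.0, 1.0), 2: (4.0, 1.0)}
# Hypothetical boundary x1 + x2 = 3 in a 2-D feature space.
W, B = (-1.0, -1.0), 3.0

label_prob = classify_by_likelihood(1.0, CLASS_PARAMS)   # x0 = 1.0 is nearer class 1
label_disc = classify_by_discriminant((2.5, 2.0), W, B)  # f(x) = -1.5, so class 2
```

The probabilistic rule needs a model of each class, while the discriminative rule only needs the boundary; with equal priors and equal variances the two coincide at the midpoint between the class means.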
Training / Validation / Testing
[Figure: three scatter plots of "+" and "-" samples over axes x(1) and x(2), one each for the training, validation, and testing sets.]
- Validation: select optimal features, models, and parameters through validation
- Testing: evaluate the performance of the optimal hypothesis over test data
- Assume the same distribution across the different sets; otherwise the optimal solution from validation may not be optimal on the test data
Training / Validation / Testing (cont.)
- Multiple validation sets can be used for different optimization steps.
  - Val-1: optimal classifier using feature 1
  - Val-1: optimal classifier using feature 2
  - ...
  - Val-2: optimal classifier fusing multiple features
- Cross validation, leave-one-out
  - Split the data into folds 1, 2, ..., K; in each run one fold is used for testing and the rest for training
  - Rotate the choice of the test set and average the performance over runs
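The rotation described under cross validation can be sketched as follows. This is a minimal illustration, not course code: the nearest-mean classifier and the toy 1-D data are assumptions chosen only to keep the example self-contained, and all function names are hypothetical. Setting k equal to the number of samples gives leave-one-out.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_nearest_mean(xs, ys):
    """Toy model (an assumption): store the mean feature value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Assign x to the class whose stored mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def cross_validate(xs, ys, k):
    """Rotate the test fold over all k folds and average the accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_nearest_mean(train_x, train_y)
        correct = sum(predict_nearest_mean(model, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy data: class 1 clusters near 0, class 2 near 4 (labels interleaved so
# that every contiguous fold contains both classes).
XS = [0.1, 3.9, 0.3, 4.2, -0.2, 4.0, 0.0, 3.7]
YS = [1, 2, 1, 2, 1, 2, 1, 2]

four_fold_accuracy = cross_validate(XS, YS, k=4)
leave_one_out_accuracy = cross_validate(XS, YS, k=len(XS))
```

The sketch assumes k is no larger than the number of samples and that every training split contains both classes; real data would normally be shuffled or stratified before folding.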
Curse of Dimensionality and Overtraining
[Figure: scatter plot of "+" and "-" samples over axes x(1) and x(2), showing a case of overtraining.]
- Rule of thumb: (# of training patterns per class) / (# of features) > 10
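As a small worked check of the rule of thumb, the ratio can be computed directly. The helper names below are hypothetical, and the default ratio of 10 is simply the figure quoted on the slide.

```python
def satisfies_rule_of_thumb(patterns_per_class, n_features, ratio=10):
    """True when (training patterns per class) / (features) exceeds the ratio."""
    return patterns_per_class / n_features > ratio


def min_patterns_per_class(n_features, ratio=10):
    """Smallest whole per-class count that strictly exceeds ratio * n_features."""
    return ratio * n_features + 1


# With 10 features, 100 patterns per class gives a ratio of exactly 10,
# which does not strictly exceed the threshold; 101 does.
enough_at_100 = satisfies_rule_of_thumb(100, 10)
enough_at_101 = satisfies_rule_of_thumb(101, 10)
```

Read the other way around, a fixed training set bounds how many features can be used safely, which is one motivation for feature selection.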
About the course
- Objectives:
  - Learn how to formulate and solve problems in this field
    - Feature extraction, object/event recognition, structure detection, video search and retrieval
  - Gain insight into and experience with recent machine learning techniques
    - Statistical, Bayesian, Neural Network, PCA, HMM, SVM
  - Have fun experimenting with actual visual classification/indexing problems
- Intended audience
  - Beginning graduate students or professionals
    - familiar with signal/image processing
    - comfortable with probability, statistics, linear algebra, and some machine learning
Course Format Overview
- Lectures + student presentations + final projects
  - I will give several overview lectures at the beginning.
  - Student paper presentation
    - One paper assigned to each student
    - Assignments determined 3 weeks in advance
    - CVN students present over the phone
    - Everyone writes comments before and after class on the class wiki site (starting the 3rd week)
  - One written exam after all presentations
    - Tests understanding of concepts discussed throughout the course
  - One term project at the end of the course
- Grading
  - Paper presentation/demo: 30%
  - Exam: 30%
  - Final project: 40%
Paper review and demo
- Each student discusses the paper and demo with me and the TA 2 weeks before class
  - Week 1: review and research
  - Week 2: simulate a toy problem using available data sets and tools
  - Week 3: prepare presentation
- Upload the slides and code to the class wiki site before class
- Presentation
  - 30 minutes per paper (including demo)
- I will provide additional materials about the subject.
Paper Review and Demo (2)
- Review
  - Background review and examples
  - Problem addressed and main ideas
  - Insights about why it works
  - Limitation, generality, and repeatability
  - Alternatives and comparisons
- Demo
  - Software and data available and repeatable?
  - Reconstruct the method and try on toy data set? (from some available generic toolkit)
  - Analysis of results (not just accuracy numbers; offer explanations and verifiable theories about observations)
  - Demo code archived on class site and shared with others
Resources and Matlab
- Links on the class web site
  - Tutorials on paper writing, Matlab, etc.
  - Software links on web site to Matlab, Neural Network, HMM, Netlab, SVM
- SVIA EE6882 Class Dataset
  - Benchmark data set: a few thousand images from broadcast news and stock photos
  - Extracted features and labels
  - Will be distributed on a DVD for class project use only
- Matlab is recommended for programming
  - Accessible in Mudd 251 Computer Lab
  - Need CU ACIS account
  - Very brief introduction next week
Paper categories
- Problems
  - Feature extraction and image search
  - Image/video classification
  - Interactive image retrieval
  - Video structure parsing
  - Multimedia information retrieval
- Statistical Techniques
  - Bayesian, factor graph, graphical model
  - SVM and variations
  - Language model, relevance model from IR
  - HMM and variations
  - others
A few papers reviewed last year
Maximum Entropy Fusing (Hsu and Chang)
- Objective: a story boundary at time $\tau_k$?
- $\tau_k$ = {shot boundaries or significant pauses}
[Figure: observations along the time axis around the candidate points $\tau_{k-1}$, $\tau_k$, $\tau_{k+1}$. {video, audio} cues: a static face? motion energy changes? change from music to speech? speech segment? Text cues: {cue words}$_i$ appear, {cue words}$_j$ appear.]
Bayesian Image Classification (Vailaya et al. 98 and 01)
- How to select the categories and tree?
- How to estimate the distributions of features for each class?
Concept (In)Dependence (Naphade et al.)
Boosting (Tieu and Viola)
- Extract > 45K selective, efficient features by multi-scale filtering
- Classifier combination and sample re-weighting
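The combination and re-weighting idea can be sketched with a generic AdaBoost-style loop. This is an assumption-laden illustration, not the authors' implementation: it swaps in single-feature threshold rules ("stumps") as weak classifiers, uses a tiny 1-D toy set instead of the 45K filter responses, and every name in it is hypothetical.

```python
import math


def stump_predict(stump, x):
    """Weak classifier: threshold a single feature and output +1 or -1."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Search all features and thresholds for the lowest weighted error."""
    best = None
    for feature in range(len(xs[0])):
        for threshold in sorted({x[feature] for x in xs}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                error = sum(w for x, y, w in zip(xs, ys, weights)
                            if stump_predict(stump, x) != y)
                if best is None or error < best[0]:
                    best = (error, stump)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, stump = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, stump))
        # Sample re-weighting: raise the weight of misclassified samples.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Classifier combination: sign of the alpha-weighted vote."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single threshold can separate.
XS = [(float(i),) for i in range(10)]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost(XS, YS, rounds=1)
three_rounds = adaboost(XS, YS, rounds=3)
```

On this toy set a single stump misclassifies three of the ten samples, while three boosted stumps classify all ten correctly, because each round concentrates weight on the samples the previous stumps got wrong.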