EE 6882 Statistical Methods for Video Indexing and Analysis, Fall 2004



SLIDE 1

EE 6882 Statistical Methods for Video Indexing and Analysis

Fall 2004

Prof. Shih-Fu Chang
http://www.ee.columbia.edu/~sfchang

Lecture 1 part A (9/8/04)

SLIDE 2

2 EE6882-Chang

EE E6882 SVIA Lecture #1

  • Part I
    • Introduction
    • Course Syllabus
    • Readings
      • A. Jain et al., "Statistical Pattern Recognition: A Review," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, Jan. 2000.
      • Gonzalez and Woods, Digital Image Processing, 2nd edition, Prentice Hall, 2001. (Chapter 12, Object recognition)
      • Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989. (Chapter 9.14)
  • Part II
    • Introduction of a simple image search system
    • Image feature extraction
    • Similarity matching, performance metrics
    • Readings
      • J. R. Smith and S.-F. Chang, "Visually Searching the Web for Content," IEEE Multimedia Magazine, Summer, Vol. 4, No. 3, pp. 12-20, 1997.
      • J. R. Smith and S.-F. Chang, "VisualSEEk: a Fully Automated Content-Based Image Query System," In ACM Multimedia, Boston, MA, November 1996.

SLIDE 3

Problems in Video Indexing and Analysis

  • Indexing, search, and retrieval for images and videos
    • See Columbia's WebSEEk and EdSearch demos
    • Google image search?
    • "find video clips of basketball going through the hoop"
    • "find images containing shape shown in the sketch"
  • Automatic annotation of visual content (e.g., recognition of text, face, scene, vehicle, location, etc.)
  • Automatic parsing of video programs into structures (e.g., break videos into shots, scenes, and stories)
  • Event detection (e.g., sports events, human activities, meetings, medical, and other spatio-temporal patterns)
  • Summary
    • e.g., topic clustering, highlight generation
    • See Columbia's sports highlight, news topic clustering demo

SLIDE 4

Examples of object recognition and structure parsing problems

[Figure labels: shot, story, anchor shot]

  • How to detect and recognize the characters and words? (Demo)
  • How to detect the boundaries of programs, stories, and commercials?

SLIDE 5

Statistical Paradigm

  • Many problems can be posed as pattern recognition (e.g., Matlab statistical classification demo)
  • Statistical models to handle uncertainty and provide flexibility
  • Rich tools for learning and prediction
  • Image processing toolkits available
  • Increasing benchmark data (e.g., NIST TREC Video)

SLIDE 6

A Very High-Level Stat. Pattern Recog. Architecture

(From Jain, Duin, and Mao, SPR Review, ’99)

SLIDE 7

Important issues

  • Image/video pre-processing: quality, resolution, etc.
  • Feature extraction
    • Color, texture, motion, shape, layout, regions, parts, etc.
  • Feature representation
    • Discrete vs. continuous, vectorization, dimension
    • Invariance to scale, rotation, translation …
  • Feature selection
    • PCA, MDS, Kernel PCA, etc.
  • Classification models
    • Generative vs. discriminative
    • Multi-modal fusion, early fusion vs. late fusion
  • Size of training/test data and manual supervision efforts
  • Validation and evaluation processes
  • Complexity

SLIDE 8

Some examples of feature representation

  • Features determine the patterns and their separability
  • E.g.,
    • Angular distance for closed shapes
    • Part features for iris flowers
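
One possible reading of "angular distance for closed shapes" is a signature that records the distance from the shape centroid to each boundary point as a function of angle. The sketch below assumes that reading; the function name and the sample boundaries are illustrative and do not come from the lecture.

```python
import math

def centroid_distance_signature(boundary):
    """Return (angle, distance) pairs measured from the centroid, sorted by angle."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    pairs = [(math.atan2(y - cy, x - cx), math.hypot(x - cx, y - cy))
             for x, y in boundary]
    return sorted(pairs)

# Hypothetical boundary samples: a circle of radius 2 and a square of side 2.
circle = [(2 * math.cos(2 * math.pi * k / 8), 2 * math.sin(2 * math.pi * k / 8))
          for k in range(8)]
square = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

# Keep only the distances; the circle gives a flat signature, the square does not.
circle_r = [r for _, r in centroid_distance_signature(circle)]
square_r = [r for _, r in centroid_distance_signature(square)]
```

The circle yields a constant signature while the square alternates between edge midpoints and corners, which is the kind of separability the slide attributes to a well-chosen feature.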

SLIDE 9

Another example of feature

  • Bankers Association font used on personal checks
  • Uses magnetic ink and a reader to simplify segmentation
  • Feature: the horizontal scan of the rate of increase/decrease of the character area
  • Peaks and zeros are arranged to be located at the vertical grid lines, so they can be sampled accurately
  • Patterns can be easily distinguished
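
A rough sketch of the scan feature, assuming the character is available as a binary grid and that the reader behaves like a vertical slit moving left to right. The glyphs and function names are hypothetical; a real magnetic-ink reader produces an analog signal rather than column counts.

```python
def column_area(glyph):
    """Ink area under a vertical slit at each horizontal position."""
    return [sum(row[c] for row in glyph) for c in range(len(glyph[0]))]

def scan_signature(glyph):
    """Rate of increase/decrease of the area between adjacent slit positions."""
    area = [0] + column_area(glyph) + [0]  # blank margin on both sides
    return [b - a for a, b in zip(area, area[1:])]

# Two hypothetical 3x4 glyphs: a vertical bar and an L shape.
glyph_bar = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
glyph_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]

signature_bar = scan_signature(glyph_bar)
signature_l = scan_signature(glyph_l)
```

The two glyphs give different one-dimensional signatures, so they can be told apart without segmenting the character in two dimensions.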

SLIDE 10

Classification Paradigms

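Probabilistic

  • Feature x (height, income, …)
  • Compare the class likelihoods: P(x|C=1) > or < P(x|C=2)
  • Which class for a new point x0: C(x0) = ?

[Figure: likelihood curves of Class 1 and Class 2 over x, with a query point x0]

Discriminative

  • A decision boundary separates the + and - samples in the (x1, x2) plane
  • f(x) < 0 on one side, f(x) > 0 on the other
  • f(x): discriminant function

[Figure: + and - samples in the (x1, x2) plane with a decision boundary]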
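
A minimal sketch of the two paradigms on toy data. It assumes Gaussian class-conditional densities with equal priors for the probabilistic rule and a hand-set linear function for the discriminative rule. The data values, the function names, and the mapping of f(x) > 0 to class 1 are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gaussian(samples):
    """Estimate (mean, std) of one class from its training samples."""
    return mean(samples), pstdev(samples)

def classify_by_likelihood(x0, params_1, params_2):
    """Probabilistic rule: pick the class with the larger P(x0 | C)."""
    p1 = gaussian_pdf(x0, *params_1)
    p2 = gaussian_pdf(x0, *params_2)
    return 1 if p1 > p2 else 2

def classify_by_discriminant(x, w, b):
    """Discriminative rule: the sign of f(x) = w.x + b decides the class."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if f > 0 else 2

# Toy 1-D data (hypothetical heights in cm for two classes).
class_1 = [150.0, 155.0, 160.0, 165.0, 170.0]
class_2 = [170.0, 175.0, 180.0, 185.0, 190.0]
params_1 = fit_gaussian(class_1)
params_2 = fit_gaussian(class_2)

label_low = classify_by_likelihood(158.0, params_1, params_2)
label_high = classify_by_likelihood(183.0, params_1, params_2)

# Toy 2-D discriminant: f(x) = x1 + x2 - 1, so the boundary is x1 + x2 = 1.
label_above = classify_by_discriminant((1.0, 1.0), w=(1.0, 1.0), b=-1.0)
label_below = classify_by_discriminant((0.0, 0.0), w=(1.0, 1.0), b=-1.0)
```

The probabilistic rule needs a density model per class, while the discriminative rule only needs the boundary itself.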

SLIDE 11

Training / Validation / Testing

Assume the same distribution in the different sets; otherwise the optimal solution from validation may not be optimal on the test data.

  • Training: learn optimal features, models, parameters
  • Validation: select the optimal hypothesis through validation
  • Testing: evaluate performance over the test data

[Figure: three scatter plots of + and - samples in the (x(1), x(2)) plane, one each for the training, validation, and test sets]
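
The three roles can be sketched on toy data as follows. The split here is a deterministic 60/20/20 cut by index so the run is reproducible (a real experiment would shuffle first), and the threshold classifier, the two features, and all names are assumptions made for illustration.

```python
from statistics import mean

def split_dataset(samples):
    """Deterministic 60/20/20 split by index (a real run would shuffle first)."""
    train = [s for i, s in enumerate(samples) if i % 5 in (0, 1, 2)]
    val = [s for i, s in enumerate(samples) if i % 5 == 3]
    test = [s for i, s in enumerate(samples) if i % 5 == 4]
    return train, val, test

def fit_threshold(train, j):
    """Training step: threshold = midpoint of the two class means of feature j."""
    pos = mean(x[j] for x, label in train if label)
    neg = mean(x[j] for x, label in train if not label)
    return (pos + neg) / 2.0

def accuracy(data, j, threshold):
    """Fraction of samples where the rule 'x[j] > threshold' matches the label."""
    return sum((x[j] > threshold) == label for x, label in data) / len(data)

# Toy data: feature 0 is informative, feature 1 only encodes the parity of i.
samples = [((float(i), 100.0 * (i % 2 == 0)), i >= 50) for i in range(100)]
train, val, test = split_dataset(samples)

# Training fits one hypothesis per candidate feature.
thresholds = {j: fit_threshold(train, j) for j in (0, 1)}
# Validation selects among the hypotheses.
val_scores = {j: accuracy(val, j, thresholds[j]) for j in (0, 1)}
best_feature = max(val_scores, key=val_scores.get)
# The test set is used once, to report performance of the selected hypothesis.
test_accuracy = accuracy(test, best_feature, thresholds[best_feature])
```

In this toy run the selected hypothesis scores slightly lower on the test set than on the validation set, which is why the final number should come from data that played no part in the selection.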

SLIDE 12

Training / Validation / Testing (cont.)

Multiple validation sets can be used for different optimization steps.

[Diagram: validation sets Val-1 and Val-2 used to select the optimal classifier using feature 1, the optimal classifier using feature 2, …, and the optimal classifier fusing multiple features]

Cross validation, leave-one-out

[Diagram: data divided into parts 1, 2, …, K, each assigned to Training or Testing]

Rotate the choice of the test set and average the performance over runs.
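
A short sketch of the rotation, assuming contiguous folds and a deliberately trivial majority-label classifier as the model. The helper names and the toy labels are hypothetical. Leave-one-out is the special case where K equals the number of samples.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set once."""
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

def cross_validate(samples, k, train_fn, score_fn):
    """Average score_fn over k rotations of the held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx])
        scores.append(score_fn(model, [samples[i] for i in test_idx]))
    return sum(scores) / len(scores)

def train_majority(train):
    """Toy 'training': remember the most frequent label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def score_majority(model, test):
    """Accuracy of always predicting the remembered label."""
    return sum(label == model for _, label in test) / len(test)

labels = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
samples = [(float(i), y) for i, y in enumerate(labels)]

five_fold = cross_validate(samples, 5, train_majority, score_majority)
leave_one_out = cross_validate(samples, len(samples), train_majority, score_majority)
```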

SLIDE 13

Curse of Dimensionality and Overtraining

Rule of thumb: (# of training patterns per class) / (# of features) > 10

[Figure: a case of overtraining, shown as + and - samples in the (x(1), x(2)) plane]
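
A small sketch of both points, under stated assumptions: the ratio check encodes the rule of thumb directly, and a 1-nearest-neighbor classifier on a toy set with one flipped label stands in for an overtrained model. The data, the fixed threshold of 4.5, and the function names are illustrative only.

```python
def enough_training_data(n_per_class, n_features, ratio=10):
    """Rule of thumb from the slide: patterns per class / features > ratio."""
    return n_per_class / n_features > ratio

def nearest_neighbor_predict(train, x):
    """1-NN: copy the label of the closest training point (memorizes noise)."""
    return min(train, key=lambda s: abs(s[0] - x))[1]

def threshold_predict(x, threshold=4.5):
    """A much simpler hypothesis: one threshold on the single feature."""
    return x > threshold

def accuracy(predict, data):
    """Fraction of (x, label) pairs that predict() labels correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)

# Training set: the true rule is x >= 5, but the label at x = 2 is flipped (noise).
train = [(float(x), (x >= 5) != (x == 2)) for x in range(10)]
# Clean test points, two of which sit next to the noisy training point.
test = [(1.9, False), (2.1, False), (3.5, False), (6.5, True), (8.2, True)]

nn_train = accuracy(lambda x: nearest_neighbor_predict(train, x), train)
nn_test = accuracy(lambda x: nearest_neighbor_predict(train, x), test)
th_train = accuracy(threshold_predict, train)
th_test = accuracy(threshold_predict, test)
```

The memorizing classifier is perfect on the training set and worse on new data, while the simpler rule gives up one training point and generalizes better.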

SLIDE 14

About the course

  • Objectives:
    • Learn how to formulate and solve problems in this field
      • Feature extraction, object/event recognition, structure detection, video search and retrieval
    • Get insights and experience of recent machine learning techniques
      • Statistical, Bayesian, Neural Network, PCA, HMM, SVM
    • Have fun in experimenting with actual visual classification/indexing problems
  • Intended Audience
    • Beginning graduate students or professionals
    • Familiar with signal/image processing
    • Comfortable with probability, statistics, linear algebra, and some machine learning

SLIDE 15

Course Format

  • Overview lectures + student presentations + final projects
  • I will give several overview lectures at the beginning.
  • Student paper presentation
    • One paper assigned to each student
    • Assignments determined 3 weeks in advance
    • CVN students present over the phone
    • Everyone writes comments before and after class on the class wiki site (starting the 3rd week)
  • One written exam after all presentations
    • Tests understanding of concepts discussed throughout the course
  • One term project at the end of the course
  • Grading
    • Paper presentation/demo 30%
    • Exam 30%
    • Final project 40%

SLIDE 16

Paper review and demo

  • Each student discusses the paper and demos with me and the TA 2 weeks before class
    • Week 1: review and research
    • Week 2: simulate a toy problem using an available data set and tools
    • Week 3: prepare presentation
  • Upload the slides and code to the class wiki site before class
  • Presentation
    • 30 mins per paper (including demo)
    • I will provide additional materials about the subject.

SLIDE 17

Paper Review and Demo (2)

  • Review
    • Background review and examples
    • Problem addressed and main ideas
    • Insights about why it works
    • Limitation, generality, and repeatability
    • Alternatives and comparisons
  • Demo
    • Software and data available and repeatable?
    • Reconstruct the method and try on toy data set? (from some available generic toolkit)
    • Analysis of results (not just accuracy numbers, offer explanations and verifiable theories about observations)
    • Demo code archived on class site and shared with others

SLIDE 18

Resources and Matlab

  • Links on the class web site
    • Tutorials on paper writing, Matlab, etc.
  • Software links on web site to
    • Matlab, Neural Network, HMM, Netlab, SVM
  • SVIA EE6882 Class Dataset
    • Benchmark data set, a few thousand images from broadcast news and stock photos
    • Extracted features and labels
    • Will distribute on a DVD for class project use only
  • Matlab is recommended for programming
    • Accessible in Mudd 251 Computer Lab
    • Need CU ACIS account
    • Very brief introduction next week

SLIDE 19

Paper categories

  • Problems
    • Feature extraction and image search
    • Image/video classification
    • Interactive image retrieval
    • Video structure parsing
    • Multimedia information retrieval
  • Statistical Techniques
    • Bayesian, factor graph, graphical model
    • SVM and variations
    • Language model, relevance model from IR
    • HMM and variations
  • Others

SLIDE 20

A few papers reviewed last year

SLIDE 21

Maximum Entropy Fusing

  • Objective: is there a story boundary at time τ_k?
  • Candidate points τ_k = {shot boundaries or significant pauses}
  • Observations from {video, audio} around τ_k (timeline: τ_{k-1}, τ_k, τ_{k+1})
    • a static face?
    • motion energy changes?
    • change from music to speech?
    • speech segment?
    • {cue words}_i appear
    • {cue words}_j appear

(Hsu and Chang)
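
For a yes/no decision, a maximum entropy model over binary cues has the log-linear form P(boundary | cues) = 1 / (1 + exp(-(b + Σ λ_i f_i))). The sketch below evaluates that form for one candidate point. The cue names paraphrase the slide, and the weights are made-up values for illustration; they are not taken from Hsu and Chang, where the weights would be estimated from labeled data.

```python
import math

def boundary_posterior(cues, weights, bias=0.0):
    """P(story boundary | cues) under a binary log-linear (maxent) model."""
    score = bias + sum(weights[name] for name, active in cues.items() if active)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights; in practice they are estimated from labeled data.
weights = {
    "static_face": 1.2,
    "motion_energy_change": 0.4,
    "music_to_speech": 0.9,
    "cue_words": 1.5,
}

# No cue fires: the model is indifferent when the bias is zero.
no_cues = boundary_posterior({name: False for name in weights}, weights)

# A static face plus cue words at the candidate point.
anchor_like = boundary_posterior(
    {"static_face": True, "motion_energy_change": False,
     "music_to_speech": False, "cue_words": True},
    weights,
)
```

Each cue contributes additively to the score, which is what lets heterogeneous audio and video observations be fused in one model.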

SLIDE 22

Bayesian Image Classification

(Vailaya et al., '98 and '01)

  • How to select the categories and tree?
  • How to estimate the distributions of features for each class?

SLIDE 23

Concept (In)Dependence

(Naphade et al)

SLIDE 24

Boosting

Extract > 45K selective efficient features by multi-scale filtering

Classifier combination and sample re-weighting (Tieu and Viola)
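
Classifier combination with sample re-weighting can be sketched with a generic AdaBoost over threshold stumps. This is the textbook algorithm on a toy one-dimensional problem, not the Tieu and Viola system, and the data and function names are assumptions.

```python
import math

def stump_predict(stump, x):
    """A stump predicts `polarity` when x[j] >= threshold, else the opposite."""
    j, threshold, polarity = stump
    return polarity if x[j] >= threshold else -polarity

def train_stump(data, weights):
    """Pick the (feature, threshold, polarity) stump with least weighted error."""
    best = None
    for j in range(len(data[0][0])):
        for threshold in sorted({x[j] for x, _ in data}):
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_predict(stump, x) != y)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def adaboost(data, rounds):
    """Return a list of (alpha, stump) pairs learned over `rounds` iterations."""
    weights = [1.0 / len(data)] * len(data)
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Re-weight: misclassified samples gain weight, then normalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy labels (+ + + - - - + + +): no single threshold separates them.
data = [((float(i),), 1 if i < 3 or i >= 6 else -1) for i in range(9)]

one_round = adaboost(data, 1)
three_rounds = adaboost(data, 3)
correct_1 = sum(ensemble_predict(one_round, x) == y for x, y in data)
correct_3 = sum(ensemble_predict(three_rounds, x) == y for x, y in data)
```

One stump alone misclassifies a third of this toy set, while the weighted vote of three stumps classifies all nine points.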

SLIDE 25

Boosting retrieval interface

[Interface panels: user-selected examples; 20 retrieval results; negative images in the training set close to the decision boundary; images in the testing set close to the decision boundary]

  • Real-time evaluation of 20 features over millions of images
  • Two-class problem: relevant vs. irrelevant

SLIDE 26

Object-Word Correspondence

(Duygulu et al)

  • Model the joint distribution between words and blobs
  • Used in automatic annotation and retrieval

SLIDE 27

Unsupervised Video Structure Discovery: Hierarchical Hidden Markov Model

[Figure: baseball example over time. Top-level states: running, pitching, break. Bottom-level states: bench, close up, batter, audience, field, bird view, pitcher, 1st base]

Learning Multi-Level Markovian Temporal Dependence

  • High-level states represent distinct events
  • Presence of each event produces observations modeled by low-level HMMs

(Xie et al)
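
As a simplified stand-in for the hierarchical model, the sketch below runs the forward recursion of a flat, single-level HMM. It assumes two of the slide's top-level events as hidden states and treats shot labels as discrete observations. All probabilities are invented for illustration and are not from Xie et al.

```python
from itertools import product

def forward_likelihood(obs, start, trans, emit):
    """P(obs) under a discrete HMM, computed with the forward recursion."""
    states = list(start)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Hypothetical parameters for two hidden events and two observed shot types.
start = {"pitching": 0.6, "running": 0.4}
trans = {
    "pitching": {"pitching": 0.7, "running": 0.3},
    "running": {"pitching": 0.4, "running": 0.6},
}
emit = {
    "pitching": {"pitcher": 0.8, "field": 0.2},
    "running": {"pitcher": 0.1, "field": 0.9},
}

p_single = forward_likelihood(["pitcher"], start, trans, emit)
p_sequence = forward_likelihood(["pitcher", "pitcher", "field"], start, trans, emit)

# Sanity check: likelihoods of all length-3 observation sequences sum to 1.
total = sum(forward_likelihood(list(seq), start, trans, emit)
            for seq in product(["pitcher", "field"], repeat=3))
```

In the hierarchical version each top-level state would own its own low-level HMM instead of a single emission table.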