Dictionaries, Manifolds and Domain Adaptation for Image and Video-based Recognition (PowerPoint PPT Presentation)


SLIDE 1

Dictionaries, Manifolds and Domain Adaptation for Image and Video-based Recognition

Rama Chellappa University of Maryland

SLIDE 2

Student and the teachers

SLIDE 3

Major points

  • Training and testing data come from different distributions.
    – Distributions are complex due to variations in patterns
    – Domain adaptation
  • Robust representations and distance measures
    – Vector space vs. manifolds
    – Euclidean distances vs. geodesics
  • These points will be developed for two representations of images and videos:
    – Dictionaries
    – Manifolds

SLIDE 4

Outline of the talk

  • Dictionaries
    – Learning and applications to image and video-based recognition
  • Manifolds
    – Representation, inference and applications to image and video-based recognition
    – Analytical and empirical
  • Domain adaptation
    – How to adapt representations to new domains
    – Domain shifts can be due to pose, illumination, rate, time lapse, views, ...
    – Semi-supervised and unsupervised settings
  • Relies on the works of Prof. Amari and Chikuse.
SLIDE 5

Motivation – 1

SLIDE 6

Motivation – 2

  • Task: Given a probe video of one or more subjects, retrieve their IDs from a gallery of still face images or face videos.
  • Challenges: Getting a face image is more than half the problem
    – Low resolution
    – Blur
    – Pose variation
    – Uncontrolled illumination
    – Camera motion

SLIDE 7

Dictionaries for signal and image analysis

  • Matching Pursuit algorithms (Mallat, early 90's)
  • Orthogonal Matching Pursuit (Pati et al., 1993; Tropp, 2004)
  • Saito and Coifman, 1997
  • Etemad and Chellappa, 1997
  • Represent signals using wavelets, wavelet packets, ...
  • Learn the dictionary from data instead of using off-the-shelf bases (Olshausen and Field, 1997), ...

SLIDE 8

Modern day dictionaries

  • Represent signals and images using signals and images.
  • Sparse coding has a neural basis.
  • Allows compositional representations.
  • Dictionary updates:
    – Batch (Method of Optimal Directions, MOD)
    – K-SVD
  • Dictionaries for images are more complicated:
    – Need to account for pose, illumination and resolution variations.
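The batch (MOD) update named above has a compact closed form: given the signals Y and their current sparse codes X, the dictionary minimizing ‖Y − DX‖_F is D = Y Xᵀ(XXᵀ)⁻¹, after which the atoms are renormalized. A minimal numpy sketch of one MOD-style learning loop, with a greedy OMP coder standing in for whatever sparse solver is preferred (function names, sizes, and the toy data are illustrative, not from the talk):

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse coding: pick up to k atoms of D to represent y."""
    residual, idx, coef = y.astype(float).copy(), [], np.zeros(0)
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def mod_update(Y, X):
    """Method of Optimal Directions: D = Y X^T (X X^T)^+, atoms renormalized."""
    D = Y @ X.T @ np.linalg.pinv(X @ X.T)
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)

# toy run: learn 32 atoms for 200 random 16-dimensional signals
rng = np.random.default_rng(0)
Y = rng.standard_normal((16, 200))
D = Y[:, :32] / np.linalg.norm(Y[:, :32], axis=0, keepdims=True)
for _ in range(5):
    X = np.stack([omp(D, y, 3) for y in Y.T], axis=1)
    D = mod_update(Y, X)
```

K-SVD, mentioned next, refines this scheme by updating one atom at a time together with its coefficients.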

SLIDE 9

Basic formulation

  • Assume L classes and n images per class in the gallery.
  • The training images of the k-th class are collected as the matrix Y_k = [y_{k,1}, ..., y_{k,n}].
  • The dictionary D is obtained by concatenating all the training images: D = [Y_1, Y_2, ..., Y_L].
  • The unknown test vector y can be represented as a linear combination of the training images: y ≈ D α.
  • The coefficient vector α is sparse.

(Wright et al., 2009; Wagner et al., 2011)

SLIDE 10

Dictionary-based face recognition

Find the reconstruction error when representing the test image with the coefficients of each class separately; select the class giving the minimum reconstruction error.

α can be recovered by Basis Pursuit as α̂ = argmin_α ‖α‖₁ subject to ‖D α − y‖₂ ≤ ε.
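The classification rule above can be sketched in a few lines of numpy. A greedy OMP coder is used here in place of Basis Pursuit purely to keep the sketch dependency-free; the class-wise residual rule is the one described on the slide, and all names and the toy gallery are illustrative:

```python
import numpy as np

def src_classify(D, labels, y, k=2):
    """Sparse-representation classification: sparsely code y over the
    concatenated gallery D, then pick the class whose atoms alone
    reconstruct y with the smallest error."""
    # sparse coding (greedy OMP stand-in for Basis Pursuit)
    residual, idx, coef = y.astype(float).copy(), [], None
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[idx] = coef
    # class-wise reconstruction errors using only that class's coefficients
    errs = {}
    for c in set(labels):
        mask = np.asarray(labels) == c
        errs[c] = np.linalg.norm(y - D[:, mask] @ alpha[mask])
    return min(errs, key=errs.get)

# toy gallery: two classes with nearly orthogonal appearance
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.05, 0.0, 0.05]])
D = D / np.linalg.norm(D, axis=0, keepdims=True)
labels = [0, 0, 1, 1]
pred = src_classify(D, labels, np.array([1.0, 0.1, 0.0]))
```

A probe resembling the first two atoms is assigned to class 0, since keeping only class 0's coefficients leaves the smallest residual.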

SLIDE 11

Learning dictionaries – K-SVD

Training faces → K-SVD → Learned dictionary

(M. Aharon, M. Elad, and A. M. Bruckstein, 2006)

SLIDE 12

Outlier rejection

SLIDE 13

The illumination problem

  • Robust albedo estimation (Biswas et al., PAMI 2009)
    – Estimate the albedo
    – Relight images with different light-source directions
    – Use the relighted images for training

SLIDE 14

Robust estimation of albedo

[Figure: a single intensity image is modeled as the product of light source, albedo and surface normals; albedo and shape are recovered from the single image by solving this inverse problem. Biswas et al., ICCV 2007; PAMI 2009.]

SLIDE 15

Albedo estimation

  • Lambertian assumption: the intensity is the product of the albedo and the inner product of the surface normal with the light-source direction,
    I(x, y) = ρ(x, y) n(x, y)ᵀ s
  • With the estimated light source ŝ and an initial surface-normal estimate n̂, the initial albedo estimate is
    ρ̂(x, y) = I(x, y) / (n̂(x, y)ᵀ ŝ),
    which carries an error inherited from the initial estimates.

SLIDE 16

Albedo estimation

  • The initial albedo estimate is modeled as the true albedo corrupted by signal-dependent additive noise.
  • A non-stationary mean, non-stationary variance (NMNV) model is assumed for the true unknown albedo, together with an unbiased-source assumption and uncorrelated noise.
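Under the NMNV model, the minimum-mean-squared-error correction of the initial albedo takes the familiar Wiener/Lee-filter form: shrink the noisy estimate toward its local mean by the ratio of signal variance to total variance. A small numpy sketch under those assumptions (window size, variable names, and the scalar noise variance are illustrative, not from the paper):

```python
import numpy as np

def box_mean(a, r):
    """Local mean over a (2r+1) x (2r+1) window with edge padding."""
    p = np.pad(a, r, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (2 * r + 1) ** 2

def lmmse_albedo(rho_init, noise_var, r=2):
    """LMMSE albedo estimate under the NMNV model:
    rho_hat = mu + s^2 / (s^2 + noise_var) * (rho_init - mu),
    with non-stationary local mean mu and local signal variance s^2."""
    mu = box_mean(rho_init, r)
    var = box_mean(rho_init ** 2, r) - mu ** 2
    sig2 = np.maximum(var - noise_var, 0.0)
    gain = sig2 / (sig2 + noise_var + 1e-12)
    return mu + gain * (rho_init - mu)

# toy check: a constant-albedo patch corrupted by noise is pulled back
rng = np.random.default_rng(0)
rho_true = np.full((32, 32), 0.5)
rho_init = rho_true + 0.1 * rng.standard_normal((32, 32))
rho_est = lmmse_albedo(rho_init, noise_var=0.01)
```

In flat regions the gain is small and the output approaches the local mean; near albedo edges the local variance dominates and the estimate stays close to the data.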

SLIDE 17

Estimated albedo – PIE dataset

SLIDE 18

Relighting using the estimated albedo

SLIDE 19

Experimental results

  • DFR – 99.17%
  • SRC – 98.1%
  • CDPCA – 98.83%

(Yale B dataset)

V. M. Patel, T. Wu, S. Biswas, P. J. Phillips, and R. Chellappa, "Dictionary-based face recognition under variable lighting and pose," IEEE Trans. Information Forensics and Security, 2011.

SLIDE 20

Outdoor face dataset

  • An outdoor dataset with 18 subjects, 5 gallery images each, and 90 low-resolution probe images.
  • Gallery: 120 × 120; probe: 20 × 20.

Method          Recognition rate
SLRFR           67%
Reg. LDA+SVM    60%
CLPM            16.1%

(BTAS 2011)

SLIDE 21


Video dictionaries for face recognition

Pipeline (ECCV 2012):
  – Preprocessing (extract frames and detect/crop face regions)
  – Partition the cropped face images using a summarization algorithm [1]
  – Learn a dictionary for each partition and form sequence-specific dictionaries
  – Construct distance/similarity matrices
  – Recognition / verification

[1] N. Shroff, P. Turaga, and R. Chellappa, "Video precis: Highlighting diverse aspects of videos," IEEE Transactions on Multimedia, 2010; NIPS 2011.

SLIDE 22

Dictionary learning

(build sequence-specific dictionaries)

  • Let Y^(i,j,k) be the gallery matrix of the k-th partition of the j-th video sequence of subject i.
  • Given Y^(i,j,k), use the K-SVD algorithm [2] to build a (partition-level) sub-dictionary D^(i,j,k) such that Y^(i,j,k) ≈ D^(i,j,k) X^(i,j,k) with sparse coefficients X^(i,j,k).
  • Concatenate the (partition-level) sub-dictionaries to form a sequence-specific dictionary D^(i,j) = [D^(i,j,1), ..., D^(i,j,K)].

[2] M. Aharon, M. Elad and A. M. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006.
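The three steps above can be sketched end to end. To keep the sketch self-contained, a naive k-means stands in for the video-precis summarization and a truncated SVD basis stands in for K-SVD; all names and sizes are illustrative:

```python
import numpy as np

def sequence_dictionary(frames, n_partitions=3, atoms_per_partition=4,
                        iters=10, seed=0):
    """Build a sequence-specific dictionary: partition the frames, learn a
    per-partition sub-dictionary, and concatenate the sub-dictionaries."""
    rng = np.random.default_rng(seed)
    # naive k-means over the frame vectors (columns of `frames`)
    centers = frames[:, rng.choice(frames.shape[1], n_partitions,
                                   replace=False)].copy()
    for _ in range(iters):
        d2 = ((frames[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=0)
        for c in range(n_partitions):
            if np.any(labels == c):
                centers[:, c] = frames[:, labels == c].mean(axis=1)
    # per-partition sub-dictionary from the top left singular vectors
    subs = []
    for c in range(n_partitions):
        Yc = frames[:, labels == c]
        if Yc.shape[1] == 0:
            continue
        U, _, _ = np.linalg.svd(Yc, full_matrices=False)
        subs.append(U[:, :atoms_per_partition])
    return np.concatenate(subs, axis=1)

# toy run: 30 frames of 8-dimensional features
rng = np.random.default_rng(1)
frames = rng.standard_normal((8, 30))
Dseq = sequence_dictionary(frames)
```

The concatenated atoms are unit-norm by construction, matching the dictionary convention used in the rest of the talk.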

SLIDE 23

Recognition/Identification

  • Given the m-th query video sequence, generate its partitions in the same way as for the gallery.
  • The distance between the query sequence and the dictionary of the p-th gallery video sequence is calculated from the reconstruction residuals of the query frames on that dictionary.
  • Select the best match as the gallery sequence with the minimum distance.
SLIDE 24

MBGC recognition results

MBGC dataset:
  – 397 walking (frontal-face) videos: 198 SD + 199 HD
  – 371 activity (profile-face) videos: 185 SD + 186 HD

SLIDE 25
  • Facial expression analysis using AUs (Action Units) and the high-level knowledge available in FACS regarding AU composition and expression decomposition.
  • AUs have ambiguous semantic descriptions, so it is difficult to model them accurately.
  • AU-Dictionary:
    – We use local features to model each AU.


SLIDE 26


SLIDE 27

  • We learn separate dictionaries for each AU.
  • The AU-Dictionary is then formed by concatenating all the individual AU dictionaries:
    D = [AU-1 | AU-2 | AU-5 | AU-10 | AU-12 | AU-23]

SLIDE 28


SLIDE 29
  • Objective function to be minimized:


SLIDE 30
  • Goal:

    – Simultaneously learn structures on the expressive face and the corresponding subspace representations.
    – We want the final subspaces to be as separated as possible.
  • Objective: structures are disjoint subsets of local patch descriptors, with learned dictionaries for the structures.
  • Learned structures for the universal expressions from the CK+ dataset.

SLIDE 31

  • Classification by minimum residual error.
SLIDE 32


SLIDE 33

Some additional results

  • Competitive results for iris recognition; enables cancelability (PAMI 2011).
  • Non-linear dictionaries through kernelization produce improvements of 5-10%, depending on the problem (ICASSP 2012).
    – Illustrated using the USPS, Caltech-101 and Caltech-256 datasets.
  • Building dictionaries in the Radon transform domain yields robustness to in-plane rotation and scale in CBIR applications (IEEE TIP).
  • Characteristic views (Chakravarthy and Freeman) can be built using sparse representation theory (ICIP 2012).
  • Joint-sparsity-driven dictionary learning produces improvements in multi-modal biometrics applications (under review).
  • Reconstruction from sparse gradients (IEEE TIP 2012), in collaboration with Anna Gilbert.

SLIDE 34

Domain adaptation: Motivation

Image credit: Saenko et al., ECCV 2010; Bergamo et al., NIPS 2010

Source domain: data X, labels Y. Target domain: data X', labels Y'.
  • Transfer learning [1]: P(Y|X) ≠ P(Y'|X'), P(X) ≈ P(X')
  • Domain adaptation: P(X) ≠ P(X'), P(Y|X) ≈ P(Y'|X')

[1] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowledge and Data Engineering, 22:1345-1359, October 2010.

SLIDE 35

Domain adaptation - Related work

Semi-supervised (learns the domain change through correspondences):
  – Daume and Marcu, JAIR '06
  – Duan et al., ICML '09
  – Xing et al., KDD '07
  – Saenko et al., ECCV 2010; Kulis et al., CVPR 2011
  – Bergamo and Torresani, NIPS 2010
  – Lai and Fox, IJRR 2010

Unsupervised (no correspondences, no knowledge of the domain change):
  – Ben-David et al., AISTATS '10
  – Blitzer et al., NIPS '08
  – Wang and Mahadevan, IJCAI '09
  – Gopalan, Li and Chellappa, ICCV 2011
  – Gong et al., CVPR 2012
  – Zheng and Chellappa, ICPR 2012
  – D. Xu's group, 2012
SLIDE 36

Unsupervised domain adaptation*

[Figure: the labeled source domain X and the unlabeled target domain X~ are represented by generative subspaces on the Grassmannian G_{N,d}; intermediate domains S1, S1.3, S1.6, S2 are learned incrementally along the path between them.]

*R. Gopalan, R. Li, R. Chellappa, "Domain adaptation for object recognition: An unsupervised approach," International Conference on Computer Vision, ICCV 2011 (Oral).

SLIDE 37

Domain adaptation of dictionaries

  • Assume there exist K intermediate domains {S_k}, k = 1, ..., K, which smoothly bridge the information gap between the source and target domains. A domain-dependent dictionary D_k is learned for each intermediate domain S_k.
  • We learn intermediate data to approximate the observations in the corresponding intermediate domains. The intermediate data are then utilized to build classifiers.
  • Intermediate domains can be derived as solutions to an optimization problem on a Grassmannian (ongoing work).

The top half of the figure shows intermediate images synthesized from a given source image in frontal view (red box); the bottom half shows intermediate images generated from a given target image in side view (green box).

SLIDE 38

Learn the intermediate domains

  • The reconstruction residue of the target data, decomposed with the source-domain dictionary, is utilized as an estimate of the information gap between the two domains.
  • The dictionaries for the intermediate domains are learned sequentially until the information gap falls below a threshold.
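One way to read the two bullets as code: start from the source dictionary, measure the relative reconstruction residue of the target data as the information gap, and take damped steps toward the dictionary that best fits the target, recording each step as an intermediate-domain dictionary. This is a stand-in sketch (dense least-squares codes instead of sparse coding, and an interpolation step toward the MOD solution); every name and constant is illustrative:

```python
import numpy as np

def intermediate_dictionaries(D_src, Y_tgt, step=0.5, tau=0.05, max_k=10):
    """Learn a path of dictionaries from the source domain toward the
    target data, stopping when the information gap (relative residue)
    falls below tau."""
    D, path = D_src.copy(), [D_src.copy()]
    for _ in range(max_k):
        X, *_ = np.linalg.lstsq(D, Y_tgt, rcond=None)    # codes (dense stand-in)
        R = Y_tgt - D @ X
        gap = np.linalg.norm(R) / np.linalg.norm(Y_tgt)  # information gap
        if gap < tau:
            break
        # damped move toward the MOD solution for the target data
        G = np.linalg.pinv(X @ X.T + 1e-6 * np.eye(X.shape[0]))
        D = D + step * (R @ X.T @ G)
        D = D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
        path.append(D.copy())
    return path

# toy run: a 5-atom dictionary in R^10 adapted toward random target data
rng = np.random.default_rng(0)
D0, _ = np.linalg.qr(rng.standard_normal((10, 5)))
Yt = rng.standard_normal((10, 40))
path = intermediate_dictionaries(D0, Yt)
```

Each dictionary along `path` fits the target data at least as well as its predecessor, so the sequence traces a monotone closing of the information gap.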

SLIDE 39

Learn the intermediate data

  • The intermediate data are generated along the transition path formed by the learned intermediate domains:
    – Generate intermediate data from the source data.
    – Generate intermediate data from the target data.

SLIDE 40
Recognition under domain shift

  • DA Classifier – Invariant Codes (DAC-IC): sparse codes are demonstrated to be invariant across the source, target and intermediate domains.
  • DA Classifier – Transition Path (DAC-TP): the intermediate data are exploited to define the distance between labeled source data x_{s,i} and target data x_{t,i},
    d = Σ_{k=0}^{K+1} ‖ x_{s,i}^(k) − x_{t,i}^(k) ‖₂²
  • Extension to semi-supervised adaptation (with a small amount of labels in the target domain): given unlabeled target data, compute its distance to labeled source data and to labeled target data.

SLIDE 41

Experiments on face recognition

  • FR across pose variation on PIE (source: frontal images; target: non-frontal images).
  • FR across blur and illumination variation on PIE (source: sharp images illuminated by one set of light sources; target: blurred images illuminated by a different set of light sources).

SLIDE 42

Experiments on object recognition

Comparison of the top five matches of our method (smaller images in the top row) and Grassmann-manifold-based DA (smaller images in the bottom row).

SLIDE 43

Grassmann manifold-based domain adaptation

Calculate the geodesic path from the source domain to the target domain; project each sample from both domains onto all the intermediate subspaces; design classifiers using the projected data.

[Figure: samples X_s from the source domain and X_t from the target domain are reduced by PCA to a source subspace and a target subspace; a geodesic on the Grassmann manifold connects the two subspaces.]

SLIDE 44

Creating intermediate subspaces on manifolds

  • Algorithm 1: numerical computation of the velocity matrix – the inverse exponential map [1].
  • Algorithm 2: computing the exponential map and sampling points along the geodesic [1]:
    Ψ(t)′ = Q′ exp(tB) J
    The sub-matrix A of B specifies the direction and the speed of the geodesic flow.

[1] K. Gallivan, A. Srivastava, X. Liu, and P. Van Dooren, "Efficient algorithms for inferences on Grassmann manifolds," IEEE Statistical Signal Processing Workshop, pp. 315-318, St. Louis, MO, USA, Sep. 2003.
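A numerical sketch of the two algorithms, using the standard principal-angle construction: the inverse exponential map gives the direction of the geodesic at the source subspace (via a thin SVD), and the exponential map is then evaluated at several t in [0, 1] to sample intermediate subspaces. Variable and function names are illustrative:

```python
import numpy as np

def geodesic_flow(S, T, ts):
    """Sample orthonormal bases along the Grassmann geodesic from span(S)
    to span(T); S and T are d x p matrices with orthonormal columns."""
    M = S.T @ T
    # inverse exponential map: direction of the geodesic at S
    A = (np.eye(S.shape[0]) - S @ S.T) @ T @ np.linalg.inv(M)
    U, sig, Vt = np.linalg.svd(A, full_matrices=False)
    theta = np.arctan(sig)                       # principal angles
    # exponential map evaluated at each t
    return [S @ Vt.T @ np.diag(np.cos(t * theta)) + U @ np.diag(np.sin(t * theta))
            for t in ts]

# toy run: two random 3-dimensional subspaces of R^10
rng = np.random.default_rng(0)
S, _ = np.linalg.qr(rng.standard_normal((10, 3)))
T, _ = np.linalg.qr(rng.standard_normal((10, 3)))
subspaces = geodesic_flow(S, T, np.linspace(0.0, 1.0, 5))
```

Each sampled matrix has orthonormal columns, the t = 0 sample spans the source subspace, and the t = 1 sample spans the target subspace, so the list can serve directly as the finite set of intermediate subspaces used in the following slides.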

SLIDE 45
  • Project each sample from both domains onto all the intermediate subspaces.

[Figure: the geodesic from the source subspace to the target subspace on the Grassmann manifold passes through intermediate subspaces S_1, S_2, ...; it carries a collection of infinitely many intermediate subspaces.]

SLIDE 46

Unsupervised adaptation on Office dataset using finite intermediate subspaces

Legend:
  – Case A: linear domain representation with standardized features (recently made available with the dataset).
  – Cases B, B1: linear and non-linear domain representations with standardized (B) and protocol-based (B1) features (that we generate using the protocol).
  – Cases C, C1: physically relevant adaptation, simulating intermediate domains by varying the proportions of source and target data (C), or by geodesic sampling between the physical domains (C1); C and C1 use both linear and non-linear domain representations.
  – Case D: boosted adaptation, combining all the above cases in a multi-class boosting setting.

SLIDE 47

Semi-supervised adaptation using finite intermediate subspaces

Legend:
  – The unprimed cases (A, B, B1, C, C1, D) are the same as in the unsupervised setting, except that target labels are used in learning the classifier (semi-supervised: some target labels are available).
  – The primed cases (B', B1', C', C1') are the same as their unprimed counterparts, but use target labels in both the intermediate-domain generation stage and the classification stage.

SLIDE 48

Performing recognition under domain shift

  • Modified data representation:
    – Project every labeled source sample and every unlabeled target sample onto all the domains (subspaces), and concatenate the projections of each sample into one long vector.
  • Classification:
    – Train a discriminative learner on the projected source data (partial least squares, PLS).
    – Use the PLS latent space to estimate the labels of the projected target data.
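The representation-plus-classification recipe above, sketched with a toy example. A regularized least-squares classifier on one-hot labels stands in for the PLS latent-space step (to keep the sketch dependency-free); all names, the two random "intermediate" subspaces, and the toy data are illustrative:

```python
import numpy as np

def concat_projections(X, subspaces):
    """Project every column of X onto each subspace and stack the
    projection coefficients into one long vector per sample."""
    return np.concatenate([S.T @ X for S in subspaces], axis=0)

def fit_onehot_lsq(F, y, n_classes, lam=1e-3):
    """Regularized least squares onto one-hot labels (a PLS stand-in)."""
    Y = np.eye(n_classes)[y]
    return np.linalg.solve(F @ F.T + lam * np.eye(F.shape[0]), F @ Y)

# toy source data: two classes separated along the first axis of R^6
rng = np.random.default_rng(0)
n = 40
y_src = np.repeat([0, 1], n // 2)
X_src = 0.1 * rng.standard_normal((6, n))
X_src[0] += np.where(y_src == 0, 3.0, -3.0)
# two illustrative "intermediate" subspaces
S1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
S2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
F = concat_projections(X_src, [S1, S2])   # 4 x n concatenated features
W = fit_onehot_lsq(F, y_src, n_classes=2)
pred = (F.T @ W).argmax(axis=1)
```

Target samples would be passed through the same `concat_projections` before scoring with `W`, so source and target live in the same concatenated feature space.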

SLIDE 49

Office data set (Saenko et al., ECCV '10)

3 domains (webcam, dslr, amazon) – 31 object categories. [Figure: query images and their retrievals.]

SLIDE 50

Multi-domain adaptation on Office dataset

Legend: Cases A and D are the same as in the unsupervised and semi-supervised settings.

SLIDE 51

Unsupervised and semi-supervised adaptation on Bing dataset

SLIDE 52

Some additional results on manifolds

  • Dynamic models for actions and faces in videos, leading to Stiefel manifolds and appropriate inference mechanisms (PAMI 2011).
  • Recognition of group activities (IJCV 2013).
  • Video summarization (NIPS 2011).
  • Alignment manifold (PAMI 2012).
  • Age estimation and face recognition across aging on a Grassmann manifold (TIFS 2013).
  • Fast approximate nearest-neighbor search on a manifold (ICVGIP 2010).

SLIDE 53

Closing remarks

  • Dictionaries and manifolds are useful for image and video-based recognition.
    – Appearance and geometry can be integrated in a principled way, fully exploiting the data explosion.
    – Challenges due to pose, illumination, occlusion, resolution, etc. should be addressed.
  • Dictionary-based methods have a neural basis.
  • Domain adaptation methods can address training/testing data scalability.
  • Domain adaptation methods nicely bridge computer vision and pattern recognition.
  • More of: math, data and computing.
  • Recently completed the evaluation of an iris sensor-adaptation algorithm on 100 GB of UND data!
    – Exciting times are ahead.