

SLIDE 1

LEARNING AND APPLICATIONS

REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING
Francesca Odone and Lorenzo Rosasco

odone@disi.unige.it - lrosasco@mit.edu

Regularization Methods for High Dimensional Learning Learning and applications

SLIDE 2

PLAN

Learning and engineering applications: why?
Examples of in-house applications:
Face and object detection
Medical image analysis
Microarray data analysis

SLIDE 3

LET’S GO BACK TO THE BEGINNING

The goal is not to memorize but to generalize (or to predict).
Given a set of data (x1, y1), . . . , (xn, yn), find a function f which is a good predictor of y for a future input x:

f(x) = y
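The setup above can be sketched in a few lines. This is a toy illustration of ours, not from the slides: fit a predictor on example pairs and apply it to a future input.

```python
import numpy as np

# Toy illustration (ours, not from the slides): learn a predictor f from
# example pairs (x_i, y_i) and apply it to a future input x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.5 + 0.05 * rng.normal(size=50)  # noisy underlying process

# Fit f(x) = w*x + b by least squares on the training examples.
A = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

x_new = 0.3                      # a "future" input not in the training set
y_pred = w * x_new + b
print(round(w, 2), round(b, 2))  # close to the true (2.0, 0.5)
```

The fitted f generalizes because it recovers the underlying trend rather than memorizing the training points.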

SLIDE 4

WHAT IS IT USEFUL FOR?

The learning paradigm is useful whenever the underlying process is partially unknown, too complex, or too noisy to be modeled as a sequence of instructions.

SLIDE 5

THE APPLICATIONS WE DEAL WITH

Computer vision

Face detection and recognition
Object detection
Image annotation
Dynamic events and actions analysis

Medical Image Analysis

Automatic MR annotation
Dictionary learning

Computational biology

Gene selection

SLIDE 12

PLAN

Learning and engineering applications: why?
Examples of in-house applications:
Face and object detection
Medical image analysis
Microarray data analysis

SLIDE 13

LEARNING FROM IMAGES

Object detection, image categorization and, more generally, image understanding are difficult problems.
Learning from examples has been accepted as a viable way to deal with such problems, addressing noise and intra-class variability by collecting appropriate data and finding suitable descriptions.
Images are relatively easy to gather.

SLIDE 14

IMAGE DESCRIPTIONS WITH OVERCOMPLETE FEATURE SETS

Overcomplete, general-purpose sets of features are effective for modeling visual information.
Many object classes have peculiar intrinsic structures that can be better appreciated if one looks for symmetries or local geometries.

Examples of features: wavelets, ranklets, chirplets, rectangle features, ...
Examples of problems: face detection [Heisele et al., Viola & Jones, Destrero et al.], pedestrian detection [Oren et al.], car detection [Papageorgiou & Poggio]

The approach is inspired by biological systems. See, for instance, B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?", 1997.

SLIDE 15

FACE DETECTION

DESTRERO ET AL, 2009

THE CLASSIFICATION PROBLEM
It is a (binary) classification problem: each image region either is a face or is not.
We start from a training set of face and non-face images: {(x1, y1), . . . , (xn, yn)}
xi is a raw vector encoding the gray levels of image Ii; yi ∈ {−1, 1} according to whether the image is a face or not.
IMAGE REPRESENTATION
We represent images as rectangle-feature vectors: xi → (φ1(xi), . . . , φp(xi))

SLIDE 16

FACE DETECTION

ASSUMPTION
We assume Φβ = Y, where
Φ = {Φij} is the data matrix;
β = (β1, ..., βp)ᵀ is the vector of unknown weights to be estimated;
Y = (y1, ..., yn)ᵀ are the output labels.
Usually p is big; existence of a solution is ensured, uniqueness is not. The overcomplete set contains many correlated features, so the problem is ill-posed, and we resort to regularization.
SELECT FACE FEATURES
L1 regularization allows us to select a sparse subset of meaningful features for the problem, with the aim of discarding correlated ones:

min_{β∈ℝᵖ} ‖Y − Φβ‖² + λ‖β‖₁
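A minimal sketch of this minimization, via iterative soft thresholding (the "thresholded Landweber" iteration mentioned later in the deck). This is our own illustration on synthetic data, not the authors' implementation or their rectangle features.

```python
import numpy as np

# Iterative soft thresholding ("thresholded Landweber") for the L1
# problem above; a sketch on synthetic data, not the authors' code.
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_landweber(Phi, Y, lam, n_iter=2000):
    L = np.linalg.norm(Phi, 2) ** 2        # step size 1/L ensures convergence
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ beta - Y)    # gradient of (1/2)||Y - Phi beta||^2
        beta = soft_threshold(beta - grad / L, lam / (2 * L))
    return beta

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 30))
beta_true = np.zeros(30)
beta_true[[2, 7]] = [1.5, -2.0]            # only two informative features
Y = Phi @ beta_true + 0.01 * rng.normal(size=100)

beta = l1_landweber(Phi, Y, lam=20.0)
print(np.flatnonzero(np.abs(beta) > 0.1))  # indices of the selected features
```

The soft-thresholding step zeroes out small coefficients at every iteration, which is what produces the sparse selection.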

SLIDE 17

A SAMPLED VERSION OF THE ALGORITHM

Applying the algorithm to the entire set of features is not computationally feasible (Φ: 4000×64000 ≃ 1 GB).
We create many subsets of features, randomly sampled with repetition.
We run the algorithm separately on each subset.
We keep only the features selected in every run in which they were present.

[Diagram: from the initial set S0, 200 random extractions of 10% of the features (with repetition) produce Subset 1, Subset 2, ..., Subset 200; thresholded Landweber is run on each subset, giving Selected features 1, 2, ..., 200; keeping the features selected in every run in which they were present yields the set S1.]
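The subset-sampling procedure can be sketched as follows. The inner selector here is a simple correlation screen standing in for the thresholded Landweber step, and the sizes, data, and 0.5 score threshold are made up for illustration.

```python
import numpy as np

# Sketch of the subsampling scheme above. The inner selector is a simple
# correlation screen standing in for thresholded Landweber; data and
# thresholds are made up for illustration.
def select_on_subset(Phi_sub, Y):
    scores = np.abs(Phi_sub.T @ Y) / len(Y)   # per-feature relevance score
    return scores > 0.5

rng = np.random.default_rng(0)
n, p = 200, 50
Phi = rng.normal(size=(n, p))
Y = Phi[:, 3] - Phi[:, 10] + 0.1 * rng.normal(size=n)  # features 3, 10 matter

n_runs, frac = 200, 0.1
appeared = np.zeros(p, dtype=int)   # runs in which each feature was present
chosen = np.zeros(p, dtype=int)     # runs in which it was selected
for _ in range(n_runs):
    idx = np.unique(rng.choice(p, size=int(frac * p), replace=True))
    keep = select_on_subset(Phi[:, idx], Y)
    appeared[idx] += 1
    chosen[idx[keep]] += 1

# Keep only features selected in every run in which they were present.
stable = np.flatnonzero((appeared > 0) & (chosen == appeared))
print(stable)
```

Requiring selection in every appearance is a strict stability criterion: a feature picked by chance in one subset is unlikely to survive all of its other appearances.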

SLIDE 18

THE FINAL SET OF FACE FEATURES

Positive and negative samples from the training set.
Notice how vertical symmetries are not captured by the selected features.

SLIDE 19

THE SOLUTION DEPENDS ON THE TRAINING DATA

In the MIT+CMU training set all images are registered and well cropped.
Vertical symmetries are captured by the selected features.

SLIDE 20

FACE DETECTION

FACE CLASSIFICATION

Elastic net regularization embeds both feature selection and prediction functionalities.
As suggested in (Candes & Tao, 2007), in order to improve classification performance one could use L2 regularization on the reduced data representation.
Since a main requirement of our application is real-time performance, we adopt a linear SVM for classification:

L1 + SVM gives us sparsity both in the representation and in the dataset, and thus fewer computations.
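A minimal linear SVM on a reduced representation might look like this. It is a subgradient-descent sketch on synthetic blobs; the data and hyperparameters are ours, not the paper's.

```python
import numpy as np

# Minimal linear SVM trained by full-batch subgradient descent on the
# regularized hinge loss: a sketch of the classification stage on the
# reduced feature representation. Data (two synthetic blobs) and
# hyperparameters are ours, not the paper's.
def train_linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=200):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        viol = y * (X @ w + b) < 1                  # margin violations
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
faces = rng.normal(2.0, 1.0, size=(50, 2))          # stand-in "face" class
non_faces = rng.normal(-2.0, 1.0, size=(50, 2))     # stand-in "non-face" class
X = np.vstack([faces, non_faces])
y = np.hstack([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
accuracy = float(np.mean(np.sign(X @ w + b) == y))
print(accuracy)
```

At test time the classifier is a single dot product per window, which is what makes the linear choice attractive for real-time detection.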

SLIDE 21

FACE CLASSIFICATION RESULTS

[ROC plot: detection rate (0.7 to 1) vs. false positive rate (0.005 to 0.02) comparing: 2-stage feature selection; 2-stage feature selection + correlation; Viola-Jones feature selection using our same data; Viola-Jones cascade performance.]

Our strategy for feature selection outperforms the one by Viola and Jones on the same dataset.
Adaboost seems to need a large number of examples to be trained effectively (we used just 4000 examples).

SLIDE 22

FROM FACE CLASSIFICATION TO FACE DETECTION

WHY IS IT DIFFICULT?

Any single region of a real image is very unlikely to be a face → high number of false positives.

Image dimensions: 384×222 px → ∼ 6.5 · 10⁵ tests in a multi-scale search with a base window of 19×19 px. Only 11 faces!

SLIDE 23

FACE DETECTION: A CASCADE OF CLASSIFIERS

For each image we have many tests to do → few positive examples and many negative examples.
We build a coarse-to-fine classification architecture:
→ simpler classifiers are used to reject the majority of sub-windows
→ more complex classifiers allow us to achieve low false positive rates

SLIDE 24

FACE DETECTION: A CASCADE OF CLASSIFIERS

1. Start from the set S of selected features.
2. Choose at least 3 mutually distant features.
3. Train a linear SVM classifier using those features and test it on a validation set.
4. Do we reach the target performance (h = 99.5%, f = 50%)?
YES: finalize the classifier, remove the used features from S and go to (2).
NO: add a feature from S and go to (3).

Over the K stages of the cascade, the overall false positive rate and detection rate are the products

F = Π_{i=1}^{K} f_i and H = Π_{i=1}^{K} h_i
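A quick numerical check of the cascade rates, using the per-stage targets h = 99.5% and f = 50% quoted above and a hypothetical K = 10 stages:

```python
# Numerical check of the cascade rates above, with the per-stage targets
# h = 99.5% and f = 50% and a hypothetical K = 10 stages.
K = 10
h, f = 0.995, 0.5
H = h ** K   # overall detection rate of the cascade
F = f ** K   # overall false positive rate of the cascade
print(round(H, 4), F)
```

Even a modest per-stage false positive rate of 50% compounds to roughly one false alarm per thousand windows after ten stages, while the detection rate stays above 95%.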

SLIDE 25

A PIPELINE FOR FACE AUTHENTICATION

DESTRERO ET AL., 2009

SLIDE 26

RESULTS

SLIDE 27

PLAN

Learning and engineering applications: why?
Examples of in-house applications:
Face and object detection
Medical image analysis
Microarray data analysis

SLIDE 28

AUTOMATIC ANNOTATION OF MR IMAGES: SYNOVITIS ASSESSMENT

BASSO ET AL, 2010

Setting: children under 16 affected by Juvenile Idiopathic Arthritis.
Goal: measure the volume of the inflamed synovia in 3D MR images.
Our problem: classify each voxel of the MR image.
The approach is supervised; for training we use the manual annotations performed by experts.

SLIDE 29

VOXEL-BASED IMAGE DESCRIPTION

Each voxel is represented with a set of cues chosen among the ones commonly used for voxel classification.
They include the intensity of the voxel and of its neighbors, the position of the voxel, the multiscale 2-jets, and the vesselness measures:

x → φ(x) = {ϕ1, . . . , ϕk}

SLIDE 30

MULTI-CUE VOXEL CLASSIFIER

THE DISCRIMINANT FUNCTION
We look for a more flexible discriminant function

f(φ) = Σ_{(i,j)∈I} α_i^j K_i^j(φ) + b

ASSUMPTION
The k × n basis functions

K_i^j(φ) = exp( −‖ϕ^j − ϕ_i^j‖² / (2σ²) )

measure the similarity of φ with an example voxel i with respect to a specific cue j.

SLIDE 31

MULTI-CUE VOXEL CLASSIFIER

THE DISCRIMINANT FUNCTION
We look for a more flexible discriminant function

f(φ) = Σ_{(i,j)∈I} α_i^j K_i^j(φ) + b

MODEL SELECTION
The optimal subset I of basis functions, on which f depends, may be inferred from the data by means of feature selection. Starting from a manually annotated training set of n voxels we compute the n × kn matrix

K = (K^1, . . . , K^k)

and look for a sparse vector α so that y = Kα.
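A sketch of this construction: one n × n Gaussian block per cue, stacked into the n × kn matrix K, then a sparse α with y ≈ Kα. The cues, σ, and the soft-thresholding loop are made-up stand-ins for the paper's actual features and selection step.

```python
import numpy as np

# Sketch of the construction above: one n x n Gaussian block per cue,
# stacked into the n x (k*n) matrix K, then a sparse alpha with
# y ≈ K alpha. Cues, sigma, and the thresholding loop are made-up
# stand-ins for the paper's features and selection step.
rng = np.random.default_rng(0)
n, k, sigma = 40, 3, 1.0
cues = [rng.normal(size=(n, 2)) for _ in range(k)]  # one descriptor per cue

def gaussian_block(C, sigma):
    # Entry (l, i): similarity of voxel l to example voxel i w.r.t. this cue.
    sq = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

K = np.hstack([gaussian_block(C, sigma) for C in cues])  # n x (k*n)
y = rng.choice([-1.0, 1.0], size=n)

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(K, 2) ** 2
alpha = np.zeros(K.shape[1])
for _ in range(500):  # soft-thresholded gradient steps -> sparse alpha
    alpha = soft(alpha - K.T @ (K @ alpha - y) / L, 0.05 / L)

print(K.shape, int(np.count_nonzero(alpha)))
```

Each surviving entry of α pairs one example voxel with one cue, which is exactly the index set I in the discriminant function above.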

SLIDE 32

MULTI-CUE VOXEL CLASSIFIER

THE DISCRIMINANT FUNCTION
We look for a more flexible discriminant function

f(φ) = Σ_{(i,j)∈I} α_i^j K_i^j(φ) + b

LEARNING ALGORITHM
The goal of learning is to find the optimal affine combination defined by the coefficients α_i^j and b. This is achieved with L2 regularization on the restricted matrix K̂.
SLIDE 33

RESULTS

The multi-cue classifier is 15 times sparser than the SVM and approximately 40 times faster.

SLIDE 34

PLAN

Learning and engineering applications: why?
Examples of in-house applications:
Face and object detection
Medical image analysis
Microarray data analysis

SLIDE 35

MACHINE LEARNING AND THE ANALYSIS OF MICROARRAYS

GOALS
Design methods able to identify a gene signature, i.e., a panel of genes potentially interesting for further screening.
Learn the gene signatures, i.e., select the most discriminant subset of genes on the available data.

SLIDE 36

MACHINE LEARNING AND THE ANALYSIS OF MICROARRAYS

A TYPICAL "-OMICS" SCENARIO
High dimensional data - few samples per class (tens of samples, tens of thousands of genes)
→ Variable selection
High risk of selection bias: data distortion arising from the way the data are collected, due to the small amount of data available
→ Model assessment needed
Possibly find ways to incorporate prior knowledge
Deal with data visualization

SLIDE 37

GENE SELECTION

THE PROBLEM
Select a small subset of input variables (genes) to be used for building classifiers.
ADVANTAGES:
it is cheaper to measure fewer variables
the resulting classifier is simpler and potentially faster
prediction accuracy may improve by discarding irrelevant variables
identifying the relevant variables gives useful information about the nature of the corresponding classification problem (biomarker detection)

SLIDE 38

VARIABLE SELECTION IN BIOINFORMATICS

MOTIVATIONS
Ease the computational burden: discard the (apparently) less significant features and train in a simplified space, alleviating the curse of dimensionality.
Enhance information: highlight (and rank) the most important features and improve the knowledge of the underlying process.
COMMONLY ADOPTED METHODS
Statistical filters (t-test, S/N ratio, ...)
Learning techniques (embedded methods, wrapper methods, stepwise feature elimination, ...)
Mapping methods ("metagenes": simplified models for pathways, even though biological suggestions require caution)

SLIDE 39

STATISTICAL FILTERS

These approaches are well established in the gene selection literature. One considers the various measurements associated to each gene (column of the data matrix X).

T-TEST
For each column of X we compute

t = (μ₁ − μ₂) / √(σ₁²/n₁ + σ₂²/n₂)

where subscripts 1 and 2 stand for positive and negative examples.
Genes are ranked with respect to the t value.
A threshold is set to perform gene selection.
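The filter can be sketched as follows, on synthetic data in which only gene 0 truly differs between the classes; the |t| > 4 cutoff is ours.

```python
import numpy as np

# Sketch of the t-test filter above on synthetic data: only gene 0 truly
# differs between the two classes; the |t| > 4 cutoff is ours.
rng = np.random.default_rng(0)
n1, n2, p = 30, 30, 20
X_pos = rng.normal(size=(n1, p))
X_pos[:, 0] += 2.0                          # gene 0 up-regulated in class 1
X_neg = rng.normal(size=(n2, p))

mu1, mu2 = X_pos.mean(axis=0), X_neg.mean(axis=0)
v1, v2 = X_pos.var(axis=0, ddof=1), X_neg.var(axis=0, ddof=1)
t = (mu1 - mu2) / np.sqrt(v1 / n1 + v2 / n2)  # one t value per gene (column)

ranking = np.argsort(-np.abs(t))            # genes ranked by |t|
selected = np.flatnonzero(np.abs(t) > 4.0)  # thresholded gene selection
print(ranking[0], selected)
```

Note the filter is univariate: each gene is scored in isolation, which is exactly the limitation the multivariate methods on the following slides address.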

SLIDE 40

GENE SELECTION WITH L1-L2 REGULARIZATION

MOSCI ET AL., 2008

min_{β∈ℝᵖ} ‖Y − Xβ‖² + τ(‖β‖₁ + ε‖β‖₂²)

Consistency guaranteed - the more samples available, the better the estimator.
Multivariate - it takes into account many genes at once.
OUTPUT
A one-parameter (ε) family of nested lists with equivalent prediction ability and increasing correlation among genes:
ε → 0: minimal list of prototype genes
ε₁ < ε₂ < ε₃ < . . .: longer lists including correlated genes
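A minimal sketch of the l1-l2 minimization via soft-thresholded gradient steps on the naive elastic-net objective. The data and the τ and ε values are made up, and genes 0 and 1 are constructed to be strongly correlated so the ε term can pull them into the list together.

```python
import numpy as np

# Sketch of the l1-l2 minimization via soft-thresholded gradient steps.
# Data, tau and eps are made up; genes 0 and 1 are built to be strongly
# correlated so the eps term pulls them into the list together.
def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1l2(X, Y, tau, eps, n_iter=2000):
    L = np.linalg.norm(X, 2) ** 2 + tau * eps
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - Y) + tau * eps * beta
        beta = soft(beta - grad / L, tau / (2 * L))
    return beta

rng = np.random.default_rng(0)
n, p = 100, 15
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)   # genes 0 and 1 correlated
Y = X[:, 0] + 0.05 * rng.normal(size=n)

beta = l1l2(X, Y, tau=10.0, eps=1.0)
print(np.flatnonzero(np.abs(beta) > 0.05))      # the selected gene list
```

The ε‖β‖₂² term gives the grouping effect: near-duplicate columns receive similar weights instead of one arbitrarily excluding the other, which is how the longer, correlation-rich lists arise.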

SLIDE 41

DOUBLE OPTIMIZATION APPROACH

MOSCI ET AL., 2008

VARIABLE SELECTION + CLASSIFICATION:
Variable selection step (L1-L2):

min_{β∈ℝᵖ} ‖Y − Xβ‖² + τ(‖β‖₁ + ε‖β‖₂²)

Classification step (RLS):

min_{β} ‖Y − Xβ‖₂² + λ‖β‖₂²

For each ε we have to choose λ and τ.

SLIDE 42

A SELECTION BIAS AWARE FRAMEWORK

BARLA ET AL, 2008

λ → (λ₁, . . . , λ_A)
τ → (τ₁, . . . , τ_B)
The optimal pair (λ∗, τ∗) is one of the A · B possible pairs (λ, τ).
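The grid search over the A · B pairs can be sketched as follows. The selection and classification steps here are simplified stand-ins (a correlation screen plus ridge regression), and all grids and data are made up.

```python
import numpy as np
from itertools import product

# Sketch of the (lambda, tau) grid assessment above. The selection step
# (a correlation screen with threshold tau) and the classification step
# (ridge regression) are simplified stand-ins; grids and data are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
Y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=60)

lambdas = [0.01, 0.1, 1.0, 10.0]   # A = 4
taus = [0.5, 1.0, 2.0]             # B = 3

def cv_error(lam, tau, n_folds=5):
    idx = np.arange(len(Y))
    errs = []
    for fold in range(n_folds):
        val = idx[fold::n_folds]
        tr = np.setdiff1d(idx, val)
        # Selection step: keep variables whose score passes tau * 0.1.
        keep = np.abs(X[tr].T @ Y[tr]) / len(tr) > tau * 0.1
        if not keep.any():
            return np.inf
        Xt = X[tr][:, keep]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(keep.sum()),
                               Xt.T @ Y[tr])
        errs.append(np.mean((X[val][:, keep] @ beta - Y[val]) ** 2))
    return float(np.mean(errs))

grid = {(lam, tau): cv_error(lam, tau) for lam, tau in product(lambdas, taus)}
best = min(grid, key=grid.get)
print(best, round(grid[best], 4))
```

Crucially, both steps are rerun inside every fold: selecting variables on the full data and only cross-validating the classifier would reintroduce the selection bias the framework is designed to avoid.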

SLIDE 43

ALGORITHMIC AND COMPUTATIONAL ISSUES

FROM MANY LISTS TO ONE FINAL LIST
Criterion based on frequency, i.e., occurrences of a gene across all the lists.
Since we have a correlation parameter, we can tune and vary the list length.

FROM 1 WEEK COMPUTATION TO...?
Computational time for LOO (for one task):

time_one-optim = 2.5 s to 25 s, depending on the correlation parameter
total time = A · B · N_samples · time_one-optim ∼ 20 · 20 · 30 · time_one-optim ∼ 2 · 10⁴ s to 2 · 10⁵ s

6 tasks → 1 week!!

SLIDE 44

COMPUTATION OVER A GRID

Grid middleware: OurGrid, a multiplatform grid that can deal with hosts not directly connected to the Internet.
It is used by the ShareGrid project, which involves several universities in Northern Italy.
Cheap solution: 60 PCs (students' lab).

SLIDE 45

GENE SELECTION WITH L1-L2 REGULARIZATION

DE MOL, MOSCI, TRASKINE, VERRI, 2008

SLIDE 46

FINDING STRUCTURED GENE SIGNATURES

How do we estimate groups of correlated genes?
We may rely on the nested structure obtained by varying the correlation parameter.
We consider the minimal list list₀ as the starting point of an agglomerative clustering technique based on the Pearson distance

d(Xᵢ, Xⱼ) = corr(Xᵢ, Xⱼ) / √(var(Xᵢ) var(Xⱼ))

evaluating the normalized correlation between two columns Xᵢ and Xⱼ of the data matrix X.
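A sketch of the clustering step. The distance 1 − |Pearson correlation| and the single-linkage rule are our own concrete choices for illustration, not necessarily the authors' exact setup.

```python
import numpy as np

# Sketch of the agglomerative step above: group the columns of X using a
# correlation-based distance (here 1 - |Pearson correlation|, our own
# choice) with single linkage. Synthetic data: columns 0 and 1 are
# correlated, column 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
X = np.column_stack([base, base + 0.1 * rng.normal(size=100),
                     rng.normal(size=100)])

corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)

# Single-linkage agglomeration until the smallest inter-cluster
# distance exceeds a cutoff.
clusters = [{i} for i in range(X.shape[1])]
cutoff = 0.5
while len(clusters) > 1:
    pairs = [(a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))]
    linkage = lambda ab: min(dist[i, j] for i in clusters[ab[0]]
                             for j in clusters[ab[1]])
    best = min(pairs, key=linkage)
    if linkage(best) > cutoff:
        break
    merged = clusters[best[0]] | clusters[best[1]]
    clusters = [c for k, c in enumerate(clusters) if k not in best] + [merged]

print(sorted(sorted(c) for c in clusters))
```

The correlated pair of columns merges into one group while the independent column stays on its own, mirroring how prototype genes in the minimal list attract their correlated companions.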

SLIDE 47

AN EXAMPLE APPLICATION

IDENTIFYING THE HYPOXIA SIGNATURE OF NEUROBLASTOMA VIA REGULARIZATION

Joint research with the IGG Molecular Biology lab.
Dataset: 9 neuroblastoma (NB) cell lines cultured under normoxic and hypoxic conditions.
Technology: Affymetrix GeneChip U133 plus 2.0.
t-test: no genes selected!
l1l2 protocol: 11 genes in the minimal list (frequency > 30%)

SLIDE 48

REFERENCES

A. Destrero, C. De Mol, F. Odone, A. Verri. "A Regularized Framework for Feature Selection in Face Detection and Authentication." IJCV (2009).

A. Destrero, C. De Mol, F. Odone, A. Verri. "A sparsity-enforcing method for learning face features." IEEE Transactions on Image Processing 18 (2009): 188-201.

C. Basso, M. Santoro, A. Verri, M. Esposito. "Segmentation of Inflamed Synovia in Multi-Modal MRI." Proc. of IEEE ISBI 2009, June 28 - July 1, 2009.

P. Fardin, A. Cornero, A. Barla, S. Mosci, M. Acquaviva, L. Rosasco, C. Gambini, A. Verri, L. Varesio. "Identification of multiple hypoxia signatures in neuroblastoma cell lines by l1-l2 regularization and data reduction." Journal of Biomedicine and Biotechnology (2010).

A. Barla, S. Mosci, L. Rosasco, A. Verri. "A method for robust variable selection with significance assessment." Proc. of ESANN, European Symposium on Artificial Neural Networks (2008).

C. De Mol, S. Mosci, M. Traskine, A. Verri. "A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data." Journal of Computational Biology (2008).
