Applications to high dimensional problems
Francesca Odone and Lorenzo Rosasco
RegML 2013
Machine Learning systems are trained on examples rather than being programmed.
Application domains
Some success stories:
- pedestrian detection
- speech recognition
- face detection
- OCR
Plan
✤ bioinformatics: gene selection (elastic net)
✤ computer vision: object detection (L1 regularization)
✤ human-robot interaction: action recognition (dictionary learning and multi-class categorization)
✤ video surveillance: pose detection (semi-supervised learning)
Microarray analysis
Goals:
✤ Design methods able to identify a gene signature, i.e., a panel of genes potentially interesting for further screening
✤ Learn the gene signatures, i.e., select the most discriminant subset of genes on the available data
Microarray analysis
A typical “-omics” scenario:
✤ High dimensional data, few samples per class
  - tens of samples, tens of thousands of genes
  → Variable selection
✤ High risk of selection bias
  - data distortion arising from the way the data are collected, due to the small amount of data available
  → Model assessment needed
✤ Find ways to incorporate prior knowledge
✤ Deal with data visualization
Gene selection
THE PROBLEM
✤ Select a small subset of input variables (genes) to be used for building classifiers
ADVANTAGES:
✤ it is cheaper to measure fewer variables
✤ the resulting classifier is simpler and potentially faster
✤ prediction accuracy may improve by discarding irrelevant variables
✤ identifying relevant variables gives useful information about the nature of the corresponding classification problem (biomarker detection)
Elastic net and gene selection
✤ Consistency guaranteed: the more samples available, the better the estimator
✤ Multivariate: it takes many genes into account at once
Output:
✤ a one-parameter family of nested lists with equivalent prediction ability and increasing correlation among genes
✤ a minimal list of prototype genes
✤ longer lists including correlated genes
min_{β ∈ R^p} ||Y − Xβ||^2 + τ( ||β||_1 + ε ||β||_2^2 )

ε → 0,  with ε_1 < ε_2 < ε_3 < ...
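The path of solutions can be sketched with scikit-learn's ElasticNet; note that its (alpha, l1_ratio) parametrization differs from the (τ, ε) pair above, so the correspondence drawn in the comments is only indicative, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 30, 5000                       # tens of samples, thousands of genes
X = rng.standard_normal((n, p))
y = X[:, :10].sum(axis=1)             # outcome driven by 10 "relevant" genes

# Small l1_ratio ~ large epsilon: longer lists including correlated genes.
# l1_ratio close to 1 ~ epsilon -> 0: a minimal list of prototype genes.
for l1_ratio in (0.99, 0.7, 0.3):
    model = ElasticNet(alpha=0.1, l1_ratio=l1_ratio, max_iter=50000).fit(X, y)
    print(f"l1_ratio={l1_ratio}: {np.count_nonzero(model.coef_)} genes selected")
```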
Double optimization approach
✤ Variable selection step (elastic net):

min_{β ∈ R^p} ||Y − Xβ||^2 + τ( ||β||_1 + ε ||β||_2^2 )

✤ Classification step (RLS on the selected variables):

min_{β} ||Y − Xβ||^2 + λ ||β||_2^2

For each ε we have to choose both λ and τ; the two-step combination prevents the elastic net shrinking effect.
Mosci et al, 2008
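A minimal sketch of the two-step scheme, assuming scikit-learn's ElasticNet and Ridge as stand-ins for the elastic net and RLS solvers; all parameter values are placeholders.

```python
# Step 1: elastic net selects the variables; Step 2: ridge regression (RLS)
# is re-fit on the selected subset, which undoes the l1 shrinkage.
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

def select_then_classify(X, y, tau=0.1, eps_ratio=0.9, lam=1.0):
    enet = ElasticNet(alpha=tau, l1_ratio=eps_ratio, max_iter=50000).fit(X, y)
    support = np.flatnonzero(enet.coef_)          # selected variables
    rls = Ridge(alpha=lam).fit(X[:, support], y)  # RLS on the reduced data
    return support, rls
```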
Dealing with selection bias
λ → (λ_1, ..., λ_A),  τ → (τ_1, ..., τ_B)

The optimal pair (λ*, τ*) is one of the A · B possible pairs (λ, τ).
Barla et al, 2008
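A hedged sketch of the assessment loop: exhaustive search over the A × B grid of (λ, τ) pairs with leave-one-out validation. The fit_and_score callback is hypothetical, standing in for one run of the double-optimization procedure.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

def grid_loo(X, y, lambdas, taus, fit_and_score):
    # fit_and_score(X_tr, y_tr, X_te, y_te, lam, tau) -> 0/1 error (hypothetical)
    errors = np.zeros((len(lambdas), len(taus)))
    for a, lam in enumerate(lambdas):
        for b, tau in enumerate(taus):
            for tr, te in LeaveOneOut().split(X):
                errors[a, b] += fit_and_score(X[tr], y[tr], X[te], y[te], lam, tau)
    a, b = np.unravel_index(errors.argmin(), errors.shape)
    return lambdas[a], taus[b]   # the optimal pair among the A*B candidates
```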
Computational issues
- Computational time for leave-one-out validation (for one task):

  time_1-optim = 2.5 s to 25 s, depending on the correlation parameter
  total time = A · B · N_samples · time_1-optim ≈ 20 · 20 · 30 · time_1-optim ≈ 2·10^4 s to 2·10^5 s

- 6 tasks → about 1 week!
Image understanding
✤ Image understanding is still largely unsolved
✤ Today we are starting to answer more specific questions, such as object detection, image categorization, ...
✤ Machine learning has been key to solving this kind of problem: it deals with noise and intra-class variability by collecting appropriate data and finding suitable descriptions
✤ Notice that images are relatively easy to gather (but not to label!)
  - many benchmark datasets
  - labeling tools and services
✤ Image representations are very high dimensional
  - curse of dimensionality
  - computational cost at run time (while we often need real-time performance)
Adaptive representations from fixed dictionaries
✤ Overcomplete, general purpose sets of features are effective for modeling visual information
✤ Many object classes have peculiar intrinsic structures that can be better appreciated if one looks for symmetries or local geometries
✤ Examples of dictionaries: wavelets, ranklets, chirplets, banks of filters, ...
✤ See for instance:
  - face detection [Heisele et al., Viola & Jones, Destrero et al.]
  - pedestrian detection [Oren et al., Dalal and Triggs]
  - car detection [Papageorgiou & Poggio]
Object detection in images
✤ Object detection is a binary classification problem: image regions of variable size are classified as an instance of the object or not
✤ Unbalanced classes: in this 380×220 px image we perform ≈ 6.5·10^5 tests, and we should find only 11 positives
✤ The training set contains
  - images of positive examples (instances of the object)
  - negative examples (background)
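The class imbalance comes from the exhaustive scanning protocol; a minimal sketch (single scale, fixed window size) is:

```python
import numpy as np

def sliding_windows(img, win=(19, 19), stride=1):
    # yield every window position to be classified as object / non-object
    H, W = img.shape
    for r in range(0, H - win[0] + 1, stride):
        for c in range(0, W - win[1] + 1, stride):
            yield r, c, img[r:r + win[0], c:c + win[1]]

# Scanning several window sizes and positions over a 380x220 px image adds
# up to the ~6.5*10^5 tests quoted above, against only a handful of positives.
```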
Object detection in images

x_i → (φ_1(x_i), ..., φ_p(x_i))

Image processing and computer vision offer a variety of local and global features for different purposes.
feature selection on fixed dictionaries
✤ We start off from an overcomplete dictionary of features D = {φ_γ : X → R, γ ∈ Γ}
✤ Usually p is big; the existence of a solution is ensured, uniqueness is not
✤ Overcomplete dictionaries contain many correlated features
✤ Thus, the problem is ill-posed. We assume Φβ = Y, where Φ = {Φ_ij} is the data matrix, β = (β_1, ..., β_p)^T is the vector of unknown weights to be estimated, and Y = (y_1, ..., y_n)^T are the output labels

Selection of meaningful feature subsets:
✤ L1 regularization allows us to select a sparse subset of meaningful features for the problem, with the aim of discarding correlated ones:

min_{β ∈ R^p} ||Y − Φβ||^2 + τ ||β||_1
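A minimal sketch of this selection step with scikit-learn's Lasso (which solves a rescaled version of the problem above; τ is a placeholder value):

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_feature_selection(Phi, Y, tau=0.05):
    # Phi: n x p matrix of dictionary responses phi_gamma(x_i)
    lasso = Lasso(alpha=tau, max_iter=50000).fit(Phi, Y)
    return np.flatnonzero(lasso.coef_)   # indices of the selected features
```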
an example of fixed size dictionary
✤ Rectangle features, aka Haar-like features (Viola & Jones), are one of the most effective representations of images for face detection
✤ Size of the initial dictionary: a 19×19 px image is mapped into a 64,000-dimensional feature vector!
✤ [Figure: the selected features]
Destrero et al, 2009
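For reference, rectangle features are cheap to evaluate through an integral image; a sketch (illustrative coordinates, two-rectangle feature only):

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[:r+1, :c+1]
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # sum of img[r0:r1, c0:c1] in O(1) from the integral image
    s = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        s -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        s -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

def two_rect_feature(ii, r, c, h, w):
    # left rectangle minus right rectangle: one Haar-like feature
    return rect_sum(ii, r, c, r + h, c + w) - rect_sum(ii, r, c + w, r + h, c + 2 * w)
```

Enumerating all positions, sizes, and rectangle configurations inside a 19×19 window is what yields the tens of thousands of features mentioned above.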
the role of prior knowledge
✤ Many image features have a characteristic internal structure: an image patch is divided into regions or cells, each represented according to the specific descriptor, and all the representations are then concatenated
✤ Many features used in computer vision share this common structure (SIFT, HOG, LBP, ...)
✤ In such cases it is beneficial to select groups of features belonging to the same region (the so-called group lasso)
Selecting feature groups
Pedestrian detection with HOG features: binary classification
B* = arg min_B  ||Y − ΦB||_2^2 + τ Σ_{g=1}^G ||B_{I_g}||_F
Face recognition with LBP features: multi-class categorization
β* = arg min_β  ||y − Φβ||_2^2 + τ Σ_{g=1}^G ||β_{I_g}||_2

[Figure: miss rate vs. false positives per window, comparing fixed-size HOG (105 blocks) [DalTri05] with group lasso over 50, 104, 210, and 387 blocks]
Zini & Odone, 2010; Fusco et al., 2013
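Group lasso is not in scikit-learn; below is a minimal proximal-gradient (ISTA) sketch for the vector-valued problem above, assuming groups is a list of index arrays I_g:

```python
import numpy as np

def group_lasso(Phi, y, groups, tau=0.1, iters=500):
    # minimizes 0.5 * ||y - Phi beta||^2 + tau * sum_g ||beta_{I_g}||_2
    # (the same problem as above, up to a rescaling of tau)
    n, p = Phi.shape
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2           # 1/L for the smooth part
    beta = np.zeros(p)
    for _ in range(iters):
        z = beta - step * (Phi.T @ (Phi @ beta - y))   # gradient step
        for Ig in groups:                              # group soft-thresholding
            norm = np.linalg.norm(z[Ig])
            z[Ig] = 0.0 if norm == 0 else max(0.0, 1 - step * tau / norm) * z[Ig]
        beta = z
    return beta
```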
adaptive dictionaries
[Figure: keypoints and their clusters]
adaptive dictionaries
✤ sparse codes:

min_{D,u} ||x − Du||_2^2 + λ ||u||_1   subject to ||d_i||_2 ≤ 1

✤ fixed vs adaptive dictionaries
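A hedged sketch of dictionary learning with scikit-learn, which alternates over D and u for essentially this objective (atoms are kept at unit norm); sizes and parameters are placeholders:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

X = np.random.default_rng(0).standard_normal((500, 64))   # e.g. 8x8 patches
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0).fit(X)
D = dico.components_     # learned atoms d_i
U = dico.transform(X)    # sparse codes u, one row per patch
```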
HRI: iCub recognizing actions
Gori, Fanello et al., 2012
Semi-supervised pose classification
✤ The capability of classifying people with respect to their orientation in space is important for a number of tasks
  - An example is the analysis of collective activities, where the reciprocal orientation of people within a group is an important feature
  - The typical approach relies on quantizing the possible orientations into 8 main angles
  - Appearance changes very smoothly across orientations, and labeling may be subjective
Noceti & Odone, in preparation
[Figure: the 8 orientation classes: Back, Back Left, Left, Front Left, Front, Front Right, Right, Back Right]
Semi-supervised pose classification
✤ Instead of using a fully labeled dataset, we adopt manifold regularization with both labeled and unlabeled data:
✤ A gallery of highly representative examples is extracted from the training set for each class (e.g. with k-means, LLE, ...); these are the labeled examples
✤ The remaining data are used as unlabeled examples, leaving the algorithm in charge of associating them with the most appropriate class

f* = arg min_{f ∈ H}  (1/n) Σ_{i=1}^n V(f(x_i), y_i) + λ_A ||f||_K^2 + (λ_I / u^2) f^T L f
Method                | Acc. 1 | Acc. 2
MultiClass-SVM        | 48%    | 75%
LapSVM                | 72%    | 80%
feature sel. + LapSVM | 79%    | 86%

An example is counted as correctly classified if it is associated with its class or with one of the two adjacent ones.
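For the square loss, manifold regularization (Laplacian RLS) has a closed form; a sketch, with the kernel and graph construction left as placeholder choices:

```python
import numpy as np

def lap_rls(K, L, y_labeled, n_labeled, lam_A=1e-2, lam_I=1e-2):
    # K: (l+u) x (l+u) kernel matrix, labeled points first
    # L: graph Laplacian built over the same labeled + unlabeled points
    m = K.shape[0]
    J = np.zeros((m, m)); J[:n_labeled, :n_labeled] = np.eye(n_labeled)
    Y = np.zeros(m); Y[:n_labeled] = y_labeled
    A = (J @ K + lam_A * n_labeled * np.eye(m)
         + (lam_I * n_labeled / m ** 2) * (L @ K))
    alpha = np.linalg.solve(A, Y)
    return alpha             # f(x) = sum_i alpha_i K(x_i, x)
```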
Appendices
✤ learning how to grasp objects
✤ learning common behaviors (dealing with variable-length inputs)
learning the appropriate type of grasp
- estimate the most likely grasps
- estimate the hand posture vector
learning common patterns in temporal sequences
Temporal sequences: {x_i}_{i=1}^N with x_i = (x_i^1, x_i^2, ..., x_i^{k_i})^T

Adaptive space quantization maps each sequence onto a string over a finite alphabet A; a P-spectrum kernel is then used to compare sequences:

φ_u^P(s) = |{(v_1, v_2) : s = v_1 u v_2}|

where u ∈ A^P, while v_1, v_2 are substrings such that v_1 ∈ A^{P_1}, v_2 ∈ A^{P_2}, and P_1 + P_2 + P = |s|. The associated kernel between two strings s_1 and s_2 is defined as:

K_P(s_1, s_2) = ⟨φ^P(s_1), φ^P(s_2)⟩ = Σ_{u ∈ A^P} φ_u^P(s_1) φ_u^P(s_2)

String-length independence is achieved with an appropriate normalization:

K̂_P(s_1, s_2) = K_P(s_1, s_2) / ( √K_P(s_1, s_1) · √K_P(s_2, s_2) )
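A direct implementation of the P-spectrum kernel and its normalization (assuming the quantization step has already turned each sequence into a string):

```python
from collections import Counter
from math import sqrt

def spectrum_features(s, P):
    # phi_u^P(s): occurrence count of every length-P substring u of s
    return Counter(s[i:i + P] for i in range(len(s) - P + 1))

def spectrum_kernel(s1, s2, P):
    f1, f2 = spectrum_features(s1, P), spectrum_features(s2, P)
    return sum(c * f2[u] for u, c in f1.items())

def normalized_spectrum_kernel(s1, s2, P):
    return spectrum_kernel(s1, s2, P) / (
        sqrt(spectrum_kernel(s1, s1, P)) * sqrt(spectrum_kernel(s2, s2, P)))
```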