PATTERN RECOGNITION FROM ONE EXAMPLE BY CHOPPING
François Fleuret, EPFL LCN / CVLAB
Gilles Blanchard, Fraunhofer FIRST
RECOGNITION FROM ONE EXAMPLE
Given a single training example, find the same object in the test images:
If we average over a large number of test trials, an equivalent formulation is: given two images I1 and I2, do they show the same object?
➊ Learning invariance with a large number of objects
➋ Recognizing from one example

No object is common to ➊ and ➋.
Remark
➊ Non-generative approach, no explicit model of the space of deformations
➋ Proof of concept
DATABASES
➊ The COIL-100 database (100 objects, 72 images of each)
➋ Our LaTeX symbol database (150 symbols, 1,000 images of each)
BOOLEAN FEATURES
We denote by I the image space and by f1, . . . , fK a set of binary features fk : I → {0, 1}. Each one is a disjunction of simple edge-detectors of orientation d over a rectangular area (x0, y0, x1, y1).
[Figure: the eight edge-detector orientations d = 0, . . . , 7 at a location (x, y)]
No invariance to 3D transformation, moderate invariance to scaling, rotation and translation.
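As a sketch, one such feature can be evaluated on precomputed binary edge maps. The `edge_maps` layout (one boolean H×W map per orientation) and the function name are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def boolean_feature(edge_maps, d, x0, y0, x1, y1):
    """One binary feature: the disjunction (logical OR) of the responses
    of the edge-detector of orientation d over the rectangle
    (x0, y0, x1, y1).  edge_maps is assumed to be an (8, H, W) boolean
    array, one map per orientation -- a hypothetical representation."""
    return bool(edge_maps[d, y0:y1, x0:x1].any())

# Toy image with a single edge of orientation 3 at pixel (5, 5):
edge_maps = np.zeros((8, 32, 32), dtype=bool)
edge_maps[3, 5, 5] = True
print(boolean_feature(edge_maps, 3, 0, 0, 10, 10))    # → True: the rectangle contains the edge
print(boolean_feature(edge_maps, 3, 20, 20, 30, 30))  # → False: the rectangle does not
```

The OR over a rectangle is what buys the moderate translation invariance mentioned above: the feature fires wherever the edge falls inside the area.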
SPLITS
We denote by X an image (a random variable on I) and by C its class (a random variable on {1, . . . , M}). We call a split a mapping ψ : I → {0, 1} which separates the set of objects into two equilibrated halves:
- P(ψ(X) = 0) = 1/2
- P(ψ(X) = 0 | C) is 0 or 1
Let C1 and C2 denote the classes of two images X1 and X2, with an equilibrated prior P(C1 = C2) = 1/2. Then:
- P(C1 = C2 | ψ(X1) = ψ(X2)) ≃ 2/3
- P(C1 = C2 | ψ(X1) ≠ ψ(X2)) = 0
With several independent splits, we could do a very good job.
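The effect of several independent splits on the posterior can be checked with a few lines of arithmetic, assuming perfect balanced splits and the equilibrated prior above (`posterior_same` is an illustrative name):

```python
# Prior: P(C1 = C2) = 1/2.  With a perfect balanced split, agreement is
# certain when the classes match, and happens by chance with probability
# 1/2 when they differ.  Posterior after n independent agreeing splits:
def posterior_same(n):
    p_same = 0.5
    like_same = 1.0        # P(all n splits agree | C1 = C2)
    like_diff = 0.5 ** n   # P(all n splits agree | C1 != C2)
    return p_same * like_same / (p_same * like_same + (1 - p_same) * like_diff)

print(posterior_same(1))   # 2/3 with a single agreeing split
print(posterior_same(10))  # ~0.999 with ten independent agreeing splits
```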
CHOPPING PRINCIPLE
We can easily build independent splits on the training objects and we can extend them to the whole set I with machine learning methods.
CHOPPING
We consider arbitrary splits S1, . . . , SN of the training object set, and extend them to I by training predictors L1, . . . , LN:

∀n, Ln : I → R

Each learner is feature selection followed by a linear perceptron without threshold.
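A minimal sketch of such a split predictor, using a plain perceptron update through the origin (no threshold, i.e. no bias term). The synthetic separable data and all names are illustrative, and the feature-selection stage is omitted:

```python
import numpy as np

def train_split_predictor(X, s, epochs=100):
    """Extend a split s in {0,1} of the training objects to a linear
    predictor L(x) = <w, x> (a perceptron without threshold)."""
    y = 2 * s - 1                  # map split labels {0,1} -> {-1,+1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (w @ x) <= 0:   # wrong side (or on the boundary)
                w += t * x         # standard perceptron update
    return w

# Synthetic, linearly separable binary features (a stand-in for the real ones):
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20)).astype(float)
w_true = rng.normal(size=20)
margin = X @ w_true
keep = np.abs(margin) > 2.0        # keep a comfortable margin so training converges
X, s = X[keep], (margin[keep] > 0).astype(int)

w = train_split_predictor(X, s)
acc = np.mean(((X @ w) > 0) == (s == 1))
print(acc)
```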
[Figure: an arbitrary split S1 of the training objects, and the learnt boundary L1 = 0 extending it to the image space]
[Figure: a second arbitrary split S2, and the learnt boundary L2 = 0]
COMBINING SPLITS
To predict if two images show the same object, we estimate how many splits keep them together. The algorithm relies on the split predictors and takes into account their estimated reliability.
SPLIT PREDICTOR RELIABILITY
Since we have many images of the training objects, we can use a validation set to estimate P(Ln | Sn).
[Figure: validation-set histograms of the predictor response Ln for the negative class (Sn = 0) and the positive class (Sn = 1), for two different splits; responses range roughly from −4000 to 4000]
It makes sense to model P(Ln | Sn = s) as a Gaussian.
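A sketch of this Gaussian model on synthetic validation responses, with means and spread loosely matching the histograms; from the two fitted densities one can compute α = P(S = 1 | L = l) by Bayes' rule, assuming the two sides of the split are a priori equiprobable. All names and numbers are illustrative:

```python
import numpy as np

def fit_gaussian(r):
    """Fit a 1-D Gaussian to the validation responses of one split side."""
    return np.mean(r), np.std(r)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def alpha(l, params0, params1):
    """alpha = P(S = 1 | L = l), by Bayes' rule with equiprobable sides."""
    p0, p1 = gaussian_pdf(l, *params0), gaussian_pdf(l, *params1)
    return p1 / (p0 + p1)

# Synthetic validation responses for the two sides of one split:
rng = np.random.default_rng(1)
params0 = fit_gaussian(rng.normal(-1500.0, 800.0, size=1000))  # S = 0
params1 = fit_gaussian(rng.normal(+1500.0, 800.0, size=1000))  # S = 1
print(alpha(-3000.0, params0, params1))  # close to 0
print(alpha(+3000.0, params0, params1))  # close to 1
```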
PREDICTION WITH ONE SPLIT
[Figure: the class-conditional densities P(Ln | Sn = 0) and P(Ln | Sn = 1), and the resulting posterior P(C1 = C2 | L1n, L2n) as a function of the two responses]
The rule is similar with several splits, under reasonable assumptions of conditional independence:

[Figure: graphical model with the classes C1 and C2 at the top; below them the split values S1_1, . . . , S1_N of image 1 and S2_1, . . . , S2_N of image 2; at the bottom the corresponding predictor responses L1_1, . . . , L1_N and L2_1, . . . , L2_N]
FINAL RULE
We have
\[
\log \frac{P(C_1 = C_2 \mid L^1, L^2)}{P(C_1 \neq C_2 \mid L^1, L^2)}
= \log \frac{P(L^1, L^2 \mid C_1 = C_2)}{P(L^1, L^2 \mid C_1 \neq C_2)}
+ \log \frac{P(C_1 = C_2)}{P(C_1 \neq C_2)}.
\]
If we denote by \(\alpha^j_i = P(S^j_i = 1 \mid L^j_i)\), we end up with the following expression:
\[
\log \frac{P(C_1 = C_2 \mid L^1, L^2)}{P(C_1 \neq C_2 \mid L^1, L^2)}
= \sum_i \log\left( \alpha^1_i \alpha^2_i + (1 - \alpha^1_i)(1 - \alpha^2_i) \right) + \rho.
\]
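The final rule is a few lines of code once the α's are available; this sketch assumes they are given, and `rho` stands for the constant prior log-odds term:

```python
import numpy as np

def chopping_score(alpha1, alpha2, rho=0.0):
    """Log-odds that two images show the same object.  alpha1[i] and
    alpha2[i] are P(S_i = 1 | L_i) for image 1 and image 2."""
    a1, a2 = np.asarray(alpha1), np.asarray(alpha2)
    return float(np.sum(np.log(a1 * a2 + (1.0 - a1) * (1.0 - a2))) + rho)

# Confident agreement on both splits scores higher than disagreement:
print(chopping_score([0.9, 0.1], [0.9, 0.1]))
print(chopping_score([0.9, 0.1], [0.1, 0.9]))
# An uninformative split (alpha = 0.5) contributes the same constant
# log(1/2) whatever the other image's response:
print(chopping_score([0.5], [0.9]))
```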
REMARKS
➊ Correctly learnt splits are balanced, thus optimally informative
➋ "Unlearnable" splits are naturally ignored in the Bayesian formulation, since P(S = 1 | L = l) does not depend on l
SMART CHOPPING
An arbitrary split can assign different labels to very similar objects. We can improve performance by discarding the objects that are difficult to learn and re-building the predictor.
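A sketch of this filtering step, assuming per-object validation error rates of the current split predictor are available; the interface and object names are hypothetical:

```python
def smart_split(objects, error_rate, keep_fraction=0.8):
    """Drop the objects whose split label the current predictor learns
    worst, keeping the easiest keep_fraction; a new predictor would then
    be trained on the survivors.  error_rate maps each object to the
    validation error of the current predictor on its images
    (a hypothetical interface)."""
    ranked = sorted(objects, key=lambda o: error_rate[o])
    return ranked[:int(len(ranked) * keep_fraction)]

errors = {"obj_a": 0.02, "obj_b": 0.45, "obj_c": 0.05, "obj_d": 0.01, "obj_e": 0.30}
print(smart_split(list(errors), errors))  # the hardest object, obj_b, is dropped
```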
RESULTS
We compare:
➊ Chopping with one example and several numbers of splits
➋ Smart chopping with one example and several numbers of splits
➌ Classical learning with several numbers of positive examples
➍ Direct learning of the similarity with a perceptron
[Figure: test errors on the LaTeX symbols, as a function of the number of splits for chopping and smart chopping (1 to 1024) and of the number of samples for multi-example learning (1 to 32); the similarity learnt directly is shown as a baseline]
[Figure: test errors on COIL-100, as a function of the number of splits for chopping and smart chopping (1 to 1024) and of the number of samples for multi-example learning (1 to 32); the similarity learnt directly is shown as a baseline]
WHY DOES IT WORK ?
We are inferring functionals which are somewhat arbitrary on the training examples. However, we can expect the training objects to provide an exhaustive dictionary of invariant parts, even though they do not provide an exhaustive dictionary of the combinations of parts. Note that, since splits are built independently, we avoid over-fitting when their number increases.
RELATION WITH ANNS
The Chopping structure can be seen as a one-hidden-layer ANN with shared weights and an ad hoc output layer. If we define ∆(α, β) = log(αβ + (1 − α)(1 − β)), we have:
[Figure: the resulting network — image 1 and image 2 features feed Σ units with shared weights (the split predictors), each pair of responses is combined by a ∆ unit, and a final Σ sums the terms]