SLIDE 1

PATTERN RECOGNITION FROM ONE EXAMPLE BY CHOPPING

François Fleuret

EPFL – LCN / CVLAB

Gilles Blanchard

Fraunhofer – FIRST

SLIDE 2

RECOGNITION FROM ONE EXAMPLE

Given a single training example, find the same object in the test images:

If we average the test over a large number of trials, an equivalent formulation is: given two images I1 and I2, do they show the same object?

SLIDE 3

➊ Learning invariance with a large number of objects
➋ Recognizing from one example

No object is common to ➊ and ➋

SLIDE 4

Remark

➊ Non-generative approach, no explicit model of the space of deformations
➋ Proof of concept

SLIDE 5

DATABASES

➊ The COIL-100 database (100 objects, 72 images of each)
➋ Our LaTeX symbol database (150 symbols, 1,000 images of each)

SLIDE 6

BOOLEAN FEATURES

We denote by I the image space and by f1, . . . , fK a set of binary features fk : I → {0, 1}. Each one is a disjunction of simple edge-detectors of orientation d over a rectangular area (x0, y0, x1, y1).

[Figure: the eight edge-detector orientations d = 0, . . . , 7 around a location (x, y)]

No invariance to 3D transformation, moderate invariance to scaling, rotation and translation.
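
As a rough illustration (not the authors' code), here is a minimal Python sketch of such a feature. The helper `edge_map` is hypothetical; it is assumed to return a binary map marking the pixels where an edge of orientation d is detected:

```python
import numpy as np

def boolean_feature(image, d, x0, y0, x1, y1, edge_map):
    """Disjunction of edge detections of orientation d over a rectangle.

    `edge_map(image, d)` is an assumed helper returning a binary array
    (row-major, i.e. indexed [y, x]) of detections of orientation d.
    """
    edges = edge_map(image, d)
    # The feature fires iff at least one detection lies in the rectangle.
    return int(edges[y0:y1, x0:x1].any())
```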

SLIDE 7

SPLITS

We denote by X an image (a random variable on I) and by C its class (a random variable on {1, . . . , M}). We call a split a mapping ψ : I → {0, 1} which divides the set of objects into two equilibrated halves:

  • P(ψ(X) = 0) = 1/2
  • P(ψ(X) = 0 | C) is 0 or 1

SLIDE 8

Let C1 and C2 denote the classes of two images X1 and X2, with an equilibrated prior P(C1 = C2) = 1/2. For a single split:

  • P(C1 = C2 | ψ(X1) = ψ(X2)) ≃ 1/2
  • P(C1 = C2 | ψ(X1) ≠ ψ(X2)) = 0

With several independent splits, we could do a very good job.
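
A back-of-the-envelope computation (an illustration, not from the slides) shows why: for ideal splits, two images of the same object always agree, while two images of different objects agree on each independent split with probability about 1/2. A single disagreement thus rules out C1 = C2, and agreement on all N splits quickly drives the posterior to 1:

```python
# Posterior P(C1 = C2 | all N ideal, independent splits agree),
# with the equilibrated prior P(C1 = C2) = 1/2.
for N in (1, 5, 10, 20):
    p_agree_same = 1.0       # same class: every split agrees
    p_agree_diff = 0.5 ** N  # different classes: each split agrees w.p. 1/2
    print(N, p_agree_same / (p_agree_same + p_agree_diff))
    # -> 0.667, 0.970, 0.999, 0.999999...
```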

SLIDE 9

CHOPPING PRINCIPLE

We can easily build independent splits of the training objects, and we can extend them to the whole set I with machine-learning methods.

SLIDE 10

CHOPPING

We consider arbitrary splits S1, . . . , SN of the training object set, and extend them to I by training predictors L1, . . . , LN:

∀n, Ln : I → R

These learners combine feature selection with a linear perceptron without threshold.
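
A minimal sketch of this stage, assuming the training images are given as rows of a binary feature matrix F with an object label per image; the splits are random equilibrated partitions of the objects, the feature-selection stage is omitted, and all names are illustrative:

```python
import numpy as np

def make_splits(n_objects, n_splits, rng):
    """Random equilibrated splits: each row assigns half of the objects to 1."""
    splits = np.zeros((n_splits, n_objects), dtype=int)
    for s in splits:
        s[rng.permutation(n_objects)[: n_objects // 2]] = 1
    return splits

def train_perceptron(F, y, n_epochs=20):
    """Linear perceptron without threshold on the binary features F."""
    w = np.zeros(F.shape[1])
    t = 2 * np.asarray(y) - 1          # map split values {0, 1} to {-1, +1}
    for _ in range(n_epochs):
        for x, target in zip(F, t):
            if target * (w @ x) <= 0:  # misclassified or on the boundary
                w += target * x
    return w                           # the response L(x) = w @ x is in R

# Usage sketch (F: (n_images, K) feature matrix, c: object label per image):
# splits = make_splits(n_objects=100, n_splits=50, rng=np.random.default_rng(0))
# predictors = [train_perceptron(F, splits[n][c]) for n in range(50)]
```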

SLIDE 11

[Figure: a split S1 of the training objects and the learned decision boundary L1 = 0]

SLIDE 12

[Figure: a split S2 of the training objects and the learned decision boundary L2 = 0]

SLIDE 13

COMBINING SPLITS

To predict whether two images show the same object, we estimate how many splits keep them together. The algorithm relies on the split predictors and takes their estimated reliability into account.

SLIDE 14

SPLIT PREDICTOR RELIABILITY

Since we have many images of the training objects, we can use a validation set to estimate P(Ln | Sn).

[Figure: histograms of the responses of two split predictors on a validation set, shown separately for the negative class (Sn = 0) and the positive class (Sn = 1)]

It makes sense to model P(Ln | Sn = s) as a Gaussian.
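
A sketch of that modelling step, assuming resp_s0 and resp_s1 hold the validation responses of one predictor on the two sides of its split; the quantity α(l) = P(S = 1 | L = l), which the final rule below relies on, then follows from Bayes' rule with the two fitted Gaussians and an equilibrated prior:

```python
import numpy as np
from scipy.stats import norm

def fit_reliability(resp_s0, resp_s1):
    """Model P(L | S = 0) and P(L | S = 1) as Gaussians fitted on validation data."""
    g0 = norm(np.mean(resp_s0), np.std(resp_s0))
    g1 = norm(np.mean(resp_s1), np.std(resp_s1))
    def alpha(l):
        # alpha(l) = P(S = 1 | L = l), with equilibrated prior P(S = 1) = 1/2.
        p0, p1 = g0.pdf(l), g1.pdf(l)
        return p1 / (p0 + p1)
    return alpha
```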

SLIDE 15

PREDICTION WITH ONE SPLIT

[Figure: the fitted densities P(Ln | Sn = 0) and P(Ln | Sn = 1), and the resulting posterior P(C1 = C2 | L_n^1, L_n^2) as a function of the two responses]

SLIDE 16

The rule is similar with several splits, under reasonable assumptions of conditional independence:

[Figure: graphical model — the classes C1 and C2 generate the split values S_1^1, . . . , S_N^1 and S_1^2, . . . , S_N^2, which in turn generate the predictor responses L_1^1, . . . , L_N^1 and L_1^2, . . . , L_N^2]

SLIDE 17

FINAL RULE

We have

log [ P(C1 = C2 | L1, L2) / P(C1 ≠ C2 | L1, L2) ]
    = log [ P(L1, L2 | C1 = C2) / P(L1, L2 | C1 ≠ C2) ] + log [ P(C1 = C2) / P(C1 ≠ C2) ]

If we denote by α_i^j = P(S_i^j = 1 | L_i^j), we end up with the following expression:

log [ P(C1 = C2 | L1, L2) / P(C1 ≠ C2 | L1, L2) ] = Σ_i log ( α_i^1 α_i^2 + (1 − α_i^1)(1 − α_i^2) ) + ρ
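
A direct transcription of this rule as a Python sketch; alpha1 and alpha2 hold the per-split values α_i^1 and α_i^2 obtained from the reliability model, and rho stands for the remaining constant term:

```python
import numpy as np

def same_object_log_odds(alpha1, alpha2, rho=0.0):
    """Log-odds that two images show the same object.

    alpha1[i], alpha2[i]: P(S_i = 1 | L_i) for images 1 and 2 under split i.
    """
    a1, a2 = np.asarray(alpha1), np.asarray(alpha2)
    # Each term is the log-probability that split i keeps the two images together.
    return np.log(a1 * a2 + (1 - a1) * (1 - a2)).sum() + rho

# Decide "same object" when the log-odds is positive.
```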

SLIDE 18

REMARKS

➊ Splits correctly learnt are balanced, thus optimally informative
➋ Splits which are “unlearnable” are naturally ignored in the Bayesian formulation, since P(S = 1 | L = l) does not depend on l

SLIDE 19

SMART CHOPPING

An arbitrary split can assign different labels to very similar objects. We can improve performance by discarding the objects that are difficult to learn and re-building the predictor.
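
A sketch of this refinement, under the same assumptions as the chopping sketch above (train is a callable such as train_perceptron): the objects whose images the split predictor misclassifies most often are discarded before retraining:

```python
import numpy as np

def smart_split_predictor(F, c, split, train, drop_fraction=0.2):
    """Drop the hardest objects for this split, then retrain the predictor.

    F: (n_images, K) feature matrix, c: object label per image,
    split: 0/1 split value per object, train: e.g. train_perceptron.
    """
    w = train(F, split[c])
    errors = (F @ w > 0).astype(int) != split[c]
    # Per-object error rate: objects the predictor cannot learn reliably.
    per_object = np.array([errors[c == o].mean() for o in range(len(split))])
    keep = per_object <= np.quantile(per_object, 1 - drop_fraction)
    mask = keep[c]  # keep only the images of the retained objects
    return train(F[mask], split[c][mask])
```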

SLIDE 20

RESULTS

We compare:

➊ Chopping with one example and several numbers of splits
➋ Smart chopping with one example and several numbers of splits
➌ Classical learning with several numbers of positive examples
➍ Direct learning of the similarity with a perceptron

SLIDE 21

[Figure: test errors on the LaTeX symbol database, as a function of the number of splits for chopping (1 to 1024) and of the number of samples for multi-example learning (1 to 32); curves for chopping, smart chopping, multi-example learning, and similarity learnt directly]

SLIDE 22

[Figure: test errors on the COIL-100 database, as a function of the number of splits for chopping (1 to 1024) and of the number of samples for multi-example learning (1 to 32); curves for chopping, smart chopping, multi-example learning, and similarity learnt directly]

SLIDE 23

WHY DOES IT WORK?

We are inferring functionals which are somewhat arbitrary on the training examples. However, we can expect the training objects to provide an exhaustive dictionary of invariant parts, even though they do not provide an exhaustive dictionary of combinations of parts. Note that since the splits are built independently, we avoid over-fitting when their number increases.

SLIDE 24

RELATION WITH ANNS

The Chopping structure can be seen as a one-hidden-layer ANN with shared weights and an ad hoc output layer. If we define ∆(α, β) = log(αβ + (1 − α)(1 − β)), we have

[Figure: network view — the features of image 1 and image 2 feed Σ units with shared weights (the split predictors); each pair of responses is combined by a ∆ unit, and a final Σ sums the ∆ outputs]

Can we globally learn the shared weights?

SLIDE 25

François Fleuret
EPFL – LCN / CVLAB
francois.fleuret@epfl.ch
http://cvlab.epfl.ch/~fleuret

Pattern Recognition from One Example by Chopping
François Fleuret and Gilles Blanchard
NIPS 2005
