
SLIDE 1

TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation

J. Shotton*, J. Winn†, C. Rother†, and A. Criminisi†

* University of Cambridge

† Microsoft Research Ltd, Cambridge, UK

SLIDE 2

Introduction

Simultaneous recognition and segmentation

Explain every pixel (dense features)

Appearance + shape + context

Exploit class generalities + image specifics

Contributions

New low-level features

New texture-based discriminative model

Efficiency and scalability

Example Results

SLIDE 3

Structure of Presentation

The MSRC 21-Class Object Recognition Database

New ‘Shape Filter’ Features

Randomised boosting with Shared Features

Adapting to the Pascal VOC Challenge

SLIDE 4

Image Databases

MSRC 21-Class Object Recognition Database

591 hand-labelled images (45% train, 10% validation, 45% test)

Corel (7-class) and Sowerby (7-class) [He et al. CVPR 04]

SLIDE 5

Sparse vs Dense Features

Successes using sparse features, e.g. [Sivic et al. ICCV 2005], [Fergus et al. ICCV 2005], [Leibe et al. CVPR 2005]

But…

do not explain whole image

cannot cope well with all object classes

We use dense features

‘shape filters’

local texture-based image descriptions

Cope with textured and untextured objects, occlusions, whilst retaining high efficiency

[Figure: problem images for sparse features?]

SLIDE 6

Textons

Shape filters use texton maps [Varma & Zisserman IJCV 05] [Leung & Malik IJCV 01]

Compact and efficient characterisation of local texture

[Figure: Input image, Filter Bank, Clustering, Texton map (Colours = Texton Indices)]
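A minimal sketch of the pipeline named in the figure labels (filter bank, clustering, texton map). It is an illustration under assumptions, not the authors' code: the three-channel `filter_bank` (intensity plus finite differences) is a stand-in for a real bank of Gaussian and derivative filters, the clustering is a plain k-means, and the names `filter_bank`, `kmeans` and `texton_map` are hypothetical.

```python
import numpy as np

def filter_bank(gray):
    """Tiny illustrative filter bank: intensity plus row/column differences."""
    gray = gray.astype(float)
    d_rows, d_cols = np.gradient(gray)
    return np.stack([gray, d_cols, d_rows], axis=-1)  # shape (H, W, 3)

def assign(points, centres):
    """Index of the nearest centre for every point."""
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means on an (N, D) array; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(points, centres)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres, assign(points, centres)

def texton_map(gray, k=4):
    """Cluster per-pixel filter responses; each pixel gets a texton index."""
    responses = filter_bank(gray)
    h, w, d = responses.shape
    _, labels = kmeans(responses.reshape(-1, d), k)
    return labels.reshape(h, w)
```

Each pixel ends up with a single integer in `0..k-1`, which is what makes the later per-texton counting cheap.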

SLIDE 7

Shape Filters

Pair: (rectangle r, texton t)

Feature responses v(i, r, t)

Integral images

v(i1, r, t) = a, v(i2, r, t) = 0, v(i3, r, t) = a/2

[Figure labels: appearance, context]
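The feature response v(i, r, t) and the integral-image trick can be sketched as follows. This assumes the response is the number of pixels with texton index t inside rectangle r placed relative to pixel i, so that "a" reads as the rectangle's area; dividing by the area is an equally plausible variant. The function names and the `(top, left, bottom, right)` offset convention are hypothetical.

```python
import numpy as np

def integral_image(mask):
    """Summed-area table with a zero first row and column."""
    ii = np.zeros((mask.shape[0] + 1, mask.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    return ii

def shape_filter_response(ii, i, r):
    """Count of texton-t pixels in rectangle r placed relative to pixel i.

    ii : integral image of the indicator (texton_map == t)
    i  : (row, col) pixel position
    r  : (top, left, bottom, right) offsets from i, bottom/right exclusive
    """
    h, w = ii.shape[0] - 1, ii.shape[1] - 1
    top = int(np.clip(i[0] + r[0], 0, h))
    left = int(np.clip(i[1] + r[1], 0, w))
    bottom = int(np.clip(i[0] + r[2], 0, h))
    right = int(np.clip(i[1] + r[3], 0, w))
    # Four lookups, whatever the rectangle size.
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]
```

One integral image per texton is built once per image; after that every response costs four array lookups, which is where the efficiency comes from.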

SLIDE 8

Shape and Appearance

[Figure: texton map (textons t0 to t4) and ground truth; shape filters (r1, t1) and (r2, t2); feature response images v(i, r1, t1) and v(i, r2, t2)]

SLIDE 9

Shape and Appearance

[Figure: texton map (textons t0 to t4) and ground truth; shape filters (r1, t1) and (r2, t2); summed response images v(i, r1, t1) + v(i, r2, t2)]

SLIDE 10

Shape-Texture Potentials

Joint Boost algorithm [Torralba et al. CVPR 2004]

iteratively combines many shape filters

builds multi-class logistic classifier

Resulting combination exploits: Texture, Shape, Context (!)

Shape-Texture potentials:

[Equation image; labels: shape-texture potentials, logistic classifier]
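The equation on this slide did not survive extraction. As a hedged illustration only, the sketch below assumes the boosted classifier yields an additive score H(c, i) for each class c and pixel i, and that the "multi-class logistic classifier" turns those scores into probabilities with a softmax (a later slide does call it a "boosted softmax classifier"). The array layout and the name `class_posterior` are assumptions.

```python
import numpy as np

def class_posterior(H):
    """Softmax over the class axis; H has shape (num_classes, height, width)."""
    H = H - H.max(axis=0, keepdims=True)  # subtract the max for stability
    e = np.exp(H)
    return e / e.sum(axis=0, keepdims=True)
```

The result sums to 1 over classes at every pixel, so it can be read directly as a per-pixel label distribution.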

SLIDE 11

Feature Selection by Boosting

[Figure: input image; inferred segmentation (colour = most likely label) and confidence (white = high entropy, black = low entropy) after 30, 1000 and 2000 rounds]

SLIDE 12

Feature Selection by Boosting

[Figure: input image; inferred segmentation (colour = most likely label) and confidence (white = high entropy, black = low entropy) after 30, 1000 and 2000 rounds]
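The two maps in the figure can be sketched from a per-pixel class distribution. This assumes "entropy" means the Shannon entropy of that distribution, so white (high entropy) marks pixels where the classifier is uncertain. The `(num_classes, height, width)` layout and the function names are hypothetical.

```python
import numpy as np

def most_likely_label(P):
    """Index of the most probable class at each pixel."""
    return P.argmax(axis=0)

def entropy_map(P, eps=1e-12):
    """Per-pixel Shannon entropy in nats; eps guards against log(0)."""
    return -(P * np.log(P + eps)).sum(axis=0)
```

A pixel with a uniform distribution over K classes reaches the maximum, log K; a pixel with all mass on one class scores about 0.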

SLIDE 13

Randomised Boosting

Avoid expensive search over all features

Only check random fraction (e.g. 0.3%) at each round

Over several thousand rounds probably try all possible features

[Figure: non-randomised boosting vs randomised boosting]
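A rough sketch of the idea and of the "probably try all possible features" claim. It assumes each round draws an independent, uniformly random subset of the candidate features and keeps the one with the lowest error; `randomised_round`, `prob_feature_tried` and the `score` callback are hypothetical names, not part of any published code.

```python
import random

def randomised_round(candidates, score, fraction, rng):
    """Evaluate only a random fraction of the candidates; return the best."""
    n = max(1, int(len(candidates) * fraction))
    return min(rng.sample(candidates, n), key=score)  # lower score = better

def prob_feature_tried(fraction, rounds):
    """Chance that one given feature is sampled at least once."""
    return 1.0 - (1.0 - fraction) ** rounds
```

Under that independence assumption, a fraction of 0.3% gives any single feature roughly a 95% chance of having been examined after 1000 rounds and over 99% after 2000, while each round costs only 0.3% of a full search.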

SLIDE 14

Accurate Segmentation?

Shape-texture potentials alone

effectively recognise objects

but not sufficient for pixel-perfect segmentation

Conditional Random Field (CRF) – see oral presentation tomorrow!

[Figure: shape-texture + CRF]

SLIDE 15

Adapting TextonBoost to the Pascal VOC Challenge

SLIDE 16

Training

Pascal training data is bounding boxes.

Need pixelwise labelling – use GrabCut based on bounding box (noisy labelling!):

Add ‘background’ label for non-object regions and train background class.

~1 day training time (for 10 classifiers on 1/3 data)

SLIDE 17

Results

SLIDE 18

Classification (competition 1)

To give uncertainty measure, use only boosted softmax classifier and normalised sum of classifier over all image pixels.

Area under curve (AUC)

bicycle  bus    car    cat    cow   dog    horse  motorbike  person  sheep
0.873    0.864  0.887  0.822  0.85  0.768  0.754  0.844      0.715   0.866

VOC experiments by Jamie Shotton

Test time: 30 sec per image (three seconds per classifier)
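One reading of the image-level score and of the AUC metric, as a sketch. It assumes "normalised sum of classifier over all image pixels" means the per-pixel probability of a class summed over the image and divided by the pixel count; the slide does not spell out the normalisation. The AUC is computed in its pairwise-ranking form, and both function names are hypothetical.

```python
import numpy as np

def image_score(P_class):
    """Mean per-pixel probability of one class; P_class has shape (H, W)."""
    return float(P_class.sum() / P_class.size)

def auc(scores, labels):
    """Area under the ROC curve: fraction of (positive, negative) image
    pairs ranked correctly, with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With one such score per test image and per class, each per-class AUC figure summarises how well the scores rank images containing the class above images that do not.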

SLIDE 19

Detection (competition 3)

Work in progress: scale/viewpoint invariant Layout Consistent Random Field

[Figure: Input image; Layout-consistent regions; Instance labelling (T1, T2, T3)]

SLIDE 20

Detection (competition 3)

Work in progress: scale/viewpoint invariant Layout Consistent Random Field

Instead, used connected-components of most probable labelling (ignoring if <1000 pixels) and then computed normalised sum (as before)

Average precision (AP)

bicycle  bus    car    cat    cow    dog    horse  motorbike  person  sheep
0.249    0.138  0.254  0.151  0.149  0.118  0.091  0.178      0.030   0.131
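A sketch of the connected-components step described above. Several details are assumptions because the slide does not state them: 4-connectivity, a bounding box taken as the extent of each component, and a score computed as the mean class probability over the component's pixels. The function name `detections` and its arguments are hypothetical.

```python
from collections import deque
import numpy as np

def detections(labels, P, cls, min_pixels=1000):
    """Boxes and scores for 4-connected regions labelled `cls`.

    labels : (H, W) most probable label per pixel
    P      : (H, W) per-pixel probability of `cls`
    Returns [((top, left, bottom, right), score), ...], boxes inclusive.
    """
    h, w = labels.shape
    seen = np.zeros((h, w), dtype=bool)
    out = []
    for r0 in range(h):
        for c0 in range(w):
            if labels[r0, c0] != cls or seen[r0, c0]:
                continue
            seen[r0, c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one component
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < h and 0 <= cc < w
                            and not seen[rr, cc] and labels[rr, cc] == cls):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(pixels) < min_pixels:
                continue  # ignore small components
            rows, cols = zip(*pixels)
            score = float(np.mean([P[p] for p in pixels]))
            out.append(((min(rows), min(cols), max(rows), max(cols)), score))
    return out
```

The size threshold acts as a crude noise filter: isolated mislabelled pixels never become detections.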

SLIDE 21

Suggestions for Pascal VOC 2007

Include other types of object classes:

unstructured classes (e.g. sky, grass)

semi-structured classes (e.g. building).

Have small number of pixel-wise labelled images and include a segmentation competition.

Keep it hard!!!

SLIDE 22

Thank you

TextonBoost code will be available shortly from

http://mi.eng.cam.ac.uk/~jdjs2/