TextonBoost: Joint Appearance, Shape and Context (PowerPoint PPT Presentation)



SLIDE 1

TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation

J. Shotton*, J. Winn†, C. Rother†, and A. Criminisi†

* University of Cambridge

† Microsoft Research Ltd, Cambridge, UK

SLIDE 2

Introduction

  • Simultaneous recognition and segmentation
  • Explain every pixel (dense features)
  • Appearance + shape + context
  • Exploit class generalities + image specifics
  • Contributions
      • New low-level features
      • New texture-based discriminative model
      • Efficiency and scalability

[Figure: Example Results]

SLIDE 3

Structure of Presentation

  • The MSRC 21-Class Object Recognition Database
  • New ‘Shape Filter’ Features
  • Randomised boosting with Shared Features
  • Adapting to the Pascal VOC Challenge

SLIDE 4

Image Databases

  • MSRC 21-Class Object Recognition Database
      • 591 hand-labelled images (45% train, 10% validation, 45% test)
  • Corel (7-class) and Sowerby (7-class) [He et al. CVPR 04]

SLIDE 5

Sparse vs Dense Features

  • Successes using sparse features, e.g. [Sivic et al. ICCV 2005], [Fergus et al. ICCV 2005], [Leibe et al. CVPR 2005]
  • But…
      • do not explain whole image
      • cannot cope well with all object classes
  • We use dense features: ‘shape filters’
      • local texture-based image descriptions
  • Cope with textured and untextured objects, occlusions, whilst retaining high efficiency

[Figure: problem images for sparse features?]

SLIDE 6

Textons

  • Shape filters use texton maps [Varma & Zisserman IJCV 05] [Leung & Malik IJCV 01]
  • Compact and efficient characterisation of local texture

[Figure: input image, filter bank, clustering, texton map; colours denote texton indices]
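The last step of the pipeline on this slide can be sketched as follows. This is a minimal illustration, assuming the filter-bank responses and the cluster centres (for example from k-means) have already been computed; the names `assign_textons`, `responses` and `centres` are hypothetical and not taken from the presentation.

```python
def assign_textons(responses, centres):
    """Map each pixel's filter-response vector to its nearest cluster centre.

    responses: H x W nested list of response vectors (one vector per pixel).
    centres:   list of K centre vectors (assumed to come from clustering).
    Returns an H x W nested list of texton indices in range(K).
    """
    def sqdist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        [min(range(len(centres)), key=lambda k: sqdist(v, centres[k])) for v in row]
        for row in responses
    ]


# Tiny example: two centres, a 1 x 3 "image" of 2-D responses.
centres = [(0.0, 0.0), (1.0, 1.0)]
responses = [[(0.1, 0.0), (0.9, 1.2), (0.4, 0.4)]]
texton_map = assign_textons(responses, centres)
```

The resulting map stores one small integer per pixel, which is what makes the later counting features cheap.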

SLIDE 7

Shape Filters

  • Pair: (rectangle r, texton t)
  • Feature responses v(i, r, t)
  • Integral images

[Figure: v(i1, r, t) = a, v(i2, r, t) = 0, v(i3, r, t) = a/2; labels: appearance, context]
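A minimal sketch of how a feature response v(i, r, t) could be evaluated with an integral image. It assumes the response is the count of texton-t pixels inside rectangle r placed relative to pixel i, which matches the a, 0, a/2 example in the figure. The function names and the offset convention for r are illustrative assumptions.

```python
def texton_integral(texton_map, t):
    """Integral image of the indicator [texton_map == t], with a zero border row/column."""
    h, w = len(texton_map), len(texton_map[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += 1 if texton_map[y][x] == t else 0
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def response(ii, i, r):
    """Count texton pixels in rectangle r offset from pixel i, using four lookups.

    i = (y, x); r = (dy0, dx0, dy1, dx1) as half-open offsets (an assumed convention).
    The rectangle is clipped to the image.
    """
    h, w = len(ii) - 1, len(ii[0]) - 1
    y0 = min(max(i[0] + r[0], 0), h)
    y1 = min(max(i[0] + r[2], 0), h)
    x0 = min(max(i[1] + r[1], 0), w)
    x1 = min(max(i[1] + r[3], 0), w)
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


texton_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
ii = texton_integral(texton_map, t=1)
r = (0, 0, 2, 2)                  # 2 x 2 rectangle, area a = 4
v1 = response(ii, (1, 1), r)      # rectangle fully on the texton: a
v2 = response(ii, (0, 4), r)      # rectangle misses the texton: 0
v3 = response(ii, (1, 2), r)      # rectangle half covers the texton: a/2
```

One integral image per texton is built once; after that each response costs four lookups regardless of rectangle size.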

SLIDE 8

Shape and Appearance

[Figure: texton map (textons t0, t1, t2, t3, t4); ground truth; shape filters (r1, t1) and (r2, t2); feature response image v(i, r1, t1); feature response image v(i, r2, t2)]

SLIDE 9

Shape and Appearance

[Figure: texton map (textons t0, t1, t2, t3, t4); ground truth; shape filters (r1, t1) and (r2, t2); summed response images v(i, r1, t1) + v(i, r2, t2)]

SLIDE 10

Shape-Texture Potentials

  • Joint Boost algorithm [Torralba et al. CVPR 2004]
      • iteratively combines many shape filters
      • builds multi-class logistic classifier
  • Resulting combination exploits: Texture, Shape, Context (!)
  • Shape-Texture potentials:

[Equation figure; labels: shape-texture potentials, logistic classifier]
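A hedged sketch of how boosted shape filters could be combined into a multi-class logistic (softmax) classifier. It assumes weak learners in the shared decision-stump form used by Joint Boost, where classes in a sharing set get a·[v > θ] + b and all other classes get a constant. The dictionary layout and all parameter values are illustrative, not the presenters' code.

```python
import math


def weak_learner(v, c, stump):
    """Shared decision stump on one feature response v for class c."""
    if c in stump["sharing"]:
        return stump["a"] * (1.0 if v > stump["theta"] else 0.0) + stump["b"]
    return stump["k"][c]  # constant for classes outside the sharing set


def class_posterior(responses, stumps, classes):
    """Sum the weak learners per class, then apply a softmax."""
    H = {c: sum(weak_learner(v, c, s) for v, s in zip(responses, stumps)) for c in classes}
    m = max(H.values())  # subtract the maximum for numerical stability
    Z = sum(math.exp(h - m) for h in H.values())
    return {c: math.exp(H[c] - m) / Z for c in classes}


classes = ["grass", "cow", "sky"]
stumps = [{"sharing": {"cow"}, "a": 2.0, "b": -1.0, "theta": 3.0,
           "k": {"grass": 0.0, "sky": 0.0}}]
posterior = class_posterior([5.0], stumps, classes)  # one feature response, v = 5
```

In this toy case the single stump fires for "cow", so its score is 1 against 0 for the other classes.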

SLIDE 11

Feature Selection by Boosting

[Figure: input image; inferred segmentation (colour = most likely label) and confidence (white = high entropy, black = low entropy) after 30, 1000 and 2000 rounds]

SLIDE 12

Feature Selection by Boosting

[Figure: input image; inferred segmentation (colour = most likely label) and confidence (white = high entropy, black = low entropy) after 30, 1000 and 2000 rounds]
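The two maps in the figure can be derived from a per-pixel class distribution. The sketch below assumes the standard Shannon entropy and takes the most likely label as the arg-max; the function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p; zero terms are skipped."""
    return -sum(q * math.log(q) for q in p if q > 0)


def most_likely_label(p):
    """Index of the largest probability in p."""
    return max(range(len(p)), key=lambda c: p[c])


confident = [0.97, 0.01, 0.01, 0.01]   # low entropy: would be drawn dark
uncertain = [0.25, 0.25, 0.25, 0.25]   # maximum entropy: would be drawn white
```

A uniform distribution over K classes gives the largest possible entropy, log K, so the confidence map brightens wherever the classifier cannot decide.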

SLIDE 13

Randomised Boosting

  • Avoid expensive search over all features
      • only check random fraction (e.g. 0.3%) at each round
      • over several thousand rounds probably try all possible features

[Figure: non-randomised boosting vs randomised boosting]
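A small sketch of the idea, assuming candidates are drawn uniformly without replacement at each round. The second function estimates how likely a given feature is to be missed entirely, which is the arithmetic behind "probably try all possible features". Function names are illustrative.

```python
import random


def sample_candidates(num_features, fraction, rng):
    """Pick the random subset of feature indices to evaluate in one boosting round."""
    k = max(1, round(num_features * fraction))
    return rng.sample(range(num_features), k)


def prob_never_tried(fraction, rounds):
    """Probability that one particular feature is never sampled in `rounds` rounds."""
    return (1.0 - fraction) ** rounds


rng = random.Random(0)
candidates = sample_candidates(1000, 0.003, rng)   # 3 of 1000 features this round
missed_1000 = prob_never_tried(0.003, 1000)        # roughly 5%
missed_2000 = prob_never_tried(0.003, 2000)        # well under 1%
```

With a 0.3% fraction, each round costs about 1/333 of an exhaustive search, while the chance that any single feature is never considered falls off exponentially with the number of rounds.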

SLIDE 14

Accurate Segmentation?

  • Shape-texture potentials alone
      • effectively recognise objects
      • but not sufficient for pixel-perfect segmentation
  • Conditional Random Field (CRF): see oral presentation tomorrow!

[Figure: shape-texture; + CRF]

SLIDE 15

Adapting TextonBoost to the Pascal VOC Challenge

SLIDE 16

Training

  • Pascal training data is bounding boxes.
  • Need pixelwise labelling: use GrabCut based on bounding box (noisy labelling!)
  • Add ‘background’ label for non-object regions and train background class.
  • ~1 day training time (for 10 classifiers on 1/3 data)
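A rough sketch of how a pixelwise training label map might be assembled from one bounding box. It assumes a foreground mask is already available (for instance, the output of a GrabCut run initialised from the box) and does not perform the segmentation itself. The names `label_map` and `BACKGROUND`, and the box convention, are hypothetical.

```python
BACKGROUND = 0  # assumed index for the extra 'background' class


def label_map(height, width, box, fg_mask, class_id):
    """Build a per-pixel label image from a bounding box and a foreground mask.

    box = (top, left, bottom, right), half-open.
    fg_mask[y][x] is True where the segmentation marked the pixel as object.
    Everything outside the box, or inside it but not foreground, is background.
    """
    labels = [[BACKGROUND] * width for _ in range(height)]
    top, left, bottom, right = box
    for y in range(top, bottom):
        for x in range(left, right):
            if fg_mask[y][x]:
                labels[y][x] = class_id
    return labels


fg_mask = [
    [True,  False, False, False],
    [False, True,  True,  False],
    [False, False, True,  False],
]
labels = label_map(3, 4, box=(1, 1, 3, 3), fg_mask=fg_mask, class_id=5)
```

Because the mask comes from an automatic segmentation, the labels are noisy, as the slide warns; the background class absorbs whatever the mask leaves out.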

SLIDE 17

Results

SLIDE 18

Classification (competition 1)

  • To give uncertainty measure, use only boosted softmax classifier and normalised sum of classifier over all image pixels.

Area under curve (AUC)

bicycle  bus    car    cat    cow   dog    horse  motorbike  person  sheep
0.873    0.864  0.887  0.822  0.85  0.768  0.754  0.844      0.715   0.866

VOC experiments by Jamie Shotton

  • Test time: 30 sec per image (three seconds per classifier)
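One possible reading of "normalised sum of classifier over all image pixels" is the mean per-pixel posterior for the class in question. The sketch below implements that reading only; the exact normalisation used in the experiments is not stated on the slide, and the function name is hypothetical.

```python
def image_score(pixel_posteriors, c):
    """Average the per-pixel probability of class c over the whole image.

    pixel_posteriors: H x W nested list of dicts mapping class name to probability.
    Returns a value in [0, 1] usable as an image-level confidence for class c.
    """
    total = 0.0
    count = 0
    for row in pixel_posteriors:
        for p in row:
            total += p[c]
            count += 1
    return total / count


posteriors = [
    [{"cow": 0.5, "background": 0.5}, {"cow": 1.0, "background": 0.0}],
    [{"cow": 0.0, "background": 1.0}, {"cow": 0.5, "background": 0.5}],
]
cow_score = image_score(posteriors, "cow")
```

Ranking test images by this score for each class is what an area-under-curve figure would then be computed from.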

SLIDE 19

Detection (competition 3)

  • Work in progress: scale/viewpoint invariant Layout Consistent Random Field

[Figure: input image; layout-consistent regions; instance labelling; labels T1, T2, T3]

SLIDE 20

Detection (competition 3)

  • Work in progress: scale/viewpoint invariant Layout Consistent Random Field
  • Instead, used connected components of most probable labelling (ignoring if <1000 pixels) and then computed normalised sum (as before)

Average precision (AP)

bicycle  bus    car    cat    cow    dog    horse  motorbike  person  sheep
0.249    0.138  0.254  0.151  0.149  0.118  0.091  0.178      0.030   0.131
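The fallback procedure on this slide could look roughly like the sketch below. It assumes 4-connectivity and takes the "normalised sum" to be the mean class probability over each component's pixels; both choices, and the function name, are assumptions rather than details given in the presentation.

```python
from collections import deque


def detections(labels, probs, c, min_pixels=1000):
    """Return [(bounding_box, score)] for connected components labelled c.

    labels: H x W most probable labelling; probs[y][x]: probability of class c.
    bounding_box = (top, left, bottom, right), inclusive.
    Components smaller than min_pixels are ignored.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    out = []
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != c or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, comp = deque([(sy, sx)]), []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == c and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) < min_pixels:
                continue
            ys = [y for y, _ in comp]
            xs = [x for _, x in comp]
            score = sum(probs[y][x] for y, x in comp) / len(comp)
            out.append(((min(ys), min(xs), max(ys), max(xs)), score))
    return out


labels = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
probs = [[0.5] * 4 for _ in range(3)]
found = detections(labels, probs, c=1, min_pixels=3)  # small threshold for the toy grid
```

On the toy grid the three-pixel blob survives and the isolated pixel is discarded, mirroring the <1000-pixel rule at full image scale.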

SLIDE 21

Suggestions for Pascal VOC 2007

  • Include other types of object classes:
      • unstructured classes (e.g. sky, grass)
      • semi-structured classes (e.g. building).
  • Have small number of pixel-wise labelled images and include a segmentation competition.
  • Keep it hard!!!

SLIDE 22

Thank you

TextonBoost code will be available shortly from

http://mi.eng.cam.ac.uk/~jdjs2/