BIL-722 ADVANCED TOPICS IN COMPUTER VISION - Çağdaş Baş, N10266943



SLIDE 1

BIL-722 ADVANCED TOPICS IN COMPUTER VISION

Çağdaş Baş, N10266943
Paper: Searching for objects driven by context
Authors: Bogdan Alexe, Nicolas Heess, Yee Whye Teh, Vittorio Ferrari

SLIDE 2

PURPOSE: OBJECT DETECTION

 Among many problems, most methods exhaustively search for the object with the help of the sliding-windows approach.
 All these methods evaluate every possible window. This process is very slow and also unnatural.
 Cognitive research shows that humans don't search this way. Instead, they search intelligently.

SLIDE 3

PROPOSITION: INTELLIGENT SEARCH

 Learn an object's relative position to its surroundings.
 An ideal search strategy would be like this:
1. W1 is sky; cars occur below sky, so look below.
2. W2 is road; cars occur on the road, so look just below the road.
3. There is a car part inside W3, so look at the surrounding patches.
4. W4 is a car.

Figure Credit: Bogdan Alexe

SLIDE 4

OVERVIEW OF THE METHOD

Figure Credit: Bogdan Alexe

SLIDE 5

ALGORITHM IN A NUTSHELL

1. The method randomly picks one window at the beginning.
2. Search policy 𝜌𝑇:
   1. Similar position/appearance pairs are searched for in the training set.
   2. Each of these similar patches votes for a new position.
   3. The method accumulates these votes as probability maps and decides where to look next.
3. Output policy 𝜌𝑃:
   1. If the current window is similar enough to a car, the search is over.
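The three stages above can be sketched as a short loop. Everything here is hypothetical: the toy `prob_vote` and `score` callbacks stand in for the paper's kernel-based voting and trained classifier, and the discrete grid stands in for the space of windows.

```python
import numpy as np

# Minimal sketch of the sequential search loop (hypothetical names; the
# paper's voting uses appearance kernels and learned displacements, not
# these toy callbacks).
def context_search(prob_vote, score, grid=10, max_iters=20, threshold=0.9, seed=0):
    rng = np.random.default_rng(seed)
    pos = tuple(rng.integers(0, grid, size=2))        # 1. random initial window
    prob_map = np.zeros((grid, grid))
    for _ in range(max_iters):
        if score(pos) >= threshold:                   # 3. output policy: confident enough
            return pos
        prob_map += prob_vote(pos)                    # 2. accumulate context votes
        prob_map[pos] = -np.inf                       # do not revisit this window
        pos = np.unravel_index(np.argmax(prob_map), prob_map.shape)
    return pos
```

With a vote map that peaks near the true object, the loop walks to the object in a handful of evaluations instead of scanning every window.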

SLIDE 6

ALGORITHM IN DETAIL: FEATURE VECTOR

 A window is represented by the vector:
𝑥𝑚 = (𝑦𝑚, 𝑧𝑚, 𝑡𝑚)
 Window features 𝑧 (𝑧𝑚 for a training window, 𝑧𝑢 for the test window) consist of:
 Normalized location and scale of the window
 HOG histogram of the window
 Classifier score
 Displacement vector 𝑡𝑚 consists of:
 Intersection over union with the ground-truth box
 Normalized Hamming distance to the ground-truth box
 Absolute difference between the window's classifier score and the ground-truth box's score

(Figure: window positions, scales, and feature vectors)
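As a concrete illustration, the window features listed above (normalized location, normalized scale, a HOG histogram, and the classifier score) could be assembled like this. The function name and argument layout are assumptions for illustration, not the paper's code.

```python
import numpy as np

# Hypothetical sketch: assemble the window feature vector described above.
def window_features(box, img_w, img_h, hog_hist, clf_score):
    x, y, w, h = box
    loc = np.array([x / img_w, y / img_h])           # normalized location
    scale = np.array([(w * h) / (img_w * img_h)])    # normalized scale
    return np.concatenate([loc, scale, hog_hist, [clf_score]])
```

With, say, a 36-bin HOG histogram, the resulting descriptor has 2 + 1 + 36 + 1 = 40 entries.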

SLIDE 7

ALGORITHM IN DETAIL: SEARCH POLICY

 Extract uniformly distributed windows from all the training images and store their features.
 For a test image:
1. Select a window and find its K nearest neighbours among the training windows.
2. Map each neighbour's vote into the test image to acquire the new probability map.
3. Choose the next window at the position with the highest probability.
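Step 1 above, finding the K nearest training windows, can be sketched as a brute-force distance search. The names and the Euclidean metric are assumptions; the paper uses its own similarity kernel over the window features.

```python
import numpy as np

# Hypothetical sketch: the k training windows most similar in appearance
# each contribute their stored displacement vector as a vote.
def knn_votes(query_feat, train_feats, train_disps, k=3):
    dists = np.linalg.norm(train_feats - query_feat, axis=1)  # appearance distance
    nearest = np.argsort(dists)[:k]                           # indices of the k-NN
    return train_disps[nearest]                               # their votes
```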

SLIDE 8

ALGORITHM IN DETAIL: SEARCH POLICY (2)

 Calculate the probability map from the current window 𝑥𝑢 in the test image:
 𝑥𝑢: current window in the test image.
 𝑥𝑚: window from the training set.
Each training window's vote is weighted by a feature similarity kernel and spread by a spatial smoothing kernel.
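The two kernels can be sketched as Gaussians: each training window's vote is weighted by its appearance similarity to 𝑥𝑢 and smeared spatially around the position it votes for. The bandwidths and the grid discretization are assumed free parameters, not values from the paper.

```python
import numpy as np

# Hypothetical sketch: probability map = sum over training windows of
# (feature similarity kernel) x (spatial smoothing kernel around the vote).
def vote_map(z_u, train, grid=8, sigma_f=1.0, sigma_s=1.0):
    yy, xx = np.mgrid[0:grid, 0:grid]
    p = np.zeros((grid, grid))
    for z_m, (vy, vx) in train:                       # (features, voted position) pairs
        sim = np.exp(-np.sum((z_u - z_m) ** 2) / (2 * sigma_f ** 2))
        p += sim * np.exp(-((yy - vy) ** 2 + (xx - vx) ** 2) / (2 * sigma_s ** 2))
    return p / p.sum()                                # normalize to a distribution
```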

SLIDE 9

ALGORITHM IN DETAIL: SEARCH POLICY (3)

 Normalize each probability map and integrate all the past maps.
 All maps are combined into the overall probability map using an exponentially decaying mixture.
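The exponentially decaying mixture can be sketched as a running update in which older maps are down-weighted at every step; the decay rate gamma here is an assumed free parameter.

```python
import numpy as np

# Hypothetical sketch: combine per-step probability maps so that older
# evidence decays exponentially relative to the newest map.
def mix_maps(maps, gamma=0.5):
    total = np.zeros_like(maps[0], dtype=float)
    for m in maps:
        total = gamma * total + m / m.sum()   # decay old votes, add new map
    return total / total.sum()
```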

SLIDE 10

ALGORITHM IN DETAIL: OUTPUT POLICY

 After 𝑈 iterations, output the single window with the highest classifier score 𝑑 among all visited windows:
x_out = arg max_u d(x_u)
 This is a downside: the method assumes there is only one object instance in the image.
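The output policy reduces to an argmax over the classifier scores of the visited windows; a one-liner, with hypothetical names:

```python
import numpy as np

# Hypothetical sketch: after U iterations, return the visited window whose
# classifier score d(x_u) is highest.
def output_window(windows, scores):
    return windows[int(np.argmax(scores))]
```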

SLIDE 11

ALGORITHM IN DETAIL: LEARNING WEIGHTS

 There is a weight for each class in the similarity-kernel stage.
 These weights define each patch's importance for each object class.

SLIDE 12

OBJECT CLASSIFIER

 An object classifier is trained for each class.
 For each class, one root HOG filter and several part HOG filters are trained.
 Root and part filter responses are summed with weights, following Felzenszwalb's deformable part model.
 For each class, the training split is used for classifier learning.
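The weighted combination of root and part responses can be sketched as a dot product. This is a simplification under stated assumptions: Felzenszwalb's full model also subtracts a deformation cost per displaced part, which is omitted here.

```python
import numpy as np

# Hypothetical sketch: combine one root filter response and several part
# filter responses into a single detection score via learned weights.
# (Felzenszwalb's full DPM also subtracts per-part deformation costs.)
def dpm_score(root_resp, part_resps, weights, bias=0.0):
    responses = np.concatenate([[root_resp], part_resps])
    return float(np.dot(weights, responses) + bias)
```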

SLIDE 13

EXPERIMENTS

 Experiments are conducted on the PASCAL VOC 2010 dataset.
 It is a highly challenging dataset containing 20 object classes with bounding-box annotations.
 The validation set is used for testing.
 Mean average precision over all classes, detection rate, and the number of windows evaluated by the detector are used as performance measures.
SLIDE 14

EXPERIMENTS: QUANTITATIVE

SLIDE 15

EXPERIMENTS: QUALITATIVE

SLIDE 16

EXPERIMENTS: QUALITATIVE

 Comparison of the proposed method with Felzenszwalb et al., PAMI 2010.

SLIDE 17

EXPERIMENTS: PERFORMANCE

 Experiments were run on a PC powered by an Intel i7 processor.
 The number of windows evaluated is significantly lower than with the usual deformable part model approach.
 The deformable part model approach is reported to take 92 s per image, while the proposed method takes only 2 s.

SLIDE 18

PROS - CONS

 Pros:
 Fast and logical search
 Can be applied with any classifier/feature
 Cons:
 Assumes only one object instance exists.
 Dataset dependent?

SLIDE 19

THANKS