Tracking deformable objects with WiSARD networks: a preliminary work


SLIDE 1

Tracking deformable objects with WiSARD networks:

a preliminary work…

INNOROBO 2014

European Workshop on Deformable Object Manipulation 20 March 2014 ─ Lyon, France Massimo De Gregorio, Maurizio Giordano, Silvia Rossi, Mariacarla Staffa and Bruno Siciliano University of Naples Federico II

SLIDE 2
Object Tracking Problem

  • The object tracking problem consists in reconstructing the trajectory of objects along a sequence of images
  • It is inherently difficult when applied to real-world conditions:

– unstructured forms are present
– real-time responses are required
– computational capabilities are limited to on-board units
– problems of brightness and non-stationary background affect the image elaboration system

It becomes even more challenging in the case of non-rigid objects

SLIDE 3

Motivations

  • Industrial manufacturing processes: rubber tubes, sheet metals, cords, paper sheets
  • Domestic interaction: clothes, food, etc.
  • Medical operations: soft tissues, muscles, skin

Objects’ location and deformation have to be tracked

SLIDE 4
Proposed approach

  • Our aim is to address the problem of making a robot able to track any deformable object without an a priori physical model
  • We propose a particular neural network, a WiSARD-based system, as a feature detector for tracking deformable objects during manipulation

A WiSARD-based system is:
  • 1. Model free
  • 2. Noise tolerant
  • 3. On-line learning

SLIDE 5

Approaches to deformable objects

CAD-like object model-based methods:
– edge detection
– 3D models (point clouds)
– recognition by parts

Feature-based methods:
– surface patches, corners, linear edges

Appearance-based methods:
– changes in lighting or color
– changes in viewing direction
– changes in size / shape

WiSARD-based approach:
non-rigid objects are tracked based on visual features such as color and/or texture, object contours, regions of interest.

SLIDE 6

WiSARD

Wilkie, Stonham and Aleksander’s Recognition Device

The McCulloch and Pitts model: a neuron with binary inputs x1 … xn, weights w1 … wn and threshold σ computes

y = 1 if x1w1 + x2w2 + … + xnwn > σ, otherwise y = 0

The RAM-node: the weighted sum and threshold are replaced by a RAM whose address lines are the binary inputs (e.g. x1 x2 addressing the cells 00, 01, 10, 11) and whose addressed cell content is the output.
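As a side illustration (not part of the slides), the McCulloch and Pitts threshold unit can be written in a few lines of Python; the function name is ours:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Binary threshold neuron: output 1 iff x1*w1 + ... + xn*wn > sigma."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0
```

Training such a neuron means adjusting the weights; the RAM-node replaces all of this with a table lookup, which is what makes WiSARD training a single memory write.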

SLIDE 7
WiSARD Discriminator

  • A biunivocal pseudo-random mapping connects uncorrelated parts of the image to specific addresses of RAM-based nodes
  • The uncorrelated n-tuples are used as addresses of the RAMs
  • A set of RAM-based nodes represents a Discriminator

SLIDE 8

WiSARD Discriminator

Training phase: each pattern of the training set is placed on the retina; through the mapping, every RAM node (RAM 1 … RAM 6, with cells addressed 00, 01, 10, 11) writes a 1 into the cell addressed by its n-tuple.

Classification: the input pattern addresses the same cells; the RAM outputs are summed (Σ) into a similarity measure r.
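A minimal sketch of such a discriminator, assuming set-backed RAM nodes and using Python's seeded `random.shuffle` as the biunivocal pseudo-random mapping; the class and method names are illustrative, not the authors' code:

```python
import random

class Discriminator:
    """One WiSARD discriminator: the binary retina is split into n-tuples by
    a fixed pseudo-random mapping; each n-tuple addresses one RAM node."""

    def __init__(self, retina_size, n, seed=0):
        assert retina_size % n == 0
        rng = random.Random(seed)
        self.mapping = list(range(retina_size))
        rng.shuffle(self.mapping)            # biunivocal pseudo-random mapping
        self.n = n
        self.rams = [set() for _ in range(retina_size // n)]

    def _addresses(self, retina):
        bits = [retina[i] for i in self.mapping]
        for k in range(len(self.rams)):
            yield k, tuple(bits[k * self.n:(k + 1) * self.n])

    def train(self, retina):
        for k, addr in self._addresses(retina):
            self.rams[k].add(addr)           # write a 1 at the addressed cell

    def response(self, retina):
        # similarity measure r: number of RAMs recognising their n-tuple
        return sum(addr in self.rams[k] for k, addr in self._addresses(retina))
```

After training on a pattern, the response to that same pattern equals the number of RAM nodes, and it degrades gracefully (rather than to zero) as noise flips retina bits.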

SLIDE 9

WiSARD Network

  • A WiSARD Network is a multi-discriminator system: the input image is presented to every discriminator (e.g. discriminator–0 … discriminator–9 for digit recognition), and the output belonging class is the one whose discriminator gives the highest response R (%)
  • Confidence is measured as c = d/r1, where r1 is the best response and d is its difference from the second-best response

SLIDE 10

WiSARD Modified

1) Learning frame by frame: the system keeps training over time, frame after frame.

2) Increasing the RAM cell content: instead of storing a single 1, each cell counts how many times it has been addressed during training (contents such as 3, 2, 1 accumulate in the RAMs).

Output function: a cell whose content is i contributes 0 if i = 0, and 1 otherwise.

SLIDE 11

WiSARD Modified

1) Learning frame by frame.

2) Increasing the RAM cell content.

3) Filtering output: during classification each addressed cell contributes 0 if its content i = 0, and 1 otherwise; the contributions are summed (e.g. 1+1+1+1 = 4) to give the discriminator response r.
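The first two modifications can be sketched with a counting RAM node; `FrequencyRAM` and its `bleach` parameter are hypothetical names for illustration (the parameter anticipates the forgetting mechanism of a later slide):

```python
from collections import Counter

class FrequencyRAM:
    """RAM node whose cells store access frequencies instead of single bits."""

    def __init__(self):
        self.cells = Counter()   # n-tuple address -> access count

    def train(self, addr):
        self.cells[addr] += 1    # 2) increase the addressed cell content

    def output(self, addr, bleach=0):
        # 3) filtered output: fire only if the frequency exceeds `bleach`
        return 1 if self.cells[addr] > bleach else 0
```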

SLIDE 12

DRASiW for Shape Detection

D.R.A.S.iW. exploits the k-bit words in the RAM cells to produce examples of learned pattern categories.

Forward use: Input: image → Output: belonging class, chosen among discriminator–0 … discriminator–9 by the response R (%), with confidence c = d/r1.

Reverse use: Input: class name (“Show me this class!”) → Output: a “mental image” of the learned patterns of that class.

SLIDE 13

Mental Image frame by frame

Learning frame by frame: the access frequencies accumulated in the cells of RAM1 … RAM3 (e.g. 3, 2, 1) are back-projected through the mapping onto the retina; each retina position becomes a “histopixel” whose value is the sum of the frequencies of the stored n-tuples that set it, and together the histopixels form the “mental” image.
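The back-projection might look as follows, assuming each RAM is a plain dict from n-tuple address to frequency and `mapping` is the same pseudo-random retina permutation used for training; `mental_image` is an illustrative name:

```python
def mental_image(rams, mapping, n, retina_size):
    """Back-project RAM cell frequencies onto the retina: every stored
    n-tuple votes, with its frequency, for the retina pixels it set to 1."""
    histopixels = [0] * retina_size
    for k, ram in enumerate(rams):
        for addr, count in ram.items():
            for j, bit in enumerate(addr):
                if bit:
                    # invert the mapping: bit j of RAM k came from this pixel
                    histopixels[mapping[k * n + j]] += count
    return histopixels
```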

SLIDE 14

WiSARD bleaching

  • The system always trains itself with the image on the retina of the discriminator that outputs the best response
  • The sub-patterns of the new image on the retina are combined with those of the MI (this means increasing their frequencies in the RAM contents)
  • Bleaching: a “forgetting mechanism” to avoid RAM memory location saturation. The sub-patterns which are not addressed by the current image on the retina are decremented (−1)

DRASiW maintains an updated MI of the tracked object shape
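The forgetting step can be sketched as a decrement over the cells that the current retina did not address; `bleach_update` is a hypothetical helper, not the authors' code:

```python
def bleach_update(ram_cells, addressed):
    """Decrement every cell not addressed by the current image; cells whose
    frequency reaches 0 are forgotten, preventing RAM saturation."""
    for addr in list(ram_cells):
        if addr not in addressed:
            ram_cells[addr] -= 1
            if ram_cells[addr] <= 0:
                del ram_cells[addr]
```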


SLIDE 15

WiSARD for Object Tracking

  • 10 left, right, up and down discriminators + a central discriminator
  • Each discriminator is identified by its relative coordinates and is in charge of learning the object in the retina, but looking at a different part of the image
  • The displacement of all the retinas forms the prediction window
  • The position of the discriminator with the highest response identifies the movement direction of the tracked object.
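Picking the movement direction then reduces to an arg-max over the grid of discriminator responses; a toy sketch where each discriminator is keyed by a hypothetical relative (dx, dy) offset, (0, 0) being the central one:

```python
def predict_direction(responses):
    """Return the (dx, dy) offset of the discriminator with the highest
    response; this is the predicted movement direction of the object."""
    return max(responses, key=responses.get)
```

For instance, `predict_direction({(0, 0): 10, (1, 0): 14, (-1, 0): 7})` picks `(1, 0)`, i.e. a rightward move of the prediction window.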


SLIDE 16

Global Framework

  • 1. The user selects the area of the object to track from the video stream input
  • 2. Images extracted from the video are processed frame by frame by a filter
  • 3. The binary image is used to train all discriminators
  • 4. The WiSARD localizes the object through the discriminators’ responses. The higher the response, the more probable the object is in that part of the prediction window
  • 5. The mental model of the object is updated by the image of the best discriminator, to take into account the new possible shape
  • 6. The position of the central retina is set to the coordinates of the best discriminator
  • 7. The MI is shown and its mass center is used to evaluate the tracking performance

SLIDE 17

Global Framework


The Filter: transforms the input video frames into a suitable format for WiSARD

  • Identifies a focus area centered in the bounding box (α% of the box dimension)
  • Computes the histogram of the most frequent pixel colors representing β% of the focus area
  • The selected colors are used to binarize the image of interest
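A rough sketch of the histogram step, assuming the frame is a 2-D list of quantized color values and `focus` lists the (row, col) positions of the focus area; `binarize_by_color` and its `beta` argument are illustrative stand-ins for the β% rule:

```python
from collections import Counter

def binarize_by_color(image, focus, beta=0.8):
    """Select the most frequent colors in the focus area until they cover
    beta of its pixels, then binarize the whole image by those colors."""
    hist = Counter(image[r][c] for r, c in focus)
    selected, covered = set(), 0
    for color, count in hist.most_common():
        if covered >= beta * len(focus):
            break
        selected.add(color)
        covered += count
    return [[1 if px in selected else 0 for px in row] for row in image]
```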

SLIDE 18

Case Study: Pizza Making

The RoDyMan robot manipulating a pizza


SLIDE 19

Experimental Results

Pizza making is composed of different sub-tasks: a) Translation, b) Manipulation, c) Extension, d) Seasoning, e) Occlusions

We evaluated the WiSARD’s ability: i) to track the shape of the pizza, and ii) to follow the position of the object in time


SLIDE 20

Demo


SLIDE 21

Experimental Results

Ground Truth (GT) and Mental Image (MI) centroid coordinates with respect to the horizontal/vertical directions, and tracking error:

sub-task      | img (px) | retina (px) | no. of frames | error (px)
Translation   | 480×264  | 131×122     |  405          | 5.09
Manipulation  | 480×264  | 128×135     |  469          | 2.83
Extension     | 480×264  | 146×129     |  448          | 3.92
Seasoning     | 480×264  | 132×144     |  480          | 4.59
Occlusions    | 568×320  | 159×145     |  228          | 3.83
Overall       | 568×320  | 162×149     | 1397          | 4.41

Figures: trend of the GT and MI centroid coordinates in the horizontal and vertical directions, and tracking error of the WiSARD tracker during the complete pizza-making task (Translation, Manipulation, Extension, Seasoning & Occlusions).

SLIDE 22

Conclusions

Obtained results:

  • Tracking non-rigid deformable objects without a prior model of them
  • Adapting in real time to new situations thanks to the on-line training
  • Coping with occlusions thanks to the reinforcing behavior of the DRASiW mechanism

Advantages:

  • Appropriate for a large variety of deformable objects
  • Can be developed directly on reprogrammable hardware (applicability in embedded robotic systems)

SLIDE 23

Future works

Improvements:

  • Motion-based segmentation for detecting the attentional bounding box
  • Adoption of more accurate filtering techniques
  • Adoption of a dead-reckoning strategy to anticipate the object’s next position
  • Dynamically displacing the discriminators on salient and more probable areas
  • Optimizing the distribution of discriminators in space (e.g. a dense network near the central retina and a sparser disposition in the neighborhood)

…all for further improving the frame rate, and so the real-time tracking ability.

Future works:

  • Changing in time the object to track (attentional switching)
  • Exploiting the depth information from the camera to infer characteristics of the 3D shape of the tracked object

SLIDE 24

Thanks for still tracking me after 20 minutes ☺

If your neural network works well you are still focused on the topic and you can…

Work supported by the European Community within the FP7 ICT-287513 SAPHARI grant and the FP7 ERC-320992 RoDyMan advanced grant
