SLIDE 1

event-based motion segmentation by motion compensation

Timo Stoffregen, Guillermo Gallego, Tom Drummond, Lindsay Kleeman, Davide Scaramuzza, ICCV 2019

Presented by Ondrej Holesovsky, ondrej.holesovsky@cvut.cz, 5 November 2019

Czech Technical University in Prague, CIIRC

SLIDE 2
Outline

  1. Background: event camera intro.
  2. Addressed problem: motion segmentation.
  3. Related work.
  4. Proposed method.
  5. Experimental findings, discussion.

SLIDE 3

event camera intro

slide-4
SLIDE 4

event camera sensor principle

Patrick Lichtsteiner et al., A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor, IEEE Journal of Solid-State Circuits, 2008.

  • Each pixel is independent; no global or rolling shutter.
  • A pixel emits events in response to changes in log light intensity.
  • Level-crossing sampling (see the sketch below).
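
A minimal sketch of this principle, not the sensor's actual circuitry: per-pixel level-crossing sampling of log intensity from a frame sequence (one event per crossing here; real pixels can fire several events for a large change).

```python
import numpy as np

def generate_events(frames, timestamps, C=0.2):
    # Emit an (x, y, t, polarity) event whenever a pixel's log intensity
    # moves more than +/- C away from that pixel's last reference level.
    log_ref = np.log(frames[0].astype(float) + 1e-6)
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_i = np.log(frame.astype(float) + 1e-6)
        diff = log_i - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= C)   # pixels that crossed a level
        for x, y in zip(xs, ys):
            events.append((x, y, t, int(np.sign(diff[y, x]))))
            log_ref[y, x] = log_i[y, x]          # reset the crossed pixel's level
    return events
```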

SLIDE 5

an event

A contrast change detection event k: e_k = [x_k, t_k, s_k]

  • x_k - pixel coordinates
  • t_k - timestamp in seconds, microsecond resolution
  • s_k - polarity, −1 or +1

This sensory representation usually requires ’re-inventing’ computer vision approaches. Alternative way: render videos from events.

SLIDE 6

a sample event sequence

A ball rolling on the floor. 10 ms of events shown in an image plane view.

SLIDE 7

a sample event sequence

A rolling ball captured in an XYT view, 10 ms of events.

SLIDE 8

a sample event sequence

A rolling ball captured in an XYT view, 300 ms of events.

SLIDE 9

the problem: motion segmentation

SLIDE 10

event motion segmentation

  • Classify events into N_l clusters, each representing a coherent motion with parameters θ_j.
  • Clusters are three-dimensional (space-time coordinates).
  • Two objects sharing the same motion are segmented together.
  • Assume motion constancy: events processed in temporally short packets.
  • Chicken-and-egg: estimate motion of clusters, cluster events by motion.

SLIDE 11

related work

SLIDE 12

traditional cameras - sparse method

Xun Xu et al., Motion Segmentation by Exploiting Complementary Geometric Models, CVPR 2018.

  • Assuming known keypoint correspondences (SIFT, corners...).
  • Geometric models: affine, homography, fundamental.
  • Spectral clustering at the core: similarly moving tracked points should belong to the same partition of an affinity graph (motion hypothesis - feature).

SLIDE 13

traditional cameras - dense method

Brox and Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV 2010.

  • Intensity constancy assumption.
  • Sparse translational point trajectories (3% of pixels): optical flow -> point tracking -> trajectory affinities -> spectral clustering.
  • Dense segmentation: variational label approximation (Potts model optimisation) on sparse trajectories and pixel colour. VGA at 1 FPS on a GPU.

SLIDE 14

event-based vs. traditional cameras and approaches

  • The presented approach is semi-dense - more than keypoints.
  • Assumptions: constant contrast vs. constant intensity. (Both invalid in general.)
  • Event-based could benefit from higher data efficiency.
  • High-speed, high dynamic range (HDR), low power.
  • Real-time: still difficult for both.

SLIDE 15

event-driven ball detection and gaze fixation in clutter

  • A. Glover and C. Bartolozzi, IROS 2016.
  • Detecting and tracking a ball from a moving event camera.
  • Locally estimate normal flow directions by fitting planes to events.
  • Flow direction points to or from the circle centre, which directs the Hough transform.
  • Any motion, but only circular objects.

SLIDE 16

independent motion detection with event-driven cameras

  • V. Vasco et al., ICAR 2017.
  • An iCub robot head and camera move.
  • Detect and track corners among events and estimate their velocity.
  • Learn a model relating head joint velocities (from encoders) to corner velocities.
  • Independent corners are inconsistent (Mahalanobis distance) with the head joint velocities.
  • Any objects, but need to know egomotion.

SLIDE 17

iwe - image of warped events

Or motion-compensated event image. G. Gallego and D. Scaramuzza, Accurate Angular Velocity Estimation With an Event Camera, RAL 2016.

  • A rotating camera. Look at 2D event cloud projections.
  • Project along the motion trajectories -> edge structure revealed.
  • Events of a trajectory: same edge, same polarity*.
  • Consider the sum of polarities along a trajectory.
  • Number of trajectories = number of pixels...

SLIDE 18

iwe - method description 1a (simplified)

An event image sums polarities along a trajectory. Discrete image coordinates x, continuous event coordinates x_k:

I(x) = ∑_{k=0}^{N−1} s_k f(x, x_k)

  • N - number of events in the cloud, within a small time interval.
  • f - bilinear interpolation function, (x, x_k) → [0, 1].
  • Each event contributes to its four neighbouring pixels.
  • I(x) - sum of neighbourhood-weighted polarities of events firing at pixel location x.
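
A minimal numpy sketch of this accumulation, assuming the events are already warped to common coordinates; build_iwe and its signature are my own names, not the paper's API:

```python
import numpy as np

def build_iwe(xs, ys, weights, shape):
    # Accumulate (already warped) events into an image: each event splats
    # its weight (here the polarity s_k) onto its four nearest pixels via
    # bilinear interpolation, playing the role of f(x, x_k) above.
    H, W = shape
    iwe = np.zeros((H, W))
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0                           # fractional offsets
    for dx, dy, w in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy),       (1, 1, fx * fy)):
        xi, yi = x0 + dx, y0 + dy
        ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)  # drop off-sensor hits
        np.add.at(iwe, (yi[ok], xi[ok]), weights[ok] * w[ok])
    return iwe
```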

SLIDE 19

iwe - method description 1b (paper notation)

An event image sums polarities along a trajectory. Continuous image coordinates x, continuous event coordinates x_k:

I(x) = ∑_{k=0}^{N−1} s_k δ(x − x_k)

  • δ - Dirac delta.
  • Need to integrate the image for meaningful values.
  • Naive pixelwise sums along the time axis -> motion blur!

SLIDE 20

iwe - method description 2

Idea: maximise IWE sharpness by transforming the events to compensate for the motion. Iterative motion parameter optimisation:

  • Sharpness metric: variance of the IWE pixel values.
  • Compute the IWE variance and its derivatives w.r.t. the motion parameters.
  • Update the motion parameters. Transform the event cloud.
  • Build a new IWE from the transformed event cloud. Repeat.
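
A hedged sketch of this loop, assuming the translational model of the next slide and the build_iwe helper sketched earlier; it swaps the paper's analytic gradient ascent for scipy's derivative-free Nelder-Mead, so it illustrates the objective, not the authors' optimiser:

```python
import numpy as np
from scipy.optimize import minimize

def neg_iwe_variance(v, xs, ys, ts, shape):
    # Warp events along candidate trajectories (translational model),
    # accumulate them into an IWE, and return the negated variance.
    xw = xs + ts * v[0]
    yw = ys + ts * v[1]
    iwe = build_iwe(xw, yw, np.ones_like(xs), shape)  # polarity ignored here
    return -np.var(iwe)              # maximising sharpness = minimising -Var

def fit_motion(xs, ys, ts, shape, v0=(0.0, 0.0)):
    # Derivative-free optimisation for simplicity; the paper instead derives
    # the variance gradient analytically and takes gradient ascent steps.
    res = minimize(neg_iwe_variance, np.asarray(v0), method="Nelder-Mead",
                   args=(xs, ys, ts, shape))
    return res.x                     # (v_x, v_y) maximising the IWE variance
```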

SLIDE 21

iwe - translational motion model example

Event cloud transform equations with motion parameters v_x, v_y:

x'_k = x_k + t_k v_x
y'_k = y_k + t_k v_y

This transforms all events to their spatial location at t'_k = 0.
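
The same warp as a tiny helper (warp_events is my name, keeping the slide's sign convention):

```python
import numpy as np

def warp_events(xs, ys, ts, vx, vy):
    # x'_k = x_k + t_k*v_x, y'_k = y_k + t_k*v_y: every event is moved to
    # the position its edge would occupy at the reference time t'_k = 0.
    return xs + ts * vx, ys + ts * vy
```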

SLIDE 22

simultaneous optical flow and segmentation (sofas)

using a Dynamic Vision Sensor, by Timo Stoffregen and Lindsay Kleeman, ACRA 2017.

  • Not easy to read. Rough method description: greedy sequential model fitting.
  • The number of local maxima of the contrast objective ideally matches the number of structures with distinct optical flow velocities.

SLIDE 23

the proposed method

SLIDE 24

solution summary

  • One IWE per motion cluster. Each with a different motion model.
  • Table of event-cluster associations.
  • Sharpness of the IWEs guides event segmentation.
  • Joint identification of motion models and associations.

SLIDE 25

event clusters

  • Event-cluster association p_{kj} = P(e_k ∈ l_j) of event k being in cluster j.
  • P ≡ (p_{kj}) is an N_e × N_l matrix with all event-cluster associations. Non-negative, rows add up to one.
  • Association-weighted IWE for cluster j: I_j(x) = ∑_{k=1}^{N_e} p_{kj} δ(x − x'_{kj}), where x'_{kj} is the warped event location. Note: ignoring polarity.
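
A sketch of the per-cluster weighted IWEs, reusing the warp_events/build_iwe helpers from earlier slides; P as an N_e × N_l numpy array and one translational model per cluster are assumptions of this sketch:

```python
import numpy as np

def cluster_iwes(xs, ys, ts, P, thetas, shape):
    # One IWE per cluster: warp with that cluster's motion model and let
    # each event contribute its association probability p_kj (no polarity).
    iwes = []
    for j, (vx, vy) in enumerate(thetas):
        xw, yw = warp_events(xs, ys, ts, vx, vy)
        iwes.append(build_iwe(xw, yw, P[:, j], shape))
    return iwes
```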

SLIDE 26

single objective to optimise

Event alignment within cluster j is measured by image contrast, such as the variance

Var(I_j) = (1/|Ω|) ∫_Ω (I_j(x) − µ_{I_j})² dx,

where µ_{I_j} is the mean of the IWE for cluster j over the image plane Ω. Find the motion parameters θ and the event-cluster associations P such that the total contrast of all cluster IWEs is maximised:

(θ*, P*) = argmax_{(θ,P)} ∑_{j=1}^{N_l} Var(I_j)

SLIDE 27

the solution - alternating optimisation

1. Update the motion parameters of each event cluster (associations fixed) with a single gradient ascent step:

θ ← θ + µ ∇_θ ( ∑_{j=1}^{N_l} Var(I_j) ), where µ ≥ 0 is the step size.

2. Recompute the event-cluster associations (motion parameters fixed):

p_{kj} = c_j(x'_k(θ_j)) / ∑_{i=1}^{N_l} c_i(x'_k(θ_i)),

where c_j(x) ≠ 0 is the local sharpness of cluster j at pixel x, c_j(x) := I_j(x).
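
A sketch of step 2, under the same assumptions as the earlier helpers; it samples each cluster's IWE at the warped event locations (nearest pixel, for simplicity) and row-normalises:

```python
import numpy as np

def update_associations(xs, ys, ts, thetas, iwes, eps=1e-9):
    # C[k, j] = c_j(x'_k(theta_j)): cluster j's local sharpness at event k.
    H, W = iwes[0].shape
    C = np.empty((len(xs), len(thetas)))
    for j, (vx, vy) in enumerate(thetas):
        xw, yw = warp_events(xs, ys, ts, vx, vy)
        xi = np.clip(np.round(xw).astype(int), 0, W - 1)
        yi = np.clip(np.round(yw).astype(int), 0, H - 1)
        C[:, j] = iwes[j][yi, xi]
    return C / (C.sum(axis=1, keepdims=True) + eps)  # rows sum to one
```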

SLIDE 28

initialisation

  • Greedy. Not crystal clear.
  • Start with equal associations.
  • Optimise the first cluster's motion parameters.
  • Compute the gradient g_{kj} of the local contrast of each event w.r.t. the motion parameters.
  • g_{kj} negative -> event k is likely in cluster j; p_{kj} is set high, low for the other clusters.
  • Such an event becomes blurred when moving away from the optimised parameters.
  • Repeat for the remaining clusters.

SLIDE 29

experimental findings

SLIDE 30

occlusion

Mitrokhin’s 2018 Extreme Event Dataset (EED), ball behind a net.

SLIDE 31

low light, strobe light

EED, lighting variation.

SLIDE 32

accuracy - bounding boxes

Evaluated on Mitrokhin’s dataset (EED).

SLIDE 33

accuracy - per event

Using a photorealistic simulator. Textured pebbles, different relative velocities. Roughly 4 pixels of relative displacement to achieve 90% accuracy (true for any velocity).

SLIDE 34

throughput

Complexity is linear in the number of clusters N_l, events N_e, IWE pixels N_p, and iterations N_it: O((N_e + N_p) N_l N_it). Optical flow warps, 2.4 GHz CPU, GeForce 1080 GPU:

  • Fast moving drone sequence: ca. 370 kevents/s.
  • Ball behind net: ca. 1000 kevents/s.

SLIDE 35

different motion models

Fan blades spinning at 1800 rpm and a falling coin.

SLIDE 36

street, facing the sun

SLIDE 37

non-rigid objects

SLIDE 38

number of clusters

If N_l is set too large, the clusters not needed end up empty. (Figure: segmentations with 5 × OF, 10 × OF, and 5 × OF + 5 × rotation cluster models; OF = optical flow.)

SLIDE 39

weaknesses of the method or the paper

  • The number of warps differs in each iteration. Why?
  • Clear failure cases are not analysed.
  • Complexity scales linearly with the number of pixels -> losing much of the sparsity benefit of the event representation.
  • 8-DOF homographic models not among the experiments.
  • Maybe: the contrast constancy assumption (event trajectory - same edge, same polarity).

SLIDE 40

discussion of the approach

  • Per-event, semi-dense, direct segmentation.
  • Diverse motion models: translational, rotational, 4-DOF affine or 8-DOF homographic*.
  • Recovers object structure.
  • Implicitly handles occlusions.
  • If N_l is set too large, the unnecessary clusters end up empty.
  • Main message: the contrast maximisation approach is usable even for event-based motion segmentation.

SLIDE 41

questions

SLIDE 42

truly real-time performance? gesture recognition?

  • DvsGestures dataset (A. Amir et al., A Low Power, Fully Event-Based Gesture Recognition System, CVPR 2017) - event rate below 230 kEvents/s at 128×128 pixels.
  • The same scene at the DAVIS240C resolution of 240×180 (about 2.6× the pixels) would generate ca. 600 kEvents/s -> real-time motion segmentation only on a GPU.
  • A 480×360 px event camera easily generates over 10 Meps -> not even a GPU keeps up.

SLIDE 43
objects changing velocity?

  • Within a single event cloud: short duration, constant velocity assumption - ok.
  • Velocity changes within multiple subsequent event clouds: different clustering outcome each time.
  • Velocities of two objects become equal: both end up in the same cluster.
