CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D AND 3D COMPUTER VISION Csaba Beleznai Michael Rauter, Christian Zinner, Andreas Zweng, Andreas Zoufal, Julia Simon, Daniel Steininger, Markus Hofsttter und Andreas Kriechbaum Senior


SLIDE 1

CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D AND 3D COMPUTER VISION

Michael Rauter, Christian Zinner, Andreas Zweng, Andreas Zoufal, Julia Simon, Daniel Steininger, Markus Hofstätter and Andreas Kriechbaum

Senior Scientist Center for Vision, Automation & Control Autonomous Systems AIT Austrian Institute of Technology GmbH Vienna, Austria

Csaba Beleznai

SLIDE 2

GRAND CHALLENGES

RECOGNITION, SEGMENTATION, RECONSTRUCTION, TIME

▪ Research is evolution → so is your learning process
▪ Balance: becoming a domain expert vs. being a "globalist"
▪ Researchers tend to favour certain paradigms - learn to outline trends, look upstream
▪ Revisit old problems to see them under new light
▪ Specialize the general & generalize the specific
▪ Factorize your know-how (code, topics, …) into components → sustainable, scalable

MOTIVATION

SLIDE 3

VISUAL OBJECT RECOGNITION TRENDS

[Figure: three trend plots over time, 2012-2019]
▪ Accuracy: rising toward human-level performance
▪ Computational costs (for real-time): a computational barrier, overcome by moving from CPU to dedicated compute hardware
▪ Amount of image data (for training): rising toward an image data barrier

SLIDE 4

AIT AUSTRIAN INSTITUTE OF TECHNOLOGY

Centers: Energy, Health & Bioresources, Digital Safety & Security, Vision, Automation & Control, Mobility Systems, Low-Emission Transport, Technology Experience, Innovation Systems & Policy

Subsidiaries: Nuclear Engineering Seibersdorf GmbH, Seibersdorf Labor GmbH

Ownership: Federal Ministry for Transport, Innovation and Technology 50.46%, Federation of Austrian Industries 49.54%

1300+ employees, Budget: 140 Mio €, Business model: 40:30:30

SLIDE 5

VISION, AUTOMATION & CONTROL

From Sensor To Decision

▪ High-Performance Vision: worldwide fastest vision sensor technology
▪ 3D Vision and Modeling: robust and flexible 3D vision technology
▪ Complex Dynamical Systems: advanced handling and smart production

SLIDE 6

AIT AUTONOMOUS & ASSISTIVE SYSTEMS

▪ Driver Assistance System for Trams
▪ Assistance Systems for Construction Machines
▪ Driverless Missions in Crisis & Disaster Management
▪ Autonomous Local Railway
▪ Autonomous Bus

SLIDE 7

ENABLING METHODOLOGIES FOR ASSISTED OR SELF-DRIVING

Mobile platforms: sensory signals + local context (situation) → decisions

▪ Recognition: objects (type, location, pose), object tracking, environment
▪ Localization: sensor/data fusion, positioning, mapping, ego-motion computation
▪ Motion analysis: state prediction, probability-based behavior elements, dynamic model computation
▪ Vehicle control: vehicle model, vehicle control, safety compliance

RELATED KNOW-HOW: deep learning based detection & segmentation; localization, map building; sparse motion estimation, tracking; vision algorithms testing

SLIDE 8

INTELLIGENT PERCEPTION FOR MOBILE MACHINES

SLIDE 9

AUTONOMOUS OFFROAD VEHICLE

SLIDE 10

14.07.2019

A frequently asked question

Introduction

SLIDE 11

Example: Crop detection

▪ Radial symmetry ▪ Near regular structure

Example for robust vision

SLIDE 12

BRANCH & BOUND RESEARCH METHODOLOGY

IDEA → RESEARCH (Alg. A / Alg. B / Alg. C, in MATLAB) → DEVELOPMENT (C++) → PRODUCT / APPLICATION

▪ Challenges when developing vision systems:
  ▪ Complexity: algorithmic, systemic, data
  ▪ Non-linear search for a solution

SLIDE 13

Real-time optical flow based particle advection for object detection and tracking 2D

SLIDE 14

MOTIVATION – I. OBJECT DETECTION PIPELINES

Spatial distribution of posterior probability: score map (DPM, R-CNN, …), vote map, occupancy map, back-projected similarity map.

Delineated objects: bounding boxes, instance segmentation, more complex parametric representations.

R-CNN: Region-based Convolutional Neural Networks; DPM: Deformable Part Models

SLIDE 15

RELATED STATE-OF-THE-ART

▪ Clustering detections
▪ Detection by voting/segmentation/learning

▪ Non-maximum suppression: Neubeck & Van Gool 2006, Rothe et al. 2014
▪ Mean Shift, CAMShift (center-surround filter, MeanShift and CAMShift iterations): Comaniciu & Meer 2002, Bradski 1998
▪ Implicit Shape Model (weakly constrained structural prior): Leibe et al. 2005
▪ Structured random forests (implicit or explicit structural prior): Dollar & Zitnick 2013, Kontschieder et al. 2011
▪ Markov Point Processes for object configurations: Verdie 2014
▪ CNNs for non-maximum suppression: Hosang et al. 2016, Wan et al. 2015
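The "clustering detections" baseline above is classically implemented as greedy non-maximum suppression. A minimal generic sketch (box format, IoU threshold and function names are illustrative, not any cited paper's exact variant):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union of two boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring box, suppress all boxes overlapping it
    # by more than iou_thresh, then repeat with the remainder.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

This is exactly the hard, threshold-based clustering that the voting- and learning-based alternatives listed above try to improve upon.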

SLIDE 16

Optical flow driven advection

[Figure: frames t_i and t_i+1 with the dense optical flow field overlaid]

Advection: transport mechanism induced by a force field. A particle with velocity components (V_x,i, V_y,i) follows a trajectory induced by the optical flow field.
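The advection step itself is simple: each particle samples the dense flow field at its current sub-pixel position and moves with the interpolated vector. A minimal pure-Python sketch, assuming the flow field is a row-major grid of (vx, vy) tuples (function and variable names are illustrative):

```python
def advect(particles, flow, dt=1.0):
    # flow[y][x] = (vx, vy): dense optical flow field on a pixel grid.
    # Each particle is moved by the bilinearly interpolated flow vector,
    # approximating the transport (advection) induced by the field.
    h, w = len(flow), len(flow[0])
    out = []
    for x, y in particles:
        x0 = min(max(int(x), 0), w - 2)
        y0 = min(max(int(y), 0), h - 2)
        fx, fy = x - x0, y - y0
        vx = vy = 0.0
        for dy in (0, 1):
            for dx in (0, 1):
                wgt = (fx if dx else 1 - fx) * (fy if dy else 1 - fy)
                v = flow[y0 + dy][x0 + dx]
                vx += wgt * v[0]
                vy += wgt * v[1]
        out.append((x + dt * vx, y + dt * vy))
    return out
```

Iterating this step over successive flow fields yields the particle trajectories shown on the slide.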

SLIDE 17

Particle advection with FW-BW consistency

▪ A simple but powerful test: advect each particle forward in time, then backward from the endpoint.

Consistency check: the trajectory is successful if the point reached by forward-then-backward tracking lies close to the starting point, i.e. the forward/backward offset stays below a threshold derived from the mean offset x̄; otherwise it is a failure.
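The forward/backward test can be sketched as follows; `step_fw` and `step_bw` stand in for one advection step through the forward and backward flow fields (hypothetical names, and the threshold handling is simplified to a single epsilon):

```python
def fw_bw_check(step_fw, step_bw, start, n_steps, eps):
    # Track a particle forward for n_steps, then backward from the
    # endpoint; accept the trajectory only if it returns close to the
    # starting point (endpoint error below eps).
    p = start
    for _ in range(n_steps):
        p = step_fw(p)
    q = p
    for _ in range(n_steps):
        q = step_bw(q)
    err = ((q[0] - start[0]) ** 2 + (q[1] - start[1]) ** 2) ** 0.5
    return err < eps
```

Particles failing the test are discarded and re-seeded, which keeps the tracked population consistent with the underlying motion.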

SLIDE 18

Pedestrian Flow Analysis

Public dataset: Grand Central Station, NYC: 720x480 pixels, 2000 particles, runs at 35 fps

SLIDE 19

SHAPE-GUIDED TRACKLET GROUPING FOR COMPACT OBJECTS

Optical flow driven particle tracklets. The i-th tracklet: U_i = ({(y_u, z_u)}_{u=1..O}, w, x), where w is the velocity vector and x a scalar weight.

Clustering is performed directly in the discrete tracklet domain, governed by a single parameter W (initial scale):
STEP 1: sampling
STEP 2: weight generation from orientation similarity (w.r.t. center tracklet)
STEP 3: local shape, scale and center estimation → estimated cluster parameters + mode location
STEP 4: find nearest tracklet to the mode estimate; repeat from STEP 1 until convergence
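The four-step loop can be sketched as a weighted mean-shift iteration over tracklets. Here a tracklet is reduced to (center_x, center_y, vx, vy) and the orientation weights to clamped cosine similarity, which is a simplification of the scheme above (all names illustrative):

```python
import math

def group_step(tracklets, center_idx, W):
    cx, cy, cvx, cvy = tracklets[center_idx]
    mx = my = wsum = 0.0
    for x, y, vx, vy in tracklets:
        if (x - cx) ** 2 + (y - cy) ** 2 > W * W:
            continue                                # STEP 1: sample within window W
        dot = vx * cvx + vy * cvy
        na = math.hypot(vx, vy) * math.hypot(cvx, cvy)
        w = max(0.0, dot / na) if na else 0.0       # STEP 2: orientation weight
        mx += w * x; my += w * y; wsum += w
    if wsum == 0.0:
        return center_idx
    mx, my = mx / wsum, my / wsum                   # STEP 3: mode (weighted center)
    # STEP 4: nearest tracklet to the mode estimate becomes the new center
    return min(range(len(tracklets)),
               key=lambda i: (tracklets[i][0] - mx) ** 2 + (tracklets[i][1] - my) ** 2)

def group(tracklets, start_idx, W, max_iter=20):
    idx = start_idx
    for _ in range(max_iter):
        nxt = group_step(tracklets, idx, W)
        if nxt == idx:
            return idx                              # repeat until convergence
        idx = nxt
    return idx
```

Because the iteration always snaps back to an actual tracklet, the mode search never leaves the discrete tracklet domain, as the slide emphasizes.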

SLIDE 20

STEREO DEPTH INFORMATION CHARACTERISTICS AND USAGE 3D

SLIDE 21

PASSIVE STEREO BASED DEPTH MEASUREMENT

Advantages:
▪ Depth ordering of people
▪ Robustness against illumination, shadows
▪ Enables scene analysis

System:
▪ 3D stereo-camera system developed by AIT
▪ Area-based, local-optimizing, correlation-based stereo matching algorithm
▪ Specialized variant of the Census Transform
▪ Resolution: typically ~1 Mpixel
▪ Run-time: ~14 fps (Core i7, multithreaded, SSE-optimized)
▪ Excellent "depth-quality vs. computational-costs" ratio
▪ USB 2 interface
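For intuition, a generic 3x3 census transform with a Hamming-distance matching cost is sketched below; the AIT matcher uses a specialized Census variant whose details are not reproduced here:

```python
def census3x3(img, x, y):
    # 8-bit census signature at (x, y): one bit per neighbour,
    # set if the neighbour is darker than the center pixel.
    c = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < c else 0)
    return bits

def census_cost(a, b):
    # Matching cost = Hamming distance between two census signatures.
    return bin(a ^ b).count("1")
```

Because the signature only encodes intensity ordering, the cost is invariant to monotonic illumination changes, which underlies the robustness claims above.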

SLIDE 22

STEREO CAMERA CHARACTERISTICS

Trinocular setup:
▪ 3 baselines possible (small, medium, large)
▪ 3 stereo computations with results fused into one disparity image
▪ small/medium baseline → near range; large baseline → far range
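The baseline trade-off follows from the standard triangulation relation Z = f·B/d: for a fixed disparity resolution, a larger baseline B resolves far range better, while a small baseline keeps near-range disparities inside the matcher's search range. A sketch (parameter values illustrative):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    # Z = f * B / d for a rectified stereo pair; disparity in pixels,
    # focal length in pixels, baseline in meters, depth in meters.
    if disparity_px <= 0:
        return float("inf")  # zero disparity = infinitely far / invalid
    return focal_px * baseline_m / disparity_px
```

At the same 100 px disparity, tripling the baseline triples the measured depth, which is why the trinocular head fuses all three results into one disparity image.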

SLIDE 23

Data characteristics

A planar surface in 3D space maps to a planar surface in the disparity image d(x, y), with (x, y) image coordinates and d disparity.

[Figure: intensity image and disparity image, with a y-d profile of the scene]

SLIDE 24

2.5D vs. 3D algorithmic approaches

[Figure: stereo setup; height and ground plane in world coordinates; correct vs. noisy measurements; computed top view of the 3D point cloud; 2.5D approach vs. 3D approach]
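A minimal 2.5D sketch of the top-view idea: 3D points above a height threshold are binned into a ground-plane occupancy grid. The coordinate convention (ground plane at y = 0, depth along z), grid geometry and names are assumptions for illustration:

```python
def occupancy_map(points, cell=0.5, x_range=(0.0, 2.0), z_range=(0.0, 2.0), min_h=0.2):
    # points: iterable of (x, y, z) with y = height above the ground plane.
    # Returns a top-view grid counting points per ground cell.
    nx = round((x_range[1] - x_range[0]) / cell)
    nz = round((z_range[1] - z_range[0]) / cell)
    grid = [[0] * nx for _ in range(nz)]
    for x, y, z in points:
        if y < min_h:              # drop ground-level points (height filter)
            continue
        ix = int((x - x_range[0]) / cell)
        iz = int((z - z_range[0]) / cell)
        if 0 <= ix < nx and 0 <= iz < nz:
            grid[iz][ix] += 1      # accumulate evidence in the ground cell
    return grid
```

Clustering this map (rather than the raw cloud) is the 2.5D shortcut: noisy ground points are filtered out and objects become compact blobs in the top view.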

SLIDE 25

LEFT ITEM DETECTION

Additional knowledge (compared to existing video analytics solutions):

  • Stationary object (geometry introduced to a scene)
  • Object geometric properties (volume, size)
  • Spatial location (on the ground)

SLIDE 26

METHODOLOGY

Processing intensity and depth data:
▪ Input images (INTENSITY + DEPTH)
▪ Ground plane estimation
▪ Change detection: background model (intensity) + stereo disparity, combined into object proposals
▪ Ortho-transform and ortho-map generation
▪ Object detection and validation in the ortho-map → final candidates

SLIDE 27

Left Item Detection – Demos

SLIDE 28

Clustering in discrete two-dimensional distributions

SLIDE 29


Object detection as clustering

SLIDE 30

A Frequently Occurring Task

Analysis of discrete two-dimensional distributions


LEARNED CODEBOOK

SLIDE 31

EXAMPLES

▪ Description of the binary-shape-driven 2D clustering
  • Shape learning
  • Shape clustering, delineation
▪ Results
  • Occupancy map clustering
  • Text line delineation
  • Object delineation by shape-guided tracklet grouping

SLIDE 32

TASK DEFINITION

Intermediate probabilistic representations (2D distributions) + prior, structure-specific knowledge → local grouping to generate consistent object hypotheses.

Challenges:
▪ arbitrarily shaped distributions
▪ multiple nearby modes
▪ noise, clutter

Definitions: a mode is a location of maximum density, computed using a kernel K over the density estimate of variable x:

g(y) = Σ_b K(b − y) x(b)
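The density g(y) = Σ_b K(b − y) x(b) can be evaluated directly on a discrete grid; a 1-D sketch with a small triangular kernel (kernel choice and names illustrative):

```python
def kde(x, kernel):
    # Discrete kernel density estimate: g(y) = sum_b K(b - y) * x(b),
    # where x is a weighted sample histogram and kernel[k + r] = K(k)
    # for offsets k in [-r, r].
    r = len(kernel) // 2
    out = [0.0] * len(x)
    for y in range(len(x)):
        for k in range(-r, r + 1):
            b = y + k
            if 0 <= b < len(x):
                out[y] += kernel[k + r] * x[b]
    return out
```

A single impulse in x produces a copy of the kernel centered on it, so the mode of g coincides with the densest region of the samples; the 2-D case is the same sum over both grid axes.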

SLIDE 33

Shape learning – Case: Compact clusters

  • 1. Binary mask M from manual annotation or from synthetic data
  • 2. Sampling using an analysis window discretized into an n_i × n_i grid (mode-centered samples and off-the-mode samples; the grid sets the spatial resolution of the local structure)
  • 3. Building a codebook of binary shapes with a coarse-to-fine spatial resolution

SLIDE 34

Example Codebook – Case: Compact clusters

SLIDE 35

Shape learning – Case: Line structures

▪ Binary mask from manually annotated text lines
▪ Spatial resolution of local structure: low, mid, high
▪ Codebook of binary line shapes

SLIDE 36

Shape delineation – I.

Step 1: Fast mode seeking. A density measure is computed for each resolution level of the binary structure using three integral images, yielding the mode location.
Step 2: Local density analysis. All binary shapes are enumerated at each resolution level and the best matching codebook entry is found (COMPACT CLUSTERS and LINE STRUCTURES).
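The per-level density measures are fast because of the integral-image trick: after one cumulative pass over the map, any rectangular sum costs four lookups. A generic sketch (not the exact three-image scheme of the method):

```python
def integral_image(img):
    # (h+1) x (w+1) cumulative table with a zero border, so that
    # box sums need no boundary special-casing.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def box_sum(ii, x0, y0, x1, y1):
    # Sum over img[y0:y1][x0:x1] in O(1): four table lookups.
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

This is what makes enumerating candidate windows at every resolution level affordable at the frame rates reported later.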

SLIDE 37

Shape delineation – II.

Recursive search for end points, starting from mode locations. Line-centered and off-the-line structures yield end point candidates and a vote map. Relative line end locations define:

  • Search direction
  • Line end positions
SLIDE 38

Experimental results - Case: Compact clusters

Human detection by occupancy map clustering (range up to 13 m): passive stereo depth sensing → depth data projected orthogonal to the ground plane. Occupancy map (1246×728 pix.) clustering runs at 56 fps; the overall system (incl. stereo computation) at 6 fps.

SLIDE 39

Experimental results - Case: Compact clusters

SLIDE 40

Experimental results - Case: Line structures (text line segmentation)

Input image → probability distribution for text. Simple binarization is very sensitive to the employed threshold; the proposed scheme has no threshold, only local structural priors.

SLIDE 41

Experimental results - Case: Text line segmentation

SLIDE 42

Queue length detection using depth and intensity information 2D + 3D

SLIDE 43

Queue Length + Waiting Time estimation

What is waiting time in a queue? A time measurement relating to the last passenger in the queue reaching the checkpoint.

Why interesting?
▪ Example: announcement of waiting times (app) → customer satisfaction
▪ Example: infrastructure operator → load balancing

SLIDE 44

Queue analysis

DEFINITION: collective goal-oriented motion pattern of multiple humans exhibiting spatial and temporal coherence.

Waiting time = Length / Velocity
  • 1. What is the shape and extent of the queue?
  • 2. What is the velocity of the propagation?

▪ Challenging problem (configurations range from simple to complex):
  ▪ Shape: no predefined shape (context/situation-dependent and time-varying)
  ▪ Motion: not a pure translational pattern; propagating stop-and-go behaviour with a noisy "background"
  ▪ Signal-to-noise ratio depends on the observation distance

SLIDE 45

Visual queue analysis - Overview

▪ How can we detect (weak) correlation in space and time?
▪ Much data is necessary → simulating crowding phenomena in Matlab
▪ Social force model (Helbing 1998): goal-driven kinematics (force field), repulsion by walls, repulsion for "preserving privacy"

Source: Parameswaran et al., Design and Validation of a System for People Queue Statistics Estimation, Video Analytics for Business Intelligence, 2012
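A one-step sketch of a minimal social-force simulation in the spirit of Helbing 1998, keeping only the goal-driven term and pairwise repulsion (wall and privacy terms omitted; all parameter values illustrative):

```python
import math

def sfm_step(agents, goals, dt=0.1, tau=0.5, v0=1.0, A=2.0, B=0.3):
    # agents: list of (px, py, vx, vy); goals: list of (gx, gy).
    # Goal term relaxes each velocity toward speed v0 toward the goal
    # (relaxation time tau); repulsion decays exponentially with distance.
    new = []
    for i, (px, py, vx, vy) in enumerate(agents):
        gx, gy = goals[i]
        d = math.hypot(gx - px, gy - py) or 1e-9
        fx = (v0 * (gx - px) / d - vx) / tau
        fy = (v0 * (gy - py) / d - vy) / tau
        for j, (qx, qy, _, _) in enumerate(agents):
            if j == i:
                continue
            r = math.hypot(px - qx, py - qy) or 1e-9
            f = A * math.exp(-r / B)
            fx += f * (px - qx) / r
            fy += f * (py - qy) / r
        vx += dt * fx
        vy += dt * fy
        new.append((px + dt * vx, py + dt * vy, vx, vy))
    return new
```

Iterating this step with queue-shaped goal assignments produces the stop-and-go, weakly correlated motion patterns that the correlation analysis is trained on.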

SLIDE 46

Queue analysis

Simulation tool → creating an unlimited number of possible queueing zones. Two simulated examples (time-accelerated view).

SLIDE 47

Queue analysis (length, dynamics)

Staged scenarios, 1280x1024 pixels, computational speed: 6 fps

Straight line Meander style

SLIDE 48

▪ Pairwise spatio-temporal correlation analysis
▪ Correlation → data weights
▪ Mode seeking and tracking
▪ Queue delineation posed as an open Traveling Salesman Problem (oTSP) with a fixed start
▪ Queue forward velocity estimation using a deformable, elastically connected chain
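The open-TSP delineation can be approximated greedily: starting from the fixed start (the checkpoint side), repeatedly append the nearest unvisited cluster mode. This is a heuristic sketch, not the solver used in the system:

```python
def open_tsp_greedy(points, start_idx=0):
    # Order points into an open chain with a fixed start, using
    # nearest-neighbour selection (no return to the start, hence "open").
    unvisited = set(range(len(points))) - {start_idx}
    path = [start_idx]
    while unvisited:
        last = points[path[-1]]
        nxt = min(unvisited,
                  key=lambda i: (points[i][0] - last[0]) ** 2 +
                                (points[i][1] - last[1]) ** 2)
        path.append(nxt)
        unvisited.remove(nxt)
    return path
```

The resulting chain traces the queue's shape, and its total length is the "Length" term in the waiting-time estimate.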

SLIDE 49

Estimated configuration (top-view) Detection results

Adaptive estimation of the spatial extent of the queueing zone

Left part of the image is intentionally blurred for protecting the privacy of by-standers, who were not part of the experimental setup.

stereo sensor

SLIDE 50

Scene-aware heatmap

SLIDE 51

End-to-end video text recognition 2D

SLIDE 52

Overview

▪ The End-to-End Video Text Recognition Process:

INPUT → Detection → Localization → Propagation → Segmentation → Recognition/Propagation → OUTPUT

Stage outputs: text presence (y/n) → location in single frames (x, y, w, h) → location over a frame span (x, y, w, h) → binary image regions → text (e.g. in ASCII), including characterization of dynamic elements such as running text.

Evaluation: high accuracy at each stage is necessary, with very high recall throughout the chain and increasing precision toward the end of the chain.

SLIDE 53

Algorithmic chain - Motivation

What is text (when appearing in images)? An oriented sequence of characters in close proximity, obeying a certain regularity (spatial offset, character type, color). The main strategies for text detection build on this regularity.

[Figure: sample text region + complex background]

SLIDE 54

Improved text detection – synthetic text generation (classification using Aggregated Channel Features)

SLIDE 55

Convolutional Neural Network based OCR - Training

Generated single characters (0-9, A-Z, a-z) include spatial jitter and font variations, e.g. 6000 samples each of "0", "A", …, "Z".

▪ Role of jitter: characters can be recognized despite an offset at detection time
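The jittered training-set generation can be sketched as follows, with characters as toy binary bitmaps; real training would render fonts and add further variations (function names and sizes are illustrative):

```python
import random

def jitter(bitmap, max_offset=2, n_samples=10, seed=0):
    # Replicate a character bitmap n_samples times with small random
    # spatial offsets, so a classifier trained on the result tolerates
    # detection-time misalignment. Pixels shifted out of frame are dropped.
    rng = random.Random(seed)
    h, w = len(bitmap), len(bitmap[0])
    out = []
    for _ in range(n_samples):
        dx = rng.randint(-max_offset, max_offset)
        dy = rng.randint(-max_offset, max_offset)
        shifted = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    shifted[ny][nx] = bitmap[y][x]
        out.append(shifted)
    return out
```

Applying this per character class yields the thousands of offset variants per symbol mentioned above.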

SLIDE 56

Convolutional Neural Network based OCR - Results

The analysis window is scanned along the text line, and the likelihood ratio (score1/score2) is plotted in the row (below the text line) belonging to the maximum classification score, yielding an optimum partitioning.

SLIDE 57

SLIDE 58

Our development concept

Implementation details:

▪ MATLAB / PYTHON (method, prototype, demonstrator):
  ▪ Broad spectrum of algorithmic libraries
  ▪ Well-suited for image analysis
  ▪ Visualisation, debugging
  ▪ Rapid development
▪ C/C++ (real-time prototype):
  ▪ Real-time capability
  ▪ Computationally intensive methods ported as mex / shared libraries
  ▪ Verification against the MATLAB engine
SLIDE 59

LEARNED REPRESENTATIONS OF HIGH DISCRIMINATIVE POWER

SLIDE 60

MODEL-BASED VISION - USE CASE

Pose regressor trained from synthetic data. Relevance:

  • 1. Deep learning: coarse pose estimation (coarse initialization/re-initialization)
  • 2. Edge-based model tracking (continuous fine 6DoF pose estimation)

SLIDE 61

THANK YOU!

CSABA BELEZNAI Senior Scientist Center for Vision, Automation & Control Autonomous Systems AIT Austrian Institute of Technology GmbH Giefinggasse 4 | 1210 Vienna | Austria T +43(0) 664 825 1257 | F +43(0) 50550-4170 csaba.beleznai@ait.ac.at | http://www.ait.ac.at