Confidential

Mobileye Sensing Status and Road Map
Dr. Gaby Hayon, EVP R&D
November 2019
The Challenge of Sensing for the automotive market
ME sensing has three demanding customers:
▪ ADAS products working everywhere, under all conditions, on millions of vehicles
▪ A smart agent for harvesting, localization and dynamic information for the REM-based map
▪ The sensing state for ME driving policy, under strict requirements of independence and redundancy
Surround computer vision | Radar/Lidar sub-system

Surround computer vision: comprehensive environment model
Full and unified surround coverage of all decision-relevant environment elements. These are generally grouped into 4 categories:
Road Geometry (RG)
All driving paths, explicitly/partially/implicitly indicated, their surface profile and surface type.
Road Boundaries (RB)
Any delimiter of the drivable area, its 3D structure and semantics; both laterally delimiting elements (FS) and longitudinal ones (general objects/debris).
Road Users (RU)
360-degree detection and inter-camera tracking of any movable road user, and the actionable semantic cues these users convey (light indicators, gestures).
Road Semantics (RS)
Road-side directives (TFL/TSR), on-road directives (text, arrows, stop-line, crosswalk) and their DP association.
[Examples: object-detection DNNs, texture engine, structure engine]
Multiple independent visual-processing engines overlap in their coverage of the four categories (RG, RB, RU, RS) to satisfy the extremely low nominal failure frequencies required of the CV sub-system:
▪ Lanes detection DNN: RG
▪ Single-view Parallax-net elevation map: RB, RU, RS
▪ Semantic Segmentation engine: RB, RU
▪ Multi-view Depth network: RB, RU
▪ Generalized-HPP (VF): RG
▪ Wheels DNN: RU
▪ Road Semantic Networks: RU
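The redundancy rationale behind the overlapping engines can be made concrete with a toy calculation. Assuming engine failures are statistically independent (an idealization, and the rates below are illustrative, not Mobileye's), the chance that every overlapping engine misses at once is the product of the individual failure rates:

```python
# Toy illustration: if several independent engines cover the same category,
# the probability that ALL of them fail simultaneously is the product of
# their individual failure rates.

def combined_failure_rate(per_engine_rates):
    """Failure rate of a category covered by independent overlapping engines."""
    combined = 1.0
    for rate in per_engine_rates:
        combined *= rate
    return combined

# Hypothetical per-engine failure frequencies for one category (e.g. RU):
rates = [1e-3, 1e-3, 1e-4]  # three overlapping engines
print(combined_failure_rate(rates))  # ~1e-10, far below any single engine
```

Independence is the key (and strongest) assumption: engines built on different signals (texture, structure, semantics) are less likely to share failure modes, which is why the overlap is deliberate.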
▪ Longitudinal and lateral driving plans / decisions
▪ VRU-related drive planning
▪ Environmental limitations
▪ Safe-stop possibility
▪ Emergency/Enforcement response
Supporting different driving decisions and planning requires the extraction of an additional, essential set of contextual cues:
▪ Pedestrian trajectory, intentions (head/body pose), relevance, vulnerability and host-path access
▪ Visibility range, blockage, occlusions/view range, road friction
▪ Emergency vehicle/personnel detection, gesture recognition
▪ Is the road shoulder drivable? Is it safe to stop?
Visual perception
360-degree detection and inter-camera tracking of any movable road user, and the actionable semantic cues these users convey (light indicators, gestures).
On top of the standalone object-detection networks running on all cameras, two dedicated 360° stitching engines have been developed to ensure the completeness and coherency of the unified object map:
“Full Image Detection”: raw signal and output; short-range precise detection.
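A minimal sketch of what such a stitching step must do (hypothetical logic, not Mobileye's implementation): map each camera's detections into one ego-centric frame, then merge detections that refer to the same physical object seen by overlapping cameras:

```python
import math

# Hypothetical 360-degree stitching sketch: detections from cameras with
# different mounting yaws are rotated into the ego frame, and detections
# that land close together are treated as one physical object.

def to_ego_frame(det, cam_yaw_deg):
    """Rotate an (x, y) position from a camera frame into the ego frame."""
    yaw = math.radians(cam_yaw_deg)
    x, y = det
    return (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))

def stitch(detections, merge_dist=1.0):
    """Greedy merge of per-camera detections into unified objects."""
    objects = []
    for cam_yaw, det in detections:
        p = to_ego_frame(det, cam_yaw)
        for obj in objects:
            if math.dist(p, obj) < merge_dist:
                break  # same physical object, already represented
        else:
            objects.append(p)
    return objects

# The same car seen by the front (0 deg) and front-right (-45 deg) cameras:
dets = [(0.0, (10.0, 0.0)), (-45.0, (7.07, 7.07))]
print(len(stitch(dets)))  # one unified object
```

A production stitching engine would of course associate by appearance and track identity over time as well, not only by position; this sketch shows only the geometric core.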
Temporal tracker
Dimension-net output: metric physical-dimension estimation, dramatically improving measurement quality using novel methods.
Wheels: a road-user part (relatively regular in shape) that we deliberately detect to affirm vehicle detections, 3D position and tracking for high-function customers.
▪ The semantic segmentation provides evidence of all road users, redundant to the dedicated networks
▪ It also provides evidence of extremely small visible fragments of road users; these may potentially be used as scene-level contextual cues
An open car door is uniquely classified, as it is extremely common, critical and has no ground intersection.
Baby strollers and wheelchairs are detected through a dedicated engine on top of the highly mature pedestrian-detection system.
Surround-view stitched short-range free space (SR FS)
Occupancy Grid:
▪ Fusion of the free-space signal from the four parking cameras and the front camera
▪ Main usages: a very accurate signal for handling crowded scenes, and a redundancy layer for comparing the known scene (road edges and detected objects) with the occupancy grid; differences are marked and reported as unknown objects
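The comparison step can be sketched as a simple grid diff (illustrative layout and values, not Mobileye's data structures): cells that the occupancy measurement reports as occupied but that no known detection explains are flagged as unknown objects.

```python
# Sketch: compare the measured occupancy grid against the known scene model
# (road edges and detected objects); unexplained occupied cells are reported
# as "unknown objects".

def unknown_cells(occupancy, known):
    """Return grid cells occupied in the measurement but absent from the
    known scene model."""
    return [(i, j)
            for i, row in enumerate(occupancy)
            for j, occ in enumerate(row)
            if occ and not known[i][j]]

occupancy = [[0, 1, 0],
             [0, 1, 1],
             [0, 0, 0]]
known     = [[0, 1, 0],   # detections explain (0, 1)
             [0, 1, 0],   # ...and (1, 1), but nothing explains (1, 2)
             [0, 0, 0]]
print(unknown_cells(occupancy, known))  # [(1, 2)] -> report as unknown object
```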
Road-user semantics: emergency vehicles, light indicators, pedestrian understanding.
▪ Head/pose orientation; pedestrian posture/gesture
▪ Vehicle light indicators; emergency vehicle/personnel classification
Pedestrian gesture understanding: “come closer”, “stop!”, “on the phone”, “you can pass”.
Dense Structure-based Object detection
Three 100° cameras form canonical stereo-baseline camera-pair setups, providing region priors and prediction in texture-less regions.
DNN-based multi-view stereo
How do we do this?
DNN-based multi-view stereo
Leveraging Lidar Processing Module for Stereo Camera Sensing – “Pseudo-Lidar”
Dense depth image from stereo cameras → high-res Pseudo-Lidar → object detection via upright obstacle ‘stick’ extraction
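The first step of this pipeline, turning a dense depth image into a Lidar-like point cloud, is standard pinhole back-projection. A minimal sketch (the intrinsics and depth values below are made up):

```python
# "Pseudo-Lidar" conversion sketch: every pixel of the dense stereo depth
# image is back-projected through the pinhole camera model into a 3D point,
# yielding a Lidar-like point cloud that Lidar processing modules can consume.

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into (X, Y, Z) camera-frame points.

    depth: 2D list, depth[v][u] in meters, or None where stereo found no match.
    fx, fy, cx, cy: pinhole intrinsics (focal lengths, principal point).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:        # no stereo match at this pixel
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# 2x2 toy depth image, principal point (0.5, 0.5), focal length 1.0:
cloud = depth_to_points([[10.0, 10.0], [None, 20.0]], 1.0, 1.0, 0.5, 0.5)
print(cloud)  # three 3D points; the unmatched pixel is skipped
```

On real imagery this is done vectorized over the full image; the loop form is only for clarity.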
View Range: knowing that you don’t know. Features:
▪ Visibility range at all angles
▪ Knowing whether the absence of an object is because it doesn’t exist or due to an occlusion
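A per-angle visibility range can be sketched as a ray march over an occupancy grid (a simplified stand-in for the real sensing stack): the distance at which a ray first hits an occupied cell bounds what the system can claim to know in that direction.

```python
import math

# Sketch of a per-angle view-range estimate: march a ray outward from the
# ego position over an occupancy grid and record the distance at which it
# first hits an occupied cell; beyond that distance, absence of detections
# means "unknown", not "empty".

def view_range(grid, origin, angle_deg, max_range=50.0, step=0.5):
    ox, oy = origin
    a = math.radians(angle_deg)
    d = 0.0
    while d < max_range:
        x = int(ox + d * math.cos(a))
        y = int(oy + d * math.sin(a))
        if 0 <= x < len(grid) and 0 <= y < len(grid[0]) and grid[x][y]:
            return d          # occluded beyond this distance
        d += step
    return max_range          # fully visible up to sensor range

grid = [[0] * 10 for _ in range(10)]
grid[8][5] = 1                # an obstacle 8 cells ahead of the ego vehicle
print(view_range(grid, (0, 5), 0.0))   # ray stopped by the obstacle
print(view_range(grid, (0, 5), 90.0))  # clear direction: full sensor range
```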
Policy-level applications
placing "fake targets" in occluded areas that intersect with ego's planned path, assuming plausible speed and trajectory
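The fake-target reasoning can be reduced to a time-to-conflict check (a deliberately simplified sketch; the thresholds, speeds and function are illustrative, not Mobileye's policy logic): assume a hypothetical road user emerging from the occlusion at a plausible worst-case speed, and ask whether it could reach the point where the occlusion meets ego's planned path at roughly the same time as ego.

```python
# "Ghost target" sketch: for an occluded region intersecting the planned
# path, posit a road user at a plausible worst-case speed and check whether
# its arrival window at the conflict point overlaps ego's.

def ghost_is_threat(dist_ego_to_conflict, ego_speed,
                    dist_ghost_to_conflict, ghost_speed, margin_s=1.0):
    """True if the assumed ghost target could reach the conflict point
    within `margin_s` seconds of the ego vehicle (all inputs illustrative)."""
    t_ego = dist_ego_to_conflict / ego_speed
    t_ghost = dist_ghost_to_conflict / ghost_speed
    return abs(t_ego - t_ghost) < margin_s

# Occlusion edge 6 m from the crossing point, assumed pedestrian at 2 m/s;
# ego 30 m away at 10 m/s:
print(ghost_is_threat(30.0, 10.0, 6.0, 2.0))   # arrival times coincide: threat
print(ghost_is_threat(30.0, 10.0, 30.0, 2.0))  # ghost far from conflict: no threat
```

When the check fires, the planner would treat the ghost as a real target, e.g. slowing so that the conflict window opens up.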
Z-axis view range: coping with occlusions deriving from road elevation.
[Figure: visible range vs. occluded regions, with a ghost target placed in the occlusion. View-range origin legend: Main Front, Narrow Front, Front Right, Front Left, Rear Right, Rear Left, Rear]
Full Surface Segmentation (Road/nRoad)
Detection of any delimiter of the road surface, its 3D structure and semantics; both laterally delimiting elements (FS) and longitudinal ones (GO/debris).
The Semantic Segmentation engine provides rich, high-resolution, pixel-level labeling; the SSN vocabulary is especially enriched to classify road-delimiter types:
▪ Road ▪ Elevated ▪ Cars ▪ Bike, Bicycle ▪ Ped ▪ CA obj ▪ Guardrail ▪ Concrete ▪ Curbs ▪ Flat ▪ Snow ▪ Parking in ▪ Parking out
Class legend: Road Edge, Car, Bike, Ped, General object, Guardrail, Concrete, Curb, Flat, Snow.
Detection of any delimiter of the road surface, its 3D structure and semantics; both laterally delimiting elements (FS) and longitudinal ones (GO/debris).
The Parallax-Net engine provides an accurate understanding of structure by assessing residual elevation (flow) relative to the locally governing road surface (homography). It is therefore sensitive even to extremely small objects and low-elevation lateral boundaries.
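The underlying geometry can be demonstrated with a toy two-frame setup (illustrative numbers and a generic plane-homography formulation, not Mobileye's implementation): pixels on the road plane move exactly according to the road homography between frames, while a point raised above the road leaves a residual after the homography warp, and that residual is the structural signal.

```python
import numpy as np

# Toy residual-parallax demo: the road plane induces a homography between
# two frames of a forward-moving camera; points off the plane violate it.

K = np.array([[800.0, 0.0, 640.0],     # made-up pinhole intrinsics
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # no rotation between frames
t = np.array([0.0, 0.0, -1.0])         # camera moves 1 m forward
n = np.array([0.0, 1.0, 0.0])          # road normal in camera frame (y down)
d = 1.5                                # camera height above the road (m)

# Plane-induced homography: x2 ~ K (R + t n^T / d) K^{-1} x1
H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def project(P, shift=0.0):
    """Project 3D point P, optionally after the camera moved `shift` m forward."""
    Q = P - np.array([0.0, 0.0, shift])
    p = K @ Q
    return p[:2] / p[2]

def residual(P):
    """Pixel parallax left after warping the frame-1 pixel by the road homography."""
    p1 = np.append(project(P), 1.0)
    warped = H @ p1
    warped = warped[:2] / warped[2]
    return np.linalg.norm(warped - project(P, shift=1.0))

on_road = np.array([2.0, 1.5, 20.0])   # y = 1.5 -> lies on the road plane
raised  = np.array([2.0, 1.2, 20.0])   # 0.3 m above the road
print(residual(on_road))  # ~0: fully explained by the homography
print(residual(raised))   # clearly nonzero: structural deviation
```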
Debris detection identifies structural deviations from the road surface. The Structure-from-Motion approach is geometry-based and appearance-invariant, so it detects any type of hazard.
https://www.youtube.com/watch?v=s7HCI33KVHA
Advanced lane applications (VW): Volkswagen Passat Travel Assist 2.0 with Mobileye camera.
Road4 Technology provides deep lane understanding rather than “simple” lane-mark detection:
▪ Severely occluded lane marks: endures gaps of over 20 m within a marker
▪ Semi/partly/unmarked lane markers
▪ Multi-geometry lane structures: merge, split, HWE, junctions
▪ Stable DP map, also passing through junctions and construction areas
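The gap-endurance idea can be sketched with a least-squares model fit (a deliberately simplified stand-in for the deep lane model; the points and the straight-line model are illustrative): lane-mark evidence with a long missing stretch still yields a stable lane estimate if a low-order model is fit to the observed points and interpolated across the gap.

```python
# Sketch: bridge a 20 m gap in lane-mark evidence by fitting a model to the
# observed points and evaluating it inside the gap.

def fit_line(points):
    """Closed-form least-squares straight-line fit y = a*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Lane-mark points observed at 0-10 m and 30-40 m; nothing in the 20 m gap:
pts = ([(z, 0.02 * z) for z in range(0, 11)] +
       [(z, 0.02 * z) for z in range(30, 41)])
a, b = fit_line(pts)
print(round(a * 20.0 + b, 3))  # predicted lateral offset mid-gap
```

The real system uses richer geometry (clothoids, splines) and contextual cues; the point here is only that the lane model, not raw marker pixels, is what survives the gap.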
[Examples: Botts’ dots and occluded lane marks; lane detection on wet roads at night; merges, splits and passing through junctions]
Parallax-Net provides a dense elevation model of the entire driving surface, along with detailed local “longitudinal profile” characteristics such as road bumps and ditches.
The Generalized HPP technology (VF) provides:
▪ Host driving path: geometry and center
▪ Any-object (point) driving path
▪ Any-object (point) lane assignment
▪ Road elevation, accounted for by inference
It does not involve explicit detection and modeling of lane-boundary evidence, but rather leverages top-down contextual understanding.
▪ Road-side directives (TFL/TSR)
▪ On-road directives (text, arrows, stop-line, crosswalk)
▪ Lane type: HOV, bicycle lane
▪ The DP association
▪ Road friction
▪ Boundary type
▪ OCR
Free-space detection via 3D Occupancy Engine: a model-based approach.
Road-user detection & tracking: a model-based approach combined with a data-driven classification approach.
Key use case: a static object near a crosswalk. A dedicated deep neural net fed with Lidar reflections resolves the semantic ambiguity, distinguishing between:
▪ Pedestrians: give way
▪ Traffic signs: drive through
Localization in a sparse semantic map is enabled by extracting rich Lidar features.
[Figure: vehicle trajectory; semantic-map information and Lidar reflections projected onto the front camera; bird’s-eye-view display with map semantics]