

SLIDE 1

Visual-Inertial Odometry and Object Mapping with Structural Constraints

Mo Shan and Nikolay Atanasov

Department of Electrical and Computer Engineering

SLIDE 2

SLAM

  • Simultaneous Localization And Mapping (SLAM): the construction of a model of the environment (the map) and the estimation of the state of the robot moving within it (C. Cadena et al., 2016).

Figure: SLAM framework.

SLIDE 3

Factor graph

  • SLAM as a factor graph

Figure: Factor graph. Blue circles: robot poses, green circles: landmark positions, red circle: variable of intrinsic parameters (K). u: odometry constraints, v: camera observations, c: loop closures, p: prior factors.

SLIDE 4

Motivation

Object-level semantics are important for

  • improving performance of feature tracking
  • reducing drift via loop closure
  • obtaining compressed maps of objects for subsequent tasks

Figure: An object map.

SLIDE 5

Objective

Given a robot equipped with an IMU and an RGB camera, localize the robot using visual-inertial odometry (VIO) and map the objects, composed of semantic landmarks, in the scene using:

  • inertial observations: linear acceleration and angular velocity
  • geometric measurements from geometric landmarks
  • semantic measurements from keypoints on objects

SLIDE 6

State of the Art

  • Traditional VIO and SLAM approaches such as ORB-SLAM (Mur-Artal et al., 2017) and DSO (J. Engel et al., 2016) rely on geometric features, e.g., ORB, SIFT, but overlook objects
  • Learning-based approaches that use convolutional neural networks (CNNs) only regress camera pose but do not produce meaningful maps
  • Initial attempts at object-level SLAM often use iterative optimization as well as complicated object CAD models

SLIDE 7

Contribution

We exploit the object semantics to

  • obtain uncertainty estimates for the semantic feature locations
  • achieve probabilistic tracking of composite semantic features, i.e., at the object level
  • exploit object structure constraints (e.g., the wheels of a car should be neither too close to nor too far from each other) to achieve an accurate estimate

SLIDE 8

Objects

  • Objects in the environment: $\mathcal{O} \triangleq \{(o_i, c_i)\}_{i=1}^{N_o}$
  • An object of class $c_i \in \mathcal{C}_o$ is defined by $N_s(c_i)$ semantic keypoints.
  • There also exist pairwise category-specific constraints arising from the shape prior

SLIDE 9

Problem formulation

Given measurements $\{{}^{i}z_t, {}^{g}z_t, {}^{c}z_t, {}^{s}z_t, {}^{b}z_t\}_{t=1}^{T}$, determine the sensor trajectory $\mathcal{X}$ and the object states $\mathcal{O}$ that maximize the measurement likelihood:

$$ \max_{\mathcal{O},\,\mathcal{X}} \sum_{t=1}^{T} \log\big(p({}^{i}z_t \mid \mathcal{X})\, p({}^{g}z_t \mid \mathcal{X})\, p({}^{c}z_t, {}^{b}z_t, {}^{s}z_t \mid \mathcal{O}, \mathcal{X})\big) \quad (1) $$

The likelihood terms above can be defined as Gaussian density functions. Variances are determined by the measurement noise. Means are determined by the dynamic equations of motion over the SE(3) Lie group and the camera perspective model.
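To make the likelihood terms concrete, here is a minimal Python sketch (illustrative only, not the paper's implementation) of a single Gaussian log-likelihood term from (1), with the mean given by a pinhole projection of a landmark expressed in the camera frame; all numeric values are made up:

```python
import numpy as np

def perspective_projection(point_cam):
    """Pinhole model: project a 3D point in the camera frame to normalized pixels."""
    return point_cam[:2] / point_cam[2]

def gaussian_log_likelihood(z, z_pred, cov):
    """log N(z; z_pred, cov) -- one likelihood term of Eq. (1)."""
    r = z - z_pred
    k = z.shape[0]
    return -0.5 * (r @ np.linalg.solve(cov, r)
                   + np.log(np.linalg.det(cov))
                   + k * np.log(2.0 * np.pi))

# Example: one geometric measurement of a landmark at (0.5, -0.2, 4.0) in {C}
landmark_cam = np.array([0.5, -0.2, 4.0])
z_meas = np.array([0.13, -0.06])             # observed normalized coordinates
R = (0.01 ** 2) * np.eye(2)                  # measurement noise covariance
print(gaussian_log_likelihood(z_meas, perspective_projection(landmark_cam), R))
```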

SLIDE 10

Front-end

  • We use a stacked hourglass convolutional network to extract mid-level semantic features and their uncertainties, which are used for the probabilistic tracking of composite semantic features

SLIDE 11

Keypoint detection

  • StarMap produces a heatmap for all keypoints.
  • Corresponding features are represented as 3D locations in the canonical object view (CanViewFeature)
  • Keypoints are augmented with an additional depth channel (DepthMap) to lift the 2D keypoints to 3D

Figure: StarMap.

SLIDE 12

MC dropout

Figure: StarMap.

SLIDE 13

MC dropout

The Monte Carlo estimate is named MC dropout and is defined as in Eq. (2):

$$ \hat{y}_{mc} = \frac{1}{B}\sum_{i=1}^{B} \hat{y}_i, \qquad \hat{\eta}_{mc} = \frac{1}{B}\sum_{i=1}^{B} (\hat{y}_i - \hat{y})^2 \quad (2) $$

MC dropout approximately integrates over the model's weights and can be interpreted as a Bayesian approximation of a Gaussian process (Y. Gal, 2016).
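For concreteness, the following is a minimal sketch of Eq. (2) applied to a generic PyTorch regressor (a toy stand-in for the stacked hourglass network, which this snippet does not reproduce): dropout is kept active at inference time and B stochastic forward passes are averaged.

```python
import torch
import torch.nn as nn

# Toy regressor standing in for the keypoint network; Dropout stays active below.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 2))

def mc_dropout_predict(model, x, B=50):
    model.train()  # .train() keeps Dropout stochastic even at inference time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(B)])
    y_mc = samples.mean(dim=0)    # \hat{y}_mc in Eq. (2)
    eta_mc = samples.var(dim=0)   # \hat{\eta}_mc: per-output sample variance
    return y_mc, eta_mc

mean, var = mc_dropout_predict(model, torch.randn(1, 10))
```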

SLIDE 14

Object-level tracking

  • Use a Kalman filter to fuse detection and tracking: the Kanade-Lucas-Tomasi (KLT) feature tracker provides the prediction, and keypoint detection provides the update.
  • The state for object i at time t is

$$ a^{i}_{t} = \begin{bmatrix} x^{b}_{t} \\ y^{1}_{t} \\ \vdots \\ y^{N_{kp}}_{t} \end{bmatrix} \quad (3) $$

where $x^{b}_{t} \triangleq (bx^{1}_{t}, \dot{bx}^{1}_{t}, by^{1}_{t}, \dot{by}^{1}_{t}, bx^{2}_{t}, \dot{bx}^{2}_{t}, by^{2}_{t}, \dot{by}^{2}_{t})$ contains the coordinates of the object bounding box and their velocities, and $y^{j}_{t} \triangleq (kx_{t}, \dot{kx}_{t}, ky_{t}, \dot{ky}_{t}),\; j \in 1 \ldots N_{kp}$, represents the coordinates and velocities of the semantic keypoints.

  • The tracker jointly tracks the bounding box and all $N_{kp}$ semantic keypoints on each car.
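Below is a minimal sketch of this detect-and-track fusion for a single keypoint, using a constant-velocity Kalman filter with illustrative values; the paper's joint state over the bounding box and all keypoints is larger, and here the prediction step merely plays the role of the KLT propagation.

```python
import numpy as np

# One keypoint state (kx, kx_dot, ky, ky_dot); values are illustrative.
dt = 1.0
F = np.array([[1, dt, 0, 0],
              [0, 1,  0, 0],
              [0, 0,  1, dt],
              [0, 0,  0, 1]], float)   # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0]], float)    # detector measures (kx, ky) only
Q = 1e-2 * np.eye(4)                   # process noise
R = 4.0 * np.eye(2)                    # detection noise (pixels^2)

x = np.array([100.0, 0.0, 50.0, 0.0])  # initial keypoint state
P = 10.0 * np.eye(4)

def kf_step(x, P, z):
    # Predict (plays the role of the KLT propagation here)
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the detected keypoint position
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = kf_step(x, P, z=np.array([102.0, 51.5]))
```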

SLIDE 15

Notation

  • We denote the global frame by {G}, the IMU frame by {I}, and the camera frame by {C}.
  • The transformation from {I} to {C} is specified by a translation ${}^{C}_{I}p \in \mathbb{R}^3$ and a unit quaternion ${}^{C}_{I}\bar{q}$ using the left-handed JPL convention
  • Alternatively, via a transformation matrix:

$$ {}^{C}_{I}T \triangleq \begin{bmatrix} {}^{C}_{I}R & {}^{C}_{I}p \\ 0 & 1 \end{bmatrix} \in SE(3) \quad (4) $$
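As a sketch, the 4×4 matrix of Eq. (4) can be assembled from a quaternion and a translation as below; note this helper uses the common Hamilton (w, x, y, z) convention rather than the JPL convention used in the slides, so it is illustrative only.

```python
import numpy as np

def quat_to_rot(q):
    """Unit quaternion (w, x, y, z), Hamilton convention -> 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def se3(q, p):
    """Assemble the block matrix [[R, p], [0, 1]] in SE(3) as in Eq. (4)."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rot(q)
    T[:3, 3] = p
    return T

T_CI = se3(q=np.array([1.0, 0.0, 0.0, 0.0]), p=np.array([0.0, 0.0, 0.1]))
```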

SLIDE 16

Back-end

  • EKF prediction:

$$ \hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k), \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^\top + Q_k $$

  • EKF update:

$$ \tilde{y}_k = z_k - h(\hat{x}_{k|k-1}), \qquad S_k = H_k P_{k|k-1} H_k^\top + R_k, \qquad K_k = P_{k|k-1} H_k^\top S_k^{-1} $$

$$ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k, \qquad P_{k|k} = (I - K_k H_k) P_{k|k-1} $$

  • where

$$ F_k = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{k-1|k-1},\, u_k}, \qquad H_k = \left.\frac{\partial h}{\partial x}\right|_{\hat{x}_{k|k-1}} $$
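A generic EKF step matching these equations, with finite-difference Jacobians standing in for the analytic $F_k$ and $H_k$, might look as follows (toy models, not the paper's IMU or camera models):

```python
import numpy as np

def num_jac(fun, x, eps=1e-6):
    """Finite-difference Jacobian of fun at x."""
    y0 = fun(x)
    J = np.zeros((len(y0), len(x)))
    for i in range(len(x)):
        dx = np.zeros_like(x); dx[i] = eps
        J[:, i] = (fun(x + dx) - y0) / eps
    return J

def ekf_step(x, P, u, z, f, h, Q, R):
    # Prediction
    Fk = num_jac(lambda s: f(s, u), x)
    x_pred = f(x, u)
    P_pred = Fk @ P @ Fk.T + Q
    # Update
    Hk = num_jac(h, x_pred)
    y = z - h(x_pred)                        # innovation
    S = Hk @ P_pred @ Hk.T + R
    K = P_pred @ Hk.T @ np.linalg.inv(S)
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ Hk) @ P_pred
    return x_new, P_new

# Toy demo: 1D constant-velocity state with a position measurement
f = lambda s, u: np.array([s[0] + s[1], s[1]])
h = lambda s: s[:1]
x, P = ekf_step(np.array([0.0, 1.0]), np.eye(2), None,
                np.array([1.2]), f, h, 0.01 * np.eye(2), 0.1 * np.eye(1))
```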

SLIDE 17

VIO background

  • The state of the IMU is defined as

$$ {}^{I}x \triangleq ({}^{I}\bar{q},\; b_g,\; {}^{I}v,\; b_a,\; {}^{I}p) \in \mathbb{R}^{16} \quad (5) $$

  • Our objective: estimate the true state ${}^{I}x$ with an estimate ${}^{I}\hat{x}$:

$$ {}^{I}\hat{x} \triangleq ({}^{I}\hat{\bar{q}},\; \hat{b}_g,\; {}^{I}\hat{v},\; \hat{b}_a,\; {}^{I}\hat{p}) \in \mathbb{R}^{16} \quad (6) $$

  • The IMU error state is:

$$ {}^{I}\tilde{x} \triangleq ({}^{I}\tilde{\bar{\theta}},\; \tilde{b}_g,\; {}^{I}\tilde{v},\; \tilde{b}_a,\; {}^{I}\tilde{p}) \in \mathbb{R}^{15} \quad (7) $$

where ${}^{I}\tilde{\bar{\theta}}$ is the angle-axis representation of ${}^{I}\tilde{\bar{q}}$, and $\tilde{\bar{q}} \simeq [\tfrac{1}{2}\tilde{\bar{\theta}}^\top \;\; 1]^\top$

SLIDE 18

State augmentation

  • Keep a history of the camera poses of length W + 1. The camera state and error state are:

$$ {}^{C}x \triangleq ({}^{C}\bar{q},\; {}^{C}p), \qquad {}^{C}\tilde{x} \triangleq ({}^{C}\tilde{\bar{\theta}},\; {}^{C}\tilde{p}) \in \mathbb{R}^{6(W+1)} \quad (8) $$

  • The complete state and error state at time t are:

$$ x_t \triangleq ({}^{I}x_t,\; {}^{C}x_{t-W:t}), \qquad \tilde{x}_t \triangleq ({}^{I}\tilde{x}_t,\; {}^{C}\tilde{x}_{t-W:t}) \quad (9) $$

SLIDE 19

Prediction

  • We can discretize the state estimate dynamics to obtain the prediction step for the IMU state mean
  • The linearized continuous-time IMU error state dynamics satisfy:

$$ {}^{I}\dot{\tilde{x}} = F(t)\,{}^{I}\tilde{x} + G(t)\,n_I \quad (10) $$

  • The propagated covariance of the IMU state is

$$ P_{II_{t+1|t}} = \Phi_t P_{II_{t|t}} \Phi_t^\top + Q_t \quad (11) $$

  • where $Q = \mathbb{E}[n_I n_I^\top]$ is the continuous-time noise covariance and

$$ \Phi_t = \Phi(t+1, t) = \exp\!\left(\int_t^{t+1} F(\tau)\, d\tau\right), \qquad Q_t = \int_t^{t+1} \Phi(t+1, \tau)\, G Q G^\top \Phi(t+1, \tau)^\top\, d\tau $$
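In practice these integrals are approximated. A common discretization, sketched below, holds F constant over the interval and uses a zeroth-order approximation of the noise integral; this is a generic scheme, not necessarily the paper's exact one.

```python
import numpy as np
from scipy.linalg import expm

def propagate_covariance(P, F, G, Qc, dt):
    """One propagation step of Eq. (11) with Phi = exp(F dt) and a
    zeroth-order approximation of the discrete noise covariance Q_t."""
    Phi = expm(F * dt)                        # Phi(t+1, t) for constant F
    Qd = Phi @ G @ Qc @ G.T @ Phi.T * dt      # crude approximation of the integral
    return Phi @ P @ Phi.T + Qd

# Toy sizes: 15-dim IMU error state driven by 12-dim noise
F = np.zeros((15, 15)); G = np.zeros((15, 12))
P = propagate_covariance(np.eye(15), F, G, np.eye(12), dt=0.005)
```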

SLIDE 20

Prediction

  • The covariance matrix after augmentation with a new camera state is

$$ P_{t+1|t} \leftarrow \begin{bmatrix} I_{15+6(W+1)} \\ J_t \end{bmatrix} P_{t+1|t} \begin{bmatrix} I_{15+6(W+1)} \\ J_t \end{bmatrix}^\top \quad (12) $$

  • We obtain the Gaussian pdf $p({}^{i}z_t \mid \mathcal{X})$ in (1)
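In code, the augmentation in Eq. (12) is a single stack-and-sandwich operation; the sketch below uses a zero placeholder for the Jacobian $J_t$, which in the real filter comes from linearizing the new camera pose with respect to the current state.

```python
import numpy as np

def augment_covariance(P, J):
    """Append a new camera state: P <- [I; J] P [I; J]^T as in Eq. (12)."""
    M = np.vstack([np.eye(P.shape[0]), J])
    return M @ P @ M.T

n = 15 + 6 * 5                # e.g., IMU error state plus a window of 5 camera poses
J = np.zeros((6, n))          # placeholder for the analytic Jacobian J_t
P_aug = augment_covariance(np.eye(n), J)   # result is (n+6) x (n+6)
```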

SLIDE 21

EKF vs MSCKF

  • EKF: Many features constrain one state.
  • MSCKF: One feature constrains many states.

Figure: Comparison of EKF, MSCKF.

SLIDE 22

Update

  • The measurement model relating a landmark $\ell \in \mathcal{L}$ to its observation $z_t$ in camera frame $\{C_t\}$ is:

$$ z_t = \pi\!\left({}^{C_t}R^\top (\ell - {}^{C_t}p)\right) + n_t \quad (13) $$

  • The estimate ${}^{g}\hat{\ell}_j$ is used to define a residual $r^{j}$ via first-order Taylor series linearization of ${}^{g}z^{j}_{t-W:t}$ based on (13):

$$ r^{j} = {}^{g}z^{j}_{t-W:t} - {}^{g}\hat{z}^{j}_{t-W:t} \approx H^{j}_{x}\tilde{x} + H^{j}_{\ell}\,{}^{g}\tilde{\ell}_j + n^{j} \quad (14) $$

  • MSCKF update, $p({}^{g}z_t \mid \mathcal{X})$ in (1), with the columns of $A$ spanning the left nullspace of $H^{j}_{\ell}$ so that the landmark error drops out:

$$ r^{j}_{0} = A^\top r^{j} \approx A^\top H^{j}_{x}\tilde{x} + A^\top n^{j} = H^{j}_{0}\tilde{x} + n^{j}_{0} \quad (15) $$
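A sketch of the left-nullspace projection in Eq. (15), here computed with scipy's null_space (an SVD-based choice of A; MSCKF implementations often use a QR decomposition instead):

```python
import numpy as np
from scipy.linalg import null_space

def nullspace_project(r, Hx, Hl):
    """Eq. (15): premultiply by A^T, where A's columns span the left
    nullspace of Hl, so the landmark error drops out of the residual model."""
    A = null_space(Hl.T)          # basis of {a : Hl^T a = 0}, i.e. A^T Hl = 0
    return A.T @ r, A.T @ Hx      # r0 = A^T r, H0 = A^T Hx

# Toy sizes: 8 stacked observation rows, 12-dim pose error, 3-dim landmark
rng = np.random.default_rng(0)
r0, H0 = nullspace_project(rng.normal(size=8),
                           rng.normal(size=(8, 12)),
                           rng.normal(size=(8, 3)))
```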

SLIDE 23

Constrained filtering

  • MSCKF with persistent object states:

$$ x_t = \begin{bmatrix} {}^{I}x_t \\ {}^{C}x_{t-W:t} \\ {}^{C_1}\ell^{\vee}_{1} \\ \vdots \\ {}^{C_k}\ell^{\vee}_{k} \end{bmatrix} \quad (16) $$

  • The original measurement model, in EKF SLAM form as in Eq. (13), is $z = Hx_t + n$, where $x_t$ is the state vector defined in Eq. (16). The measurement model can be augmented to

$$ \begin{bmatrix} z \\ d \end{bmatrix} = \begin{bmatrix} H \\ D \end{bmatrix} x_t + \begin{bmatrix} n \\ n_c \end{bmatrix} \quad (17) $$

where the constraint is enforced as $Dx_t + n_c = d$, and $n_c$ is noise with covariance $\Sigma_c$.
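The augmentation in Eq. (17) amounts to stacking the constraint as pseudo-measurement rows; below is a minimal sketch with hypothetical shapes, where $\Sigma_c$ controls how softly the constraint is enforced.

```python
import numpy as np

def augment_with_constraint(z, H, R, d, D, Sigma_c):
    """Eq. (17): stack the constraint D x + n_c = d under the measurement
    model as pseudo-measurement rows; a standard EKF update then enforces
    it softly, with Sigma_c setting the enforcement strength."""
    z_aug = np.concatenate([z, d])
    H_aug = np.vstack([H, D])
    R_aug = np.block([
        [R, np.zeros((R.shape[0], Sigma_c.shape[1]))],
        [np.zeros((Sigma_c.shape[0], R.shape[1])), Sigma_c],
    ])
    return z_aug, H_aug, R_aug

# Toy usage: 4 measurement rows, 10-dim state, one scalar distance constraint
z_a, H_a, R_a = augment_with_constraint(
    z=np.zeros(4), H=np.zeros((4, 10)), R=np.eye(4),
    d=np.array([2.8]), D=np.zeros((1, 10)), Sigma_c=0.05 * np.eye(1))
```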

SLIDE 24

Constrained filtering

  • Landmark annotations: $\ell_p \sim \mathcal{N}(\mu_p, \Sigma_p)$, $\ell_q \sim \mathcal{N}(\mu_q, \Sigma_q)$
  • The Euclidean distance is $d = \|\ell_p - \ell_q\|_2$, where $\Delta\ell = \ell_p - \ell_q \sim \mathcal{N}(\mu_p - \mu_q,\; \Sigma_p + \Sigma_q)$.
  • The covariance of $d$ is $A(\Sigma_p + \Sigma_q)A^\top$, where $A$ is the Jacobian of the L2 norm.

Figure: Pairwise constraints.
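A small sketch of these first-order statistics for the pairwise-distance constraint (illustrative covariances; think of two wheel keypoints of a car):

```python
import numpy as np

def distance_constraint_stats(mu_p, Sigma_p, mu_q, Sigma_q):
    """First-order mean and variance of d = ||l_p - l_q|| for Gaussian landmarks."""
    delta = mu_p - mu_q                   # mean of the Gaussian difference
    d = np.linalg.norm(delta)
    A = (delta / d).reshape(1, -1)        # Jacobian of the L2 norm at the mean
    var_d = float(A @ (Sigma_p + Sigma_q) @ A.T)
    return d, var_d

# e.g., two wheel keypoints roughly 2.8 m apart
d, var_d = distance_constraint_stats(np.array([1.0, 0.0, 0.0]), 0.01 * np.eye(3),
                                     np.array([-1.8, 0.0, 0.0]), 0.01 * np.eye(3))
```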

SLIDE 25

Constrained filtering

  • Constrained filtering can fuse all available sources of information (S. Tully et al., 2012)

Figure: Posterior with equality and inequality constraints.

SLIDE 26

Quantitative Comparison

Enforcing constraints keeps the estimated points close to the ground truth even under large measurement noise

Figure: Left: 640×480 image, bird's-eye view. Right: RMSE comparison between Hybrid VIO and OrcVIO in Gazebo simulation.

SLIDE 27

Qualitative evaluation

  • Gazebo simulation using real-world IMU data
  • Reconstruction for 22 cars
  • Drift in Z is large due to insufficient movement

SLIDE 28

Qualitative evaluation

  • Semantic keypoint detection using StarMap. Upper row: successes. Lower row: failures.

SLIDE 29

Qualitative evaluation

  • Semantic feature detection on real-world dataset

SLIDE 30

Qualitative evaluation

Reconstruction snapshot on real-world dataset

Figure: Visualization of reconstruction.

SLIDE 31

Qualitative evaluation

  • Bird's-eye view of the reconstruction
  • Both the precision and the recall of the reconstruction have to be improved for real-world data
  • Orange path is the ground-truth trajectory; purple path is ours
  • Red bounding boxes are ground-truth car positions; green wireframes are results from OrcVIO

Figure: Visualization of reconstruction.

SLIDE 32

Weaknesses

  • We use triangulation and Levenberg-Marquardt optimization to obtain initial positions
  • However, triangulation requires a sufficient baseline
  • When the baseline is small, depth estimation is inaccurate and landmarks will be pruned as outliers
  • For some inliers the depth is not accurate either, which leads to incorrect object poses

SLIDE 33

Conclusion

  • We present OrcVIO, which incorporates object structures for constrained state estimation
  • The key insight is that there are objects in the scene and their keypoints are not independent
  • The advantages include more accurate state estimates and an object map
  • However, there is a lack of an object-level prior to restrict the depth estimation in triangulation and LM

SLIDE 34

Future work

  • Shape-Aware Adjustment: given an initialization, use planarity and symmetry, etc., to improve reconstruction
  • QuadricSLAM (L. Nicholson et al., 2018) uses ellipsoids; CubeSLAM (S. Yang, 2019) uses cuboids. We will also explore how to use geometric shapes to help improve depth estimation

Figure: A quote from a painter.

SLIDE 35

Initial results

SLIDE 36

References

  • Cadena, Cesar, et al. "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age." IEEE Transactions on Robotics 32.6 (2016): 1309-1332.
  • Mur-Artal, Raul, and Juan D. Tardós. "ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras." IEEE Transactions on Robotics 33.5 (2017): 1255-1262.
  • Engel, Jakob, Vladlen Koltun, and Daniel Cremers. "Direct sparse odometry." IEEE Transactions on Pattern Analysis and Machine Intelligence 40.3 (2018): 611-625.
  • Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. 2016.
  • Tully, Stephen, et al. "Constrained filtering with contact detection data for the localization and registration of continuum robots in flexible environments." 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012.
