Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation
Edoardo Remelli Shangchen Han Sina Honari Pascal Fua Robert Wang
Motivation
Multi-view input from synchronized and calibrated cameras.

State-of-the-art multi-view pose estimation solutions project 2D detections onto 3D volumetric grids and reason jointly across views through computationally intensive 3D CNNs or Pictorial Structures.

Can we instead fuse features both effectively and efficiently in latent space?
Given:
- N synchronized and calibrated views {I_i}_{i=1}^N
- camera intrinsics and extrinsics (K_i, R_i, t_i)

Find: the 3D pose x in world coordinates

Pinhole camera model: u = Π x = K P x = K(R x + t)
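As a concrete illustration, the pinhole projection u = K(R x + t) can be written in a few lines of NumPy; the intrinsics and pose below are made-up example values, not taken from the paper.

```python
import numpy as np

# Pinhole projection u = K (R x + t): map a world point to pixel coordinates.
# K, R, t are illustrative placeholder values.
K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])      # intrinsics: focal lengths and principal point
R = np.eye(3)                        # camera rotation (identity for simplicity)
t = np.array([0.0, 0.0, 2.0])        # camera translation: 2 units in front

def project(x_world):
    """Project a 3D world point into pixel coordinates."""
    x_cam = R @ x_world + t          # world -> camera coordinates
    u_hom = K @ x_cam                # camera -> homogeneous image coordinates
    return u_hom[:2] / u_hom[2]      # perspective divide

u = project(np.array([0.0, 0.0, 0.0]))
print(u)  # the world origin projects to the principal point (160, 120)
```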
Real-time multi-view 3D pose estimation methods:
[pipeline diagram: per-view ResNet → shallow decoder → soft integration → 2D detections → triangulation → 3D pose]
But the per-view predictions live in different camera coordinate systems. How should they be related?
How to reason jointly about pose across views? Let the network do all the hard work:
[diagram: per-view ResNet features (512×8×8) are concatenated (1024×8×8) and fused by conv layers into a 3D pose embedding in camera coordinates]

Pros: simple to implement, effective.
Cons: overfits by design to the camera setting; does not exploit camera transforms explicitly.
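This fusion baseline can be sketched with plain tensor operations. A 1×1 convolution is just a per-pixel linear map, so NumPy suffices here; the shapes follow the slide (512×8×8 per view), while the weights are random placeholders rather than trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two per-view feature maps in camera coordinates, 512 channels at 8x8 resolution.
z_v1 = rng.standard_normal((512, 8, 8))
z_v2 = rng.standard_normal((512, 8, 8))

# Direct latent fusion: concatenate along channels, then mix with a 1x1 conv.
# A 1x1 conv is a per-pixel linear map, so it reduces to an einsum here.
z_cat = np.concatenate([z_v1, z_v2], axis=0)      # 1024 x 8 x 8
W = rng.standard_normal((512, 1024)) * 0.01       # placeholder fusion weights
z_fused = np.einsum('oc,chw->ohw', W, z_cat)      # back to 512 x 8 x 8

print(z_cat.shape, z_fused.shape)
```

Nothing in these operations knows about the camera geometry: the fusion weights must implicitly learn the fixed camera configuration, which is exactly why this baseline overfits to the camera setting.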
[diagram: per-view ResNet features in camera coordinates are mapped to a common frame (via T_1^{-1}, T_2^{-1}), fused by conv layers into features in world coordinates, then mapped back to each camera frame (via T_1, T_2) before decoding]
If we could map features to a common frame of reference before fusing them, jointly reasoning about views would become much easier for the network. But how do we apply a transformation to feature maps without using 3D volume aggregation?
Pinhole camera model: u = Π x = K P x = K(R x + t)
Given a representation learning task and a known source of variation T, [1] proposes to learn equivariance with respect to that source of variation by conditioning the latent code on it:
z_{v2} = T_{v1→v2}[z_{v1}]

where T_{v1→v2} is the transformation applied to the latent code z_{v1}.

[1]: Worrall, Garbin, Turmukhambetov, and Brostow. Interpretable Transformations with Encoder-Decoder Networks.
[figure: standard Auto-Encoder vs. Transforming Auto-Encoder]
How to choose the transformation T_{v1→v2}?
Feature transform layer (FTL), for a latent feature map z_c ∈ R^{C×H×W}:
  reshape z_c into a matrix of points
  z_w = T_{c→w} z_c        (apply the camera transform)
  reshape z_w back to C×H×W
Canonical Fusion:
[diagram: per-view ResNet features (512×8×8) in camera coordinates are mapped to world coordinates via FTL, concatenated (1024×8×8) and fused by conv layers, mapped back to each camera frame via FTL, then decoded with soft integration into 2D detections]
Now that we have computed 2D detections, how can we lift them to 3D differentiably?

Direct Linear Transform (DLT)
{"#}#%&
'
(
From Epipolar Geometry: {"#}#%&
'
( )#"# = +
#(
)#,# = -#
&.(
)#/# = -#
0.(
)# = -#
1.(
1.,# − -# &. ( = 3
1./# − -#
Accumulating over available N views:
Admits non-trivial solution only if "# and +
# are not noisy, therefore we must solve a relaxed version
=
Equivalent to finding the eigenvector of 4.4 associated to the smallest eigenvalue
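A minimal NumPy sketch of this relaxed DLT, using a dense eigen-decomposition of A⊤A as the smallest-eigenvector solver and a synthetic two-camera setup (made-up calibration values) for verification:

```python
import numpy as np

def dlt_triangulate(Ps, uvs):
    """Triangulate one 3D point from N views via the relaxed DLT:
    stack two rows per view into A and take the eigenvector of A^T A
    with the smallest eigenvalue (here via a dense eigen-decomposition)."""
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        rows.append(u * P[2] - P[0])   # u_i P_i^{3T} - P_i^{1T}
        rows.append(v * P[2] - P[1])   # v_i P_i^{3T} - P_i^{2T}
    A = np.stack(rows)                 # shape (2N, 4)
    _, vecs = np.linalg.eigh(A.T @ A)  # eigenvalues in ascending order
    x = vecs[:, 0]                     # eigenvector of smallest eigenvalue
    return x[:3] / x[3]                # back from homogeneous coordinates

# Synthetic check: two cameras observing the point (0.2, -0.1, 3).
K = np.array([[400.0, 0, 160], [0, 400.0, 120], [0, 0, 1]])
def make_P(R, t):
    return K @ np.hstack([R, t.reshape(3, 1)])
def Ry(a):
    return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
Ps = [make_P(np.eye(3), np.zeros(3)), make_P(Ry(0.3), np.array([-0.5, 0.0, 0.1]))]

x_true = np.array([0.2, -0.1, 3.0])
def proj(P, x):
    h = P @ np.append(x, 1.0)
    return h[:2] / h[2]
uvs = [proj(P, x_true) for P in Ps]
print(dlt_triangulate(Ps, uvs))  # ~ [0.2, -0.1, 3.0]
```

With noise-free detections, A x = 0 holds exactly and the recovered point matches the ground truth; with noisy detections, the same code returns the least-squares optimum.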
In the literature, the smallest eigenvalue is found by computing a Singular Value Decomposition (SVD) of matrix A [2]. We argue that this is sub-optimal because full SVD factorizations are expensive and do not parallelize well on GPUs [3].
[2]: Hartley and Zisserman. Multiple View Geometry in Computer Vision.
[3]: Dongarra, Gates, Haidar, Kurzak, Luszczek, Tomov, and Yamazaki. Accelerating Numerical Dense Linear Algebra Calculations with GPUs.
Step 1: derive a bound for the smallest singular value of matrix A.
Step 2: use the bound to estimate the smallest singular value, then refine the estimate iteratively using Shifted Power Iteration.
For reasonably accurate 2D detections, our algorithm converges in as few as 2 iterations to the desired eigenvalue. Since it requires only a small matrix inversion and a few matrix multiplications, it is much faster than performing full SVD factorizations, especially on GPUs.
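A sketch of this idea as shifted inverse power iteration: one small matrix inversion, then a few matrix multiplications. The zero shift used by default (reasonable when the smallest eigenvalue is near zero, i.e. accurate detections) stands in for the bound-derived shift, which is not reproduced here.

```python
import numpy as np

def smallest_eigvec(B, shift=0.0, iters=2):
    """Shifted inverse power iteration sketch: for B = A^T A with an
    eigenvalue near `shift`, iterating x <- (B - shift*I)^{-1} x converges
    to the corresponding eigenvector. The default shift=0 assumes the
    smallest eigenvalue is close to zero."""
    M = np.linalg.inv(B - shift * np.eye(B.shape[0]))  # one small inversion
    x = np.ones(B.shape[0])
    for _ in range(iters):                             # a few matrix multiplies
        x = M @ x
        x = x / np.linalg.norm(x)
    return x

# Sanity check on a matrix with known eigenvectors.
Q, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((4, 4)))
B = Q @ np.diag([1e-6, 1.0, 2.0, 3.0]) @ Q.T           # smallest eigenvalue 1e-6
v = smallest_eigvec(B, iters=2)
print(abs(v @ Q[:, 0]))  # close to 1: aligned with the true smallest eigenvector
```

Because B is only 4×4 per joint, the inversion is trivial and the whole solve is a handful of small matrix products, which batches well on a GPU.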
Results: without additional training data vs. with additional training data [tables omitted].
Generalization: seen cameras vs. unseen cameras [tables omitted].
Summary: our camera-disentangled representation fuses information from different views efficiently.
[pipeline diagram: per-view ResNet → shallow decoder → soft integration → 2D detections → differentiable, GPU-friendly triangulation → 3D pose]
Please refer to the video for qualitative results and visualizations! For any questions, feel free to reach out to
edoardo.remelli@epfl.ch