
2017 IEEE 2017 Conference on Computer Vision and Pattern - PowerPoint PPT Presentation



  1. 2017 IEEE Conference on Computer Vision and Pattern Recognition
 DESIRE: DISTANT FUTURE PREDICTION IN DYNAMIC SCENES WITH INTERACTING AGENTS
 Namhoon Lee 1, Wongun Choi 2, Paul Vernaza 2, Christopher B. Choy 3, Philip H. S. Torr 1, Manmohan Chandraker 2,4
 1: University of Oxford, 2: NEC Labs, 3: Stanford University, 4: UCSD
 CVPR’17 Spotlight - 23 July 2017 Namhoon Lee | Torr Vision Group, Department of Engineering Science

  2. FUTURE PREDICTION
 • We address the problem of future prediction for multiple agents in dynamic scenes.
 • Future prediction is defined as predicting agents' future locations in terms of trajectories.

  3. FUTURE PREDICTION IS DIFFICULT
 • Various factors
 A prediction entails reasoning about probable outcomes from multiple influences (e.g., past motion, scene context, interactions).
 It requires an accurate time profile of the mutual influence between agents.
 • Multi-modality
 Future prediction is inherently riddled with uncertainty and is fundamentally different from path prediction.
 A system needs to produce a distribution over all probable future outcomes, instead of one deterministic output (a path).

  4. FUTURE PREDICTION IS DIFFICULT
 • Various factors (past motion, scene context, interactions).
 • Multi-modality: distribution over all probable outcomes.
 [Figure: problem scenario. Labels: Pedestrian, Car, Past Trajectory, Future Trajectory, Scene Elements]

  5. DESIRE: DEep Stochastic IOC RNN Encoder-decoder
 • DESIRE is a framework for distant future prediction of multiple interacting agents in dynamic scenes.
 • We generate multiple prediction hypotheses using a Variational Auto-Encoder, then rank and refine them within an Inverse Optimal Control framework.
 [Figure: workflow. Stages: Observations, Sample Generation, Ranking, Refinement]
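The generate-then-rank workflow on this slide can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `decode` is a hypothetical stand-in for a learned decoder (it only extrapolates the last observed velocity and turns it by the sampled latent value), and `score` is a hypothetical stand-in for a learned reward (it counts the steps that stay inside a corridor around y = 0). Only the overall shape, sampling several latent values, decoding each into a hypothesis, and sorting by score, follows the slide.

```python
import math
import random

def decode(past, z, horizon):
    """Hypothetical decoder: extrapolate the last velocity, turning by z radians per step."""
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    x, y = x1, y1
    trajectory = []
    for _ in range(horizon):
        cos_z, sin_z = math.cos(z), math.sin(z)
        vx, vy = vx * cos_z - vy * sin_z, vx * sin_z + vy * cos_z
        x, y = x + vx, y + vy
        trajectory.append((x, y))
    return trajectory

def score(trajectory, half_width=1.0):
    """Hypothetical reward: number of steps that stay within |y| <= half_width."""
    return sum(1.0 for _, y in trajectory if abs(y) <= half_width)

def generate_and_rank(past, num_samples, horizon, rng):
    """Sample one latent value per hypothesis, decode it, and sort best-first."""
    hypotheses = [decode(past, rng.gauss(0.0, 0.1), horizon)
                  for _ in range(num_samples)]
    return sorted(hypotheses, key=score, reverse=True)

# Two observed positions, five hypotheses, four future steps each.
ranked = generate_and_rank([(0.0, 0.0), (1.0, 0.0)], 5, 4, random.Random(0))
```

Because each hypothesis comes from a different latent sample, the ranked list covers several plausible futures rather than a single deterministic path.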

  6. DESIRE: DEep Stochastic IOC RNN Encoder-decoder
 [Figure: network architecture.
 Sample Generation Module: RNN Encoder1 and RNN Encoder2 (GRUs) over inputs X and Y; CVAE with fc layers for μ and σ, latent z, softmax, and KLD Loss; RNN Decoder1 (GRUs) producing Ŷ with Recon Loss.
 Ranking & Refinement Module: RNN Decoder2 (GRUs) with SCF units fed by CNN features ρ(I); Feature Pooling; fc layers for Scoring (r1, r2, ..., rt) and Regression (ΔŶ); Iterative Feedback.
 Legend: ⊞ concat, ⊠ mask, ⊕ addition]

  7. SCENE CONTEXT FUSION (SCF) UNIT
 [Figure: SCF unit inside RNN Decoder2. The velocity passes through fc and ReLU; the CNN features ρ(I) are read at ŷ_{i,t}; Feature Pooling combines ŷ_{j\i,t} and h_{Ŷ_{j\i}} of the other agents; the three results are concatenated (⊞) into the GRU input x_t, with GRU hidden state h_i.]
 $x_t = \left[ \gamma(\hat{v}_{i,t}),\; p(\hat{y}_{i,t}; \rho(I)),\; r(\hat{y}_{i,t}; \hat{y}_{j \setminus i,t}, h_{\hat{Y}_{j \setminus i}}) \right]$
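The concatenation that forms the SCF input x_t can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the scene term reads the feature map at the nearest grid cell, and the interaction term averages the hidden states of the other agents within a fixed radius. Both pooling rules, and all function and parameter names, are hypothetical choices made for this example.

```python
import math

def gamma(velocity, weights, bias):
    """Fully connected layer followed by ReLU, applied to a 2-D velocity."""
    return [max(0.0, sum(w * v for w, v in zip(row, velocity)) + b)
            for row, b in zip(weights, bias)]

def pool_scene(position, feature_map):
    """Assumed pooling rule: read the feature vector at the nearest grid cell."""
    rows, cols = len(feature_map), len(feature_map[0])
    r = min(max(int(round(position[1])), 0), rows - 1)
    c = min(max(int(round(position[0])), 0), cols - 1)
    return list(feature_map[r][c])

def pool_interaction(position, other_positions, other_hidden, radius=2.0):
    """Assumed pooling rule: average the hidden states of agents within `radius`."""
    dim = len(other_hidden[0]) if other_hidden else 0
    near = [h for p, h in zip(other_positions, other_hidden)
            if math.hypot(p[0] - position[0], p[1] - position[1]) <= radius]
    if not near:
        return [0.0] * dim
    return [sum(h[k] for h in near) / len(near) for k in range(dim)]

def scf_input(velocity, position, feature_map, other_positions, other_hidden,
              weights, bias):
    """Concatenate the velocity, scene, and interaction features into x_t."""
    return (gamma(velocity, weights, bias)
            + pool_scene(position, feature_map)
            + pool_interaction(position, other_positions, other_hidden))

feature_map = [[[0.1, 0.2], [0.3, 0.4]],
               [[0.5, 0.6], [0.7, 0.8]]]  # 2 x 2 grid, 2 channels
x_t = scf_input(velocity=(1.0, -2.0), position=(1.2, 0.9),
                feature_map=feature_map,
                other_positions=[(1.0, 1.0), (10.0, 10.0)],
                other_hidden=[[2.0, 4.0], [100.0, 100.0]],
                weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
# x_t == [1.0, 0.0, 0.7, 0.8, 2.0, 4.0]
```

The far-away agent at (10, 10) does not contribute to the interaction term, which is the point of pooling relative to the agent's own predicted position.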

  8. Prediction example
 [Figure: prediction example in perspective view and top-down view, with iterative feedback at Iteration: 0, Iteration: 1, Iteration: 3]
 Prediction errors (10% acc. for CVAE and DESIRE)
 KITTI: error in meters / miss-rate with 1m threshold. SDD: pixel error at 1/5 resolution.

 Method         KITTI 1s      KITTI 2s      KITTI 3s      KITTI 4s      SDD 1s  SDD 2s  SDD 3s  SDD 4s
 Linear         0.89 / 0.31   2.07 / 0.49   3.67 / 0.59   5.62 / 0.64   2.58    5.37    8.74    12.54
 RNN ED-SI      0.56 / 0.16   1.40 / 0.44   2.65 / 0.58   4.29 / 0.65   1.51    3.56    6.04    8.80
 CVAE           0.35 / 0.06   0.93 / 0.30   1.81 / 0.49   3.07 / 0.59   1.84    3.93    6.47    9.65
 DESIRE-S-IT0   0.32 / 0.05   0.84 / 0.26   1.67 / 0.43   2.82 / 0.54   1.59    3.31    5.27    7.75
 DESIRE-SI-IT4  0.28 / 0.04   0.67 / 0.17   1.22 / 0.29   2.06 / 0.41   1.29    2.35    3.47    5.33
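The two KITTI quantities in the table, error in meters and miss-rate with a 1 m threshold, can be computed as in the sketch below. It assumes that the error at a given horizon is the Euclidean distance between the predicted and true position at that time step, and that the miss-rate is the fraction of predictions whose error exceeds the threshold. The slide does not spell out either definition, so treat both as assumptions; the sample positions are made up.

```python
import math

def displacement_error(pred, truth):
    """Euclidean distance between a predicted and a true (x, y) position."""
    return math.hypot(pred[0] - truth[0], pred[1] - truth[1])

def error_and_miss_rate(preds, truths, threshold=1.0):
    """Mean error, and the fraction of predictions farther than `threshold`."""
    errors = [displacement_error(p, t) for p, t in zip(preds, truths)]
    mean_error = sum(errors) / len(errors)
    miss_rate = sum(e > threshold for e in errors) / len(errors)
    return mean_error, miss_rate

# Four made-up predictions at one horizon, in meters.
preds = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (0.5, 0.0)]
truths = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
mean_error, miss_rate = error_and_miss_rate(preds, truths)
# Errors are 0, 5, 0, and 0.5 m: mean_error == 1.375, miss_rate == 0.25
```

A single large miss raises the mean error sharply while moving the miss-rate by only one sample, which is one reason to report the two numbers together.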

  9. DESIRE CHARACTERISTICS
 • Scalability:
 The use of deep learning allows for end-to-end training and easy incorporation of multiple cues.
 • Diversity:
 CVAE is combined with RNN encodings to generate stochastic prediction hypotheses to hallucinate multi-modalities.
 • Accuracy:
 The IOC-based framework accumulates long-term future rewards and the refinement module learns to estimate a deformation of the trajectory, enabling more accurate predictions.
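The refinement idea, estimating a deformation of the trajectory and feeding the result back for another pass, can be sketched as a simple loop. The deformation estimator below is a hypothetical stand-in for the learned regression: it only nudges each point halfway toward the line y = 0. The loop structure (add the estimated deformation, then repeat) is the part meant to mirror the iterative feedback shown in the architecture slide.

```python
def estimate_deformation(trajectory, step=0.5):
    """Hypothetical regressor: a per-point offset that moves y toward 0."""
    return [(0.0, -step * y) for _, y in trajectory]

def refine(trajectory, iterations):
    """Repeatedly add the estimated deformation to the current trajectory."""
    for _ in range(iterations):
        delta = estimate_deformation(trajectory)
        trajectory = [(x + dx, y + dy)
                      for (x, y), (dx, dy) in zip(trajectory, delta)]
    return trajectory

initial = [(1.0, 0.8), (2.0, 1.6)]
refined = refine(initial, iterations=3)
# Each pass halves the y offset, so after three passes y shrinks by a factor of 8.
```

With zero iterations the trajectory is returned unchanged, which corresponds to using the ranked hypotheses without refinement.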

  10. THANK YOU
