Autonomous Driving on Benchmarks, Xiaodi Hou (PowerPoint presentation)

SLIDE 1

Autonomous Driving on Benchmarks

Xiaodi Hou

SLIDE 2

TWO DECADES OF BENCHMARKING

SLIDE 3

Two decades of benchmarking

  • MNIST
    – 1998
    – Character recognition
    – 60,000 images
  • Inspired convolutional neural nets

SLIDE 4

Two decades of benchmarking

  • PASCAL-VOC
    – 2005
    – Object detection & classification
    – 3,787 images
  • Inspired the Deformable Part-based Model

SLIDE 5

Two decades of benchmarking

  • ImageNet
    – 2010
    – Object classification
    – 1,000,000 images
  • Inspired deep learning
SLIDE 6

LIMITATIONS OF BENCHMARKS

SLIDE 7

Upper bounds of benchmarks

Spectrum: objective tasks → intermediate tasks → subjective tasks

Objective tasks
  • Measuring physical reality
  • Bounded by measurement accuracy
  • Examples: stereo, optical flow, face recognition

Subjective tasks
  • Measuring human cognition
  • Bounded by inter-subject agreement
  • Examples: saliency, memorability, image captioning
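For subjective tasks, the ceiling can be estimated by scoring each human against the others. A minimal sketch, assuming binary per-item labels (for example salient / not salient) from four annotators; the toy data and the function names are hypothetical. Each subject is compared with the majority vote of the remaining subjects, and the mean agreement approximates the best score an algorithm can meaningfully reach on these labels.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]

def leave_one_out_agreement(annotations):
    """annotations[s][i] is subject s's label for item i.

    Each subject is scored against the majority vote of all other subjects;
    the mean over subjects approximates the human-agreement ceiling."""
    n_subjects = len(annotations)
    n_items = len(annotations[0])
    scores = []
    for s in range(n_subjects):
        others = [annotations[t] for t in range(n_subjects) if t != s]
        hits = sum(
            annotations[s][i] == majority([o[i] for o in others])
            for i in range(n_items)
        )
        scores.append(hits / n_items)
    return sum(scores) / n_subjects

# Hypothetical labels: 4 subjects x 5 items (1 = salient, 0 = not salient)
annotations = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
]
print(leave_one_out_agreement(annotations))
```

On this toy data the ceiling is 0.8, so an algorithm that matches the majority 80% of the time is already at human level. Using four subjects keeps each leave-one-out vote odd-sized, which avoids ties for binary labels.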

SLIDE 8

Imperfect benchmarks

  • Marriage market in China
    – The ideal: tall, rich, and handsome
  • 80% of girls are forced to choose among:
    – a tall, poor, ugly guy
    – a short, rich, ugly guy
    – a short, poor, handsome guy
  • Dimensionality reduction
    – Guaranteed information loss!
    – A projection of 𝑺𝒐 → 𝑺
  • Red or Blue?
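The "guaranteed information loss" can be made concrete with the three candidates above. A minimal sketch, assuming each attribute is encoded as 0/1 and the single benchmark score is an equal-weight sum (both encodings are illustrative assumptions): three distinct candidates collapse onto the same score, so a ranking by that score cannot tell them apart.

```python
# Hypothetical encoding: (tall, rich, handsome), 1 = has the attribute
candidates = {
    "tall poor ugly": (1, 0, 0),
    "short rich ugly": (0, 1, 0),
    "short poor handsome": (0, 0, 1),
}

def score(attributes, weights=(1.0, 1.0, 1.0)):
    """Project a 3-D attribute vector onto a single scalar ('benchmark score')."""
    return sum(w * a for w, a in zip(weights, attributes))

scores = {name: score(v) for name, v in candidates.items()}
print(scores)
# Three distinct input vectors, one distinct output value.
print(len(set(candidates.values())), "distinct inputs ->",
      len(set(scores.values())), "distinct score")
```

Changing `weights` changes which candidate wins, but with real-valued attributes no linear projection from three dimensions to one is invertible; the equal weights here simply make the collision visible on three samples.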
SLIDE 9

Signs of a fading benchmark

  • Saturated competition
    – Labeled Faces in the Wild (0.9978 ± 0.0007)
  • Weak transferability
    – Middlebury Optical Flow → KITTI Optical Flow
  • Poor inter-subject consistency
    – Image captioning and BLEU scores, e.g. three captions of one image:
      • A man throwing a frisbee in a park.
      • A man holding a frisbee in his hand.
      • A man standing in the grass with a frisbee.
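Inter-subject inconsistency can be measured directly by scoring each caption above against the other two as references. A minimal sketch of sentence-level BLEU, assuming lowercase whitespace tokenization and add-one smoothing on every n-gram order (a simplification; this is not the official BLEU or COCO caption evaluation code):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def tokenize(text):
    """Assumed tokenizer: lowercase, drop the final period, split on whitespace."""
    return text.lower().rstrip(".").split()

def sentence_bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothing on every n-gram order."""
    cand = tokenize(candidate)
    refs = [tokenize(r) for r in references]
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        # Clip each candidate n-gram by its maximum count in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        log_precision += math.log((clipped + 1) / (total + 1)) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(cand)
    r = len(min(refs, key=lambda ref: (abs(len(ref) - c), len(ref))))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision)

captions = [
    "A man throwing a frisbee in a park.",
    "A man holding a frisbee in his hand.",
    "A man standing in the grass with a frisbee.",
]

for i, cap in enumerate(captions):
    others = captions[:i] + captions[i + 1:]
    print(f"{sentence_bleu(cap, others):.3f}  {cap}")
```

Each caption scores well below 1.0 against the other two, even though all three are plausible descriptions of the same scene.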
SLIDE 10

BENCHMARKS AND AUTONOMOUS DRIVING

SLIDE 11

Vision-based autonomous driving benchmarks

  • KITTI & Cityscapes
    – Detection
    – Tracking
    – Stereo/Flow
    – SLAM
    – Semantic segmentation
  • 100% traditional vision challenges
  • Are we ready?
SLIDE 12

Not yet…

SLIDE 13

Challenge 1: Data distribution

  • Academia
    – Average performance
  • Silicon Valley startups
    – Demo-oriented
    – Best-case performance
  • Real products
    – Murphy’s law
    – Worst-case performance
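The three viewpoints report different statistics of the same error distribution. A minimal sketch with made-up per-scenario lane-position errors (the scenarios and numbers are illustrative, not measured results): the average and the best case look acceptable while one scenario dominates the worst case.

```python
import statistics

# Made-up per-scenario errors in metres, for illustration only
errors = {
    "clear highway": 0.05,
    "light traffic": 0.07,
    "dusk": 0.10,
    "rain": 0.12,
    "tunnel exit glare": 0.90,
}

values = list(errors.values())
worst = max(errors, key=errors.get)
print(f"academia (average):   {statistics.mean(values):.2f} m")
print(f"startup demo (best):  {min(values):.2f} m")
print(f"product (worst case): {max(values):.2f} m  <- {worst}")
```

A product has to be judged on the last line, which is why a benchmark that only reports averages says little about deployability.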

SLIDE 14

Challenge 2: Ground-truth representation

  • Bbox
    – Almost no bounding boxes in the real world!
    – Missing hidden variables (distance & velocity)
  • Semantic segmentation
    – “Pixel classification”
    – How to assemble the pixels into objects?
  • Stixels
    – Representing the world using matchsticks
    – Distance and 3D shape
    – Missing the notion of whole objects
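The missing notion of whole objects means an extra grouping step is needed on top of stixels. A minimal sketch, assuming a hypothetical `Stixel` record and a simple greedy rule (adjacent columns with a depth jump under a threshold belong to the same object); this is an illustrative heuristic, not a published stixel clustering method.

```python
from dataclasses import dataclass

@dataclass
class Stixel:
    column: int        # stixel column index in the image
    top_row: int       # upper image row of the stixel
    bottom_row: int    # lower image row (ground contact point)
    distance_m: float  # distance to the camera in metres

def group_into_objects(stixels, max_depth_jump_m=1.0):
    """Greedily merge adjacent stixels whose distances differ by less than a threshold."""
    objects = []
    for s in sorted(stixels, key=lambda st: st.column):
        prev = objects[-1][-1] if objects else None
        if (
            prev is not None
            and s.column == prev.column + 1
            and abs(s.distance_m - prev.distance_m) < max_depth_jump_m
        ):
            objects[-1].append(s)
        else:
            objects.append([s])
    return objects

# Hypothetical scene: a car at ~20 m in front of a truck at ~45 m
stixels = [
    Stixel(0, 100, 180, 20.0),
    Stixel(1, 100, 181, 20.3),
    Stixel(2, 102, 180, 20.1),
    Stixel(3, 120, 160, 45.0),
    Stixel(4, 121, 160, 45.4),
]
groups = group_into_objects(stixels)
print([len(g) for g in groups])
```

The threshold is the weak point: two vehicles driving side by side at the same distance would be merged into one object, which is the ambiguity the slide highlights.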

SLIDE 15

Challenge 3: Structured prior

  • What’s wrong with end-to-end learning?
SLIDE 16

Challenge 3: Structured prior

  • Two types of priors:
    – Implicit prior
      • Data-driven (e.g., images)
      • Good for deep learning models
    – Explicit prior
      • Rule-driven (e.g., cars cannot fly)
      • Good for probabilistic models
  • The road ahead
    – An image-based problem with strong explicit priors
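One way to use both kinds of prior is to multiply a learned score by a rule-driven prior, Bayes-style. A minimal sketch, assuming a hard 0/1 prior for "cars cannot fly" on a flat road plane; the hypothesis fields, the 0.3 m tolerance, and the function names are hypothetical.

```python
def explicit_prior(bottom_height_m, tolerance_m=0.3):
    """Rule-driven prior: a car's bottom must sit on the (assumed flat) road plane.

    Returns 1.0 for grounded hypotheses and 0.0 for 'flying' ones."""
    return 1.0 if abs(bottom_height_m) <= tolerance_m else 0.0

def posterior_scores(hypotheses):
    """Multiply the learned likelihood by the explicit prior, then normalise."""
    unnorm = [h["likelihood"] * explicit_prior(h["bottom_height_m"]) for h in hypotheses]
    total = sum(unnorm)
    return [u / total for u in unnorm] if total > 0 else unnorm

# Hypothetical detector output: the network prefers the physically impossible box.
hypotheses = [
    {"name": "car on road", "likelihood": 0.6, "bottom_height_m": 0.05},
    {"name": "car in the air", "likelihood": 0.9, "bottom_height_m": 2.5},
]
for h, p in zip(hypotheses, posterior_scores(hypotheses)):
    print(h["name"], round(p, 3))
```

A soft prior (e.g., a Gaussian on bottom height) would be the usual choice in a real probabilistic model; the hard rule just makes the effect obvious.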

SLIDE 17

TUSIMPLE CHALLENGES! WORKSHOP@CVPR 2017

SLIDE 18

TuSimple Challenge 1: Lane challenge

SLIDE 19

TuSimple Challenge 1: Lane challenge

  • Deep learning for lanes?
    – Parametrization of pixels
  • Strong structure priors
    – ~3.75 m lane width
    – Parallel lines
    – (Almost) flat road surface
  • Over-representing corner cases
    – 20% hard cases (heavy occlusion, strong lighting changes, bad markings), which would rarely occur if frames were sampled uniformly
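Over-representing corner cases amounts to stratified sampling of the test set. A minimal sketch, assuming each recorded frame already carries a hypothetical `is_hard` flag and that hard frames are 5% of the raw pool (a made-up figure); only the 20% target comes from the slide.

```python
import random

def build_benchmark(frames, n_samples, hard_fraction=0.2, seed=0):
    """Sample a test set in which hard frames make up `hard_fraction` of the result.

    `frames` is a list of (frame_id, is_hard) pairs; this layout is hypothetical."""
    rng = random.Random(seed)
    hard = [f for f in frames if f[1]]
    easy = [f for f in frames if not f[1]]
    n_hard = round(n_samples * hard_fraction)
    return rng.sample(hard, n_hard) + rng.sample(easy, n_samples - n_hard)

# Toy pool of 10,000 frames in which every 20th frame is a hard case (5%).
pool = [(i, i % 20 == 0) for i in range(10_000)]
uniform = random.Random(0).sample(pool, 1_000)
stratified = build_benchmark(pool, 1_000)
print("hard share, uniform:   ", sum(h for _, h in uniform) / len(uniform))
print("hard share, stratified:", sum(h for _, h in stratified) / len(stratified))
```

Uniform sampling reproduces the raw pool's small hard-case share, whereas the stratified set is fixed at 20% regardless of how rare hard frames are on the road.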

SLIDE 20

TuSimple Challenge 2: Velocity estimation

  • Representing the world with camera + LiDAR
SLIDE 21

TuSimple Challenge 2: Velocity estimation

  • Object-level representation for motion planning
    – Stereo map?
    – SLAM?
    – Estimation based on bbox size?
  • LiDAR vs. camera
    – No LiDAR solution for 200 m perception
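Bbox-size estimation relies on the pinhole relation Z = f · H / h (distance from focal length, real object height, and image height). A minimal sketch, assuming a 2000 px focal length, a 1.5 m car height, and a 0.1 s frame interval (all hypothetical values), shows why this is fragile at 200 m: the car is about 15 px tall, so a one-pixel change moves the estimate by about 14 m.

```python
def distance_from_bbox(bbox_height_px, focal_px, object_height_m):
    """Pinhole-camera range estimate: Z = f * H / h."""
    return focal_px * object_height_m / bbox_height_px

def velocity_from_distances(distances_m, dt_s):
    """Relative velocity by finite differences of successive distance estimates."""
    return [(d1 - d0) / dt_s for d0, d1 in zip(distances_m, distances_m[1:])]

FOCAL_PX = 2000.0   # hypothetical long-focal-length camera
CAR_HEIGHT_M = 1.5  # assumed real height of a passenger car

# The same car measured 15 px tall in one frame and 14 px in the next.
d = [distance_from_bbox(h, FOCAL_PX, CAR_HEIGHT_M) for h in (15.0, 14.0)]
print([f"{x:.1f} m" for x in d])
print("apparent velocity (m/s):", velocity_from_distances(d, dt_s=0.1))
```

Here a single-pixel jitter between two frames 0.1 s apart reads as roughly 143 m/s of apparent motion, which is why velocity needs temporal aggregation or another sensing cue rather than raw bbox differencing.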

SLIDE 22

TuSimple challenges

  • Video-clip based
    – We expect non-trivial temporal aggregation!
  • Confidence based
    – Each entry has a “confidence” field
    – We evaluate only the 80% most confident entries
  • Run-time
    – Must report single-GPU runtime speed
    – Slow algorithms (< 3 fps) will not be included in the leaderboard
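The confidence and run-time rules can be read as a small scoring function. A minimal sketch, assuming each entry is a dict with hypothetical `confidence` and `correct` keys; this illustrates the stated rules and is not the official TuSimple evaluation script.

```python
def evaluate(entries, fps, keep_fraction=0.8, min_fps=3.0):
    """Accuracy over the most confident `keep_fraction` of entries.

    Returns None when the reported single-GPU speed is below `min_fps`,
    i.e. the submission would not be ranked."""
    if fps < min_fps:
        return None
    ranked = sorted(entries, key=lambda e: e["confidence"], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:n_keep]
    return sum(e["correct"] for e in kept) / len(kept)

# Hypothetical submission: the two least confident entries are the wrong ones.
entries = [{"confidence": c / 10, "correct": c > 2} for c in range(1, 11)]
print(evaluate(entries, fps=10.0))
print(evaluate(entries, fps=2.0))
```

The design rewards calibrated confidence: a method that knows which 20% of its outputs are unreliable is not penalized for them.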

SLIDE 23

HTTP://BENCHMARK.TUSIMPLE.AI

Available now!!

SLIDE 24

Xiaodi Hou