3D Attention-Driven Depth Acquisition for Object Identification - PowerPoint PPT Presentation
3D Attention-Driven Depth Acquisition for Object Identification

Kai Xu, Yifei Shi, Lintao Zheng, Junyu Zhang, Min Liu, Hui Huang, Hao Su, Daniel Cohen-Or and Baoquan Chen

National University of Defense Technology, Shandong University, Shenzhen University, SIAT, Stanford University, Tel-Aviv University


SLIDE 1

3D Attention-Driven Depth Acquisition for Object Identification

Kai Xu, Yifei Shi, Lintao Zheng, Junyu Zhang, Min Liu, Hui Huang, Hao Su, Daniel Cohen-Or and Baoquan Chen

National University of Defense Technology, Shandong University, Shenzhen University, SIAT, Stanford University, Tel-Aviv University

SLIDE 2
Background & motivation

  • Robotic indoor scene modeling
  • Perception of objects

SLIDE 3
Background & motivation

  • Indoor environment acquisition and modeling

Dense Reconstruction [Nießner et al. 2013] · Object Extraction [Xu et al. 2015]

SLIDE 4

Background & motivation

What are these objects?

SLIDE 5

SLIDE 6

Active object recognition

SLIDE 7

Active object recognition

SLIDE 8

Problem setting

  • A robot actively acquires new observations to gradually increase the confidence of object recognition

  • Two key components:

Object classification: estimate the object class based on the observations acquired so far

View planning: predict the Next-Best-View (NBV) to maximize its information gain
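The loop formed by these two components can be sketched as follows. This is a minimal illustration, not the paper's system: `observe` and `classify` are hypothetical stand-ins, and the random next-view choice stands in for the learned NBV predictor described later in the talk.

```python
import random

def active_recognition(views, observe, classify,
                       confidence_threshold=0.9, max_steps=5):
    """Observe -> classify -> plan loop: acquire an observation from the
    current view, update the class belief from everything seen so far, and
    move to a new view until the belief is confident or the budget runs out."""
    visited, observations = [], []
    view = views[0]                       # arbitrary initial view
    best_class, conf = None, 0.0
    for _ in range(max_steps):
        visited.append(view)
        observations.append(observe(view))
        belief = classify(observations)   # dict: class -> probability
        best_class, conf = max(belief.items(), key=lambda kv: kv[1])
        if conf >= confidence_threshold:
            break
        unvisited = [v for v in views if v not in visited]
        if not unvisited:
            break
        view = random.choice(unvisited)   # stand-in for a learned NBV policy
    return best_class, conf
```

The stopping rule makes the acquisition budget explicit: the robot moves only while the belief is still uncertain.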

SLIDE 9

The main challenge

  • Observation is partial and progressive
  • Shape description/matching with partial data is hard
  • Observations from varying views
SLIDE 10

The main challenge

  • Observation is partial and progressive
  • View planning

How can you know which view is better without knowing its observation? (one observed view, many unobserved views)

SLIDE 11

The main challenge

  • Real indoor scenes are often cluttered
    • Degrades recognition accuracy
    • Invalidates the off-line learned viewing policy
SLIDE 12

Related work

SLIDE 13

Related work

  • Online scene analysis and modeling

Plane/Object Extraction [Zhang et al. 2014] SemanticPaint [Valentin et al. 2015]

SLIDE 14

Related work

  • Active reconstruction and recognition

Next-best-view for reconstruction [Wu et al. 2014] Next-best-view for recognition [Wu et al. 2015]

SLIDE 15

Method

SLIDE 16

The general framework

SLIDE 17

The general framework

[Diagram: a closed loop. Observe → Recognition (update the belief toward the goal) → View planning → Action → Observe]

SLIDE 18

An attentional formulation

“Humans focus attention selectively on parts of the visual space to acquire information when and where it is needed, and combine information from different fixations over time to build up an internal representation of the scene” – Ronald Rensink

Handwriting recognition [Mnih et al. 2014] · Image caption generation [Xu et al. 2015]

Internal representation

SLIDE 19

Recurrent Attention Model

  • Recurrent Neural Networks (RNN)

h_t = f(W_hh h_{t−1} + W_xh x_t),    y_t = W_hy h_t

The hidden state h_t aggregates information from all inputs x_1, …, x_t over time.
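As a concrete illustration of how the recurrence aggregates observations, here is a vanilla RNN step in NumPy with randomly initialized weights. This is illustrative only; in the paper's model the inputs are learned CNN features, not raw vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
H, X, Y = 8, 4, 3                       # hidden, input, and output sizes
W_hh = rng.normal(scale=0.1, size=(H, H))   # hidden-to-hidden weights
W_xh = rng.normal(scale=0.1, size=(H, X))   # input-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(Y, H))   # hidden-to-output weights

def rnn_step(h_prev, x_t):
    """One recurrence: h_t summarizes all inputs observed up to time t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t                    # per-step output (e.g. class scores)
    return h_t, y_t

h = np.zeros(H)
for x in rng.normal(size=(5, X)):       # five sequential observations
    h, y = rnn_step(h, x)
```

Because the same weights are reused at every step, the model can consume an arbitrary number of views with a fixed parameter count.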

SLIDE 20

View-based observation

[Diagram: at each step t, the camera at view v_t produces a depth observation d^(t) of the object, from the initial observation d^(0) onward]

SLIDE 21

3D Recurrent Attention Model

[Diagram: the model unrolled over time steps t = 1, 2, 3, … Starting from an initial view, each step performs feature extraction (h1) on the current view/depth pair, aggregates the views and classifies, and an NBV-emission layer (h2) outputs the next view to visit.]

SLIDE 22

3D Recurrent Attention Model

Multi-View CNN [Su et al. 2015]: per-view CNN1 features are combined by view pooling (max-pooling) and passed through CNN2 to produce the shape descriptor.

[Diagram: the same unrolled recurrent model as on the previous slide, with the MV-CNN (CNN1 per view, max-pooling, CNN2) serving as the feature-extraction and view-aggregation module at each step.]
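The view-pooling step can be sketched in NumPy. The short vectors below stand in for per-view CNN1 feature maps; they are assumptions for illustration only.

```python
import numpy as np

def view_pool(per_view_features):
    """Element-wise max over views: for each feature dimension, keep its
    strongest response across all views observed so far."""
    return np.max(np.stack(per_view_features), axis=0)

feats = [np.array([0.2, 0.9, 0.1]),     # view 1 features
         np.array([0.7, 0.1, 0.3]),     # view 2 features
         np.array([0.4, 0.5, 0.8])]     # view 3 features
descriptor = view_pool(feats)           # one descriptor for the whole shape
```

Max-pooling makes the descriptor order-independent: it does not matter in which sequence the views arrived.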

SLIDE 23

Network training

The CNN part is trained with back-propagation; depth acquisition (view selection and rendering) is non-differentiable, so it is trained with reinforcement learning.
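The non-differentiable view choice is what forces reinforcement learning here. Below is a minimal REINFORCE sketch on a toy softmax policy over four candidate views; the rewards, learning rate, and step count are made up for illustration and are not the paper's training setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())             # numerically stable softmax
    return e / e.sum()

theta = np.zeros(4)                           # one logit per candidate view
view_reward = np.array([0.1, 0.2, 0.9, 0.3])  # toy reward per view

for _ in range(2000):
    probs = softmax(theta)
    v = rng.choice(4, p=probs)        # sample a view from the current policy
    grad_logp = -probs                # gradient of log pi(v | theta)
    grad_logp[v] += 1.0               # for a softmax policy
    theta += 0.1 * view_reward[v] * grad_logp  # REINFORCE update

best_view = int(np.argmax(softmax(theta)))
```

After training, the policy concentrates its probability mass on the highest-reward view, even though no gradient ever flowed through the sampling step itself.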

SLIDE 24

Reinforcement learning

agent ↔ environment

  • action: depth acquisition
  • reward: how good is the acquired depth?
  • state: stop or continue?

SLIDE 25

Reward:

r_t = I_t(c_t, c) + J_t(c_t, c_{t−1}) − D_t

where I_t measures prediction accuracy, J_t the information gain between successive class predictions, and D_t the movement cost.
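A hedged sketch of such a reward: entropy reduction between successive beliefs stands in for the information-gain term, and all helper choices are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete belief (zero-probability terms dropped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def step_reward(belief_t, belief_prev, true_class, move_cost):
    """r_t = I_t + J_t - D_t: accuracy of the current belief on the true
    class, plus the entropy drop between successive beliefs (information
    gain), minus the cost of moving to the new view."""
    accuracy = belief_t[true_class]                       # I_t
    info_gain = entropy(belief_prev) - entropy(belief_t)  # J_t
    return accuracy + info_gain - move_cost               # D_t

# belief sharpens from 50/50 to 80/20 after a cheap move
r = step_reward(np.array([0.8, 0.2]), np.array([0.5, 0.5]),
                true_class=0, move_cost=0.1)
```

A view that sharpens the belief a lot while being cheap to reach earns a high reward; a costly move that leaves the belief flat is penalized.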

SLIDE 26

Part-level attention

How to distinguish these two chairs? Focus on informative parts.

  • Occlusion
SLIDE 27

Attention extraction

[Diagram: mid-level kernels in a Convolutional Neural Network are used to extract part-level attention]
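One simple way to turn mid-level activations into a spatial attention map is to sum channel responses per location; the toy NumPy sketch below illustrates that idea, not the paper's exact kernel-selection procedure.

```python
import numpy as np

def attention_map(feature_maps):
    """Collapse mid-level feature maps of shape (C, H, W) into one spatial
    attention map by summing absolute channel activations, normalized to 1."""
    a = np.sum(np.abs(feature_maps), axis=0)
    return a / a.max()

fm = 0.1 * np.ones((3, 4, 4))           # toy feature maps: weak response
fm[:, 1, 2] = 5.0                       # strong response at one location
att = attention_map(fm)
peak = np.unravel_index(np.argmax(att), att.shape)  # most attended location
```

The peak of the map marks the image region whose mid-level features respond most strongly, i.e. the candidate informative part.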

SLIDE 28

Attention extraction

One wing Two wings

SLIDE 29

Results and evaluation

SLIDE 30

Database

  • 57,452 models, 57 categories (ShapeNet) and 12,311 models, 40 categories (ModelNet40)
  • Each model is rendered (with jittering) from sampled views: 52 and 260 sampled views

SLIDE 31

Timing

Database      MV-RNN train   MV-RNN test
ShapeNet      49 hr.         0.1 sec.
ModelNet40    22 hr.         0.1 sec.

SLIDE 32

Visualization of attentions

[Figures: part-level attention and the corresponding view sequences]

SLIDE 33

NBV estimation

[Plot: classification accuracy of NBV estimation, 40 classes]

SLIDE 34

NBV estimation under occlusion

[Plot: classification accuracy of NBV estimation under occlusion]

SLIDE 35

Results on real scenes

SLIDE 36

Results on real scenes

SLIDE 37

Results on real scenes

SLIDE 38

Limitations

  • Recognizable objects
  • No contextual information
SLIDE 39

Future work: Multi-modal recognition

What is this?

Image database Shape database

SLIDE 40

Future: Multi-robot scene reconstruction & understanding


Turtlebot PR2 AscTec Pelican

SLIDE 41

Future: Multi-robot attention model


Attention based on shared internal representation?

SLIDE 42

Thank you Q & A

More details: kevinkaixu.net & yifeishi.net