Deep learning on graph for semantic segmentation of point cloud - PowerPoint PPT Presentation
SLIDE 1

Introduction Model Results Conclusion

Deep learning on graph for semantic segmentation of point cloud

Alexandre Cherqui
Master in Electrical and Electronics Engineering
Master Thesis, LTS2, EPFL and Picterra
Supervisors: Michaël Defferrard (LTS2), Frank De Morsier (Picterra)

July 9th, 2018

SLIDE 2

Table of contents

1. Introduction: Motivation · Semantic segmentation · Prior art on images · From images to graphs
2. Model: Build a graph · Graph convolutions · Coarsening and pooling · Model architecture
3. Results: Available data · Data preprocessing · Performances of our model
4. Conclusion

SLIDE 3

Table of contents (repeat of slide 2, with the Introduction section highlighted)

SLIDE 4

Origins of the project

There is a need to survey the territory. Aerial images taken from satellites or drones can be combined to obtain a 3D representation and thus better recognize objects. So far, however, the labeling has been done manually. This project is a collaboration with the startup Picterra to automate the task.

Figure: aerial images from a drone.

SLIDE 5

The problem of semantic segmentation

Deep learning can be used for tasks at different levels of granularity:
- Image classification: very coarse level
- Object detection: coarse level
- Semantic segmentation: fine level

Figure: (a) illustration of detection, (b) illustration of semantic segmentation. Two problems which can be tackled with deep learning methods.

Semantic segmentation: perform a dense labelling, i.e. assign a class to every pixel (or, here, every 3D point).

SLIDE 6

Prior art on images

Patch-based approaches, then parallelized: from CNNs [1] to FCNs [2].

Figures: CNN architecture; FCN architecture.

SLIDE 7

Prior art on images (cont.)

Learn the upsampling:
(a) DeconvNet [3]
(b) SegNet [4]

Learn at different scales:
(c) U-Net [5]
(d) PSPNet [6]

SLIDE 8

From images to graphs

Our goal: semantic segmentation of 3D point clouds.
Some architectures directly extend what exists on images, e.g. 3D-CNNs [7], but they are neither well suited nor efficient for such sparse data.
→ Graphs can efficiently represent these data:
+ efficient computations
+ capture the local neighborhood

SLIDE 9

Table of contents (repeat of slide 2, with the Model section highlighted)

SLIDE 10

Build a graph from a cloud

Figure: mesh generation on a car.

Edges are weighted with a Gaussian kernel on the pairwise distances:

w_{i,j} = \exp\left( -\frac{d_{i,j}^2}{2\sigma^2} \right)

Figure: adjacency matrix of the car.
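The construction above can be sketched in a few lines of NumPy/SciPy: connect each point to its nearest neighbors and weight the edges with the Gaussian kernel. The choice of k = 8 and the bandwidth heuristic (sigma = mean neighbor distance) are illustrative assumptions, not values taken from the thesis.

```python
# Sketch: k-NN graph over a 3D point cloud with Gaussian edge weights.
import numpy as np
from scipy.spatial import cKDTree
from scipy import sparse

def build_graph(points, k=8):
    """Return a symmetric sparse adjacency matrix W with Gaussian weights."""
    n = len(points)
    tree = cKDTree(points)
    # Query k+1 neighbors because each point is its own nearest neighbor.
    dist, idx = tree.query(points, k=k + 1)
    dist, idx = dist[:, 1:], idx[:, 1:]        # drop self-loops
    sigma = dist.mean()                        # bandwidth heuristic (assumption)
    w = np.exp(-dist**2 / (2 * sigma**2))      # w_ij = exp(-d_ij^2 / 2 sigma^2)
    rows = np.repeat(np.arange(n), k)
    W = sparse.csr_matrix((w.ravel(), (rows, idx.ravel())), shape=(n, n))
    return W.maximum(W.T)                      # symmetrize the directed k-NN graph

rng = np.random.default_rng(0)
W = build_graph(rng.normal(size=(100, 3)))
```

Symmetrizing with the elementwise maximum keeps an edge whenever either endpoint selected the other as a neighbor, which is a common convention for k-NN graphs.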

SLIDE 11

Graph convolutions: from spectral to spatial domain

Combinatorial Laplacian and its eigendecomposition:

L = D - W,  L = U \Lambda U^T

Graph Fourier transform and its inverse:

\hat{x} = F_G\{x\} = U^T x,   \tilde{x} = F_G^{-1}\{\hat{x}\} = U \hat{x} = x

For s \in \mathbb{R}^n and x \in \mathbb{R}^n, the graph convolution is defined in the spectral domain:

s *_G x = F_G^{-1}\{ F_G\{x\} \odot F_G\{s\} \}
        = U (U^T x \odot U^T s)
        = U \, \mathrm{diag}(\hat{x}) \, U^T s
        = U \, \mathrm{diag}(\hat{x}(\lambda_1), \ldots, \hat{x}(\lambda_n)) \, U^T s   [8]
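The spectral definition above translates directly into NumPy for a small dense graph. This is an illustration of the definition only (it costs an O(n^3) eigendecomposition); the 4-node path graph and the identity-kernel check are assumptions for the demo.

```python
# Sketch: graph convolution via the graph Fourier transform.
import numpy as np

def graph_convolve(W, s, x):
    """s *_G x = U (U^T x  ⊙  U^T s) for the combinatorial Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W      # L = D - W
    lam, U = np.linalg.eigh(L)          # L = U Λ U^T (eigh: L is symmetric)
    x_hat = U.T @ x                     # graph Fourier transform of the kernel
    s_hat = U.T @ s                     # ... and of the signal
    return U @ (x_hat * s_hat)          # inverse GFT of the pointwise product

# 4-node path graph; a kernel whose spectrum is all ones acts as the identity.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
s = np.array([1.0, 2.0, 3.0, 4.0])
lam, U = np.linalg.eigh(np.diag(W.sum(1)) - W)
x_identity = U @ np.ones(4)             # \hat{x}(\lambda_i) = 1 for all i
print(np.allclose(graph_convolve(W, s, x_identity), s))  # True
```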

SLIDE 12

Graph convolutions: from spectral to spatial domain (cont.)

Parametrize the filter as a truncated Chebyshev expansion of the eigenvalues [9]:

\forall i,\ \hat{x}(\lambda_i) = \sum_{j=0}^{K-1} \theta_j T_j(\lambda_i)

so that the convolution becomes a polynomial of the Laplacian, with no eigendecomposition required:

s *_G x = U \left( \sum_{j=0}^{K-1} \theta_j T_j(\Lambda) \right) U^T s = \sum_{j=0}^{K-1} \theta_j T_j(L) \, s

Powers of L only mix local neighborhoods:

(L s)_p = \sum_{j \in N(p)} l_{pj} s_j,   (L^2 s)_p = \sum_{k \in N(p)} l_{pk} \sum_{j \in N(k)} l_{kj} s_j

so T_j(L) s depends only on the j-hop neighborhood of each node. With N_in input and N_out output feature maps, one layer computes:

\forall p \in [1..n],\ \forall k \in [1..N_{out}]:\quad S_{out}(p, k) = \sum_{i=1}^{N_{in}} \sum_{j=0}^{K-1} \theta^k_{i,j} \, (T_j(L) s_i)(p)
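The Chebyshev evaluation above can be sketched with the three-term recurrence T_0 s = s, T_1 s = L̃s, T_j s = 2 L̃ T_{j-1} s − T_{j-2} s. Following [9], the Laplacian is rescaled to L̃ = 2L/λ_max − I so its spectrum lies in [−1, 1] (this rescaling is implicit on the slide); only matrix-vector products are needed. The ring graph and the filter coefficients are illustrative assumptions.

```python
# Sketch: K-localized Chebyshev filtering of a graph signal.
import numpy as np

def chebyshev_filter(L, s, theta):
    """Evaluate sum_j theta_j T_j(L~) s via the Chebyshev recurrence (K >= 2)."""
    # lambda_max computed exactly here for simplicity; in practice it is
    # estimated cheaply (e.g. by power iteration).
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(len(L))   # spectrum mapped into [-1, 1]
    T_prev, T_curr = s, L_tilde @ s                # T_0(L~)s and T_1(L~)s
    out = theta[0] * T_prev + theta[1] * T_curr
    for theta_j in theta[2:]:
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out += theta_j * T_curr
    return out

# K=3 filter on a small ring graph.
W = np.roll(np.eye(5), 1, axis=0) + np.roll(np.eye(5), -1, axis=0)
L = np.diag(W.sum(1)) - W
s = np.arange(5.0)
y = chebyshev_filter(L, s, theta=np.array([0.5, 1.0, -0.3]))
```

The result matches the explicit spectral evaluation U diag(Σ_j θ_j T_j(λ̃_i)) Uᵀ s, which is how the equivalence on the slide can be checked numerically.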

SLIDE 13

Form a binary tree to ease the pooling operation

(a) Match nodes with respect to their edge weights at the different levels of coarsening.
(b) Reorder the nodes so that the union of two matched neighbors from layer to layer forms a binary tree (adding fake nodes F if needed).

Figure: forming a binary tree to ease the pooling operation.

With this ordering, pooling two matched nodes on the graph reduces to a standard 1D pooling of size 2 on the reordered signal.
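The pooling trick can be sketched as follows: once the nodes are reordered so matched pairs sit next to each other, graph max-pooling is a 1D max-pool of size 2, and fake nodes are padded with −inf so they never win the max. The indices and signal values below are illustrative assumptions.

```python
# Sketch: max-pooling a reordered graph signal with fake nodes.
import numpy as np

def graph_max_pool(signal, fake_mask):
    """Max-pool a reordered graph signal of even length by adjacent pairs."""
    padded = np.where(fake_mask, -np.inf, signal)  # fake nodes can never be selected
    return padded.reshape(-1, 2).max(axis=1)

# 6 reordered nodes, two of them fake (F):
signal = np.array([3.0, 1.0, 5.0, 0.0, 2.0, 0.0])
fake = np.array([False, False, False, True, False, True])
print(graph_max_pool(signal, fake))   # [3. 5. 2.]
```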

SLIDE 14

Our architecture

An encoder-decoder over the graph, with feature widths 64 → 128 → 256 → 512 → 256 → 128 → 64, operating on RGBZ node features:
- Input block: BN + graph conv (K=5) + BN + ReLU
- Downsampling blocks: max pooling (size 4) + graph conv (K=5) + BN + ReLU
- Upsampling blocks: unpooling with repetitions + graph conv (K=5) + BN
- Graph conv (K=5) + BN
- Output: graph conv (K=1) + softmax

Figure: model architecture (each colored box is a set of nodes with N features); spectral distances between colors relate to spatial distances between intra- and inter-layer real nodes.

SLIDE 15

Table of contents (repeat of slide 2, with the Results section highlighted)

SLIDE 16

Available data

Figures: (a) dataset (RGBZ), (b) dataset (labelled).
Cadastre: dataset provided by Pix4D. From 2D to 3D thanks to photogrammetry.

Highly imbalanced class distribution:

Class            | Proportion
Ground           | 50.64%
High vegetation  | 12.81%
Building         | 13.25%
Road             | 20.90%
Car              |  0.43%
Man-made objects |  1.98%

SLIDE 17

Data preprocessing

Tiling of the dataset into tiles of 36 m × 36 m (48 m × 48 m with the context).

Figure: illustration of the tile split. Dark green tiles form the training set (50%), dark blue the validation set (16%), and dark red the test set (34%); the other colors correspond to areas where the tiles overlap.
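The tiling step can be sketched as assigning each point to a 36 m tile by its xy coordinates, then gathering the 48 m window (tile plus 6 m of context on each side) around a tile. Only the tile and window sizes come from the slide; the function names and toy coordinates are assumptions.

```python
# Sketch: split a point cloud into 36 m tiles with a 6 m context margin.
import numpy as np

TILE, CONTEXT = 36.0, 6.0            # 36 m tiles; 36 + 2*6 = 48 m with context

def tile_points(xy):
    """Map each point to its integer (i, j) tile index."""
    return np.floor(xy / TILE).astype(int)

def points_in_window(xy, ij):
    """Boolean mask of points inside tile ij extended by the context margin."""
    lo = np.asarray(ij) * TILE - CONTEXT
    hi = lo + TILE + 2 * CONTEXT
    return np.all((xy >= lo) & (xy < hi), axis=1)

xy = np.array([[1.0, 1.0], [35.0, 2.0], [40.0, 2.0], [75.0, 75.0]])
print(tile_points(xy))               # tiles (0,0), (0,0), (1,0), (2,2)
mask = points_in_window(xy, (0, 0))  # (40, 2) lies in the 6 m context of tile (0,0)
print(mask)                          # [ True  True  True False]
```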

SLIDE 18

Baselines and extra features

- Random forest: 100 trees, max depth 30, class-weighted.
- XGBoost: 100 trees, max depth 5, learning rate 0.2, weighted samples.
- Extra features selected with a random forest: 3D aspect at scales 0.3 m, 1.5 m, 3 m and 10 m, plus the angle between the normals and the xy plane.

Figure: feature selection with respect to feature importances (in %) for the random forest.
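The "class weighted" idea behind the baselines can be sketched as weighting each class inversely to its frequency (the same rule scikit-learn uses for `class_weight='balanced'`), so that Car (0.43%) is not drowned out by Ground (50.64%). The proportions come from slide 16; the exact weighting scheme used in the thesis is not specified, so this is one plausible choice.

```python
# Sketch: inverse-frequency class weights for the imbalanced cadastre classes.
props = {                 # class proportions from the cadastre dataset (slide 16)
    "Ground": 0.5064, "High vegetation": 0.1281, "Building": 0.1325,
    "Road": 0.2090, "Car": 0.0043, "Man-made objects": 0.0198,
}
K = len(props)
weights = {c: 1.0 / (K * p) for c, p in props.items()}   # w_c = 1 / (K * freq_c)
print(f"Car weight: {weights['Car']:.1f}")               # Car weight: 38.8
```

With this rule a perfectly balanced dataset would give every class weight 1; here Ground gets about 0.33 while Car gets about 38.8, roughly a 118x ratio.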

SLIDE 19

Performances on the cadastre with RGBZ

Performances on the test set of the cadastre with RGBZ:

Method         | Overall accuracy (%) | Mean accuracy (%)
Random Forest  | 74.93                | 52.92
XGBoost        | 64.68                | 59.44
Our model      | 85.85                | 68.09
Majority class | 47.65                | 16.67

Confusion matrices computed on the test set of the cadastre with RGBZ (rows: true labels; columns: predicted labels).

(a) Random Forest

True \ Predicted | Ground | High veg. | Building | Road   | Car  | Man-made obj.
Ground           | 788280 | 130824    | 20146    | 22300  | 552  | 4743
High veg.        | 86530  | 146762    | 6109     | 3032   | 228  | 1114
Building         | 16230  | 8782      | 200333   | 24093  | 1718 | 6862
Road             | 32372  | 3624      | 72009    | 378941 | 2323 | 18372
Car              | 1830   | 189       | 5014     | 2275   | 1413 | 645
Man-made obj.    | 11261  | 3504      | 11818    | 9750   | 462  | 4579

(b) XGBoost

True \ Predicted | Ground | High veg. | Building | Road   | Car   | Man-made obj.
Ground           | 596373 | 271911    | 16552    | 19631  | 13798 | 48580
High veg.        | 40939  | 184886    | 5372     | 2387   | 2749  | 7442
Building         | 3601   | 12483     | 163870   | 26271  | 23527 | 28266
Road             | 4973   | 2552      | 12277    | 344149 | 49086 | 94604
Car              | 675    | 191       | 1245     | 1757   | 4997  | 2501
Man-made obj.    | 3274   | 4261      | 3864     | 6878   | 4955  | 18142

(c) Our model

True \ Predicted | Ground | High veg. | Building | Road   | Car  | Man-made obj.
Ground           | 846658 | 57576     | 22981    | 31815  | 1107 | 5795
High veg.        | 35099  | 201098    | 3110     | 2548   | 99   | 1642
Building         | 6303   | 6161      | 217960   | 24277  | 682  | 2355
Road             | 19299  | 6349      | 15653    | 462214 | 2354 | 2067
Car              | 347    | 233       | 2828     | 1575   | 4896 | 1525
Man-made obj.    | 7121   | 5423      | 14857    | 3545   | 2187 | 8191
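The two metrics reported above can be recovered from a confusion matrix: overall accuracy is the trace over the total count, and mean accuracy is the average per-class recall (diagonal over row sums). Running this on the Random Forest matrix from this slide reproduces the table.

```python
# Sketch: overall and mean (per-class) accuracy from a confusion matrix
# (rows = true labels, columns = predictions). Matrix from slide 19 (a).
import numpy as np

C = np.array([
    [788280, 130824,  20146,  22300,   552,  4743],   # Ground
    [ 86530, 146762,   6109,   3032,   228,  1114],   # High veg.
    [ 16230,   8782, 200333,  24093,  1718,  6862],   # Building
    [ 32372,   3624,  72009, 378941,  2323, 18372],   # Road
    [  1830,    189,   5014,   2275,  1413,   645],   # Car
    [ 11261,   3504,  11818,   9750,   462,  4579],   # Man-made objects
])
overall = np.trace(C) / C.sum()                 # correct predictions / all points
mean_acc = (np.diag(C) / C.sum(axis=1)).mean()  # average of per-class recalls
print(f"{overall:.2%}  {mean_acc:.2%}")         # 74.93%  52.92%
```

Mean accuracy makes the imbalance visible: the Car and Man-made-objects recalls (about 12% and 11%) pull it far below the overall accuracy.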

SLIDE 20

Performances on the cadastre with extra features

Performances on the test set of the cadastre with extra features:

Method         | Overall accuracy (%) | Mean accuracy (%)
Random Forest  | 87.61                | 63.53
XGBoost        | 83.78                | 73.83
Our model      | 86.63                | 71.83
Majority class | 47.65                | 16.67

Confusion matrices computed on the test set (cadastre) with extra features (rows: true labels; columns: predicted labels).

(a) Random Forest

True \ Predicted | Ground | High veg. | Building | Road   | Car  | Man-made obj.
Ground           | 891912 | 30474     | 22745    | 18528  | 250  | 2936
High veg.        | 32726  | 197061    | 12127    | 699    | 66   | 1096
Building         | 26174  | 8499      | 215611   | 5362   | 205  | 2167
Road             | 27568  | 1384      | 10036    | 463385 | 1719 | 3549
Car              | 2293   | 411       | 2222     | 3896   | 1554 | 990
Man-made obj.    | 11774  | 4476      | 9765     | 6696   | 568  | 8095

(b) XGBoost

True \ Predicted | Ground | High veg. | Building | Road   | Car  | Man-made obj.
Ground           | 814193 | 63124     | 24420    | 23253  | 7346 | 34509
High veg.        | 10057  | 213879    | 7028     | 529    | 2136 | 10146
Building         | 8199   | 7299      | 207669   | 7544   | 3009 | 24298
Road             | 7582   | 1036      | 5984     | 436208 | 26722| 30109
Car              | 387    | 162       | 468      | 1266   | 5832 | 3251
Man-made obj.    | 2598   | 4302      | 4305     | 3824   | 4285 | 22060

(c) Our model

True \ Predicted | Ground | High veg. | Building | Road   | Car  | Man-made obj.
Ground           | 819496 | 33959     | 31887    | 64030  | 445  | 17028
High veg.        | 28018  | 201935    | 6453     | 4608   | 64   | 2697
Building         | 4438   | 4315      | 229129   | 17839  | 193  | 2104
Road             | 9972   | 2887      | 1893     | 487711 | 1167 | 4011
Car              | 198    | 44        | 637      | 2955   | 4902 | 2630
Man-made obj.    | 7964   | 3409      | 5592     | 7473   | 2304 | 14632

SLIDE 21

Qualitative results on the cadastre with RGBZ

Figures: (a) test set, (b) ground truth, (c) predictions. Qualitative results of our model on the test set.

SLIDE 22

Performances on another dataset

Performances on the test set from Picterra's dataset with RGBZ:

Method         | Overall accuracy (%) | Mean accuracy (%)
Random Forest  | 82.01                | 63.38
XGBoost        | 78.30                | 66.20
Our model      | 87.47                | 87.57
Majority class | 51.22                | 25.00

Confusion matrices computed on the test set from Picterra with RGBZ (rows: true labels; columns: predicted labels).

(a) Random Forest

True \ Predicted | Ground | High veg. | Pylons | Powerlines
Ground           | 424057 | 115348    | 501    | 5
High veg.        | 78368  | 511921    | 612    | 353
Pylons           | 430    | 2198      | 3333   | 3418
Powerlines       | 2      | 1933      | 4544   | 7260

(b) XGBoost

True \ Predicted | Ground | High veg. | Pylons | Powerlines
Ground           | 402586 | 119247    | 18072  | 6
High veg.        | 80742  | 488560    | 15157  | 6795
Pylons           | 49     | 883       | 4588   | 3859
Powerlines       | (row garbled in extraction: the three recovered values are 241, 5433 and 8065; since the row must sum to the 13739 Powerlines points, the fourth value is 0, but its column is unrecoverable)

(c) Our model

True \ Predicted | Ground | High veg. | Pylons | Powerlines
Ground           | 468800 | 49811     | 6305   | 14995
High veg.        | 69942  | 520144    | 945    | 223
Pylons           | 112    | 124       | 7243   | 1900
Powerlines       | 39     | 177       | 23     | 13500

SLIDE 23

Qualitative results on the test set from Picterra with RGBZ

Figures: (a) test set, (b) ground truth, (c) predictions. Qualitative results of our model on the test set from Picterra.

SLIDE 24

Performances inter-dataset

Performances on a dataset from Picterra with RGB:

Method         | Overall accuracy (%) | Mean accuracy (%)
Random Forest  | 71.32                | 43.57
XGBoost        | 75.34                | 52.86
Our model      | 95.15                | 84.07
Majority class | 54.66                | 25.00

Confusion matrices computed on a dataset from Picterra with RGB (rows: true labels; columns: predicted labels).

(a) Random Forest

True \ Predicted | Ground | High veg. | Pylons | Powerlines
Ground           | 584726 | 162771    | 19004  | 27529
High veg.        | 100536 | 447082    | 37208  | 39859
Pylons           | 1285   | 2017      | 1039   | 1455
Powerlines       | 12709  | 9054      | 3226   | 3135

(b) XGBoost

True \ Predicted | Ground | High veg. | Pylons | Powerlines
Ground           | 617705 | 143727    | 19895  | 12703
High veg.        | 104818 | 471013    | 14326  | 34528
Pylons           | 776    | 1226      | 2779   | 1015
Powerlines       | 10433  | 7635      | 7165   | 2891

(c) Our model

True \ Predicted | Ground | High veg. | Pylons | Powerlines
Ground           | 735971 | 39190     | 123    | 18746
High veg.        | 8956   | 615528    | 188    | 13
Pylons           | 308    | 355       | 2619   | 2514
Powerlines       | 12     | 2         | 21     | 28089

SLIDE 25

Qualitative results on a dataset from Picterra with RGB

Figures: (a) test set, (b) ground truth, (c) predictions. Qualitative results of our model on a dataset from Picterra.

SLIDE 26

Conclusion

Summing up:
- A model for semantic segmentation of aerial photogrammetry point clouds.
- Better results than random forest or XGBoost, with a reduced number of features.

Future work:
- Dilated convolutions and skip connections.
- Learning on other graphs.

SLIDE 27

References

[1] Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. Object Recognition with Gradient-Based Learning, pages 319–345. Springer Berlin Heidelberg, Berlin, Heidelberg, 1999.
[2] Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully convolutional networks for semantic segmentation. IEEE, pages 1–12, May 2016.
[3] Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation. CoRR, abs/1505.04366, 2015.
[4] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. CoRR, abs/1511.00561, 2015.

SLIDE 28

References (cont.)

[5] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
[6] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. CoRR, abs/1612.01105, 2016.
[7] Jing Huang and Suya You. Point cloud labeling using 3D convolutional neural network. Pages 2670–2675, Dec 2016.
[8] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. Signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular data domains. CoRR, abs/1211.0053, 2012.

SLIDE 29

References (cont.)

[9] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. CoRR, abs/1606.09375, 2016.