Semantic segmentation
SLIDE 1

1 CV3DST | Prof. Leal-Taixé

Semantic segmentation

SLIDE 2

Task definition: semantic segmentation


Classification: classify the main object in the image. Semantic segmentation: no objects, just classify each pixel (CAT, GRASS, TREE, SKY).

SLIDE 3

Semantic Segmentation


  • Every pixel in the image needs to be labelled with a category label.
  • Do not differentiate between the instances (see how we do not differentiate between pixels coming from different cows).

SLIDE 4

Fully Convolutional Networks

SLIDE 5

Fully convolutional neural networks

  • An FCN is able to deal with any input/output size

Long, Shelhamer, Darrell - Fully Convolutional Networks for Semantic Segmentation, CVPR 2015, PAMI 2016

SLIDE 6

Fully convolutional neural networks

  1. Replace FC layers with convolutional layers.
  2. Convert the last layer output to the original resolution.
  3. Compute a softmax cross-entropy loss between the pixelwise predictions and the segmentation ground truth.
  4. Backprop and SGD.
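The "convolutionalization" in step 1 can be sketched in plain NumPy: a fully connected layer applied at every spatial location is exactly a 1x1 convolution. The shapes below are hypothetical (e.g. 21 output classes, as in Pascal VOC):

```python
import numpy as np

# Hypothetical shapes for illustration: 512 backbone channels, 21 classes.
C_in, C_out, H, W = 512, 21, 8, 8
rng = np.random.default_rng(0)
W_fc = rng.standard_normal((C_out, C_in))   # weights of the former FC layer
feat = rng.standard_normal((C_in, H, W))    # backbone feature map (C, H, W)

# "Convolutionalization": reuse W_fc as a 1x1 conv kernel at every pixel.
out_conv = np.einsum('oc,chw->ohw', W_fc, feat)

# Same result as applying the FC layer at each spatial location separately.
out_fc = np.empty((C_out, H, W))
for i in range(H):
    for j in range(W):
        out_fc[:, i, j] = W_fc @ feat[:, i, j]

assert out_conv.shape == (C_out, H, W)
assert np.allclose(out_conv, out_fc)
```

Because the weights no longer depend on the spatial size, the resulting network accepts inputs of any resolution.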

SLIDE 7

1x1 Convolutions!

“Convolutionalization”

SLIDE 8

“Convolutionalization”

See a more detailed explanation in this Quora answer.

SLIDE 9

Semantic Segmentation (FCN)

  • Fully Convolutional Networks for Semantic Segmentation

How do we upsample?

Long, Shelhamer, Darrell - Fully Convolutional Networks for Semantic Segmentation, CVPR 2015, PAMI 2016

SLIDE 10

Network's architecture

Predict the segmentation mask from high level features

SLIDE 11

Network's architecture

Predict the segmentation mask from high-level features. Predict the segmentation mask from mid-level features.

SLIDE 12

Network's architecture

Predict the segmentation mask from high-level features. Predict the segmentation mask from mid-level features. Predict the segmentation mask from low-level features.

SLIDE 13

Network's architecture

Hierarchical training: the network is initially trained only on high-level features, and then fine-tuned on mid- and low-level features.

SLIDE 14

Network's architecture

This is important because it allows the network to also learn the mid- and low-level details of the image, in addition to the high-level ones.

SLIDE 15

Qualitative results

Good. Better. Best.

SLIDE 16

Qualitative results

SDS is an R-CNN-based method, i.e., it uses object proposals. In general, FCN significantly outperforms (both qualitatively and quantitatively) pre-deep-learning and quasi-deep-learning methods, and is recognized as the AlexNet of semantic segmentation.

SLIDE 17

Autoencoder-style architecture

SLIDE 18

SegNet

  • Step-wise upsampling

Badrinarayanan et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

SLIDE 19

SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters

Badrinarayanan et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

SLIDE 20

SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters

Badrinarayanan et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

SLIDE 21

SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters
  • The convolutional filters in the decoder are learned using backprop; their goal is to refine the upsampling.

Badrinarayanan et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

SLIDE 22

Transposed convolution

  • Transposed convolution
  • Unpooling
  • Convolution filter (learned)
  • Also called up-convolution (never deconvolution)

Input: 3x3. Output: 5x5.
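The 3x3 input → 5x5 output example can be sketched as a naive transposed convolution, where each input pixel "stamps" a scaled copy of the (learned) kernel onto the output. This is a sketch of the operation, not the torch.nn.ConvTranspose2d implementation; the kernel values here are placeholders:

```python
import numpy as np

def transposed_conv2d(x, k, stride=1):
    """Naive transposed convolution: each input pixel stamps a scaled
    copy of the kernel onto the output grid."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros(((H - 1) * stride + kh, (W - 1) * stride + kw))
    for i in range(H):
        for j in range(W):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * k
    return out

x = np.arange(9.0).reshape(3, 3)   # 3x3 input, as on the slide
k = np.ones((3, 3))                # stand-in for a learned 3x3 filter
y = transposed_conv2d(x, k)
assert y.shape == (5, 5)           # 3x3 input -> 5x5 output (stride 1)
```

With stride s the output grows to (in - 1) * s + kernel, which is why transposed convolutions are used to upsample.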

SLIDE 23

SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters
  • Softmax layer: the output of the softmax classifier is a K-channel image of probabilities, where K is the number of classes.

Badrinarayanan et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

SLIDE 24

Upsampling

SLIDE 25

Types of upsampling

  • 1. Interpolation

SLIDE 26

Types of upsampling

  • 1. Interpolation

Original image; nearest-neighbor interpolation; bilinear interpolation; bicubic interpolation.

Image: Michael Guerzhoy
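Nearest-neighbor interpolation, the simplest of the three, is just pixel repetition; a minimal NumPy sketch:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling: repeat each pixel factor x factor times."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = upsample_nearest(x, 2)
assert y.shape == (4, 4)
assert (y[:2, :2] == 1.0).all()   # top-left pixel copied into a 2x2 block
```

Bilinear and bicubic interpolation instead blend neighboring pixels, which gives smoother results at slightly higher cost.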

SLIDE 27

Types of upsampling

  • 1. Interpolation: few artifacts.

SLIDE 28

Types of upsampling

  • 2. Fixed unpooling (followed by convs); efficient.

A. Dosovitskiy et al., “Learning to Generate Chairs, Tables and Cars with Convolutional Networks”. TPAMI 2017

SLIDE 29

Types of upsampling

  • 3. Unpooling “à la DeconvNet”: keep the locations where the max came from.

Zeiler and Fergus, “Visualizing and understanding convolutional neural networks”. ECCV 2014
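The DeconvNet-style unpooling can be sketched by having the pooling layer remember its argmax locations and the unpooling layer scatter values back to exactly those positions (function names here are illustrative; PyTorch offers this via return_indices / MaxUnpool2d):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling that also records where each max came from."""
    H, W = x.shape
    pooled = np.zeros((H // 2, W // 2))
    idx = np.zeros((H // 2, W // 2), dtype=int)  # flat index into x
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            win = x[i:i+2, j:j+2]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            pooled[i//2, j//2] = win[r, c]
            idx[i//2, j//2] = (i + r) * W + (j + c)
    return pooled, idx

def max_unpool(pooled, idx, shape):
    """DeconvNet-style unpooling: place each value back at its max
    location; zeros everywhere else."""
    out = np.zeros(shape).ravel()
    out[idx.ravel()] = pooled.ravel()
    return out.reshape(shape)

x = np.array([[1., 9., 2., 1.],
              [3., 4., 8., 0.],
              [5., 1., 0., 2.],
              [6., 2., 3., 7.]])
pooled, idx = max_pool_with_indices(x)
y = max_unpool(pooled, idx, x.shape)
assert pooled.tolist() == [[9., 8.], [6., 7.]]
assert y[0, 1] == 9.0              # the 9 returns to its original position
```

The sparse output is then densified by the learned convolutions that follow.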

SLIDE 30

Types of upsampling

  • 3. Unpooling “à la DeconvNet”: keeps the details of the structures.

SLIDE 31

Skip connections (U-Net)

SLIDE 32

Skip Connections

  • U-Net

O. Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation”. MICCAI 2015

Pass the low-level information forward and combine it with the high-level information (recall ResNet).

SLIDE 33

Skip Connections

  • U-Net: zoom in

O. Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation”. MICCAI 2015

The skip connections append encoder features to the decoder features.
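The "append" step is a channel-wise concatenation; a minimal sketch with assumed shapes (64-channel encoder and decoder features at 32x32 resolution):

```python
import numpy as np

# U-Net-style skip connection (a sketch): the upsampled decoder features
# and the matching encoder features are appended along the channel axis.
enc = np.random.default_rng(0).standard_normal((64, 32, 32))  # encoder (C,H,W)
dec = np.random.default_rng(1).standard_normal((64, 32, 32))  # upsampled decoder

skip = np.concatenate([enc, dec], axis=0)   # channels stack: 64 + 64 = 128
assert skip.shape == (128, 32, 32)
```

Unlike the ResNet-style addition, concatenation keeps both feature sets intact and lets the following convolutions learn how to mix them.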

SLIDE 34

Skip Connections

  • Concatenation connections

C. Hazirbas et al., “Deep depth from focus”. ACCV 2018
SLIDE 35

DeepLab

SLIDE 36

DeepLab

SLIDE 37

Semantic Segmentation: 3 challenges

  • Reduced feature resolution
    – Proposed solution: atrous convolutions
  • Objects exist at multiple scales
    – Proposed solution: pyramid pooling, as in detection
  • Poor localization of the edges
    – Proposed solution: refinement with a Conditional Random Field (CRF)

SLIDE 38

Semantic Segmentation: 3 challenges

  • Reduced feature resolution
    – Proposed solution: atrous convolutions
  • Objects exist at multiple scales
    – Proposed solution: pyramid pooling, as in detection
  • Poor localization of the edges
    – Proposed solution: refinement with a Conditional Random Field (CRF)

SLIDE 39

Wish: no reduced feature resolution

Pixels in: width x height x RGB. Pixels out: width x height x classes. Just convs & activations (a fully convolutional network). Super expensive!

SLIDE 40

Alternative: dilated (atrous) convolutions

Sparse feature extraction with standard convolution on a low-resolution input feature map. Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.

SLIDE 41

Alternative: dilated (atrous) convolutions

Sparse feature extraction with standard convolution on a low-resolution input feature map. Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.

SLIDE 42

Dilated (atrous) convolutions in 1D

(a) Sparse feature extraction with standard convolution on a low-resolution input feature map. (b) Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.

SLIDE 43

Dilated (atrous) convolutions in 2D

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=2)
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=2)

Standard convolution has dilation 1. An analogy for a dilated conv is a conv filter with holes.

SLIDE 44

Dilated (atrous) convolutions in 2D

(a) With dilation 1, each element produced by the filter has a receptive field of 3x3. (b) With dilation 2, each element has a receptive field of 7x7. (c) With dilation 4, each element has a receptive field of 15x15.

SLIDE 45

Dilated (atrous) convolutions in 2D

Each layer has the same number of parameters, but the receptive field grows exponentially while the number of parameters grows linearly.
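This growth can be checked in a few lines: with kernel size 3 and dilation d, a layer enlarges the receptive field by 2d, so stacking dilations 1, 2, 4 reproduces the 3x3, 7x7, 15x15 receptive fields from the previous slide:

```python
# Receptive-field growth for stacked 3x3 convs with the given dilations.
# With kernel size 3 and dilation d, the effective kernel spans 2*d + 1
# pixels, so each layer adds 2*d to the receptive field.
def receptive_fields(dilations):
    rf, out = 1, []
    for d in dilations:
        rf += 2 * d
        out.append(rf)
    return out

assert receptive_fields([1, 2, 4]) == [3, 7, 15]   # doubling dilations: exponential growth
assert receptive_fields([1, 1, 1]) == [3, 5, 7]    # standard convs: linear growth
```

Each case uses three layers of 9 weights; only the spacing of the taps differs.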

SLIDE 46

Semantic Segmentation: 3 challenges

  • Reduced feature resolution
    – Proposed solution: atrous convolutions
  • Objects exist at multiple scales
    – Proposed solution: pyramid pooling, as in detection
  • Poor localization of the edges
    – Proposed solution: refinement with a Conditional Random Field (CRF)

SLIDE 47

Conditional Random Fields (CRF)

Slide credit: Philipp Krahenbuhl

SLIDE 48

Effect of the number of iterations of CRF

Score map (input before the softmax function) and belief map (output of the softmax function) for the aeroplane class: the score (1st row) and belief (2nd row) maps after each mean-field iteration. The output of the last DCNN layer is used as input to the mean-field inference.

SLIDE 49

DeepLab: qualitative results

SLIDE 50

DeepLab: qualitative results

SLIDE 51

DeepLab: qualitative results

SLIDE 52

Problems with CRF

  • The network is not trained end-to-end: the FCN and the CRF are trained independently from each other.
  • This makes the training both slow and arguably suboptimal. Solution: formulate the CRF as a Recurrent Neural Network.

Zheng et al., Conditional Random Fields as Recurrent Neural Networks, ICCV 2015

SLIDE 53

Replacing CRF with an RNN

An RNN that "emulates" a CRF.

Zheng et al., Conditional Random Fields as Recurrent Neural Networks, ICCV 2015

SLIDE 54

CRF-RNN: qualitative results

SLIDE 55

CRF-RNN: qualitative results

SLIDE 56

Why do we need the CRF?

  • To properly localize the masks, i.e., get the contours correctly.
  • We need to process information at the original (image) resolution for this; we need to look at the pixels. → The CRF is conditioned on the RGB image.
  • What if we use attention?

SLIDE 57

Attention

SLIDE 58

The problem

  • For very long sentences, the score for machine translation really goes down after 30-40 words.

Prof. Leal-Taixé and Prof. Niessner

Bahdanau et al. 2014, Neural machine translation by jointly learning to align and translate.

Plot labels: “with attention”, “performance degradation”.

SLIDE 59

Basic structure of an RNN

  • We want to have a notion of “time” or “sequence”.

[Christopher Olah] Understanding LSTMs

Diagram labels: hidden state, input, previous hidden state.

SLIDE 60

Basic structure of an RNN

  • We want to have a notion of “time” or “sequence”.

Diagram labels: hidden state, parameters to be learned.

SLIDE 61

Basic structure of an RNN

  • We want to have a notion of “time” or “sequence”.

Diagram labels: hidden state, output. The same parameters are used for each time step = generalization!
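A minimal sketch of such an RNN cell (sizes are illustrative; the key point is that W_h and W_x are reused at every time step):

```python
import numpy as np

# Minimal RNN cell (a sketch): the hidden state is updated from the input
# and the previous hidden state using the SAME parameters at every step.
rng = np.random.default_rng(0)
W_h = rng.standard_normal((4, 4)) * 0.1   # hidden-to-hidden weights
W_x = rng.standard_normal((4, 3)) * 0.1   # input-to-hidden weights

def rnn_step(h_prev, x):
    return np.tanh(W_h @ h_prev + W_x @ x)

h = np.zeros(4)
for x in rng.standard_normal((5, 3)):     # unroll over a 5-step sequence
    h = rnn_step(h, x)
assert h.shape == (4,)                    # one hidden state, any sequence length
```

Sharing W_h and W_x across time steps is what lets the same cell generalize to sequences of any length.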

SLIDE 62

Basic structure of an RNN

  • Unrolling RNNs

[Christopher Olah] Understanding LSTMs

The hidden state is the same.

SLIDE 63

Basic structure of an RNN

  • Unrolling RNNs

SLIDE 64

Long-term dependencies

“I moved to Germany … so I speak German fluently.”

SLIDE 65

Attention: intuition

“I moved to Germany … so I speak German fluently.”

ATTENTION: Which hidden states are more important to predict my output?

SLIDE 66

Attention: intuition

“I moved to Germany … so I speak German fluently.”

The context is built from attention weights α1,t+1, …, αt,t+1, αt+1,t+1.

SLIDE 67

Attention: architecture

  • A decoder processes the information.
  • Decoders take as input:
    – Previous decoder hidden state
    – Previous output
    – Attention (the context)

SLIDE 68

Attention

  • αk,t+1 indicates how important the word in position k is to translate the word in position t+1.
  • The context aggregates the attention: c_{t+1} = Σ_{k=1}^{t+1} α_{k,t+1} a_k
  • Soft attention: all attention weights α sum up to 1.

SLIDE 69

Computing the attention mask

  • We can train a small neural network NN on the hidden state of the encoder a_k and the previous state of the decoder d_t: f_{k,t+1} = NN(a_k, d_t)
  • Normalize with a softmax: α_{1,t+1} = exp(f_{1,t+1}) / Σ_{k=1}^{t+1} exp(f_{k,t+1})
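Putting slides 68 and 69 together, a NumPy sketch of soft attention (the small scoring network is replaced by a plain dot product for illustration; shapes are assumed):

```python
import numpy as np

# Soft attention (a sketch): score each encoder state against the decoder
# state, softmax-normalize, and build the context as a weighted sum.
rng = np.random.default_rng(0)
a = rng.standard_normal((6, 8))      # 6 encoder hidden states a_k, dim 8
d = rng.standard_normal(8)           # previous decoder state d_t

f = a @ d                            # scores f_k (stand-in for the small NN)
alpha = np.exp(f) / np.exp(f).sum()  # softmax -> attention mask
c = alpha @ a                        # context c = sum_k alpha_k * a_k

assert np.isclose(alpha.sum(), 1.0)  # soft attention: weights sum to 1
assert c.shape == (8,)               # context has the encoder-state dimension
```

The decoder then consumes the context c together with its previous hidden state and output.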

SLIDE 70

Attention for semantic segmentation

The attention model learns to put different weights on objects of different scales. For example, the model learns to put large weights on the small-scale person (green dashed circle) for features from scale = 1, and large weights on the large-scale child (magenta dashed circle) for features from scale = 0.5. The network component and the attention model are trained jointly.

Chen et al., Attention to Scale: Scale-aware Semantic Image Segmentation, CVPR 2016

SLIDE 71

  • Do we even need these blocks which include the global information (CRF, RNN, attention)? Spoiler alert: not necessarily.

SLIDE 72

DeepLabv3+

Combine atrous convolutions and spatial pyramid pooling with an encoder-decoder module.

SLIDE 73

Delving deeper into DeepLabv3+

1) Encode multi-scale contextual information by applying atrous convolution at multiple scales.

SLIDE 74

Delving deeper into DeepLabv3+

1) Encode multi-scale contextual information by applying atrous convolution at multiple scales. 2) Refine the segmentation results along object boundaries.

SLIDE 75

Delving deeper into DeepLabv3+

1) Encode multi-scale contextual information by applying atrous convolution at multiple scales. 2) Refine the segmentation results along object boundaries. 3) Use depth-wise separable convolutions.

SLIDE 76

Depth-wise separable convolutions

Normal convolutions act on all channels.

SLIDE 77

Depth-wise separable convolutions

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)

Filters are applied only at certain depths of the features. Normal convolutions have groups set to 1; the convolutions used in this image have groups set to 3.

SLIDE 78

Depth-wise separable convolutions

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)

But the depth size is always the same!

SLIDE 79

Depth-wise separable convolutions

Solution: 1x1 convs!

SLIDE 80

But why?

Original convolution: 256 kernels of size 5x5x3. Multiplications: 256 x 5x5x3 x (8x8 locations) = 1,228,800.

SLIDE 81

But why?

Original convolution: 256 kernels of size 5x5x3. Multiplications: 256 x 5x5x3 x (8x8 locations) = 1,228,800.
Depth-wise convolution: 3 kernels of size 5x5x1. Multiplications: 5x5x3 x (8x8 locations) = 4,800.
1x1 convolution: 256 kernels of size 1x1x3. Multiplications: 256 x 1x1x3 x (8x8 locations) = 49,152.

Fewer computations!
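The counts above can be verified directly:

```python
# Multiplication counts for the slide's example: 8x8 output locations,
# 3 input channels, 256 output channels, 5x5 kernels.
H, W, C_in, C_out, k = 8, 8, 3, 256, 5

standard  = C_out * k * k * C_in * (H * W)   # ordinary convolution
depthwise = k * k * C_in * (H * W)           # one 5x5 filter per channel
pointwise = C_out * 1 * 1 * C_in * (H * W)   # 1x1 convs to mix channels

assert standard == 1_228_800
assert depthwise == 4_800
assert pointwise == 49_152
assert (depthwise + pointwise) < standard / 20   # >20x fewer multiplications
```

The separable version costs 53,952 multiplications in total, more than 20x cheaper than the standard convolution at the same input/output shapes.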

SLIDE 82

DeepLabv3+: qualitative results

Still considered SOTA!

Chen et al., Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, ECCV 2018

SLIDE 83

DeepLab is amazing, but there are other important architectures to know. Recommended reads.

SLIDE 84

RefineNet

Many building blocks, but the goal is the same: use convolutional layers to refine the information coming from different scales.

Lin et al., RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation, CVPR 2017

SLIDE 85

PSPNet

Similar idea to RefineNet (fuse information from multiple scales), but the features here are shared (and the multi-scaling comes from pooling). The method is simpler than RefineNet and performs slightly better.

Zhao et al., Pyramid Scene Parsing Network, CVPR 2017

SLIDE 86

Datasets and metrics

SLIDE 87

Datasets

  • Pascal VOC 2012: 9,993 natural images divided into 20 classes.
  • Cityscapes: 25K urban street images divided into 30 classes.
  • ADE20K: 25K scene-parsing images (the 20 stands for 20K training images) divided into 150 classes.
  • Mapillary Vistas: 25K street-level images divided into 152 classes.

Models are often pre-trained on the large MS-COCO dataset before being fine-tuned on the specific dataset.

SLIDE 88

Metrics: intersection over union (IoU)

SLIDE 89

Metrics: intersection over union (IoU)

SLIDE 90

Metrics: mean intersection over union (mIoU)

mIoU simply computes the IoU for each class and then takes the mean of those values. Another widely used metric is the pixel accuracy (the ratio of pixels classified correctly).
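A minimal sketch of mIoU on label maps (skipping classes absent from both prediction and ground truth is one common convention, not the only one):

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean IoU: per-class intersection over union, then the mean.
    Classes absent from both pred and gt are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1]])
pred = np.array([[0, 0, 1, 0],
                 [0, 0, 1, 1]])
# class 0: inter 4, union 5 -> 0.8;  class 1: inter 3, union 4 -> 0.75
assert np.isclose(miou(pred, gt, 2), (4/5 + 3/4) / 2)
```

Pixel accuracy for the same example would be 7/8, which illustrates why mIoU is preferred: it does not let large easy classes dominate the score.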

SLIDE 91

So, what model to use?

  • Typically, DeepLab models are considered to be good baselines. Nevertheless, different problems might require different models (no free lunch in deep learning).
  • Don't be a hero! Before making up your own model, use some of the SOTA models, for example the best performing model on PASCAL VOC.

SLIDE 92

CV3DST Competition

  • The tracking challenge is evaluated on a subset of the MOT16 test data (sequences 01, 03, 08, 12).
  • The training data can be downloaded from the MOT challenge website: https://motchallenge.net/data/MOT16/
  • The submission website is https://adm9.in.tum.de/embed.php/prakt/cv3dst
  • You will have to sign up with your matriculation number to get your account. If you do not have a TUM matriculation number, please send a mail to dst@dvl.in.tum.de
  • Every student only has 1 ACCOUNT.
  • You are allowed to submit 4 TIMES to the challenge. Only the most recent submission is considered for the bonus (BE CAREFUL, YOU CAN WORSEN YOUR RESULTS).

SLIDE 93

CV3DST Competition

  • In order to be eligible for the bonus, you will need to achieve a MOTA > threshold (tbd).
  • Every student has to submit their own results (we will check code and results!).

SLIDE 94

CV3DST Competition

  • Dates:
    – 15.01.20: Test set is open for submission!
    – 02.02.20 (midnight): Competition closes
    – 03.02.20 (midnight): Abstract and code submission deadline
    – 04.02.20: Presenters are announced
    – 07.02.20: Presentation of selected methods

SLIDE 95

Next lectures

  • Instance segmentation and panoptic segmentation
  • Next lecture on January 17th.