Leveraging local neighborhood topology for large scale person re-identification



SLIDE 1

Leveraging local neighborhood topology for large scale person re-identification

Svebor Karaman¹, Giuseppe Lisanti¹, Andrew D. Bagdanov², Alberto Del Bimbo¹

¹ Media Integration and Communication Center (MICC)

University of Florence, Florence, Italy {firstname.lastname}@unifi.it, http://www.micc.unifi.it/vim/people

² Computer Vision Center, Barcelona (CVC)

Universitat Autònoma de Barcelona

bagdanov@cvc.uab.es, http://www.cvc.uab.es/LAMP

Svebor Karaman et al. (1) Large scale person re-identification July 23, 2013 1 / 12

SLIDE 2

Person re-identification

Problem definition: identify previously seen individuals in one or more images captured from one or more cameras.

Important for modern surveillance systems: a way of maintaining identity information about targets in multiple views.

Difficult: changes in illumination and pose, occlusions, similarity of appearance, and changes in camera view.

Research focus: mostly on features for re-identification (SDALF, CPS, HPE, etc.).

Recent trend: re-identification from a “learn to rank” point of view (SVR, RPLM, transfer and metric learning, etc.).


SLIDE 3

Laboratory versus realistic scenarios

Standard formulation: re-identification in terms of gallery images and probe images to be re-identified.

Three standard scenarios:

◮ Single-vs-Single (SvsS): exactly one example image of each person in the gallery and at least one instance of each person in the probe set.

◮ Multi-vs-Multi (MvsM): a group of M examples of each individual in the gallery and a group of M examples of each individual in the probe set.

◮ Multi-vs-Single (MvsS): multiple images of each person are given as groups in the gallery, and exactly one example image of each person in the probe set.

[Figure: structured scenario (MvsM) versus unstructured scenario (SvsS), each showing gallery and test images.]

Real-world: many test images, many identities, few labeled samples.


SLIDE 4

Overview of our approach

[Figure: overview of the approach in three panels (a), (b) and (c), with points labeled 1, 2 and 3.]

Our goal is to bring some of the advantages of structured re-identification to unstructured problems.

Linear discriminants are used to weakly separate gallery individuals.

A Conditional Random Field (CRF) is built on top of all available images.

Inference in the CRF leverages local structure in feature space to enforce labeling consistency.


SLIDE 5

Linear SVMs for re-identification

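Linear discriminant model estimated for each person given the gallery images:

$$(w_i, \cdot, b_i) = \arg\min_{w_i, \xi, b_i} \; \frac{1}{2}\|w_i\|^2 + C \sum_{j=1}^{n} \xi_j \tag{1}$$

subject to

$$\delta^i_{l(g_j)} \left( w_i^T g_j + b_i \right) \ge 1 - \xi_j, \quad \forall i \in \{1, \dots, N\}, \; \forall j \in \{1, \dots, n\}, \quad \text{and} \quad \xi_j \ge 0, \; \forall j \in \{1, \dots, n\}, \tag{2}$$

where $\delta^i_j$ is a modified Kronecker delta function:

$$\delta^i_j = \begin{cases} 1 & \text{if } i = j \\ -1 & \text{otherwise.} \end{cases} \tag{3}$$

The discriminative models $(w_i, b_i)$ are learned only on gallery images and are the only supervised components of our approach.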
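As a rough illustration of Eqs. 1 to 3, the Python sketch below builds the +1/-1 targets given by the modified Kronecker delta for one person and evaluates the soft-margin objective for a candidate (w, b), with each slack set to its smallest feasible value. It assumes that l(g_j) is the identity label of gallery image g_j. The function names and the toy gallery are hypothetical, and no solver is included.

```python
from typing import List, Sequence


def delta_targets(labels: Sequence[int], person: int) -> List[int]:
    """Modified Kronecker delta (Eq. 3): +1 for images of `person`, -1 otherwise."""
    return [1 if label == person else -1 for label in labels]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_objective(w, b, gallery, targets, C):
    """Objective of Eq. 1 for a given (w, b).

    Each slack takes its smallest value allowed by Eq. 2, i.e. the hinge
    loss max(0, 1 - target * (w . g + b)).
    """
    slack_sum = sum(
        max(0.0, 1.0 - t * (dot(w, g) + b)) for g, t in zip(gallery, targets)
    )
    return 0.5 * dot(w, w) + C * slack_sum


# Toy gallery (hypothetical): one 2-D feature vector per identity.
gallery = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
labels = [0, 1, 2]

targets = delta_targets(labels, person=0)  # person 0 against everyone else
value = svm_objective([1.0, 0.0], 0.0, gallery, targets, C=1.0)
print(targets, value)
```

In this toy case only the second gallery image violates the margin, so the objective is the regularizer plus one unit of slack. One such model would be fitted per gallery identity.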


SLIDE 6

Mapping a re-identification problem on a CRF

Unsupervised approach induces a local topology over all images.

Combined with discriminative models, becomes a semi-supervised approach.

A CRF is defined by a graph $G = (V, E)$ and a label set $L$. We create one vertex in $V$ to represent each image in the re-identification scenario (all gallery and probe images):

$$V = \{v_1, v_2, \dots, v_{n+m}\}. \tag{4}$$

The graph topology is:

◮ defined by the group structure of probe images, if given (MvsM); and

◮ induced by the nearest neighbor topology in feature space using all images.

We create an edge $(v_i, v_j)$ if one is in the k-nearest neighbors of the other:

$$E = \{(v_i, v_j) : x_i \in \mathrm{kNN}(x_j) \lor x_j \in \mathrm{kNN}(x_i)\}. \tag{5}$$
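The edge rule of Eq. 5 can be sketched in a few lines of Python. This is a brute-force illustration only: it assumes Euclidean distance in feature space and breaks distance ties by index, neither of which is stated on the slide, and the helper names are hypothetical. A large-scale setting would need a proper nearest neighbor index.

```python
import math
from typing import List, Sequence, Set, Tuple


def knn_indices(features: Sequence[Sequence[float]], j: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of features[j], excluding j itself."""
    others = [i for i in range(len(features)) if i != j]
    # Sort by Euclidean distance; ties are broken by index (an assumption).
    others.sort(key=lambda i: (math.dist(features[i], features[j]), i))
    return others[:k]


def knn_edges(features: Sequence[Sequence[float]], k: int) -> Set[Tuple[int, int]]:
    """Undirected edge set of Eq. 5: connect i and j if either is a kNN of the other."""
    edges = set()
    for j in range(len(features)):
        for i in knn_indices(features, j, k):
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return edges


# Toy 1-D features (hypothetical).
features = [[0.0], [1.0], [3.0], [10.0]]
edges = knn_edges(features, k=1)
print(sorted(edges))
```

Because the rule is an "or", the isolated point at 10.0 is still connected to its own nearest neighbor even though it is nobody else's nearest neighbor, so no vertex is left without an edge.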


SLIDE 7

Mapping a re-identification problem on a CRF

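Given a hypothetical labeling $\hat{y} = (\hat{y}_1, \dots, \hat{y}_{|V|})$ assigning a label $\hat{y}_i \in L$ to each vertex $v_i \in V$ in graph $G$, we define the energy function of $\hat{y}$ as:

$$E(\hat{y}) = \sum_{i \in V} \phi_i(\hat{y}_i) + \lambda \sum_{(v_i, v_j) \in E} \psi_{ij}(\hat{y}_i, \hat{y}_j). \tag{6}$$

The unary data cost $\phi_i(\hat{y}_i)$ is defined using the linear SVM models:

$$\phi_i(\hat{y}_i) = e^{-(w_{\hat{y}_i}^T x_i + b_{\hat{y}_i})}. \tag{7}$$

The smoothness cost $\psi_{ij}$ is defined using distances in feature space:

$$\psi_{ij}(\hat{y}_i, \hat{y}_j) = \psi(\hat{y}_i, \hat{y}_j) \, e^{-\frac{\|x_i - x_j\|_2}{\sigma^2}}, \tag{8}$$

◮ $\sigma^2$: variance of the distances between all connected features in the graph; and

◮ $\psi(\hat{y}_i, \hat{y}_j)$: average distance between gallery images of identity $\hat{y}_i$ and $\hat{y}_j$:

$$\psi(\hat{y}_i, \hat{y}_j) = \frac{1}{|G_{\hat{y}_i}| \, |G_{\hat{y}_j}|} \sum_{g \in G_{\hat{y}_i}} \sum_{g' \in G_{\hat{y}_j}} \|g - g'\|_2. \tag{9}$$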
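To make Eqs. 6 to 9 concrete, here is a small Python sketch that evaluates the energy of a labeling. It rests on assumptions: the norm in Eqs. 8 and 9 is read as the plain Euclidean distance, sigma squared is passed in rather than estimated from the graph, and the SVM parameters, gallery, and feature values are invented toy data. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unary_cost(w, b, x):
    """Eq. 7: exp(-(w . x + b)); small when the label's SVM scores x highly."""
    return math.exp(-(dot(w, x) + b))


def label_cost(gallery_a, gallery_b):
    """Eq. 9: mean distance between gallery images of two identities."""
    total = sum(math.dist(g, h) for g in gallery_a for h in gallery_b)
    return total / (len(gallery_a) * len(gallery_b))


def energy(labeling, features, edges, svms, gallery, lam, sigma2):
    """Eq. 6: sum of unary costs plus lambda times the sum of smoothness costs."""
    data = sum(unary_cost(*svms[y], x) for y, x in zip(labeling, features))
    smooth = 0.0
    for i, j in edges:
        # Eq. 8: label cost weighted by how close the two features are.
        affinity = math.exp(-math.dist(features[i], features[j]) / sigma2)
        smooth += label_cost(gallery[labeling[i]], gallery[labeling[j]]) * affinity
    return data + lam * smooth


# Toy problem (hypothetical): two identities, one gallery image each.
gallery = {0: [[0.0, 0.0]], 1: [[4.0, 0.0]]}
svms = {0: ([-1.0, 0.0], 2.0), 1: ([1.0, 0.0], -2.0)}  # label -> (w, b)
features = [[0.0, 0.0], [4.0, 0.0], [1.0, 0.0]]  # two gallery images, one probe
edges = [(0, 2)]  # the probe is linked to the first gallery image

e_consistent = energy([0, 1, 0], features, edges, svms, gallery, lam=1.0, sigma2=1.0)
e_inconsistent = energy([0, 1, 1], features, edges, svms, gallery, lam=1.0, sigma2=1.0)
print(e_consistent, e_inconsistent)
```

Labeling the probe like its close neighbor gives the lower energy here, both because the unary cost of the matching SVM is smaller and because two connected vertices with the same identity pay a smoothness cost of zero when that identity has a single gallery image.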


SLIDE 8

Experiments

Inference in the CRF using graph cuts.

All parameters (C, λ, k) estimated from data (see section 3.5).

Experiments on standard structured/unstructured re-identification scenarios.
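The slides use graph cuts for inference. As a much simpler stand-in, and not the authors' method, the following Python sketch minimizes an energy of the same unary-plus-pairwise form with iterated conditional modes (ICM), a greedy local search that only reaches a local minimum. The function name, the cost table, and the Potts-style pairwise cost in the example are hypothetical; in the approach described here the pairwise term would be the smoothness cost of Eq. 8.

```python
from typing import Callable, List, Sequence, Tuple


def icm(
    unary: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    pairwise: Callable[[int, int, int, int], float],
    lam: float,
    sweeps: int = 10,
) -> List[int]:
    """Greedy minimization of sum_i unary[i][y_i] + lam * sum_edges pairwise(...).

    unary[i][y] is the data cost of giving label y to vertex i;
    pairwise(i, j, yi, yj) is the (symmetric) smoothness cost on edge (i, j).
    """
    n = len(unary)
    labels = range(len(unary[0]))
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the label with the lowest unary cost at every vertex.
    labeling = [min(labels, key=lambda y: unary[i][y]) for i in range(n)]

    for _ in range(sweeps):
        changed = False
        for i in range(n):
            def local_cost(y: int) -> float:
                # Cost of label y at vertex i with all other labels held fixed.
                return unary[i][y] + lam * sum(
                    pairwise(i, j, y, labeling[j]) for j in neighbors[i]
                )

            best = min(labels, key=local_cost)
            if best != labeling[i]:
                labeling[i] = best
                changed = True
        if not changed:  # converged to a local minimum
            break
    return labeling


# Toy chain of three vertices; the middle one is ambiguous on its own.
unary = [[0.1, 2.0], [1.0, 0.9], [0.1, 2.0]]
edges = [(0, 1), (1, 2)]
potts = lambda i, j, yi, yj: 0.0 if yi == yj else 1.0  # hypothetical pairwise cost

without_smoothing = icm(unary, edges, potts, lam=0.0)
with_smoothing = icm(unary, edges, potts, lam=1.0)
print(without_smoothing, with_smoothing)
```

With the smoothness term switched off the middle vertex keeps its slightly cheaper label, while with it switched on the two confident neighbors pull it to their label, which is the labeling consistency effect the CRF is meant to provide.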

Table 1: Characteristics of re-identification datasets.

                                ETHZ1      ETHZ2      ETHZ3      CAVIAR     3DPeS      CMV100
Environment                     Outdoor    Outdoor    Outdoor    Indoor     Outdoor    Indoor
Cameras                         1          1          1          2          8          5
Identities                      83         35         28         72         191        100
Min/Avg/Max images per person   7/58/226   6/56/206   5/62/356   10/17/20   2/5/26     7/361/1245
Total images                    4857       1961       1762       1220       1003⁴      36171
Average detection size          132 × 60   135 × 63   148 × 66   81 × 34    158 × 74   224 × 75

⁴ For 3DPeS we have 1003 labeled images and 62 796 unlabeled “anonymous” images.


SLIDE 9

Structured re-identification scenarios

Our method can efficiently solve structured scenarios like MvsM. Comparison with AHPE, CPS, HPE, SDALF and IDINF (see section 4.3 for complete references and results).

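Table 2: Comparison with the state of the art on ETHZ and CAVIAR (structured).

Values are listed in order of increasing M: M = 2 / 5 / 10 for ETHZ1, ETHZ2 and ETHZ3, and M = 2 / 3 / 5 for CAVIAR. A cell with fewer values covers only some settings of M; "-" means no value is given.

Method     ETHZ1                ETHZ2                ETHZ3                CAVIAR
AHPE       91                   90.6                 94                   7 / 8 / 7.5
CPS        97.7                 97.3                 98                   13, 13
HPE        77 / 84 / 85         77 / 79 / 81         83 / 86.5 / 83       -
SDALF      78 / 90.2 / 89.6     85 / 91.6 / 89.6     86.5 / 93.7 / 89.6   8.3
IDINF      87 / 92 / 99         80.7 / 94.3 / 95.9   85.3 / 92.2 / 96.1   -
FEAT+CRF   93.5 / 99.4 / 99.6   92.3 / 99.1 / 100    98.9 / 100 / 100     50.7 / 65.8 / 85.3
SVM+CRF    95.7 / 99.5 / 99.3   93.7 / 99.4 / 100    99.6 / 100 / 98.6    60.7 / 76.1 / 93.2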


SLIDE 10

Unstructured scenarios

Unstructured scenarios are much harder than structured ones. Added to the comparison: Manifold Ranking (MR-Lu) and Multiple Feature Learning (MFL opt.) (see section 4.3 for complete references and results).

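Table 3: Comparison with the state of the art on ETHZ and CAVIAR (unstructured).

Values are listed in order of increasing M: M = 1 / 2 / 5 / 10 for ETHZ1, ETHZ2 and ETHZ3, and M = 1 for CAVIAR. A cell with a single value covers one setting of M only; "-" means no value is given.

Method       ETHZ1                       ETHZ2                       ETHZ3                       CAVIAR
SDALF        64.8                        64.4                        77                          -
AHPE         -                           -                           -                           7
CPS          -                           -                           -                           -
MR-Lu k=4    78.8                        73.7                        85.1                        28.1
MR-Lu k=15   78.1                        73.3                        84.8                        27.7
MFL opt.     -                           -                           -                           -
IDINF        69.7 / 83.3 / 92.2 / 96.1   65.7 / 80.4 / 89.5 / 89.5   88.1 / 93.6 / 98.4 / 98     -
FEAT+CRF     79 / 89.9 / 96.9 / 98.4     76.3 / 87.8 / 95 / 98       85.4 / 92.9 / 99.3 / 99.6   27.1
SVM+CRF      84.9 / 92.1 / 97.2 / 98.2   78.9 / 89.1 / 94.8 / 97     88.3 / 96.9 / 99.6 / 99.5   31.7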


SLIDE 11

Large-scale unstructured person re-identification

We can solve large scale (up to 30K images) re-identification problems. Performance of our approach improves when more images are available.

[Figure (a): CMV100, one gallery image per person, fixed number of test images (T), varying unlabeled. Accuracy versus number of unlabelled images for FEAT, SVM, FEAT+CRF and SVM+CRF, with T = 2, 5, 10, 20, 50, 100.]

[Figure (b): 3DPeS, fixed number of gallery images (M), varying unlabeled. Accuracy versus number of unlabelled images for FEAT, SVM, FEAT+CRF and SVM+CRF, with M = 1, 2, 3.]


SLIDE 12

Discussion

Semi-supervised approach combining discriminative models and a CRF model of local feature-space topology to solve re-identification problems.

Our approach can efficiently solve structured re-identification problems, but particularly excels in the case of more difficult unstructured re-identification.

Our approach performs very well even when very many probe images must be re-identified on the basis of very few gallery images.

Adding unlabeled images increases the performance of our approach.

Performance increases with more test images, whereas standard discriminative models like SVMs see their performance degrade when test data is added. Increasing test data gives a denser sampling of the manifold in feature space.

Our approach makes the most out of all available data, while existing approaches are usually limited to exploiting only gallery images.

It offers the benefits of stronger discriminative models or metric learning without the cost of setting aside a portion of the available data for learning.
