slide-1
SLIDE 1

CLUSTER-BASED DISTRIBUTED FACE TRACKING IN CAMERA NETWORKS

Josiah Yoder

Robot Vision Lab — Purdue University

9 September 2011 1/51

slide-2
SLIDE 2

INTRODUCTION — CAMERA NETWORKS

2/51

slide-3
SLIDE 3

A CAMERA NODE

[Diagram: a camera node consists of a camera, a computer, and a network interface, supporting vision and communication]

3/51

slide-4
SLIDE 4

OVERVIEW

Wired Networks Wireless Networks

Medeiros et al. (2008)

4/51

slide-5
SLIDE 5

OVERVIEW

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Distributed Face Tracking 5/51

slide-6
SLIDE 6

OVERVIEW

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 6/51

slide-7
SLIDE 7

CLUSTER-BASED TRACKING IN WIRELESS NETWORKS

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 7/51

slide-8
SLIDE 8

CLUSTER-BASED TRACKING IN WIRELESS NETWORKS

Developed by Medeiros et al. (2008)

Addresses challenges of wireless networks:

Limited communication range
Cameras tracking the same target may not be able to communicate

8/51

slide-9
SLIDE 9

VISION GRAPHS & COMMUNICATION GRAPHS

Vision Graph Communication Graph

9/51

slide-10
SLIDE 10

CLUSTER LEADER ELECTION

[Diagram: cluster leader election among cameras 1–5]

10/51

slide-11
SLIDE 11

CLUSTER COALESCENCE

[Diagram: clusters over cameras 2–4 coalesce into a single cluster]

11/51

slide-12
SLIDE 12

CLUSTER FRAGMENTATION

[Diagram: a cluster over cameras 1–3 fragments into separate clusters]

12/51

slide-13
SLIDE 13

TRACKING IN WIRED NETWORKS

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 13/51

slide-14
SLIDE 14

TRACKING IN WIRED NETWORKS

Vision Graph Communication Graph

14/51

slide-15
SLIDE 15

ESTABLISHING CORRESPONDING DETECTIONS

How can cameras determine they have detected the same target?

Detect objects
Extract visual features
Apply a similarity criterion
Objects "match" if the criterion passes a decision threshold

Many variations in features and matching criteria (see the sketch below):

Color histograms
HoG features
SIFT features
Point clouds
Face pose estimates
Gabor jets
. . .
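The generic detect/extract/compare/threshold recipe above can be sketched in a few lines. The following assumes color-histogram features and a histogram-intersection score with an arbitrary threshold, purely for illustration; the list above names several alternative features.

```python
# Minimal sketch of the generic matching recipe on this slide, using color
# histograms and a histogram-intersection similarity score. The feature
# choice, similarity measure, and threshold value are illustrative
# assumptions, not the specific ones used in this work.
import numpy as np

def color_histogram(patch: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized per-channel color histogram of an image patch (H x W x 3, uint8)."""
    hists = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / (h.sum() + 1e-12)

def similarity(f1: np.ndarray, f2: np.ndarray) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return float(np.minimum(f1, f2).sum())

def is_match(patch_a: np.ndarray, patch_b: np.ndarray, threshold: float = 0.7) -> bool:
    """Two detections 'match' if the similarity criterion passes the decision threshold."""
    return similarity(color_histogram(patch_a), color_histogram(patch_b)) >= threshold
```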

15/51

slide-16
SLIDE 16

TRACKING GRAPHS

Edge between two cameras if their detected objects pass the matching criterion (a minimal adjacency sketch follows below)
Cameras may participate in more than one tracking graph
Dynamic: changes as objects move

[Diagram: tracking graphs for target objects A and B over cameras 1–5]
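A tracking graph can be kept as a simple adjacency structure. The sketch below builds one from pairwise matches and walks its connected cameras; a pairwise is_match() predicate over the extracted features is assumed to come from the matching step on the previous slide, and the interface is illustrative.

```python
# Minimal sketch of a tracking graph: nodes are camera IDs, and an edge is
# added whenever two cameras' detections pass the matching criterion.
from collections import defaultdict
from itertools import combinations

def build_tracking_graph(detections, is_match):
    """detections: dict mapping camera_id -> feature vector of its detection."""
    graph = defaultdict(set)
    for (cam_a, feat_a), (cam_b, feat_b) in combinations(detections.items(), 2):
        if is_match(feat_a, feat_b):
            graph[cam_a].add(cam_b)
            graph[cam_b].add(cam_a)
    return graph

def connected_cameras(graph, start):
    """Cameras reachable from `start`; with missing edges, correspondence is
    only established through other cameras, as the next slide notes."""
    seen, stack = set(), [start]
    while stack:
        cam = stack.pop()
        if cam not in seen:
            seen.add(cam)
            stack.extend(graph[cam] - seen)
    return seen
```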

16/51

slide-17
SLIDE 17

TRACKING GRAPHS: CLUSTER FORMATION

May have missing edges in the tracking graph
Cameras can only establish correspondence through other cameras
For rapid cluster formation, cameras join clusters only with immediate neighbors

[Diagram: a target object observed by cameras 1–4, with a missing edge in the tracking graph]

17/51

slide-18
SLIDE 18

TRACKING GRAPHS: CLUSTER FORMATION

May have false edges in the tracking graph
Leads to a single cluster tracking multiple targets
Fixed by cluster fragmentation during propagation

[Diagram: cameras 1–3 before and after cluster fragmentation]

18/51

slide-19
SLIDE 19

A FRAMEWORK FOR MULTI-CAMERA FACE TRACKING

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 19/51

slide-20
SLIDE 20

A FRAMEWORK FOR MULTI-CAMERA FACE TRACKING

A framework for face detection, pose estimation, and tracking:

Allows generic single-camera methods to be incorporated into a multi-camera method
Representation of face position as a coherent 6-DOF quantity

20/51

slide-21
SLIDE 21

THE 6-DOF WORLD AND IMAGE-BASED POSES

p_w = [x, y, z, θ, φ, ψ]^T = [x_w^T, r_w^T]^T

p_i = [u, v, s, α, β, γ]^T = [x_i^T, r_i^T]^T
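The two pose vectors above can be carried around as plain 6-element arrays. A minimal sketch of such a representation follows; the field names and the dataclass layout are illustrative choices, not taken from the slides.

```python
# Minimal sketch of the world-frame and image-based 6-DOF pose vectors defined
# above. Field names and this particular container are illustrative choices.
from dataclasses import dataclass
import numpy as np

@dataclass
class WorldPose:
    x: float; y: float; z: float          # position x_w (e.g., cm)
    theta: float; phi: float; psi: float  # rotation r_w as roll-pitch-yaw (degrees)

    def as_vector(self) -> np.ndarray:
        """p_w = [x, y, z, theta, phi, psi]^T"""
        return np.array([self.x, self.y, self.z, self.theta, self.phi, self.psi])

@dataclass
class ImagePose:
    u: float; v: float; s: float              # image position and scale x_i
    alpha: float; beta: float; gamma: float   # image-based rotation r_i

    def as_vector(self) -> np.ndarray:
        """p_i = [u, v, s, alpha, beta, gamma]^T"""
        return np.array([self.u, self.v, self.s, self.alpha, self.beta, self.gamma])
```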

slide-22
SLIDE 22

Face Detection

[Diagram: cameras 1 through N each run a face detection method and an orientation estimation method; a coordinating agent performs observation matching, evidence integration, and Kalman filtering]

slide-23
SLIDE 23

TRANSFORMATION OF POSITION AND ROTATION

We can transform observations from image to world coordinates through an invertible function f: p_w = f(p_i)

p_w = [x_w; r_w] = [f_x(u, v, s); f_r(u, v, α, β, γ)]

x_w = [x; y; z],  r_w = [θ; φ; ψ]

f_x(u, v, s) – [Iwaki et al. 2008]
f_r(u, v, α, β, γ) – [Murphy-Chutorian and Trivedi 2008]

23/51

slide-24
SLIDE 24

TRANSFORMATION OF POSITION

[Diagram: camera center at (0, 0, 0) viewing the homogeneous image point (u, v, 1)]

x_w = f_x(x_i) = wRc K_s s d̂_c + wt_c

d̂_c = (u î_c + v ĵ_c + k̂_c) / √(u² + v² + 1)
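A short sketch of the position mapping f_x as reconstructed above. The calibration inputs wRc and wt_c are the camera-to-world rotation and translation; treating K_s as a scalar that converts the detected face scale s into a depth along the viewing ray is an assumption, since the slide does not define it.

```python
# Sketch of the image-to-world position mapping f_x from this slide.
# Assumptions (not spelled out on the slide): K_s is a scalar that converts
# the detected face scale s into a depth along the viewing ray, and wRc / wtc
# are the known camera-to-world rotation and translation.
import numpy as np

def f_x(u: float, v: float, s: float,
        wRc: np.ndarray, wtc: np.ndarray, K_s: float) -> np.ndarray:
    """World position x_w = wRc * (K_s * s * d_hat_c) + wtc."""
    d_hat_c = np.array([u, v, 1.0]) / np.sqrt(u**2 + v**2 + 1.0)  # unit ray through (u, v)
    return wRc @ (K_s * s * d_hat_c) + wtc
```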

24/51

slide-25
SLIDE 25

TRANSFORMATION OF ROTATION

r_w = f_r(x_i, r_i) = [ wRc cRi [r_i]_{3×3} ]_{3×1}

wRc — camera rotation
cRi — Murphy-Chutorian & Trivedi rotation
[·]_{3×3} — conversion to a rotation matrix
[·]_{3×1} — conversion to roll-pitch-yaw

Murphy-Chutorian & Trivedi rotation (2008): rotation about the axis k̂_c × d̂_c by the angle cos⁻¹(k̂_c · d̂_c)
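A sketch of the rotation mapping f_r using SciPy's rotation helpers for the conversions. The Euler-angle convention ("xyz", degrees) is an assumption; the slides do not fix one.

```python
# Sketch of the rotation mapping f_r from this slide, using SciPy for the
# conversions between roll-pitch-yaw and rotation matrices. The Euler-angle
# convention ("xyz", degrees) is an assumption; the slide does not fix one.
import numpy as np
from scipy.spatial.transform import Rotation as R

def cRi_rotation(u: float, v: float) -> np.ndarray:
    """Murphy-Chutorian & Trivedi (2008) rotation: about k_c x d_c by acos(k_c . d_c)."""
    d_c = np.array([u, v, 1.0]) / np.sqrt(u**2 + v**2 + 1.0)
    k_c = np.array([0.0, 0.0, 1.0])
    axis = np.cross(k_c, d_c)
    angle = np.arccos(np.clip(np.dot(k_c, d_c), -1.0, 1.0))
    norm = np.linalg.norm(axis)
    if norm < 1e-12:                      # ray aligned with the optical axis
        return np.eye(3)
    return R.from_rotvec(axis / norm * angle).as_matrix()

def f_r(u: float, v: float, r_i: np.ndarray, wRc: np.ndarray) -> np.ndarray:
    """r_w = [ wRc * cRi * [r_i]_{3x3} ]_{3x1} as roll-pitch-yaw angles (degrees)."""
    Ri = R.from_euler("xyz", r_i, degrees=True).as_matrix()     # [r_i]_{3x3}
    Rw = wRc @ cRi_rotation(u, v) @ Ri
    return R.from_matrix(Rw).as_euler("xyz", degrees=True)      # [.]_{3x1}
```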

slide-26
SLIDE 26

UNCERTAINTY MODELING

We represent observations as Gaussian distributions

Image-based observation: p_i ∼ N(p̄_i, C_{p,i})
World observation: p_w ∼ N(p̄_w, C_{p,w})

Transform from p_i to p_w using the unscented transform

26/51

slide-27
SLIDE 27

COMPARING OBSERVATIONS

We compute the distance between the jth and kth observations using the Mahalanobis distance

d(p_w^j, p_w^k) = (p_w^j − p_w^k)^T (C_{p,w}^j + C_{p,w}^k)^{−1} (p_w^j − p_w^k)

Distributed: we declare two observations consistent if d(p_w^j, p_w^k) < T

Centralized: we employ a feature clustering algorithm based on the distance d(p_w^j, p_w^k) − T_clique
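A compact sketch of this consistency test between two Gaussian world observations; the threshold value T is an illustrative placeholder, not the one used in the experiments.

```python
# Sketch of the Mahalanobis consistency test between two Gaussian world
# observations (p_w, C_p_w). The threshold T is an illustrative placeholder.
import numpy as np

def mahalanobis_sq(p_j, C_j, p_k, C_k) -> float:
    """d(p_w^j, p_w^k) = (p_j - p_k)^T (C_j + C_k)^(-1) (p_j - p_k)."""
    diff = np.asarray(p_j) - np.asarray(p_k)
    return float(diff @ np.linalg.solve(C_j + C_k, diff))

def consistent(p_j, C_j, p_k, C_k, T: float = 9.0) -> bool:
    """Distributed rule: declare the observations consistent if the distance is below T."""
    return mahalanobis_sq(p_j, C_j, p_k, C_k) < T
```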

27/51

slide-28
SLIDE 28

INTEGRATING OBSERVATIONS

We use a minimum-variance estimator to integrate world observations into a more accurate estimate

E[p̂_w] = Cov[p̂_w] ∑_{p_w^k ∈ E} (C_{p,w}^k)^{−1} p_w^k

Cov[p̂_w] = ( ∑_{p_w^k ∈ E} (C_{p,w}^k)^{−1} )^{−1}
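A minimal sketch of this inverse-covariance-weighted fusion over the set E of matched world observations; the list-of-pairs interface is an illustrative choice.

```python
# Sketch of the minimum-variance (inverse-covariance weighted) integration of
# a set of world observations E = [(p_w^k, C_p_w^k), ...] into one estimate.
import numpy as np

def integrate_observations(observations):
    """Return (E[p_hat_w], Cov[p_hat_w]) for a list of (mean, covariance) pairs."""
    info = sum(np.linalg.inv(C) for _, C in observations)                 # sum of C_k^-1
    info_mean = sum(np.linalg.inv(C) @ np.asarray(p) for p, C in observations)
    cov = np.linalg.inv(info)                                             # Cov[p_hat_w]
    return cov @ info_mean, cov                                           # E[p_hat_w], Cov
```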

slide-29
SLIDE 29

Face Detection

[Diagram: cameras 1 through N each run a face detection method and an orientation estimation method; a coordinating agent performs observation matching, evidence integration, and Kalman filtering]

slide-30
SLIDE 30

DISTRIBUTED CLUSTER-BASED FACE POSE TRACKING

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 30/51

slide-31
SLIDE 31

DISTRIBUTED CLUSTER-BASED FACE POSE TRACKING

Here we combine the two systems:

Multi-camera face pose tracking framework
Cluster-based tracking protocol

31/51

slide-32
SLIDE 32

SYSTEM ARCHITECTURE

[Diagram: system architecture — clustering modules 1 through k, an object manager, a matching module, an integration module, and 6-DOF face pose estimation within a camera node; camera nodes 1 through N connected over the network]
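A skeletal sketch of how the modules named in the architecture diagram might be wired together inside one camera node. Class and method names here are illustrative, not taken from the actual implementation.

```python
# Skeletal sketch of one camera node in the architecture above. Class and
# method names are illustrative; the real module interfaces are not given
# on the slide.
class CameraNode:
    def __init__(self, node_id, pose_estimator, matcher, integrator, network):
        self.node_id = node_id
        self.pose_estimator = pose_estimator    # 6-DOF face pose estimation
        self.matcher = matcher                  # matching module
        self.integrator = integrator            # integration module
        self.network = network                  # link to the other camera nodes
        self.clustering_modules = {}            # one clustering module per tracked target
        self.object_manager = {}                # object manager: target id -> latest estimate

    def process_frame(self, frame):
        # Detect faces and estimate their 6-DOF world poses with uncertainty.
        observations = self.pose_estimator.estimate(frame)
        for obs in observations:
            # Match the observation to a known target (or start a new cluster).
            target_id = self.matcher.match(obs, self.object_manager)
            cluster = self.clustering_modules.setdefault(
                target_id, self.network.join_cluster(self.node_id, target_id))
            # Share the observation with the cluster and integrate the replies.
            peer_obs = cluster.exchange(obs)
            self.object_manager[target_id] = self.integrator.fuse([obs, *peer_obs])
```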

32/51

slide-33
SLIDE 33

FACE POSE AS IDENTIFYING FEATURE

The cluster-based protocol makes use of a feature to distinguish targets

Real-time unconstrained face recognition is not available
We use the current face pose as this feature

33/51

slide-34
SLIDE 34

EXPERIMENTS


slide-35
SLIDE 35

EXPERIMENTS

[Plots: estimated X, Y, Z (cm) and θ, φ, ψ (degrees) versus frame number over 500 frames]

slide-36
SLIDE 36

EXPERIMENTS

Comparison with a centralized system

Both systems:
use the same synchronization of the collaboration period
use the 6-DOF face pose estimation framework
detect faces in the individual cameras

In the centralized system:
6-DOF observations are sent to a central server
Correspondences are computed based on all pairwise matches

36/51

slide-37
SLIDE 37

EXPERIMENTS

[Plots: X (cm) versus frame number over 500 frames for the distributed and centralized systems]

37/51

slide-38
SLIDE 38

EXPERIMENTS

              TP         FP         rmse_T (cm)   rmse_R (°)
Centralized   95 (95%)   12 (12%)   5.8           20.8
Distributed   94 (94%)    4 (4%)    6.1           18.7

38/51

slide-39
SLIDE 39

FACE RECOGNITION

The cluster-based protocol can also be used for other activities
Here, we perform distributed face recognition

Face recognition is useful for multi-camera tracking:

Associate observations from multiple cameras
Associate multiple tracks with the same person
Restore lost tracks
Many other applications

Each camera performs PCA face recognition (see the sketch below):

Project face images into PCA space
Select the nearest neighbor from the training set (gallery) of faces
Send a vote for that person to the cluster leader

The cluster leader counts votes and declares the overall winner
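A minimal sketch of the per-camera PCA recognition step and the leader's majority vote, using scikit-learn's PCA; the gallery layout and the number of components are illustrative assumptions.

```python
# Minimal sketch of per-camera PCA ("eigenfaces") recognition plus the cluster
# leader's majority vote. Uses scikit-learn's PCA; the gallery layout and the
# number of components are illustrative assumptions.
from collections import Counter
import numpy as np
from sklearn.decomposition import PCA

class PCARecognizer:
    def __init__(self, gallery_images, gallery_labels, n_components=20):
        X = np.array([img.ravel() for img in gallery_images], dtype=float)
        self.pca = PCA(n_components=n_components).fit(X)
        self.gallery = self.pca.transform(X)        # gallery projected into PCA space
        self.labels = list(gallery_labels)

    def vote(self, face_image) -> str:
        """Project the face into PCA space and vote for its nearest gallery neighbor."""
        q = self.pca.transform(face_image.ravel()[None, :].astype(float))
        nearest = int(np.argmin(np.linalg.norm(self.gallery - q, axis=1)))
        return self.labels[nearest]

def leader_decision(votes) -> str:
    """Cluster leader counts the votes from its cameras and declares a winner."""
    return Counter(votes).most_common(1)[0][0]
```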

slide-40
SLIDE 40

EXPERIMENTS: DISTRIBUTED FACE RECOGNITION

            Tracking        Recognition
TP / FP     92.4% / 7.6%    87.9% / 9.9%

Completely distributed:

No central server or single point of failure
Scalable — load on each link or node does not increase with network size

Only frontal faces are used . . . which are uncommon in camera networks 40/51

slide-41
SLIDE 41

HUMAN FAMILIARITY-BENCHMARKED FACE RECOGNITION DATABASE

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 41/51

slide-42
SLIDE 42

UNCONSTRAINED FACE RECOGNITION

Face recognition is useful for multi-camera tracking
Current algorithms perform poorly on unconstrained face images:

Low resolution
Varying pose

Human performance is still significantly better for unconstrained images
How can we compare algorithms with the best human performance? 42/51

slide-43
SLIDE 43

BENCHMARKS FOR UNCONSTRAINED FACE RECOGNITION

The best human performance: "familiar" face recognition
Unfair comparison?

People have prior knowledge unavailable to algorithms:
Memories from previous encounters
Emotions, social relationships, etc.

Can use the same prior knowledge:

Videos of people to be pictured in the testing phase

People can gain some sort of "familiarity" by watching the videos
People can gain more familiarity if they chat while watching the videos (Bruce et al., 2001) 43/51

slide-44
SLIDE 44

N-VIEWER FAMILIARITY BENCHMARK

Familiarization (training):

N people watch videos together

Testing:

Each person performs the face-matching recognition task separately

Questions for familiarization:

1. Which of your friends does he/she look like?
2. What sports or hobbies might he/she like?
3. What actor/actress or politician does he/she look like?
4. What might his/her major be?
5. Make a nickname for him/her.
6. Describe his/her personality.

Here, we use only 1-viewer familiarization. 44/51

slide-45
SLIDE 45

1-VIEWER BENCHMARKED DATABASE

45/51

slide-46
SLIDE 46

DATABASE

46/51

slide-47
SLIDE 47

1-VIEWER BENCHMARKED DATABASE

20 subjects tested for recognition-matching ability

Unfamiliar or 1-Viewer Familiar test
Familiar test

47/51

slide-48
SLIDE 48

BENCHMARKED PERFORMANCE

                      % correct, mean (std)
Unfamiliar            54 (20)
1-Viewer Familiar     54 (17)
Previously Familiar   80 (19)

1-Viewer familiarity does not improve performance
Previous familiarity does, even in challenging low-resolution images 48/51

slide-49
SLIDE 49

CONCLUSIONS

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 49/51

slide-50
SLIDE 50

FUTURE WORK

Wired Networks Wireless Networks

Medeiros et al. (2008)

Face Pose Tracking Familiar Face Recognition Distributed Face Tracking Unconstrained Face Tracking 50/51

slide-51
SLIDE 51

FUTURE WORK

Cluster-based tracking in wired camera networks:
Coalescence and fragmentation without propagation

6-DOF face pose tracking:
Use local roll-pitch-yaw axes centered around the current or "bootstrapped" estimate

Familiar face recognition

Unconstrained face pose tracking in camera networks 51/51

slide-52
SLIDE 52

FOR FURTHER READING I

  • H. Iwaki, G. Srivastava, A. Kosaka, J. Park, and A. Kak.

A novel evidence accumulation framework for robust multi-camera person detection. In Proceedings of the ACM/IEEE International Conference on Distributed Smart Cameras, pages 1–10, 2008.

  • H. Medeiros, J. Park, and A. Kak.

Distributed object tracking using a cluster-based Kalman filter in wireless camera networks. IEEE Journal of Selected Topics in Signal Processing, volume 2, pages 448–463, 2008. 52/51

slide-53
SLIDE 53

FOR FURTHER READING II

  • E. Murphy-Chutorian and M. Trivedi.

3D tracking and dynamic analysis of human head movements and attentional targets. In Proceedings of the International Conference on Distributed Smart Cameras (ICDSC '08), 2008. 53/51

slide-54
SLIDE 54

PRIOR WORK

Face Detection, Pose Estimation, and Tracking Face Recognition Cluster-Based Tracking in Camera Networks 54/51

slide-55
SLIDE 55

PRIOR WORK: FACE POSE TRACKING

To track face pose, we need to: Detect faces Estimate face pose Track face pose 55/51

slide-56
SLIDE 56

PRIOR WORK: FACE DETECTION

Component-based detection

[Heisele et al. 2001]

Color-based detection
Scanning window methods

B. Heisele, T. Serre, M. Pontil, and T. Poggio. Component-based face detection. In Proceedings of CVPR, pages 657–662, 2001.

56/51

slide-57
SLIDE 57

PARAMETERS DETERMINED BY FACE DETECTION METHODS

Every face detection method determines

Face position
Face size
Face rotation (approximately)

For example, face size can be estimated:

from component distance
from color blob size
from window size

[Heisele et al. 2001]

57/51

slide-58
SLIDE 58

PRIOR WORK: SINGLE-CAMERA POSE ESTIMATION

There are many pose estimation methods as well [Survey: Murphy-Chutorian and Trivedi 2009]

Detector arrays
Regression methods
Deformable models
etc.

Methods estimate the face rotation:

1, 2, or 3 degrees of freedom (DOF)
Often analyze the cropped face image and ignore the rest of the image
The rotation is estimated as if the cropped image were at the center of the camera's image

58/51

slide-59
SLIDE 59

PRIOR WORK: SINGLE-CAMERA FACE TRACKING

Pose tracking methods find a face iteratively based on the location in the previous frame Some of these methods analyze the cropped face image in image-based coordinates

Active Appearance Models Appearance-template particle filters

59/51

slide-60
SLIDE 60

PRIOR WORK: MULTI-CAMERA FACE POSE ESTIMATION

[Murphy-Chutorian and Trivedi, 2008] [Iwaki et al. 2008] 60/51

slide-61
SLIDE 61

PRIOR WORK: CLUSTER-BASED TRACKING IN SENSOR NETWORKS

General Sensor Networks:

Clusters used to facilitate data aggregation, e.g. for sensor monitoring

Tracking:

A cluster is dedicated to tracking a single target
Cameras may participate in multiple clusters and track multiple targets

Tracking protocols and systems:

Zhang and Cao (2004) organize clusters as Dynamic Convoy Trees
Blum et al. (2003) avoid creating multiple leaders tracking the same target by using multi-hop communication

No methods prior to Medeiros et al. (2008) take into account the directional nature of camera sensors. 61/51

slide-62
SLIDE 62

CLUSTER-BASED COMMUNICATION PROTOCOL: CLUSTER LEADER ELECTION

62/51

slide-63
SLIDE 63

CLUSTER LEADER ELECTION

[Diagram: cluster leader election among cameras 1–5]

63/51

slide-64
SLIDE 64

CLUSTER LEADER ELECTION


64/51

slide-65
SLIDE 65

CLUSTER LEADER ELECTION

65/51

slide-66
SLIDE 66

CLUSTER LEADER ELECTION


66/51

slide-67
SLIDE 67

CLUSTER LEADER ELECTION

67/51

slide-68
SLIDE 68

CLUSTER PROPAGATION (1)

68/51

slide-69
SLIDE 69

CLUSTER PROPAGATION (2)

69/51

slide-70
SLIDE 70

UNSCENTED TRANSFORMATION

We assume that C_{p,i} is diagonal

We take a set of 2N sigma points ṗ_i^k in image-based coordinates:

ṗ_i^k = p̄_i + √N σ_k e_k,   k = 1, ..., N
ṗ_i^{k+N} = p̄_i − √N σ_k e_k,   k = 1, ..., N

70/51

slide-71
SLIDE 71

UNSCENTED TRANSFORMATION

Then transform each sigma point into world coordinates:

ṗ_w^k = f(ṗ_i^k)

And compute the mean and covariance of the points in the world space:

p̄_w = (1/2N) ∑_{k=1}^{2N} ṗ_w^k

C_{p,w} = (1/2N) ∑_{k=1}^{2N} (ṗ_w^k − p̄_w)(ṗ_w^k − p̄_w)^T
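A compact sketch of this unscented transform for a diagonal image-space covariance; f is the image-to-world mapping from the earlier slides, passed in as a function.

```python
# Sketch of the unscented transform described on slides 70-71: sigma points
# are spread along the axes of the (diagonal) image-space covariance, pushed
# through the image-to-world mapping f, and summarized by a mean and covariance.
import numpy as np

def unscented_transform(p_i: np.ndarray, C_p_i: np.ndarray, f):
    """Return (p_w_mean, C_p_w) for a Gaussian image observation N(p_i, C_p_i)."""
    N = p_i.size
    sigmas = np.sqrt(np.diag(C_p_i))                       # per-axis std (C_p_i diagonal)
    offsets = np.sqrt(N) * sigmas[:, None] * np.eye(N)     # sqrt(N) * sigma_k * e_k
    points = np.vstack([p_i + offsets, p_i - offsets])     # 2N sigma points in image space
    world = np.array([f(p) for p in points])                # transform each into world space
    mean = world.mean(axis=0)
    diffs = world - mean
    cov = diffs.T @ diffs / (2 * N)
    return mean, cov
```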