Unsupervised Learning Jointly with Image Clustering
Jianwei Yang, Devi Parikh, Dhruv Batra - PowerPoint PPT Presentation


SLIDE 1

Unsupervised Learning Jointly with Image Clustering

Virginia Tech

Jianwei Yang Devi Parikh Dhruv Batra

https://filebox.ece.vt.edu/~jw2yang/

SLIDE 2

SLIDE 3

Huge amounts of images!!!

SLIDE 4

Huge amounts of images!!!
Learning without annotation effort

SLIDE 5

Huge amounts of images!!!
Learning without annotation effort
What do we need to learn?

SLIDE 6

An open problem

Huge amounts of images!!!
Learning without annotation effort
What do we need to learn?

SLIDE 7

An open problem
A hot problem

Huge amounts of images!!!
Learning without annotation effort
What do we need to learn?

SLIDE 8

Various methodologies
An open problem
A hot problem

Huge amounts of images!!!
Learning without annotation effort
What do we need to learn?

SLIDE 9

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

SLIDE 10

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)

SLIDE 11

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)
Hierarchical Clustering

SLIDE 12

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)
Spectral Clustering, Manor et al, NIPS'04
Hierarchical Clustering

SLIDE 13

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)
Spectral Clustering, Manor et al, NIPS'04
Hierarchical Clustering
Graph Cut, Shi et al, TPAMI'00

SLIDE 14

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)
DBSCAN, Ester et al, KDD'96 (Image Credit: Jesse Johnson)
Spectral Clustering, Manor et al, NIPS'04
Hierarchical Clustering
Graph Cut, Shi et al, TPAMI'00

SLIDE 15

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)
DBSCAN, Ester et al, KDD'96 (Image Credit: Jesse Johnson)
Spectral Clustering, Manor et al, NIPS'04
Hierarchical Clustering
Graph Cut, Shi et al, TPAMI'00
EM Algorithm, Dempster et al, JRSS'77

SLIDE 16

Learning distribution (structure)

Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31.3 (1999): 264-323.

Clustering

K-means (Image Credit: Jesse Johnson)
DBSCAN, Ester et al, KDD'96 (Image Credit: Jesse Johnson)
Spectral Clustering, Manor et al, NIPS'04
Hierarchical Clustering
Graph Cut, Shi et al, TPAMI'00
EM Algorithm, Dempster et al, JRSS'77
NMF, Xu et al, SIGIR'03 (Image Credit: Conrad Lee)

SLIDE 17

Learning distribution (structure): Sub-space Analysis

PCA (Image Credit: Jesse Johnson)
ICA (Image Credit: Shylaja et al)
t-SNE, Maaten et al, JMLR'08
Subspace Clustering, Vidal et al.
Sparse coding, Olshausen et al, Vision Research'97

SLIDE 18

Learning representation (feature)

Yoshua Bengio, Aaron Courville, and Pierre Vincent. "Representation learning: A review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence. 35.8 (2013): 1798-1828.

Autoencoder, Hinton et al, Science'06 (Image Credit: Jesse Johnson)
DBN, Hinton et al, Science'06
DBM, Salakhutdinov et al, AISTATS'09
Bengio et al, TPAMI'13

SLIDE 19

Learning representation (feature)

VAE, Kingma et al, arXiv'13 (Image Credit: Fast Forward Labs)
GAN, Goodfellow et al, NIPS'14
DCGAN, Radford et al, arXiv'15 (Image Credit: Mike Swarbrick Jones)

SLIDE 20

Most Recent CV Works

Spatial context, Doersch et al, ICCV'15
Temporal context, Wang et al, ICCV'15
Solving Jigsaw, Noroozi et al, ECCV'16
Context Encoder, Deepak et al, CVPR'16
Ego-motion, Jayaraman et al, ICCV'15

SLIDE 21

Most Recent CV Works

Visual concept clustering, Huang et al, CVPR'16
Graph constraint, Li et al, ECCV'16
TAGnet, Wang et al, SDM'16
Deep Embedding, Xie et al, ICML'16

SLIDE 22

Our Work

Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters

SLIDE 23

Outline

  • Intuition
  • Approach
  • Experiments
  • Extensions

SLIDE 24

Intuition

Meaningful clusters can provide supervisory signals to learn image representations

SLIDE 25

Intuition

Meaningful clusters can provide supervisory signals to learn image representations

Good representations help to get meaningful clusters

SLIDE 26

Intuition

Cluster images first, and then learn representations

SLIDE 27

Intuition

Cluster images first, and then learn representations
Learn representations first, and then cluster images

SLIDE 28

Intuition

Cluster images and learn representations progressively
Cluster images first, and then learn representations
Learn representations first, and then cluster images

SLIDE 29

Intuition

Good clusters → Good representations
Good representations → Good clusters
Poor clusters → Poor representations

SLIDE 30

Intuition

Good clusters → Good representations
Good representations → Good clusters
Poor clusters → Poor representations

SLIDE 31

Intuition

Good clusters → Good representations
Good representations → Good clusters
Poor clusters → Poor representations

SLIDE 32

Intuition

Good clusters → Good representations
Good representations → Good clusters
Poor clusters → Poor representations

SLIDE 33

Approach

  • Framework
  • Objective
  • Algorithm & Implementation

SLIDE 34

Approach: Framework

Representation Learning (Convolutional Neural Network): $\arg\min_{\theta} L(\theta \mid y, I)$

Agglomerative Clustering: $\arg\min_{y} L(y \mid \theta, I)$

Overall objective: $\arg\min_{y,\theta} L(y, \theta \mid I)$
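A minimal sketch of this alternating scheme in Python, assuming NumPy and scikit-learn; extract_features, update_clusters and update_representation are illustrative stand-ins of our own naming (the real system trains a CNN for θ and uses the agglomerative procedure described later):

```python
# Minimal sketch of the alternating optimization; helper names are ours.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def extract_features(images, theta):
    # Stand-in for the CNN forward pass: a linear map parameterized by theta.
    return images.reshape(len(images), -1) @ theta

def update_clusters(features, n_clusters):
    # Forward: argmin_y L(y | theta, I) -- cluster the current features.
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)

def update_representation(images, labels, theta):
    # Backward: argmin_theta L(theta | y, I) -- in JULE this trains the CNN
    # with a loss derived from the cluster labels; kept as a stub here.
    return theta

images = np.random.rand(200, 16, 16)
theta = np.random.randn(256, 32)
for t in range(5):                      # alternate the two sub-problems
    feats = extract_features(images, theta)
    labels = update_clusters(feats, n_clusters=10)
    theta = update_representation(images, labels, theta)
```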

SLIDE 35

Approach: Framework

Convolutional Neural Network: $\arg\min_{\theta} L(\theta \mid y, I)$

Agglomerative Clustering: $\arg\min_{y} L(y \mid \theta, I)$

SLIDE 36

Approach: Recurrent Framework

SLIDE 37

Approach: Recurrent Framework

SLIDE 38

Approach: Recurrent Framework

SLIDE 39

Approach: Recurrent Framework

SLIDE 40

Approach: Recurrent Framework

SLIDE 41

Approach: Recurrent Framework

SLIDE 42

Approach: Recurrent Framework

Backpropagating at every time-step is time-consuming and prone to over-fitting!

SLIDE 43

Approach: Recurrent Framework

What if we update only once every several time-steps?

Backpropagating at every time-step is time-consuming and prone to over-fitting!

SLIDE 44

Approach: Recurrent Framework

Partial Unrolling: divide all T time-steps into P periods

In each period, we merge clusters multiple times and update the CNN parameters at the end of the period
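A sketch of the partially unrolled schedule, reusing the stand-in helpers from the SLIDE 34 sketch; the values of T and P and the recluster-at-period-end shortcut are illustrative simplifications, not the paper's exact schedule:

```python
# Partial unrolling: T merge steps split into P periods; one CNN update per period.
T, P = 90, 3
merges_per_period = T // P
n_clusters = 100                        # start over-clustered
feats = extract_features(images, theta)
for p in range(P):
    for _ in range(merges_per_period):  # forward: one cluster merge per time-step
        n_clusters -= 1
    labels = update_clusters(feats, n_clusters)            # clusters after merges
    theta = update_representation(images, labels, theta)   # backward: update CNN
    feats = extract_features(images, theta)                 # re-extract features
```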

SLIDE 45

Approach: Recurrent Framework

Partial Unrolling: divide all T time-steps into P periods

In each period, we merge clusters multiple times and update the CNN parameters at the end of the period

SLIDE 46

Approach: Recurrent Framework

Partial Unrolling: divide all T time-steps into P periods

In each period, we merge clusters multiple times and update the CNN parameters at the end of the period

P is determined by a hyper-parameter that will be introduced later

SLIDE 47

Approach: Objective Function

Overall loss: $\arg\min_{y,\theta} L(y, \theta \mid I)$

Alternating optimization: $\arg\min_{y} L(y \mid \theta, I)$ (clustering, forward pass) and $\arg\min_{\theta} L(\theta \mid y, I)$ (representation learning, backward pass)

SLIDE 48

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy

SLIDE 49

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
Affinity measure

SLIDE 50

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
i-th cluster

SLIDE 51

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
K_c nearest neighbor clusters of the i-th cluster

SLIDE 52

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
Affinity between the i-th cluster and its NN

SLIDE 53

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
Affinity between the i-th cluster and its NN
Differences between two cluster affinities

SLIDE 54

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
Affinity between the i-th cluster and its NN
Differences between two cluster affinities
Merge these two clusters

SLIDE 55

Approach: Objective Function

Loss at time-step t:

Conventional Agg. Clustering Strategy vs. Proposed Agg. Clustering Strategy
Affinity between the i-th cluster and its NN
Differences between two cluster affinities
Merge these two clusters
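A hedged sketch that mirrors the structure of these callouts: the affinity of cluster i to its nearest-neighbor cluster, contrasted against its other K_c nearest neighbors. cluster_affinity, Kc and lam are illustrative stand-ins, since the paper defines its own affinity measure and weighting:

```python
# Sketch of the merge-criterion structure; affinity and lam are stand-ins.
import numpy as np

def cluster_affinity(Xa, Xb):
    # Illustrative affinity: negative mean pairwise Euclidean distance.
    d = np.linalg.norm(Xa[:, None, :] - Xb[None, :, :], axis=-1)
    return -float(d.mean())

def merge_candidate(i, clusters, Kc=3, lam=1.0):
    # Affinity of cluster i to every other cluster, sorted ascending.
    aff = sorted((cluster_affinity(clusters[i], clusters[j]), j)
                 for j in clusters if j != i)
    (a1, j1) = aff[-1]                      # nearest-neighbor cluster
    others = [a for a, _ in aff[-Kc-1:-1]]  # next K_c nearest neighbors
    diff = a1 - np.mean(others) if others else 0.0
    loss = -(a1 + lam * diff)               # small loss = good merge
    return loss, j1
```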

SLIDE 56

Approach: Objective Function

Loss in the forward pass in period p (merge clusters):
Loss in the backward pass in period p (update CNN parameters):

SLIDE 57

Approach: Objective Function

Loss in the forward pass in period p (merge clusters):
Loss in the backward pass in period p (update CNN parameters):

SLIDE 58

Approach: Objective Function

Loss in the forward pass in period p (merge clusters):
Loss in the backward pass in period p (update CNN parameters):

CNN parameters are fixed

SLIDE 59

Approach: Objective Function

Loss in the forward pass in period p (merge clusters):
Loss in the backward pass in period p (update CNN parameters):

CNN parameters are fixed Cluster labels are fixed

SLIDE 60

Approach: Objective Function

Forward Pass: Simple Greedy Algorithm
Merge the two clusters that minimize the loss at each time-step
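A sketch of one such greedy step, assuming the merge_candidate helper from the SLIDE 55 sketch:

```python
# Greedy forward step: evaluate every cluster's best merge and take the
# pair whose merge loss is smallest, as described above.
def greedy_merge_step(clusters):
    best_i = min(clusters, key=lambda i: merge_candidate(i, clusters)[0])
    _, best_j = merge_candidate(best_i, clusters)
    clusters[best_i] = np.vstack([clusters[best_i], clusters.pop(best_j)])
    return clusters

# Example: ten random 2-D clusters of five points each, merged down to three.
clusters = {i: np.random.rand(5, 2) for i in range(10)}
while len(clusters) > 3:
    clusters = greedy_merge_step(clusters)
```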

SLIDE 61

Approach: Objective Function

Forward Pass: Simple Greedy Algorithm
Merge the two clusters that minimize the loss at each time-step

SLIDE 62

Approach: Objective Function

Forward Pass: Simple Greedy Algorithm
Merge the two clusters that minimize the loss at each time-step

SLIDE 63

Approach: Objective Function

Forward Pass: Simple Greedy Algorithm
Merge the two clusters that minimize the loss at each time-step

SLIDE 64

Approach: Objective

Backward Pass:

SLIDE 65

Approach: Objective

Backward Pass:

Consider all previous periods

SLIDE 66

Approach: Objective

Backward Pass:

Cluster-based loss is not suitable for batch optimization!!!

Consider all previous periods

SLIDE 67

Approach: Objective

Backward Pass:

Cluster-based loss is not suitable for batch optimization!!!

Consider all previous periods

Approximation:

SLIDE 68

Approach: Objective

Backward Pass:

Convert to sample-based loss:

Consider all previous periods

Intra-sample affinity
Inter-sample affinity

Recall cluster-based loss:

SLIDE 69

Approach: Objective

Backward Pass:

Convert to sample-based loss:

Consider all previous periods

Intra-sample affinity
Inter-sample affinity

Recall cluster-based loss:

Weighted triplet loss
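A minimal sketch of a weighted triplet loss in this spirit: pull a same-cluster pair together and push a cross-cluster pair apart by a margin. margin and gamma are illustrative placeholders, not the paper's weighting:

```python
# Sketch of a per-sample weighted triplet loss; margin/gamma are stand-ins.
import numpy as np

def weighted_triplet_loss(anchor, positive, negative, margin=0.2, gamma=1.0):
    pos = np.sum((anchor - positive) ** 2)   # same-cluster pair: pull together
    neg = np.sum((anchor - negative) ** 2)   # cross-cluster pair: push apart
    return max(0.0, gamma * pos - neg + margin)

x_a, x_p, x_n = np.random.rand(3, 32)
print(weighted_triplet_loss(x_a, x_p, x_n))
```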

SLIDE 70

Approach: Algorithm & Implementation

SLIDE 71

Approach: Algorithm & Implementation

Raw image data

SLIDE 72

Approach: Algorithm & Implementation

Raw image data
Assume it is known

SLIDE 73

Approach: Algorithm & Implementation

Raw image data
Assume it is known
Randomly initialize CNN parameters
4 samples in each cluster on average

SLIDE 74

Approach: Algorithm & Implementation

Raw image data
Assume it is known
Randomly initialize CNN parameters
4 samples in each cluster on average
Train the CNN for about 20 epochs

SLIDE 75

Approach: Algorithm & Implementation

Raw image data
Assume it is known
Randomly initialize CNN parameters
4 samples in each cluster on average
Train the CNN for about 20 epochs
We can go back and retrain the model, but it improves only slightly
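The initialization above spelled out in numbers, as a small sketch with an illustrative dataset size:

```python
# Illustrative numbers for the initialization described above.
n_images = 70000                  # e.g. MNIST-sized dataset
init_clusters = n_images // 4     # ~4 samples per cluster on average
target_clusters = 10
total_merge_steps = init_clusters - target_clusters
epochs_per_cnn_update = 20        # "train the CNN for about 20 epochs"
print(init_clusters, total_merge_steps)
```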

SLIDE 76

Experiments

  • Datasets
  • Network Architecture
  • Image Clustering
  • Representation Learning

SLIDE 77

Experiments: Datasets

Datasets (#images, #classes, image size):
MNIST (70000, 10, 28x28)
USPS (11000, 10, 16x16)
COIL20 (1440, 20, 128x128)
COIL100 (7200, 100, 128x128)
UMist (575, 20, 112x92)
FRGC (2462, 20, 32x32)
CMU-PIE (2856, 68, 32x32)
YouTube Face (1000, 41, 55x55)

SLIDE 78

Experiments: Settings

Two important parameters. Set the number of layers so that the output feature map is about 10x10.
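A sketch of this rule of thumb, under our assumption that each stride-2 stage halves the spatial size:

```python
# Choose enough stride-2 stages that the feature map lands near 10x10.
import math

def n_downsampling_stages(input_size, target=10):
    # Halving-per-stage assumption: stop once we are at ~target.
    return max(0, int(math.log2(input_size / target)))

for size in (28, 16, 32, 55, 112, 128):   # image sizes from the dataset list
    print(size, "->", n_downsampling_stages(size), "stage(s)")
```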

SLIDE 79

Experiments: Clustering: Performance

+6.43% NMI over the best-performing existing approach, averaged over all datasets
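NMI here is normalized mutual information between predicted and ground-truth labels; it can be computed with scikit-learn as follows:

```python
# NMI is invariant to label permutation, so cluster IDs need not match classes.
from sklearn.metrics import normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # permuted labels still score perfectly
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
```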

SLIDE 80

Experiments: Clustering: Performance

+12.76% AC (clustering accuracy) over the best-performing existing approach, averaged over all datasets

SLIDE 81

Experiments: Clustering: Performance

Average +21.5% on NMI

SLIDE 82

Experiments: Clustering: Performance

Average +25.7% on NMI

SLIDE 83

Experiments: Clustering: Performance

Our clustering performance vs. that of existing clustering approaches using raw image data.

Clustering performance using our representation fed to existing clustering algorithms.

SLIDE 84

Experiments: Clustering: Visualization

COIL-20 COIL-100

SLIDE 85

Experiments: Clustering: Visualization

USPS MNIST-test

SLIDE 86

Experiments: Clustering: Ablation study

SLIDE 87

Experiments: Clustering: Verification

SLIDE 88

Experiments: Clustering: Time Cost

SLIDE 89

Experiments: Representation Learning

Testing generalization of our learnt (unsupervised) representation to LFW face verification.
Evaluation on CIFAR-10 classification.

Representation transfer
Representation learning
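One common transfer protocol is a linear probe on frozen features; a sketch of that idea (the random features and the LogisticRegression probe are our illustrative setup, not necessarily the paper's exact evaluation):

```python
# Evaluate a frozen representation with a linear probe (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(500, 32)             # pretend: features from the frozen CNN
y = np.random.randint(0, 10, size=500)  # pretend: CIFAR-10 style labels
clf = LogisticRegression(max_iter=1000).fit(X[:400], y[:400])
print("probe accuracy:", clf.score(X[400:], y[400:]))
```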

SLIDE 90

Extensions: Data Visualization

SLIDE 91

Conclusion

  • A new method for unsupervised learning jointly with image clustering, casting the problem as a recurrent optimization problem;
  • In the recurrent framework, clustering is conducted during the forward pass, and representation learning is conducted during the backward pass;
  • A unified loss function for both the forward pass and the backward pass;
  • Outperforms the state of the art on a number of datasets;
  • It can also learn plausible representations for image recognition.

SLIDE 92

Thanks!

https://github.com/jwyang/joint-unsupervised-learning
