DM2C: Deep Mixed-Modal Clustering

SLIDE 1

DM2C: Deep Mixed-Modal Clustering

Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang

Institute of Information Engineering, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS; Key Lab. of BDKM, CAS; Peng Cheng Lab.

slide-2
SLIDE 2

Why multiple modalities?

Ubiquitous multi-modal data

  • The related information among multiple modalities helps us to understand the data.

SLIDE 3

Supervised Learning under Multiple Modalities

  • Supervision comes from class labels and modality pairing.
  • Modality pairing: a sample in modality A and another sample in modality B represent the same instance.

  • Manual annotations: expensive and laborious.

With multiple modalities, labeling is even more complicated than it is for single-modal data.

  • We turn to unsupervised learning under multiple modalities since it works without data labels.

SLIDE 4

Mixed-modal Setting: Fully-unsupervised Learning

  • Traditional unsupervised multi-modal learning still requires extra pairing information among modalities for feature alignment.

  • E.g., partial modality pairing, ‘must/cannot link’ constraints, co-occurrence frequency...
  • Mixed-modal data: each instance is represented in only one modality.

Figure 1: Examples of multi-modal and mixed-modal data with two modalities.

SLIDE 5

Mixed-modal Clustering: The Goal

  • Dataset $D = \{x_i\}_{i=1}^{n}$ mixed from two modalities.
  • $D \to \{x_i^{(a)}\}_{i=1}^{n_a} \cup \{x_j^{(b)}\}_{j=1}^{n_b}$, where $n = n_a + n_b$.

  • Mixed-modal clustering aims at learning unified representations for the modalities and then grouping the samples into k categories.
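
As a minimal sketch of the setting above, the following splits a mixed-modal dataset into its two modality subsets and checks that n = n_a + n_b. The `(modality_tag, features)` pair layout and the tags "a" and "b" are assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict

def split_by_modality(dataset):
    """Partition a mixed-modal dataset into per-modality subsets.

    `dataset` is a list of (modality_tag, features) pairs. The tags "a"
    and "b" are hypothetical labels used only in this illustration.
    """
    groups = defaultdict(list)
    for modality, features in dataset:
        groups[modality].append(features)
    return groups["a"], groups["b"]

# Toy data: every instance is observed in exactly one modality,
# and the two modalities may have different feature dimensions.
D = [("a", [0.1, 0.9]), ("b", [1.0, 0.0, 0.3]), ("a", [0.2, 0.8])]
X_a, X_b = split_by_modality(D)
assert len(X_a) + len(X_b) == len(D)  # n = n_a + n_b
```

No sample in `X_a` is paired with a sample in `X_b`, which is what separates the mixed-modal setting from the multi-modal one.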

SLIDE 6

How to Learn Unified Representations?

Choice 1: learn a joint semantic space for all the modalities

  • hard to find the correlation among all the modalities when pairing information is not available

Choice 2: learn the translation across the modalities

  • easy to obtain the cross-modal mappings under the guidance of cycle-consistency
  • modality unifying: transforming all the samples into a specific modality space
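
Modality unifying under Choice 2 can be sketched as follows, assuming latent codes are plain lists of floats and a cross-modal mapping from space B to space A is already available. The function `toy_g_ba` is a hypothetical stand-in (a fixed shift), not a learned generator.

```python
def unify_into_space_a(z_a_samples, z_b_samples, g_ba):
    """Return all samples expressed in modality space A.

    Codes already in space A are kept unchanged; codes from space B are
    passed through the B-to-A mapping `g_ba`.
    """
    return list(z_a_samples) + [g_ba(z_b) for z_b in z_b_samples]

def toy_g_ba(z_b):
    # Hypothetical mapping used only for illustration: shift by -1.
    return [v - 1.0 for v in z_b]

Z_a = [[0.0, 0.5], [1.0, 1.5]]
Z_b = [[1.0, 1.5], [2.0, 2.5]]
unified = unify_into_space_a(Z_a, Z_b, toy_g_ba)
```

Once every sample lives in the same space, a standard clustering routine can group `unified` into k categories. Which routine to use is left open here.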

SLIDE 7

Framework: Overview

Figure 2: Overview of the proposed method.

Modules

  • Modality-specific auto-encoders: to learn latent representations for each modality.
  • Cross-modal generators: to learn mappings across modalities with unpaired data.
  • Discriminators: to distinguish whether a sample is mapped from other modality spaces.

SLIDE 8

Framework: Module I

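Modality-specific auto-encoders

Latent representations for each modality are learned by single-modal data reconstruction:

$$
\begin{aligned}
\mathcal{L}_{rec}^{A}(\Theta_{AE_A}) &= \left\| x_i^{(a)} - \mathrm{Dec}_A\big(\mathrm{Enc}_A(x_i^{(a)})\big) \right\|_2^2, \\
\mathcal{L}_{rec}^{B}(\Theta_{AE_B}) &= \left\| x_i^{(b)} - \mathrm{Dec}_B\big(\mathrm{Enc}_B(x_i^{(b)})\big) \right\|_2^2.
\end{aligned}
\tag{1}
$$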
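
The per-sample reconstruction loss in Eq. (1) can be sketched in plain Python. The encoder and decoder below are hypothetical toy functions (keep the first coordinate, then zero-pad), chosen only so the arithmetic is easy to follow; they do not reflect the networks used in the talk.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def reconstruction_loss(x, encode, decode):
    """|| x - Dec(Enc(x)) ||_2^2 for a single sample x."""
    return squared_l2(x, decode(encode(x)))

# Hypothetical toy auto-encoder: the latent code is the first coordinate,
# and decoding pads the missing coordinate with zero.
def toy_encode(x):
    return x[:1]

def toy_decode(z):
    return z + [0.0]

x = [3.0, 4.0]
loss = reconstruction_loss(x, toy_encode, toy_decode)  # (3-3)^2 + (4-0)^2
```

The same function serves both modalities: pass the encoder and decoder of modality A or those of modality B.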

SLIDE 9

Framework: Module II

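Cross-modal generators

Mappings across modalities are constrained by cycle-consistency:

$$
\begin{aligned}
\mathcal{L}_{cyc}^{A}(\Theta_{G_{AB}}, \Theta_{G_{BA}}) &= \mathbb{E}_{z_a \sim X_A}\left[ \left\| z_a - G_{BA}(G_{AB}(z_a)) \right\|_1 \right], \\
\mathcal{L}_{cyc}^{B}(\Theta_{G_{AB}}, \Theta_{G_{BA}}) &= \mathbb{E}_{z_b \sim X_B}\left[ \left\| z_b - G_{AB}(G_{BA}(z_b)) \right\|_1 \right].
\end{aligned}
\tag{2}
$$

Generators: produce fake samples that are transformed from other modalities rather than originally lying in a specific modality space.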
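
A minimal sketch of the cycle-consistency term in Eq. (2), with the expectation replaced by an average over a small batch. The generators `toy_g_ab`, `toy_g_ba`, and `bad_g_ba` are hypothetical toy maps used only to show when the loss vanishes and when it does not.

```python
def l1(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

def cycle_loss(samples, forward, backward):
    """Batch average of || z - backward(forward(z)) ||_1."""
    return sum(l1(z, backward(forward(z))) for z in samples) / len(samples)

# Hypothetical generators: doubling and halving are exact inverses.
def toy_g_ab(z):
    return [2.0 * v for v in z]

def toy_g_ba(z):
    return [0.5 * v for v in z]

# A mismatched backward map that is off by 1 in every coordinate.
def bad_g_ba(z):
    return [0.5 * v + 1.0 for v in z]

Z_a = [[1.0, 2.0], [3.0, 4.0]]
loss_consistent = cycle_loss(Z_a, toy_g_ab, toy_g_ba)
loss_inconsistent = cycle_loss(Z_a, toy_g_ab, bad_g_ba)
```

Swapping the roles of the two generators and feeding samples from space B gives the second line of Eq. (2).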

SLIDE 10

Framework: Module III

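Discriminators

Discriminators distinguish whether a sample is mapped from other modality spaces. Games between generators and discriminators:

$$
\begin{aligned}
\mathcal{L}_{adv}^{A}(\Theta_{G_{BA}}, \Theta_{D_A}) &= \mathbb{E}_{z_a \sim X_A}\left[ D_A(z_a) \right] - \mathbb{E}_{z_b \sim X_B}\left[ D_A(G_{BA}(z_b)) \right], \\
\mathcal{L}_{adv}^{B}(\Theta_{G_{AB}}, \Theta_{D_B}) &= \mathbb{E}_{z_b \sim X_B}\left[ D_B(z_b) \right] - \mathbb{E}_{z_a \sim X_A}\left[ D_B(G_{AB}(z_a)) \right].
\end{aligned}
\tag{3}
$$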
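
The adversarial term in Eq. (3) is a difference of two average discriminator scores, which can be sketched as below. The discriminator `toy_d_a` (sum of coordinates) and the generator `toy_g_ba` (shift by -1) are hypothetical placeholders, and batch means stand in for the expectations.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def adversarial_loss(real_samples, source_samples, generator, discriminator):
    """E[D(z_real)] - E[D(G(z_source))], estimated on small batches."""
    real_score = mean(discriminator(z) for z in real_samples)
    fake_score = mean(discriminator(generator(z)) for z in source_samples)
    return real_score - fake_score

# Hypothetical discriminator for space A: scores a code by its coordinate sum.
def toy_d_a(z):
    return sum(z)

# Hypothetical generator from space B to space A: shift by -1.
def toy_g_ba(z):
    return [v - 1.0 for v in z]

Z_a = [[1.0, 1.0], [2.0, 2.0]]  # real scores: 2 and 4, mean 3
Z_b = [[1.0, 2.0], [3.0, 2.0]]  # mapped scores: 1 and 3, mean 2
loss_a = adversarial_loss(Z_a, Z_b, toy_g_ba, toy_d_a)
```

The discriminator tries to increase this value by scoring genuine samples above mapped ones, while the generator tries to decrease it by making mapped samples score like genuine ones.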

SLIDE 11

Framework: Objective Function

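Objective Function

$$
\min_{\substack{\Theta_{G_{AB}},\, \Theta_{G_{BA}} \\ \Theta_{AE_A},\, \Theta_{AE_B}}} \;
\max_{\Theta_{D_A},\, \Theta_{D_B}} \;
\mathcal{L}_{adv}^{A} + \mathcal{L}_{adv}^{B}
+ \lambda_1 \left( \mathcal{L}_{cyc}^{A} + \mathcal{L}_{cyc}^{B} \right)
+ \lambda_2 \left( \mathcal{L}_{rec}^{A} + \mathcal{L}_{rec}^{B} \right)
\tag{4}
$$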
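
The weighted sum inside Eq. (4) can be written as a small helper. The loss values and the weights `lambda1` and `lambda2` used below are made-up numbers for illustration only; the slides do not state the weights used in practice.

```python
def total_objective(adv_a, adv_b, cyc_a, cyc_b, rec_a, rec_b, lambda1, lambda2):
    """Combine the six loss terms as in Eq. (4).

    lambda1 weights the cycle-consistency terms and lambda2 weights the
    reconstruction terms; the adversarial terms are unweighted.
    """
    return adv_a + adv_b + lambda1 * (cyc_a + cyc_b) + lambda2 * (rec_a + rec_b)

# Made-up loss values and weights, chosen so the arithmetic is exact.
value = total_objective(
    adv_a=1.0, adv_b=0.5,
    cyc_a=0.25, cyc_b=0.25,
    rec_a=0.5, rec_b=0.5,
    lambda1=2.0, lambda2=4.0,
)  # 1.5 + 2.0 * 0.5 + 4.0 * 1.0
```

In the min-max game, the generator and auto-encoder parameters are updated to decrease this value, and the discriminator parameters are updated to increase it. Only the adversarial terms depend on the discriminators.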

SLIDE 12

Thank You for Your Attention!

See you at the poster session! Wed Dec 11th 10:45AM – 12:45PM @ East Exhibition Hall B+C #63
