BiANE: Bipartite Attributed Network Embedding (presentation slides)


slide-1
SLIDE 1

BiANE: Bipartite Attributed Network Embedding

Wentao Huang¹, Yuchen Li², Yuan Fang², Ju Fan¹, Hongxia Yang³

¹ School of Information, Renmin University of China
² School of Information Systems, Singapore Management University
³ DAMO Academy, Alibaba Group

slide-2
SLIDE 2

Outline

• Introduction & Challenge
• Methodology
• Experiment
• Conclusion & Future Work

slide-3
SLIDE 3

Introduction

• Bipartite Attributed Network
  • E-Commerce Websites
  • Recommendation Systems
  • Bibliometric Network Analysis
  • Biological Community Detection
  • Risk Assessment of Financial Systems

(Figure: an example bipartite attributed network linking people to products.)
  • People: Mario (male, 18, soccer player, Rome, ⋯), Ricardo (male, 23, soccer player, Rio de Janeiro, ⋯), Lisa (female, 21, actress, Milan, ⋯), Alicia (female, 22, model, Paris, ⋯)
  • Products: Spaghetti (food, Barilla, $8, ⋯), Makeup Palette (cosmetics, YSL, $32, ⋯), Cosmetic Bag (cosmetics, L'Oréal, $59, ⋯), Soccer Ball (sports utility, Nike, $48, ⋯)

slide-9
SLIDE 9

Introduction

• Characteristics
  • The Inter-Partition Proximity
  • The Intra-Partition Proximity
    1) The Attribute Proximity
    2) The Structure Proximity
• Goal: Given a bipartite attributed network $G = (\mathcal{U}, \mathcal{V}, E, \mathbf{X}^{\mathcal{U}}, \mathbf{X}^{\mathcal{V}})$, we want to learn a mapping function that transforms each node into a vector in a low-dimensional space.
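The two kinds of proximity can be made concrete on the people-and-products example. The sketch below is illustrative only: the links in `E` and the binary attribute encoding are hypothetical (the slides do not list the edges), and the two intra-partition measures, cosine similarity of attribute vectors for attribute proximity and Jaccard overlap of neighbor sets for structure proximity, are common stand-ins rather than BiANE's own definitions.

```python
import math

# Hypothetical toy instance of G = (U, V, E, X_U, X_V) built from the people/products example.
# People attributes (binary): [male, female, soccer player, actress, model]
X_U = {
    "Mario":   [1, 0, 1, 0, 0],
    "Ricardo": [1, 0, 1, 0, 0],
    "Lisa":    [0, 1, 0, 1, 0],
    "Alicia":  [0, 1, 0, 0, 1],
}
# Product attributes (binary): [food, cosmetics, sports utility]
X_V = {
    "Spaghetti":      [1, 0, 0],
    "Makeup Palette": [0, 1, 0],
    "Cosmetic Bag":   [0, 1, 0],
    "Soccer Ball":    [0, 0, 1],
}
# Hypothetical purchase links; every edge joins a person in U to a product in V.
E = {
    ("Mario", "Soccer Ball"), ("Mario", "Spaghetti"),
    ("Ricardo", "Soccer Ball"),
    ("Lisa", "Makeup Palette"),
    ("Alicia", "Makeup Palette"), ("Alicia", "Cosmetic Bag"),
}

def neighbors(u):
    """Products linked to person u."""
    return {v for (p, v) in E if p == u}

def attribute_proximity(a, b):
    """Intra-partition attribute proximity as cosine similarity of attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

def structure_proximity(u1, u2):
    """Intra-partition structure proximity as Jaccard overlap of neighbor sets."""
    n1, n2 = neighbors(u1), neighbors(u2)
    union = n1 | n2
    return len(n1 & n2) / len(union) if union else 0.0

inter = ("Mario", "Soccer Ball") in E                      # inter-partition: a direct link
attr = attribute_proximity(X_U["Mario"], X_U["Ricardo"])   # identical attributes -> 1.0
struc = structure_proximity("Mario", "Ricardo")            # 1 shared of 2 products -> 0.5
print(inter, attr, struc)  # True 1.0 0.5
```

Mario and Ricardo are never linked to each other (they sit in the same partition), yet they are close in both intra-partition senses, which is the information that modeling only the inter-partition links would miss.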

slide-10
SLIDE 10

Technical Challenges

• The Attribute-Structure Correlation
  • Complementarity & Coherence
• Negative Sampling Strategy
  • Static sampling strategies cannot reflect how the embedding space changes during training.
  • Dynamic sampling strategies lead to scalability issues.

(Figure: the people-and-products example, contrasting the structure information, i.e., the links between people and products, with the attribute information of a node, e.g., Mario: male, 18, soccer player, Rome, ⋯)

slide-11
SLIDE 11

Methodology

(Figure: the BiANE framework for $G = (\mathcal{U}, \mathcal{V}, E, \mathbf{X}^{\mathcal{U}}, \mathbf{X}^{\mathcal{V}})$, divided into Intra-Partition Proximity Modeling and Inter-Partition Proximity Modeling. For each of the two partitions, it contains an Attribute Autoencoder and a Structure Autoencoder, each paired with First-Order Proximity Modeling, together with Latent Correlation Training and Dynamic Positive Sampling.)

slide-12
SLIDE 12

Example

• Scholar-Publication Network

(Figure: a bipartite network between a publication partition and a scholar partition.)
  • Publication partition (venue): SDNE (SIGKDD), TF-IPM (SIGKDD), NetClus (SIGKDD), H-tree (VLDBJ), AccDNN (FCCM), IFUHJ (CIDR), PSL (ICDE)
  • Scholar partition (affiliation; research area): J Han (UIUC; ML), D Chen (UIUC; EDA), WW Hwu (UIUC; ARCH), Y Sun (UCLA; ML), T Chen (UCLA; ML), W Zhu (THU; ML), P Cui (THU; ML, AI), BC Ooi (NUS; DB), WF Wong (NUS; ARCH)

Scholar Partition:
  • J Han: Jiawei Han; D Chen: Deming Chen; WW Hwu: Wen-mei W. Hwu; Y Sun: Yizhou Sun; T Chen: Ting Chen; W Zhu: Wenwu Zhu; P Cui: Peng Cui; BC Ooi: Beng Chin Ooi; WF Wong: Weng-Fai Wong

Publication Partition:
  • SDNE: Structural Deep Network Embedding.
  • TF-IPM: Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network.
  • NetClus: Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema.
  • H-tree: Index nesting – an efficient approach to indexing in object-oriented databases.
  • AccDNN: An IP-Based DNN Generator for FPGAs.
  • IFUHJ: Is FPGA Useful for Hash Joins?
  • PSL: Parallelizing Skip Lists for In-Memory Multi-Core Database Systems.

Example node attributes (Jiawei Han):
  • Gender: Male
  • Institutions: UIUC, SFU
  • Research Interests: Data Mining, Database Systems, Data Warehousing, Information Networks
slide-13
SLIDE 13

Intra-Partition Proximity Modeling

(Figure: the scholar-publication example restricted to the scholar partition (J Han, D Chen, WW Hwu, Y Sun, T Chen, BC Ooi, WF Wong), shown next to the intra-partition model with its attribute and structure autoencoders.)

slide-14
SLIDE 14

Intra-Partition Proximity Modeling


slide-15
SLIDE 15

Intra-Partition Proximity Modeling

• Compact Feature Learning
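The methodology figure gives each partition an Attribute Autoencoder and a Structure Autoencoder, and compact feature learning can be pictured as such an autoencoder squeezing a node's high-dimensional input into a short code that can still reconstruct the input. Below is a minimal single-hidden-layer sketch in NumPy, assuming a tanh encoder, a linear decoder, mean squared reconstruction error, and plain gradient descent; the sizes, learning rate, and random data are illustrative and are not BiANE's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(d_in, d_code):
    """Randomly initialize encoder and decoder weights (illustrative scale)."""
    return {
        "W_enc": rng.normal(scale=0.1, size=(d_in, d_code)),
        "b_enc": np.zeros(d_code),
        "W_dec": rng.normal(scale=0.1, size=(d_code, d_in)),
        "b_dec": np.zeros(d_in),
    }

def encode(p, X):
    """Map inputs (n x d_in) to compact codes (n x d_code)."""
    return np.tanh(X @ p["W_enc"] + p["b_enc"])

def loss_and_grads(p, X):
    """Mean (per node) squared reconstruction error and its gradients."""
    n = X.shape[0]
    H = encode(p, X)
    X_hat = H @ p["W_dec"] + p["b_dec"]
    R = X_hat - X
    loss = np.sum(R ** 2) / n
    dX_hat = 2.0 * R / n                 # dL/dX_hat
    dH = dX_hat @ p["W_dec"].T           # back-propagate into the code
    dZ = dH * (1.0 - H ** 2)             # through the tanh nonlinearity
    grads = {
        "W_dec": H.T @ dX_hat,
        "b_dec": dX_hat.sum(axis=0),
        "W_enc": X.T @ dZ,
        "b_enc": dZ.sum(axis=0),
    }
    return loss, grads

X = rng.random((50, 20))            # 50 hypothetical nodes with 20-dim input features
params = init(d_in=20, d_code=4)    # compress each node to a 4-dim code
first_loss, _ = loss_and_grads(params, X)
for _ in range(300):
    loss, grads = loss_and_grads(params, X)
    for k in params:
        params[k] -= 0.05 * grads[k]
codes = encode(params, X)
print(codes.shape, first_loss, loss)
```

The same class of model serves both views: fed attribute vectors it plays the attribute autoencoder, fed structure vectors it plays the structure autoencoder.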

slide-16
SLIDE 16

Intra-Partition Proximity Modeling

• Joint Modeling: Preserving the first-order proximity
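Preserving first-order proximity usually means that two directly related nodes should receive similar embeddings. A common way to write this, borrowed here from LINE-style objectives as an assumption rather than taken from the slides, scores a related pair (i, j) by σ(h_i · h_j) and contrasts it with randomly drawn unrelated pairs. The function name, the pairs, and the toy embeddings below are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def first_order_loss(H, pos_pairs, neg_pairs):
    """Negative log-likelihood that related pairs score high and unrelated pairs score low.

    H: (n, d) embeddings; pos_pairs / neg_pairs: lists of (i, j) index pairs.
    """
    pi, pj = np.array(pos_pairs).T
    ni, nj = np.array(neg_pairs).T
    pos_scores = np.sum(H[pi] * H[pj], axis=1)   # dot products of related pairs
    neg_scores = np.sum(H[ni] * H[nj], axis=1)   # dot products of sampled negatives
    return -(log_sigmoid(pos_scores).sum() + log_sigmoid(-neg_scores).sum())

H_good = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.8, -0.2]])
H_bad = H_good[[0, 2, 1, 3]]   # same vectors, shuffled so related nodes disagree
pos = [(0, 1), (2, 3)]
neg = [(0, 2), (1, 3)]
print(first_order_loss(H_good, pos, neg) < first_order_loss(H_bad, pos, neg))  # True
```

Joint modeling would add a term like this to each autoencoder's reconstruction loss, so the codes are pushed both to keep the input information and to respect direct relations.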

slide-17
SLIDE 17

Latent Correlation Training


slide-18
SLIDE 18

Latent Correlation Training

• Transform encodings into latent representations via auxiliary kernels.
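One reading of this step is that a node's attribute encoding and structure encoding live in different spaces, possibly with different widths. Each encoding is therefore passed through its own small learned transform into a shared latent space before the two are compared. The sketch below assumes each auxiliary kernel is a one-layer perceptron with a tanh activation followed by L2 normalization. The slides do not specify the kernel's form, so the class name `AuxiliaryKernel` and this design are hypothetical.

```python
import numpy as np

class AuxiliaryKernel:
    """Hypothetical learned map from one encoding space into a shared latent space."""

    def __init__(self, d_in, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_latent))
        self.b = np.zeros(d_latent)

    def __call__(self, codes):
        Z = np.tanh(codes @ self.W + self.b)
        # Unit-normalize rows so dot products in the latent space are cosine similarities.
        return Z / np.linalg.norm(Z, axis=1, keepdims=True)

attr_codes = np.random.default_rng(1).random((5, 8))    # 5 nodes, 8-dim attribute encodings
struc_codes = np.random.default_rng(2).random((5, 16))  # same 5 nodes, 16-dim structure encodings
attr_latent = AuxiliaryKernel(8, 4, seed=3)(attr_codes)
struc_latent = AuxiliaryKernel(16, 4, seed=4)(struc_codes)
print(attr_latent.shape, struc_latent.shape)
```

Keeping a separate kernel per view, rather than sharing one, lets the attribute and structure encoders stay independent while their outputs still become directly comparable.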

slide-19
SLIDE 19

Latent Correlation Training

• Enhance the attribute-structure correlation
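One way to enhance the attribute-structure correlation is to make a node's attribute latent vector agree with its own structure latent vector while disagreeing with other nodes' structure latent vectors. The contrastive loss below is a stand-in chosen for illustration under that assumption. The slides do not give BiANE's correlation objective, and the temperature value is arbitrary.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def correlation_loss(A, S, temperature=0.5):
    """Contrastive loss that aligns attribute latents A with structure latents S.

    A, S: (n, d) arrays in which row i of both belongs to the same node.
    Pairs (i, i) are positives; all pairs (i, j) with i != j are negatives.
    """
    scores = (A @ S.T) / temperature
    n = A.shape[0]
    pos = np.diag(scores)
    neg = scores[~np.eye(n, dtype=bool)]
    return -(log_sigmoid(pos).mean() + log_sigmoid(-neg).mean())

S = np.eye(4)                       # 4 nodes with orthogonal structure latents
A_aligned = S.copy()                # each attribute latent matches its own node
A_shifted = np.roll(S, 1, axis=0)   # each attribute latent matches a different node
print(correlation_loss(A_aligned, S), correlation_loss(A_shifted, S))
```

The aligned case yields the lower loss, so minimizing this term pulls the two views of each node together in the latent space.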

slide-20
SLIDE 20

Dynamic Positive Sampling

(Figure: the scholar-publication network, with the scholars listed in the order J Han, P Cui, W Zhu, BC Ooi, WF Wong, Y Sun, T Chen, WW Hwu, D Chen and a positive-negative boundary drawn across the list.)

slide-24
SLIDE 24

Dynamic Positive Sampling

• Build an HNSW (Hierarchical Navigable Small World) index over the vectors (x̃, z̃) in the latent space (time complexity: O(n log n)).
• Perform an approximate kNN search for each vector (x̃, z̃) via HNSW (time complexity: O(n log n)).

(Figure: the scholar-publication example with its positive-negative boundary, accompanied by the HNSW-based positive sampling probability distribution.)
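The index-and-search procedure on this slide replaces an exact nearest-neighbor search, which costs O(n²) distance computations per sampling round, with approximate HNSW queries. To run with NumPy alone, the sketch below uses an exact brute-force kNN search as a stand-in for the HNSW index. In practice a library such as hnswlib would supply the O(n log n) build and search. The softmax over negative distances that turns neighbors into a sampling distribution is an assumption for illustration. The slides show a positive sampling probability distribution but not its formula.

```python
import numpy as np

def knn(latent, k):
    """Exact k nearest neighbors by Euclidean distance (stand-in for an HNSW query).

    Returns (indices, distances), each of shape (n, k), excluding each point itself.
    """
    sq = np.sum(latent ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * latent @ latent.T
    np.fill_diagonal(d2, np.inf)                 # a node is never its own positive
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.maximum(np.take_along_axis(d2, idx, axis=1), 0.0))
    return idx, dist

def sample_positives(latent, k, rng, temperature=1.0):
    """Draw one positive per node from its k neighbors, favoring closer ones."""
    idx, dist = knn(latent, k)
    logits = -dist / temperature
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)    # hypothetical sampling distribution
    choice = np.array([rng.choice(k, p=p) for p in probs])
    return idx[np.arange(len(latent)), choice], probs

rng = np.random.default_rng(0)
# Two well-separated hypothetical clusters in a 2-d latent space.
latent = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])
positives, probs = sample_positives(latent, k=3, rng=rng)
print(positives)
```

Calling `sample_positives` again after each training round recomputes neighbors from the current embeddings, which lets the positives follow the changing latent space. Swapping the brute-force `knn` for an index-based search is what keeps that repeated recomputation affordable.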

slide-25
SLIDE 25

Inter-Partition Proximity Modeling


slide-26
SLIDE 26

Inter-Partition Proximity Modeling


slide-27
SLIDE 27

Experimental Setup

• Tasks:
  • Link Prediction & Node Classification
• Metrics:
  • AUC-ROC, AUC-PR
  • Micro-F1, Macro-F1
• Datasets:

$$\mathrm{sparsity} = 1 - \frac{\#\mathrm{link}}{\#\mathrm{user} \times \#\mathrm{item}}$$
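The sparsity statistic on this slide (one minus the number of links divided by the number of possible user-item pairs) and the node-classification metrics are short enough to compute directly. The helpers below are a plain-Python sketch. `sparsity` follows the slide's definition. `f1_scores` uses the standard definitions: micro-F1 pools counts over all classes, and macro-F1 averages the per-class F1. The dataset sizes and labels in the usage lines are made up for illustration.

```python
from collections import Counter

def sparsity(n_links, n_users, n_items):
    """1 - #link / (#user * #item): the fraction of user-item pairs without a link."""
    return 1.0 - n_links / (n_users * n_items)

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p wrongly
            fn[t] += 1   # missed the true class t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools counts over all classes; macro-F1 averages per-class F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

print(sparsity(n_links=50, n_users=10, n_items=20))  # 0.75
print(f1_scores(["a", "a", "b", "c"], ["a", "b", "b", "c"]))  # micro-F1 = 0.75, macro-F1 = 7/9
```

Macro-F1 weights every class equally, so it drops more than micro-F1 when rare classes are misclassified; reporting both shows whether an embedding helps only the frequent classes.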

slide-28
SLIDE 28

Experimental Setup

• Compared Methods:
  Homogeneous Network Methods:
  • DeepWalk [Perozzi et al., SIGKDD 2014]
  • node2vec [Grover et al., SIGKDD 2016]
  • SDNE [Wang et al., SIGKDD 2016]
  Heterogeneous Network Methods:
  • metapath2vec++ [Dong et al., KDD 2017]
  • BiNE [Gao et al., SIGIR 2018]
  • NGCF [Wang et al., SIGIR 2019]
  Attributed Network Methods:
  • AANE [Huang et al., SDM 2017]
  • ANRL [Zhang et al., IJCAI 2018]
  • FeatWalk [Huang et al., AAAI 2019]
  • STAR-GCN [Zhang et al., IJCAI 2019]

slide-29
SLIDE 29

Efficacy Study

• Link Prediction

slide-30
SLIDE 30

Efficacy Study

• Node Classification

slide-31
SLIDE 31

Ablation Setup

  • BiANE-ATTR: BiANE without structure information
  • BiANE-STRUC: BiANE without attribute information
  • BiANE-INTER: BiANE with inter-partition proximity modeling only
  • BiANE-CONCAT: integrating attribute and structure encodings by concatenation
  • BiANE-LAYER: integrating attribute and structure encodings by sharing neural layers
  • BiANE-IS: BiANE with the sampling distribution
  • BiANE-ISL: BiANE with the sampling distribution in the latent space

slide-32
SLIDE 32

Ablation Study

• Node Classification on the AMiner and Alibaba Datasets

slide-33
SLIDE 33

Performance w.r.t. #Sample

• Link Prediction on the MovieLens Dataset

slide-34
SLIDE 34

Efficiency Study

• The Time Cost of a Single Round of Sampling

slide-35
SLIDE 35

Conclusion & Future Work

• Conclusion
  • Propose a model for embedding bipartite attributed networks that simultaneously preserves the intra-partition proximity and the inter-partition proximity.
  • Introduce a dynamic positive sampling strategy that improves the representation learning process without sacrificing model scalability.
• Future Work
  • Reduce the space complexity of the representation learning model.
  • Extend the current work to model dynamic bipartite attributed networks.

slide-36
SLIDE 36

THANK YOU FOR YOUR ATTENTION! Q&A