COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency
Yutao Zhang+, Jie Tang+, Zhilin Yang+, Jian Pei#, and Philip S. Yu*
+Tsinghua University, #Simon Fraser University, *University of Illinois at Chicago


SLIDE 1

COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency

Yutao Zhang+, Jie Tang+, Zhilin Yang+, Jian Pei#, and Philip S. Yu*

+Tsinghua University #Simon Fraser University

*University of Illinois at Chicago

SLIDE 2

AMiner II (ArnetMiner)

  • Academic Social Network Analysis and Mining system: AMiner (http://aminer.org)
  • Online since 2006
  • >38 million researcher profiles
  • >100 million publications
  • >241 million requests
  • >12.35 terabytes of data
  • 100K IP accesses from 170 countries per month
  • 10% increase of visits per month
  • Deep analysis, mining, and search

SLIDE 3

Ruud Bolle

Office: 1S-D58
Letters: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA
Packages: IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532 USA
Email: bolle@us.ibm.com

Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department. Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications. Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.

DBLP: Ruud Bolle

2006

  • [50] Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373
  • [49] Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524
  • [48] Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)

2005

  • [47] Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20
  • [46] Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV 2005: 111-116
  • ...

Knowledge Acquisition from the Web

(ACM TKDD, WWW’12, ISWC’06, ICDM’07, ACL’07)

Contact Information Educational history Academic services Publications

Ruud Bolle

Name: Ruud Bolle
Position: Research Staff
Affiliation: IBM T.J. Watson Research Center
Address: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA
Address: IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532 USA
Email: bolle@us.ibm.com
Homepage: http://researchweb.watson.ibm.com/ecvg/people/bolle.html
Photo: [image]
Research_Interest: video database indexing; video processing; visual human-computer interaction; biometrics applications
Phduniv: Brown University
Phdmajor: Electrical Engineering
Phddate: 1984
Msuniv: Delft University of Technology
Msmajor: Electrical Engineering
Msdate: 1980
Msmajor: Applied Mathematics
Bsuniv: Delft University of Technology
Bsmajor: Analog Electronics
Bsdate: 1977

Publication 1#
Title: Cancelable Biometrics: A Case Study in Fingerprints
Venue: ICPR
Date: 2006
Start_page: 370
End_page: 373

Publication 2#
Title: Fingerprint Representation Using Localized Texture Features
Venue: ICPR
Date: 2006
Start_page: 521
End_page: 524

. . .

[Figure: profile graph centered on Ruud Bolle, linked to Publication #3 and Publication #5 and, through "coauthor" edges, to two co-authors; one co-author node carries an "affiliation" edge to UIUC and a "position" edge to Professor.]

SLIDE 4

Researcher Profile Database[1]

Extracted more than 1,000,000 researcher profiles from the Web

[1] J. Tang, L. Yao, D. Zhang, and J. Zhang. A Combination Approach to Web User Profiling. ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 5, no. 1, Article 2 (December 2010), 44 pages.

SLIDE 5

Is this Enough?

SLIDE 6

Required semantics are distributed across multiple sources

LinkedIn, VideoLectures

SLIDE 7

Identity Linking

  • Identifying users from multiple heterogeneous networks and integrating semantics from the different networks together.

SLIDE 8

COSNET: Connecting Social Networks with Local and Global Consistency

  • Input: G = {G1, G2, …, Gm}, with Gk = (Vk, Ek, Rk)
  • Formalization: X = {xi}, all possible pairwise matchings, each of which corresponds to a label yi ∈ {1, 0}
  • COSNET: an energy-based model, Y* = argmin_Y E(Y, X)

[1] Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip Yu. COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency. KDD’15.
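
To make the argmin formulation concrete, here is a minimal sketch of energy minimization over binary match labels. The unary costs, the pairwise disagreement penalty, and the exhaustive search are all illustrative assumptions: the slides do not give the form of E at this point, and enumeration is only feasible for a handful of candidate pairs.

```python
from itertools import product

def energy(labels, unary, pairwise):
    """Toy energy: unary[i] is paid when y_i = 1, and pairwise[(i, j)] is paid
    when y_i != y_j. Both terms are hypothetical stand-ins for E(Y, X)."""
    cost = sum(u for u, y in zip(unary, labels) if y == 1)
    cost += sum(w for (i, j), w in pairwise.items() if labels[i] != labels[j])
    return cost

def argmin_energy(unary, pairwise):
    """Return Y* = argmin_Y E(Y, X) by enumerating every Y in {0, 1}^n."""
    return min(product([0, 1], repeat=len(unary)),
               key=lambda labels: energy(labels, unary, pairwise))

# A negative unary cost means the candidate pair looks like a match on its own.
unary = [-1.0, 0.4, -0.2]
pairwise = {(0, 1): 0.5, (1, 2): 0.1}
best = argmin_energy(unary, pairwise)
```

In this toy instance the second candidate pair has a positive unary cost, yet the pairwise terms make it cheaper to label it a match alongside its neighbors than to disagree with them.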

SLIDE 9

Local vs. Global consistency

  • Given three networks,

[Figure: three networks with three users each. Example profile pair from two of the networks: (Username: Ortiz_Brandy, Nation: USA, Gender: female) and (Username: @ortizbrandy, Nation: USA, Gender: female).]

SLIDE 10

Local vs. Global consistency

  • Local matching: matching users by profiles

[Figure: three networks with three users each, annotated "Local consistency". Example profile pair: (Username: Ortiz_Brandy, Nation: USA, Gender: female) and (Username: @ortizbrandy, Nation: USA, Gender: female).]

Local consistency

Pairwise similarity features

– Username similarity and uniqueness
– Profile content similarity
– Ego network similarity
– Social status

Energy function
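
As an illustration of the first feature in the list above, the sketch below computes a username similarity and a simple uniqueness score. The normalization rule, the use of `difflib.SequenceMatcher`, and the definition of uniqueness as an inverse frequency are assumptions made for this example; the slides do not specify how these features are computed.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

def normalize(username):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", username.lower())

def username_similarity(a, b):
    """Similarity in [0, 1] between two normalized usernames."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def username_uniqueness(username, all_usernames):
    """Assumed definition: 1 / (number of accounts sharing the normalized
    name), so that rarer usernames score higher."""
    counts = Counter(normalize(u) for u in all_usernames)
    return 1.0 / max(counts[normalize(username)], 1)

sim = username_similarity("Ortiz_Brandy", "@ortizbrandy")
```

With this normalization the two usernames from the slide example reduce to the same string, so they receive the maximum similarity.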

SLIDE 11

Local vs. Global consistency

  • Network matching: matching users’ ego networks

[Figure: three networks with three users each, annotated "Local consistency" and "Network matching". Example profile pair: (Username: Ortiz_Brandy, Nation: USA, Gender: female) and (Username: @ortizbrandy, Nation: USA, Gender: female).]

Encourage “neighborhood-preserving matching”

SLIDE 12

Local vs. Global consistency

  • Network matching: matching users’ ego networks

[Figure: three networks with three users each, annotated "Local consistency" and "Network matching", with the example profile pair (Ortiz_Brandy, @ortizbrandy). Label combination shown for two candidate pairs drawn from two of the networks: (True, True).]

SLIDE 13

Local vs. Global consistency

  • Network matching: matching users’ ego networks

[Figure: three networks with three users each, annotated "Local consistency" and "Network matching", with the example profile pair (Ortiz_Brandy, @ortizbrandy). Label combinations shown for two candidate pairs drawn from two of the networks: (True, True); (False, True), with one pair marked ×.]

SLIDE 14

Local vs. Global consistency

  • Network matching: matching users’ ego networks

[Figure: three networks with three users each, annotated "Local consistency" and "Network matching", with the example profile pair (Ortiz_Brandy, @ortizbrandy). Label combinations shown for two candidate pairs drawn from two of the networks: (True, True); (False, True), with one pair marked ×; (False, False), with both pairs marked ×.]

SLIDE 15

Local vs. Global consistency

  • Network matching: matching users’ ego networks

[Figure: three networks with three users each, annotated "Local consistency" and "Network matching", with the example profile pair (Ortiz_Brandy, @ortizbrandy). Label combinations shown for two candidate pairs drawn from two of the networks: (True, True); (False, True), with one pair marked ×; (False, False), with both pairs marked ×.]

SLIDE 16

Network Matching

  • Network matching: matching users’ ego networks

[Figure: two input networks with three users each, and the matching graph built from them, whose nine nodes are the cross-network user pairs. Panel labels: "Input networks", "Matching graph".]

Energy function
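
The matching graph in the figure can be sketched as follows. Each node is a cross-network user pair, as the slide shows; the rule used here for linking two nodes (both endpoints must be neighbors in their own networks) is an assumption chosen to reflect neighborhood-preserving matching, and the function and variable names are hypothetical.

```python
from itertools import combinations, product

def build_matching_graph(adj_a, adj_b):
    """adj_a, adj_b: dict mapping each user to the set of its neighbors.
    Nodes are all cross-network pairs (u, v). Two nodes are linked when both
    endpoints are neighbors in their own networks (an assumed linking rule)."""
    nodes = list(product(sorted(adj_a), sorted(adj_b)))
    edges = set()
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u2 in adj_a[u1] and v2 in adj_b[v1]:
            edges.add(((u1, v1), (u2, v2)))
    return nodes, edges

# Two small path-shaped networks with three users each.
adj_a = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"}}
adj_b = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
nodes, edges = build_matching_graph(adj_a, adj_b)
```

Building all pairs is quadratic in the number of users, which is why a pruning step over the candidate pairs is needed before this is practical on real networks.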

SLIDE 17

Candidate Pruning

  • Content-based method

– Username similarity above a threshold

  • Structure-based similarity

– Starting from a seed mapping set and iteratively propagate the m
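
A rough sketch of the two pruning ideas, under stated assumptions. The content-based filter keeps only pairs whose username similarity reaches a threshold (the value 0.8 and the similarity measure are arbitrary choices for illustration). The structure-based step is a guess at one simple propagation scheme, expanding from seed pairs to pairs of their neighbors; the slide text is cut off, so this should not be read as the authors' procedure.

```python
import re
from difflib import SequenceMatcher

def name_sim(a, b):
    """Similarity of two usernames after lowercasing and stripping symbols."""
    clean = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
    return SequenceMatcher(None, clean(a), clean(b)).ratio()

def prune_by_username(users_a, users_b, threshold=0.8):
    """Content-based pruning: keep cross-network pairs above the threshold."""
    return {(u, v) for u in users_a for v in users_b
            if name_sim(u, v) >= threshold}

def propagate_from_seeds(seeds, adj_a, adj_b, rounds=1):
    """Structure-based pruning (assumed scheme): starting from seed pairs,
    add every pair formed by a neighbor of u and a neighbor of v."""
    candidates = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        new = {(nu, nv)
               for u, v in frontier
               for nu in adj_a.get(u, ())
               for nv in adj_b.get(v, ())} - candidates
        candidates |= new
        frontier = new
    return candidates

kept = prune_by_username(["Ortiz_Brandy", "jsmith"], ["@ortizbrandy", "k_lee"])
grown = propagate_from_seeds({("a1", "b1")}, {"a1": {"a2"}}, {"b1": {"b2", "b3"}})
```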

SLIDE 18

Local vs. Global consistency

  • Global consistency: matching users by avoiding global inconsistency

[Figure: three networks with three users each, annotated "Local consistency", "Network matching", and "Global inconsistency", with the example profile pair (Ortiz_Brandy, @ortizbrandy).]

Avoid “global inconsistency”

SLIDE 19

Local vs. Global consistency

  • Global consistency: matching users by avoiding global inconsistency

[Figure: three networks with three users each, annotated "Local consistency", "Network matching", and "Global inconsistency", with the example profile pair (Ortiz_Brandy, @ortizbrandy). Four configurations over the accounts of one user in the three networks, with six pairwise links marked ×.]

SLIDE 20

Local vs. Global consistency

  • Global consistency: matching users by avoiding global inconsistency

[Figure: three networks with three users each, annotated "Local consistency", "Network matching", and "Global inconsistency", with the example profile pair (Ortiz_Brandy, @ortizbrandy). Four configurations over the accounts of one user in the three networks, with six pairwise links marked ×; the slide flags the result as "Inconsistent!".]
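
One way to read "global inconsistency" is as a violation of transitivity among pairwise match labels: if account A matches B and B matches C, then A should match C. The sketch below checks for such violations. This reading, the data layout, and the account names are assumptions made for illustration; the slides show the idea only as a figure.

```python
from itertools import combinations

def inconsistent_triples(labels):
    """labels maps frozenset({a, b}) -> 0/1 for account pairs from different
    networks. A triple is flagged when exactly two of its three pairs are
    labeled as matches, because transitivity would force the third to match."""
    accounts = sorted({a for pair in labels for a in pair})
    bad = []
    for triple in combinations(accounts, 3):
        pairs = [frozenset(p) for p in combinations(triple, 2)]
        if all(p in labels for p in pairs) and sum(labels[p] for p in pairs) == 2:
            bad.append(triple)
    return bad

# Hypothetical accounts of one person in three networks.
labels = {
    frozenset({"u@G1", "u@G2"}): 1,
    frozenset({"u@G2", "u@G3"}): 1,
    frozenset({"u@G1", "u@G3"}): 0,  # contradicts the two matches above
}
violations = inconsistent_triples(labels)
```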

SLIDE 21

Avoid Global Inconsistency

Energy function

[Figure: panels "Input networks" and "Matching graph".]

SLIDE 22

Model Construction

The objective function combines all the energy functions.

[Figure: pipeline of three steps: Matching Graph Generation, Candidate Pruning, Model Construction. Panels: (a) Two input networks; (b) The generated matching graph, with nine candidate pairs; (c) Matching graph after pruning, with five candidate pairs; (d) The constructed model, with a single-variable factor on each of the five remaining candidate pairs and two-variable factors linking pairs of labels.]

SLIDE 23

Model Learning

  • Max-margin learning
  • As the original problem is intractable, we use Lagrangian relaxation to decompose the original objective function into a set of easy-to-solve sub-problems

SLIDE 24

Model Learning (cont.)

  • Dual decomposition

The resulting objective function is convex and non-differentiable, and can be solved by the projected sub-gradient method.

This provides a lower bound to the original function.
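
The following toy example illustrates Lagrangian relaxation with a projected sub-gradient update and the lower-bound property. It is not the COSNET objective: the problem (pick candidate pairs with high scores, at most one per group), the scores, the groups, and the step-size schedule are all assumptions chosen so that the relaxed problem splits into trivial per-variable decisions.

```python
from itertools import product

def dual_value(scores, groups, lam):
    """Evaluate g(lam) = min_y L(y, lam) for the energy -sum(s_i * y_i) with
    relaxed constraints sum_{i in group} y_i <= 1. Returns (value, y)."""
    # Cost of switching y_i on: -score plus multipliers of groups containing i.
    reduced = [-s for s in scores]
    for c, members in enumerate(groups):
        for i in members:
            reduced[i] += lam[c]
    y = [1 if r < 0 else 0 for r in reduced]
    value = sum(r for r in reduced if r < 0) - sum(lam)
    return value, y

def projected_subgradient(scores, groups, steps=200, step0=1.0):
    """Maximize the dual by sub-gradient ascent, projecting onto lam >= 0."""
    lam = [0.0] * len(groups)
    best = float("-inf")
    for t in range(1, steps + 1):
        value, y = dual_value(scores, groups, lam)
        best = max(best, value)
        for c, members in enumerate(groups):
            violation = sum(y[i] for i in members) - 1  # sub-gradient entry
            lam[c] = max(0.0, lam[c] + (step0 / t) * violation)  # projection
    return best, lam

def brute_force(scores, groups):
    """Exact minimum energy over all feasible y, for comparison only."""
    best = float("inf")
    for y in product([0, 1], repeat=len(scores)):
        if all(sum(y[i] for i in members) <= 1 for members in groups):
            best = min(best, -sum(s * yi for s, yi in zip(scores, y)))
    return best

scores = [0.9, 0.6, 0.8, 0.3]
groups = [[0, 1], [1, 2], [2, 3]]
bound, multipliers = projected_subgradient(scores, groups)
exact = brute_force(scores, groups)
```

By weak duality the value returned in `bound` can never exceed `exact`, whatever multipliers the iteration visits, which is the lower-bound property stated on the slide.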

SLIDE 25

Results

SLIDE 26

Connecting AMiner with …

  • LinkedIn and VideoLectures

Compared methods:

– Name-match: match name only
– SVM: use classifier to identify the same user
– MNA: an optimization method
– SiGMa: local propagation
– COSNET: our method
– COSNET-: w/o global consistency
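
For reference, the simplest baseline in the comparison could look like the sketch below. The slide only says that it matches by name; the case folding, whitespace trimming, and the choice to skip names shared by several accounts are assumptions, and the account ids are made up.

```python
def name_match(profiles_a, profiles_b):
    """Link accounts whose names are identical after trimming and case folding.
    profiles_*: dict mapping account id -> display name. Names shared by
    several accounts on either side are skipped as ambiguous (assumed rule)."""
    def index(profiles):
        by_name = {}
        for account, name in profiles.items():
            by_name.setdefault(name.strip().casefold(), []).append(account)
        return by_name

    idx_a, idx_b = index(profiles_a), index(profiles_b)
    return {(idx_a[n][0], idx_b[n][0])
            for n in idx_a.keys() & idx_b.keys()
            if len(idx_a[n]) == 1 and len(idx_b[n]) == 1}

links = name_match(
    {"a1": "Jie Tang", "a2": "Wei Wang", "a3": "Wei Wang"},
    {"b1": "jie tang ", "b2": "Wei Wang"},
)
```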

SLIDE 27

Connecting Social Media Sites

  • Twitter, LiveJournal, Last.fm, Flickr, MySpace

Compared methods:

– Name-match: match name only
– SVM: use classifier to identify the same user
– MNA: an optimization method
– SiGMa: local propagation
– COSNET: our method
– COSNET-: w/o global consistency

SLIDE 28

Effects of Global Consistency

[Figure: results on the Academia Collection and the SNS Collection, with annotated gains of +5.4% and +9.5%.]

COSNET-: w/o global consistency.

SLIDE 29

Application in AMiner

  • Video contents
  • Personal profiles
  • Business connections
  • Skills and expertise
  • Patents data

SLIDE 30

Thanks!

http://aminer.org
Data & source code: http://aminer.org/cosnet