APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS Yizhou - PowerPoint PPT Presentation

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS
Yizhou Sun
College of Computer and Information Science, Northeastern University
yzsun@ccs.neu.edu
July 25, 2015

Heterogeneous Information Networks
• Multiple object types and/or multiple link types
[Figure: example networks, including the DBLP bibliographic network (Venue, Paper, Author), the IMDB movie network (Movie, Studio, Director, Actor), and the Facebook network.]
1. Homogeneous networks are information-losing projections of heterogeneous networks!
2. New problems are emerging in heterogeneous networks!
Directly mine the information-richer heterogeneous networks.

Outline
• Why Heterogeneous Information Networks?
• Entity Recommendation
• Information Diffusion
• Ideology Detection
• Summary

Recommendation Paradigm
[Figure: the user community provides user-item feedback to a recommender system, which may also use product features and external knowledge, and returns recommendations to users.]
• Collaborative Filtering, e.g., K-Nearest Neighbor (Sarwar WWW’01), Matrix Factorization (Hu ICDM’08, Koren IEEE-CS’09), Probabilistic Model (Hofmann SIGIR’03)
• Content-Based Methods, e.g., (Balabanovic Comm. ACM’97, Zhang SIGIR’02)
• Hybrid Methods, e.g., Content-Based CF (Antonopoulus, IS’06), External Knowledge CF (Ma WSDM’11)

Problem Definition
[Figure: users provide implicit feedback to a recommender system that also draws on an information network to produce recommendations.]
• Hybrid collaborative filtering with information networks

Hybrid Collaborative Filtering with Networks
• Utilizing network relationship information can enhance recommendation quality
• However, most previous studies use only a single type of relationship between users or items (e.g., social networks, Ma WSDM’11; trust relationships, Ester KDD’10; service membership, Yuan RecSys’11)

The Heterogeneous Information Network View of Recommender Systems
[Figure: an example network linking movies (Avatar, Titanic, Aliens, Revolutionary Road) to director James Cameron, actors Zoe Saldana, Leonardo DiCaprio, and Kate Winslet, and genres Romance and Adventure.]

Relationship Heterogeneity Alleviates Data Sparsity
• Collaborative filtering methods suffer from data sparsity
[Figure: long-tail distribution of # of ratings over # of users or items: a small number of users and items have a large number of ratings, while most users and items have a small number of ratings.]
• Heterogeneous relationships complement each other
• Users and items with limited feedback can be connected to the network by different types of paths
• New users or items (cold start) can be connected through the information network

Relationship Heterogeneity Based Personalized Recommendation Models
• Different users may have different behaviors or preferences: different users may be interested in the same movie for different reasons
[Figure: the movie Aliens may attract a James Cameron fan, an 80s sci-fi fan, or a Sigourney Weaver fan.]
• Two levels of personalization
  • Data level: most recommendation methods use one model for all users and rely on personal feedback to achieve personalization
  • Model level: with different entity relationships, we can learn personalized models for different users to further distinguish their differences

Preference Propagation-Based Latent Features
[Figure: an example network linking users (Bob, Charlie, Alice) to movies (King Kong, Titanic, Skyfall, Revolutionary Road), which connect to actors (Naomi Watts, Ralph Fiennes, Kate Winslet), director Sam Mendes, genre: drama, and tag: Oscar Nomination.]
1. Generate L different meta-paths (path types) connecting users and items
2. Propagate user implicit feedback along each meta-path
3. Calculate latent features for users and items for each meta-path with an NMF-related method
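The three steps on this slide can be sketched for a single meta-path as follows. This is a minimal, illustrative sketch, not the authors' implementation: it assumes the item-item similarity along a user-item-attribute-item meta-path is PathSim computed from a commuting matrix, that propagation is the plain product R·S, and that the "NMF-related method" is ordinary Lee-Seung NMF. The toy data and all function names are hypothetical.

```python
import random

def matmul(A, B):
    """Dense product of two matrices stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def pathsim(M):
    """PathSim: 2*M[i][j] / (M[i][i] + M[j][j]) for a symmetric commuting matrix M."""
    n = len(M)
    return [[2.0 * M[i][j] / (M[i][i] + M[j][j]) if M[i][i] + M[j][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def nmf(X, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for X ~ U V^T with U, V >= 0."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    for _ in range(iters):
        XV, UVtV = matmul(X, V), matmul(U, matmul(transpose(V), V))
        U = [[U[i][k] * XV[i][k] / (UVtV[i][k] + eps) for k in range(rank)] for i in range(m)]
        XtU, VUtU = matmul(transpose(X), U), matmul(V, matmul(transpose(U), U))
        V = [[V[j][k] * XtU[j][k] / (VUtU[j][k] + eps) for k in range(rank)] for j in range(n)]
    return U, V

def sq_error(X, U, V):
    """Squared Frobenius reconstruction error ||X - U V^T||^2."""
    Y = matmul(U, transpose(V))
    return sum((x - y) ** 2 for xr, yr in zip(X, Y) for x, y in zip(xr, yr))

# Hypothetical toy data: 3 users x 4 movies; each movie linked to genre 0 and/or 1.
R = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
A = [[1, 0], [1, 0], [0, 1], [1, 1]]     # movie-genre adjacency
S = pathsim(matmul(A, transpose(A)))     # movie-genre-movie meta-path similarity
R_tilde = matmul(R, S)                   # implicit feedback propagated along the meta-path
U_q, V_q = nmf(R_tilde, rank=2)          # latent user/item features for this meta-path
```

User 0 clicked only movie 0, but after propagation their row becomes [1.0, 1.0, 0.0, 0.667], so movie 1 (same genre) now carries preference. Repeating this for each of the L meta-paths yields L pairs of latent feature matrices.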

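Recommendation Models
Observation 1: Different meta-paths may have different importance
• Global recommendation model:
$r(u_i, e_j) = \sum_{q=1}^{L} \theta_q \cdot \hat{U}_i^{(q)} \hat{V}_j^{(q)\top}$ (1)
where $r(u_i, e_j)$ is the ranking score, $\hat{U}_i^{(q)}$ and $\hat{V}_j^{(q)}$ are the features for user i and item j from the q-th meta-path, and $\theta_q$ is the weight of the q-th of L meta-paths
Observation 2: Different users may require different models
• Personalized recommendation model:
$r^*(u_i, e_j) = \sum_{k=1}^{c} \mathrm{sim}(C_k, u_i) \sum_{q=1}^{L} \theta_q^{(k)} \cdot \hat{U}_i^{(q)} \hat{V}_j^{(q)\top}$ (2)
where $\mathrm{sim}(C_k, u_i)$ is the user-cluster similarity and c is the total number of soft user clusters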
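Eqs. (1) and (2) reduce to weighted sums of inner products. In this sketch, the storage layout `U_feats[q][i]` / `V_feats[q][j]` for the q-th meta-path, the function names, and the toy numbers are hypothetical; the meta-path weights and the user-cluster similarities are assumed to be already learned.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def global_score(U_feats, V_feats, theta, i, j):
    """Eq. (1): sum over meta-paths q of theta_q * <U^(q)_i, V^(q)_j>."""
    return sum(t * dot(U_feats[q][i], V_feats[q][j]) for q, t in enumerate(theta))

def personalized_score(U_feats, V_feats, cluster_thetas, sim, i, j):
    """Eq. (2): per-cluster models mixed by the user-cluster similarity sim[i][k]."""
    return sum(sim[i][k] * global_score(U_feats, V_feats, theta_k, i, j)
               for k, theta_k in enumerate(cluster_thetas))

# Hypothetical toy values: L = 2 meta-paths, one user, one item, rank-1 features.
U_feats = [[[1.0]], [[2.0]]]
V_feats = [[[3.0]], [[0.5]]]
g = global_score(U_feats, V_feats, [0.5, 1.0], 0, 0)       # 0.5*3.0 + 1.0*1.0 = 2.5
p = personalized_score(U_feats, V_feats,
                       [[1.0, 0.0], [0.0, 1.0]],            # theta^(k) for c = 2 clusters
                       [[0.25, 0.75]], 0, 0)                # 0.25*3.0 + 0.75*1.0 = 1.5
```

The personalized score reuses the same per-meta-path features as the global model; only the weighting changes from user to user, which is why a per-user model can be generated on the fly.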

Parameter Estimation
• Bayesian personalized ranking (Rendle UAI’09)
• Objective function:
$\min_{\Theta} \sum_{u_i \in \mathcal{U}} \sum_{(e_a > e_b) \in R_i} -\ln \sigma\big(r(u_i, e_a) - r(u_i, e_b)\big)$ (3)
where σ is the sigmoid function and $R_i$ contains each correctly ranked item pair, i.e., $u_i$ gave feedback to $e_a$ but not $e_b$
• Learning the personalized recommendation model:
1. Soft-cluster users with NMF + k-means
2. For each user cluster, learn one model with Eq. (3)
3. Generate a personalized model for each user on the fly with Eq. (2)
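A minimal sketch of optimizing Eq. (3) with stochastic gradient descent. As a simplification (an assumption, not necessarily the paper's procedure), only the meta-path weights θ of the global model are learned: the per-meta-path features f_q = Û_i^(q) V̂_j^(q)ᵀ are held fixed, and regularization is omitted. Feature values and names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable ln(sigma(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def score(theta, f):
    """Eq. (1) with fixed per-meta-path features f[q]."""
    return sum(t * x for t, x in zip(theta, f))

def bpr_loss(theta, feats, triples):
    """Objective (3): sum of -ln sigma(r(u, a) - r(u, b)) over (user, pos, neg) triples."""
    return -sum(log_sigmoid(score(theta, feats[(u, a)]) - score(theta, feats[(u, b)]))
                for u, a, b in triples)

def bpr_sgd(theta, feats, triples, lr=0.1, epochs=50):
    """SGD on theta; d(-ln sigma(x))/dx = -sigma(-x), with x = r(u, a) - r(u, b)."""
    theta = list(theta)
    for _ in range(epochs):
        for u, a, b in triples:
            fa, fb = feats[(u, a)], feats[(u, b)]
            g = sigmoid(-(score(theta, fa) - score(theta, fb)))
            theta = [t + lr * g * (xa - xb) for t, xa, xb in zip(theta, fa, fb)]
    return theta

feats = {(0, "a"): [1.0, 0.0], (0, "b"): [0.0, 1.0]}   # hypothetical features, L = 2
triples = [(0, "a", "b")]                              # user 0 clicked "a" but not "b"
theta0 = [0.0, 0.0]
theta = bpr_sgd(theta0, feats, triples)
```

Starting from θ = 0 the loss is ln 2; training raises the weight of the meta-path that ranks the clicked item higher and lowers the other. In the cluster-wise procedure, the triples would be weighted by each user's cluster membership.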

Experiment Setup
• Datasets
• Comparison methods:
  • Popularity: recommend the most popular items to users
  • Co-click: conditional probabilities between items
  • NMF: non-negative matrix factorization on user feedback
  • Hybrid-SVM: Rank-SVM with plain features (utilizing both user feedback and the information network)
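A minimal sketch of the two simplest baselines, assuming implicit feedback is stored as a dict from user to clicked items. Popularity ranks unseen items by how many users clicked them. Co-click estimates P(j | i) from co-occurrence. The slide does not say how these conditional probabilities become a ranking, so summing P(j | i) over the user's clicked items is an assumption, as are all names here.

```python
from collections import Counter
from itertools import permutations

def popularity_recommend(feedback, user, k):
    """Most-clicked items the user has not seen yet (ties broken by item id)."""
    counts = Counter(item for items in feedback.values() for item in set(items))
    seen = set(feedback[user])
    ranked = sorted((it for it in counts if it not in seen), key=lambda it: (-counts[it], it))
    return ranked[:k]

def coclick_probs(feedback):
    """P(j | i) = #users who clicked both i and j / #users who clicked i."""
    single, pair = Counter(), Counter()
    for items in feedback.values():
        s = set(items)
        single.update(s)
        pair.update(permutations(s, 2))   # ordered pairs (i, j), i != j
    return {(i, j): c / single[i] for (i, j), c in pair.items()}

def coclick_recommend(feedback, user, k):
    probs = coclick_probs(feedback)
    seen = set(feedback[user])
    scores = Counter()
    for (i, j), p in probs.items():
        if i in seen and j not in seen:
            scores[j] += p                # assumed aggregation: sum over clicked items
    return sorted(scores, key=lambda it: (-scores[it], it))[:k]

feedback = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"], "u4": ["d"]}
```

Neither baseline uses the information network, which is what separates them from Hybrid-SVM and HeteRec.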

Performance Comparison
• HeteRec personalized recommendation (HeteRec-p) provides the best recommendation results

Performance under Different Scenarios
• HeteRec-p consistently outperforms the other methods in different scenarios
• Better recommendation results if users provide more feedback
• Better recommendations for users who like less popular items

Contributions
Entity Recommendation in Information Networks with Implicit User Feedback (RecSys’13, WSDM’14a)
• Propose latent representations for users and items by propagating user preferences along different meta-paths
• Employ a Bayesian ranking optimization technique to estimate the recommendation models
• Further improve recommendation quality by considering user differences at the model level and defining personalized recommendation models
• Two levels of personalization

Information Diffusion in Networks
• The action of a node is triggered by the actions of its neighbors

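Linear Threshold Model
• [Granovetter, 1978]
• If the weighted number of activated neighbors of node u exceeds a pre-specified threshold $\theta_u$, node u is activated
• In other words:
$p_u(t+1) = E\left[\mathbf{1}\left(\sum_{v \in \Gamma(u)} w_{v,u}\,\delta(v,t) > \theta_u\right)\right]$
where $\Gamma(u)$ is the set of u's neighbors, $w_{v,u}$ is the weight of the link from v to u, $\delta(v,t) = 1$ if v is active at time t and 0 otherwise, and the expectation is taken over the threshold $\theta_u$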
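A minimal simulation sketch of the linear threshold model. It assumes thresholds drawn uniformly from [0, 1], a common convention under which the expectation in p_u(t+1) reduces to min(1, Σ w_{v,u} δ(v,t)); the Monte Carlo estimate below makes that expectation explicit. Node names and weights are hypothetical.

```python
import random

def influence(u, active, in_weights):
    """Weighted number of active neighbors: sum of w_{v,u} * delta(v, t)."""
    return sum(w for v, w in in_weights.get(u, {}).items() if v in active)

def lt_step(active, in_weights, thresholds):
    """One synchronous step with fixed thresholds; active nodes stay active."""
    newly = {u for u in in_weights
             if u not in active and influence(u, active, in_weights) > thresholds[u]}
    return set(active) | newly

def activation_prob_mc(u, active, in_weights, trials=10000, seed=0):
    """Monte Carlo estimate of p_u(t+1), assuming theta_u ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    s = influence(u, active, in_weights)
    return sum(s > rng.random() for _ in range(trials)) / trials

in_weights = {"a": {}, "b": {"a": 0.3, "c": 0.4}, "c": {}}   # in_weights[u][v] = w_{v,u}
thresholds = {"a": 0.5, "b": 0.6, "c": 0.5}
after = lt_step({"a", "c"}, in_weights, thresholds)          # "b" is activated (0.7 > 0.6)
p_b = activation_prob_mc("b", {"a", "c"}, in_weights)        # close to 0.7
```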

Heterogeneous Bibliographic Network
• Multiple types of objects
• Multiple types of links

Derived Multi-Relational Bibliographic Network
• Collaboration: Author-Paper-Author
• Citation: Author-Paper->Paper-Author
• Sharing co-authors: Author-Paper-Author-Paper-Author
• Co-attending venues: Author-Paper-Venue-Paper-Author
How to generate these meta-paths? PathSim: Sun et al., VLDB’11
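One way to materialize these derived author-author relations is as commuting matrices, i.e., products of the adjacency matrices along each meta-path, where entry (i, j) counts path instances between authors i and j. This is a sketch under that assumption; the toy adjacency matrices and function names are hypothetical.

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def derived_relations(W_AP, W_PP, W_PV):
    """Author-author path counts for the four meta-paths.

    W_AP[a][p] = 1 if author a wrote paper p; W_PP[p][q] = 1 if p cites q;
    W_PV[p][v] = 1 if paper p appeared at venue v.
    """
    W_PA = transpose(W_AP)
    apa = matmul(W_AP, W_PA)                                            # A-P-A
    cite = matmul(matmul(W_AP, W_PP), W_PA)                             # A-P->P-A
    apapa = matmul(apa, apa)                                            # A-P-A-P-A
    apvpa = matmul(matmul(W_AP, W_PV), matmul(transpose(W_PV), W_PA))   # A-P-V-P-A
    return {"APA": apa, "AP->PA": cite, "APAPA": apapa, "APVPA": apvpa}

# Hypothetical toy network: 2 authors, 2 papers, 1 venue; paper 1 cites paper 0.
W_AP = [[1, 0], [1, 1]]
W_PP = [[0, 0], [1, 0]]
W_PV = [[1], [1]]
rel = derived_relations(W_AP, W_PP, W_PV)
```

The citation relation is directional: author 1 (via paper 1) cites work by both authors, while author 0 cites nothing, so that matrix is not symmetric.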

How Are Topics Propagated among Authors?
• To apply existing approaches, either:
  • Select one relation between authors (say, A-P-A), or
  • Use all the relations, but ignore the relation types
• Do different relation types play different roles?
• New models are needed!

Two Assumptions for Topic Diffusion in Multi-Relational Networks
• Assumption 1: relation-independent diffusion (model-level aggregation)

• Assumption 2: relation-interdependent diffusion (relation-level aggregation)

Two Models under the Two Assumptions
• Two multi-relational linear threshold models:
  • Model 1: MLTM-M (model-level aggregation)
  • Model 2: MLTM-R (relation-level aggregation)

MLTM-M
• For each relation type k: the activation probability for object i at time t+1
• The collective model: the final activation probability for object i is an aggregation over all relation types
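The slide's formulas did not survive, so the following is only a hypothetical sketch of model-level aggregation, not the published MLTM-M. It assumes each relation k yields a linear-threshold probability min(1, Σ_j w^k_ij δ(j,t)) (uniform thresholds), and that relations act independently, so their probabilities are combined by a noisy-OR: i stays inactive only if no relation activates it. All names and weights are illustrative.

```python
def relation_prob(i, active, W_k):
    """LT activation probability of i under one relation, assuming theta_i ~ Uniform[0, 1]."""
    return min(1.0, sum(w for j, w in W_k.get(i, {}).items() if j in active))

def mltm_m_prob(i, active, relations):
    """Assumed noisy-OR aggregation of independent per-relation probabilities."""
    stay_inactive = 1.0
    for W_k in relations.values():
        stay_inactive *= 1.0 - relation_prob(i, active, W_k)
    return 1.0 - stay_inactive

# Hypothetical per-relation in-weights: relations[k][i][j] = w^k_{ij}.
relations = {"APA": {"x": {"a": 0.5}}, "APVPA": {"x": {"b": 0.5}}}
```

With both neighbors active, each relation alone gives 0.5 and the aggregate is 1 - 0.5 * 0.5 = 0.75, never lower than any single relation's probability.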

Properties of MLTM-M

MLTM-R
• Aggregate the multi-relational network with different weights
• Treat the activation as in a single-relational network
• To ensure the activation probability is non-negative, the weights β’s are required to be non-negative
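A hypothetical sketch of relation-level aggregation: the relation-specific weights are first combined into one network, W_ij = Σ_k β_k W^k_ij, and activation is then computed as in a single-relational linear threshold model (assuming uniform thresholds, so p = min(1, Σ_j W_ij δ(j,t))). Under this linear form, non-negative relation weights (written β_k here) are exactly what keeps the probability non-negative. The exact published formulation may differ; names and values are illustrative.

```python
def aggregate(relations, beta):
    """W_ij = sum_k beta_k * W^k_ij; relation weights must be non-negative."""
    if any(b < 0 for b in beta.values()):
        raise ValueError("relation weights must be non-negative")
    W = {}
    for k, W_k in relations.items():
        for i, nbrs in W_k.items():
            row = W.setdefault(i, {})
            for j, w in nbrs.items():
                row[j] = row.get(j, 0.0) + beta[k] * w
    return W

def mltm_r_prob(i, active, W):
    """Single-relational LT probability on the aggregated network (theta ~ U[0, 1] assumed)."""
    return min(1.0, sum(w for j, w in W.get(i, {}).items() if j in active))

# Hypothetical per-relation in-weights and relation weights.
relations = {"APA": {"x": {"a": 0.5}}, "APVPA": {"x": {"a": 0.5, "b": 0.5}}}
W = aggregate(relations, {"APA": 0.6, "APVPA": 0.2})
```

Unlike model-level aggregation, the relations here are mixed before thresholding, so evidence from several relations adds up inside a single activation decision.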

Properties of MLTM-R

How to Evaluate the Two Models?
• Test on real action logs for multiple topics!
• Action log: {<u_i, t_i>}
• Learn the diffusion model from the action log
  • Maximum likelihood estimation (MLE) of the β’s
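A minimal sketch of maximum likelihood estimation of the relation weights (written β_k here) from an action log. It assumes each observation has already been reduced to (x, y): x[k] is the weighted count of relation-k neighbors active at time t, and y = 1 if the node became active at t+1. It also assumes a linear form p = Σ_k β_k x_k, clipped into (0, 1), and uses projected gradient ascent to keep β ≥ 0. The paper's exact likelihood and optimizer may differ, and the data are hypothetical.

```python
import math

def clipped_p(beta, x, eps):
    return min(max(sum(b * xk for b, xk in zip(beta, x)), eps), 1.0 - eps)

def fit_beta(observations, num_relations, lr=0.005, iters=2000, eps=1e-6):
    """Projected gradient ascent on the Bernoulli log-likelihood."""
    beta = [0.5] * num_relations
    for _ in range(iters):
        grad = [0.0] * num_relations
        for x, y in observations:
            p = clipped_p(beta, x, eps)
            coef = y / p - (1 - y) / (1 - p)          # d log-likelihood / d p
            for k in range(num_relations):
                grad[k] += coef * x[k]
        beta = [max(0.0, b + lr * g) for b, g in zip(beta, grad)]  # keep beta >= 0
    return beta

def log_likelihood(beta, observations, eps=1e-6):
    return sum(math.log(clipped_p(beta, x, eps)) if y else math.log(1.0 - clipped_p(beta, x, eps))
               for x, y in observations)

# Hypothetical observations for two relation types.
obs = ([((1.0, 0.0), 1)] * 3 + [((1.0, 0.0), 0)] * 7 +
       [((0.0, 1.0), 1)] * 1 + [((0.0, 1.0), 0)] * 9)
beta_hat = fit_beta(obs, num_relations=2)
```

In this toy log, nodes exposed only through relation 1 activate 3 times in 10 and those exposed only through relation 2 activate once in 10, so the estimates converge to about 0.3 and 0.1.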

Two Real Datasets
• DBLP
  • Computer Science
  • Relation types: APA, AP->PA, APAPA, APVPA
• APS
  • Physics
  • Relation types: APA, AP->PA, APAPA, APOPA

Topics Selected
• Select topics with increasing trends

Evaluation Methods
• Global prediction
  • How many authors are activated at t+1?
  • $\text{Error rate} = \frac{1}{2}\left(\frac{\#\text{predicted}}{\#\text{true}} + \frac{\#\text{true}}{\#\text{predicted}}\right) - 1$
• Local prediction
  • Which authors are likely to be activated at t+1?
  • AUPR (area under the precision-recall curve)
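A small sketch of both measures. The error rate follows the slide's formula: it is symmetric in the two counts and equals 0 only when they match. For AUPR, this sketch uses average precision, one common step-wise estimate of the area under the precision-recall curve; the evaluation may have used a different interpolation. Function names and numbers are illustrative.

```python
def error_rate(predicted, true):
    """Global prediction error: 0.5 * (pred/true + true/pred) - 1."""
    if predicted <= 0 or true <= 0:
        raise ValueError("counts must be positive")
    return 0.5 * (predicted / true + true / predicted) - 1.0

def average_precision(scores, labels):
    """Step-wise AUPR estimate: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

e = error_rate(150, 100)                                   # 0.5 * (1.5 + 2/3) - 1 = 1/12
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])  # (1 + 2/3) / 2 = 5/6
```

Overpredicting by 50% and underpredicting by a third give the same error, since 150/100 and 100/150 enter the formula symmetrically.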

Global Prediction

Local Prediction: AUPR
1. Different relations play different roles in the diffusion process
2. Relation-level aggregation is better than model-level aggregation

Case Study

Prediction Results on “social network” Diffusion

WIN!

• Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network (KDD’14, Gu, Sun et al.)

Background
[Figure: the United States Congress, consisting of the House and the Senate, votes on legislation (bills such as Bill 1, Bill 2, ...) that becomes federal law; politicians such as Ronald Paul (Republican) and Barack Obama (Democrat) are placed on a liberal-conservative axis.]

Legislative Voting Network
