Data Mining for Social Network Analysis

Data Mining for Social Network Analysis
Australasian Data Mining Conference (AusDM) 2007
December 3rd – 4th, 2007, Gold Coast, Australia
Jaideep Srivastava, University of Minnesota, srivasta@cs.umn.edu
Joint work with:
• Arindam Banerjee, Nishith Pathak, Sandeep Mane, Muhammad A. Ahmad, David Kuo-Wei Hsu, Young Ae Kim, University of Minnesota
• Noshir S. Contractor, Northwestern University
• Dmitri Williams, University of Southern California
• Sony Online Entertainment
Special thanks to Enron (via US DoJ)
Sponsors:
• US National Science Foundation
• US Army Research Institute
• Digital Technology Center, University of Minnesota
12/2/2007

Outline
• Introduction to Social Network Analysis (SNA)
• Computer Science and SNA
• A Detailed Case Study
  • Socio-cognitive analysis from e-mail logs
    • Modeling socio-cognitive networks
    • Analysis of a socio-cognitive network
    • Experiments with the Enron dataset
  • Extracting concealed relationships
    • An IR-inspired approach
• Other applications
  • SNA from MMORPG logs
  • Trust in social networks
  • Expert finding in social networks
  • Social networks in health care management
• Some Emerging Applications
• References

Introduction to Social Network Analysis

Social Networks
• A social network is a social structure of people, related (directly or indirectly) to each other through a common relation or interest
• Social network analysis (SNA) is the study of social networks to understand their structure and behavior

SNA in Popular Science Press
Social networks have captured the public imagination in recent years, as is evident from the number of popular-science treatments of the subject

Networks in Social Sciences
• Types of Networks (Contractor, 2006)
  • Social Networks: “who knows who”
  • Socio-Cognitive Networks: “who thinks who knows who”
  • Knowledge Networks: “who knows what”
  • Cognitive Knowledge Networks: “who thinks who knows what”

Types of Social Network Analysis
• Sociocentric (whole) network analysis
  • Emerged in sociology
  • Involves quantification of interaction among a socially well-defined group of people
  • Focuses on identifying global structural patterns
  • Most SNA research in organizations concentrates on the sociometric approach
• Egocentric (personal) network analysis
  • Emerged in anthropology and psychology
  • Involves quantification of interactions between an individual (called ego) and all other persons (called alters) related (directly or indirectly) to ego
  • Makes generalizations from features found in personal networks
  • Data are difficult to collect, so studies have been rare until now

Networks Research in Social Sciences
[Figure: network types by Perception (Cognitive) vs. Reality (Social) and by Acquaintance (links) vs. Knowledge (content): Socio-Cognitive Networks, Cognitive Knowledge Networks, Social Networks, Knowledge Networks. Fields shown: Organizational Theory, Social Psychology, Anthropology, Epidemiology, Sociology.]
• Social science networks have widespread application in various fields
• Most of the analysis techniques have come from Sociology, Statistics and Mathematics
• See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis

Computer Science and Social Network Analysis

Computer networks as social networks
• “Computer networks are inherently social networks, linking people, organizations, and knowledge” (Wellman, 2001)
• Data sources include newsgroups like USENET; instant messenger logs like AIM; e-mail messages; social networks like Orkut and Yahoo groups; weblogs like Blogger; and online gaming communities

Key Drivers for CS Research in SNA
• Computer Science has created the über-cyber-infrastructure for
  • Social Interaction
  • Knowledge Exchange
  • Knowledge Discovery
• Ability to capture
  • data about various types of social interactions
  • at a very fine granularity
  • with practically no reporting bias
• Data mining techniques can be used for building descriptive and predictive models of social interactions
• Fertile area for data mining research

A shift in approach: from ‘synthesis’ to ‘analysis’
[Figure: diagram of the shift in approach; labels not recoverable]

Data Mining for SNA Case Study: Socio-Cognitive Analysis from E-mail Logs

Modeling a Socio-Cognitive Network

Example of E-mail Communication
• A sends an e-mail to B
  • With Cc to C
  • And Bcc to D
• C forwards this e-mail to E
• From analyzing the header, we can infer
  • A and D know that A, B, C and D know about this e-mail
  • B and C know that A, B and C know about this e-mail
  • C also knows that E knows about this e-mail
  • D also knows that B and C do not know that it knows about this e-mail; and that A knows this fact
  • E knows that A, B and C exchanged this e-mail; and that neither A nor B know that it knows about it
  • and so on and so forth …
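The first-order part of this inference (who can each actor tell holds the original e-mail) can be sketched from the header fields alone. This is a minimal illustration, not the authors' implementation: the `Email` class and `known_holders` function are hypothetical names, the sketch covers only the original message (not C's forward to E), and it assumes a Bcc recipient sees no other Bcc recipients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Header fields of one message (hypothetical structure for illustration)."""
    sender: str
    to: frozenset
    cc: frozenset = frozenset()
    bcc: frozenset = frozenset()


def known_holders(email, observer):
    """Return the set of actors that `observer` can tell hold this e-mail,
    or None if `observer` never saw it."""
    visible = {email.sender} | email.to | email.cc  # fields every recipient sees
    if observer == email.sender:
        return visible | email.bcc                  # the sender also knows the Bcc list
    if observer in email.bcc:
        return visible | {observer}                 # assumed: Bcc'd actor sees only itself in Bcc
    if observer in visible:
        return set(visible)
    return None


# The slide's example: A sends to B, Cc C, Bcc D.
msg = Email("A", frozenset({"B"}), frozenset({"C"}), frozenset({"D"}))
for actor in "ABCDE":
    print(actor, known_holders(msg, actor))
```

Higher-order statements on the slide (for example, D knowing that B and C are unaware of D) would need a nested representation of beliefs, which this sketch does not attempt.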

Modeling Pair-wise Communication
• Modeling pair-wise communication between actors
  • Consider the pair of actors (A_x, A_y)
  • Communication from A_x to A_y is modeled using the Bernoulli distribution L(x,y) = [p, 1-p]
  • where p = (# of emails from A_x with A_y as recipient) / (total # of emails exchanged in the network)
• For N actors there are N(N-1) such pairs and therefore N(N-1) Bernoulli distributions
• Every email is a Bernoulli trial where success for L(x,y) is realized if A_x is the sender and A_y is a recipient
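The definition of p above can be computed directly from a log of e-mails. The following is a minimal sketch under stated assumptions: the log is a list of `(sender, recipients)` tuples, and the function name `pairwise_bernoulli` and the sample log are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def pairwise_bernoulli(emails, actors):
    """Estimate p for every ordered pair (x, y), x != y.

    p(x, y) = (# e-mails from x with y as a recipient) / (total # e-mails).
    `emails` is a list of (sender, recipients) tuples (assumed format).
    """
    total = len(emails)
    counts = Counter()
    for sender, recipients in emails:
        for y in set(recipients):        # count each recipient once per e-mail
            if y != sender:
                counts[(sender, y)] += 1
    return {
        (x, y): (counts[(x, y)] / total if total else 0.0)
        for x, y in permutations(actors, 2)
    }


# Hypothetical four-message log among three actors.
log = [("A", ["B", "C"]), ("A", ["B"]), ("B", ["A"]), ("C", ["A", "B"])]
p = pairwise_bernoulli(log, ["A", "B", "C"])
print(len(p))         # one entry per ordered pair, N(N-1)
print(p[("A", "B")])  # A wrote to B in 2 of the 4 e-mails
```

Because the denominator is the total number of e-mails in the network rather than the number sent by A_x, the values for a fixed sender do not sum to one; each ordered pair has its own independent Bernoulli parameter.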

Modeling an agent’s belief about global communication
• Based on its observations, each actor entertains certain beliefs about the communication strength between all actors in the network
• A belief about the communication expressed by L(x,y) is modeled as the Beta distribution, J(x,y), over the parameter of L(x,y)
• Thus, belief is a probability distribution over all possible communication strengths for a given ordered pair of actors (A_x, A_y)
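For reference, the standard form of a Beta distribution over the Bernoulli parameter p can be written out as follows. The shape parameters are named a and b here as a notational assumption; B(a,b) is the Beta function.

```latex
J(x,y)(p) \;=\; \frac{p^{\,a-1}\,(1-p)^{\,b-1}}{B(a,b)},
\qquad 0 \le p \le 1,\quad a, b > 0,
\qquad
\mathbb{E}[p] = \frac{a}{a+b},
\qquad
\operatorname{Var}[p] = \frac{ab}{(a+b)^{2}(a+b+1)} .
```

The mean gives a point estimate of the communication strength, and the variance shrinks as a + b grows, so a belief backed by more evidence is more sharply peaked.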

Model for Belief Update
• J_k(x,y) is the Beta distribution maintained by actor A_k regarding its belief about the communication from A_x to A_y
• a and b, the two parameters of J_k(x,y), are associated with the number of emails observed by A_k which are
  • from A_x to A_y, i.e. number of successes, and
  • from A_x not to A_y, i.e. number of failures
• Initialization
  • a and b start out with default initial values
  • Many different possibilities
  • For example, values can be chosen to be small so that they do not have much of an impact and can be “washed out” by future observations
• Belief update
  • On observing a success or failure, A_k increments a or b respectively
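The update rule above amounts to keeping two counters per belief. A minimal sketch, assuming a hypothetical `BetaBelief` class; the default of 1.0 and the small initial value 0.1 used in the demo are illustrative choices, not values taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """One actor's Beta belief J_k(x, y) about communication from A_x to A_y."""
    a: float = 1.0  # pseudo-count of successes (assumed default)
    b: float = 1.0  # pseudo-count of failures (assumed default)

    def observe(self, success: bool) -> None:
        """Increment a on a success (x wrote to y), b on a failure."""
        if success:
            self.a += 1
        else:
            self.b += 1

    def mean(self) -> float:
        """Expected communication strength under the current belief."""
        return self.a / (self.a + self.b)


# Small initial values so that observations quickly dominate the prior.
belief = BetaBelief(a=0.1, b=0.1)
for outcome in (True, True, False, True):
    belief.observe(outcome)
print(belief.a, belief.b, belief.mean())
```

With small initial values the mean after a handful of observations is already close to the observed success fraction, which is the "washing out" effect the slide describes.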

Belief State of an Actor
• Every actor maintains Beta distributions (or beliefs) for all ordered pairs of actors in the network
• Actor A_k’s belief state is defined to be the set of all N(N-1) Beta distributions (one for every Bernoulli distribution)
• We also introduce a “super-actor” in the network
  • The super-actor is an actor who observes all the communication in the network
  • Super-actor is used as the baseline for reality
  • E-mail server is the “super-actor”
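Belief states for every actor and for the super-actor can be built in one pass over the log. This is a sketch under simplifying assumptions that the slides do not state: an actor observes an e-mail only if it is the sender or a listed recipient, Bcc and forwarding are ignored, and the names `build_belief_states` and `SUPER` are hypothetical.

```python
from itertools import permutations

SUPER = "server"  # hypothetical label for the super-actor (the e-mail server)


def build_belief_states(emails, actors, a0=1.0, b0=1.0):
    """Return {observer: {(x, y): [a, b]}} for each actor plus the super-actor.

    `emails` is a list of (sender, recipients) tuples (assumed format);
    a0 and b0 are assumed initial values for the Beta parameters.
    """
    states = {
        k: {pair: [a0, b0] for pair in permutations(actors, 2)}
        for k in list(actors) + [SUPER]
    }
    for sender, recipients in emails:
        if sender not in actors:
            continue                      # skip senders outside the modeled network
        recipients = set(recipients)
        observers = ({sender} | recipients) & set(actors)
        observers.add(SUPER)              # the super-actor sees every e-mail
        for k in observers:
            for y in actors:
                if y == sender:
                    continue
                # success for (sender, y) if y is a recipient, otherwise failure
                states[k][(sender, y)][0 if y in recipients else 1] += 1
    return states


actors = ["A", "B", "C"]
log = [("A", ["B"]), ("B", ["C"]), ("A", ["B", "C"])]
states = build_belief_states(log, actors)
print(states[SUPER][("A", "B")])  # super-actor's belief, built from all three e-mails
print(states["C"][("A", "B")])    # C's belief, built only from the e-mails C saw
```

Comparing an actor's parameters with the super-actor's for the same ordered pair gives a simple view of how far that actor's perception is from the baseline for reality.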
