

SLIDE 1

CSE 6240: Web Search and Text Mining, Spring 2020

Message Passing and Node Classification

Prof. Srijan Kumar
SLIDE 2

Outline

  • Main question today: Given a network with labels on some nodes, how do we label all the other nodes?
  • Example: In a network, some nodes are fraudsters and some nodes are fully trusted. How do we find the other fraudsters and trustworthy nodes?

SLIDE 3

Intuition

  • Collective classification: the idea of assigning labels to all nodes in a network together
    – Leverage the correlations in the network!
  • We will look at three techniques today:
    – Relational classification
    – Iterative classification
    – Belief propagation

SLIDE 4

Today’s Lecture

  • Overview of collective classification
  • Relational classification
  • Iterative classification
  • Belief propagation

The lecture slides are adapted from Prof. Jure Leskovec's CS224W slides.

SLIDE 5

Correlations Exist in Networks

Example:

  • Real social network
    – Nodes = people
    – Edges = friendship
    – Node color = race
  • People are segregated by race due to homophily

(Easley and Kleinberg, 2010)

SLIDE 6

Classification with Network Data

  • How can we leverage the correlation observed in networks to help predict user attributes or interests? How do we predict the labels for the nodes in yellow?

SLIDE 7

Motivation

  • Similar entities are typically close together or directly connected:
    – "Guilt-by-association": If I am connected to a node with label X, then I am likely to have label X as well.
    – Example: malicious/benign web pages: malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines

SLIDE 8

Intuition

  • The classification label of a node O in a network may depend on:
    – Features of O
    – Labels of the objects in O's neighborhood
    – Features of the objects in O's neighborhood

SLIDE 9

Guilt-by-association

Given:

  • a graph, and
  • a few labeled nodes

Find: the class (red/green) of the remaining nodes
Assumption: the network has homophily

SLIDE 10

Guilt-By-Association

  • Let X be an n×n (weighted) adjacency matrix over n nodes
  • Let Y ∈ {−1, 0, 1}^n be a vector of labels:
    – 1: positive node (known to be involved in a gene function/biological process)
    – −1: negative node
    – 0: unlabeled node
  • Goal: predict which unlabeled nodes are likely positive

SLIDE 11

Collective Classification

  • Intuition: simultaneous classification of interlinked objects using correlations
  • Several applications:
    – Document classification
    – Part-of-speech tagging
    – Link prediction
    – Optical character recognition
    – Image/3D data segmentation
    – Entity resolution in sensor networks
    – Spam and fraud detection

SLIDE 12

Collective Classification Overview

  • Markov Assumption: the label Yi of a node i depends on the labels of its neighbors Ni:

    P(Yi | i) = P(Yi | Ni)

  • Collective classification involves 3 steps:
    – Local Classifier: assigns initial labels
    – Relational Classifier: captures correlations between nodes
    – Collective Inference: propagates correlations through the network
SLIDE 13

Collective Classification Overview

Local Classifier: assign initial labels

  • Predicts labels based on node attributes/features
  • Classical classification setup
  • Does not use network information

Relational Classifier: capture correlations between nodes

  • Learns a classifier that labels a node from the labels and/or attributes of its neighbors
  • Network information is used

Collective Inference: propagate correlations through the network

  • Apply the relational classifier to each node iteratively
  • Iterate until the inconsistency between neighboring labels is minimized
  • Network structure substantially affects the final prediction

SLIDE 14

Today’s Lecture

  • Overview of collective classification
  • Relational classification
  • Iterative classification
  • Belief propagation
SLIDE 15

Problem Setting

  • How do we predict the labels Yi for the nodes i in yellow?
    – Each node i has a feature vector fi
    – Labels for some nodes are given (+ for green, − for blue)
  • Task: find P(Yi) given the network and the features

SLIDE 16

Probabilistic Relational Classifier

  • Basic idea: the class probability of Yi is a weighted average of the class probabilities of i's neighbors
  • For labeled nodes, initialize with the ground-truth Y labels
  • For unlabeled nodes, initialize Y uniformly
  • Update all nodes in random order until convergence or until the maximum number of iterations is reached

SLIDE 17

Probabilistic Relational Classifier

  • Repeat for each node i and label c (see the code sketch below):

    P(Yi = c) = (1 / |Ni|) · Σ_{j ∈ Ni} W(i,j) · P(Yj = c)

    – W(i,j) is the edge strength from i to j
    – |Ni| is the number of neighbors of i
  • Challenges:
    – Convergence is not guaranteed
    – The model cannot use node feature information
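
To make the update rule concrete, here is a minimal Python sketch of the relational classifier (my illustration, not code from the slides; it assumes an unweighted graph, so W(i,j) = 1, and a fixed node order instead of the random order suggested above):

```python
from collections import defaultdict

def relational_classify(edges, labels, max_iter=100, tol=1e-4):
    """Probabilistic relational classifier on an undirected, unweighted graph.

    edges:  iterable of (i, j) pairs
    labels: dict node -> 0 or 1 for labeled nodes (unlabeled nodes absent)
    Returns a dict node -> P(Y = 1).
    """
    nbrs = defaultdict(set)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    # Initialization: ground truth for labeled nodes, 0.5 for unlabeled ones.
    p = {v: float(labels[v]) if v in labels else 0.5 for v in nbrs}

    for _ in range(max_iter):
        delta = 0.0
        for v in nbrs:
            if v in labels:              # labeled nodes stay clamped
                continue
            new = sum(p[u] for u in nbrs[v]) / len(nbrs[v])
            delta = max(delta, abs(new - p[v]))
            p[v] = new
        if delta < tol:                  # stop early if scores have stabilized
            break
    return p
```

Because updates are applied in place, each node sees its neighbors' freshest values, which is how the worked example on the next slides computes node 4 using node 3's already-updated score.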

SLIDE 18

Example

Initialization: set all labeled nodes to their labels and all unlabeled nodes uniformly.

[Figure: labeled negative nodes 1 and 2 start at P(Y = 1) = 0, labeled positive nodes 6 and 7 at P(Y = 1) = 1, and the unlabeled nodes at P(Y = 1) = 0.5]

SLIDE 19

Example

  • Update in the 1st iteration:
    – For node 3, N3 = {1, 2, 4}:

      P(Y = 1 | N3) = (1/3)(0 + 0 + 0.5) = 0.17

SLIDE 20

Example

  • Update in the 1st iteration:
    – For node 4, N4 = {1, 3, 5, 6}:

      P(Y = 1 | N4) = (1/4)(0 + 0.17 + 0.5 + 1) = 0.42

SLIDE 21

Example

  • Update in the 1st iteration:
    – For node 5, N5 = {4, 6, 7, 8}:

      P(Y = 1 | N5) = (1/4)(0.42 + 1 + 1 + 0.5) = 0.73

SLIDE 22

Example

After iteration 1: P(Y = 1) is 0 for nodes 1 and 2, 0.17 for node 3, 0.42 for node 4, 0.73 for node 5, 0.91 for node 8, and 1.00 for node 9.

SLIDE 23

Example

After iteration 2: P(Y = 1) is 0.14 for node 3, 0.47 for node 4, 0.85 for node 5, 0.95 for node 8, and 1.00 for node 9.

Note: a node all of whose neighbors' values are fixed cannot change its own value.

SLIDE 24

Example

After iteration 3: P(Y = 1) is 0.16 for node 3, 0.50 for node 4, 0.86 for node 5, 0.95 for node 8, and 1.00 for node 9.

SLIDE 25

Example

After iteration 4: P(Y = 1) is 0.16 for node 3, 0.51 for node 4, 0.86 for node 5, 0.95 for node 8, and 1.00 for node 9.

SLIDE 26

Example

  • All scores stabilize after 5 iterations
  • Final labeling:
    – Nodes 5, 8, 9 are + (P(Yi = 1) > 0.5)
    – Node 3 is − (P(Yi = 1) < 0.5)
    – Node 4 is undecided (P(Yi = 1) = 0.5)

SLIDE 27

Today’s Lecture

  • Overview of collective classification
  • Relational classification
  • Iterative classification
  • Belief propagation
SLIDE 28

Iterative Classification

  • Relational classifiers do not use node attributes
    – How can one leverage them?
  • Main idea of iterative classification: classify node i based on its attributes as well as the labels of its neighbor set Ni

SLIDE 29

Iterative Classification: Process

  1. Create a feature vector ai for each node i
  2. Train a classifier to classify using ai
  3. Nodes may have varying numbers of neighbors, so aggregate neighbor information using: count, mode, proportion, mean, exists, etc.

SLIDE 30

Basic Architecture

  • Bootstrap phase
    – Convert each node i to a flat feature vector ai
    – Use a local classifier f(ai) (e.g., SVM, kNN, ...) to compute the best value for Yi
  • Iteration phase: iterate till convergence (see the sketch below)
    – Repeat for each node i:
      • Update the node vector ai
      • Update the label Yi to f(ai); this is a hard assignment
    – Iterate until class labels stabilize or the maximum number of iterations is reached
  • Note: convergence is not guaranteed
    – Run for a maximum number of iterations
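
A minimal Python sketch of this architecture (my illustration, not code from the lecture; it assumes scikit-learn's LogisticRegression as the local classifier f, the proportion of positively labeled neighbors as the aggregate, and that both classes appear among the labeled nodes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # stand-in local classifier

def node_vector(feats, labels, nbrs, i):
    """Flat vector a_i: node attributes plus one neighbor-label aggregate
    (here, the proportion of positively labeled neighbors)."""
    known = [labels[j] for j in nbrs[i] if labels[j] is not None]
    frac_pos = sum(known) / len(known) if known else 0.5
    return np.append(feats[i], frac_pos)

def iterative_classification(feats, labels, nbrs, max_iter=10):
    labels = dict(labels)                  # node -> 0/1, or None if unlabeled
    train = [i for i in labels if labels[i] is not None]
    unlabeled = [i for i in labels if labels[i] is None]

    # Bootstrap phase: train the local classifier f on labeled nodes,
    # then assign an initial hard label to every unlabeled node.
    clf = LogisticRegression()
    clf.fit([node_vector(feats, labels, nbrs, i) for i in train],
            [labels[i] for i in train])
    for i in unlabeled:
        labels[i] = int(clf.predict([node_vector(feats, labels, nbrs, i)])[0])

    # Iteration phase: re-extract vectors and re-classify until labels
    # stabilize or the iteration budget runs out (convergence not guaranteed).
    for _ in range(max_iter):
        changed = False
        for i in unlabeled:
            y = int(clf.predict([node_vector(feats, labels, nbrs, i)])[0])
            if y != labels[i]:
                labels[i], changed = y, True
        if not changed:
            break
    return labels
```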

SLIDE 31

Application of Iterative Classification Framework: Fake Reviewer/Review Detection

REV2: Fraudulent User Prediction in Rating Platforms. Kumar et al., ACM International Conference on Web Search and Data Mining (WSDM), 2018

SLIDE 32

Fake Review Spam

  • Review sites are an attractive target for spam: a one-star increase in rating increases revenue by 5-9%!
  • Often hype/defame spam
  • Paid spammers
SLIDE 33

Fake Review Spam Detection

  • Behavioral analysis
    – individual features, geographic locations, login times, session history, etc.
  • Language analysis
    – use of superlatives, frequent self-references, misspelling rate, many agreement words, ...
  • Behavior and language are easy to fake!
  • Graph structure is hard to fake
    – Graphs capture the relationships between reviewers, reviews, and stores

SLIDE 34
Problem Setup

  • Input: a bipartite rating graph as a weighted signed network:
    – Nodes: users and products
    – Edges: rating scores between −1 and +1
  • Output: the set of users that give fake ratings

[Figure: example rating graph; red edges = −1 rating, green edges = +1 rating]

SLIDE 35

REV2 Solution Formulation

  • Basic idea: users, products, and ratings have intrinsic quality scores:
    – Each user u has a 'fairness' score F(u) ∈ [0, 1]
    – Each product p has a 'goodness' score G(p) ∈ [−1, 1]
    – Each rating has a 'reliability' score R(u, p) ∈ [0, 1]
  • All values are unknown

SLIDE 36

REV2 Solution Formulation

  • How can one calculate the values for all nodes and edges simultaneously?
  • Solution: collective classification

SLIDE 37

Fairness of Users

  • Fixing goodness and reliability, fairness is updated as the average reliability of the user's ratings (formula as in the REV2 paper):

    F(u) = ( Σ_{(u,p) ∈ Out(u)} R(u,p) ) / |Out(u)|

SLIDE 38

Goodness of Products

  • Fixing fairness and reliability, goodness is updated as the reliability-weighted average rating of the product (formula as in the REV2 paper):

    G(p) = ( Σ_{(u,p) ∈ In(p)} R(u,p) · score(u,p) ) / |In(p)|

SLIDE 39

Reliability of Ratings

  • Fixing fairness and goodness, reliability is updated as a blend of the rater's fairness and the rating's agreement with the product's goodness (formula as in the REV2 paper; a sketch combining all three updates follows below):

    R(u,p) = ( γ1 · F(u) + γ2 · (1 − |score(u,p) − G(p)| / 2) ) / (γ1 + γ2)
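
Putting the three update rules together, here is a minimal sketch of the alternating REV2 iteration (my illustration based on the formulas above; checking convergence on fairness alone is a simplification):

```python
from collections import defaultdict

def rev2(ratings, gamma1=1.0, gamma2=1.0, max_iter=100, tol=1e-6):
    """Alternating REV2 updates on a bipartite user-product rating graph.

    ratings: dict mapping (user, product) -> rating score in [-1, 1]
    Returns (F, G, R): fairness, goodness, and reliability scores.
    """
    by_user, by_product = defaultdict(list), defaultdict(list)
    for (u, p) in ratings:
        by_user[u].append(p)
        by_product[p].append(u)

    # Initialization: start every score at its best possible value.
    F = {u: 1.0 for u in by_user}
    G = {p: 1.0 for p in by_product}
    R = {e: 1.0 for e in ratings}

    for _ in range(max_iter):
        # Goodness: reliability-weighted average rating of each product.
        for p in by_product:
            G[p] = sum(R[(u, p)] * ratings[(u, p)]
                       for u in by_product[p]) / len(by_product[p])
        # Reliability: blend the rater's fairness with how well the rating
        # agrees with the product's goodness.
        for (u, p) in ratings:
            agree = 1 - abs(ratings[(u, p)] - G[p]) / 2
            R[(u, p)] = (gamma1 * F[u] + gamma2 * agree) / (gamma1 + gamma2)
        # Fairness: average reliability of each user's ratings.
        delta = 0.0
        for u in by_user:
            new = sum(R[(u, p)] for p in by_user[u]) / len(by_user[u])
            delta = max(delta, abs(new - F[u]))
            F[u] = new
        if delta < tol:
            break
    return F, G, R
```

With both gamma values set to 1 (as on the following slides), a rating that agrees with a product's goodness of 0.67 gets R = (1 + 0.835)/2 ≈ 0.92, matching the iteration shown below.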

SLIDE 40

Initialization: Start with Best Scores

[Figure: all scores start at their best values: F(u) = 1 for every user, G(p) = 1 for every product, R(u,p) = 1 for every rating]

SLIDE 41

Updating Goodness, Iteration 1

[Figure: with all F(u) = 1 and all R(r) = 1, the three products' goodness scores become G(p) = 0.67, 0.67, and −0.67]

SLIDE 42

Updating Reliability, Iteration 1

[Figure: with both gamma values set to 1, ratings that agree with their product's goodness get R(r) = 0.92 and ratings that disagree get R(r) = 0.58; goodness stays at 0.67, 0.67, and −0.67]

SLIDE 43

Update Fairness, Iteration 1

[Figure: each user's fairness becomes the average reliability of their ratings: F(u) = 0.92 for the users whose ratings agree with the products' goodness, and F(u) = 0.58 for the disagreeing user]

SLIDE 44

After Convergence

[Figure: at convergence, F(u) = 0.83 for the fair users and 0.17 for the unfair one; R(r) = 0.83 for reliable ratings and 0.17 for unreliable ones; G(p) = 0.67, 0.67, and −0.67]

SLIDE 45

Properties of REV2 Solution

  • Guaranteed to converge
  • The number of iterations until convergence is upper-bounded
  • Time complexity: linear
SLIDE 46

Performance

  • Low-fairness users = fraudsters
  • 127 of the 150 lowest-fairness users on Flipkart were real fraudsters
  • REV2 is being used in production at Flipkart

SLIDE 47

Linear Scalability

  • Multiple iterations, but linear scalability
SLIDE 48

Today’s Lecture

  • Overview of collective classification
  • Relational classification
  • Iterative classification
  • Belief propagation
SLIDE 49

Loopy belief propagation

  • Intuition: use the neighbors' beliefs about a node to predict the node's label
    – Used to estimate the marginals (beliefs) or the most likely states of all variables (nodes)
  • Iterative process in which neighboring variables "talk" to each other by passing messages
  • When consensus is reached, calculate the final belief
SLIDE 50

Message Passing Basics

Task: count the number of nodes in a graph*
Condition: each node can only interact (pass messages) with its neighbors
Example: a straight-line graph

(adapted from the MacKay (2003) textbook)

* The graph cannot have loops; explanation later.

SLIDE 51

Message Passing Basics

Task: count the number of nodes in a graph
Condition: each node can pass messages to its neighbors
Solution: each node listens to the message from its neighbor, updates it, and passes it forward

[Figure: on a line graph, forward messages read "1 before you", "2 before you", ..., "5 before you"; backward messages read "1 after you", "2 after you", ..., "6 after you"]

SLIDE 52

Message Passing Basics

Each node only sees its incoming messages.

[Figure: a node hears "2 before you" from one side and "3 behind you" from the other, and knows there is 1 of itself. Belief: there must be 2 + 1 + 3 = 6 of us.]

SLIDE 53

Message Passing Basics

Each node only sees its incoming messages.

[Figure: a neighboring node hears "1 before you" and "4 behind you", and knows there is 1 of itself. Belief: there must be 1 + 1 + 4 = 6 of us, the same total as the node that computed 2 + 1 + 3 = 6.]

SLIDE 54

Message Passing in a Tree

Each node receives reports from all branches of the tree.

[Figure: a node hearing "7 here" and "3 here" from two subtrees, plus 1 of itself, reports "11 here" (= 7 + 3 + 1)]

SLIDE 55

Message Passing in a Tree

Each node receives reports from all branches of the tree.

[Figure: a node hearing "3 here" from two subtrees, plus 1 of itself, reports "7 here" (= 3 + 3 + 1)]

SLIDE 56

Message Passing in a Tree

Each node receives reports from all branches of the tree.

[Figure: messages "7 here" and "3 here" again combine into "11 here" (= 7 + 3 + 1)]

SLIDE 57

Message Passing in a Tree

Each node receives reports from all branches of the tree.

[Figure: a node hearing "7 here", "3 here", and "3 here", plus 1 of itself, believes there must be 14 of us]
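
This counting scheme translates directly into code; here is a minimal recursive sketch for trees (my illustration):

```python
def count_nodes(nbrs, root):
    """Count nodes in a tree by message passing: each node reports to its
    parent 1 (itself) plus the reports from all of its other branches."""
    def report(node, parent):
        return 1 + sum(report(child, node)
                       for child in nbrs[node] if child != parent)
    return report(root, None)

# Toy tree: node 0 with two branches.
nbrs = {0: {1, 2}, 1: {0, 3, 4}, 2: {0}, 3: {1}, 4: {1}}
print(count_nodes(nbrs, 0))   # 5
```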


SLIDE 59

Loopy BP algorithm

What message will i send to j?

  • It depends on what i hears from its neighbors k
  • Each neighbor k passes a message to i: k's belief about the state of i

SLIDE 60

Notations

  • Label-label potential matrix ψ: the dependency between a node and its neighbor. ψ(Yi, Yj) equals the probability of a node i being in state Yi given that it has a neighbor j in state Yj
  • Prior belief φ: φi(Yi) is the probability of node i being in state Yi
  • m_i→j(Yj) is i's estimate of j being in state Yj
  • L is the set of all states
SLIDE 61

Loopy BP algorithm

  1. Initialize all messages to 1
  2. Repeat for each node:

     m_i→j(Yj) = Σ_{Yi ∈ L} ψ(Yi, Yj) · φi(Yi) · Π_{k ∈ Ni \ j} m_k→i(Yi)

     (sum over all states; label-label potential × prior × all messages from i's neighbors other than j)

SLIDE 62

Loopy BP algorithm

After convergence:

  bi(Yi) = φi(Yi) · Π_{k ∈ Ni} m_k→i(Yi)

bi(Yi) is i's belief of being in state Yi: the prior times all messages from its neighbors.
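
Here is a minimal Python sketch of the whole loop (my illustration; it uses synchronous message updates and normalizes messages each round for numerical stability):

```python
import numpy as np

def loopy_bp(nbrs, psi, phi, max_iter=50):
    """Loopy belief propagation with discrete states.

    nbrs: dict node -> set of neighbors (undirected graph)
    psi:  (S x S) numpy array; psi[yi, yj] is the label-label potential
    phi:  dict node -> length-S numpy array of prior beliefs
    Returns dict node -> normalized belief vector.
    """
    S = psi.shape[0]
    # 1. Initialize all messages m_{i->j} to 1.
    m = {(i, j): np.ones(S) for i in nbrs for j in nbrs[i]}

    # 2. Repeat: m_{i->j}(yj) = sum_{yi} psi[yi, yj] * phi_i(yi)
    #    * product of messages into i from all neighbors except j.
    for _ in range(max_iter):
        new_m = {}
        for (i, j) in m:
            prod = phi[i].copy()
            for k in nbrs[i]:
                if k != j:
                    prod = prod * m[(k, i)]
            msg = psi.T @ prod               # sums over yi for each yj
            new_m[(i, j)] = msg / msg.sum()  # normalize for stability
        m = new_m

    # After convergence: b_i(yi) = phi_i(yi) * product of incoming messages.
    beliefs = {}
    for i in nbrs:
        b = phi[i].copy()
        for k in nbrs[i]:
            b = b * m[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```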

SLIDE 63

Loopy belief propagation

  • What if our graph has cycles?
    – Messages from different subgraphs are no longer independent!
    – BP will give wrong results

SLIDE 64

BP and Loops

[Figure: messages circulating around a cycle, with belief tables growing from T: 2, F: 1 to T: 4, F: 1]

  • Messages loop around and around: 2, 4, 8, 16, 32, ... The nodes become more and more convinced that these variables are T!
  • BP incorrectly treats a looping message as separate evidence that the variable is T
  • It multiplies the two messages as if they were independent
  • But they do not actually come from independent parts of the graph: one influenced the other (via a cycle)
SLIDE 65

Advantages of Belief Propagation

  • Advantages:
    – Easy to program and parallelize
    – General: can apply to any graphical model with any form of potentials (higher order than pairwise)
  • Challenges:
    – Convergence is not guaranteed (when to stop?), especially with many closed loops
    – Potential functions (parameters) require training to estimate, and learning them by gradient-based optimization can have convergence issues during training

SLIDE 66

Application of belief propagation: Online auction fraud

Netprobe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Pandit et al., World Wide Web Conference, 2007

SLIDE 67

Online Auction Fraud

  • Auction sites: an attractive target for fraud
  • 63% of complaints to the Federal Internet Crime Complaint Center in the U.S. in 2006
  • Average loss per incident: $385
SLIDE 68

Online Auction Fraud Detection

  • Looking at individual features is an insufficient solution: user attributes, geographic locations, login times, session history, etc.
  • Graph structure is hard to fake
    – It captures the relationships between users
  • Main question: how do fraudsters interact with other users and among each other?
    – In addition to buy/sell relations, are there more complex relations?

SLIDE 69

Feedback Mechanism

  • Each user has a reputation score
  • Users rate each other via feedback
  • Question: how do fraudsters game the feedback system?

SLIDE 70

Auction “Roles” of Users

  • Do they boost each other's reputation?
    – No, because if one is caught, all will be caught
  • Instead, they form near-bipartite cores (2 roles):
    – Accomplice: trades with honest users, looks legitimate
    – Fraudster: trades with accomplices, commits fraud with honest users

SLIDE 71

Detecting auction fraud

  • How do we find near-bipartite cores? How do we find the roles (honest, accomplice, fraudster)?
    – Use belief propagation!
  • How do we set the BP parameters (potentials)?
    – Prior beliefs: from prior knowledge; unbiased if none
    – Compatibility potentials: by insight (see the toy sketch below)
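
As a toy illustration of setting potentials by insight, the near-bipartite-core intuition can be encoded in the label-label potential matrix and fed to the loopy_bp sketch from the earlier slide. The numbers here are hypothetical, not the values from the Netprobe paper:

```python
import numpy as np

# States: 0 = fraudster, 1 = accomplice, 2 = honest.
# Hypothetical potentials: fraudsters trade mostly with accomplices,
# accomplices trade with both fraudsters and honest users,
# honest users trade mostly with other honest users.
psi = np.array([
    [0.05, 0.80, 0.15],
    [0.40, 0.20, 0.40],
    [0.05, 0.25, 0.70],
])

# A tiny toy graph: users 0 and 1 trade with user 2, who trades with 3 and 4.
nbrs = {0: {2}, 1: {2}, 2: {0, 1, 3, 4}, 3: {2}, 4: {2}}

# Unbiased priors everywhere, except node 0 is suspected to be a fraudster.
phi = {v: np.ones(3) / 3 for v in nbrs}
phi[0] = np.array([0.90, 0.05, 0.05])

beliefs = loopy_bp(nbrs, psi, phi)   # sketch from the Loopy BP slide
print(beliefs[2])                    # node 2's estimated role probabilities
```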

SLIDE 72

Belief propagation in action

Initialize all nodes as unbiased

SLIDE 73

Belief propagation in action

Initialize all nodes as unbiased. At each iteration, for each node, compute messages to its neighbors.

SLIDE 74

Belief propagation in action

Initialize all nodes as unbiased. At each iteration, for each node, compute messages to its neighbors. Continue till convergence.

SLIDE 75

Final belief scores = final roles

[Figure: each node's final beliefs give P(fraudster), P(accomplice), P(honest)]

SLIDE 76

Today’s Lecture

  • Overview of collective classification
  • Relational classification
    – Weighted average of neighborhood properties
    – Cannot use node attributes while labeling
  • Iterative classification
    – Uses node features while labeling
  • Belief propagation
    – Message passing to update each node's belief of itself based on its neighbors' beliefs