Two Step Graph-based Semi-supervised Learning for Online Auction Fraud Detection
Phiradet Bangcharoensap¹, Hayato Kobayashi², Nobuyuki Shimizu², Satoshi Yamauchi², and Tsuyoshi Murata¹
¹ Tokyo Institute of Technology, ² Yahoo Japan Corporation
[Figure: a fraudster on an online auction website with two IDs, ID1 and ID2, linked to a product by Sell and Bid.]
[Figure: overview of the method]
1) Graph Construction: auction transactions -> unlabeled graph
2) Initial Label Assignment: blacklisted IDs and whitelisted IDs -> soft labels matrix
3) Modified Adsorption
4) Fraud Scoring: ranking from most suspicious to least suspicious
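Product  Seller  Bidder
P1       A       B
P1       A       C
P2       A       C
P3       B       A
P3       B       C
P3       B       C
P3       B       C

Edge weights (number of distinct products on which the two users transacted):
w(A, B) = |{P1, P3}| = 2
w(B, C) = |{P3}| = 1
w(A, C) = |{P1, P2}| = 2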
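The edge weights in this example can be reproduced with a short sketch. It assumes that the weight of an edge is the number of distinct products on which two users met as seller and bidder, which matches the three set sizes shown on the slide; the function name `build_weighted_graph` and the choice to ignore self-loops are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Transactions from the example table: (product, seller, bidder).
transactions = [
    ("P1", "A", "B"),
    ("P1", "A", "C"),
    ("P2", "A", "C"),
    ("P3", "B", "A"),
    ("P3", "B", "C"),
    ("P3", "B", "C"),
    ("P3", "B", "C"),
]


def build_weighted_graph(rows):
    """Return {frozenset({u, v}): weight} for an undirected user graph.

    Assumption: weight = number of distinct products on which u and v
    appear together as seller and bidder (in either direction).
    """
    products = defaultdict(set)
    for product, seller, bidder in rows:
        if seller != bidder:  # assumption: a user bidding on own item is skipped
            products[frozenset((seller, bidder))].add(product)
    return {pair: len(items) for pair, items in products.items()}


weights = build_weighted_graph(transactions)
for pair in sorted(weights, key=sorted):
    print(sorted(pair), weights[pair])
```

Repeated rows such as the three `P3 B C` bids count once, because products are collected in a set.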
Node: an instance that we want to classify. Edge: similarity between instances.
Assign each node a score indicating its likelihood of having each label (soft labels).
Node types: blacklisted node, whitelisted node, unlabeled node.
Soft labels matrix: |Nodes| x (|Possible Labels| + 1). The extra column is a dummy label, meaning not enough information.
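The shape of the soft labels matrix can be illustrated with a minimal sketch. The label names, the seed score of 1.0, and the all-zero rows for unlabeled nodes are assumptions made for illustration; the slides only state that the matrix has one row per node and one column per possible label plus a dummy label.

```python
# Hypothetical label names; the slides only say "possible labels".
LABELS = ["fraud", "legitimate"]
DUMMY = len(LABELS)  # index of the extra dummy column


def initial_soft_labels(nodes, blacklist, whitelist):
    """Build a |Nodes| x (|Possible Labels| + 1) matrix as a list of rows.

    Assumptions: blacklisted nodes get score 1.0 for "fraud", whitelisted
    nodes get 1.0 for "legitimate", and unlabeled nodes start at all zeros.
    The dummy column starts at zero for every node.
    """
    matrix = []
    for node in nodes:
        row = [0.0] * (len(LABELS) + 1)
        if node in blacklist:
            row[LABELS.index("fraud")] = 1.0
        elif node in whitelist:
            row[LABELS.index("legitimate")] = 1.0
        matrix.append(row)
    return matrix


nodes = ["A", "B", "C"]
Y = initial_soft_labels(nodes, blacklist={"A"}, whitelist={"B"})
for node, row in zip(nodes, Y):
    print(node, row)
```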
[Equation not recovered. Annotations: weighted degree of vertex v; neighbors of vertex v.]
Fraud Scoring
[Figure: MAD; dimension label |Nodes|.]
2-STEP
[Equation not recovered. Annotations: neighbors of vertex v; weight of an edge (u, v).]
[Figure: MAD; dimension label |Nodes|.]
¹ auctions.yahoo.co.jp/
[Figure panels: Seller, Bidder, Mixed, All]
Compare with
1) Weighted degree centrality (WDC)
2) Eigenvector centrality (Eigen. C.)
[Figure label: 2-STEP method]
Unsupervised methods yield poor results. Fraudulent sellers are more difficult to detect.
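The two unsupervised baselines named above can be sketched on a small weighted graph. This is an illustration under stated assumptions, not the evaluation code behind the slides: the edge weights are those of the three-user example (A, B, C), eigenvector centrality is computed by plain power iteration, and the function names and the iteration count are hypothetical.

```python
# Edge weights of the three-user example graph (undirected).
weights = {
    frozenset(("A", "B")): 2,
    frozenset(("B", "C")): 1,
    frozenset(("A", "C")): 2,
}


def weighted_degree(edge_weights):
    """Weighted degree centrality: sum of the weights of incident edges."""
    degree = {}
    for pair, w in edge_weights.items():
        for node in pair:
            degree[node] = degree.get(node, 0) + w
    return degree


def eigenvector_centrality(edge_weights, iterations=200):
    """Eigenvector centrality via power iteration on the weighted adjacency.

    Assumption: the graph is connected and not bipartite, so the
    iteration converges; each step is rescaled to unit Euclidean norm.
    """
    nodes = sorted({n for pair in edge_weights for n in pair})
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for pair, w in edge_weights.items():
            u, v = tuple(pair)
            nxt[u] += w * score[v]
            nxt[v] += w * score[u]
        norm = sum(x * x for x in nxt.values()) ** 0.5
        score = {n: x / norm for n, x in nxt.items()}
    return score


wdc = weighted_degree(weights)
eig = eigenvector_centrality(weights)
# Rank users from highest to lowest score under each baseline.
print(sorted(wdc, key=wdc.get, reverse=True))
print(sorted(eig, key=eig.get, reverse=True))
```

Neither baseline uses the blacklisted or whitelisted IDs; both score a user only from the graph structure.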
[Figure panels: All, Bidder, Mixed, Seller]
[Figure panels: calculated from top 100; calculated from top 500. Labels in each panel: All, Sybil.]
[Figure: homogeneous network; heterogeneous network]