SLIDE 1

Finding Camouflaged Needle in a Haystack? Pornographic Products Detection via Berrypicking Tree Model

Guoxiu He (1,2), Yangyang Kang (2), Zhe Gao (2), Zhuoren Jiang (3), Changlong Sun (2), Xiaozhong Liu (*4,2), Wei Lu (*1), Qiong Zhang (2), Luo Si (2)

1 Wuhan University: {guoxiu.he, weilu}@whu.edu.cn
2 Alibaba Group: {yangyang.kangyy, gaozhe.gz, qz.zhang, luo.si}@alibaba-inc.com, changlong.scl@taobao.com
3 Sun Yat-sen University: jiangzhr3@mail.sysu.edu.cn
4 Indiana University Bloomington: liu237@indiana.edu

SIGIR 2019

SLIDE 2

Introduction

  • Background:

Over the past decade, decentralized eCommerce services such as eBay, eBid, and Taobao have been challenging traditional monopolistic intermediaries. Through these eCommerce ecosystems, anyone can easily become an e-merchant, and the platforms give sellers extra incentives in the form of convenient marketing and buyer-access channels and resources.

  • Problem:

Because most decentralized eCommerce platforms do not hold their own inventory, illegal products uploaded by problematic sellers can spread more easily than ever before. This risk can be quite harmful to both buyers and cybermarkets.

SLIDE 3

Detection System in an eCommerce Service

  • Challenges:

More specifically, although almost every eCommerce service has its own detection system, this strategy does not work well online because sellers can easily circumvent the detection system.

[Figure: workflow of the detection system in an eCommerce service. Labels: seller, audit algorithm, product, query, buyer, eCommerce service, upload, pass, reject, submit, return, result list, purchase, report.]

SLIDE 4

Pornographic Products Detection Dataset (PPDD)

  • Strategy:

Locate target (purchased) products, then extract each buyer's 2-hour seeking behavior logs before the purchase.

Collection periods shown on the slide:
From Aug. 1, 2016 to Sep. 1, 2018
From Nov. 3, 2018 to Nov. 16, 2018
From Dec. 3, 2018 to Dec. 16, 2018

Source labels shown on the slide: Accumulation in History; Online Recalled Pool.
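The extraction strategy above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `LogRecord` schema and the `extract_session` helper are hypothetical names, and only the 2-hour window before the purchase comes from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class LogRecord:
    # Hypothetical log schema; the real PPDD fields may differ.
    buyer_id: str
    timestamp: datetime
    action: str   # e.g. "query", "click", "purchase"
    content: str  # query text or product title


def extract_session(logs: List[LogRecord], purchase: LogRecord,
                    window: timedelta = timedelta(hours=2)) -> List[LogRecord]:
    """Return the buyer's records from the window before the purchase, oldest first."""
    start = purchase.timestamp - window
    session = [
        r for r in logs
        if r.buyer_id == purchase.buyer_id
        and start <= r.timestamp < purchase.timestamp
    ]
    return sorted(session, key=lambda r: r.timestamp)
```

Records from other buyers, and records older than the window, are dropped; the purchase record itself is excluded because the window ends strictly before it.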

  • Simple Idea:

With the local training dataset, pornographic product detection can be a straightforward binary classification problem.

  • Brutal Reality:

When the current learning algorithm finds that a seller is listing a pornographic product, the seller can easily change the product title or description and release it again with a new seller/product ID. Pornographic products and their sellers therefore hide like chameleons in the eCommerce ecosystem, and traditional learning algorithms can hardly detect them effectively.

https://github.com/GuoxiuHe/BIRD

SLIDE 5

Product Content Statistics in PPDD

  • Question:

Is it true that sellers often change the product content?

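[Figure: distribution plot (x-axis: product content word, 20 to 100; y-axis: proportion, 0.00 to 0.08) for four groups: local pornography, local normal, online pornography, online normal. Annotations: "Significant difference" and "Hard to distinguish".]

           LN-LP    ON-OP    LN-ON    LP-OP
product    2.4515   0.3317   0.0802   2.6292

(LN: local normal, LP: local pornography, ON: online normal, OP: online pornography.)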
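A comparison of this kind between two groups of product content can be sketched as below. The slide does not name the statistic behind the pairwise values, so the KL divergence over smoothed word distributions used here is purely an assumed stand-in, and `word_distribution` and `kl_divergence` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def word_distribution(titles: Iterable[List[str]], vocab: List[str],
                      alpha: float = 1.0) -> Dict[str, float]:
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(word for title in titles for word in title)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q); an assumed metric, not necessarily the one on the slide."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)
```

Under this stand-in, a value near zero means the two groups use words in nearly the same proportions, while a larger value means the groups are easier to tell apart by content alone.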

SLIDE 6

Performance Comparison with Text Classification Baselines

  • Question:

Do text classifiers work online?

SLIDE 7

Interaction Statistics between Buyers and Products

  • Opportunity:

Employing camouflaged content can be a double-edged sword.

  • Example:

When buyers search Taobao for ‘porn video USB’, which is an illegal query, they get no results.

  • Berrypicking (Marcia Bates, 1989):

In order to locate what they are looking for, buyers will have to update the query content a few times and also check/consume the retrieved products carefully.

[Figure: (a) numbers of queries in each session and (b) numbers of records in each session, each comparing local pornography with local normal; (c) number of times buyers purchase products (x-axis: 1 to 7; y-axis: proportion) for local pornography.]

SLIDE 8

Performance Comparison with Different Feature Combinations

  • Question:

Is simple behavioral information sufficient to distinguish pornographic products?

SLIDE 9

Berrypicking Tree

[Figure: a berrypicking tree. Labels: buyer (root), query, product, target, branch1, branch2, branch3, branch4.]
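One way to hold such a tree in code is sketched below, assuming (from the figure labels) that the buyer is the root, each query opens a branch, and the products examined after that query hang off the branch. The class and function names are hypothetical and are not taken from the authors' repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Branch:
    query: str
    products: List[str] = field(default_factory=list)


@dataclass
class BerrypickingTree:
    buyer_id: str                      # the root of the tree
    branches: List[Branch] = field(default_factory=list)
    target: str = ""                   # the purchased product


def build_tree(buyer_id: str, events: List[Tuple[str, str]],
               target: str) -> BerrypickingTree:
    """Build a tree from ordered (kind, text) events, kind in {"query", "product"}."""
    tree = BerrypickingTree(buyer_id=buyer_id, target=target)
    for kind, text in events:
        if kind == "query":
            tree.branches.append(Branch(query=text))
        elif kind == "product" and tree.branches:
            # Products seen before any query have no branch and are skipped.
            tree.branches[-1].products.append(text)
    return tree
```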

SLIDE 10

BerryPIcking TRee MoDel (BIRD): Branch Representation

[Figure: BIRD architecture diagram. Labels: root; branch1 to branch4; queries q1 to q4; products p11, p12, p13, p21, p41, p42, p43; attention weights a11, a12, a13, a21, a41, a42, a43 and a1, a2, a4; pooled products p1, p2, p4; combine gate; branch vectors x1 to x4; prune mechanism (pm); bi-BPTRU; hidden semantics; forward and backward latent intent; average pooling; mlp; label y′. Highlighted on this slide: Attention, Gate.]
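The attention and gate components named in the diagram could look roughly like the sketch below. The slide gives only labels, not equations, so the dot-product attention and the parameter-free scalar gate are assumptions made for illustration, and `attend` and `gated_combine` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, products: List[Vector]) -> Tuple[List[float], Vector]:
    """Weight each product vector by its (assumed dot-product) match with the query."""
    weights = softmax([dot(query, p) for p in products])
    pooled = [sum(w * p[k] for w, p in zip(weights, products))
              for k in range(len(query))]
    return weights, pooled


def gated_combine(query: Vector, pooled: Vector) -> Vector:
    """Blend query and pooled products with one scalar gate (an assumption)."""
    g = 1.0 / (1.0 + math.exp(-dot(query, pooled)))
    return [g * q + (1.0 - g) * p for q, p in zip(query, pooled)]
```

A trained model would learn the attention and gate parameters; the fixed forms here only show how a query and its products can be folded into a single branch vector.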

SLIDE 11

BerryPIcking TRee MoDel (BIRD): Tree Representation

[Figure: the BIRD architecture diagram (root, branch1 to branch4, queries, products, attention, combine gate, prune mechanism, bi-BPTRU, hidden semantics, forward and backward latent intent, average pooling, mlp, label). Highlighted on this slide: Recurrent Neural Network.]

SLIDE 12

Berrypicking Tree Recurrent Unit (BPTRU)

  • Motivation:

As the buyer is the root of the berrypicking tree, besides the semantics hidden in the sequence of branches, we also explore the buyer's latent purchase intent across all the information seeking efforts in the tree.

Two hidden gates determine the combination of the previous hidden state (latent intent) and the current branch. An interact gate supplements the joint information.
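The unit's equations did not survive in this transcript, so the following is only a loose sketch of a recurrent step with two gates on the previous state and the current branch plus a third gate on their elementwise interaction. The gate forms, the scalar parameters in `DEFAULT_PARAMS`, and all function names are assumptions and should not be read as the BPTRU definition.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical, untrained scalar parameters (a, b) for each gate.
DEFAULT_PARAMS: Dict[str, Tuple[float, float]] = {
    "keep": (1.0, 1.0),      # gate on the previous hidden state
    "take": (1.0, 1.0),      # gate on the current branch
    "interact": (1.0, 1.0),  # gate on the joint (elementwise product) term
}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_step(h_prev: Vector, x: Vector,
               params: Dict[str, Tuple[float, float]] = DEFAULT_PARAMS) -> Vector:
    """One recurrent step mixing previous state, current branch, and their product."""
    out = []
    for h, v in zip(h_prev, x):
        gates = {name: sigmoid(a * h + b * v) for name, (a, b) in params.items()}
        mixed = gates["keep"] * h + gates["take"] * v + gates["interact"] * (h * v)
        out.append(math.tanh(mixed))
    return out


def encode_branches(branches: List[Vector], dim: int) -> Vector:
    state = [0.0] * dim
    for x in branches:
        state = gated_step(state, x)
    return state


def encode_bidirectional(branches: List[Vector], dim: int) -> Vector:
    """Concatenate a forward pass and a backward pass over the branch sequence."""
    forward = encode_branches(branches, dim)
    backward = encode_branches(list(reversed(branches)), dim)
    return forward + backward
```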

SLIDE 13

BerryPIcking TRee MoDel (BIRD): Pruning Mechanism

[Figure: the BIRD architecture diagram (root, branch1 to branch4, queries, products, attention, combine gate, prune mechanism, bi-BPTRU, hidden semantics, forward and backward latent intent, average pooling, mlp, label).]

SLIDE 14

Pruning Mechanism

  • Motivation:

User behavior in the eCommerce environment can be somewhat noisy.

  • Example:

For instance, within a 2-hour window, a buyer's search and browsing behavior may serve multiple information needs, e.g., looking for normal products as well as a pornographic product, which might pollute the target berrypicking tree.

[Equation annotations: cosine similarity; sigmoid. The last branch contains the target (purchased) product.]
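Read together, the annotations suggest scoring each branch by its cosine similarity to the last branch (the one holding the purchased product) and squashing that score with a sigmoid. The sketch below follows that reading; how the resulting weights are applied (soft weighting or a hard cut-off) is not stated on the slide, and `prune_weights` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def prune_weights(branch_vectors: List[Vector]) -> List[float]:
    """Weight every branch by sigmoid(cosine similarity to the last branch)."""
    target = branch_vectors[-1]
    return [sigmoid(cosine(b, target)) for b in branch_vectors]
```

Branches that point in the same direction as the target branch receive the largest weight, unrelated branches sit at 0.5, and branches pointing the opposite way fall below that.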

SLIDE 15

BerryPIcking TRee MoDel (BIRD): Overview

[Figure: a berrypicking tree (buyer as root, queries, products, target, branch1 to branch4) shown next to the BIRD architecture diagram (attention, combine gate, prune mechanism, bi-BPTRU, hidden semantics, forward and backward latent intent, average pooling, mlp, label).]

SLIDE 16

Performance Comparison with Base Models Built on Berrypicking Tree

  • Question:

Can BIRD, which includes BPTRU and the pruning mechanism (PM), outperform other models built on the berrypicking tree?

SLIDE 17

Performance Comparison among the Proposed BIRD and Several Representative Models

[Figure: bar charts of scores(%) per model, with legend P, R, F1, F2, AP, NDCG, for (a) Online Test 1 and (b) Online Test 2. The values below are the bar labels, paired with the models in the order both appear on the slide.]

Model           (a) Online Test 1   (b) Online Test 2
SVM             110.14              127.52
SWEMHiera       136.52              158.98
SVM(queries)    128.42              163.59
SRU(query)      193.70              214.58
SRU             176.55              202.61
LSTM(query)     170.25              207.34
LSTM            160.51              240.02
BPTRU           272.84              309.84
BIRD            313.44              346.68
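Four of the metrics in the legend (P, R, F1, F2) follow from the standard precision, recall, and F-beta definitions, which can be sketched as below. The helper names are hypothetical and this is not the authors' evaluation script; AP and NDCG are ranking metrics and are left out of the sketch.

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives F1, beta=2 gives F2 (recall weighted more)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

F2 rewards recall more heavily than F1, which matters when missing a positive item costs more than flagging a negative one by mistake.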

SLIDE 18

Conclusions

  • Task:

We raise the question of automatically detecting pornographic products in an eCommerce ecosystem, which, to the best of our knowledge, is the first effort to investigate this problem.

  • Model:

We propose an innovative algorithm, BIRD, to locate pornographic products by leveraging massive data on buyers' information seeking. In particular, the berrypicking tree with pruning is used to encapsulate the buyers' seeking behavior, and the hidden semantics and latent buyer intent are encoded for effective detection.

  • Dataset and Experiment:

To test the hypothesis, we collect a large dataset of products and buyers' seeking behavior from one of the world's largest eCommerce sites. Extensive online experimental results show that the proposed model can successfully identify pornographic products and outperform a number of alternative baselines. We also make the code and dataset publicly available.

SLIDE 19

Q&A

Thanks for Listening!

E-mail: guoxiu.he@whu.edu.cn
GitHub: https://github.com/GuoxiuHe/BIRD