Mining bipartite graphs to improve semantic pedophile activity detection




SLIDE 1

Mining bipartite graphs to improve semantic pedophile activity detection (short paper)

  • R. Fournier, M. Danisch

L2TI / Institut Galilée, Université Paris-Nord, Sorbonne Paris-Cité
LIP6, CNRS and Université Pierre et Marie Curie

May 28th, 2014

SLIDE 2

Context

Paedophile activity in P2P systems

  • Child victimization
  • Danger for innocent users
  • Policy-making issues

Recent research

  • Identification of large file providers
  • Collection of large sets of queries
  • Design and validation of a detection tool [IPM 2012]

Extend this effort

SLIDE 3

Datasets

Queries submitted to an eDonkey search engine in 2007

  • 10 weeks, 100 million queries, 24 million IP addresses
  • Set of queries: $q_i = (t, u, k_1, k_2, \dots, k_n)$
      • $t$: timestamp
      • $u$: user information (IP address, connection port)
      • $(k_1, k_2, \dots, k_n)$: sequence of keywords
  • Duly anonymised
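The query tuple described above could be represented as follows. This is only a minimal sketch: the `Query` class, the `parse_query` helper, and the whitespace-separated record layout are assumptions made for illustration, since the slides do not describe the storage format of the dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Query:
    """One query q_i = (t, u, k_1, ..., k_n)."""
    timestamp: int             # t
    user: str                  # u: anonymised stand-in for (IP address, port)
    keywords: Tuple[str, ...]  # (k_1, ..., k_n), in the order typed


def parse_query(line: str) -> Query:
    """Parse a hypothetical 'timestamp user kw1 kw2 ...' record."""
    timestamp, user, *keywords = line.split()
    return Query(int(timestamp), user, tuple(keywords))


# Hypothetical record with harmless keywords.
example = parse_query("1183000000 user42 ubuntu iso")
print(example)
```

Keeping the keywords as an ordered tuple preserves the "sequence of keywords" wording of the slide; a set would lose the order.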

SLIDE 4

Pedophile detection tool

4 semantic categories of paedophile queries

  • matches child and sex?
  • matches explicit?
  • matches agesuffix with age<17 and (sex or child)?
  • matches familyparents and familychild and sex?

If any test matches: tag as paedophile query

  • False positives ("sexy daddy destinys child")
  • False negatives ("pjk 12yo")
  • Focus on a reduced false positive rate (< 1.4%)
  • False negative rate: 24.5%

SLIDE 5

Our approach

Goals

Reduce the number of queries to process manually Validate existing classification

Bipartite graphs and communities

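[Figure: bipartite graph linking queries q1 to q5 with users u1 to u4; two nodes carry the label 1]

$$ s_C(r) = \frac{\sum_{u \in V(r)} |C \cap R(u) \setminus \{r\}|}{\sum_{u \in V(r)} |R(u) \setminus \{r\}|} $$

Example: $s_1(q_2) = 0.5$

$$ s'_C(r) = \frac{1}{|V(r)|} \sum_{u \in V(r)} \frac{|C \cap R(u) \setminus \{r\}|}{|R(u) \setminus \{r\}|} $$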
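The two scores can be sketched in a few lines of Python. The slide does not spell out its notation, so this sketch assumes that V(r) is the set of users who submitted request r, R(u) is the set of requests submitted by user u, and C is the set of requests already assigned to a category. The toy graph below is hypothetical (the edges of the slide's figure are not recoverable), and treating a user with no other request as contributing 0 is also an assumption.

```python
from collections import defaultdict


def build_indexes(edges):
    """Build V(r) (users per request) and R(u) (requests per user)."""
    users_of = defaultdict(set)
    requests_of = defaultdict(set)
    for user, request in edges:
        users_of[request].add(user)
        requests_of[user].add(request)
    return dict(users_of), dict(requests_of)


def pooled_score(r, category, users_of, requests_of):
    """s_C(r): ratio of sums over all users of r."""
    num = den = 0
    for u in users_of.get(r, ()):
        others = requests_of[u] - {r}      # R(u) \ {r}
        num += len(category & others)      # |C ∩ R(u) \ {r}|
        den += len(others)
    return num / den if den else 0.0


def averaged_score(r, category, users_of, requests_of):
    """s'_C(r): mean of the per-user ratios."""
    users = users_of.get(r, ())
    if not users:
        return 0.0
    total = 0.0
    for u in users:
        others = requests_of[u] - {r}
        if others:                         # assumption: 0/0 counts as 0
            total += len(category & others) / len(others)
    return total / len(users)


# Hypothetical toy graph, not the one drawn on the slide.
EDGES = [
    ("u1", "q1"), ("u1", "q2"),
    ("u2", "q2"), ("u2", "q3"), ("u2", "q5"),
    ("u3", "q3"), ("u3", "q4"),
    ("u4", "q4"), ("u4", "q5"),
]
CATEGORY = {"q1", "q4"}

users_of, requests_of = build_indexes(EDGES)
print(pooled_score("q2", CATEGORY, users_of, requests_of))
print(averaged_score("q2", CATEGORY, users_of, requests_of))
```

On this toy graph the pooled score of q2 is 1/3 and the averaged score is 0.5. The gap comes from u2, who submitted two other requests: the pooled form weights each user by the number of their other requests, while the averaged form gives every user the same weight.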

SLIDE 6

Results

[Figure: two plots against the rank of the request according to its score (log-scale x-axis, 10^0 to 10^8): SCORE (0.0 to 1.0) and P(TRUE | RANKED X) (0.0 to 1.0)]

  • 4,518 queries (out of 12,858) with score 1 were not detected
  • New keywords and combinations obtained
  • Further analysis required to avoid an increased FP rate
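The ranking behind these plots can be illustrated with a short sketch. It assumes that P(TRUE | RANKED X) is the fraction of already-flagged requests among the X best-ranked ones; the slides do not define the quantity precisely (it could also be computed over a window around rank X), so this reading, the function name, and the toy data are all hypothetical.

```python
def flagged_fraction_by_rank(scores, flagged):
    """Rank requests by decreasing score and return (rank, request, fraction)
    triples, where fraction is the share of flagged requests in the top `rank`."""
    # Ties are broken by request identifier to keep the ranking deterministic.
    ranked = sorted(scores, key=lambda q: (-scores[q], q))
    hits = 0
    curve = []
    for rank, request in enumerate(ranked, start=1):
        hits += request in flagged
        curve.append((rank, request, hits / rank))
    return curve


# Hypothetical scores and flags, for illustration only.
scores = {"q1": 1.0, "q2": 0.5, "q3": 0.5, "q4": 0.0}
flagged = {"q1", "q3"}

curve = flagged_fraction_by_rank(scores, flagged)
for rank, request, fraction in curve:
    print(rank, request, round(fraction, 3))
```

High-scoring requests that are not in `flagged` are the candidates for manual review.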

SLIDE 7

Results

[Figure: three plots against the rank of the request according to its score (log-scale x-axis, 10^0 to 10^8): SCORE (0.0 to 1.0), P(1 | RANKED X) (0.0 to 1.0) and P(234 | RANKED X) (0.00 to 0.12)]

Categories 2, 3 and 4 are only weakly connected with category 1

SLIDE 8

Conclusion

  • Measure of topological similarity between queries
  • Limitation of the number of errors to process manually
  • Semantic and topological categories seem linked

Future work

  • Explore other scoring functions
  • Explore local community completion methods
  • Update the original filter by refining its lists of keywords
      • introduce new categories
      • subdivide existing categories

SLIDE 9

Thank you for your attention. Questions? raphael.fournier@lip6.fr
