Mining bipartite graphs to improve semantic pedophile activity detection



  1. Mining bipartite graphs to improve semantic pedophile activity detection (short paper). R. Fournier, M. Danisch. L2TI / Institut Galilée, Université Paris-Nord, Sorbonne Paris-Cité; LIP6, CNRS and Université Pierre et Marie Curie. May 28th, 2014.

  2. Context
     Paedophile activity in P2P systems:
     - Children victimization
     - Danger for innocent users
     - Policy making issues
     Recent research:
     - Identification of large file providers
     - Collection of large sets of queries
     - Design and validation of a detection tool [IPM 2012]
     Extend this effort

  3. Datasets
     Queries submitted to an eDonkey search engine in 2007:
     - 10 weeks, 100 million queries, 24 million IP addresses
     - Duly anonymised
     Set of queries: $q_i = (t, u, k_1, k_2, \ldots, k_n)$, where
     - $t$ is the timestamp
     - $u$ is the user information (IP address, connection port)
     - $(k_1, k_2, \ldots, k_n)$ is the sequence of keywords
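The query tuple above maps naturally onto a small record type. The following is a minimal sketch only: the class name `Query`, the helper `make_query`, the integer timestamp, and the opaque user string are all assumptions made for illustration, since the slides do not describe the on-disk format of the dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Query:
    """One query q_i = (t, u, k_1, ..., k_n), as defined on the slide."""
    t: int                     # timestamp (integer encoding is an assumption)
    u: str                     # anonymised user information (opaque id is an assumption)
    keywords: Tuple[str, ...]  # sequence of keywords (k_1, ..., k_n)


def make_query(t: int, u: str, text: str) -> Query:
    """Hypothetical helper: lower-case the raw text and split it on whitespace."""
    return Query(t=t, u=u, keywords=tuple(text.lower().split()))


# Toy, made-up record; not taken from the dataset.
q = make_query(1180000000, "user_42", "Some Free Music")
```

Keeping the record immutable (`frozen=True`) makes it hashable, which is convenient when queries are later used as graph nodes or set members.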

  4. Paedophile detection tool
     4 semantic categories of paedophile queries. A query is tagged as paedophile if it:
     - matches explicit
     - matches child and sex
     - matches familyparents and familychild and sex
     - matches agesuffix with age<17 and (sex or child)
     Focus on reduced false positive rate (< 1.4%)
     False negative rate: 24.5%
     - False negatives (``pjk 12yo'')
     - False positives (``sexy daddy destinys child'')
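The four tests of the filter can be sketched as a short rule-based function. This is an illustration under stated assumptions, not the authors' tool: the keyword sets below are tiny placeholders (the real lists are not given in the slides), the `AGE_SUFFIX` pattern is a hypothetical way to recognise tokens such as "12yo", and the function names are invented here.

```python
import re

# Placeholder keyword lists (assumptions): the real lists are not in the slides.
EXPLICIT = {"explicit_placeholder"}
CHILD = {"child", "kid"}
SEX = {"sex", "sexy"}
FAMILY_PARENTS = {"daddy", "mother"}
FAMILY_CHILD = {"daughter", "son"}

# Hypothetical pattern for an age suffix token such as "12yo".
AGE_SUFFIX = re.compile(r"^(\d{1,2})yo$")


def has_age_under_17(tokens):
    """True if some token is an age suffix with an age below 17."""
    for tok in tokens:
        m = AGE_SUFFIX.match(tok)
        if m and int(m.group(1)) < 17:
            return True
    return False


def is_tagged(query):
    """Apply the four tests from the slide; any passing test tags the query."""
    tokens = query.lower().split()
    words = set(tokens)
    explicit = bool(words & EXPLICIT)
    child = bool(words & CHILD)
    sex = bool(words & SEX)
    family = bool(words & FAMILY_PARENTS) and bool(words & FAMILY_CHILD)
    return (
        explicit
        or (child and sex)
        or (family and sex)
        or (has_age_under_17(tokens) and (sex or child))
    )
```

With these placeholder lists the sketch reproduces both examples from the slide: `is_tagged("sexy daddy destinys child")` is true (a false positive, since the child and sex test fires on a band name), while `is_tagged("pjk 12yo")` is false (a false negative, since the age suffix is present but no sex or child keyword is).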

  5. Our approach
     Goals:
     - Reduce the number of queries to process manually
     - Validate existing classification
     Bipartite graphs and communities:
     $$ s_C(r) = \frac{\sum_{u \in V(r)} |C \cap R(u) \setminus \{r\}|}{\sum_{u \in V(r)} |R(u) \setminus \{r\}|} $$
     $$ s'_C(r) = \frac{1}{|V(r)|} \sum_{u \in V(r)} \frac{|C \cap R(u) \setminus \{r\}|}{|R(u) \setminus \{r\}|} $$
     [Figure: example bipartite graph between queries q1 to q5 (labelled 1, 0, 1, 0, 0) and users u1 to u4, for which $s_1(q_2) = 0.5$.]
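The two scores can be computed directly from the query/user bipartite graph. The sketch below assumes that V(r) is the set of users who submitted query r, R(u) the set of queries submitted by user u, and C the set of queries in the category of interest; the slides use these symbols without defining them. The edge list is hypothetical: the edges of the slide's figure are not recoverable, so they were chosen here only so that the labels (q1 and q3 in C) and the value s_1(q2) = 0.5 agree with the slide. Returning 0 when a denominator is empty is also an assumption.

```python
from collections import defaultdict


def build_bipartite(edges):
    """Return (R, V): R[u] = queries of user u, V[r] = users of query r."""
    R, V = defaultdict(set), defaultdict(set)
    for user, query in edges:
        R[user].add(query)
        V[query].add(user)
    return R, V


def score_pooled(r, C, R, V):
    """s_C(r): share of category-C queries among all other queries of r's users."""
    num = den = 0
    for u in V[r]:
        others = R[u] - {r}
        num += len(C & others)
        den += len(others)
    return num / den if den else 0.0  # convention for an empty denominator (assumption)


def score_averaged(r, C, R, V):
    """s'_C(r): per-user share of category-C queries, averaged over r's users."""
    users = V[r]
    if not users:
        return 0.0
    total = 0.0
    for u in users:
        others = R[u] - {r}
        if others:  # a user with no other query contributes 0 (assumption)
            total += len(C & others) / len(others)
    return total / len(users)


# Hypothetical edges (user, query); not the edges of the original figure.
edges = [("u1", "q1"), ("u1", "q2"), ("u2", "q2"), ("u2", "q4"),
         ("u3", "q3"), ("u3", "q4"), ("u4", "q3"), ("u4", "q5")]
R, V = build_bipartite(edges)
C = {"q1", "q3"}  # queries labelled 1 on the slide

pooled_q2 = score_pooled("q2", C, R, V)      # 0.5
averaged_q2 = score_averaged("q2", C, R, V)  # 0.5
```

The pooled score weights every co-submitted query equally, so very active users dominate it; the averaged score gives each user the same weight regardless of how many queries they submitted.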

  6. Results
     - 4,518 queries (out of 12,858) with score 1 were not detected
     - Further analysis required to avoid increased FP rate
     - New keywords and combinations obtained
     [Figure: two panels, SCORE and P(TRUE | RANKED X), both from 0.0 to 1.0, plotted against the rank of the request according to its score (log scale, $10^0$ to $10^8$).]
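A curve such as P(TRUE | RANKED X) can be sketched as follows, under one plausible reading that the slides do not confirm: queries are sorted by decreasing score, and for each rank X the curve gives the fraction of the X top-ranked queries that the semantic filter had already tagged. The function name, the toy scores, and the toy tags are assumptions made for illustration.

```python
def tagged_fraction_by_rank(scores, tagged):
    """For each rank X, fraction of the X top-scored queries that are in `tagged`."""
    # sorted() is stable, so queries with equal scores keep their insertion order.
    ranked = sorted(scores, key=lambda q: scores[q], reverse=True)
    hits = 0
    curve = []
    for x, q in enumerate(ranked, start=1):
        hits += q in tagged
        curve.append(hits / x)
    return curve


# Toy, made-up data: score per query and the set tagged by the semantic filter.
scores = {"qa": 1.0, "qb": 1.0, "qc": 0.5, "qd": 0.0}
tagged = {"qa", "qc"}

curve = tagged_fraction_by_rank(scores, tagged)

# Queries with score 1 that the filter did not tag: candidates for manual review.
candidates = [q for q in scores if scores[q] == 1.0 and q not in tagged]
```

On the toy data `candidates` is `["qb"]`; on the real dataset this is the kind of list that the 4,518 undetected score-1 queries would form.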

  7. Results
     - Categories 2, 3 and 4 are weakly connected with category 1
     [Figure: three panels, SCORE (0.0 to 1.0), P(1 | RANKED X) (0.0 to 1.0) and P(234 | RANKED X) (0.00 to 0.12), plotted against the rank of the request according to its score (log scale, $10^0$ to $10^8$).]

  8. Conclusion
     - Measure of topological similarity between queries
     - Limitation of the number of errors to process manually
     - Semantic and topological categories seem linked
     Future work:
     - Explore other scoring functions
     - Explore local community completion methods
     - Update the original filter by refining its lists of keywords, introducing new categories, and subdividing existing categories

  9. Thank you for your attention. Questions? raphael.fournier@lip6.fr
