Awais Rashid
Your words betray you! The role of language in cyber crime investigations
Physical World / Online World / Digital World: dual use
P2P Study
– 1.6% of searches and 2.4% of responses on the Gnutella network alone (Study by Hughes et al. 2006)
– Approx. 600,000 searches per day on Gnutella alone
– … and 88% of responses.
– Vocabulary changes over time.
[Charts: Search frequency; Topic popularity; Top 100 frequent searches; Core of distributors]
Chat and Social Networking
Digital Personas
Do You Know Who You Are Talking To?
Experience from:
– Isis: Protecting Children in Online Social Networks (EPSRC/ESRC)
– iCOP: Identifying and Catching Originators in P2P Networks (EC Safer Internet Programme)
Detecting Deceptive Digital Personas
Stylistic Language “Fingerprint”
[Diagram: Individual_1 to Individual_4, each compared with a new text]
Age and Gender Analysis
[Diagram: Reference Data Sets (Male, Female); Stylistic Features at word, syntactic and semantic level; Classifier; Distance Measure]
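The pipeline above turns a text into stylistic features and assigns it to the closest reference data set via a distance measure. A minimal sketch of that idea, assuming only a few word-level features (average word length, type/token ratio, relative frequency of a hypothetical list of function words) and Euclidean distance to each reference set's mean profile. The feature list, the nearest-centroid rule and all names are illustrative assumptions, not the Isis classifier, which also draws on syntactic and semantic features.

```python
import math
import re
from statistics import mean

# Hypothetical function-word list used as word-level stylistic markers.
FUNCTION_WORDS = ["the", "and", "i", "you", "to", "a", "of", "lol"]

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def features(text: str) -> list[float]:
    """Word-level stylistic profile of one text (illustrative only)."""
    words = tokenize(text)
    if not words:
        return [0.0] * (2 + len(FUNCTION_WORDS))
    avg_len = mean(len(w) for w in words)
    type_token = len(set(words)) / len(words)
    freqs = [words.count(fw) / len(words) for fw in FUNCTION_WORDS]
    return [avg_len, type_token, *freqs]

def centroid(vectors: list[list[float]]) -> list[float]:
    """Mean feature vector of one reference data set."""
    return [mean(col) for col in zip(*vectors)]

def classify(text: str, references: dict[str, list[str]]) -> str:
    """Return the label whose reference centroid is closest (Euclidean distance)."""
    profile = features(text)
    best_label, best_dist = None, math.inf
    for label, samples in references.items():
        d = math.dist(profile, centroid([features(s) for s in samples]))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

refs = {"A": ["the cat and the dog", "the end of the road"],
        "B": ["lol you i lol", "you lol i you"]}
print(classify("the house of the king", refs))
```

In a real system the reference labels would be classes such as Male and Female (or age bands), the features would be scaled so that no single one dominates the distance, and the reference sets would be far larger.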
No Deception – Age (Precision)
[Chart: precision (%) against threshold (%) for Levels 1 to 5; values marked: 72.24% and 77.35%]
No Deception – Age (Recall)
[Chart: recall (%) against threshold (%) for Levels 1 to 5]
No Deception – Gender
[Chart: recall and precision (%) against threshold (%); values marked: 66.86% and 71.07%]
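The charts plot precision and recall against a threshold. One common way such curves arise, assumed here rather than taken from the slides, is that the classifier only commits to a label when its confidence reaches the threshold: raising the threshold tends to raise precision (fewer, surer predictions) and lower recall (more texts left unclassified). A minimal sketch with a hypothetical `Prediction` record:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    predicted: str      # label proposed by the classifier, e.g. an age band
    actual: str         # ground-truth label
    confidence: float   # classifier confidence in percent (0-100)

def precision_recall_at(preds: list[Prediction], threshold: float) -> tuple[float, float]:
    """Precision and recall (%) keeping only predictions with confidence >= threshold.

    Assumption: items below the threshold are left unclassified, so they
    count against recall but not against precision. Precision is reported
    as 0.0 when nothing passes the threshold.
    """
    kept = [p for p in preds if p.confidence >= threshold]
    correct = sum(p.predicted == p.actual for p in kept)
    precision = 100.0 * correct / len(kept) if kept else 0.0
    recall = 100.0 * correct / len(preds) if preds else 0.0
    return precision, recall

preds = [Prediction("A", "A", 90), Prediction("A", "B", 40),
         Prediction("B", "B", 70), Prediction("B", "A", 20)]
# Sweep thresholds 5, 10, ..., 100, matching the x-axis of the charts.
curve = [(t, *precision_recall_at(preds, t)) for t in range(5, 101, 5)]
```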
Deception Detection
Is it being used?
– Commercialisation via a spin-out company (Relative Insight)
– … and EU policy frameworks.
– Universities UK and Research Councils UK (2011)
– Demonstrated to the Prime Minister at WeProtect, Dec. 2014
Detecting Specialist Tactics, e.g., Vocabulary
Detecting new/unknown CSA (child sexual abuse) media in P2P Networks
§ Using query analysis to automatically triage and identify potential candidates for new CSA media
§ New text analysis techniques to automatically flag potential CSA media based on their filename
§ … techniques to assess CSA content
Filename Classification
Key challenges
§ Compiling a CSA dataset
§ Filenames = short text samples
§ Presence of non-standard forms & “specialised” vocabulary
Filename Classification (2)
Dataset
§ Manual collection through LE (law enforcement) → 268 CSA filenames
§ Legal pornography sites → 10K non-CSA filenames
§ Simulate real-life data distribution in P2P
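Taking “10K” as 10,000, the share of CSA filenames in this dataset makes the class imbalance explicit:

```latex
\frac{268}{268 + 10\,000} = \frac{268}{10\,268} \approx 0.026
```

At roughly 2.6% CSA filenames, a classifier that labelled every filename non-CSA would already be about 97.4% accurate, so per-class precision, recall and F-score are more informative than overall accuracy here.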
Filename Classification (3)
Feature Selection
§ Semantic features
Original filename: ptl0lita12yo.jpeg
Semantic feats.: [paedo_keyword] [child_ref]
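The semantic features replace recognisable terms in a filename with category tokens such as [paedo_keyword] and [child_ref]. A minimal sketch, assuming a deliberately tiny hypothetical lexicon drawn only from the slide's own example, a digit-for-letter normalisation step, and a regular expression that treats patterns like "12yo" as child age references. The lexicon, mapping and pattern are illustrative assumptions; the actual iCOP lexicon is not given here.

```python
import re

# Hypothetical, deliberately tiny lexicon; a real system would rely on a
# curated list maintained with law enforcement.
PAEDO_KEYWORDS = ["lolita"]

# Assumed pattern for age references such as "12yo" or "9 yo".
AGE_REF = re.compile(r"\d{1,2}\s?yo")

# Undo common digit-for-letter substitutions before keyword matching.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a"})

def semantic_features(filename: str) -> list[str]:
    """Map a filename to semantic category tokens (illustrative only)."""
    stem = filename.lower().rsplit(".", 1)[0]  # drop the file extension
    feats = []
    if any(kw in stem.translate(LEET) for kw in PAEDO_KEYWORDS):
        feats.append("[paedo_keyword]")
    # Age references are matched on the raw stem so the digits survive.
    if AGE_REF.search(stem):
        feats.append("[child_ref]")
    return feats

print(semantic_features("ptl0lita12yo.jpeg"))
```

Matching age references before normalisation matters: mapping "1" to "i" first would turn "12yo" into "i2yo" and hide it from the pattern.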
Filename Classification (4)
Feature Selection
§ Character n-grams
Original filename: ptl0lita12yo.jpeg
2-grams: pt tl l0 0l li it ta a1 12 2y yo
3-grams: ptl tl0 l0l 0li lit ita ta1 a12 12y 2yo
4-grams: ptl0 tl0l l0li 0lit lita ita1 ta12 a12y 12yo
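The character n-grams listed for "ptl0lita12yo.jpeg" are overlapping windows of 2, 3 and 4 characters over the filename with the extension left out. A minimal sketch that reproduces them and pools them into a bag-of-n-grams vector; the `ngram_features` helper and pooling all three sizes into one feature set are assumptions, not the iCOP implementation.

```python
from collections import Counter

def char_ngrams(filename: str, n: int) -> list[str]:
    """Overlapping character n-grams of a filename, with the extension removed."""
    stem = filename.rsplit(".", 1)[0]  # assumption: the extension is not n-grammed
    return [stem[i:i + n] for i in range(len(stem) - n + 1)]

def ngram_features(filename: str, sizes: tuple[int, ...] = (2, 3, 4)) -> Counter:
    """Bag of character n-grams of several lengths, usable as classifier input."""
    return Counter(g for n in sizes for g in char_ngrams(filename, n))

for n in (2, 3, 4):
    print(f"{n}-grams:", " ".join(char_ngrams("ptl0lita12yo.jpeg", n)))
```

Because n-grams survive obfuscations such as "l0lita" better than whole words do, they suit short, non-standard filenames.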
Filename Classification (5)
Experimental Setup
§ Support Vector Machines (LibShortText)
§ 5-fold cross-validation
§ Evaluation:
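The SVM is evaluated with 5-fold cross-validation. LibShortText's own interface is not reproduced here. Instead, this is a minimal pure-Python sketch of how 5 folds could be built, assuming stratification by class so that each fold keeps roughly the dataset's small CSA share. Stratification and the `stratified_kfold` helper are assumptions, not stated on the slide.

```python
import random
from collections import defaultdict

def stratified_kfold(labels: list[str], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds, spreading each class evenly.

    Fold i is the test set in round i; the other k-1 folds form the training set.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin so each fold gets about 1/k of them.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

labels = ["csa"] * 268 + ["non-csa"] * 10_000
folds = stratified_kfold(labels, k=5)
for i, test_idx in enumerate(folds):
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    print(i, len(train_idx), len(test_idx))
```

With 268 CSA filenames, each test fold holds 53 or 54 of them, so no fold ends up without positive examples to evaluate on.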
Filename Classification (6)
Results
Scores of the SVM classifier (%)
Features            Class     Precision   Recall   F-score
Semantic feats.     CSA             5.7     21.3       9.0
                    Non-CSA        97.7     90.6      94.0
Character n-grams   CSA            89.8     62.3      73.6
                    Non-CSA        99.0     99.8      99.4
Combined            CSA            89.9     66.1      76.1
                    Non-CSA        99.1     99.8      99.5
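Each F-score in the results is the harmonic mean of the corresponding precision P and recall R, F = 2PR/(P + R). Recomputing it from the rounded table values is a quick consistency check. Differences of about 0.1 (e.g. 76.2 recomputed versus 76.1 reported for the combined CSA row) are expected because P and R are themselves rounded.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

rows = {
    "Semantic feats., CSA": (5.7, 21.3, 9.0),
    "Semantic feats., Non-CSA": (97.7, 90.6, 94.0),
    "Combined, CSA": (89.9, 66.1, 76.1),
    "Combined, Non-CSA": (99.1, 99.8, 99.5),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f_score(p, r):.1f}, reported {reported}")
```

The harmonic mean punishes imbalance: the semantic features' CSA precision of 5.7% drags the F-score down to 9.0 even though recall is higher.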
The iCOP Toolkit
Is it being used?
– Participants from 8 European countries and Interpol
– Hands-on sessions on live P2P data
Further Information
Isis
Walkerdine (2013). “Who Am I? Analysing Digital Personas in Cyber Crime Investigations”, IEEE Computer, 46(4).
Walkerdine, P. Rayson (2014). “Safeguarding Cyborg Childhoods: Incorporating the On/Offline Behaviour of Children into Everyday Social Work Practices”, British Journal of Social Work.
iCOP
(2014). “iCOP: Automatically Identifying New Child Abuse Media in P2P Networks”, IEEE Symposium on Security and Privacy Workshops 2014, pp. 124-131.
(2016). “iCOP: live forensics to reveal previously unknown criminal media on P2P networks”, Digital Investigation, 18, pp. 50-64.
General
Online Data Mining Technology Intended for Law Enforcement”, ACM Computing Surveys, 48(1).
Ethics in a Digital World”, IEEE Computer, 42(6), pp. 34-41.
“Managing emergent ethical concerns for software engineering in society”, Proc. ICSE 2015, Software Engineering in Society, pp. 523-526. IEEE.