CISC 879 - Machine Learning for Solving Systems Problems
Presented by: Ashique Mahmood
Dept. of Computer & Information Sciences, University of Delaware
Learning to Detect Phishing Emails
Ian Fette, Norman Sadeh, Anthony Tomasic
(School of CS, CMU)
Workflow: Dataset (mix of “clean” and “phishing” emails) → Feature Extraction (using scripts) → Training (…Tree) → Testing (a tenth of the dataset)
10-fold cross-validation: training the model and testing it together
10-fold cross-validation:
The dataset is divided into 10 distinct parts. Each part is tested once, using a model trained on the other 9 parts.
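The splitting step can be sketched in plain Python. This is a minimal illustration, not PILFER's code: `k_fold_indices` and `cross_validate` are hypothetical helpers, and `train_fn`/`predict_fn` stand in for whatever classifier is being evaluated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs: each of the k distinct parts is held
    out once for testing while the other k-1 parts form the training set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k disjoint, nearly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def cross_validate(features, labels, train_fn, predict_fn, k=10):
    """Overall accuracy when every email is classified by a model that never saw it."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = train_fn([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)
```

Because every email lands in exactly one test fold, the final accuracy covers the whole dataset while no email is ever used to train the model that classifies it.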
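Example of an IP-based URL: http://192.168.0.1/ebay.cgi?fix_account
Domain age: a WHOIS query determines how long each linked-to domain has been active
Example of a non-matching URL (the link text names a different domain than the href): <a href="badsite.com">paypal.com</a>
Non-modal domain: any linked domain other than the one linked to most frequently in the email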
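A minimal Python sketch of the IP-based, non-matching and modal-domain checks, using only the standard library. The helper names (`host_of`, `is_ip_url`, `LinkParser`, `is_nonmatching`, `modal_domain`) are hypothetical, and the rule for deciding when link text "looks like" a domain is an assumption made for illustration. The WHOIS age lookup is left out because it needs a network query.

```python
import ipaddress
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


def host_of(url):
    """Lower-cased host of a URL; bare strings like 'badsite.com' get a scheme first."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def is_ip_url(url):
    """IP-based URL: the host is a raw IPv4/IPv6 address instead of a domain name."""
    try:
        ipaddress.ip_address(host_of(url))
        return True
    except ValueError:
        return False


class LinkParser(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def is_nonmatching(href, text):
    """Assumed heuristic: link text that looks like a domain but differs from the href's host."""
    shown = host_of(text)
    looks_like_domain = "." in shown and " " not in text.strip()
    return looks_like_domain and shown != host_of(href)


def modal_domain(hrefs):
    """Most frequently linked-to host; every other linked host is non-modal."""
    counts = Counter(host_of(h) for h in hrefs)
    return counts.most_common(1)[0][0] if counts else None
```

For instance, feeding `<a href="badsite.com">paypal.com</a>` to `LinkParser` yields the pair `("badsite.com", "paypal.com")`, which `is_nonmatching` flags because the displayed domain differs from the real target.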
HTML email: a MIME type of text/html indicates a possible phishing attack
Does the string “javascript” appear in the email?
The output of a stand-alone spam filter, labelling the email “ham” or “spam”, is also used as a feature (PILFER uses SpamAssassin).
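These three binary features could be computed roughly as follows. This sketch assumes the raw message has already passed through SpamAssassin, which marks messages it scores as spam with an `X-Spam-Flag: YES` header; the case-insensitive "javascript" match and the function name `content_features` are illustrative assumptions.

```python
from email import message_from_string


def content_features(raw_email):
    """Return the HTML-email, contains-javascript and spam-filter features as 0/1 values."""
    msg = message_from_string(raw_email)
    # HTML email: the message itself or any MIME part is declared as text/html.
    html_email = any(part.get_content_type() == "text/html" for part in msg.walk())
    # Assumed case-insensitive search for "javascript" anywhere in the raw message.
    contains_js = "javascript" in raw_email.lower()
    # Assumes SpamAssassin already ran and set X-Spam-Flag: YES on messages it called spam.
    spam_flag = msg.get("X-Spam-Flag", "").strip().upper() == "YES"
    return {
        "html_email": int(html_email),
        "contains_javascript": int(contains_js),
        "spam_filter_output": int(spam_flag),
    }
```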
Number of distinct domains in the email's links that start with http:// or https://
Maximum number of dots contained in any of the links, e.g.:
http://www.my-bank.update.data.com
http://www.google.com/url?q=http://www.badsite.com
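A sketch of the two link-count features. The regular expression used to find links, the stripping of trailing punctuation, and the choice to count dots over the whole URL string are assumptions, not details taken from the paper.

```python
import re
from urllib.parse import urlparse

# Assumed link pattern: http:// or https:// up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_features(text):
    """Number of distinct linked domains and the maximum dot count of any single link."""
    # Strip sentence punctuation the regex may have swallowed at the end of a link.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    domains = {(urlparse(u).hostname or "").lower() for u in urls}
    domains.discard("")
    max_dots = max((u.count(".") for u in urls), default=0)
    return {"num_domains": len(domains), "max_dots": max_dots}
```

With the two example URLs, both links contain 4 dots. The nested link inside the Google redirect is treated as part of a single www.google.com URL, so only two distinct domains are reported; a stricter extractor could also count www.badsite.com.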
[Table comparing ham and phish classifications]