Presentation: Scalable Detection of Botnets Based on DGA

Publication page: https://www.researchgate.net/publication/333652427
Presentation, June 2019. DOI: 10.13140/RG.2.2.24134.32322
Authors: Mattia Zago, Manuel Gil Pérez, Gregorio Martinez Perez (University of Murcia)
Uploaded by Mattia Zago on 07 June 2019.

Scalable Detection of Botnets Based on DGA
Efficient Feature Discovery Process in Machine Learning Techniques
Speaker: Mattia Zago
Authors: M. Zago, M. Gil Pérez, G. Martínez Pérez
Available online: Soft Computing (Q2, IF: 2.367)
Zago, M., Gil Pérez, M. & Martínez Pérez, G. Soft Comput (2019). 10.1007/s00500-018-03703-8

Our Agenda for Today

Background & Motivation
- Subject localisation
- Relevance
- Objective

State of The Art
- Machine learning algorithms
- Feature sets and families

Analysis
- Exploratory feature analysis
- Classification results
  - Binary problem
  - Multiclass problem

Challenges
- Data
- Best practices

March 2019 Mattia Zago – Scalable Detection of Botnets based on DGA: Efficient Feature Discovery Process in ML Techniques

What is a Botnet?

DGA: Domain Generation Algorithm

Objective: analyse DNS queries to detect connections to malicious AGDs (algorithmically generated domains).

Approaches to the Detection – ML

State of the Art Regarding Algorithms

Since: 2010
Identified: more than 100 articles
Selected: +30 researches

We have identified six comparison metrics:
01 Machine Learning approach (either supervised, non supervised)
02 Type of application (e.g., binary or multiclass classifier, correlation, anomaly detection, etc.)
03 Comparisons with other works, approaches or algorithms
04 Achieved results (either poor, average, good and excellent)
05 Real-time analysis (i.e., online detection, performance scalability, etc.)
06 Family of features used (i.e., either Context-Free or Context-Aware)

Approaches to DGA Detection – Features

Language Analysis
Usage of Natural Language Processing techniques to estimate if the domain is legit or not (i.e. test the randomness).
Examples:
- Length of the string
- Entropy
- Frequency analysis
- Vowels ratio

Context-Free
A feature that is related only to a FQDN and thus is independent of contextual information, including, but not limited to, timing, origin or any other environment configuration.

DNS Query Analysis
Decode sniffed queries and responses and look for "troublesome" indicators that may suggest a regular pattern.
Examples:
- Num. of connections
- Num. of IP addresses
- Num. of NXDomains
- Longevity of domain

Context-Aware
A feature that is dependent on the specific malware sample execution, which is realised in a precise environment with a specific config. and in a particular time frame.
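The context-free examples named on this slide (length, entropy, vowels ratio) can be computed from the domain string alone. The following is a minimal Python sketch, not the authors' implementation: the function names, the decision to ignore case and drop the dots between labels, and the use of character-level Shannon entropy are all assumptions made for illustration.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def context_free_features(fqdn: str) -> dict:
    """Features derived from the FQDN string alone (no DNS traffic needed)."""
    # Assumption: case is ignored and the dots separating labels are dropped.
    chars = fqdn.lower().rstrip(".").replace(".", "")
    length = len(chars)
    vowels = sum(1 for c in chars if c in VOWELS)
    return {
        "length": length,
        "entropy": shannon_entropy(chars),
        "vowel_ratio": vowels / length if length else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical inputs: one dictionary-like name, one random-looking name.
    for name in ("example.com", "xkqzjvbnwpt.net"):
        print(name, context_free_features(name))
```

Because none of these values depend on timing, origin or network state, they can be computed per query without keeping any history, which is what makes this family attractive for scalable detection.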

State of the Art Regarding Features

Works compared (reference numbers): 3, 5, 9, 11, 14, 16, 24, 28, 33, 34, 38, 45, 46, 48, 49, 51, 52, 54, 56, 57, 58, 59, 65, 66

Code             Description                            Used by (works)
NLP-L-x          String length                          16
NLP-LDN          Number of domain levels                3
NLP-R-NUM-x      Ratio of numerical characters          8
NLP-R-VOW-x      Ratio of vowel characters              4
NLP-R-CON-x      Ratio of consonant characters          4
NLP-LANG         Language hypothesis                    2
NLP-LC-C         Longest consecutive cons. sequence     5
NLP-LC-V         Longest consecutive vowel sequence     1
NLP-LC-D         Longest consecutive number seq.        3
NLP-COV          Covariance matrix                      1
NLP-R-MC         Ratio of meaningful characters         3
NLP-LMS          Length of longest meaningful string    0
NLP-WLU          Number of "word-like" units            1
NLP-SQS          Domain squatting score                 1
NLP-LED          Levenshtein Edit Distance              2
NLP-nG-FR        Frequency distribution (histogram)     4
NLP-nG-E         Entropy                                11
NLP-nG-COV       Covariance                             1
NLP-nG-MEAN      Mean of frequencies                    1
NLP-nG-MED       Median of frequencies                  1
NLP-nG-VAR       Variance of frequencies                1
NLP-nG-STD       Standard deviation of frequencies      1
NLP-nG-PRO       Pronounceability score                 3
NLP-nG-NORM      Normality score                        3
NLP-nG-PRT       Transition probability                 2
NLP-nG-PRA       Probability of appearance              2
NLP-nG-PRI       Index probability                      2
NLP-nG-DST-KL    Kullback-Leibler divergence            2
NLP-nG-DST-JI    Jaccard Index measure                  4
NLP-nG-DST-TH    Distance - Threshold                   1
NLP-nG-DST-AF    Distance - Avg. frequency              1
NLP-nG-DST-AC    Distance - Avg. count                  2

Features used per work (slide column order): 1 4 7 3 1 4 7 1 9 1 1 2 7 5 8 3 8 6 1 2 3 4 5 3
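Several rows of the table (the NLP-nG-* codes) are built on character n-grams. As a rough illustration of that family, the sketch below extracts n-grams, turns them into a frequency distribution, and computes a Jaccard index between two strings. It is a generic Python formulation under stated assumptions, not the definition used by any of the surveyed works: the helper names, the default n = 2, and the comparison of n-gram sets rather than multisets are all hypothetical choices.

```python
from collections import Counter


def ngrams(text: str, n: int = 2) -> list:
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequencies(text: str, n: int = 2) -> dict:
    """Relative frequency of each n-gram (a normalised histogram)."""
    grams = ngrams(text, n)
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def jaccard_index(a: str, b: str, n: int = 2) -> float:
    """Jaccard index between the n-gram sets of two strings."""
    set_a, set_b = set(ngrams(a, n)), set(ngrams(b, n))
    union = set_a | set_b
    # Assumption: two strings with no n-grams at all score 0.0.
    return len(set_a & set_b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(ngram_frequencies("banana"))
    print(jaccard_index("google", "goggle"))
```

In practice the second argument would more likely be a reference corpus of legitimate names than a single string; the pairwise form is used here only to keep the example short.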


State of the Art Regarding Features - Explore

[Figure: scatter plot of 10,000 FQDNs. Horizontal axis: length. Vertical axis: entropy. Green dots: legitimate. Other colours: malware.]

State of the Art Regarding Features - Explore

[Figure: scatter plot of 10,000 FQDNs. Horizontal axis: length. Vertical axis: entropy. Light blue dots: legitimate. Red dots: malware.]

Example of Feature Analysis

Features that are interesting for their values: Length of domain name (excluding TLD)
Features that are interesting for their shapes: Longest Consecutive Consonant Sequence
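The two features singled out on this slide are simple to derive from the domain string. The sketch below is one possible Python reading of them, offered under explicit assumptions rather than as the authors' code: the TLD is taken to be the last label only (multi-label suffixes such as "co.uk" are not handled), dots are not counted towards the length, and "y" is treated as a consonant. Both function names are hypothetical.

```python
import re

# Assumption: consonants are the ASCII letters other than a, e, i, o, u.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+")


def length_without_tld(fqdn: str) -> int:
    """Number of characters in the domain name, excluding the TLD and dots."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Assumption: the TLD is the last label; a single-label name has no TLD.
    kept = labels[:-1] if len(labels) > 1 else labels
    return sum(len(label) for label in kept)


def longest_consonant_run(text: str) -> int:
    """Length of the longest run of consecutive consonants in text."""
    runs = CONSONANT_RUN.findall(text.lower())
    return max((len(run) for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    for name in ("www.example.com", "qwrtzpsdfg.info"):
        print(name, length_without_tld(name), longest_consonant_run(name))
```

A long consonant run is rare in pronounceable names, so this value tends to separate random-looking strings from dictionary-based ones even when their lengths are similar.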
