 
Procopiou Anna 931769, Professor: Dr. Athanasopoulos Elias
Articles
1. Detecting Malware Domains at the Upper DNS Hierarchy, M. Antonakakis et al.
2. ZMap: Fast Internet-wide Scanning and Its Security Applications, Z. Durumeric et al.
Agenda
1. ARTICLE 1: Detecting Malware Domains at the Upper DNS Hierarchy
• Main idea of the article
• Background and related work
• Authors' proposal
• Differences from prior art
• Statistical features
• Evaluation
• Conclusions
Detecting Malware Domains at the Upper DNS Hierarchy: Key Ideas
• Miscreants use DNS to build malicious network infrastructures for malware C&C.
• Detection system for malware-related domain names: Kopis.
BACKGROUND NEEDED: Domain Name System (DNS)
Domain name system (DNS) hierarchy
Domain Name System: types of DNS queries (recursive, iterative)
DNS Query: recursive query
DNS Query: iterative query (example: www.google.com.)
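The difference between the two query types can be sketched with a toy, in-memory hierarchy. The zone data, the server names (`root`, `tld-com`, `ns-google`) and the helpers `ask`, `iterative_resolve` and `recursive_query` are hypothetical; they only mimic the referral chain root → TLD → authoritative server, whereas a real resolver speaks the DNS wire protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: each server maps a name suffix either to an
# "answer" (an IP address) or to a "referral" (the next server to ask).
# 192.0.2.10 is a documentation address, not Google's real IP.
ZONES: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"google.com.": ("referral", "ns-google")},
    "ns-google": {"www.google.com.": ("answer", "192.0.2.10")},
}


def ask(server: str, name: str) -> Tuple[str, str]:
    """Ask one server about `name`; it replies with its most specific matching entry."""
    zone = ZONES[server]
    labels = name.rstrip(".").split(".")
    # Try the longest suffix first: www.google.com., then google.com., then com.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:]) + "."
        if suffix in zone:
            return zone[suffix]
    raise KeyError(f"{server} has no data for {name}")


def iterative_resolve(name: str) -> Tuple[str, List[str]]:
    """Iterative resolution: the resolver follows every referral itself.

    Returns the resolved IP and the servers contacted, in order.
    """
    server = "root"
    contacted: List[str] = []
    while True:
        contacted.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, contacted
        server = value  # follow the referral one level down


def recursive_query(name: str) -> str:
    """Recursive query: the client asks its resolver once and receives only the
    final answer; the resolver performs the iterative walk on its behalf."""
    ip, _ = iterative_resolve(name)
    return ip


print(iterative_resolve("www.google.com."))
# → ('192.0.2.10', ['root', 'tld-com', 'ns-google'])
```

In the recursive case the client never sees the intermediate referrals; only its recursive resolver talks to the root, TLD and authoritative servers.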
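Resource Record
• What is a Resource Record (RR)? A mapping from a domain name to an IP address.
• Authoritative domain name tuples: who is looking up what, and where does it point?
$Q_j(d) = (T_j, R_j, d, \mathit{IPs}_j)$, where $T_j$ is the time of the query, $R_j$ the IP address of the requester, $d$ the queried domain and $\mathit{IPs}_j$ the set of resolved IP addresses.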
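A minimal sketch of how such tuples might be stored and grouped per domain before any features are computed. The class `QueryTuple`, its field names and the helper `group_by_domain` are hypothetical, and $T_j$ is assumed here to be an integer epoch (for example, a day index).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class QueryTuple:
    """One observed lookup Q_j(d) = (T_j, R_j, d, IPs_j)."""
    t: int               # T_j: epoch (assumed here to be a day index)
    requester: str       # R_j: IP address of the machine that sent the query
    domain: str          # d: the queried domain name
    ips: FrozenSet[str]  # IPs_j: IP addresses returned in the response


def group_by_domain(records: Iterable[QueryTuple]) -> Dict[str, List[QueryTuple]]:
    """Collect every tuple Q_j(d) observed for each domain d."""
    groups: Dict[str, List[QueryTuple]] = defaultdict(list)
    for q in records:
        groups[q.domain].append(q)
    return dict(groups)


records = [
    QueryTuple(1, "198.51.100.1", "example.com", frozenset({"203.0.113.5"})),
    QueryTuple(1, "198.51.100.2", "example.com", frozenset({"203.0.113.5"})),
    QueryTuple(2, "198.51.100.1", "example.org", frozenset({"203.0.113.9"})),
]
print({d: len(qs) for d, qs in group_by_domain(records).items()})
# → {'example.com': 2, 'example.org': 1}
```

Making the record frozen keeps it hashable, which is convenient later when identical tuples need to be de-duplicated.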
How do attackers leverage domains in their attacks?
• Benign websites: an attacker compromises a benign site (domain) to distribute malware or perform other nefarious activity (e.g., phishing, spam).
• Malicious sites: an attacker creates a malicious site (domain) to distribute malware or perform other nefarious activity (e.g., phishing, spam).
• Command & Control (C&C): what an attacker uses to execute their own code on the victim's computer through a payload.
Related Work
1. D. Wessels et al. Measurements and Laboratory Simulations of the Upper DNS Hierarchy.
• Examines the caching behavior of recursive DNS servers from the point of view of TLD and AuthNS servers.
2. Hao et al. An Internet Wide View into DNS Lookup Patterns.
• DNS lookup patterns measured from .com TLD servers.
• Drawback: does not discuss how the findings may be leveraged for detection purposes.
3. D. Dagon, C. Zou, and W. Lee. Modeling botnet propagation using time zones. In Proceedings of the 13th Network and Distributed System Security Symposium (NDSS), 2006.
4. S. Staniford, V. Paxson, and N. Weaver. How to own the Internet in your spare time. In Proceedings of the 11th USENIX Security Symposium, pages 149-167, Berkeley, CA, USA, 2002. USENIX Association.
5. N. Weaver, S. Staniford, and V. Paxson. Very fast containment of scanning worms. In Proceedings of the 13th USENIX Security Symposium, pages 29-44, 2004.
6. M. P. Collins, T. J. Shimeall, S. Faber, J. Janies, R. Weaver, M. De Shon, and J. Kadane. Using uncleanliness to predict future botnet addresses. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC '07), pages 93-104, New York, NY, USA, 2007. ACM.
7. M. Felegyhazi, C. Kreibich, and V. Paxson. On the potential of proactive domain blacklisting. In Third USENIX LEET Workshop, 2010.
Motivation of the paper
• IP-based blocking techniques cannot stay up to date because of the huge number of domains attackers use for C&C.
• DNSBL-based technologies cannot keep up with the volume of new domains botnets use.
• Malware families utilize domains to discover the up-to-date C&C addresses.
• There is a time gap between the release of a malware sample and its discovery.
Problem: malicious use of DNS.
Existing response: static domain blacklists containing known malware domains.
Drawback: limited effectiveness, since a huge number of new domains appear on the Internet every day, which makes the blacklists difficult to keep up to date.
Qualities of a Dynamic Detection System
• Global visibility into DNS request and response messages related to large DNS zones (benefit: early warning detection and response).
• DNS operators independently deploy the system and detect malware-related domains within their authority zones (benefit: practical, low-cost and time-efficient).
• Accurately detect malware-related domains even in the absence of reputation data for the IP address space pointed to by the domains (important, as IP reputation data is difficult to accumulate and fragile).
New Approach: Kopis
• DNS is a hierarchical, distributed database, so at certain points in the hierarchy we can gain (almost) global visibility: the TLD servers and the authoritative name servers (AuthNSs).
• Why not the root servers? Even though they could offer the same visibility, the caching effect there is too strong for the signal to be significant.
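To see why caching mutes the root-level signal, here is a toy simulation of a single recursive resolver with a TTL cache. The TTL values, the constant names and the function `count_upstream_queries` are assumptions chosen for illustration; the model ignores authoritative-level traffic, negative caching and the many resolvers of the real Internet.

```python
from typing import Dict, List, Tuple

# Assumed TTLs (seconds) for a single recursive resolver's cache. Real values
# vary per zone; the point is only that TLD delegations are cached far longer.
TLD_DELEGATION_TTL = 172_800   # assumed: referral from the root to a TLD
DOMAIN_DELEGATION_TTL = 3_600  # assumed: referral from a TLD to a domain


def count_upstream_queries(lookups: List[Tuple[int, str]]) -> Dict[str, int]:
    """Count queries that reach the root and TLD levels from one caching resolver.

    `lookups` is a time-ordered list of (timestamp_in_seconds, domain) pairs.
    """
    expiry: Dict[str, int] = {}  # cache key -> time at which the entry expires
    hits = {"root": 0, "tld": 0}
    for now, domain in lookups:
        if expiry.get(domain, -1) > now:
            continue  # domain delegation still cached: nothing goes upstream
        tld = domain.rsplit(".", 1)[-1]
        if expiry.get(tld, -1) <= now:
            hits["root"] += 1  # the TLD referral must be fetched from the root
            expiry[tld] = now + TLD_DELEGATION_TTL
        hits["tld"] += 1  # the TLD server is asked about the domain
        expiry[domain] = now + DOMAIN_DELEGATION_TTL
    return hits


# 1,000 lookups of 50 .com domains, one lookup per minute.
lookups = [(i * 60, f"d{i % 50}.com") for i in range(1000)]
print(count_upstream_queries(lookups))  # → {'root': 1, 'tld': 500}
```

Under these assumptions the root sees a single query for the whole period, while the TLD server still sees hundreds of per-domain queries, which is the kind of signal Kopis relies on.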
System's Goal: detect malware-related domain names as they emerge, WITHOUT the need for a malware sample.
Contributions of Kopis
• Passively monitors DNS traffic at the upper levels of the DNS hierarchy.
• Introduces an alternative, IP-reputation-agnostic classification signal for DNS.
• Identifies rising botnets weeks before the corresponding malware is found.
• Accurately detects malware domains by analysing global DNS query resolution patterns.
Prior Art: Notos and Exposure
Differences from Prior Art
NOTOS & EXPOSURE
• Can be applied at the recursive DNS servers of an ISP.
• Almost global visibility from the point of view of zones.
• Partial visibility from the point of view of requesters.
• Rely heavily on features based on IP reputation.
• Leverage RDNS-level DNS traffic monitoring.
KOPIS
• Global visibility of requesters, focused on a specific set of zones (e.g., .ru, .cn).
• Predictions only on domains in those zones.
• Accurately predicts malware-related domains without the need for IP reputation data.
• Extracts statistical features specifically chosen to harvest the "malware signal" as seen from the upper DNS hierarchy.
Results
System      True positives   False positives
Notos       96.80%           0.38%
Exposure    98%              7.90%
Kopis       98.40%           0.3%-0.5%
System Overview [figure: training mode and operation mode]
Statistical Feature Families
• Requester Diversity
• Requester Profile
• Resolved IPs Reputation
Requester Diversity
• Characterizes whether the machines that query a given domain name are localized or globally distributed.
• On average, malicious domain names have a different distribution from legitimate ones.
• For both features, benign domain names have a bimodal distribution.
• Malicious domain names are spread across the spectrum.
• The malware-related domain names cover a larger spectrum of diversities, possibly due to the success of the malware distribution mechanisms they employ.
[Figure: CDFs of (a) AS diversity and (b) CC diversity for malicious and benign domain names]
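A minimal sketch of the two diversity counts (AS diversity and CC diversity) for one domain, assuming the requester IPs have already been looked up in IP-to-AS and geolocation tables. The function `requester_diversity` and its dictionary inputs are hypothetical stand-ins for real databases, and the paper's feature family may contain further statistics beyond these two counts.

```python
from typing import Dict, Iterable


def requester_diversity(requesters: Iterable[str],
                        ip_to_asn: Dict[str, int],
                        ip_to_cc: Dict[str, str]) -> Dict[str, int]:
    """Count the distinct ASes and country codes among a domain's requesters.

    `ip_to_asn` and `ip_to_cc` are hypothetical stand-ins for real IP-to-AS
    and IP geolocation databases; requesters missing from a table are skipped.
    """
    unique = set(requesters)
    asns = {ip_to_asn[ip] for ip in unique if ip in ip_to_asn}
    ccs = {ip_to_cc[ip] for ip in unique if ip in ip_to_cc}
    return {"as_diversity": len(asns), "cc_diversity": len(ccs)}


# Documentation IPs and ASNs, used purely as sample data.
asn = {"198.51.100.1": 64500, "198.51.100.2": 64500, "203.0.113.7": 64501}
cc = {"198.51.100.1": "GR", "198.51.100.2": "GR", "203.0.113.7": "CA"}
print(requester_diversity(["198.51.100.1", "198.51.100.2", "203.0.113.7"], asn, cc))
# → {'as_diversity': 2, 'cc_diversity': 2}
```

A domain queried only from one network yields small counts (localized); a domain queried from many ASes and countries yields large counts (globally distributed).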
Requester Profile
• Determine whether the machines resolving the domain names are from networks that have historically been prone to infections.
• Not all querying machines have similar characteristics: we would like to distinguish between requesters located in ISP/small-business networks and those in home networks, and we model them differently.
• Assign a higher weight to servers with many clients, which makes it significantly harder to dilute the overall classification signal.
[Figure: (A) PDF of the number of requester IPs per CIDR; (B) PDF of the average weight; benign vs. malicious domains]
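The two quantities in the figure can be approximated as follows. This is a sketch under stated assumptions: `requesters_per_cidr` groups requesters by /24 (IPv4 assumed), and `weighted_requester_score` uses a log1p weight on each server's client count, which is an illustrative choice and not necessarily the weighting Kopis uses. Both function names and the client-count table are hypothetical.

```python
import ipaddress
import math
from collections import Counter
from typing import Dict, Iterable


def requesters_per_cidr(requesters: Iterable[str], prefix_len: int = 24) -> Counter:
    """Count distinct requester IPs in each /prefix_len network (IPv4 assumed)."""
    counts: Counter = Counter()
    for ip in set(requesters):
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


def weighted_requester_score(requesters: Iterable[str],
                             clients_per_server: Dict[str, int]) -> float:
    """Give each distinct requester a weight that grows with its client count.

    The log1p weighting is an illustrative assumption; requesters with an
    unknown client population are treated as serving a single client.
    """
    return sum(math.log1p(clients_per_server.get(ip, 1)) for ip in set(requesters))


reqs = ["198.51.100.1", "198.51.100.2", "203.0.113.5"]
print(requesters_per_cidr(reqs))
# An ISP resolver with 5,000 clients outweighs a single home requester.
print(weighted_requester_score(["198.51.100.1"], {"198.51.100.1": 5000})
      > weighted_requester_score(["203.0.113.5"], {}))
```

The logarithm keeps a single very large resolver from dominating, while still making it harder for an attacker to drown the signal with many low-weight home requesters.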
Resolved IPs Reputation
• Describes whether, and to what extent, the IP address space pointed to by a given domain has been historically linked with known malicious activities or known legitimate services (Malware Evidence, SBL Evidence, Whitelist Evidence).
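A minimal sketch of the three evidence types, assuming each one is summarized as the fraction of a domain's resolved IPs that fall inside known networks. The function `reputation_evidence` and its network-list inputs are hypothetical stand-ins for historical malware data, Spamhaus SBL listings and a whitelist of known services; the fraction-based encoding is an assumption.

```python
import ipaddress
from typing import Dict, Iterable


def reputation_evidence(resolved_ips: Iterable[str],
                        malware_nets: Iterable[str],
                        sbl_nets: Iterable[str],
                        whitelist_nets: Iterable[str]) -> Dict[str, float]:
    """Fraction of a domain's resolved IPs that fall inside each evidence set.

    The three network lists are hypothetical inputs standing in for historical
    malware data, Spamhaus SBL listings and a whitelist of known services.
    """
    ips = [ipaddress.ip_address(ip) for ip in set(resolved_ips)]
    evidence = {
        "malware": [ipaddress.ip_network(n) for n in malware_nets],
        "sbl": [ipaddress.ip_network(n) for n in sbl_nets],
        "whitelist": [ipaddress.ip_network(n) for n in whitelist_nets],
    }
    if not ips:
        return {name: 0.0 for name in evidence}
    return {
        name: sum(any(ip in net for net in nets) for ip in ips) / len(ips)
        for name, nets in evidence.items()
    }


print(reputation_evidence(["198.51.100.7", "203.0.113.9"],
                          malware_nets=["198.51.100.0/24"],
                          sbl_nets=[],
                          whitelist_nets=["203.0.113.0/24"]))
# → {'malware': 0.5, 'sbl': 0.0, 'whitelist': 0.5}
```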
Evaluation: Key Observations from the Datasets
Datasets: traffic from
1. two major domain registrars (8 months)
2. the .ca TLD (~2 months)
Observations:
• The data reduction process (tuples) does not affect the detection ability of Kopis.
• Not all domain names are equally interesting, so the analysis focuses on the most interesting ones, ranked by lookup requester diversity (the 100,000 most diverse domains).
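Selecting the most diverse domains is a top-k query. The sketch below assumes diversity is measured as the number of distinct requester IPs (the ranking could equally use AS or CC diversity); the function `most_diverse_domains` is hypothetical, while `heapq.nlargest` is the standard-library call that avoids sorting the full domain list.

```python
import heapq
from typing import Dict, List, Set


def most_diverse_domains(requesters_by_domain: Dict[str, Set[str]],
                         k: int = 100_000) -> List[str]:
    """Return up to k domains with the most distinct requesters, most diverse first."""
    return heapq.nlargest(k, requesters_by_domain,
                          key=lambda d: len(requesters_by_domain[d]))


sample = {
    "a.example": {"198.51.100.1", "198.51.100.2"},
    "b.example": {"198.51.100.1"},
    "c.example": {"198.51.100.1", "198.51.100.2", "203.0.113.3"},
}
print(most_diverse_domains(sample, k=2))  # → ['c.example', 'a.example']
```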
Model Selection: Random Forest classifier (RF)
• 2-5 day training window
• Results (5-day observation window): TP-rate = 98.4%, FP-rate = 0.3%
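The classifier itself would typically come from a library such as scikit-learn (`RandomForestClassifier`) and is not reproduced here. The sketch below only shows how the reported rates are defined, assuming label 1 = malicious and 0 = benign; the function `tp_fp_rates` is hypothetical. For example, a TP-rate of 98.4% means 984 of every 1,000 malicious domains are flagged.

```python
from typing import Sequence, Tuple


def tp_fp_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """TP-rate = TP / (TP + FN); FP-rate = FP / (FP + TN).

    Label 1 means malicious, 0 means benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


print(tp_fp_rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))  # → (0.75, 0.25)
```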
Evaluation: Overall Detection Performance
Evaluation: New and Previously Unclassified Domains
[Figure: TP-rates for different observation periods using an 80/20 train/test dataset split]
[Figure: FP-rates for different observation periods using an 80/20 train/test dataset split]
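An 80/20 split can be sketched as a seeded shuffle followed by a cut. This assumes a plain random split of labeled domains; the evaluation may have used a different (for example stratified or per-period) procedure, and the function `split_80_20` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed and split it 80% / 20%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


train, test = split_80_20([f"domain{i}.example" for i in range(100)])
print(len(train), len(test))  # → 80 20
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible across runs without touching the global random state; repeating the split per observation period gives one TP/FP point per period.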
Deltas
• Using domain names from the testing dataset in 80/20 mode.
• The distribution of the malware samples, as they arrived in our malware feed, follows the same distribution as the botnet.
Canadian TLD
IMDDOS-Start
IMDDOS Peak
Botnet's Growth
IMDDOS-Early Detection
• The average daily lookup volume was 438,471, with an average of about 3,883 de-duplicated query tuples per day.
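The two averages imply that only about 0.89% of the raw lookups (roughly 1 in 113) survive de-duplication. A minimal sketch, assuming two lookups are duplicates when epoch, requester, domain and resolved IPs all match; the alias `Lookup` and the helpers `deduplicate` and `reduction_ratio` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# One lookup as (epoch, requester IP, domain, resolved IPs), mirroring
# Q_j(d) = (T_j, R_j, d, IPs_j) with T_j coarsened to an epoch.
Lookup = Tuple[int, str, str, FrozenSet[str]]


def deduplicate(lookups: Iterable[Lookup]) -> Set[Lookup]:
    """Keep a single copy of lookups whose four fields are all identical."""
    return set(lookups)


def reduction_ratio(raw_count: int, dedup_count: int) -> float:
    """Fraction of the raw lookup volume that survives de-duplication."""
    return dedup_count / raw_count


day = [
    (1, "198.51.100.1", "example.com", frozenset({"203.0.113.5"})),
    (1, "198.51.100.1", "example.com", frozenset({"203.0.113.5"})),  # repeat
    (1, "198.51.100.2", "example.com", frozenset({"203.0.113.5"})),
]
print(len(deduplicate(day)))                       # → 2
print(f"{reduction_ratio(438_471, 3_883):.2%}")    # → 0.89%
```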
Recommendations
More recommendations