Procopiou Anna
931769 Professor: Dr. Athanasopoulos Elias
Articles
1. Detecting Malware Domains at the Upper DNS Hierarchy - M. Antonakakis et al.
2. ZMap: Fast Internet-wide Scanning and Its Security Applications - Z. Durumeric et al.
Agenda
Detecting Malware Domains at the Upper DNS Hierarchy
Key Ideas
- Miscreants use DNS to build malicious network infrastructures for malware C&C
- Kopis: a detection system for malware-related domain names
Domain Name System (DNS)
BACKGROUND NEEDED
Domain Name System: types of DNS queries (recursive, iterative)
DNS Query
Recursive query
DNS Query
Iterative query (example name: www.google.com.)
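The difference between the two query types is easier to see in a toy model. The sketch below simulates an iterative lookup: the resolver itself walks from the root to a TLD server to an authoritative server, following referrals. The server names, the zone table and the address 192.0.2.10 are made up for illustration and do not come from the slides; no real DNS traffic is sent.

```python
# Toy model of iterative resolution: each "server" either refers the resolver
# to the next server down the hierarchy or returns the final answer.
# All server names and the IP address below are hypothetical.
HIERARCHY = {
    "root": {"referral": {"com.": "tld-com"}},
    "tld-com": {"referral": {"google.com.": "auth-google"}},
    "auth-google": {"answer": {"www.google.com.": "192.0.2.10"}},
}


def iterative_resolve(name, start="root"):
    """Follow referrals from the root until an authoritative answer is found.

    Returns (ip, path), where path lists the servers contacted in order.
    """
    server, path = start, []
    while True:
        path.append(server)
        zone = HIERARCHY[server]
        if name in zone.get("answer", {}):
            return zone["answer"][name], path
        # Pick the referral whose zone suffix matches the queried name.
        for suffix, next_server in zone.get("referral", {}).items():
            if name.endswith(suffix):
                server = next_server
                break
        else:
            raise LookupError(f"{server} has no referral for {name}")


ip, path = iterative_resolve("www.google.com.")
print(path, ip)
```

In a recursive query the client would instead hand the whole job to one resolver and receive only the final answer. The TLD and authoritative hops in `path` are the vantage points discussed later in the talk.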
Resource Record (RR)?
Mapping from a domain name to IP
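Authoritative domain name tuples: $Q_j(d) = (T_j, R_j, d, IPs_j)$, where $T_j$ is the epoch in which the query was observed, $R_j$ is the IP address of the requester, $d$ is the queried domain, and $IPs_j$ is the set of resolved IP addresses
Who is looking up what, and where is it pointing?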
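As a rough illustration, the query tuple can be held in a small record type. The field names, the `summarize` helper and the sample data are hypothetical. The sketch assumes T_j is the observation epoch, R_j the requester's IP address, d the queried domain and IPs_j the set of resolved addresses.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QueryTuple:
    """One observation Q_j(d) = (T_j, R_j, d, IPs_j); field names are illustrative."""
    epoch: int                    # T_j: epoch in which the query was observed
    requester: str                # R_j: IP address of the requester
    domain: str                   # d: the queried domain name
    resolved_ips: FrozenSet[str]  # IPs_j: addresses the domain resolved to


def summarize(tuples, domain):
    """Answer "who is looking up what, and where is it pointing?" for one domain."""
    hits = [q for q in tuples if q.domain == domain]
    requesters = {q.requester for q in hits}
    resolved = set().union(*(q.resolved_ips for q in hits)) if hits else set()
    return requesters, resolved


# Made-up observations using documentation address ranges.
observations = [
    QueryTuple(1, "198.51.100.7", "example.com", frozenset({"192.0.2.1"})),
    QueryTuple(1, "203.0.113.9", "example.com", frozenset({"192.0.2.1", "192.0.2.2"})),
    QueryTuple(2, "198.51.100.7", "example.net", frozenset({"192.0.2.50"})),
]
who, where = summarize(observations, "example.com")
print(sorted(who), sorted(where))
```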
Benign websites: An attacker compromises a benign site (domain) to distribute malware or perform other nefarious activity (e.g., phishing, spam).
Malicious site: An attacker creates a malicious site (domain) to distribute malware or perform other nefarious activity (e.g., phishing, spam).
Command & Control (C&C): The channel an attacker uses to send commands to compromised machines and execute their own code on them.
Related Work
1. Measurements and Laboratory Simulations of the Upper DNS Hierarchy, D. Wessels et al.
2. An Internet Wide View into DNS Lookup Patterns, Hao et al.
3. ... Network and Distributed System Security Symposium (NDSS), 2006
4. ... USENIX Security Symposium, pages 149–167, Berkeley, CA, USA, 2002. USENIX Association
5. ... USENIX Security Symposium, pages 29–44, 2004
6. ... predict future botnet addresses. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC ’07, pages 93–104, New York, NY, USA, 2007. ACM
7. ... Workshop, 2010
- IP-based blocking techniques cannot stay up to date because of the huge number of domains attackers use for C&C
- DNSBL-based technologies cannot keep up with the volume of new domains botnets use
- Malware families use domains to discover the up-to-date C&C addresses
- There is a time gap between the release of a piece of malware and its discovery
Problem: malicious use of DNS
Existing solution: static domain blacklists containing known malware domains
Drawback: limited effectiveness; a huge number of new domains appear on the Internet every day, so blacklists are difficult to keep up to date
Qualities of a Dynamic Detection System
- Global visibility into DNS request and response messages related to large DNS zones
- DNS operators can independently deploy the system and detect malware-related domains within their authority zones
- Accurately detect malware-related domains even in the absence of reputation data for the IP address space pointed to by the domains
Early warning
Important, as IP reputation data is difficult to accumulate and fragile
Practical, low-cost and time-efficient detection & response
New Approach: Kopis
DNS is a hierarchical distributed database, so at some point in the hierarchy we can have global visibility: TLDs & AuthNSs
Why not roots? Even if we could have the same visibility there, the caching effect is too high for that signal to be significant
Detect malware-related domain names as they rise, WITHOUT the need for a malware sample
- Passively monitors DNS traffic at the upper levels of the DNS hierarchy
- Introduces an alternative, IP-reputation-agnostic classification signal for DNS
- Identifies rising botnets weeks before the corresponding malware is found
- Accurately detects malware domains by analysing global DNS query resolution patterns
Notos Exposure
NOTOS & EXPOSURE KOPIS
.ru, .cn)
“malware signal” as seen from the upper DNS hierarchy
Results
Training Mode, Operation Mode
Requester Diversity, Requester Profile, Resolved IPs Reputation
Requester Diversity
Figure: CDFs for malicious and benign domains. (a) AS Diversity. (b) CC Diversity.
Characterizes whether the machines that query a given domain name are localized or globally distributed.
Malicious domain names have a different average distribution from legitimate ones:
- For both features, the benign domain names have a bimodal distribution
- Malicious domain names are spread across the spectrum
Malware-related domain names cover a larger spectrum of diversities, possibly due to the success of the malware distribution mechanisms they employ.
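A much-simplified sketch of the idea behind requester diversity: map each requester IP to its autonomous system (AS) and country code (CC), then count how many distinct values appear. This is not the exact feature definition from the paper, only an illustration of "localized vs. globally distributed". The lookup table `IP_INFO`, the function name and all addresses and AS numbers are hypothetical.

```python
# Hypothetical requester-IP -> (AS number, country code) table; a real system
# would derive this mapping from BGP and geolocation data.
IP_INFO = {
    "198.51.100.1": (64500, "US"),
    "198.51.100.2": (64500, "US"),
    "203.0.113.5": (64501, "DE"),
    "203.0.113.9": (64502, "CN"),
}


def requester_diversity(requester_ips):
    """Return (number of distinct ASes, number of distinct country codes)."""
    known = [IP_INFO[ip] for ip in set(requester_ips) if ip in IP_INFO]
    ases = {asn for asn, _ in known}
    countries = {cc for _, cc in known}
    return len(ases), len(countries)


# A domain queried from one network vs. one queried from several countries.
localized = requester_diversity(["198.51.100.1", "198.51.100.2"])
spread = requester_diversity(["198.51.100.1", "203.0.113.5", "203.0.113.9"])
print(localized, spread)
```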
Requester Profile
Determine if machines resolving the domain names are from networks that historically have been prone to infections or not
many clients
similar characteristics
between requesters located in ISP/small business and home networks
... make it significantly harder to dilute the overall classification signal
Figure: PDFs for benign and malicious domains. (A) Number of requester IPs per CIDR. (B) Average weight.
Resolved IPs Reputation
Describes whether, and to what extent, the IP address space pointed to by a given domain has been historically linked with known malicious activities
(Malware Evidence, SBL Evidence, Whitelist Evidence)
Evaluation: Key Observations from Datasets
Datasets: traffic from
1. Two major domain registrars (8 months)
2. The .ca TLD (~2 months)
Observations:
- The data reduction process (tuples) does not affect the detection ability
- Not all domain names are equally interesting. Analysis time is spent on the most interesting ones, selected by lookup requester diversity (the 100,000 most diverse domains).
Random Forest Classifier (RF)
TP-rate = 98.4%, FP-rate = 0.3%
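For reference, the two rates quoted above are computed from a classifier's predictions as follows. This is a minimal sketch with made-up labels (1 = malware-related, 0 = benign). The function name and the sample data are illustrative and unrelated to the paper's dataset.

```python
def rates(y_true, y_pred):
    """Return (TP-rate, FP-rate) for binary labels: 1 = malware-related, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    return tp / (tp + fn), fp / (fp + tn)


# Toy example: 4 malicious and 6 benign domains.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tp_rate, fp_rate = rates(y_true, y_pred)
print(f"TP-rate = {tp_rate:.1%}, FP-rate = {fp_rate:.1%}")
```

The TP-rate is the fraction of malware-related domains that were flagged. The FP-rate is the fraction of benign domains that were wrongly flagged.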
Evaluation: Overall Detection Performance
Evaluation: New and Previously Unclassified Domains
Figures: TP-rates and FP-rates for different observation periods using an 80/20 train/test dataset split
The distribution of the malware samples as they arrived in our malware feed follows the same distribution as the botnet. Using domain names from the testing dataset in 80/20 mode.
IMDDOS - Start
IMDDOS - Peak
Botnet’s Growth
IMDDOS - Early Detection
every day was 438,471 with the average deduplicated query tuples in the range of 3,883.
Kopis is the first system that:
domains (even without information of the associated malware)
Kopis models three key signals at DNS authorities
Need for additional classification signals
Kopis uses more than half a year of real-world data
Kopis was used to identify the creation of a DDoS botnet in China
Kopis contributions:
Kopis achieves in almost all evaluation modes:
Kopis detects newly created and previously unclassified malware-related domain names.
This ability to identify malware-related domains on the rise can give DNS operators the preemptive ability to remove rapidly growing botnets at a very early stage, before the responsible malware is found.
Malware is out there for up to a couple of months before the security community finds a related malicious domain!
Agenda
ARTICLE 2: ZMap: Fast Internet-wide Scanning and Its Security Applications
- Internet-wide network scanning has numerous security applications
- ZMap: a modular, open-source network scanner
- Practices for good Internet citizenship when performing Internet-wide surveys
Can reveal new kinds
Monitor deployment
Shed light on previously opaque distributed ecosystems
Mining Ps and Qs: Widespread weak keys in network devices (2012) - 25 hours across 25 Amazon EC2 instances (625 CPU-hours)
EFF SSL (TLS) Observatory (2010) - 3 months on 3 Linux desktop machines (6500 CPU-hours)
Census and Survey of the Visible Internet (2008) - 3 months to complete ICMP census (2200 CPU-hours)
surveys
single mid-range machine in less than 45 minutes with 98% coverage
$ zmap -p 443 -o results.txt
34,132,693 listening hosts (time: 44m12s)
Over 97% of the theoretical line speed of gigabit Ethernet
DOWNLOAD IT: https://zmap.io/
Build a scanner designed especially for Internet-scale scanning.
For scans of 1% to 100% of the Internet rather than just a single /24 (CIDR/slash notation).
Optimized probing
Assumes well-provisioned uplinks. Assumes that the targets are randomly ordered and widely dispersed. Attempts to send probes as quickly as the source’s NIC can support.
No per-connection state
Does not maintain state for each connection to track which hosts have been scanned.
No retransmission
Does not retransmit probes that are lost due to packet loss. Sends a fixed number of probes per target.
Addressing probes
Example with p = 7 and primitive root 5:
4 · 5 mod 7 = 6
6 · 5 mod 7 = 2
2 · 5 mod 7 = 3
3 · 5 mod 7 = 1
1 · 5 mod 7 = 5
5 · 5 mod 7 = 4
Visiting order: 4, 6, 2, 3, 1, 5
Scan hosts according to a random permutation.
Iterate over the multiplicative group of integers modulo a prime p.
Instead of keeping track of every IP address, we keep 3 integers:
1. Primitive root
2. Current location
3. First address
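The iteration can be sketched in a few lines using the small example values p = 7 and primitive root 5. The function name `scan_order` is illustrative. A real scan of IPv4 would need a prime just above 2^32 and would skip group elements that do not correspond to addresses; this sketch leaves that out.

```python
def scan_order(p, root, first):
    """Yield every element of the multiplicative group mod prime p exactly once.

    Only three integers are kept: the primitive root, the current
    location, and the first address (used to detect the end of the cycle).
    """
    current = first
    while True:
        yield current
        current = (current * root) % p  # step to the next element
        if current == first:            # back at the start: cycle complete
            return


print(list(scan_order(7, 5, 4)))  # prints [4, 6, 2, 3, 1, 5]
```

Because 5 is a primitive root modulo 7, repeated multiplication visits all of 1..6 before returning to the starting point, so no per-address bookkeeping is needed.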
Checking Response Integrity - Validating Responses
A way to validate responses without keeping local per-target state: encode secrets into mutable fields of the probe packets (in ZMap, the source port and initial sequence number) that will have a recognizable effect on the responses. The technique is similar to SYN cookies.
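One way to picture stateless validation is the sketch below. It derives the probe's source port and initial sequence number from a keyed MAC of the destination address, then rechecks them when a reply arrives. The choice of HMAC-SHA256, the port range, the use of these two fields and all names (`SCAN_KEY`, `probe_fields`, `is_valid_response`) are assumptions made for illustration, not ZMap's actual code.

```python
import hashlib
import hmac
import ipaddress
import os

SCAN_KEY = os.urandom(16)  # per-scan secret (hypothetical name)


def probe_fields(dst_ip, key=SCAN_KEY):
    """Derive (source port, initial sequence number) from a MAC of dst_ip."""
    packed = ipaddress.IPv4Address(dst_ip).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    src_port = 32768 + int.from_bytes(digest[:2], "big") % 28232  # 32768..60999
    seq = int.from_bytes(digest[2:6], "big")                      # 32-bit value
    return src_port, seq


def is_valid_response(src_ip, dst_port, ack, key=SCAN_KEY):
    """Check a SYN-ACK with no stored state: recompute what we would have sent."""
    expected_port, expected_seq = probe_fields(src_ip, key)
    return dst_port == expected_port and ack == (expected_seq + 1) % 2**32


port, seq = probe_fields("192.0.2.1")
print(is_valid_response("192.0.2.1", port, (seq + 1) % 2**32))
```

A genuine reply echoes the probe's source port as its destination port and acknowledges sequence number + 1, so both can be checked from the responder's address and the key alone.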
Packet Transmission and Receipt
Validation and Measurement - Scan Rate: How Fast is Too Fast?
Validation and Measurement - Coverage: Is One SYN Enough?
Validation and Measurement - Variation by Time of Day: Diurnal Effect on Hosts Found
Comparison with Previous Studies
Related Work
Gordon Fyodor Lyon. Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning. Insecure, USA, 2009.
Drawback: Too slow.
Internet-wide scanner. In 10th ACM SIGCOMM conference on Internet measurement (IMC), pages 109–122, 2010.
Drawback: Does not process responses. They need custom network drivers.
Comparison with Nmap
ZMap :
Nmap default configuration with equivalent accuracy
Averages for scanning 1 million random hosts
Probe Response Times
Applications and Security Implications: Visibility into Distributed Systems - Top 10 Certificate Authorities
Applications and Security Implications: Tracking Protocol Adoption - HTTPS Adoption
Applications and Security Implications: Enumerating Vulnerable Hosts - UPnP Vulnerabilities
2013.
Applications and Security Implications: Enumerating Vulnerable Hosts - Weak Public Keys
Applications and Security Implications: Discovering Unadvertised Services - T
and 443
this type of attack
Applications and Security Implications: Discovering Unadvertised Services
Further Potential Applications
- Detect service disruptions
- Track adoption of defenses
- Study criminal behavior
Other Security Implications
- Anonymous communication
- Track users between IP leases
Scanning and Good Internet Citizenship: Recommended Practices
User Responses
Approximately 200 Internet-wide scans over the past year
(3,743,899 total addresses)
Future Work
- Scanning IPv6
- 10gigE Network Surveys
- TLS Server Name Indication
- Scanning Exclusion Standards
Conclusions
surveys
consider when defending systems
- IPv4 can be quickly and exhaustively scanned
- IPv6 has not yet been widely deployed
- It is now possible to scan the entire IPv4 address space from 1 host in under 45 minutes with 98% coverage
endeavor to a ROUTINE methodology for future security research
Domain Name System (DNS)
BACKGROUND NEEDED
Full example
DNS ATTACKS
WordPress hacks that exploited a vulnerability in a well-known plugin. This exploit affected thousands of sites, including the Mailgun service. Attackers used their access to embed JS code on the sites. That code initiated calls to a number of different domains, which in turn triggered different actions (including stealing credit card information) depending on the request. The embedded JS payload initiates a DNS request.
DNS ATTACKS
laterally through the network. In the process, the attacker sprinkles droppers across the network designed to phone home to their C&C. The phone home initiates a DNS request.
User’s click: clicking the link initiates a DNS request.
Command and Control
browser plugins or other infected software)
What Can Hackers Accomplish Through Command and Control?
Malicious software
Definition: Any software that the user did not authorize to be loaded, or that collects data about the user without permission.
Types: malware, spyware, viruses, worms, logic bombs, trapdoors, trojans, RATs, mobile malicious code, malicious fonts, rootkits
Definition: a number of Internet-connected devices, each of which is running
Used to perform distributed denial-of-service (DDoS) attacks, steal data, and send spam; it also allows the attacker to access the device and its connection. The owner can control the botnet using command and control (C&C) software.
Divides monitored data streams into epochs: $\{E_i\}_{i=1..m}$
Feature computation module: a function $F(d, E_i) = v_d^i$ that maps the DNS traffic in epoch $E_i$ related to domain name $d$ into a feature vector $v_d^i$.
Set of known malware-related and known legitimate domain names and related resolved IPs
Authoritative or point of delegation
Learning Module
- Internet surveys require a huge effort
- Scan the HTTPS ecosystem every day, not 1-2 times per year
- New whole-Internet scanner
mask of 255.255.255.0 in slash notation:
In this example, the binary representation of 255.255.255.0 is: 11111111.11111111.11111111.00000000.
In this example, there are twenty-four (24).
number from Step 2. The result is 192.168.42.23/24.
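The same conversion can be checked with Python's standard `ipaddress` module. The sketch below reproduces the three steps for the example values: the binary form of the mask, the count of one bits, and the address in slash notation. The variable names are illustrative.

```python
import ipaddress

mask = ipaddress.IPv4Address("255.255.255.0")

# Step 1: binary representation of the mask, octet by octet.
binary = ".".join(f"{octet:08b}" for octet in mask.packed)

# Step 2: count the one bits to get the prefix length.
prefix_len = bin(int(mask)).count("1")

# Step 3: combine address and prefix length into slash notation.
iface = ipaddress.ip_interface("192.168.42.23/255.255.255.0")

print(binary)                # 11111111.11111111.11111111.00000000
print(prefix_len)            # 24
print(iface.with_prefixlen)  # 192.168.42.23/24
```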
Evaluation: Coverage
Scanning the IPv4 address space in under an hour gives the ability to:
- gain visibility into previously opaque distributed systems
- understand protocol adoption
- uncover security phenomena
High-speed scanning also has potentially malicious applications:
Ethics of Active Scanning
Considerations
Reducing Scan Impact
Privacy and Anonymous Communication
possibility of tracking user devices between IP addresses
IP addresses based on the HTTPS certificate
address instead of sending probes
Scanning and Good Internet Citizenship
hosts and networks worldwide
systems
ZMap | Existing Network Scanners
Eliminate local per-connection state | Reduce state by scanning in batches
Shotgun scanning approach | Track individual hosts and retransmit
Scan widely dispersed targets | Avoid flooding through timing
Probe-optimized network stack (... Ethernet frames) | Utilize existing OS network stack
ZMap Architecture