Procopiou Anna 931769 Professor: Dr. Athanasopoulos Elias 1 - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Procopiou Anna

931769 Professor: Dr. Athanasopoulos Elias


slide-2
SLIDE 2

Articles

  • 1. Detecting Malware Domains at the Upper DNS Hierarchy (M. Antonakakis et al.)
  • 2. ZMap: Fast Internet-wide Scanning and Its Security Applications (Z. Durumeric et al.)
slide-3
SLIDE 3

Agenda

  • 1. ARTICLE 1: Detecting Malware Domains at the Upper DNS Hierarchy

  • Main idea of the article
  • Background and Related work
  • Authors’ proposal
  • Differences from prior art
  • Statistical features
  • Evaluation
  • Conclusions
slide-4
SLIDE 4

Detecting Malware Domains at the Upper DNS Hierarchy: Key Ideas

  • Miscreants use DNS to build malicious network infrastructures for malware C&C
  • Kopis: a detection system for malware-related domain names

slide-5
SLIDE 5

Domain Name System (DNS)


BACKGROUND NEEDED

slide-6
SLIDE 6

Domain name system (DNS) hierarchy


slide-7
SLIDE 7

Domain Name System: types of DNS queries (recursive, iterative)

slide-8
SLIDE 8

DNS Query

Recursive query


slide-9
SLIDE 9

DNS Query

Iterative query www.google.com.


slide-10
SLIDE 10

Resource Record

What is a Resource Record (RR)?

A mapping from a domain name to an IP address.

Authoritative domain name tuples: $Q_j(d) = (T_j, R_j, d, IPs_j)$

Who is looking up what, and where does it point?
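As a rough illustration, the tuple above can be modelled as a small record type. The field meanings (epoch, requester, domain, resolved IPs) are inferred from the slide's gloss of who is looking up what and where it points; the class and helper names are hypothetical and not taken from Kopis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class QueryTuple:
    """One authoritative-level observation Q_j(d) = (T_j, R_j, d, IPs_j)."""
    epoch: int                    # T_j: epoch in which the lookup was seen (assumed)
    requester: str                # R_j: IP of the machine that asked (assumed)
    domain: str                   # d: the domain name being resolved
    resolved_ips: FrozenSet[str]  # IPs_j: addresses the domain pointed to


def deduplicate(tuples: Iterable[QueryTuple]) -> Set[QueryTuple]:
    """Collapse repeated lookups; frozen dataclasses are hashable."""
    return set(tuples)


if __name__ == "__main__":
    q = QueryTuple(1, "198.51.100.7", "example.com", frozenset({"192.0.2.10"}))
    print(len(deduplicate([q, q])))
```

Making the record immutable lets identical observations collapse in a set, which is one simple way to picture the deduplicated query tuples mentioned later in the deck.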

slide-11
SLIDE 11

How do attackers leverage domains in their attacks?

  • Benign websites: an attacker compromises a benign site (domain) to distribute malware or perform other nefarious activity (e.g., phishing, spam).
  • Malicious sites: an attacker creates a malicious site (domain) to distribute malware or perform other nefarious activity (e.g., phishing, spam).
  • Command & Control (C&C): what an attacker uses to execute their own code on the victim’s computer using a payload.
slide-12
SLIDE 12

Related Work

1. Measurements and Laboratory Simulations of the Upper DNS Hierarchy, D. Wessels et al.

  • Examines the DNS caching behavior of recursive DNS servers from the point of view of TLD and AuthNS servers

2. An Internet Wide View into DNS Lookup Patterns, Hao et al.

  • DNS lookup patterns measured from .com TLD servers
  • DRAWBACK: does not discuss how the findings may be leveraged for detection purposes

3. D. Dagon, C. Zou, and W. Lee. Modeling botnet propagation using time zones. In Proceedings of the 13th Network and Distributed System Security Symposium (NDSS), 2006.

4. S. Staniford, V. Paxson, and N. Weaver. How to own the Internet in your spare time. In Proceedings of the 11th USENIX Security Symposium, pages 149–167, Berkeley, CA, USA, 2002. USENIX Association.

5. N. Weaver, S. Staniford, and V. Paxson. Very fast containment of scanning worms. In Proceedings of the 13th USENIX Security Symposium, pages 29–44, 2004.

6. M. P. Collins, T. J. Shimeall, S. Faber, J. Janies, R. Weaver, M. De Shon, and J. Kadane. Using uncleanliness to predict future botnet addresses. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC ’07, pages 93–104, New York, NY, USA, 2007. ACM.

7. M. Felegyhazi, C. Kreibich, and V. Paxson. On the potential of proactive domain blacklisting. In Third USENIX LEET Workshop, 2010.

slide-13
SLIDE 13

Motivation of the paper

  • IP-based blocking techniques cannot stay up to date because of the very large number of domains attackers use for C&C
  • DNSBL-based technologies cannot keep up with the volume of new domains botnets use
  • Malware families use domains to discover the up-to-date C&C addresses
  • There is a time gap between the release of malware and its discovery

slide-14
SLIDE 14

Response

  • Problem: malicious use of DNS
  • Existing solution: static domain blacklists containing known malware domains
  • Drawback: limited effectiveness, because a very large number of new domains appear on the Internet every day, which makes the blacklists difficult to keep up to date

slide-15
SLIDE 15

Qualities of a Dynamic Detection System

  • Global visibility into DNS request and response messages related to large DNS zones
  • DNS operators can independently deploy the system and detect malware-related domains within their authority zones
  • Accurate detection of malware-related domains even in the absence of reputation data for the IP address space pointed to by the domains

slide-16
SLIDE 16

Qualities of a Dynamic Detection System

  • Global visibility into DNS request and response messages related to large DNS zones
  • DNS operators can independently deploy the system and detect malware-related domains within their authority zones
  • Accurate detection of malware-related domains even in the absence of reputation data for the IP address space pointed to by the domains

  • Early warning
  • Important, as IP reputation data is difficult to accumulate and fragile
  • Practical, low-cost and time-efficient detection and response

slide-17
SLIDE 17

New Approach: Kopis

  • DNS is a hierarchical distributed database, so at some points in the hierarchy (TLDs and AuthNSs) we can have global visibility
  • Why not the roots? Even if we could have the same visibility there, the caching effect is too high for the signal to be significant

slide-18
SLIDE 18

System’s Goal

Detect malware-related domain names as they rise, WITHOUT the need for a malware sample

slide-19
SLIDE 19

Contributions Of Kopis

  • Passively monitors DNS traffic at the upper levels of the DNS hierarchy
  • Introduces an alternative, IP-reputation-agnostic classification signal for DNS
  • Identifies rising botnets weeks before the corresponding malware is found
  • Accurately detects malware domains by analysing global DNS query resolution patterns

slide-20
SLIDE 20

Prior Art

  • Notos
  • Exposure

slide-21
SLIDE 21

Differences from Prior Art

NOTOS & EXPOSURE

  • Can be applied at the recursive (ISP) level
  • Almost global visibility from the point of view of zones (e.g. .ru, .cn)
  • Partial visibility from the point of view of requesters
  • Rely heavily on features based on IP reputation
  • Leverage RDNS-level DNS traffic monitoring

KOPIS

  • Global visibility of requesters, focused on a specific set of zones
  • Predictions on domains only in that zone
  • Predicts malware-related domains accurately without the need for IP reputation data
  • Extracts statistical features specifically chosen to harvest the “malware signal” as seen from the upper DNS hierarchy

slide-22
SLIDE 22

Results

SYSTEM      TRUE POSITIVES   FALSE POSITIVES
NOTOS       96.80%           0.38%
EXPOSURE    98%              7.90%
KOPIS       98.40%           0.3%-0.5%

slide-23
SLIDE 23

System Overview


Training Mode Operation Mode

slide-24
SLIDE 24

Statistical Feature Families

  • Requester Diversity
  • Requester Profile
  • Resolved IPs Reputation

slide-25
SLIDE 25

Requester Diversity

[Figure: CDFs for malicious and benign domain names. (a) AS Diversity; (b) CC Diversity.]

  • Characterizes whether the machines that query a given domain name are localized or globally distributed
  • Malicious domain names have a different average distribution from legitimate ones
      - For both features, the benign domain names have a bimodal distribution
      - Malicious domain names are spread across the spectrum
  • Malware-related domain names cover a larger spectrum of diversities, perhaps due to the success of the malware distribution mechanisms they employ
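A minimal sketch of the diversity idea, assuming each requester has already been mapped to an autonomous system number and a country code. It only counts distinct values, which is a simplification of whatever statistics the paper actually computes; the function name and input format are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def requester_diversity(requesters: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Count distinct ASes and country codes among a domain's requesters.

    Each item is an (asn, country_code) pair for one requester IP.
    """
    pairs = list(requesters)
    return {
        "as_diversity": len({asn for asn, _ in pairs}),
        "cc_diversity": len({cc for _, cc in pairs}),
    }


if __name__ == "__main__":
    localized = [(64500, "CY"), (64500, "CY"), (64500, "CY")]
    dispersed = [(64500, "CY"), (64501, "GR"), (64502, "US")]
    print(requester_diversity(localized))
    print(requester_diversity(dispersed))
```

A domain queried only from one network scores low on both counts, while one queried from many networks and countries scores high, which is the localized versus globally distributed contrast the slide describes.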

slide-26
SLIDE 26

Requester Profile

Determine whether the machines resolving the domain names are from networks that historically have been prone to infections.

  • Assign higher weight to servers with many clients
  • Not all querying machines have similar characteristics
  • We would like to distinguish between requesters located in ISP/small business and home networks
  • We model differently [...] make it significantly harder to dilute the overall classification signal

[Figure: PDFs for benign and malicious domain names. (A) Number of requester IPs per CIDR; (B) Average weight.]

slide-27
SLIDE 27

Resolved IPs Reputation

Describes whether, and to what extent, the IP address space pointed to by a given domain has been historically linked with known malicious activities or known legitimate services.

(Malware Evidence, SBL Evidence, Whitelist Evidence)

slide-28
SLIDE 28

Evaluation: Key Observations from Datasets

Datasets: traffic from

  • 1. Two major domain registrars (8 months)
  • 2. The .ca TLD (~2 months)

Observations:

  • The data reduction process (tuples) does not affect the detection ability of Kopis.
  • Not all domain names are equally interesting. Analysis time is spent on the most interesting ones, selected by lookup requester diversity (the 100,000 most diverse domains).

slide-29
SLIDE 29

Model selection

Random Forest (RF) classifier

  • 2–5 day training window
  • Results (5-day observation window): TP-rate = 98.4%, FP-rate = 0.3%
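For readers unfamiliar with the two metrics, here is a short sketch of how they are computed from confusion-matrix counts. The counts in the example are made up solely so that the output matches the percentages on the slide; they are not the paper's data.

```python
def tp_rate(tp: int, fn: int) -> float:
    """Fraction of malware-related domains that were flagged."""
    return tp / (tp + fn)


def fp_rate(fp: int, tn: int) -> float:
    """Fraction of legitimate domains that were wrongly flagged."""
    return fp / (fp + tn)


if __name__ == "__main__":
    # Made-up counts chosen only to reproduce the percentages on the slide.
    print(f"TP-rate = {tp_rate(984, 16):.1%}")
    print(f"FP-rate = {fp_rate(3, 997):.1%}")
```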

slide-30
SLIDE 30

Evaluation Overall Detection Performance


slide-31
SLIDE 31

Evaluation: New and Previously Unclassified Domains

  • TP-rates for different observation periods using an 80/20 train/test dataset split
  • FP-rates for different observation periods using an 80/20 train/test dataset split

slide-32
SLIDE 32

Deltas

  • The distribution of the malware samples as they arrived in our malware feed follows the same distribution as the botnet
  • Uses domain names from the testing dataset in 80/20 mode

slide-33
SLIDE 33

Canadian TLD


slide-34
SLIDE 34

IMDDOS-Start


slide-35
SLIDE 35

IMDDOS Peak


slide-36
SLIDE 36

Botnet’s Growth


slide-37
SLIDE 37

IMDDOS-Early Detection

  • The average daily lookup volume was 438,471, with the average number of deduplicated query tuples in the range of 3,883.

slide-38
SLIDE 38

Conclusions

Kopis is the first system that:

  • can operate at TLD servers and large authorities
  • provides DNS operators with the ability to detect malware-related domains early (even without information about the associated malware)

Kopis models three key signals at DNS authorities:

  • Daily domain name resolution patterns
  • Significance of each requester for an epoch
  • Domain name’s IP address reputation

Need for additional classification signals:

  • Threat landscape is changing and difficult to keep up with
  • Evasion is harder

Kopis uses more than half a year of real-world data:

  • Known benign domains
  • Malware-related domains from two major DNS authorities

Kopis was used to identify the creation of a DDoS botnet in China
slide-39
SLIDE 39

Conclusions (cont.)

Kopis achieves, in almost all evaluation modes:

  • High detection rates (TP)
  • Low false positive rates

Kopis detects newly created and previously unclassified malware-related domain names. This ability to identify malware-related domains on the rise can give DNS operators the preemptive ability to remove rapidly growing botnets at a very early stage, before the responsible malware is found.

Malware is out there for up to a couple of months before the security community finds the related malicious domains!

Kopis contributions:

  • Early warning
      - Several weeks before the domains were listed in blacklists
      - Before information about the associated malware appeared in security forums
  • We can measure and model key properties of malware domain names on the rise
  • Independently deployable by network operators
slide-40
SLIDE 40

Any Questions for the first paper?


slide-41
SLIDE 41

Agenda


1. ARTICLE 2: ZMap: Fast Internet-wide Scanning and Its Security Applications

  • Introduction- key idea
  • Previous research
  • Motivation
  • ZMap description
  • Comparison with prior art
  • Conclusions
slide-42
SLIDE 42

ZMap: Fast Internet-wide Scanning and Its Security Applications

  • Internet-wide network scanning has numerous security applications
  • ZMap: a modular, open-source network scanner
  • Practices for good Internet citizenship when performing Internet-wide surveys

slide-43
SLIDE 43

Internet-wide scanning

  • Can reveal new kinds of vulnerabilities
  • Monitor deployment of mitigations
  • Shed light on previously opaque distributed ecosystems

slide-44
SLIDE 44

Previous Research

Mining Ps and Qs: Widespread weak keys in network devices (2012)

25 hours across 25 Amazon EC2 Instances (625 CPU-hours)

EFF SSL (TLS) Observatory (2010)

3 months on 3 Linux desktop machines (6500 CPU-hours)

Census and Survey of the Visible Internet (2008)

3 months to complete ICMP census (2200 CPU-hours)


slide-45
SLIDE 45

ZMap: The Scanner

  • Fast single-packet network scanner optimized for Internet-wide network surveys
  • Open-source tool that can port scan the entire IPv4 address space from a single mid-range machine in less than 45 minutes with 98% coverage

$ zmap -p 443 -o results.txt
34,132,693 listening hosts (time: 44m12s)

  • Runs at over 97% of the theoretical speed of gigabit Ethernet

DOWNLOAD IT: https://zmap.io/
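The 45-minute figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes minimum-size Ethernet frames (64 bytes, plus 8 bytes of preamble and a 12-byte inter-frame gap) and roughly 3.7 billion target addresses once reserved ranges are excluded. The target count and the function names are assumptions for illustration; they do not come from the slides.

```python
# Rough estimate of a full IPv4 scan's duration at gigabit line rate.
# All constants below are assumptions for illustration, not ZMap internals.

LINK_BPS = 1_000_000_000   # gigabit Ethernet
FRAME_BYTES = 64           # minimum Ethernet frame (a padded TCP SYN)
OVERHEAD_BYTES = 8 + 12    # preamble/SFD + inter-frame gap
EFFICIENCY = 0.97          # "over 97%" of line speed, from the slide
TARGETS = 3_700_000_000    # assumed count after excluding reserved space


def max_packets_per_second(link_bps: int = LINK_BPS) -> float:
    """Packets per second if every frame is minimum-size."""
    bits_on_wire = (FRAME_BYTES + OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def scan_minutes(targets: int = TARGETS, efficiency: float = EFFICIENCY) -> float:
    """Minutes needed to send one probe to every target."""
    pps = max_packets_per_second() * efficiency
    return targets / pps / 60


if __name__ == "__main__":
    print(f"{max_packets_per_second():,.0f} packets/s at line rate")
    print(f"{scan_minutes():.1f} minutes for one probe per target")
```

Under these assumptions the estimate lands a little under the 45 minutes quoted above, so the claim is consistent with sending close to line rate for the whole scan.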

slide-46
SLIDE 46

Motivation

  • Build a scanner designed specifically for Internet-scale scanning
  • Intended for scans of 1% to 100% of the Internet rather than just a single /24 (CIDR slash notation)

slide-47
SLIDE 47

Architectural Choices

Optimized probing

  • Assumes well-provisioned uplinks.
  • Assumes that the targets are randomly ordered and widely dispersed.
  • Attempts to send probes as quickly as the source’s NIC can support.

No per-connection state

  • Does not maintain state for each connection to track which hosts have been scanned.

No retransmission

  • Does not retransmit probes that are lost due to packet loss.
  • Sends a fixed number of probes per target.

slide-48
SLIDE 48

Addressing probes

Example: iterating over the multiplicative group modulo 7 with generator 5

  4 · 5 mod 7 = 6
  6 · 5 mod 7 = 2
  2 · 5 mod 7 = 3
  3 · 5 mod 7 = 1
  1 · 5 mod 7 = 5
  5 · 5 mod 7 = 4

Visiting order: 4, 6, 2, 3, 1, 5, and back to 4.

  • Scan hosts according to a random permutation
  • Iterate over the multiplicative group of integers modulo a prime p
  • Instead of keeping state for every IP address, we keep 3 integers:
      1. Primitive root
      2. Current location
      3. First address
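The walk through the group can be sketched in a few lines. This is a toy version built on the slide's small example (p = 7, primitive root 5, first address 4). The function name is hypothetical, and a real Internet-wide scan would need a prime larger than 2^32 and would have to skip values that fall outside the address space.

```python
from typing import Iterator


def cyclic_permutation(p: int, primitive_root: int, first: int) -> Iterator[int]:
    """Visit every element of the multiplicative group mod p exactly once.

    Only three integers are kept: the primitive root, the current
    location, and the first address (used to detect the end of the cycle).
    """
    current = first
    while True:
        yield current
        current = (current * primitive_root) % p
        if current == first:
            return


if __name__ == "__main__":
    print(list(cyclic_permutation(7, 5, 4)))  # [4, 6, 2, 3, 1, 5]
```

Because 5 generates the whole group modulo 7, every value from 1 to 6 appears once before the walk returns to its starting point, so no per-address bookkeeping is needed to know the scan is complete.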

slide-49
SLIDE 49

Checking Response Integrity - Validating Responses

A way to validate responses without keeping local per-target state:

  • Encode secrets into mutable fields of probe packets (like the sender IP address and the current iteration of the scan) that will have a recognizable effect on responses
  • The technique is similar to SYN cookies
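A sketch of stateless validation in the spirit of SYN cookies. It assumes the secret-derived values are placed in the TCP source port and initial sequence number, and that a genuine reply acknowledges the sequence number plus one. The choice of fields, the key handling, the port range and the function names are all assumptions for illustration, not ZMap's actual code.

```python
import hashlib
import hmac
import ipaddress
from typing import Tuple

PORT_BASE = 32768   # assumed start of the source port range
PORT_SPAN = 28232   # assumed number of usable source ports


def probe_fields(key: bytes, target_ip: str) -> Tuple[int, int]:
    """Derive (source_port, sequence_number) for a probe from a keyed MAC."""
    packed = ipaddress.ip_address(target_ip).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    seq = int.from_bytes(digest[:4], "big")
    port = PORT_BASE + int.from_bytes(digest[4:6], "big") % PORT_SPAN
    return port, seq


def is_valid_response(key: bytes, responder_ip: str, dst_port: int, ack: int) -> bool:
    """Recompute the expected fields instead of looking up stored state."""
    port, seq = probe_fields(key, responder_ip)
    return dst_port == port and ack == (seq + 1) % 2**32
```

The receiver never stores which hosts were probed: it recomputes the expected port and acknowledgment number from the responder's address and the per-scan key, and drops anything that does not match.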

slide-50
SLIDE 50

Packet Transmission and Receipt

  • 1. Send Ethernet frames over a raw socket
  • 2. Process responses using libpcap
  • 3. Flexibility through probe and output modules
slide-51
SLIDE 51

Validation and Measurement Scan Rate: How Fast is Too Fast?


slide-52
SLIDE 52

Validation and Measurement Coverage: Is One SYN Enough?


slide-53
SLIDE 53

Validation and Measurement Variation by Time of Day: Diurnal Effect on Hosts Found


slide-54
SLIDE 54

Comparison with Previous Studies


slide-55
SLIDE 55

Related Work

  • Gordon “Fyodor” Lyon. Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning. Insecure, USA, 2009.

Drawback: Too slow.

  • D. Leonard and D. Loguinov. Demystifying service discovery: Implementing an Internet-wide scanner. In 10th ACM SIGCOMM Conference on Internet Measurement (IMC), pages 109–122, 2010.

Drawback: Does not process responses and needs custom network drivers.

slide-56
SLIDE 56

Comparison with Nmap

ZMap:

  • Can scan more than 1300 times faster than the most aggressive Nmap default configuration, with equivalent accuracy
  • Finds more results than Nmap

Averages for scanning 1 million random hosts

slide-57
SLIDE 57

Probe Response Times

slide-58
SLIDE 58

Applications and Security Implications Visibility into Distributed Systems -Top 10 Certificate Authorities


slide-59
SLIDE 59

Applications and Security Implications Tracking Protocol Adoption -HTTPS adoption


slide-60
SLIDE 60

Applications and Security Implications Enumerating Vulnerable Hosts-UPnP Vulnerabilities

  • HD Moore disclosed vulnerabilities in several common UPnP frameworks in January 2013.
  • Under 6 hours to code and run a UPnP discovery scan.
  • Custom probe module, 150 SLOC.
  • We found that 3.34 million of 15.7 million devices were vulnerable (16.5%).
  • Compromise possible with a single UDP packet.
slide-61
SLIDE 61

Applications and Security Implications Enumerating Vulnerable Hosts –Weak Public Keys


slide-62
SLIDE 62

Applications and Security Implications Discovering Unadvertised Services – Tor Bridges

  • Scanning has the potential to uncover unadvertised services
  • We perform a Tor handshake with public IPv4 addresses on ports 9001 and 443
  • We identified 86% of live allocated bridges with a single scan
  • Tor has developed Obfsproxy, which listens on random ports, to counter this type of attack

slide-63
SLIDE 63

Applications and Security Implications Discovering Unadvertised Services

Further Potential Applications

  • Detect service disruptions
  • Track adoption of defenses
  • Study criminal behavior

Other Security Implications

  • Anonymous communication
  • Track users between IP leases

slide-64
SLIDE 64

Scanning and Good Internet Citizenship Recommended Practises


slide-65
SLIDE 65

User Responses

Approximately 200 Internet-wide scans over the past year

  • Responses from 145 users
  • Blacklisted 91 entities (3,743,899 total addresses)
  • 15 actively hostile responses
  • 2 cases of retaliatory traffic

slide-66
SLIDE 66

Future Work

  • Scanning IPv6
  • 10gigE Network Surveys
  • TLS Server Name Indication
  • Scanning Exclusion Standards

slide-67
SLIDE 67

Conclusions

  • ZMap is a network scanner specifically architected for performing fast, comprehensive Internet-wide surveys
  • High-speed scanning has security applications, but it also provides new attack vectors that we must consider when defending systems
  • We are living in a unique period
      - IPv4 can be quickly, exhaustively scanned
      - IPv6 has not yet been widely deployed
  • ZMap lowers barriers to entry for Internet-wide surveys
      - It is now possible to scan the entire IPv4 address space from 1 host in under 45 minutes with 98% coverage
  • Explore potential security applications
  • Future goal: ZMap will elevate Internet-wide scanning from an EXPENSIVE and TIME-CONSUMING endeavor to a ROUTINE methodology for future security research

slide-68
SLIDE 68

Thank you! Any Questions?


slide-69
SLIDE 69

BACK-UP SLIDES


slide-70
SLIDE 70

Domain Name System (DNS)

BACKGROUND NEEDED


slide-71
SLIDE 71

DNS Query

Full example


slide-72
SLIDE 72

DNS ATTACKS

  • EXAMPLE 1: THE 2019 MAILGUN HACK

A series of WordPress hacks exploited a vulnerability in a well-known plugin. The exploit affected thousands of sites, including the Mailgun service. Attackers used their access to embed JS code on the sites. The code initiated calls to a number of different domains, which then triggered different actions (including stealing credit card information) depending on the request. The embedded JS payload initiates a DNS request.

slide-73
SLIDE 73

DNS ATTACKS

  • EXAMPLE 2: MANAGING MULTIPLE SERVERS

Your organization is responsible for many servers. An attacker bypasses your defenses and moves laterally through the network. In the process, the attacker sprinkles droppers across the network, designed to phone home to their C&C. The phone home initiates a DNS request.

  • EXAMPLE 3: MITIGATING USER BEHAVIOR (PHISHING)

A user clicks a link. Clicking the link initiates a DNS request.

slide-74
SLIDE 74

Command and Control

  • The attacker infects a computer (e.g. via a phishing email or through security holes in browser plugins or other infected software)
  • The infected machine sends a signal to the attacker’s server, looking for its next instruction
  • The infected machine carries out commands from the attacker’s C&C server and may install additional software
  • The attacker now has complete control of the victim’s computer and can execute any code
  • The malicious code will typically spread to more computers, creating a botnet

slide-75
SLIDE 75

What Can Hackers Accomplish Through Command and Control?

  • Data theft
  • Shutdown
  • Reboot
  • Distributed denial of service


slide-76
SLIDE 76

Malicious software

  • Definition: any software that the user did not authorize to be loaded, or that collects data about the user without permission
  • Types: malware, spyware, viruses, worms, logic bombs, trapdoors, trojans, RATs, mobile malicious code, malicious fonts, rootkits

slide-77
SLIDE 77

Botnets

Definition: a number of Internet-connected devices, each of which is running one or more bots.

Used to perform distributed denial-of-service (DDoS) attacks, steal data, and send spam; it also allows the attacker to access the device and its connection. The owner can control the botnet using command and control (C&C) software.

slide-78
SLIDE 78

KOPIS

  • Divides monitored data streams into epochs: $\{E_i\}_{i=1..m}$
  • Feature computation module: a function $F(d, E_i) = v_d^i$ that maps the DNS traffic in epoch $E_i$ related to domain name $d$ into a feature vector $v_d^i$

slide-79
SLIDE 79

System overview: TRAINING MODE

Set of known malware-related and known legitimate domain names and related resolved IPs

Authoritative or point of delegation

KNOWLEDGE BASE

Learning Module

slide-80
SLIDE 80

Motivation

  • Internet surveys take a very large effort
  • The HTTPS ecosystem should be scanned every day, not 1-2 times per year
  • Hence a new whole-Internet scanner

slide-81
SLIDE 81

IPv4 subnet mask

For example, to write the IPv4 address 192.168.42.23 with a subnet mask of 255.255.255.0 in slash notation:

  • 1. Convert the subnet mask to binary. In this example, the binary representation of 255.255.255.0 is 11111111.11111111.11111111.00000000.
  • 2. Count each 1 in the subnet mask. In this example, there are twenty-four (24).
  • 3. Write the original IP address, a forward slash (/), and then the number from Step 2. The result is 192.168.42.23/24.
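The three steps above can be sketched with Python's standard `ipaddress` module. The function name is hypothetical, and the sketch does not check that the mask's 1 bits are contiguous.

```python
import ipaddress


def to_slash_notation(address: str, netmask: str) -> str:
    """Rewrite an address plus dotted netmask as address/prefix-length."""
    mask_bits = bin(int(ipaddress.IPv4Address(netmask)))  # step 1: mask in binary
    prefix_len = mask_bits.count("1")                      # step 2: count the 1 bits
    ipaddress.IPv4Address(address)                         # reject malformed addresses
    return f"{address}/{prefix_len}"                       # step 3: address/prefix


if __name__ == "__main__":
    print(to_slash_notation("192.168.42.23", "255.255.255.0"))  # 192.168.42.23/24
```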

slide-82
SLIDE 82

Evaluation Coverage


slide-83
SLIDE 83

Applications and Security Implications

Scanning the IPv4 address space in under an hour gives the ability to:

  • gain visibility into previously opaque distributed systems
  • understand protocol adoption
  • uncover security phenomena

High-speed scanning also has potentially malicious applications:

  • Finding and attacking vulnerable hosts
  • Disrupting existing security models

slide-84
SLIDE 84

Ethics of Active Scanning

Considerations

  • Impossible to request permission from all owners
  • No IP-level equivalent to robots' exclusion standard
  • Administrators may believe that they are under attack

Reducing Scan Impact

  • Scan in random order to avoid overwhelming networks
  • Signal benign nature over HTTP and w/ DNS hostnames
  • Honor all requests to be excluded from future scans
slide-85
SLIDE 85

Privacy and Anonymous Communication

  • High-speed scanning raises potential new privacy threats, such as the possibility of tracking user devices between IP addresses
  • Companies could track home Internet users between dynamically assigned IP addresses based on the HTTPS certificate
  • High-speed scanning could also provide the basis for a system of anonymous communication
  • Use the scanner to broadcast a short encrypted message to every public IP address instead of sending probes

slide-86
SLIDE 86

Scanning and Good Internet Citizenship

  • Internet-wide scanning involves interacting with an enormous number of hosts and networks worldwide
  • It is impossible to request permission in advance from the owners of all these systems
  • Give traffic recipients the ability to opt out of further probes
  • ZMap scans addresses according to a random permutation to avoid overwhelming destination networks
slide-87
SLIDE 87

RSA keys


slide-88
SLIDE 88

Comparison with Existing Network Scanners

ZMap

  • Eliminate local per-connection state
      - Fully asynchronous components
      - No blocking except for network
  • Shotgun scanning approach
      - Always send n probes per host
  • Scan widely dispersed targets
      - Send as fast as network allows
  • Probe-optimized network stack
      - Bypass inefficiencies by generating Ethernet frames

Existing Network Scanners

  • Reduce state by scanning in batches
      - Time lost due to blocking
      - Results lost due to timeouts
  • Track individual hosts and retransmit
      - Most hosts will not respond
  • Avoid flooding through timing
      - Time lost waiting
  • Utilize existing OS network stack
      - Not optimized for immense number of connections

slide-89
SLIDE 89

ZMap Architecture
