Botnet Detection and Response: The Network is the Infection (David Dagon)


Motivation/Overview Taxonomy Detection Response Botnet Detection and Response The Network is the Infection David Dagon dagon@cc.gatech.edu Georgia Institute of Technology College of Computing OARC Workshop, 2005 David Dagon Botnet


SLIDE 1

Motivation/Overview Taxonomy Detection Response

Botnet Detection and Response

The Network is the Infection

David Dagon
dagon@cc.gatech.edu
Georgia Institute of Technology, College of Computing

OARC Workshop, 2005

David Dagon Botnet Detection and Response

SLIDE 2

Outline

Georgia Tech Campus (Cross Sectional View)

Based on joint work with:
  • UMass CS: Cliff Zou
  • GaTech CS: Sanjeev Dwivedi, Robert Edmonds, Wenke Lee, Richard Lipton, and Merrick Furst
  • GaTech ECE: Julian Grizzard

SLIDE 3

Outline

1. Motivation/Overview: Definitions; The Network is the Infection
2. Taxonomy: Propagation; Command and Control
3. Detection: The Rallying Problem; Detection Opportunities
4. Response

SLIDE 4

Definition: Bots

Hard to Define; Easy to Detect

Definitions, Examples

Definition: autonomous programs automatically performing tasks, absent a real user.

Benign bots
  • countless examples at http://www.botknowledge.com/

Gray-area bots
  • Blogbots, e.g., wikipedia, xanga. Note: http://en.wikipedia.org/wiki/Wikipedia:Bots
  • Other examples: xdcc, fserve bots for IRC
  • Trainer bots (MMORPGs)

Malicious bots
  • Key characteristics: process forking, with network and file access, and propagation potential.

SLIDE 5

Definition: Botnets

Botnets: Also hard to define
  • Definition: networks of autonomous programs capable of acting on instructions.
  • Again, gray areas: FServe bot farms, spider farms, etc.
  • Today, just a narrow definition: organized network of malicious bot clients

Key Insights
  • The network is the infection.
  • We must track botnets, not just bots.

SLIDE 7

Botnets as a Root Cause

Botnets are a Root Problem
  • Spam bots
  • Click fraud
  • Large-scale identity theft; “vicpic” sites
  • Proxynets (for launching other attacks)

Lightning Attacks
The short vulnerability-to-exploitation window makes bots particularly dangerous.
– Emerging Cybersecurity Issues Threaten Federal Information Systems, GAO-05-231

SLIDE 9

Botnet vs Bot Detection

What’s the Difference? Why track both bots and botnets?

Bot Detection Benefits
  • RE → signature IDS (content)
  • Partial victim identification
  • Response Policy: RBL, Quarantine
  • Host vulnerability analysis

SLIDE 10

Botnet vs Bot Detection

What’s the Difference? Why track both bots and botnets?

Botnet Detection Benefits
  • Critical Infrastructure Protection: prioritize on harm to the network, not just victims.
  • RE → signature IDS (flows)
  • More complete victim identification
  • Remediation Policies: Windows 2003 Network Access Protection (NAP), ISP quarantines

SLIDE 11

Botnet Propagation I

email
  • Requires user interaction, social engineering
  • Easiest method; common.
  • Interesting: pidgin English affects propagation.

instant message
  • Various: social eng., file xfer, vulnerabilities

SLIDE 12

Botnet Propagation II

remote software vulnerability
  • Often, no interaction needed
  • Predator, Prey and Superpredator: worms vs. worms (dabber)

web page
  • Plain vanilla malware, or even Xanga ghetto botnets

“seed” botnets
  • Botnets create botnets. Used for upgrades.
  • Most significant for detection

SLIDE 13

Command and Control Taxonomy

Goals:
  • Anticipate future botnet structures
  • Taxonomy of botnet controls

An “important and sensible goal for an attack taxonomy ... should be to help the defender” – R. Maxion

Thus, create a taxonomy based on detection opportunities, instead of random bot/botnet characteristics.

SLIDE 14

Command and Control Taxonomy

Resources
  • Public, private
  • Botmaster’s administrative control over a resource

Rallying Services
1. Medium used for rallying
2. E.g., HTTP, IRCd, DNS tunnel, etc.
3. Reminder: public and private versions of the above

SLIDE 15

Command and Control Taxonomy

Resources (cont’d)
  • Public, private
  • Botmaster’s administrative control over a resource

Name Services
1. hosts(5), e.g., corrupting WINDOWS/system32/drivers/etc/hosts
2. DNS, public and private
3. DDNS, public/private
4. Hit lists

SLIDE 16

Command and Control Taxonomy I

RFC Compliance
The degree of standards compliance.
  • E.g., non-responsive IRCd
  • Ad-hoc protocols
  • P2P port-knocking
  • Tunneling (NSTx, sinit, bobax)

SLIDE 17

Command and Control Taxonomy II

Activity Level
The degree to which bots are in constant contact with the botmaster.
  • Time division: periodic phone in, flow-based, sessionless, stateless
  • Proximity: delegation of contact; clique connections

Insight
Note: other lists possible. Key: organize them into categories. Can we detect these categories?

SLIDE 18

The Rallying Problem

Let’s focus on “rallying” to identify detection opportunities.

C&C used to rally victims
  • Detecting C&C ⇒ detecting botnet
  • Goal: detect C&C during formation

Therefore, reason like an attacker. Attacker design goals:
  • Robustness
  • Mobility
  • Stealth

Assumption: The attackers are always motivated by these three goals.

SLIDE 19

The Rallying Problem

Suppose we create a virus
  • Download vx code; fiddle; compile
  • Uses email propagation/social engr.

We mail it...

[Figure: diagram with nodes VX and V1 to V5]

Welcome to the 1980s. What if we want to use victim resources?

SLIDE 20

Simple Rallying I

Naively, we could have victims contact us...

Problems
  • VX must include author’s address (no stealth)
  • Single rallying point (not robust)
  • VX has hard-coded address (not mobile)

[Figure: diagram with nodes VX and V1 to V5]

SLIDE 21

Simple Rallying II

Or, the victims could contact a third party, e.g., post to Usenet
  • Some connections dropped, single point of failure (not robust)
  • Rival VXers and AVers obtain list (not stealthy)
  • Public, lasting record of victims (not stealthy)

[Figure: diagram with nodes VX, R, and V1 to V5]

SLIDE 22

Simple Rallying III

Or, the victims could contact a robust service, e.g., IRCd
  • No single point of failure (is robust)
  • Rival VXers and AVers id list (not stealthy). Addressed by adjusting protocol adherence or private nature of service.
  • Portability of IRCd DNS (is mobile)

[Figure: diagram with nodes VX and V1 to V5]

SLIDE 23

Detection In-Protocol

Numerous ad-hoc bot detection frameworks:
  • IRCd, public (DDD, Gnuworld)
  • IRCd, private (RWTH Aachen)
  • E-mail (CipherTrust ZombieMeter; everyone else)
  • AV/Managed network sensing (Sophos)
  • Obvious detection (existing blackhole mining)

Problem:
  • Largely post-attack
  • Largely cannot detect structure (rain drop analogy)
  • Expensive to monitor (requires spam filter banks, or difficult IRCd manipulations)
  • Trivially evaded

SLIDE 24

Detection Strategies

What should we do instead of in-protocol sensing?

Better approach: find an invariant observable by sensors
  • Bot must always exhibit some behaviors
  • If we can sense, we can perform detection
  • One idea: DNS-based detection

SLIDE 25

Protocol Agnostic Detection: DNS

Intuition

  • class 1: www.example.com/products, www.example.com/home
  • class 2: botnet1.example.org, botnet2.example.org
  • Name structure: 3LD.SLD.TLD/subdir1/subdir2

Incentives for Subdirectories
  • lower skills (dns updates vs mkdir)
  • less risk (fewer $ transactions)
  • lower cost (package 3LD deals)
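
The 3LD.SLD.TLD/subdir naming intuition can be made concrete with a small parser. This is a minimal sketch, not the talk's tooling: the function names `split_name` and `sld_key` are hypothetical, and it assumes every TLD is a single label (names under multi-label suffixes such as .co.uk would need a public-suffix list).

```python
from urllib.parse import urlsplit


def split_name(url):
    """Split a URL or bare hostname into (3LD, SLD, TLD, path).

    Assumption: the TLD is exactly one label.
    """
    # urlsplit only recognizes the host when a scheme or "//" is present.
    parts = urlsplit(url if "//" in url else "//" + url)
    labels = parts.hostname.rstrip(".").split(".")
    tld = labels[-1]
    sld = labels[-2] if len(labels) >= 2 else ""
    third = labels[-3] if len(labels) >= 3 else ""
    return third, sld, tld, parts.path


def sld_key(url):
    """Return the "SLD.TLD" key under which a name would be grouped."""
    _, sld, tld, _ = split_name(url)
    return f"{sld}.{tld}"


if __name__ == "__main__":
    for name in ("www.example.com/products", "botnet1.example.org"):
        print(name, "->", split_name(name), sld_key(name))
```

Names that differ only in the path share one hostname, while names that differ in the 3LD are distinct DNS lookups under the same SLD, which is the property the DNS-based detection relies on.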

SLIDE 26

Detecting DDNS Bots

Canonical DNS Request Rate

\[ C_{SLD_i} = R_{SLD_i} + \sum_{j=1}^{|SLD_i|} R_{3LD_j} \]

This is analogous to summing the children for a tree rooted on SLD_i.

Key Assumption
DNS server is not authoritative for many zones with high 3LD count. → Dyn DNS Providers!
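
The canonical request rate can be sketched as a fold over per-name lookup counts. This is an illustrative sketch under stated assumptions: `canonical_sld_rates` is a hypothetical helper, the input is assumed to be a mapping from queried name to request count for one time window, and TLDs are assumed to be single labels. Any deeper names (4LD and below) are also credited to their SLD here, which goes slightly beyond the formula.

```python
from collections import defaultdict


def canonical_sld_rates(lookups):
    """Fold per-name lookup counts into canonical per-SLD counts.

    lookups: dict mapping a queried name (e.g. "bot1.example.org")
    to the number of requests seen in one time window.
    """
    rates = defaultdict(int)
    for name, count in lookups.items():
        labels = name.lower().rstrip(".").split(".")
        # Requests for the SLD itself and for every name below it
        # are credited to the same "SLD.TLD" key.
        key = ".".join(labels[-2:])
        rates[key] += count
    return dict(rates)


if __name__ == "__main__":
    window = {
        "example.org": 5,
        "bot1.example.org": 40,
        "bot2.example.org": 30,
        "www.example.com": 7,
    }
    print(canonical_sld_rates(window))
```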

SLIDE 28

Detecting DDNS Bots

Canonical DNS Request Rate

\[ C_{SLD_i} = R_{SLD_i} + \sum_{j=1}^{|SLD_i|} R_{3LD_j} \]

This is analogous to summing the children for a tree rooted on SLD_i.

Use Chebyshev’s inequality:

\[ P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2} \tag{1} \]
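
One way to turn Chebyshev’s inequality into an alarm threshold is to pick the deviation t whose bound equals a tolerated false-positive rate. The sketch below uses the standard form of the inequality, P(|X − µ| ≥ t) ≤ σ²/t². The function names and the 5% default are assumptions for illustration, not values from the talk.

```python
import math
from statistics import mean, pstdev


def chebyshev_threshold(normal_rates, max_false_positive=0.05):
    """Return (mu, t) with P(|X - mu| >= t) <= max_false_positive.

    normal_rates: canonical request rates observed for normal traffic.
    """
    mu = mean(normal_rates)
    sigma = pstdev(normal_rates)
    # sigma^2 / t^2 <= p  is equivalent to  t >= sigma / sqrt(p)
    t = sigma / math.sqrt(max_false_positive)
    return mu, t


def is_anomalous(rate, mu, t):
    """Flag a rate that deviates from the normal mean by at least t."""
    return abs(rate - mu) >= t


if __name__ == "__main__":
    mu, t = chebyshev_threshold([10, 12, 8, 11, 9])
    print(mu, t, is_anomalous(75, mu, t), is_anomalous(12, mu, t))
```

The bound is distribution-free, so it is loose: it limits false alarms without assuming a particular traffic distribution, at the cost of missing moderate deviations.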

SLIDE 29

DDNS-Based Detection

  • For DDNS customers, botnets tend to use subdomains; legitimate directories use subdirectories
  • We can use SLD/3LD-ratios to identify botnet traffic

[Figure: DNS lookups/hour (ticks 10, 100, 1000, 10000) vs. time (01/23 to 02/02). Series: normal traffic (weighted and unweighted), unweighted bot traffic, weighted bot traffic.]

SLIDE 30

DDNS-Based Detection

  • For DDNS customers, botnets tend to use subdomains; legitimate directories use subdirectories
  • We can use SLD/3LD-ratios to identify botnet traffic

[Figure: DNS lookups/hour (200 to 1600) vs. time in days (01/23 to 02/02). Series: bot traffic (canonical SLD form), normal traffic (canonical SLD form).]

SLIDE 31

Detecting DDNS Bots

Does Chebyshev’s inequality always work?

[Figure: DNS lookups/hr (50 to 400) vs. time (01/23 to 02/02). Series: bot traffic, normal traffic.]

SLIDE 32

Detecting DDNS Bots

DNS Density Comparison

\[ d^2(x, \bar{y}) = (x - \bar{y})' \, C^{-1} (x - \bar{y}) \tag{2} \]

variable vectors (features):
  • x – new observation
  • \bar{y} – trained normal profile
  • C^{-1} – inverse covariance matrix for each member of training data
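
The squared Mahalanobis distance in (2) can be evaluated without forming the inverse explicitly, by solving the linear system C z = (x − ȳ). The sketch below is a pure-Python illustration with hypothetical helper names; it assumes the covariance matrix is square and non-singular (a singular matrix would raise ZeroDivisionError).

```python
def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def mahalanobis_sq(x, y_bar, cov):
    """Squared Mahalanobis distance (x - y)' C^{-1} (x - y)."""
    diff = [xi - yi for xi, yi in zip(x, y_bar)]
    z = solve(cov, diff)  # z = C^{-1} (x - y)
    return sum(d * zi for d, zi in zip(diff, z))


if __name__ == "__main__":
    print(mahalanobis_sq([2.0, 3.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 9.0]]))
```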

SLIDE 33

Detecting DDNS Bots

Simplified Distance Measure

Mahalanobis distance considers variance and average request rate
  • Thus, good for outlier detection

We can assume independence of each feature in normal
  • DNS requests more likely not correlated
  • Thus, drop covariance matrix C
  • Also done in Wang, Stolfo, etc.

\[ d(x, \bar{y}) = \sum_{i=0}^{n-1} \left( \frac{|x_i - \bar{y}_i|}{\bar{\sigma}_i} \right) \tag{3} \]
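
With the covariance matrix dropped, the distance in (3) reduces to a per-feature sum. A minimal sketch follows; the function name is hypothetical, and the small `eps` term is an added assumption that guards against division by zero when a feature has no variance in the training data.

```python
def simplified_distance(x, y_bar, sigma_bar, eps=1e-9):
    """Sum of per-feature deviations scaled by standard deviation.

    x: new observation, y_bar: trained normal profile (feature means),
    sigma_bar: per-feature standard deviations from training.
    eps is an assumption of this sketch, not part of equation (3).
    """
    return sum(
        abs(xi - yi) / (si + eps)
        for xi, yi, si in zip(x, y_bar, sigma_bar)
    )


if __name__ == "__main__":
    # Two features, e.g. lookup rates in two hourly bins.
    print(simplified_distance([10.0, 20.0], [8.0, 26.0], [2.0, 3.0], eps=0.0))
```

This costs O(n) per observation instead of a matrix solve, which matters when scoring many hosts per time window.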

SLIDE 34

Detecting DDNS Bots

[Figure: Comparison of Sorted DNS Rates. DNS lookups/hr (50 to 400) vs. rank by decreasing number of lookups (5 to 25). Series: bot, normal host.]

SLIDE 35

Response Options

Response options include:
  • DNS Removal
  • Passive Logging (blackhole)
  • Passive Monitoring (sinkhole): TCP-layer 4 timeout games; application-layer delays
  • Interactive Monitoring: Proxynet/Man-in-middle; fingerprinting hosts (clock skew, OS services, IP, time, etc.); bot application versioning; removal interactions (Caution!)

For today: victim epidemiology, and sinkholing

SLIDE 36

Victim Epidemiology: Total Population

[Figure: Total Bot Population Over Time. Number of victims (50000 to 350000) vs. time (12/30 00:00 to 01/05 12:00). Series: port 2100, port 2125, port 2187, port 54123, port 6564, port 8092.]

SLIDE 37

Victim Epidemiology: Country of Origin

SLIDE 38

Victim Epidemiology: All

53K botnet
  • Unknown: 28.6%
  • Windows: 70.3%
  • Misc: 0.9%

SLIDE 39

Victim Epidemiology: Windows-Only

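[Figure: pie chart of Windows victims by OS version. Labels: 2000SP4,XPSP1; Assorted Win95,98,3.11; 2000SP2+,XPSP1; XP,2000SP2; XPSP1,2000SP3; XP/2000; XPProSP1,2000SP3; XPSP1,2000SP4. Shares (label-to-share mapping not recoverable): 35.8%, 2.51%, 25.4%, 7.6%, 1.2%, 10.8%, 15.9%, 0.7%.]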

SLIDE 40

Population Estimates

How complete? Analysis of closed systems.

Lincoln-Petersen: two independent samples, M and C, for the mark and capture sets. The second is merely a random set in $\binom{N}{C}$.

Define: M – individuals marked by the first sample, C – individuals observed in the second, R – number in both.

With R conditioned on M and C, the distribution of R is hypergeometric:

\[ f(R \mid M, C) = \frac{\binom{M}{R} \binom{N-M}{C-R}}{\binom{N}{C}} \]
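
The hypergeometric probability of seeing R recaptured victims can be computed directly with binomial coefficients. This is a small sketch with a hypothetical function name; it assumes a closed population of known size N and requires Python 3.8+ for `math.comb`.

```python
from math import comb


def recapture_pmf(r, m, c, n):
    """P(R = r | M = m, C = c) for a closed population of size n.

    m: victims marked in the first sample,
    c: victims observed in the second sample,
    r: victims present in both.
    """
    return comb(m, r) * comb(n - m, c - r) / comb(n, c)


if __name__ == "__main__":
    # Toy population: 10 hosts, 4 marked, 3 captured in the second sample.
    for r in range(4):
        print(r, recapture_pmf(r, 4, 3, 10))
```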

SLIDE 41

Population Estimates

If the mark and capture population samples are suitably large percentages of the total population, i.e., M + C ≥ N, the estimate $\hat{N}$ is unbiased even for small sample sizes.

\[ \hat{N} = \frac{(M + 1)(C + 1)}{R + 1} - 1 \tag{4} \]

Sampling may not always yield sufficiently large mark and capture samples to estimate $\hat{N}$. With a normal distribution for $\hat{N}$, we can further calculate a 95% confidence interval for this population as $\hat{N} \pm 1.96\sqrt{v}$, where:

\[ v = \frac{(M + 1)(C + 1)(M - R)(C - R)}{(R + 1)^2 (R + 2)} \]
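
The population estimate and its confidence interval can be sketched in a few lines. The code below uses the standard bias-corrected (Chapman) form of the Lincoln-Petersen estimator, (M + 1)(C + 1)/(R + 1) − 1, together with the variance v given on the slide. The function name and the sample numbers are illustrative assumptions, not data from the talk.

```python
import math


def chapman_estimate(m, c, r):
    """Estimate closed-population size from mark/recapture counts.

    m: marked in first sample, c: captured in second sample,
    r: seen in both. Returns (n_hat, (low, high)) where the
    interval is n_hat +/- 1.96 * sqrt(v), i.e. a 95% interval
    under a normal approximation.
    """
    n_hat = (m + 1) * (c + 1) / (r + 1) - 1
    v = ((m + 1) * (c + 1) * (m - r) * (c - r)) / ((r + 1) ** 2 * (r + 2))
    half_width = 1.96 * math.sqrt(v)
    return n_hat, (n_hat - half_width, n_hat + half_width)


if __name__ == "__main__":
    n_hat, (low, high) = chapman_estimate(100, 100, 10)
    print(round(n_hat, 1), round(low, 1), round(high, 1))
```

A small overlap R relative to M and C produces a wide interval, which is the practical reason the samples need to be large.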

SLIDE 42

Policy Implications for Sinkhole Collection

Policy First; Data Second
Large data collection efforts always have policy implications. Upfront, we consider:
  • Privacy issues (granularity of clock skew)
  • Use of Census data
  • Census of Victim OS/Patch-level
  • Priority rank research into services
  • Policy implication of discontinued/pay patch systems
  • Concrete analysis of “Monoculture” concerns

SLIDE 43

Population Estimates

How to improve?
  • Dynamic models needed (non-closed population)
  • Pen tester trend: Interaction with victim services (139, 445) to probe patch level.
  • Borrow Broido’s TTL work
  • Add p0f dbs for NATing routers
  • Add behavioral parameter: estimate of cache-flushing behavior (cf. Wessels & Fomenkov’s “Wow” paper); purpose/use of botnet (e.g., spam, DDoS, click fraud)

SLIDE 44

Summary

So far:
  • The Network is the Infection
  • Goal: detect botnets, not just bots
  • Existing botnet detection serendipitous, fragile
  • Taxonomy can direct towards solution
  • DDNS-based detection feasible

Not discussed:
  • Expand DNS monitoring (future talk: algos and hardware)
  • Expanded RE
  • Traceback, LEO involvement
  • Threat metrics (cumulative bw estimation, key cracking potential, evasion potential)
  • Graph theoretic detection (P2P, TOR-based botnets)

SLIDE 45

Need Data/Malware?

I have source for hundreds of bots, terabytes of pcaps.
If you’re a researcher, and need samples or data:
  • Let’s exchange PGP keys and check with our advisors, net admins, etc.
