Detecting Botnets with Temporal Persistence Jaideep Chandrashekar - - PowerPoint PPT Presentation



SLIDE 1

Detecting Botnets with Temporal Persistence

Jaideep Chandrashekar, Frederic Giroire, Nina Taft, Eve Schooler

Intel Labs · Mascotte project, I3S (CNRS, Univ. of Nice) · INRIA Sophia Antipolis

SLIDE 2

Botnets: Why care?

SLIDE 3–8

Botnet Life-Cycle: exploit → call home → actuation

exploit: dirty webpage, drive-by download, trojans
call home: IRC, HTTP, P2P, hybrid
actuation: spam, DoS, click fraud, proxies, theft, espionage

Existing defenses at the exploit and actuation stages: signature matching, software patching, traffic anomaly detectors (NBAD), traffic correlation, port inspection, payload analysis (see RAID’09 proceedings)

Limitations: hard to adapt, a-priori knowledge required, noisy, prone to false positives

SLIDE 9–10

Botnet C&C invariants

  • Botmasters seldom try to connect to the drones:
  • drones initiate the (rendezvous) connections
  • Drones need to call home often:
  • (if not) the drone falls off the radar

⇒ watch outgoing traffic; use a frequency-based metric

SLIDE 11–12

Our Solution: Canary

A general-purpose, non-specific, learning-based behavioral detector to uncover botnet C&C destinations at the end-host

without a-priori assumptions about traffic types, destinations, or protocols

SLIDE 13–18

High Level Method

Training (Day 1 … Day N):
  • watch destinations
  • whitelist frequent destinations

Detection (using the learned whitelist and parameters):
  • ignore whitelisted destinations
  • track frequency for non-whitelisted destinations
  • raise an alarm for new high-frequency destinations

Botnet C&Cs are likely to be frequently visited; adding to the whitelist is a very rare event
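The training/detection split above can be written down as a minimal sketch. This is an illustration only, not the authors' code: the function names, the per-day destination sets, and the frequency-ratio test are assumptions standing in for Canary's actual whitelisting and persistence machinery.

```python
# Sketch of the high-level method: whitelist frequent destinations during
# training, then alarm on new high-frequency destinations during detection.
# All names and the simple frequency ratio are illustrative assumptions.
from collections import defaultdict

def train(daily_destination_sets, threshold):
    """Whitelist destinations seen on at least `threshold` fraction of days."""
    counts = defaultdict(int)
    for day in daily_destination_sets:      # one set of destinations per day
        for dest in day:
            counts[dest] += 1
    n_days = len(daily_destination_sets)
    return {d for d, c in counts.items() if c / n_days >= threshold}

def detect(day_destinations, whitelist, counts, days_observed, threshold):
    """Ignore whitelisted destinations; flag non-whitelisted ones that
    become high frequency over the detection period."""
    alarms = []
    for dest in day_destinations:
        if dest in whitelist:
            continue                        # whitelisted: ignore
        counts[dest] += 1                   # track frequency
        if counts[dest] / days_observed >= threshold:
            alarms.append(dest)             # new high-frequency destination
    return alarms
```

For example, a destination contacted every training day ends up whitelisted and is never inspected again, while a destination first seen after training that starts calling home daily crosses the threshold and raises an alarm.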

SLIDE 19

Remaining Detail

Destination granularity: tracking per IP address = large whitelists!
➡ destination atoms

(Frequency) metric needs to capture loosely periodic behavior at unknown timescales
➡ persistence

We track the persistence of destination atoms and build whitelists of destination atoms

SLIDE 20–21

Destination Atoms

Individual hostnames are grouped into coarser destination atoms:

mail1.sc.intel.com, mail3.sc.intel.com, mail3.jf.intel.com → mail.intel.com
xyz.google.com, abs.google.com → google.com
circuit.intel.com, cps.circuit.intel.com → circuit.intel.com
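The deck's atoms are finer-grained than any one-line rule (it keeps mail.intel.com and circuit.intel.com as separate atoms). As a deliberately crude stand-in, the sketch below groups hostnames by their last two labels; this two-label rule is an assumption for illustration only, and it breaks on multi-label registries such as .co.uk.

```python
def atom(hostname: str) -> str:
    """Crude destination-atom stand-in: group hosts by the last two
    DNS labels. The real Canary atoms are finer grained (e.g. keeping
    mail.intel.com distinct from circuit.intel.com); this heuristic is
    an illustrative assumption, not the paper's grouping rule."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])
```

Under this rule, mail1.sc.intel.com, mail3.jf.intel.com, and circuit.intel.com all collapse into the single atom intel.com, which already shows why atom-level whitelists stay far smaller than per-IP or per-hostname whitelists.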

SLIDE 22–25

Persistence

Activity is recorded as a 0/1 flag per sub-window of width w. An observation window of width W (containing W/w sub-windows) slides over this timeline; persistence is the fraction of sub-windows within W in which the destination was contacted.

Example from the figure (W/w = 7 sub-windows): one window position yields 3/7, a later position yields 6/7.
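The sliding-window persistence values can be computed directly (a sketch; the variable names are mine, not the paper's):

```python
def persistence(bitmap, window):
    """Slide a window of `window` sub-slots over a 0/1 activity bitmap
    (one flag per sub-window w) and return, for each position, the
    fraction of sub-windows in which the destination was seen."""
    values = []
    for start in range(len(bitmap) - window + 1):
        seen = sum(bitmap[start:start + window])   # active sub-windows in W
        values.append(seen / window)
    return values
```

A destination contacted in 3 of the 7 sub-windows at one window position scores 3/7 there, matching the figure's example; a truly persistent destination keeps a high fraction at every window position.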

SLIDE 26–27

Picking Timescale

Suppose w = 1 hr, W = 24 hr:
  • Botnet X connects to its C&C hourly: p-value 24/24 = 1
  • Botnet Y connects to its C&C every 5–6 hours: p-value 4/24 ≈ 0.17
  • Botnet Z connects to its C&C once a day: p-value 1/24 ≈ 0.042

Cannot assume a single, fixed timescale!

SLIDE 28–29

Selecting Timescale(s)

  • Select n overlapping timescales: TS1 = (w1, W1), TS2 = (w2, W2), …, TSn = (wn, Wn)
  • pi := persistence of the atom for TSi = (wi, Wi)
  • track all pi concurrently across the timescales
  • p(atom) := maxi pi(atom)

This can become very expensive!

Trick: select Wi = k · wi; then a single bitmap of size k · wmax suffices.
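The bitmap trick can be sketched as follows. One bitmap at the finest slot granularity serves every timescale: a sub-window of width wi is "on" if any of its fine slots is on, and with Wi = k·wi each timescale only needs the last k·wi worth of bits. The concrete choice k = 10, the slot unit, and the function names are illustrative assumptions, not the authors' parameters.

```python
def persistence_all(bitmap, slot, subwindows, k=10):
    """Compute persistence at several timescales from one fine-grained
    activity bitmap. `bitmap` holds one 0/1 flag per finest slot of width
    `slot`; `subwindows` lists sub-window widths w_i (multiples of `slot`);
    each timescale observes W_i = k * w_i, i.e. the last k*w_i/slot bits."""
    results = {}
    for w in subwindows:
        group = w // slot                   # fine slots per sub-window w_i
        window = bitmap[-k * group:]        # last W_i worth of activity
        occupied = sum(
            1 for j in range(0, len(window), group)
            if any(window[j:j + group])     # sub-window on if any slot is on
        )
        results[w] = occupied / k           # fraction of the k sub-windows
    return results

def p_atom(bitmap, slot, subwindows, k=10):
    """p(atom) := max over timescales, as on the slide."""
    return max(persistence_all(bitmap, slot, subwindows, k).values())
```

A destination that calls home once every 10 slots scores only 0.1 at the finest timescale but 1.0 at w = 10 slots, so taking the max over timescales catches it, which is exactly the point of tracking several (wi, Wi) pairs.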

SLIDE 30

Dataset: Training

  • Normal user traces collected from 157 end-hosts for 4 weeks
  • Data collected on the end-hosts themselves: winpcap + wrapper code
  • Traces assumed clean (some suspicious traffic observed; ground truth not available)
  • Initial 2 weeks of data used for training: pick the persistence threshold and construct per-user whitelists
SLIDE 31

Picking threshold(s)

[Figure: histogram of # of atoms vs. persistence, 0.1–1.0]

80% of destinations have a p-value < 0.2; 20% have a p-value > 0.2

If p(atom) > 0.6, add it to the whitelist → seems reasonable

SLIDE 32

Whitelist Sizes

[Figure: distribution of whitelist size (x-axis) vs. # of users (y-axis)]

SLIDE 33

Validation

  • Started with 55 distinct malware binaries
  • 27 had traffic; 12 had traffic for longer than 1 day
  • Ran each malware for 1 week; all traffic logged
  • Packet traces ➜ flow traces [Bro]
  • Flow traces manually analyzed to isolate C&C traffic

[Figure: separating windows auto update traffic from malware traffic]

SLIDE 34

ClamAV Signature         C&C type                          # of C&C atoms   C&C volume (min–max)
Trojan.Aimbot-25         port 22                           1                0–5.7
Trojan.Wootbot-247       IRC port 12347                    4                0–6.8
Trojan.Gobot.T           IRC port 66659                    1                0.2–2.1
Trojan.Codbot-14         IRC port 6667                     2                0–9.2
Trojan.Aimbot-5          IRC via http proxy                3                0–10
Trojan.IRCBot-776*       HTTP                              16               0–1.8
Trojan.VB-666*           IRC port 6667                     1                0–1.3
Trojan.IRC-Script-50     IRC ports 6662–6669, 9999, 7000   8                0–2.1
Trojan.Spybot-248        port 9305                         4                3.8–4.6
Trojan.MyBot-8926        IRC port 7007                     1                0–0.1
Trojan.IRC.Zapchast-11   IRC ports 6666, 6667              9                0–1
Trojan.Peed-69 [Storm]   P2P/Overnet                       19672            0–30

Converted packet traces to flow traces and hand-analyzed each trace individually to identify/isolate both C&C traffic and attack traffic

SLIDE 35

3 Detailed Examples

  • SDBot
  • 2 atoms in the covert channel, identified by IRC server names
  • attack traffic: scans on ports 135, 139, 445 & 2097*
  • Zapchast
  • 9 atoms in the covert channel, on popular IRC ports
  • attack traffic: netbios(?)
  • Storm/Peacomm
  • ~82,000 atoms (almost all atoms are singletons)
  • no well-known port/address for C&C destinations
  • attack traffic is SMTP (overwhelmingly), and possibly some http & ssh
SLIDE 36

Connection Rates (per min)

[Figure: bar chart comparing C&C and attack connection rates per minute for SDBot, ZapChast, and Storm]

SLIDE 37

C&C Detection

[Figure: persistence of the C&C destination over 1–24 hr for SDBot, ZapChast, and Storm]

SLIDE 38

C&C Detection (all)

Botnet              Persistence   Timescale   # dest. atoms
IRCBot-776          1.0           (10,1)      1
IRCBot-776          0.8           (200,20)    2
Aimbot-5            1.0           (10,1)      1
Aimbot-5            1.0           (40,4)      1
Aimbot-5            1.0           (160,16)    1
MyBot-8926          0.6           (160,16)    1
IRC.Zapchast-11     1.0           (40,4)      3
Spybot-248          1.0           (10,1)      2
IRC-Script-50       1.0           (10,1)      7
VB-666              0.7           (10,1)      1
Codbot-14           1.0           (10,1)      1
Gobot.T             1.0           (10,1)      1
Wootbot-247         1.0           (10,1)      3
IRC.Zapchast-11     1.0           (10,1)      6
Aimbot-25           1.0           (10,1)      1
Peed-69 [Storm]     1.0           (10,1)      > 1

All samples detected at threshold 0.6; associated false positive rate ~0.5/day

SLIDE 39

Improvement in Anomaly Detection

Filtering traffic via whitelists:

  • Reduces the volume of suspicious traffic
  • Reduces the false positive rate of anomaly detection
  • Improves sensitivity (allows lowered thresholds)

SLIDE 40

Traffic Anomaly Detector Gains

SLIDE 41

Caveats

  • CANARY is not botnet specific, so alarms are non-specific (need external intelligence to characterize)
  • Cannot detect some forms of fast fluxing and some P2P networks (Storm is easily detected, though)
  • Single fast flux can be detected
  • New applications can cause false alarms
  • Cannot deal with malware hosted on “whitelisted” sites

SLIDE 42

Conclusions

  • Tracking persistence uncovers low-intensity, stealthy behavior such as C&C channels
  • Filtering by whitelist improves (traffic) anomaly detection
  • Augments existing techniques and provides additional coverage against unknown threats
  • Not a silver bullet, but this is an arms race
SLIDE 43

Questions?

SLIDE 44

False Positives

[Figure: false positives per day vs. detection rate as the persistence threshold sweeps from 0.1 to 1.0]

SLIDE 45–46

Change in AD thresholds

  • original ~200; filtered ~30