Challenges in Experimenting with Botnet Detection Systems


Adam J. Aviv, Andreas Haeberlen (University of Pennsylvania). August 8th, 2011, CSET-2011.


SLIDE 1

August 8th, 2011 CSET-2011 1

Challenges in Experimenting with Botnet Detection Systems

Adam J. Aviv Andreas Haeberlen University of Pennsylvania

SLIDE 2

Alice has developed a new botnet detector!!!

What should the evaluation show?

Alice's Detector

SLIDE 3

Ideal

Alice deploys her detector live on her local network

Alice is provided with a list of hosts that are botnet infected

Alice deploys her detector on various other networks

Academic, Residential, Corporate, etc.

Alice records traces of each deployment

  • Improve detector in the lab
  • Readily available to other researchers

SLIDE 4

Realities

  • Production-ready deployment?
  • Ground truth of botnet infections?
  • Deployment on various networks?
  • Record trace and replay experiment?
  • Traces available to other researchers?

SLIDE 5

Taking a Step Back

SLIDE 6

Many Challenges

  • Multiple Administrative Domains
  • Network Heterogeneity
  • Multimorbidity
  • Privacy
  • Controlled Environments
  • Artifact Overfitting
  • Botnet Overfitting
  • Focus on Academic Networks
  • Scale
  • Mixing Artifacts
  • False Positives & Negatives
  • Repeatability
  • Comparability
  • Lack of Verification

SLIDE 7

privacy

SLIDE 8

We have to worry about privacy, but the botnet authors don't!

SLIDE 9

Can we do better together?

SLIDE 10

Discussion/Topics/Questions

Experimental Ideals vs. Realities

Not just botnet detectors ...

Raw Materials of the Experiment

  • Sharing and Obtaining Traces
  • Botnet and Background Traces

Can we do better via collaboration?

SLIDE 11

Presentation Outline

  • Ideal vs. Reality
  • Experimental Challenges
  • Overlay Methodology
  • Pitfalls: Obtaining Traces, Sharing Traces
  • What can be done?

SLIDE 12

Alice has developed a new botnet detector!!!

What should the evaluation show?

Alice's Detector

SLIDE 13

Ideal vs. Reality

Alice deploys her detector live on her local network

Alice is provided with a list of hosts that are botnet infected

Alice deploys her detector on various other networks

Academic, Residential, Corporate, etc.

Alice records traces of each deployment

  • Improve detector in the lab
  • Readily available to other researchers

  • Production-ready deployment?
  • Ground truth of botnet infections?
  • Deployment on various networks?
  • Record trace and replay experiment?
  • Traces available to other researchers?

SLIDE 14

Evaluation Realities

  • Network Heterogeneity
  • Multiple Administrative Domains
  • Lack of Ground Truth
  • Overfitting
  • Privacy
  • Modernity
  • Comparability & Repeatability
  • Performance
  • Realistic Settings

SLIDE 15

Pitfalls

  • Experimental Challenges
  • Overlay Methodology
  • Pitfalls: Obtaining Traces, Sharing Traces
  • What can be done?

SLIDE 16

Overlay Methodology

[Diagram: Internet, Anonymizer, Network Trace]

SLIDE 17

Replay and Evaluate

[Diagram: Network Trace, "Detected 2 Bots!"]

  • Collected Independently
  • Background Trace is Sensitive
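The overlay step pictured on the last two slides can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' tooling: the `Flow` record, the `overlay` function, and every address below are hypothetical, and real evaluations work on packet or flow traces with many more fields. Bot addresses from an independently collected botnet trace are mapped onto randomly chosen internal hosts of the background trace, the two traces are merged by timestamp, and the chosen hosts become the ground truth.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    """Hypothetical, minimal flow record used only for this sketch."""
    ts: float   # start time in seconds
    src: str    # source IP address
    dst: str    # destination IP address
    dport: int  # destination port


def overlay(background, botnet, bot_ips, internal_hosts, seed=0):
    """Embed a botnet trace into a background trace.

    Each bot IP is mapped to a distinct, randomly chosen internal host.
    Returns the merged trace and the set of hosts now labeled as infected.
    Raises ValueError if there are more bots than internal hosts.
    """
    rng = random.Random(seed)
    bot_list = sorted(bot_ips)
    chosen = rng.sample(sorted(internal_hosts), len(bot_list))
    mapping = dict(zip(bot_list, chosen))

    # Shift the botnet trace so that both traces start at the same time.
    offset = min(f.ts for f in background) - min(f.ts for f in botnet)

    remapped = [
        replace(
            f,
            ts=f.ts + offset,
            src=mapping.get(f.src, f.src),
            dst=mapping.get(f.dst, f.dst),
        )
        for f in botnet
    ]
    merged = sorted(list(background) + remapped, key=lambda f: f.ts)
    return merged, set(mapping.values())


# Toy example with made-up addresses.
background = [
    Flow(100.0, "10.0.0.1", "198.51.100.7", 80),
    Flow(101.0, "10.0.0.2", "198.51.100.7", 443),
]
botnet = [
    Flow(5.0, "192.168.1.5", "203.0.113.9", 6667),
    Flow(7.5, "203.0.113.9", "192.168.1.5", 6667),
]
internal = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
merged, ground_truth = overlay(background, botnet, {"192.168.1.5"}, internal, seed=1)
```

Because the internal hosts are picked at random, a different `seed` produces a different merged trace, so the seed is part of what would have to be shared for someone else to repeat the experiment.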

SLIDE 18

Prevalence in the Literature

Overlay Methodology | Other Methodology
[13] [49] [15] [36] [46] [47] [41] [23] [6] [7] [28] [25] [24] [14] [20] [14] [45] [36] [11] [5]
* See paper for references.

SLIDE 19

Advantages of Overlay Methodology

Ground Truth

SLIDE 20

Pitfalls

  • Experimental Challenges
  • Overlay Methodology
  • Pitfalls: Obtaining Traces, Sharing Traces
  • What can be done?

SLIDE 21

Obtaining Traces

Realism

Merging of Botnet and Background trace should be realistic

SLIDE 22

Collecting Botnet Traces

SLIDE 23

Realistic Embedding

[Diagram: "SPAM!", Residential ISP, "?"]

SLIDE 24

Mixing Artifacts

[Diagram: DHCP]

SLIDE 25

Multimorbidity

SLIDE 26

Obtaining Traces

Realism

Merging of Botnet and Background trace should be realistic

Representativeness

Reflect diversity in network scenarios

SLIDE 27

Focus on Academic Networks

[Diagram: State University, Corporate Business]

SLIDE 28

Prevalence in the Literature

Academic Traces | At Least One Other Trace
Overlay Methodology | Other Methodology
[13] [49] [15] [36] [46] [47] [41] [23] [6] [7] [28] [25] [24] [14] [36] [11] [5] [20] [14] [45]
* See paper for references.

SLIDE 29

Scale

SLIDE 30

Obtaining Traces

Realism

Merging of Botnet and Background trace should be realistic

Representativeness

Reflect diversity in network scenarios

Performance

False positives and negatives
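As a worked illustration of the performance item above, the following Python sketch computes false positive and false negative rates from a ground-truth list of infected hosts and a detector's output. The function name `detection_rates` and the host labels are hypothetical, chosen only for this example. It assumes every host outside the ground-truth list is clean, which the slides question: an unverified background trace may already contain infected hosts.

```python
def detection_rates(all_hosts, infected, flagged):
    """Return false positive and false negative rates for one experiment.

    all_hosts: every host observed in the trace
    infected:  hosts labeled as infected by the ground truth
    flagged:   hosts reported by the detector
    Assumes hosts not in `infected` are clean (may not hold in practice).
    """
    all_hosts, infected, flagged = set(all_hosts), set(infected), set(flagged)
    clean = all_hosts - infected

    true_pos = len(flagged & infected)
    false_pos = len(flagged & clean)
    false_neg = len(infected - flagged)
    true_neg = len(clean - flagged)

    return {
        "false_positive_rate": false_pos / (false_pos + true_neg) if clean else 0.0,
        "false_negative_rate": false_neg / (false_neg + true_pos) if infected else 0.0,
    }


# Toy example: 10 hosts, 2 labeled infected, detector flags one bot and one clean host.
hosts = [f"h{i}" for i in range(1, 11)]
rates = detection_rates(hosts, infected={"h1", "h2"}, flagged={"h1", "h3"})
print(rates)  # {'false_positive_rate': 0.125, 'false_negative_rate': 0.5}
```

If the flagged "clean" host was in fact infected by something else, the false positive here would be a correct detection, which is why unverified background traces make these numbers hard to interpret.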

SLIDE 31

Lack of Verification

SLIDE 32

Example From the Literature: TAMD

“We suspect that the reason not every bot in the botnet was detected is due to the randomness in our choice of selected internal hosts to which the malware traffic was assigned, such that a selected internal host that was also contacting other suspicious subnets (not relevant to the botnet) is likely to bias the dimension reduction and clustering algorithm.”

SLIDE 33

privacy

SLIDE 34

Sharing Traces

  • Is the experiment independently repeatable?
  • Can we do an apples-to-apples comparison?

SLIDE 35

What can be done?

  • Experimental Challenges
  • Overlay Methodology
  • Pitfalls: Obtaining Traces, Sharing Traces
  • What can be done?

SLIDE 36

Observations

Many of these challenges stem from difficulties in sharing and obtaining realistic data sets.

Similar to problems faced by researchers studying large-scale distributed systems

  • --> PlanetLab

SLIDE 37

A PlanetLab for Botnet Detection?

Can we do better together?

SLIDE 38

Strawman

Distributed Evaluation

  • PlanetLab-like nodes on participating networks
  • Cannot communicate network traces outside of network

Researchers Deploy Detector Code on Nodes

  • Reports are reviewed and declassified by sys-admins
  • Researcher can test and debug on local node

Incentives

  • Sys-Admins gain access to bleeding edge detectors, for FREE!
  • Researchers gain insight into usefulness of reports or “ground truth”
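One way to picture the strawman is the toy Python sketch below. Everything in it is an assumption made for illustration: `Report`, `run_on_node`, `declassify`, the tuple-based flow format, and the IRC-port "detector" are hypothetical and are not part of any existing system or of the authors' proposal in detail. The point it shows is the data flow: the detector runs next to the trace, only a report is produced, and the sys-admin decides what leaves the network.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical report produced on a node; it stays inside the network."""
    detector: str
    flagged_hosts: list


def run_on_node(detector_name, detector, local_flows):
    """Run researcher-supplied detector code on the node's local traffic.

    The raw flows never leave this function; only the report is returned.
    """
    return Report(detector_name, sorted(detector(local_flows)))


def declassify(report, admin_confirms):
    """Sys-admin review step: release counts only, never host addresses."""
    confirmed = [h for h in report.flagged_hosts if admin_confirms(h)]
    return {
        "detector": report.detector,
        "flagged": len(report.flagged_hosts),
        "confirmed": len(confirmed),
    }


def toy_irc_detector(flows):
    """Toy stand-in for a real detector: flag hosts that contact port 6667."""
    return {src for (src, dst, dport) in flows if dport == 6667}


# Local traffic as (src, dst, dport) tuples with made-up addresses.
local_flows = [
    ("10.0.0.1", "203.0.113.9", 6667),
    ("10.0.0.2", "198.51.100.7", 443),
    ("10.0.0.3", "203.0.113.9", 6667),
]
report = run_on_node("toy-irc", toy_irc_detector, local_flows)

# The sys-admin knows (out of band) that only 10.0.0.1 is really infected.
released = declassify(report, admin_confirms=lambda host: host == "10.0.0.1")
print(released)  # {'detector': 'toy-irc', 'flagged': 2, 'confirmed': 1}
```

The `confirmed` count is the kind of feedback the Incentives bullets describe: the researcher learns how useful the reports were without ever seeing the trace.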

SLIDE 39

Address Challenges

  • Network Heterogeneity
  • Multiple Administrative Domains
  • Lack of Ground Truth
  • Overfitting
  • Privacy
  • Modernity
  • Comparability & Repeatability
  • Performance
  • Realistic Settings

SLIDE 40

Huge Deployment Challenges

  • Privacy
  • Accountability

SLIDE 41

Conclusions

Taking a step back

  • Literature Review
  • Ideal is hard

Ideal vs. Reality

  • Privacy!
  • Sharing and obtaining realistic traces

Overlay Methodology

  • And its pitfalls

Can we do better together?

  • PlanetLab for botnet detectors?

SLIDE 42

Backup