Botnet Detection through Analyzing Network Traffic using Statistical Signal Processing Methods Basil AsSadhan Department of Electrical Engineering & Center of Excellence in Information Assurance King Saud University April 13-14, 2014


SLIDE 1

Botnet Detection through Analyzing Network Traffic using Statistical Signal Processing Methods

Basil AsSadhan

Department of Electrical Engineering & Center of Excellence in Information Assurance

King Saud University

April 13–14, 2014 (Jumada Al-Akhirah 13–14, 1435)

KFUPM, Dhahran, Saudi Arabia

SLIDE 2

Approach

[Diagram labels: LAN, Firewall, Internet, Decomposer (Control, Data), Analyzer]

Analyze network traffic through statistical signal processing methods

SLIDE 3

Outline

  • Botnets
  • Network Traffic Behavior Analysis
  • Control and Data Planes
  • Periodic Behavior in Network Traffic
  • Results
  • Conclusions and Current Work

SLIDE 4

Botnets

  • Large networks of compromised computers (bots) controlled by bot masters

– Consist of an army of hundreds to tens of thousands of bots
– Some predict that 16–25% of Internet hosts are part of a botnet

  • Pose a significant threat to network-based applications and communications

– Magnify the level of destruction of attacks such as DDoS attacks and scanning
– Make identity theft, phishing, and E-mail spam more effective

  • How big is the problem?

– Over 1 million victim computers in the USA [1]
– $20 million in economic losses in the USA [1]
– 3.04 million distinct botnet computers [2]
– Responsible for 69% of spam E-mail

[1] FBI, 2007
[2] Symantec Global Internet Security Threat Report, April 2013

SLIDE 5

The Life Cycle of a Bot

  • 1. Be identified as a future bot (scanning)
  • 2. Be accessed by the attacker (Trojan horse)
  • 3. Download the bot software
  • 4. Receive botnet commands
  • 5. Execute the malicious activities

C2 Communication

SLIDE 6

Botnets C2 Communication

  • Botnets rely on Command & Control (C2) communication channels
  • A bot master uses C2 communication channels to:

– Command bots to execute attack activities
– Control bots to organize themselves to download malware code

  • C2 communication ~ the intelligence communication prior to the attack
  • Detecting C2 => detecting bots before any target is attacked
  • Problem: C2 has low volume and is well behaved
  • Bots are typically pre-programmed to check for updates every T seconds
  • C2 traffic in many botnet variants exhibits periodic behavior per host-port pair
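To make the periodic check-in concrete, here is a minimal sketch of the packet-count sequence a single bot might produce when it contacts its C2 channel every T seconds. The function name, its parameters, and the one-packet-per-check-in simplification are illustrative assumptions, not taken from the slides.

```python
def beacon_counts(period_s, run_time_s, interval_s=0.1, packets_per_beacon=1):
    """Packet counts per aggregation interval for a bot checking in every period_s seconds.

    Hypothetical helper: real C2 exchanges span several packets and intervals.
    """
    n_bins = round(run_time_s / interval_s)          # number of aggregation intervals
    bins_per_period = round(period_s / interval_s)   # intervals between two check-ins
    counts = [0] * n_bins
    for start in range(0, n_bins, bins_per_period):
        counts[start] = packets_per_beacon
    return counts


# A bot that checks in every 3 seconds, observed for 20 seconds at 100 ms resolution.
series = beacon_counts(period_s=3, run_time_s=20)
print(sum(series), "check-ins in", len(series), "intervals")
```

The result is a sparse, low-volume sequence whose only structure is its regular spacing, which is the property the rest of the talk tries to detect.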

SLIDE 7

Network Traffic Behavior Analysis

  • Understanding network traffic behavior helps to:

– Detect abnormal behavior

  • In network security: attacks and malicious behavior
  • In network management: failures and malfunctions

– Design and improve future networks

  • Difficulties include:

– Large amount of traffic
– Continuous emergence of new high-bandwidth applications

SLIDE 8

Network Traffic Behavior Analysis (Cont.)

  • Problem:

High demands on methods that are:

– Scalable
– Efficient (accurate and fast)

  • Objective:

– Reduce the amount of traffic to consider during analysis
– Develop accurate methods under this constraint

  • Proposed Solution:

1. Aggregate network traffic
2. Use control traffic as a surrogate for the whole traffic
3. Apply statistical signal processing methods (correlation, periodogram)

SLIDE 9

Scalability: Aggregate Traffic

  • Preprocess packet traces to produce a discrete time sequence

– Over a time unit, aggregate packets originating from or destined to a given host, host-port pair, or subnet
– Extract a count-feature. Examples: packet, byte, address, and port count
– Coarse (multiple flows) instead of fine (single flow) granularity

  • Use discrete time series analysis to monitor aggregate traffic:

– Pro: Substantially reduces the amount of traffic to process, which reduces computation and time => higher efficiency and scalability
– Con: Tracking fewer details might mean having less knowledge => less accurate analysis

[Figure labels: D, S1, S2, S3, S4]
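The aggregation step can be sketched as follows. This is a minimal illustration under assumed inputs: packets are represented as dictionaries with hypothetical keys (`time`, `src`, `dst`, `dport`), and the grouping functions stand in for the "host" and "host-port pair" aggregates named on the slide.

```python
from collections import defaultdict


def packet_counts(packets, key, interval_s, n_bins):
    """Return {aggregate: [packet count per time unit]} for the chosen grouping key."""
    series = defaultdict(lambda: [0] * n_bins)
    for pkt in packets:
        b = int(pkt["time"] // interval_s)   # index of the time unit this packet falls in
        if 0 <= b < n_bins:
            series[key(pkt)][b] += 1
    return dict(series)


def by_dest_host(pkt):
    return pkt["dst"]


def by_dest_host_port(pkt):
    return (pkt["dst"], pkt["dport"])


# Hypothetical trace: four sources sending to one destination D.
packets = [
    {"time": 0.02, "src": "S1", "dst": "D", "dport": 6667},
    {"time": 0.07, "src": "S2", "dst": "D", "dport": 6667},
    {"time": 0.15, "src": "S3", "dst": "D", "dport": 80},
    {"time": 0.31, "src": "S4", "dst": "D", "dport": 6667},
]
per_host = packet_counts(packets, by_dest_host, interval_s=0.1, n_bins=4)
per_pair = packet_counts(packets, by_dest_host_port, interval_s=0.1, n_bins=4)
print(per_host["D"], per_pair[("D", 6667)])
```

Grouping by host merges every flow towards D into one sequence, while the host-port key separates the port 6667 traffic from the port 80 traffic at the cost of tracking more sequences.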

SLIDE 10

Scalability: Control and Data Planes

  • Use control traffic as a surrogate for whole traffic

– Control traffic’s volume is 4–8% of total traffic
– Minimize the amount of traffic to consider
– Higher efficiency and scalability

  • Assumptions

– Data traffic generation is based on control traffic generation, so the two should behave similarly during benign use
– The relationship might differ during abnormal behavior

SLIDE 11

Control and Data Planes (Cont.)

  • For enterprise LAN traffic

– Control traffic: Set up, maintain, or tear down a connection
– Data traffic: Actual transmission of data

  • TCP packets: Use flag and sequence number fields

– Control packets: SYN, FIN, RST, and bare ACKs
– Data packets: All other packets

  • UDP packets: Application dependent
  • ICMP packets: All control
  • UDP and ICMP packets are not considered
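The TCP rule above can be sketched as a small classifier. This is only an approximation under stated assumptions: the slide mentions flag and sequence number fields, whereas this sketch identifies a bare ACK by a zero-length payload, and the function names and the `(flags, payload_len)` tuple layout are hypothetical.

```python
# Standard TCP header flag bits.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def is_control_packet(tcp_flags: int, payload_len: int) -> bool:
    """SYN, FIN, and RST packets are control; so is an ACK that carries no payload."""
    if tcp_flags & (SYN | FIN | RST):
        return True
    # Assumption: "bare ACK" is approximated as ACK set with an empty payload.
    return bool(tcp_flags & ACK) and payload_len == 0


def control_share(packets):
    """Fraction of packets classified as control, given (flags, payload_len) tuples."""
    control = sum(1 for flags, payload in packets if is_control_packet(flags, payload))
    return control / len(packets)


# Hypothetical connection: SYN, SYN-ACK, ACK, two data segments, FIN-ACK.
sample = [(0x02, 0), (0x12, 0), (0x10, 0), (0x18, 512), (0x10, 1460), (0x11, 0)]
print(f"{control_share(sample):.0%} of the sample packets are control packets")
```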

SLIDE 12

Preprocessing: Network Traffic

1. Select a suitable time unit to aggregate packets

– Objective: high count variability
– Depends on the packet rate of the traffic

2. Extract count-features to produce discrete time sequences

– Packet count
– Distinct address count
– Byte count
– Distinct port count

[Figure: packet header fields (time, length, src addr, dest addr, src port #, dest port #) and number of packets per time unit for small, large, and suitable time units]
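As a rough illustration of step 2, the sketch below derives the four count-features from packet header fields. Which address and port are counted is not stated on the slide, so counting distinct source addresses and distinct destination ports is an assumption, as are the dictionary keys and the function name.

```python
def count_features(packets, interval_s, n_bins):
    """Per-time-unit packet, byte, distinct-address, and distinct-port counts."""
    bins = [{"packets": 0, "bytes": 0, "addrs": set(), "ports": set()}
            for _ in range(n_bins)]
    for pkt in packets:
        b = int(pkt["time"] // interval_s)
        if 0 <= b < n_bins:
            bins[b]["packets"] += 1
            bins[b]["bytes"] += pkt["length"]
            bins[b]["addrs"].add(pkt["src"])     # assumption: source addresses
            bins[b]["ports"].add(pkt["dport"])   # assumption: destination ports
    return {
        "packet_count": [b["packets"] for b in bins],
        "byte_count": [b["bytes"] for b in bins],
        "address_count": [len(b["addrs"]) for b in bins],
        "port_count": [len(b["ports"]) for b in bins],
    }


# Hypothetical header records aggregated with a 1-second time unit.
trace = [
    {"time": 0.2, "length": 60, "src": "10.0.0.1", "dport": 80},
    {"time": 0.7, "length": 1500, "src": "10.0.0.1", "dport": 443},
    {"time": 1.5, "length": 40, "src": "10.0.0.2", "dport": 80},
]
features = count_features(trace, interval_s=1.0, n_bins=3)
print(features)
```

Each feature is a separate discrete time sequence of the same length, so the same analysis can be run on any of them.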

SLIDE 13

“LBNL/ICSI Enterprise Tracing Project” Packet Trace

  • tcpdump traffic collected from two internal network locations at LBNL
  • LBNL is a research institute with a medium-sized enterprise network
  • Only header information is released to the public for privacy reasons
  • Only the traffic to and from a given subnet is collected:

– Intersubnet traffic
– Wide area traffic between the enterprise network and the rest of the Internet
– NOT the intrasubnet traffic that remained within the subnet

  • The packet traces are divided into five datasets; each dataset:

– Has a couple of hours of activity
– Has 17.8 million to 64.7 million packets
– Includes 1,500–2,500 monitored hosts
– Covers the traffic communication of 18–22 internal subnets

  • IP addresses are carefully anonymized

– The same anonymized network ID is used for hosts belonging to the same internal subnet
– Except for scanners, as this might risk revealing the original host

  • We extracted packet, byte, address, and port counts

SLIDE 14

Control vs. Data: LBNL/ICSI Dataset

Aggregation interval: 60 seconds

– Packet count: control traffic 40%
– Byte count: control traffic 2.5%

Efficiency Scalability

SLIDE 15

Control vs. Data: LBNL/ICSI Dataset

Aggregation interval: 60 seconds

– Address count: control traffic 52%
– Port count: control traffic 52%

SLIDE 16

Attack Detection

– An attack that manifests itself in the control plane only
– An attack that manifests itself in both control and data planes

Packet Traces from 1999 DARPA Dataset

SLIDE 17

Periodic Behavior in Network Traffic

  • Periodic behavior arises in network traffic:

– Benign applications: E-mail, software updates
– Malicious applications: Botnets

  • Power Spectral Density (PSD) provides the power at different frequencies

Q: How significant should the peak be?

\[
\frac{\max_{1 \le k \le m} P_{xx}[k]}{\frac{1}{2m} \sum_{k=1}^{m} P_{xx}[k]}
\]
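One way to quantify how significant a spectral peak is can be sketched as the ratio of the largest periodogram ordinate to the average ordinate. The code below is an assumption-laden illustration, not the authors' implementation: the periodogram is taken as |X[k]|²/N after mean removal, m is taken as floor((N − 1)/2), and the factor of 2 in the normalization is assumed. The function names are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Periodogram ordinates P[k] = |X[k]|^2 / N for k = 1..m, with the mean removed."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    m = (n - 1) // 2
    ordinates = []
    for k in range(1, m + 1):
        # Direct DFT at bin k; adequate for the short sequences used here.
        xk = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered))
        ordinates.append(abs(xk) ** 2 / n)
    return ordinates


def peak_statistic(x):
    """Return (ratio of peak ordinate to half the mean ordinate, bin index of the peak)."""
    p = periodogram(x)
    m = len(p)
    k_peak = max(range(m), key=p.__getitem__) + 1
    g = p[k_peak - 1] / (sum(p) / (2 * m))
    return g, k_peak


# Hypothetical input: 256 samples at 10 Hz (100 ms intervals) of a sinusoid on bin 8.
n, fs = 256, 10.0
signal = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
g, k_peak = peak_statistic(signal)
print(f"peak at {k_peak * fs / n:.4f} Hz, statistic {g:.1f}")
```

For a pure sinusoid that falls exactly on one bin, all the power sits in a single ordinate and the ratio reaches its maximum of 2m (254 here). Noise or additional harmonics spread power over other bins and lower it.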

SLIDE 18

RESULTS

SLIDE 19

SLINGbot: 1st TinyP2P Botnet

  • 5 bots and 1 bot master in a mesh topology
  • TinyP2P protocol on port number 11375

– TinyP2P is used by real botnets

  • Each bot is pre-programmed to update its data every 3 seconds by contacting other bots
  • Run time: ~ 20 seconds
  • C2 traffic in each bot is similar

– Only the traffic of one of them is discussed

  • We use an aggregation interval of 100 ms
  • We select the false alarm probability α = 0.1%
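The slides quote detection thresholds (zα) for α = 0.1% without stating how they are computed. The sketch below assumes the large-sample test on the maximum periodogram ordinate (often attributed to Walker), in which each normalized ordinate is treated as chi-squared with two degrees of freedom under the no-periodicity hypothesis. The function name and the values of m are assumptions: m = 127 and m = 255 were chosen because they reproduce the thresholds 23.5 and 24.9 reported in the results.

```python
import math


def peak_threshold(alpha, m):
    """Threshold z such that the largest of m ordinates exceeds z with probability alpha.

    Assumes independent ordinates with P(ordinate > z) = exp(-z / 2) under the null.
    """
    # (1 - exp(-z/2))^m = 1 - alpha  =>  z = -2 ln(1 - (1 - alpha)^(1/m)).
    # log1p/expm1 keep precision when alpha / m is very small.
    return -2.0 * math.log(-math.expm1(math.log1p(-alpha) / m))


for m in (127, 255):
    print(m, round(peak_threshold(0.001, m), 1))
```

Under this assumption the threshold grows only logarithmically with the number of frequency bins, so observing traffic for longer raises the bar slowly while a truly periodic component keeps accumulating power at its peak.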

SLIDE 20

Periodic Behavior: 1st TinyP2P C2 Traffic

– g* = 61.7 > zα = 23.5; 313 mHz, 3.2 seconds
– g* = 70.7 > zα = 23.5; 313 mHz, 3.2 seconds

SLIDE 21

SLINGbot: 2nd TinyP2P Botnet

  • 5 bots and 1 bot master in a mesh topology
  • TinyP2P protocol on port number 11375

– TinyP2P is used by real botnets

  • Bots are pre-programmed to update their data by contacting other bots every 3, 4, 5, 6, 7, 3 seconds
  • Run time: ~ 35 seconds
  • The traffic of the 4-second and 6-second bots is discussed
  • We use an aggregation interval of 100 ms
  • We select the false alarm probability α = 0.1%

SLIDE 22

Duty Cycle: 2nd TinyP2P C2 Traffic

– g* = 101.4 > zα = 24.9; 273 mHz, 3.7 seconds
– g* = 77.0 > zα = 24.9; 176 mHz, 5.9 seconds
– g* is higher when the duty cycle is higher

SLIDE 23

Periodic Behavior: 2nd TinyP2P C2 Traffic

With Injected Random Poisson Noise (SNR= –6 dB)

– g* = 31.8 > zα = 24.9; 273 mHz, 3.7 seconds
– g* = 17.0 < zα = 24.9; no longer periodic

What about address count?

SLIDE 24

Bounding the Effect of Random Noise Traffic

[Plot: upper and lower bounds, expressed in terms of g*_x and the SNR, compared with the average over 2000 runs of added Poisson noise]

SLIDE 25

SLINGbot: IRC Botnet

  • 20 bots and 1 bot master in a star topology
  • IRC protocol on port number 6667
  • Each bot is pre-programmed to update its data every 5 seconds by contacting the bot master

  • Run time: ~ 40 seconds
  • C2 traffic in each bot is similar

– Only the traffic of one of them is discussed

  • We use an aggregation interval of 100 ms
  • We select the false alarm probability α = 0.1%

SLIDE 26

Periodic Behavior: IRC C2 Traffic

– g* = 40.8 > zα = 24.9; 195 mHz, 5.1 seconds
– g* = 46.2 > zα = 24.9; 195 mHz, 5.1 seconds

SLIDE 27

Periodic Behavior: IRC C2 Control Plane Traffic

Control plane traffic: packets 47%, bytes 2.5%

– g* = 35.7 > zα = 24.9; 195 mHz, 5.1 seconds
– g* = 34.9 > zα = 24.9; 195 mHz, 5.1 seconds

SLIDE 28

C2 Traffic with Background HTTP Traffic

HTTP traffic obtained from LBNL/ICSI

  • A bot master can attempt to evade detection by using port 80 (HTTP)

SLIDE 29

Periodic Behavior: 1st TinyP2P C2 Traffic

– g* = 61.7 > zα = 23.5; 313 mHz, 3.2 seconds
– g* = 70.7 > zα = 23.5; 313 mHz, 3.2 seconds

SLIDE 30

P2P C2 Traffic + HTTP 1 Traffic

– g* = 54.1 > zα = 23.5; 313 mHz, 3.2 seconds
– g* = 57.7 > zα = 23.5; 313 mHz, 3.2 seconds

SLIDE 31

P2P C2 Traffic + HTTP 2 Traffic

– g* = 21.4 < zα = 23.5; no longer periodic
– g* = 46.6 > zα = 23.5; 313 mHz, 3.2 seconds

SLIDE 32

P2P C2 Traffic + HTTP 3 Traffic

– g* = 14.6 < zα = 23.5; no longer periodic
– g* = 49.5 > zα = 23.5; 313 mHz, 3.2 seconds

SLIDE 33

P2P C2 Traffic + Total HTTP Traffic

– g* = 22.4 < zα = 23.5; no longer periodic
– g* = 35.8 > zα = 23.5; 313 mHz, 3.2 seconds

SLIDE 34

Limitations

  • Randomizing the period within a small range (e.g., 4–6 seconds)

– This can be modeled by a random phase
– Detectability will depend on:

  • How large the random phase is
  • The period's length and the duty cycle

  • Randomizing the period within a larger range

– The signal is no longer periodic
– This will succeed in evading the test
– However, it will limit the efficiency of the exchange of C2 channel traffic

  • Other applications have a periodic nature (e.g., E-mail)

– Can be solved with whitelisting
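The first evasion strategy can be illustrated with a short sketch that draws each check-in interval uniformly from a range instead of using a fixed period. The uniform distribution, the function name, and the seed parameter are assumptions made for illustration; the slides do not specify how a bot would randomize its period.

```python
import random


def jittered_beacon_times(run_time_s, min_period_s, max_period_s, seed=0):
    """Check-in times where each gap is drawn uniformly from [min_period_s, max_period_s]."""
    rng = random.Random(seed)   # seeded so the sketch is repeatable
    t, times = 0.0, []
    while t < run_time_s:
        times.append(t)
        t += rng.uniform(min_period_s, max_period_s)
    return times


# Hypothetical bot that waits 4 to 6 seconds between check-ins for 60 seconds.
times = jittered_beacon_times(run_time_s=60, min_period_s=4, max_period_s=6)
gaps = [b - a for a, b in zip(times, times[1:])]
print(len(times), "check-ins, gaps between", round(min(gaps), 2), "and", round(max(gaps), 2))
```

Widening the range spreads the spectral energy over more frequencies, which is why detection becomes harder, but it also makes the bot's updates less timely.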

SLIDE 35

Conclusions

  • Monitor network traffic behavior

– Scalability: aggregate traffic
– Efficiency: control plane traffic

  • Botnet detection

– Detect periodic behavior of C2 traffic
– Independent of the structure and communication protocol used in the botnet
– Does not require a priori knowledge (e.g., signature of botnet behavior)

  • Validation

– Botnet C2 traffic exhibits periodic behavior
– E-mail traffic exhibits periodic behavior (not shown in presentation)
– True whether we look at the control plane traffic only or the whole traffic
– Developed bounds on the effect of injected noise traffic
– Address count of C2 traffic resists background HTTP traffic
– Performance increases with a higher duty cycle, a shorter period, and/or a longer observation time

SLIDE 36

Current Work

  • Finished capturing and preprocessing a new dataset

– 50 full days' worth of traffic
– 11 TB of packets covering more than 10,000 hosts at KSU

  • Ongoing work

– Validating botnet detection results on the new dataset
– Applying prediction models to detect dissimilarities
– Studying LRD behavior of the new dataset

SLIDE 37

Published Work

  • Basil AsSadhan and José M. F. Moura, “An Efficient Method to Detect Periodic Behavior in Botnet Traffic by Analyzing Control Plane Traffic,” Journal of Advanced Research, 2013, available online at: http://dx.doi.org/10.1016/j.jare.2013.11.005.

  • Abdulmuneem Bashaiwth, Basil AsSadhan, Jalal Al-Muhtadi, and Saleh Alshebeili, “Efficient Detection of Real-World Botnets' Command and Control Channels Traffic,” in Proceedings of International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, April 1–3, 2014.

  • Rayan AlShaalan, Basil AsSadhan, Jalal Al-Muhtadi, Hesham Bin-Abbas, Fathi Sayed, and Saleh Alshebeili, “Constant False Alarm Rate Anomaly-Based Approach for Network Intrusion Detection,” in Proceedings of International Conference on High-capacity Optical Networks and Emerging/Enabling Technologies (HONET), Magosa, Cyprus, December 11–13, 2013.

  • Basil AsSadhan, José M. F. Moura, and David Lapsley, “Periodic Behavior in Botnet Command and Control Channels Traffic,” accepted at IEEE GLOBECOM, Honolulu, Hawaii, USA, Nov 30–Dec 4, 2009.

  • Basil AsSadhan, José M. F. Moura, David Lapsley, Christine Jones, and W. Timothy Strayer, “Detecting Botnets using Command and Control Traffic,” in Proceedings of IEEE International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, July 9–11, 2009.

  • Basil AsSadhan, Hyong Kim, José M. F. Moura, and Xiaohui Wang, “Network Traffic Behavior Analysis by Decomposition into Control and Data Planes,” in Proceedings of the 4th International Workshop on Security in Systems and Networks (SSN) in conjunction with IEEE IPDPS 2008, Miami, FL, USA, April 18, 2008.

SLIDE 38

Acknowledgments

  • Prof. José M. F. Moura
  • Dr. David Lapsley
  • Dr. W. Timothy Strayer
  • Dr. Alden Jackson
  • Ms. Christine Jones

SLIDE 39

Questions
