 
Botnet Detection through Analyzing Network Traffic using Statistical Signal Processing Methods
Basil AsSadhan
Department of Electrical Engineering & Center of Excellence in Information Assurance, King Saud University
April 13-14, 2014 – Jumada Al-Akhirah 13-14, 1435
KFUPM, Dhahran, Saudi Arabia
Approach
Analyze network traffic through statistical signal processing methods.
[Figure: Internet, firewall, and LAN; a decomposer splits the traffic into control and data, which feed an analyzer]
Outline
• Botnets
• Network Traffic Behavior Analysis
• Control and Data Planes
• Periodic Behavior in Network Traffic
• Results
• Conclusions and Current Work
Botnets
• Large networks of compromised computers (bots) controlled by bot masters
– Consist of an army of hundreds to tens of thousands of bots
– Some predict that 16–25% of Internet hosts are part of a botnet
• Pose a significant threat to network-based applications and communications
– Magnify the damage of attacks such as DDoS attacks and scanning
– Make identity theft, phishing, and e-mail spam more effective
• How big is the problem?
– Over 1 million victim computers in the USA [1]
– $20M in economic losses in the USA [1]
– 3.04 million distinct botnet computers [2]
– Responsible for 69% of spam e-mail
[1] FBI, 2007
[2] Symantec Global Internet Security Threat Report, April 2013
The Life Cycle of a Bot
1. The host is identified as a future bot (scanning)
2. The attacker gains access to it (Trojan horse)
3. The bot downloads the bot software
4. The bot receives botnet commands
5. The bot carries out the actual malicious activities
(Steps 3 and 4 take place over C2 communication)
Botnet C2 Communication
• Botnets rely on Command & Control (C2) communication channels
• A bot master uses C2 communication channels to:
– Command bots to execute attack activities
– Control bots to organize themselves and download malware code
• C2 communication is the coordination traffic that precedes an attack
• Detecting C2 => detecting bots before any target is attacked
• Problem: C2 traffic is low in volume and otherwise well behaved
• Bots are typically pre-programmed to check for updates every T seconds
• As a result, C2 traffic in many botnet variants exhibits periodic behavior per host-port pair
[Figure: example C2 traffic time series]
Network Traffic Behavior Analysis
• Understanding network traffic behavior helps to:
– Detect abnormal behavior
• In network security: attacks and malicious behavior
• In network management: failures and malfunctions
– Design and improve future networks
• Difficulties include:
– The large amount of traffic
– The continuous emergence of new high-bandwidth applications
Network Traffic Behavior Analysis (Cont.)
• Problem: high demand for methods that are:
– Scalable
– Efficient (accurate and fast)
• Objective:
– Reduce the amount of traffic to consider during analysis
– Develop accurate methods under this constraint
• Proposed solution:
1. Aggregate network traffic
2. Use control traffic as a surrogate for the whole traffic
3. Apply statistical signal processing methods (correlation, periodogram)
Scalability: Aggregate Traffic
• Preprocess packet traces to produce a discrete time sequence
– Over each time unit, aggregate the packets originating from or destined to a given host, host-port pair, or subnet
– Extract a count feature, e.g., packet, byte, address, or port count
– Coarse (multiple flows) instead of fine (single flow) granularity
[Figure: sources S1–S4 and destination D]
• Use discrete time series analysis to monitor the aggregate traffic:
– Pro: substantially reduces the amount of traffic to process, which reduces computation and time => higher efficiency and scalability
– Con: tracks fewer details, which may mean less knowledge => less accurate analysis
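The aggregation step can be sketched in Python as follows. This is a minimal sketch, not the author's implementation. It assumes packets are already parsed into records with hypothetical fields `ts`, `src`, `dst`, `sport`, `dport`, and `length`. The address count is taken as the number of distinct remote addresses per interval, and the port count as the number of distinct remote ports. Both definitions are assumptions, as is the function name `aggregate`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet header record (field names are assumptions)."""
    ts: float      # timestamp in seconds
    src: str       # source address
    dst: str       # destination address
    sport: int     # source port
    dport: int     # destination port
    length: int    # packet length in bytes


def aggregate(packets, host, interval, t0, t1, host_port=None):
    """Aggregate packets to/from `host` (optionally a single host-port pair)
    into per-interval count features over the time window [t0, t1)."""
    n_bins = math.ceil((t1 - t0) / interval)
    pkt = [0] * n_bins
    byt = [0] * n_bins
    addrs = [set() for _ in range(n_bins)]
    ports = [set() for _ in range(n_bins)]
    for p in packets:
        if not (t0 <= p.ts < t1):
            continue
        # Orient the packet relative to the monitored host.
        if p.src == host:
            local_port, peer, peer_port = p.sport, p.dst, p.dport
        elif p.dst == host:
            local_port, peer, peer_port = p.dport, p.src, p.sport
        else:
            continue  # neither originates from nor is destined to the host
        if host_port is not None and local_port != host_port:
            continue
        i = min(int((p.ts - t0) // interval), n_bins - 1)  # guard float rounding
        pkt[i] += 1
        byt[i] += p.length
        addrs[i].add(peer)       # assumed: distinct remote addresses
        ports[i].add(peer_port)  # assumed: distinct remote ports
    return {
        "packet": pkt,
        "byte": byt,
        "address": [len(s) for s in addrs],
        "port": [len(s) for s in ports],
    }
```

Aggregating per subnet instead of per host would only change the membership test. Every count feature is produced in the same single pass over the packets.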
Scalability: Control and Data Planes
• Use control traffic as a surrogate for the whole traffic
– Control traffic's volume is 4–8% of total traffic
– Minimizes the amount of traffic to consider
– Higher efficiency and scalability
• Assumption: data traffic generation is based on control traffic generation
– They should have similar behaviors during benign use
– The relationship might differ during abnormal behavior
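If data traffic tracks control traffic during benign use, one simple way to watch that relationship is a windowed correlation between the two count sequences. The sketch below computes the Pearson correlation over non-overlapping windows with NumPy. The function name, the windowing scheme, and the choice of Pearson correlation are all assumptions for illustration. This is not the method evaluated in these slides.

```python
import numpy as np


def windowed_correlation(control, data, window):
    """Pearson correlation between control and data count sequences,
    computed over consecutive non-overlapping windows of `window` samples."""
    c = np.asarray(control, dtype=float)
    d = np.asarray(data, dtype=float)
    if c.shape != d.shape:
        raise ValueError("control and data sequences must have the same length")
    out = []
    for start in range(0, len(c) - window + 1, window):
        cw = c[start:start + window]
        dw = d[start:start + window]
        if cw.std() == 0 or dw.std() == 0:
            out.append(float("nan"))  # correlation is undefined for a constant window
        else:
            out.append(float(np.corrcoef(cw, dw)[0, 1]))
    return out
```

The slides expect the control-data relationship to change during abnormal behavior. A window whose correlation falls well below its usual level would therefore be a candidate for closer inspection.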
Control and Data Planes (Cont.)
• For enterprise LAN traffic:
– Control traffic: sets up, maintains, or tears down a connection
– Data traffic: the actual transmission of data
• TCP packets: use the flag and sequence number fields
– Control packets: SYN, FIN, RST, and bare ACKs
– Data packets: all other packets
• UDP packets: application dependent
• ICMP packets: all control
• UDP and ICMP packets are not considered
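The TCP control/data split can be sketched as a small classifier. The flag values are the standard TCP header bits. Identifying a bare ACK by a zero payload length is a simplifying assumption, since the slides use the flag and sequence number fields. The names `is_control` and `control_share` are hypothetical.

```python
# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def is_control(flags, payload_len):
    """Control packet: SYN, FIN, or RST set, or a bare ACK.
    Assumption: a bare ACK is recognized by a zero payload length."""
    if flags & (SYN | FIN | RST):
        return True
    return bool(flags & ACK) and payload_len == 0


def control_share(packets):
    """Fraction of packets and of bytes that are control traffic.
    `packets` is an iterable of (flags, payload_len, wire_len) tuples."""
    n_total = n_ctrl = b_total = b_ctrl = 0
    for flags, payload_len, wire_len in packets:
        n_total += 1
        b_total += wire_len
        if is_control(flags, payload_len):
            n_ctrl += 1
            b_ctrl += wire_len
    if n_total == 0:
        return 0.0, 0.0
    return n_ctrl / n_total, b_ctrl / b_total
```

Control packets are mostly header-only. Their share of bytes is therefore much smaller than their share of packets, the same pattern as the packet and byte counts reported for the LBNL/ICSI traces.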
Preprocessing: Network Traffic
1. Select a suitable time unit to aggregate packets
– Objective: high count variability
– Depends on the packet rate of the traffic
[Figure: number of packets vs. time for a small, a large, and a suitable time unit]
2. Extract count features to produce discrete time sequences
– Packet count
– Distinct address count
– Byte count
– Distinct port count
[Figure: header fields used: length, src addr, dest addr, src port #, dest port #]
“LBNL/ICSI Enterprise Tracing Project” Packet Trace
• tcpdump traffic collected from two internal network locations at LBNL
• LBNL is a research institute with a medium-sized enterprise network
• Only header information is released to the public, for privacy reasons
• Only the traffic entering or leaving a given subnet is collected:
– Intersubnet traffic
– Wide-area traffic between the enterprise network and the rest of the Internet
– NOT the intrasubnet traffic that remained within the subnet
• The packet traces are divided into five datasets; each dataset:
– Has a couple of hours of activity
– Has 17.8 million to 64.7 million packets
– Includes 1,500–2,500 monitored hosts
– Covers the traffic of 18–22 internal subnets
• IP addresses are carefully anonymized
– The same anonymized network ID is used for hosts belonging to the same internal subnet
– Except for scanners, as this might risk revealing the original host
• We extracted packet, byte, address, and port counts
Control vs. Data: LBNL/ICSI Dataset
• Packet count: control traffic 40% (efficiency)
• Byte count: control traffic 2.5% (scalability)
• Aggregation interval: 60 seconds
Control vs. Data: LBNL/ICSI Dataset (Cont.)
• Address count: control traffic 52%
• Port count: control traffic 52%
• Aggregation interval: 60 seconds
Attack Detection
[Figure panels: an attack that manifests itself in the control plane only; an attack that manifests itself in both the control and data planes]
Packet traces from the 1999 DARPA dataset
Periodic Behavior in Network Traffic
• Periodic behavior arises in network traffic:
– Benign applications: e-mail, software updates
– Malicious applications: botnets
• The power spectral density (PSD) gives the power at each frequency
• Q: How significant should the peak be?
• Test statistic over the periodogram ordinates P_xx[k], k = 0, …, m−1:
$$ g^* = \frac{\max_{0 \le k \le m-1} P_{xx}[k]}{\frac{1}{2m}\sum_{k=0}^{m-1} P_{xx}[k]} $$
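Here is a minimal NumPy sketch of the peak test. It makes three assumptions:
• The periodogram is P_xx[k] = |X[k]|²/N, computed after removing the sample mean.
• The sequence is zero-padded to the next power of two, 2m.
• The threshold is z_α = 2 ln(m/α).

The threshold is our assumption, not something the slides state. It is the chi-square (2 degrees of freedom) tail bound with a Bonferroni correction over m frequencies. It does reproduce the thresholds reported in the results: 23.5 for m = 128 and 24.9 for m = 256. The name `periodicity_test` is hypothetical.

```python
import math

import numpy as np


def periodicity_test(x, alpha=0.001, fs=1.0):
    """Normalized periodogram peak test for a count sequence `x` (len >= 2).

    Returns (g_star, z_alpha, k_peak, f_peak), where
    g_star = max_k P[k] / ((1/(2m)) * sum_k P[k]) over 0 <= k <= m-1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = 1 << (n - 1).bit_length()      # next power of two >= n (assumed padding)
    m = nfft // 2
    X = np.fft.fft(x - x.mean(), n=nfft)   # remove the mean so the DC term vanishes
    P = np.abs(X[:m]) ** 2 / n             # periodogram ordinates P_xx[k], k = 0..m-1
    g_star = P.max() / (P.sum() / (2 * m))
    z_alpha = 2 * math.log(m / alpha)      # assumed threshold z_alpha = 2 ln(m/alpha)
    k_peak = int(P.argmax())
    f_peak = k_peak * fs / nfft            # peak frequency, same units as fs
    return g_star, z_alpha, k_peak, f_peak
```

The experiments use a 100 ms aggregation interval, so fs = 10 Hz. A peak at bin k then corresponds to k·10/nfft Hz, or a period of nfft/(10k) seconds.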
RESULTS
SLINGbot: 1st TinyP2P Botnet
• 5 bots and 1 bot master in a mesh topology
• TinyP2P protocol on port number 11375
– TinyP2P is used by real botnets
• Each bot is pre-programmed to update its data every 3 seconds by contacting other bots
• Run time: ~20 seconds
• C2 traffic in each bot is similar
– Only the traffic of one of them is discussed
• We use an aggregation interval of 100 ms
• We select the false alarm probability α = 0.1%
Periodic Behavior: 1st TinyP2P C2 Traffic
• Panel 1: g* = 61.7 > z_α = 23.5; peak at 313 mHz (period 3.2 s)
• Panel 2: g* = 70.7 > z_α = 23.5; peak at 313 mHz (period 3.2 s)
SLINGbot: 2nd TinyP2P Botnet
• 5 bots and 1 bot master in a mesh topology
• TinyP2P protocol on port number 11375
– TinyP2P is used by real botnets
• Bots are pre-programmed to update their data by contacting other bots every 3, 4, 5, 6, 7, and 3 seconds, respectively
• Run time: ~35 seconds
• The traffic of the 4-second and 6-second bots is discussed
• We use an aggregation interval of 100 ms
• We select the false alarm probability α = 0.1%
Duty Cycle: 2nd TinyP2P C2 Traffic
• g* is higher when the duty cycle is higher
• 4-second bot: g* = 101.4 > z_α = 24.9; peak at 273 mHz (period 3.7 s)
• 6-second bot: g* = 77.0 > z_α = 24.9; peak at 176 mHz (period 5.9 s)
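The duty-cycle effect can be seen by comparing synthetic pulse trains that share a period but differ in their "on" length. A single-sample pulse spreads its power evenly over all harmonics of the beacon frequency. A wider pulse concentrates more of its power in the fundamental, which raises g*. The signals here are made up (period 32 samples, 256 samples in total) and are not the SLINGbot traces. `pulse_train` and `g_star` are hypothetical helpers. `g_star` uses the normalization of the g* statistic from the Periodic Behavior slide, with zero-padding to a power of two assumed.

```python
import numpy as np


def pulse_train(n, period, on):
    """Hypothetical beacon: `on` consecutive active samples at the start of each period."""
    return ((np.arange(n) % period) < on).astype(float)


def g_star(x):
    """Normalized periodogram peak over k = 0..m-1 (mean removed, zero-padded to 2m)."""
    x = np.asarray(x, dtype=float)
    nfft = 1 << (len(x) - 1).bit_length()
    m = nfft // 2
    P = np.abs(np.fft.fft(x - x.mean(), n=nfft)[:m]) ** 2 / len(x)
    return P.max() / (P.sum() / (2 * m))


for on in (1, 4, 8, 16):
    duty = on / 32
    print(f"duty cycle {duty:.3f}: g* = {g_star(pulse_train(256, 32, on)):.1f}")
```

In this toy setup g* rises monotonically with the duty cycle, from about 17 for a single-sample pulse (256/15 exactly, since the power is split evenly over 15 harmonics) to about 208 at a 50% duty cycle.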
Periodic Behavior: 2nd TinyP2P C2 Traffic with Injected Random Poisson Noise (SNR = -6 dB)
• What about address count?
• Panel 1: g* = 31.8 > z_α = 24.9; peak at 273 mHz (period 3.7 s)
• Panel 2: g* = 17.0 < z_α = 24.9; no longer periodic
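Noise injection at a target SNR can be sketched as follows. The slides do not define SNR precisely. This sketch assumes SNR is the ratio of the count sequence's variance to the variance of the added Poisson noise. A Poisson variable's variance equals its rate λ, so λ = var(x) / 10^(SNR_dB/10). The function name `add_poisson_noise` is hypothetical.

```python
import numpy as np


def add_poisson_noise(x, snr_db, rng=None):
    """Add independent Poisson counts to sequence `x` at the given SNR (in dB).

    Assumes SNR = var(x) / var(noise); a Poisson(lam) variable has variance lam.
    Returns the noisy sequence and the noise rate lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    lam = x.var() / 10 ** (snr_db / 10)
    noise = rng.poisson(lam, size=x.shape)
    return x + noise, lam


x = np.tile([0.0, 4.0], 128)           # toy count sequence with variance 4
y, lam = add_poisson_noise(x, snr_db=-6, rng=np.random.default_rng(0))
```

At -6 dB the noise rate here is about 15.9 counts per interval, so the noise variance is roughly four times that of the signal. To study the averaged behavior, repeat the injection with fresh random draws and compute g* on each run.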
Bounding the Effect of Random Noise Traffic
• Averaging 2000 runs of added Poisson noise
• Upper bound:
$$ \frac{g^*_x + \mathrm{SNR}^{-1}}{1 + \mathrm{SNR}^{-1}} $$
• Lower bound:
$$ \frac{g^*_x}{1 + \mathrm{SNR}^{-1}} $$
SLINGbot: IRC Botnet
• 20 bots and 1 bot master in a star topology
• IRC protocol on port number 6667
• Each bot is pre-programmed to update its data every 5 seconds by contacting the bot master
• Run time: ~40 seconds
• C2 traffic in each bot is similar
– Only the traffic of one of them is discussed
• We use an aggregation interval of 100 ms
• We select the false alarm probability α = 0.1%
Periodic Behavior: IRC C2 Traffic
• Panel 1: g* = 40.8 > z_α = 24.9; peak at 195 mHz (period 5.1 s)
• Panel 2: g* = 46.2 > z_α = 24.9; peak at 195 mHz (period 5.1 s)
Periodic Behavior: IRC C2 Control Plane Traffic
• Control plane traffic: 47% of packets, 2.5% of bytes
• Panel 1: g* = 35.7 > z_α = 24.9; peak at 195 mHz (period 5.1 s)
• Panel 2: g* = 34.9 > z_α = 24.9; peak at 195 mHz (period 5.1 s)