Third Italian Workshop on PRIvacy and SEcurity – “PRISE” – Rome, 20 October 2008
Effective features for detecting IRC botnets
Claudio Mazzariello, Carlo Sansone
Dipartimento di Informatica e Sistemistica
Claudio Mazzariello, Carlo Sansone – Effective features for detecting IRC botnets
Problem Statement
- Botnet
A network of infected hosts, named bots, under the control of an operator named botmaster
Control performed by using a Command & Control channel
- Centralized (e.g. IRC, HTTP, ...)
- Distributed (e.g. P2P...)
Commands out of a quite large and flexible set can be issued by the botmaster to each bot
Motivation of this work
- Botnets keep spreading
- Botnets are able to perform many malicious actions
Spam, ID theft, click fraud (e.g. Google AdSense abuse), cracking, malware spreading, DDoS, traffic sniffing, keylogging, polls/statistics manipulation, …
- Botnets involve economic interests
More dangerous than older attack types
Contribution
- Definition of a model of normal and botnet-related IRC channel usage
- Definition of an architecture exploiting such a model for botnet detection
- IRC user behavior classification aimed at botnet detection by means of pattern recognition techniques
Presentation outline
- An introduction to botnets
- Details on IRC botnets
- The proposed detection approach
IRC user behavior model
Detection system reference architecture
- Experimental evaluation
Centralized botnet's lifecycle
- bot-herder configures initial bot parameters and C&C details
- register IP at DNS for rendezvous
- bot-herder launches or seeds new bot(s)
- bots spreading, botnet growing
Vulnerability discovery and exploitation
Malicious code download
DNS lookup for rendezvous
Join the C&C
Receive commands from the botmaster
- losing bots (stasis), botnet not growing
- abandon botnet and sever traces
- unregister DDNS
Botnet Statistics
- 60% are IRC bots
70% of all the bots connect to a single IRC server
- 57,000 active bots per day for the first 6 months of 2006 (Symantec)
- 4.7 million distinct computers being actively used in botnets
- Most botnets are managed by a single server (up to 15,000 bots)
- Mocbot seized control of more than 7,700 machines within 24 hours
Why IRC?
- Oldest and most popular IM
Bots were commonly used by channel operators for management and monitoring purposes
- Not owned by anyone – public
- Defined in RFC 1459
- Text based
- Designed for both point-to-point and point-to-multipoint communication
one-to-one, or one-to-group chat
- flexible, open-source protocol
- Potentially able to manage a high number of clients
- Grants anonymity for the botmaster
Centralized C&C
- Easier to manage and use
- Easier to disrupt
- How do the bots know where the C&C is?
Hardcoded IP based rendezvous
- easily uncovered
- C&C needs replacement after disruption
- All Bots need replacement
Domain names used for rendezvous
- DNS RR can be updated to current C&C IP
- Bots can dynamically point to the correct C&C IP
Reference framework
- Port based application protocol detection
- RFC based IRC decoder
- Model = representative features
Each IRC channel is represented by a feature vector, representing its status
Feature vectors are updated at each event occurring in the corresponding IRC channel
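The decoder and the per-event update described above might look roughly like the following sketch. It assumes the RFC 1459 message layout (`[:prefix] command params [:trailing]`). `IRC_PORTS`, `ChannelState`, `parse_irc_line` and `update_channels` are hypothetical names chosen for illustration, and the four tracked quantities are only a small subset of what a real feature vector would hold.

```python
from collections import defaultdict
from dataclasses import dataclass, field

IRC_PORTS = {6667}  # assumption: conventional IRC port, not stated in the slides


def parse_irc_line(line):
    """Split one RFC 1459 message into (prefix, command, params)."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    # The last parameter may be introduced by " :" and contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        return prefix, "", []
    params = parts[1:] + ([trailing] if sep else [])
    return prefix, parts[0].upper(), params


@dataclass
class ChannelState:
    """Hypothetical per-channel state from which a feature vector is read."""
    users: set = field(default_factory=set)
    joins: int = 0
    messages: int = 0
    vocabulary: set = field(default_factory=set)

    def as_vector(self):
        return [len(self.users), self.joins, self.messages, len(self.vocabulary)]


def update_channels(channels, line):
    """Update the state of the channel that the event on `line` belongs to."""
    prefix, command, params = parse_irc_line(line)
    if command not in ("JOIN", "PRIVMSG") or not params:
        return
    if not params[0].startswith(("#", "&")):  # ignore private messages
        return
    state = channels[params[0]]
    nick = prefix.split("!", 1)[0] if prefix else ""
    if nick:
        state.users.add(nick)
    if command == "JOIN":
        state.joins += 1
    elif len(params) > 1:
        state.messages += 1
        state.vocabulary.update(params[1].lower().split())


channels = defaultdict(ChannelState)
for event in (":a!u@h JOIN #x", ":b!u@h JOIN #x", ":a!u@h PRIVMSG #x :hello world hello"):
    update_channels(channels, event)
print(channels["#x"].as_vector())  # [2, 2, 1, 2]
```

Keeping one state object per channel means each event touches a single entry, so the vector can be read at any time without reprocessing the log.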
Intuitions about IRC based botnets
- Bursty channel activity
After a command is issued, bots may respond at once, then go quiet
- Limited vocabulary
- Sentence structure
May resemble a shell command
The same recurring structure may be found in many sentences
- Disproportion between user and control activity in a channel
- “strange” words used for communication
Disproportion of consonants and vowels in words used for chatting
- Language dependent
- Changes and structure of chat room topic
- Unusual nicknames
Completely random OR unexpectedly regular
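Two of the intuitions above lend themselves to simple measurements. The sketch below is illustrative only: `consonant_vowel_ratio` and `max_burst` are hypothetical helpers, the vowel set assumes ASCII text (the slide itself notes that this cue is language dependent), and the one-second window is an arbitrary choice.

```python
VOWELS = set("aeiou")  # assumption: ASCII text; other languages need another set


def consonant_vowel_ratio(word):
    """Consonants per vowel in a word; high values hint at 'strange' words."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    vowels = sum(c in VOWELS for c in letters)
    consonants = len(letters) - vowels
    return consonants / max(vowels, 1)  # avoid division by zero


def max_burst(timestamps, window=1.0):
    """Largest number of messages seen within any `window` seconds."""
    ts = sorted(timestamps)
    best = start = 0
    for end, t in enumerate(ts):
        while t - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


print(consonant_vowel_ratio("hello"))            # 1.5
print(max_burst([0.0, 0.1, 0.2, 10.0, 20.0]))    # 3
```

A channel in which many bots answer the same command would be expected to show a large `max_burst` relative to its user count, while ordinary conversation tends to spread messages out.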
IRC channel features
- Users Number:
total number of users in the channel
- Average words number:
average number of unique words in a sentence
- Average/Variance of Channel Dictionary Cardinality:
Mean and variance of the vocabulary’s cardinality
- Unusual Nicknames*
- Equal Answers:
number of sentences with a common ordered subset of words
- Control Commands Number:
count of channel control commands issued
- Join Number:
JOIN rate in the channel
- SetMode Number:
SetMode rate in the channel
- Nickname Changes:
count of nickname changes in a channel
- Ping Number:
PING rate in the channel
- IRC Commands Number:
overall IRC command rate
- Active Users Number:
number of users active in the channel
*J. Goebel and T. Holz. Rishi: Identify bot contaminated hosts by IRC nickname evaluation. In HotBots’07: Proceedings of the First Workshop on Hot Topics in Understanding Botnets, pages 8–8, Berkeley, CA, USA, 2007. USENIX Association.
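As an illustration of two of the vocabulary features listed above, the following sketch computes them from the sentences seen in one channel. The slides do not say how the dictionary cardinality is sampled; this sketch assumes it is recorded after every sentence. Function names are hypothetical, and the input is assumed to be non-empty.

```python
from statistics import mean, pvariance


def average_unique_words(sentences):
    """Average number of unique words per sentence."""
    return mean(len(set(s.lower().split())) for s in sentences)


def dictionary_cardinality_stats(sentences):
    """Mean and variance of the channel vocabulary size.

    Assumption: the vocabulary size is sampled after each sentence.
    """
    vocabulary, sizes = set(), []
    for sentence in sentences:
        vocabulary.update(sentence.lower().split())
        sizes.append(len(vocabulary))
    return mean(sizes), pvariance(sizes)


chat = [".scan start", ".scan start", ".scan stop"]
print(average_unique_words(chat))
print(dictionary_cardinality_stats(chat))
```

In a command-driven channel the vocabulary stops growing quickly, so the mean stays small and the variance stays close to zero.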
Experimental Setup
- Data collection
Botnet related traffic from the Georgia Institute of Technology network
Normal IRC chats logged from the University of Napoli network
- Three datasets
50,000 samples (25,000 normal + 25,000 botnet-related)
- Small, evenly split
149,999 samples (75,010 normal + 74,989 botnet-related)
- Large, evenly split
165,000 samples (150,000 normal + 15,000 botnet-related)
- Large, more realistic distribution of tuples
- Selected algorithms
SVM (Support Vector Machine) – very “popular”
J48 (Decision Tree) – very “quick”
- Performance evaluation
10-fold cross validation
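For reference, 10-fold cross validation partitions the samples into ten folds and uses each fold once for testing while training on the other nine. The sketch below is a plain, non-stratified version written for illustration; `k_fold_indices` is a hypothetical helper, and the tool actually used for the experiments may split the data differently.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


for train, test in k_fold_indices(50_000):
    pass  # train a classifier on `train`, evaluate it on `test`
```

With the 50,000-sample dataset each round trains on 45,000 samples and tests on the remaining 5,000.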
Classification algorithms
- SVM – Kernel based method
Search for hyperplanes effectively separating data points
Support vectors for providing better prediction performance
Non-linearly separable data can be transformed by means of a kernel function into a space more suitable for linear separability
Separation hyperplane search is performed in the transformed space
[Figure: mapping φ(·) from input space to feature space; labels r, ρ, x, x′]
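In standard SVM notation, which is not taken from the slides, the kernel replaces the inner product in the transformed space, and the classifier is a weighted sum over the support vectors. The symbols α_i (support vector weights), y_i (class labels, here botnet or normal) and b (bias) are assumptions of this sketch.

```latex
K(\mathbf{x}, \mathbf{x}') = \varphi(\mathbf{x})^{\top} \varphi(\mathbf{x}'),
\qquad
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big)
```

Because only K appears in the decision function, the mapping φ never has to be computed explicitly.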
Classification algorithms
- J48 – Decision tree
Each attribute of the data can be used to make a decision which splits the data-set into smaller subsets
The normalized information gain is measured
The attribute generating the highest normalized information gain is chosen
The algorithm is recursively applied to the subsets
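The attribute selection step above can be sketched as follows. J48 is Weka's implementation of C4.5, where normalized information gain is usually called gain ratio. This simplified version handles categorical attributes only (C4.5 also thresholds numeric ones), and the attribute names in the example are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum(n / total * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by split entropy."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Pick the attribute (dict key) with the highest gain ratio."""
    return max(rows[0], key=lambda a: gain_ratio([r[a] for r in rows], labels))


# Hypothetical toy data: a small vocabulary separates the classes perfectly.
rows = [
    {"small_vocab": True, "many_users": True},
    {"small_vocab": True, "many_users": False},
    {"small_vocab": False, "many_users": True},
    {"small_vocab": False, "many_users": False},
]
labels = ["botnet", "botnet", "normal", "normal"]
print(best_attribute(rows, labels))  # small_vocab
```

A full tree builder would call `best_attribute` on each subset produced by the chosen split until the subsets are pure or no attribute yields any gain.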
Experimental results
Algorithm: SVM, J48
Samples (for each algorithm): 50000, 149999, 165000
False alarm rate: < 0.001
Missed detection rate:
- Most representative features
Limited vocabulary cardinality
Limited sentence variability
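The two metrics reported above are not defined in the slides. The sketch below assumes the usual definitions: false alarm rate is the fraction of normal samples flagged as botnet, and missed detection rate is the fraction of botnet samples classified as normal. `error_rates` is a hypothetical helper.

```python
def error_rates(y_true, y_pred, positive="botnet"):
    """Return (false_alarm_rate, missed_detection_rate).

    Assumed definitions: FP / (FP + TN) and FN / (FN + TP).
    """
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(t != positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    negatives = sum(t != positive for t, _ in pairs)
    positives = len(pairs) - negatives
    false_alarm = false_pos / negatives if negatives else 0.0
    missed = false_neg / positives if positives else 0.0
    return false_alarm, missed


truth = ["normal"] * 4 + ["botnet"] * 4
guess = ["normal", "normal", "normal", "botnet",
         "botnet", "botnet", "normal", "normal"]
print(error_rates(truth, guess))  # (0.25, 0.5)
```

Reporting both rates matters for the unbalanced 165,000-sample dataset, where overall accuracy alone could hide a high missed detection rate.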
Conclusions
- Promising model for botnet activity detection
- Tested on “real” data
Results hopefully valid in a general scenario
- Model works with both a very reliable and a very quick classifier
- Effective classification performed on a per-tuple basis
Botnet detection accuracy within strict performance boundaries
Future work
- Feature selection for faster training and detection
- Exploitation of “pure” anomaly detection techniques
Malicious traffic no longer necessary for training
- Further model refinement
- Further testing
More data available on heterogeneous IRC servers
- Exploitation of state-of-the-art application layer protocol recognition techniques
Blind protocol identification overcomes the limitations of port-based methods
- Perform classification on a per-channel basis
Try to reduce the number of errors
- Start thinking about complementary solutions for encrypted IRC sessions