
Effective features for detecting IRC botnets



  1. Effective features for detecting IRC botnets
     Claudio Mazzariello, Carlo Sansone
     Dipartimento di Informatica e Sistemistica, University of Napoli Federico II
     via Claudio 21, 80125 Napoli (Italy)
     {claudio.mazzariello, carlo.sansone}@unina.it
     Terzo workshop italiano su PRIvacy e SEcurity ("PRISE"), Rome, 20 October 2008

  2. Problem Statement
     • Botnet
        • A network of infected hosts, named bots, under the control of an operator named botmaster
        • Control is performed by using a Command & Control (C&C) channel
           • Centralized (e.g. IRC, HTTP, ...)
           • Distributed (e.g. P2P, ...)
        • Commands out of a quite large and flexible set can be issued by the botmaster to each bot
     Claudio Mazzariello, Carlo Sansone – Effective features for detecting IRC botnets

  3. Motivation of this work
     • Botnets keep spreading
     • Botnets are able to perform many malicious actions
        • Spam
        • ID theft
        • Click fraud (e.g. Google AdSense abuse)
        • Cracking
        • Malware spreading
        • DDoS
        • Traffic sniffing
        • Keylogging
        • Polls/statistics manipulation
        • …
     • Botnets involve economic interests
     • More dangerous than older attack types

  4. Contribution
     • Definition of a model of normal and botnet-related IRC channel usage
     • Definition of an architecture exploiting such a model for botnet detection
     • IRC user behavior classification aimed at botnet detection by means of pattern recognition techniques

  5. Presentation outline
     • An introduction to botnets
     • Details on IRC botnets
     • The proposed detection approach
        • IRC user behavior model
        • Detection system reference architecture
     • Experimental evaluation

  6. Centralized botnet's lifecycle
     • Bot-herder configures initial bot parameters and C&C details
     • Register IP at DNS for rendezvous
     • Bot-herder launches or seeds new bot(s): bots spreading, botnet growing
        • Vulnerability discovery and exploitation
        • Malicious code download
        • DNS lookup for rendezvous
        • Join the C&C
        • Receive commands from the botmaster
     • Losing bots (stasis), botnet not growing
     • Abandon botnet and sever traces
     • Unregister DDNS

  7. Botnet Statistics
     • 60% are IRC bots
     • 70% of all the bots connect to a single IRC server
     • 57,000 active bots per day for the first 6 months of 2006 (Symantec)
     • 4.7 million distinct computers being actively used in botnets
     • Most botnets are managed by a single server (up to 15,000 bots)
     • Mocbot seized control of more than 7,700 machines within 24 hours

  8. Why IRC?
     • Oldest and most popular IM
        • Bots were commonly used by channel operators for management and monitoring purposes
     • Not owned by anyone (public)
        • Defined in RFC 1459
        • Text based
     • Designed for both point-to-point and point-to-multipoint communication
        • One-to-one or one-to-group chat
        • Flexible, open-source protocol
     • Potentially able to manage a high number of clients
     • Grants anonymity for the botmaster
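
Because IRC is text based, a decoder can split each protocol line with plain string operations. The following is a minimal Python sketch of the RFC 1459 message layout (an optional `:prefix`, a command, space-separated parameters, and an optional trailing parameter introduced by ` :`). The names `IrcMessage` and `parse_irc_line` are hypothetical and do not come from the presentation.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]   # sender, e.g. "nick!user@host"; None if absent
    command: str            # e.g. "PRIVMSG", "JOIN", "PING"
    params: List[str]       # middle parameters plus the trailing one, if any


def parse_irc_line(line: str) -> IrcMessage:
    """Split one IRC protocol line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # The prefix runs up to the first space.
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter
    # that may itself contain spaces (typically the chat text).
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("empty IRC message")
    params = parts[1:]
    if sep:
        params.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), params)
```

A real decoder would also need to handle numeric replies, length limits and malformed input, which this sketch leaves out.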

  9. Centralized C&C
     • Easier to manage and use
     • Easier to disrupt
     • How do the bots know where the C&C is?
        • Hardcoded IP based rendezvous
           • Easily uncovered
           • C&C needs replacement after disruption
           • All bots need replacement
        • Domain names used for rendezvous
           • DNS RR can be updated to current C&C IP
           • Bots can dynamically point to the correct C&C IP

  10. Reference framework
     • Port based application protocol detection
     • RFC based IRC decoder
     • Model = representative features
     • Each IRC channel is represented by a feature vector, representing its status
     • Feature vectors are updated at each event occurring in the corresponding IRC channel
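
The per-channel state described on this slide can be sketched as one object per channel that is updated on every decoded event. This is only an illustration under stated assumptions: the class `ChannelState`, the function `handle_event`, the simplified event format (command, nickname, text) and the four counters are all hypothetical, chosen to show the update-per-event idea rather than the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChannelState:
    """Running status of one IRC channel (illustrative subset of features)."""
    users: Set[str] = field(default_factory=set)
    joins: int = 0
    messages: int = 0
    vocabulary: Set[str] = field(default_factory=set)

    def update(self, command: str, nick: str, text: str = "") -> None:
        """Apply one decoded IRC event to the channel status."""
        if command == "JOIN":
            self.users.add(nick)
            self.joins += 1
        elif command in ("PART", "QUIT"):
            self.users.discard(nick)
        elif command == "PRIVMSG":
            self.messages += 1
            self.vocabulary.update(text.lower().split())

    def feature_vector(self) -> Dict[str, float]:
        """Snapshot of the current status as named numeric features."""
        return {
            "users": float(len(self.users)),
            "joins": float(self.joins),
            "messages": float(self.messages),
            "vocabulary_size": float(len(self.vocabulary)),
        }


# One state object per channel, created on first use.
channels: Dict[str, ChannelState] = defaultdict(ChannelState)


def handle_event(channel: str, command: str, nick: str,
                 text: str = "") -> Dict[str, float]:
    """Update the channel's state and return its refreshed feature vector."""
    state = channels[channel]
    state.update(command, nick, text)
    return state.feature_vector()
```

Each call to `handle_event` yields one feature vector, which matches the slide's statement that vectors are refreshed at every channel event.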

  11. Intuitions about IRC based botnets
     • Bursty channel activity
        • After a command is issued, bots may respond at once, then be quiet
     • Limited vocabulary
     • Sentence structure
        • May resemble a shell command
        • The same recurring structure may be found in many sentences
     • Disproportion between user and control activity in a channel
     • "Strange" words used for communication
        • Disproportion of consonants and vowels in words used for chatting
           • Language dependent
     • Changes and structure of chat room topic
     • Unusual nicknames
        • Completely random OR
        • Unexpectedly regular
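
One of the intuitions above, the disproportion of consonants and vowels, is easy to turn into a number. The sketch below is an assumption-laden illustration: the function name `consonant_vowel_ratio` is hypothetical, and the vowel set covers only unaccented Latin letters, which reflects the slide's remark that the measure is language dependent.

```python
VOWELS = set("aeiou")  # assumption: unaccented Latin-alphabet vowels only


def consonant_vowel_ratio(text: str) -> float:
    """Ratio of consonant letters to vowel letters in `text`.

    Returns float('inf') when the text has consonants but no vowels,
    and 0.0 when it contains no ASCII letters at all.
    """
    letters = [c for c in text.lower() if c.isascii() and c.isalpha()]
    vowels = sum(1 for c in letters if c in VOWELS)
    consonants = len(letters) - vowels
    if vowels == 0:
        return float("inf") if consonants else 0.0
    return consonants / vowels
```

Natural-language chat tends to keep this ratio in a narrow band, so values far outside it may hint at machine-generated tokens; any concrete threshold would have to be tuned on data.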

  12. IRC channel features
     • Users Number: total number of users in the channel
     • Average words number: average number of unique words in a sentence
     • Average/Variance of Channel Dictionary Cardinality: mean and variance of the vocabulary's cardinality
     • Unusual Nicknames*
     • Equal Answers: number of sentences with a common ordered subset of words
     • Control Commands Number: count of channel control commands issued
     • Join Number: JOIN rate in the channel
     • SetMode Number: SetMode rate in the channel
     • Nickname Changes: count of nickname changes in a channel
     • Ping Number: PING rate in the channel
     • IRC Commands Number: overall IRC command rate
     • Active Users Number: number of users active in the channel
     *J. Goebel and T. Holz. Rishi: identify bot contaminated hosts by irc nickname evaluation. In HotBots'07: Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets, pages 8–8, Berkeley, CA, USA, 2007. USENIX Association.
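
Two of the text-based features listed above can be sketched in a few lines. The slide gives only one-line definitions, so the details here are assumptions: "Equal Answers" is read as the number of sentences that share at least `min_shared` words, in the same order, with some other sentence, measured with a longest-common-subsequence over words. The function names and the `min_shared` threshold are hypothetical.

```python
from typing import List, Sequence


def average_unique_words(sentences: Sequence[str]) -> float:
    """Mean number of distinct words per sentence (0.0 for no sentences)."""
    if not sentences:
        return 0.0
    return sum(len(set(s.lower().split())) for s in sentences) / len(sentences)


def common_ordered_words(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for wa in a:
        curr = [0]
        for j, wb in enumerate(b, start=1):
            if wa == wb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def equal_answers(sentences: Sequence[str], min_shared: int = 2) -> int:
    """Count sentences sharing >= min_shared ordered words with another one.

    Assumption: this threshold-based reading of "Equal Answers" is ours,
    not a definition taken from the slides.
    """
    words = [s.lower().split() for s in sentences]
    count = 0
    for i, wi in enumerate(words):
        if any(common_ordered_words(wi, wj) >= min_shared
               for j, wj in enumerate(words) if j != i):
            count += 1
    return count
```

The pairwise comparison is quadratic in the number of sentences, which is acceptable for a sketch but would need a sliding window or hashing in a live monitor.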

  13. Experimental Setup
     • Data collection
        • Botnet-related traffic from the Georgia Institute of Technology network
        • Normal IRC chats logged from the University of Napoli network
     • Three datasets
        • 50,000 samples (25,000 normal + 25,000 botnet-related)
           • Small, evenly split
        • 149,999 samples (75,010 normal + 74,989 botnet-related)
           • Large, evenly split
        • 165,000 samples (150,000 normal + 15,000 botnet-related)
           • Large, more realistic distribution of tuples
     • Selected algorithms
        • SVM (Support Vector Machine): very "popular"
        • J48 (Decision Tree): very "quick"
     • Performance evaluation
        • 10-fold cross validation
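
For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of how 10-fold cross validation partitions a dataset. It is a plain, unstratified split written with the standard library only; the slides do not say which tool produced the folds or whether they were stratified, and the function name `k_fold_indices` is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0
                   ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Every sample lands in exactly one test fold, so the reported rates are computed over the whole dataset while each prediction comes from a model that never saw that sample during training.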

  14. Classification algorithms
     • SVM: kernel based method
        • Search for hyperplanes effectively separating data points
        • Support vectors for providing better prediction performance
        • Non-linearly separable data can be transformed by means of a kernel function into a space more suitable for linear separability
        • Separation hyperplane search is performed in the transformed space
     [Figure: mapping φ(.) from input space to feature space]
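
The role of the kernel function can be illustrated with a small numeric check: a kernel evaluated in the input space gives the same value as an ordinary dot product after mapping the points with φ. The slides do not state which kernel was used in the experiments, so the homogeneous degree-2 polynomial kernel below, and the names `phi`, `dot` and `poly2_kernel`, are chosen purely for illustration.

```python
import math
from typing import Sequence, Tuple


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map for 2-D input matching the kernel below."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary dot product of two equally long vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2
```

For any two 2-D points, `poly2_kernel(x, y)` and `dot(phi(x), phi(y))` agree up to rounding, which is why the hyperplane search can work in the transformed space without ever computing φ explicitly.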

  15. Classification algorithms
     • J48: decision tree
        • Each attribute of the data can be used to make a decision which splits the dataset into smaller subsets
        • The normalized information gain is measured
        • The attribute generating the highest normalized information gain is chosen
        • The algorithm is recursively applied to the subsets
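
The attribute-selection step can be sketched as follows. J48 is Weka's implementation of C4.5, whose normalized information gain is usually called the gain ratio: the information gain of a split divided by the entropy of the split itself. The sketch handles only categorical attributes and omits the thresholding of numeric attributes, pruning and recursion; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `values`, divided by split information."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows: Sequence[dict], labels: Sequence[Hashable]) -> str:
    """Name of the attribute with the highest gain ratio."""
    return max(rows[0], key=lambda a: gain_ratio([r[a] for r in rows], labels))
```

A tree builder would call `best_attribute`, partition the rows by the chosen attribute, and repeat on each partition until the labels are pure or no attribute helps.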

  16. Experimental results
     Algorithm               SVM      SVM      SVM      J48      J48      J48
     Samples                 50000    149999   165000   50000    149999   165000
     False alarm rate        0        0        0        < 0.001  0        0
     Missed detection rate   0        0        0        0        0        0
     Most representative features
     • Limited vocabulary cardinality
     • Limited sentence variability
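
The slide does not define the two rates, so the sketch below assumes the usual definitions: the false alarm rate is the fraction of normal samples flagged as botnet-related, and the missed detection rate is the fraction of botnet-related samples classified as normal. The function names and any counts passed to them are hypothetical and are not the experiment's confusion matrices.

```python
def false_alarm_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of normal samples wrongly flagged as botnet-related."""
    normal = false_positives + true_negatives
    return false_positives / normal if normal else 0.0


def missed_detection_rate(false_negatives: int, true_positives: int) -> float:
    """Fraction of botnet-related samples classified as normal."""
    botnet = false_negatives + true_positives
    return false_negatives / botnet if botnet else 0.0
```

Under these assumed definitions, a false alarm rate below 0.001 on the 25,000 normal samples of the smallest dataset would correspond to fewer than 25 misclassified normal samples.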

  17. Conclusions
     • Promising model for botnet activity detection
     • Tested on "real" data
        • Results hopefully valid in a general scenario
     • Model works with both a very reliable and a very quick classifier
        • Effective classification performed on a per-tuple basis
        • Botnet detection accuracy within strict performance boundaries
