Effective features for detecting IRC botnets

SLIDE 1

Third Italian Workshop on PRIvacy and SEcurity (“PRISE”), Rome, 20 October 2008

Effective features for detecting IRC botnets

Claudio Mazzariello, Carlo Sansone
Dipartimento di Informatica e Sistemistica
University of Napoli Federico II
via Claudio 21, 80125 Napoli (Italy)
{claudio.mazzariello, carlo.sansone}@unina.it

SLIDE 2

Claudio Mazzariello, Carlo Sansone – Effective features for detecting IRC botnets

Problem Statement

  • Botnet
    - A network of infected hosts, named bots, under the control of an operator named the botmaster
    - Control is performed through a Command & Control (C&C) channel
        • Centralized (e.g. IRC, HTTP, ...)
        • Distributed (e.g. P2P, ...)
    - The botmaster can issue commands from a large and flexible set to each bot

SLIDE 3

Motivation of this work

  • Botnets keep spreading
  • Botnets are able to perform many malicious actions
    - Spam
    - ID theft
    - Click fraud (e.g. Google AdSense abuse)
    - Cracking
    - Malware spreading
    - DDoS
    - Traffic sniffing
    - Keylogging
    - Poll/statistics manipulation
    - …
  • Botnets involve economic interests
    - More dangerous than older attack types

SLIDE 4

Contribution

  • Definition of a model of normal and botnet-related IRC channel usage
  • Definition of an architecture exploiting such a model for botnet detection
  • IRC user behavior classification aimed at botnet detection by means of pattern recognition techniques

SLIDE 5

Presentation outline

  • An introduction to botnets
  • Details on IRC botnets
  • The proposed detection approach
    - IRC user behavior model
    - Detection system reference architecture
  • Experimental evaluation

SLIDE 6

Centralized botnet's lifecycle

  • Bot-herder configures initial bot parameters and C&C details
  • Bot-herder registers the C&C IP in DNS for rendezvous
  • Bot-herder launches or seeds new bot(s): bots spreading, botnet growing
    - Vulnerability discovery and exploitation
    - Malicious code download
    - DNS lookup for rendezvous
    - Join the C&C
    - Receive commands from the botmaster
  • Losing bots (stasis), botnet not growing
  • Bot-herder abandons the botnet and removes traces
  • Bot-herder unregisters the DDNS entry

SLIDE 7

Botnet Statistics

  • 60% are IRC bots
    - 70% of all the bots connect to a single IRC server
  • 57,000 active bots per day for the first 6 months of 2006 (Symantec)
  • 4.7 million distinct computers being actively used in botnets
  • Most botnets are managed by a single server (up to 15,000 bots)
  • Mocbot seized control of more than 7,700 machines within 24 hours

SLIDE 8

Why IRC?

  • Oldest and most popular IM
    - Bots were commonly used by channel operators for management and monitoring purposes
  • Not owned by anyone (public)
  • Defined in RFC 1459
  • Text based
  • Designed for both point-to-point and point-to-multipoint communication
    - One-to-one, or one-to-group chat
  • Flexible, open-source protocol
  • Potentially able to manage a high number of clients
  • Grants anonymity for the botmaster

SLIDE 9

Centralized C&C

  • Easier to manage and use
  • Easier to disrupt
  • How do the bots know where the C&C is?
    - Hardcoded IP based rendezvous
        • Easily uncovered
        • C&C needs replacement after disruption
        • All bots need replacement
    - Domain names used for rendezvous
        • The DNS resource record (RR) can be updated to the current C&C IP
        • Bots can dynamically point to the correct C&C IP

SLIDE 10

Reference framework

  • Port based application protocol detection
  • RFC based IRC decoder
  • Model = representative features
    - Each IRC channel is represented by a feature vector describing its status
    - Feature vectors are updated at each event occurring in the corresponding IRC channel
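The per-channel, per-event update described on this slide can be sketched as follows. This is a minimal illustration, not the authors' decoder: the `ChannelState` fields, the `update` function, and the choice of counters are assumptions made for the example, and only the basic RFC 1459 message framing (optional `:prefix`, command, parameters, optional ` :trailing` text) is handled. Channel names are assumed to start with `#`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ChannelState:
    """Running counters for one IRC channel (a simplified, hypothetical feature vector)."""
    users: set = field(default_factory=set)
    joins: int = 0
    mode_changes: int = 0
    messages: int = 0
    vocabulary: set = field(default_factory=set)


def parse_irc_line(line):
    """Split a raw IRC line into (nick, command, params) using RFC 1459 framing."""
    prefix = ""
    line = line.rstrip("\r\n")
    if line.startswith(":"):
        # Optional prefix: ":nick!user@host " precedes the command.
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if sep:
        parts.append(trailing)
    nick = prefix.split("!", 1)[0]
    if not parts:
        return nick, "", []
    return nick, parts[0].upper(), parts[1:]


def update(channels, line):
    """Update the state of the channel an event belongs to, if any."""
    nick, command, params = parse_irc_line(line)
    if not params or not params[0].startswith("#"):
        return  # not a channel-scoped event in this simplified model
    state = channels[params[0]]
    if command == "JOIN":
        state.joins += 1
        state.users.add(nick)
    elif command == "MODE":
        state.mode_changes += 1
    elif command == "PRIVMSG" and len(params) > 1:
        state.messages += 1
        state.vocabulary.update(params[1].lower().split())


channels = defaultdict(ChannelState)
for raw in [
    ":bot1!u@h JOIN :#cmd",
    ":bot2!u@h JOIN #cmd",
    ":master!u@h PRIVMSG #cmd :.scan start",
    ":bot1!u@h PRIVMSG #cmd :scan started",
]:
    update(channels, raw)
```

Keeping one small state object per channel means each event costs a dictionary lookup plus a counter update, which fits the idea of refreshing the feature vector at every event.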

SLIDE 11

Intuitions about IRC based botnets

  • Bursty channel activity
    - After a command is issued, bots may respond all at once, then go quiet
  • Limited vocabulary
  • Sentence structure
    - May resemble a shell command
    - The same recurring structure may be found in many sentences
  • Disproportion between user and control activity in a channel
  • “Strange” words used for communication
    - Disproportion of consonants and vowels in words used for chatting
        • Language dependent
  • Changes and structure of chat room topic
  • Unusual nicknames
    - Completely random, OR
    - Unexpectedly regular
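The consonant/vowel intuition can be made concrete with a small measurement. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the vowel set is the plain Latin `aeiou`, and no decision threshold is implied. As the slide notes, the measure is language dependent.

```python
VOWELS = set("aeiou")  # assumption: plain Latin vowels only


def vowel_ratio(word):
    """Fraction of alphabetic characters in `word` that are vowels (0.0 if none)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def mean_vowel_ratio(sentence):
    """Average vowel ratio over the words of a sentence that contain letters."""
    ratios = [
        vowel_ratio(w) for w in sentence.split() if any(c.isalpha() for c in w)
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


human = mean_vowel_ratio("hello there")
random_like = mean_vowel_ratio("xkqzt brrwp")
```

In this toy input the natural-language sentence has a clearly higher vowel ratio than the random-looking tokens, which is the kind of disproportion the slide refers to.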

SLIDE 12

IRC channel features

  • Users Number:
    - total number of users in the channel
  • Average Words Number:
    - average number of unique words in a sentence
  • Average/Variance of Channel Dictionary Cardinality:
    - mean and variance of the vocabulary’s cardinality
  • Unusual Nicknames*
  • Equal Answers:
    - number of sentences with a common ordered subset of words
  • Control Commands Number:
    - count of channel control commands issued
  • Join Number:
    - JOIN rate in the channel
  • SetMode Number:
    - SetMode rate in the channel
  • Nickname Changes:
    - count of nickname changes in a channel
  • Ping Number:
    - PING rate in the channel
  • IRC Commands Number:
    - overall IRC command rate
  • Active Users Number:
    - number of users active in the channel

*J. Goebel and T. Holz. Rishi: Identify bot contaminated hosts by IRC nickname evaluation. In HotBots’07: Proceedings of the First Workshop on Hot Topics in Understanding Botnets, pages 8–8, Berkeley, CA, USA, 2007. USENIX Association.
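A few of the text-based features listed above can be sketched in code. The slide does not give exact definitions, so the following choices are assumptions for illustration: the dictionary cardinality is sampled after every sentence, and two sentences count as "equal answers" when their longest common subsequence of words has at least `min_common` words. All function names and the `min_common` parameter are hypothetical.

```python
from statistics import mean, pvariance


def avg_unique_words(sentences):
    """Average number of distinct words per sentence."""
    if not sentences:
        return 0.0
    return mean([len(set(s.lower().split())) for s in sentences])


def dictionary_cardinality_stats(sentences):
    """Mean and variance of the channel vocabulary size, sampled after each sentence."""
    vocab, sizes = set(), []
    for s in sentences:
        vocab.update(s.lower().split())
        sizes.append(len(vocab))
    if not sizes:
        return 0.0, 0.0
    return mean(sizes), pvariance(sizes)


def lcs_len(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def equal_answers(sentences, min_common=2):
    """Count sentences sharing an ordered subset of >= min_common words with another one."""
    tokens = [s.lower().split() for s in sentences]
    count = 0
    for i, a in enumerate(tokens):
        if any(lcs_len(a, b) >= min_common for j, b in enumerate(tokens) if j != i):
            count += 1
    return count


sentences = [
    "scan started on 10.0.0.1",
    "scan started on 10.0.0.2",
    "hi all",
]
features = {
    "avg_unique_words": avg_unique_words(sentences),
    "dictionary_stats": dictionary_cardinality_stats(sentences),
    "equal_answers": equal_answers(sentences),
}
```

The pairwise comparison in `equal_answers` is quadratic in the number of sentences, so a real system would presumably bound the window of sentences it compares.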

SLIDE 13

Experimental Setup

  • Data collection
    - Botnet related traffic from the Georgia Institute of Technology network
    - Normal IRC chats logged from the University of Napoli network
  • Three datasets
    - 50,000 samples (25,000 normal + 25,000 botnet-related)
        • Small, evenly split
    - 149,999 samples (75,010 normal + 74,989 botnet-related)
        • Large, evenly split
    - 165,000 samples (150,000 normal + 15,000 botnet-related)
        • Large, more realistic distribution of tuples
  • Selected algorithms
    - SVM (Support Vector Machine): very “popular”
    - J48 (Decision Tree): very “quick”
  • Performance evaluation
    - 10-fold cross validation
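The 10-fold cross validation protocol can be sketched as an index splitter. The slide does not say whether the folds were stratified; the version below assumes stratification so that an unbalanced set keeps its class ratio in every fold. Function names, the seed, and the scaled-down label list (150 normal, 15 botnet, mirroring the 10:1 ratio of the third dataset) are illustrative.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, spreading each class evenly (assumed stratified)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices): each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["normal"] * 150 + ["botnet"] * 15
splits = list(train_test_splits(labels, k=10))
```

Each sample is used for testing once and for training nine times, so the error rates averaged over the ten folds are measured on data the classifier did not see during training.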

SLIDE 14

Classification algorithms

  • SVM: kernel based method
    - Searches for hyperplanes that effectively separate the data points
    - Support vectors provide better prediction performance
    - Non-linearly separable data can be transformed by means of a kernel function into a space more suitable for linear separation
    - The search for the separating hyperplane is performed in the transformed space

[Figure: the mapping φ(.) from the input space to the feature space; labels in the figure: r, ρ, x, x′]
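The effect of the mapping φ(.) can be illustrated with a tiny worked example. The slides do not state which kernel was used in the experiments; the degree-2 polynomial kernel below is chosen only because its feature map can be written out explicitly. The point sets and the hyperplane `(w, b)` are made up for the illustration.

```python
import math


def phi(x):
    """Explicit feature map for the kernel k(x, y) = (x . y)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(x, y):
    """Same inner product as dot(phi(x), phi(y)), computed without building phi."""
    return dot(x, y) ** 2


# Points on two concentric circles: not linearly separable in the input space.
inner = [(1, 0), (0, 1), (-1, 0)]
outer = [(3, 0), (0, -3)]

# In the feature space the plane w . phi(x) + b = 0, i.e. x1^2 + x2^2 = 4, separates them.
w, b = (1.0, 0.0, 1.0), -4.0
inner_scores = [dot(w, phi(p)) + b for p in inner]
outer_scores = [dot(w, phi(p)) + b for p in outer]
```

The kernel returns the feature-space inner product directly, which is why the hyperplane search can run in the transformed space without ever computing φ explicitly.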

SLIDE 15

Classification algorithms

  • J48: decision tree
    - Each attribute of the data can be used to make a decision that splits the data set into smaller subsets
    - The normalized information gain of each candidate split is measured
    - The attribute generating the highest normalized information gain is chosen
    - The algorithm is recursively applied to the subsets
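The attribute selection step can be sketched with the usual definition of normalized information gain (gain ratio): information gain divided by the entropy of the split itself. This is a simplified illustration for categorical attributes only; it is not the J48 implementation, which also handles numeric thresholds and pruning. The attribute names and toy rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting `labels` by `values`, divided by the split entropy."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Name of the attribute with the highest gain ratio."""
    return max(rows[0].keys(), key=lambda a: gain_ratio([r[a] for r in rows], labels))


rows = [
    {"vocabulary": "small", "hour": "day"},
    {"vocabulary": "small", "hour": "night"},
    {"vocabulary": "large", "hour": "day"},
    {"vocabulary": "large", "hour": "night"},
]
labels = ["botnet", "botnet", "normal", "normal"]
chosen = best_attribute(rows, labels)
```

In a full tree builder the same selection would be repeated on each subset produced by the chosen attribute until the subsets are pure or no attribute is left.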

SLIDE 16

Experimental results

Algorithm              | SVM                     | J48
Samples                | 50000 | 149999 | 165000 | 50000 | 149999 | 165000
False alarm rate       | < 0.001
Missed detection rate  |

  • Most representative features
    - Limited vocabulary cardinality
    - Limited sentence variability
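The two metrics named in the results table can be computed from per-sample predictions as sketched below. The slide does not define them, so the usual definitions are assumed: false alarm rate = FP / (FP + TN) and missed detection rate = FN / (FN + TP), with "botnet" as the positive class. The function name and the toy labels are illustrative and unrelated to the reported results.

```python
def detection_rates(y_true, y_pred, positive="botnet"):
    """Return (false_alarm_rate, missed_detection_rate) under the assumed definitions."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1  # botnet sample classified as normal
        else:
            if pred == positive:
                fp += 1  # normal sample flagged as botnet
            else:
                tn += 1
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    missed = fn / (fn + tp) if fn + tp else 0.0
    return false_alarm, missed


y_true = ["botnet"] * 4 + ["normal"] * 6
y_pred = ["botnet", "botnet", "botnet", "normal"] + ["normal"] * 5 + ["botnet"]
far, mdr = detection_rates(y_true, y_pred)
```

Reporting the two rates separately matters on the unbalanced dataset, where overall accuracy alone could hide a high missed detection rate.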

SLIDE 17

Conclusions

  • Promising model for botnet activity detection
  • Tested on “real” data
    - Results hopefully valid in a general scenario
  • Model works with both a very reliable and a very quick classifier
  • Effective classification performed on a per-tuple basis
    - Botnet detection accuracy within strict performance boundaries

SLIDE 18

Future work

  • Feature selection for faster training and detection
  • Exploitation of “pure” anomaly detection techniques
    - Malicious traffic no longer necessary for training
  • Further model refinement
  • Further testing
    - More data available on heterogeneous IRC servers
  • Exploitation of state-of-the-art application layer protocol recognition techniques
    - Blind protocol identification overcomes the limitations of port-based methods
  • Perform classification on a per-channel basis
    - Try to reduce the number of errors
  • Start thinking about complementary solutions for encrypted IRC sessions

SLIDE 19

Thank you for your attention

Any questions?