BLINC: Multilevel Traffic Classification in the Dark (PowerPoint PPT presentation)




SLIDE 1

BLINC: Multilevel Traffic Classification in the Dark

Thomas Karagiannis, UC Riverside Konstantina Papagiannaki, Intel Research Cambridge Michalis Faloutsos, UC Riverside

SLIDE 2

The problem of workload characterization

  • The goal: Classify Internet traffic flows according to the applications that generate them, “in the dark”

– No port numbers
– No payload

(Diagram: flows labeled by application, e.g. web, streaming, p2p.)

SLIDE 3

The problem of workload characterization – Why in the dark?

  • Traffic profiling based on TCP/UDP ports

– Misleading

  • Payload-based classification

– Practically infeasible

  • Applications are “hiding” their traffic

– P2P applications, Skype, etc.

  • Recent research approaches

– Statistical/machine-learning-based classification (Roughan et al. IMC’04, Moore et al. SIGMETRICS’05)

– Sensitive to network dynamics such as congestion

SLIDE 4

Our contributions

  • We present BLINC (BLINd Classification), a fundamentally different “in the dark” approach

– We shift the focus to the Internet host
– We analyze host behavior at three levels

  • Social
  • Functional
  • Application
  • We identify “signature” communication patterns
  • Highly accurate classification
SLIDE 5

Outline

  • Developing a classification benchmark

– Payload-based classification

  • BLINC design

– Multilevel classification
– Signature communication patterns

  • BLINC evaluation
SLIDE 6

Classification benchmark

  • Packet traces with machine-readable headers

– Residential (2 traces)

  • 25 hours & 34 hours, 110 Mbps
  • web (35%), p2p (32%)

– Genome campus

  • 44 hours, 25 Mbps, ftp (67%)
  • Classification based on payload signatures

– Caveats: nonpayload (1%-2%), unknown (6%-16%)

SLIDE 7

BLINC overview

  • In the dark classification

– No examination of port numbers
– No examination of user payload

  • Characterize the host

– Insensitive to congestion and path changes

  • Deployable with existing equipment

– Operates on flow records

SLIDE 8

BLINC: Classification process

  • Characterize the host

– Social: Popularity/Communities
– Functional: Consumer/provider of services
– Application: Transport layer interactions

  • Identify signature communication patterns
  • Match observed behavior to signatures
SLIDE 9

  • 1. Social level
  • Characterization of the popularity of hosts
  • Two types of behavior:

– Based on number of destination IPs
– Communities: groups of communicating hosts
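The social-level popularity metric above can be sketched directly from flow records. This is a minimal illustration, not the talk's implementation; the flow tuples and IPs are hypothetical.

```python
from collections import defaultdict

# Hypothetical flow records: (src IP, dst IP, src port, dst port, proto)
flows = [
    ("10.0.0.1", "192.168.0.5", 4321, 80, "tcp"),
    ("10.0.0.1", "192.168.0.6", 4322, 80, "tcp"),
    ("10.0.0.2", "192.168.0.5", 5000, 53, "udp"),
]

def popularity(flow_records):
    """Social-level metric: number of distinct destination IPs
    each source host talks to."""
    dests = defaultdict(set)
    for src, dst, *_ in flow_records:
        dests[src].add(dst)
    return {src: len(ips) for src, ips in dests.items()}

print(popularity(flows))  # {'10.0.0.1': 2, '10.0.0.2': 1}
```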

SLIDE 10

  • 1. Social level: Popularity
  • Reveals only basic application traffic properties

(Plot: CCDF of destination IPs per host; heavier tail for p2p and malware.)

SLIDE 11

  • 1. Social level: Communities
  • Communication cliques
  • Perfect cliques

– Attacks

  • Partial cliques

– Collaborative applications (p2p, games)

  • Partial cliques with same domain IPs

– Server farms (e.g., web, dns, mail)
SLIDE 12

  • 2. Functional level
  • We characterize hosts based on the tuple (IP, port)
  • We identify three types of behavior

– Client: Consumer of services
– Server: Provider of services
– Collaborative

SLIDE 13

  • 2. Functional level: Client vs. Server

(Diagram: flows with src port 1000, 1001, 1002.)

Observation: the host uses a different ephemeral src port for every flow.
Rule: hosts that use a large number of source ports are clients.

SLIDE 14

  • 2. Functional level: Client vs. Server

(Diagram: flows with src ports 80, 80, 443.)

Observation: the host uses only two src ports for all flows.
Rule: hosts that use a small number of source ports are offering services on these ports.
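The two functional-level rules above (many ephemeral source ports → client; few reused source ports → server) can be sketched as a one-line classifier. The threshold of 10 ports is a hypothetical cutoff for illustration, not a value from the talk.

```python
def host_role(src_ports_used, port_threshold=10):
    """Guess the role of an endpoint from source-port diversity:
    many distinct ephemeral ports suggest a client, a few reused
    ports suggest a server. port_threshold is a hypothetical knob."""
    distinct = len(set(src_ports_used))
    return "client" if distinct >= port_threshold else "server"

print(host_role(range(1000, 1020)))  # client (20 distinct ephemeral ports)
print(host_role([80, 80, 443]))      # server (only 2 distinct ports)
```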
SLIDE 15

  • 2. Functional level: Characterizing the host

(Plot: flows vs. source ports per application, with distinct client and server regions.)

Collaborative applications: no distinction between servers and clients.
Obscure behavior due to multiple mail protocols and passive ftp.

SLIDE 16

  • 3. Application level
  • Interactions between network hosts display diverse patterns across application types
  • We capture patterns using “graphlets”

– Target the most typical behavior
– Relationship between fields of the 4-tuple

SLIDE 17

  • 3. Application level: Graphlets
  • Graphlets have four columns corresponding to the 4-tuple: src IP, dst IP, src port and dst port

(Diagram columns: sourceIP, destinationIP, sourcePort, destinationPort.)

  • Lines connect nodes when flows contain the specific field values
  • Each node is a distinct entry for each column

(Diagram: example graphlet for flows between 192.168.1.1 and 10.0.0.0 involving ports 135, 445, and 1026.)

SLIDE 18

  • 3. Graphlet Generation (FTP)

sourceIP  destinationIP  sourcePort  destinationPort

(Diagram: the FTP graphlet for host X grows as flows arrive; final flow set below.)

X  Y  21    10001
X  Y  20    10002
X  Z  21    3000
X  Z  1026  3001
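Building a graphlet from flow records can be sketched as follows, using the four FTP flows from this slide. The column ordering follows the slide (srcIP → dstIP → srcPort → dstPort); this is an illustrative data structure, not the talk's implementation.

```python
from collections import defaultdict

def build_graphlet(flows):
    """Build a BLINC-style graphlet from (srcIP, dstIP, srcPort, dstPort)
    flow records: one column per field, a node per distinct value in a
    column, and an edge between nodes of adjacent columns whenever some
    flow carries both values."""
    columns = ("srcIP", "dstIP", "srcPort", "dstPort")
    nodes = defaultdict(set)  # column name -> distinct values seen
    edges = set()             # ((col, value), (next_col, value)) pairs
    for flow in flows:
        labeled = list(zip(columns, flow))  # [('srcIP', 'X'), ...]
        for col, val in labeled:
            nodes[col].add(val)
        for a, b in zip(labeled, labeled[1:]):
            edges.add((a, b))
    return nodes, edges

# The four FTP flows from the slide: host X serving peers Y and Z
ftp_flows = [
    ("X", "Y", 21, 10001),
    ("X", "Y", 20, 10002),
    ("X", "Z", 21, 3000),
    ("X", "Z", 1026, 3001),
]
nodes, edges = build_graphlet(ftp_flows)
print(sorted(nodes["srcPort"]))  # [20, 21, 1026]
```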

SLIDE 19

  • 3. Graphlet Library
SLIDE 20

Heuristics: Further improving performance

  • Using the transport layer protocol.
SLIDE 21

Heuristics: Further improving performance

Cardinality of set of dst IPs versus set of dst ports varies with the application

  • Using the relative cardinality of sets.
SLIDE 22

Heuristics: Further improving performance

WEB: #dst ports >> #dst IPs
P2P: #dst ports <= #dst IPs

  • Using the relative cardinality of sets.
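The relative-cardinality rule above can be sketched as a simple comparison of distinct destination ports vs. distinct destination IPs. The `ratio` threshold is a hypothetical tuning knob, not a value from the talk.

```python
def cardinality_hint(dst_ips, dst_ports, ratio=2.0):
    """Apply the slide's rule of thumb: web traffic shows many more
    distinct destination ports than destination IPs (parallel
    connections per client), while p2p shows at most as many ports
    as IPs. ratio is a hypothetical threshold."""
    n_ips = len(set(dst_ips))
    n_ports = len(set(dst_ports))
    if n_ports > ratio * n_ips:
        return "web-like"
    if n_ports <= n_ips:
        return "p2p-like"
    return "undecided"

print(cardinality_hint(["1.2.3.4"], [3001, 3002, 3003]))       # web-like
print(cardinality_hint(["1.2.3.4", "5.6.7.8"], [6881, 6881]))  # p2p-like
```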
SLIDE 23

Heuristics: Further improving performance

  • Using the communities

(Diagram: 10.0.0.0 is known to be WEB; 10.0.0.1, in the same community, is probably WEB too.)
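The community heuristic amounts to label propagation within a group of related hosts. A minimal sketch, with illustrative host names and a simple "exactly one known label" rule that is my assumption, not the talk's algorithm:

```python
def propagate_labels(communities, known):
    """Spread an application label across a community: if exactly one
    label is known inside a group of hosts (e.g., a server farm), tag
    the unlabeled members with it."""
    labels = dict(known)
    for group in communities:
        seen = {labels[h] for h in group if h in labels}
        if len(seen) == 1:
            tag = seen.pop()
            for h in group:
                labels.setdefault(h, tag)
    return labels

farm = [["10.0.0.0", "10.0.0.1"]]  # hosts observed in the same community
print(propagate_labels(farm, {"10.0.0.0": "WEB"}))
# {'10.0.0.0': 'WEB', '10.0.0.1': 'WEB'}
```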

SLIDE 24

Heuristics: Further improving performance

  • Other heuristics:

– Using the per-flow average packet size
– Recursive (mail/dns servers talk to mail/dns servers, etc.)
– Failed flows (malware, p2p)
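The failed-flows heuristic can be sketched on flow records. The record fields (`packets`, cumulative TCP `flags`) are hypothetical, modeled loosely on what a NetFlow-like collector exports; treating a lone unanswered SYN as a failed flow is my assumption.

```python
def failed_flow_fraction(flows):
    """Fraction of a host's flows that 'failed': a single SYN packet
    with no reply, a pattern the slide associates with malware
    scanning and dead p2p peers. Flow dicts are hypothetical."""
    failed = sum(1 for f in flows if f["packets"] == 1 and f["flags"] == "S")
    return failed / len(flows)

print(failed_flow_fraction([
    {"packets": 1, "flags": "S"},    # unanswered SYN (scan or dead peer)
    {"packets": 12, "flags": "SA"},  # normal established flow
]))  # 0.5
```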

SLIDE 25

Classification Results

  • We evaluate BLINC using two metrics:

– Completeness

  • Percentage classified by BLINC

– Accuracy

  • Percentage of BLINC-classified traffic that is classified correctly
  • We compare against payload classification

– Exclude unknown and nonpayload flows
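The two evaluation metrics can be computed as follows. This is an illustrative sketch: `None` marks a flow BLINC leaves unclassified, and the label strings are hypothetical.

```python
def evaluate(blinc_labels, payload_labels):
    """Completeness: share of benchmark flows that BLINC labels at all.
    Accuracy: share of the labeled flows whose label agrees with the
    payload-based benchmark. Flows the payload classifier marks
    unknown or non-payload are excluded, as on the slide."""
    pairs = [(b, p) for b, p in zip(blinc_labels, payload_labels)
             if p not in ("unknown", "nonpayload")]
    labeled = [(b, p) for b, p in pairs if b is not None]
    completeness = len(labeled) / len(pairs)
    accuracy = sum(b == p for b, p in labeled) / len(labeled)
    return completeness, accuracy

# Toy example with hypothetical labels:
c, a = evaluate(["web", "p2p", None, "web"],
                ["web", "web", "web", "unknown"])
print(round(c, 2), a)  # 0.67 0.5
```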

SLIDE 26

BLINC achieves highly accurate classification

80%-90% completeness! >90% accuracy!

SLIDE 27

Characterizing the unknown: Non-payload flows

BLINC is not limited by non-payload flows or unknown signatures.
Flows classified as attacks reveal known exploits.

SLIDE 28

BLINC issues and limitations

  • Extensibility

– Creating and incorporating new graphlets

  • Application sub-types

– e.g., BitTorrent vs. Kazaa

  • Transport-layer encryption

– then what?

  • NATs

– Should handle most cases

  • Access vs. Backbone networks?

– Should handle these, but no data to test

SLIDE 29

Conclusions

  • A new way of thinking of the classification problem

– Classify nodes instead of flows
– Multi-level analysis:

  • social, functional, transport-layer characteristics
  • each level provides corroborative evidence or insight
  • BLINC works well in practice

– classifies 80-90% of the traffic
– with >90% accuracy

  • Going beyond payload-based classification

– Nonpayload/unknown flows

  • Building block for security applications