Monitoring Tool for Analysing the Use of the Internet Services in - - PDF document



SLIDE 1

“Monitoring Tool for Analysing the Use of the Internet Services in the Spanish Academic Network”

Jordi Domingo-Pascual Universitat Politècnica de Catalunya

jordi.domingo@ac.upc.es

Partners: UC3M, UPC and UPM

Project Objectives

❐ Traffic Capture Subsystem

  • High Speed
  • AAL5 Reassembly
  • Modular and scalable
  • Low cost

❐ Support for many Traffic Analysis tools:

  • Identification and aggregation of bi-directional flows
  • Traffic classification by usage
  • Traffic classification by origin / destination
  • Internet header verification
  • Detailed analysis (including contents for AUP audits)
SLIDE 2

MEHARI System

Functional Architecture

[Figure: MEHARI functional architecture. The Capture Subsystem (capture platform(s) with ATM 0 / ATM 1 interfaces) takes ATM cells from capture points on the ATM backbone and passes traffic samples to the Analysis Subsystem (analysis platform(s)). There, the Preprocessing Module produces IP biflows + symptoms for the Application Modules, which generate statistics and reports for the operator. Other labels: PPS, auto-regulation, data base (patterns, addresses, ...).]

Capture Subsystem

❐ Modular and scalable

  • N units over the same or different trunk links
  • Requires high speed connection to the analysis subsystem

❐ Senses ALL VPI/VCI in the fiber

  • Captures in promiscuous or filtered mode over VPI/VCI list

❐ Capture capacity for each unit

  • Sustained Average of 8 Mbit/s for a 6,000 Euros unit
  • 3,000% better price/performance than commercial protocol analyzers (2 Mbit/s on HP BSTS)
  • Capture rate controlled by analysis rate
SLIDE 3

Capture Subsystem: Hardware

❐ 1 PC Pentium 200 MHz
❐ 128 MB RAM
❐ 4 GB Hard Disc
❐ 1 Ethernet NIC (10/100)
❐ 2 ATM NICs PCI SC-155, Fore PCA-200EPC 256KB
❐ 2 passive fiber-optic splitters (SC)
❐ Cost: ~ 5000 $

Capture Subsystem: Software

❐ OS: Unix FreeBSD 2.2.5

❐ Software OC3MON/BSD (modified by DIT-UPM)

  • fatm-driver
  • capture

❐ dumpcap application developed by DIT-UPM

  • VPI/VCI demultiplexing and filtering
  • reassembly of AAL5 frames
  • dump to capture files

❐ NFS client for downloading capture files to the analysis subsystem

SLIDE 4

Capture Files

Record format: frame seq_num : timestamp UNIX (sec.µsec) : VPI/VCI : length (bytes) : truncated AAL5 info field

0:893083746.654070:100/1:1064:45000428E81B40002F062E36C600B...
1:893083746.654090:100/1:44:4500002C00AC400037069CF5CC4B3C...
2:893083746.654101:100/1:40:45000028455840003606052FCF4F2C1...
3:893083746.654280:103/224:1500:450005DC6C4B4000FD06142640...
4:893083746.654288:103/224:40:45000028240440007B06401E829FD...
5:893083746.654517:103/224:400:45000190B30340001D06B516238A...
......
1668:893083746.813551:100/1:281:4500011976710000FB04BFFCE40...
# init_time=893083746.652986 final_time=893083746.813582 cap_time=0.160596

Files with programmable granularity
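
The record layout above lends itself to a small parser. The following Python sketch is illustrative only and is not the project's tooling: the names `CaptureRecord`, `parse_record` and `ip_summary` are hypothetical, and it assumes the AAL5 info field starts directly with an IPv4 header, as the leading `45` of the sample records suggests.

```python
from dataclasses import dataclass


@dataclass
class CaptureRecord:
    seq_num: int
    timestamp: float   # UNIX time, seconds.microseconds
    vpi: int
    vci: int
    length: int        # frame length in bytes
    payload_hex: str   # truncated AAL5 info field, as hex digits


def parse_record(line: str) -> CaptureRecord:
    """Split one colon-separated capture-file record into typed fields."""
    seq, ts, vpvc, length, payload = (f.strip() for f in line.split(":", 4))
    vpi, vci = vpvc.split("/")
    return CaptureRecord(int(seq), float(ts), int(vpi), int(vci),
                         int(length), payload.rstrip("."))


def ip_summary(rec: CaptureRecord) -> tuple:
    """Return (IP version, IP total length, protocol number).

    Reads only the first 10 header bytes; assumes an IPv4 header at offset 0.
    """
    head = bytes.fromhex(rec.payload_hex[:20])
    return head[0] >> 4, int.from_bytes(head[2:4], "big"), head[9]


rec = parse_record("1:893083746.654090:100/1:44:4500002C00AC400037069CF5CC4B3C...")
print(rec.vpi, rec.vci, rec.length, ip_summary(rec))
```

For this sample record the IP total length field (0x002C = 44) agrees with the recorded frame length, and protocol number 6 is TCP.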

SLIDE 5

Analysis Subsystem: Hardware

❐ PC Pentium 200 MHz
❐ 128 MB RAM
❐ 4 GB Hard Disc
❐ 2 Ethernet NICs (10/100)
❐ Cost: ~ 2500 $

Analysis Subsystem: Software

❐ Modular and Scalable Architecture

  • N analysis platforms connected to one or more capture platforms
  • PC Linux RedHat 5.0

❐ Pre-processing module:

  • Processes the samples “on-the-fly”
  • Extracts “relevant” information
  • Erases capture files

❐ Analysis tools or Application modules:

  • Classify traffic by use
  • Classify traffic by destination
  • Packet header verification
  • Server location tool
  • Usage based pricing tool
  • ...
SLIDE 6

Analysis Subsystem Architecture

[Figure: Analysis Subsystem architecture. Labels: Preprocessing Module, Generic Analysis Module (filter), Analysis Module, Specific Analysis Module, common format, common configuration language.]

Modularity and Scalability

[Figure: process tree with nodes P 1.1, P 1.2, P 1.3, P 1.1.1, P 1.1.2, P 1.1.3 and P 1.3.1.]

❐ Process tree structure for information flow
❐ Interprocess Communication using shared files
❐ May be distributed among several machines using NFS
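
Interprocess communication through shared files can be sketched as follows. This is a minimal Python illustration, not the MEHARI code: the stage functions, the file names and the one-record-per-line format are all assumptions. Each stage reads the file written by its parent in the process tree and writes its own, so every intermediate result stays on disk (and the directory could be an NFS mount).

```python
import tempfile
from pathlib import Path


def run_stage(transform, src: Path, dst: Path) -> None:
    """Read records from the parent's file and write transformed records."""
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            out = transform(line.rstrip("\n"))
            if out is not None:           # None means "record filtered out"
                fout.write(out + "\n")


def only_tcp(record: str):
    # Hypothetical record format: "<protocol> <bytes>"
    return record if record.startswith("tcp ") else None


def to_kbytes(record: str):
    proto, size = record.split()
    return f"{proto} {int(size) / 1000:.1f}"


workdir = Path(tempfile.mkdtemp())
(workdir / "parent.out").write_text("tcp 1500\nudp 200\ntcp 500\n")

# Two chained stages; the intermediate file is a usable partial result.
run_stage(only_tcp, workdir / "parent.out", workdir / "child.out")
run_stage(to_kbytes, workdir / "child.out", workdir / "grandchild.out")

result = (workdir / "grandchild.out").read_text().splitlines()
print(result)
```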

SLIDE 7

Key Issues of the Architecture

❐ Modularity
❐ Common data types and file format definition
❐ Common configuration interface
❐ This allows:

  • inserting new processes anywhere in the analysis chain
  • obtaining partial results in the analysis chain
  • controlling processes and browsing results from a single GUI

Pre-processing Module

❐ Main functions

  • packet aggregation to flows
  • packet analysis
  • count of symptoms associated to each flow

❐ Produces flow list with associated information:

  • flow desc with packet and byte count
  • weighted list of symptoms

❐ Highly configurable:

  • symptom definition and inter-relation
  • aggregation period
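
The aggregation step can be sketched in Python as below. This is only an illustration under stated assumptions: the slides do not define the flow key or the packet representation, so the `(period, source, destination)` key, the tuple layout and the 60-second period are hypothetical.

```python
from collections import Counter, defaultdict

PERIOD = 60  # aggregation period in seconds (assumed value; configurable)


def aggregate(packets, period=PERIOD):
    """Group packets into unidirectional flows per aggregation period.

    Each packet is (timestamp, src, dst, length, symptoms), where symptoms
    is a list of symptom names found in that packet.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "symptoms": Counter()})
    for ts, src, dst, length, symptoms in packets:
        flow = flows[(int(ts // period), src, dst)]
        flow["packets"] += 1
        flow["bytes"] += length
        flow["symptoms"].update(symptoms)   # symptom count per flow
    return dict(flows)


packets = [
    (0.5, "10.0.0.1", "10.0.0.2", 40, ["HTTP"]),
    (1.2, "10.0.0.1", "10.0.0.2", 1500, ["HTTP", "VISA"]),
    (61.0, "10.0.0.1", "10.0.0.2", 40, []),
]
flows = aggregate(packets)
for key, flow in sorted(flows.items()):
    print(key, flow["packets"], flow["bytes"], dict(flow["symptoms"]))
```

The third packet falls into the next period and therefore opens a new flow entry, which is how the aggregation period controls the granularity of the flow list.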
SLIDE 8

Pre-processing Module

❐ anaflow: IP packet analysis and symptom extraction
❐ accum: aggregation of byte counts, packet counts and symptoms into unidirectional flows, per period
❐ biflow: correlation of input/output traffic to obtain bidirectional flows, preserving symptoms

[Figure: processing pipeline. IP packets (CAP*.0, CAP*.1) go through Anaflow, which uses the pattern data base, giving unidirectional flows (AFL*.0, AFL*.1); Accum gives accumulated unidirectional flows (AAF*.0, AAF*.1); Biflow gives bidirectional flows (BFL*).]
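
The input/output correlation done by biflow can be sketched as follows. This Python fragment is a hypothetical illustration: the dictionary layout and the rule that an inbound flow A to B is paired with the outbound flow B to A are assumptions, not the documented BFL* format.

```python
from collections import Counter

EMPTY = {"packets": 0, "bytes": 0, "symptoms": {}}


def _merge(flow_in, flow_out):
    """Build one bidirectional entry, keeping the symptoms of both sides."""
    return {
        "in": (flow_in["packets"], flow_in["bytes"]),
        "out": (flow_out["packets"], flow_out["bytes"]),
        "symptoms": dict(Counter(flow_in["symptoms"]) + Counter(flow_out["symptoms"])),
    }


def correlate(inbound, outbound):
    """Pair unidirectional flows keyed by (src, dst) into biflows.

    The biflow key is the inbound orientation: (remote, local).
    """
    biflows = {}
    for (src, dst), flow_in in inbound.items():
        biflows[(src, dst)] = _merge(flow_in, outbound.get((dst, src), EMPTY))
    # Outbound flows without an inbound counterpart stay as one-sided biflows.
    for (src, dst), flow_out in outbound.items():
        if (dst, src) not in inbound:
            biflows[(dst, src)] = _merge(EMPTY, flow_out)
    return biflows


inbound = {("R", "L"): {"packets": 3, "bytes": 3000, "symptoms": {"HTTP": 2}}}
outbound = {
    ("L", "R"): {"packets": 2, "bytes": 80, "symptoms": {"HTTP": 1}},
    ("L", "X"): {"packets": 1, "bytes": 40, "symptoms": {}},
}
biflows = correlate(inbound, outbound)
print(biflows)
```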

Application Modules

❐ Traffic analysis by volume
❐ Traffic analysis by origin/destination
❐ Header analysis and verification
❐ Server location tool
❐ Traffic classification (explicit routing)
❐ Usage based pricing tool

SLIDE 9

Trial on Spanish Academic Network: RedIRIS

[Figure: trial setup. Labels: GIGACOM Telefónica ATM network, RedIRIS regional nodes, ATM access switch, RedIRIS core router, STM-1 ATM optical interfaces, splitters, traffic capture PC (FreeBSD), 100BaseT Ethernet (NFS), analysis PC (Linux), remote access via Internet (RedIRIS).]

RedIRIS: the Spanish NRN

Spanish Academic Network Topology

SLIDE 10

Some applications of these tools

❐ Traffic monitoring

  • Billing and charging models for NRN and Corporate Networks

  • Network configuration
  • Resources dimensioning
  • Placing Proxies, ...

❐ Service usage control

  • Control that the services are used responsibly, i.e. auditing the academic network's AUP (Acceptable Use Policy)

  • Security

Summary

❐ Modular, scalable and extensible architecture
❐ Capture systems with excellent price/performance
❐ Flow information aggregation with symptoms and bi-directional flow correlation
❐ Intermediate data base of patterns and addresses
❐ Application modules currently implemented:

  • Classification by usage (AUP)
  • Classification by origin/destination
  • Packet Header analysis
SLIDE 11

Traffic Analysis by volume

❐ Bandwidth utilization
❐ Packet size distribution
❐ Protocol distribution
❐ Cumulative graphs

Backbone Utilization

[Figure: backbone utilization (%), scale 0% to 30%, over 24 hours (00:00 to 00:00); series: input and output.]

SLIDE 12

Utilization per Access Node

IP over ATM overhead (in)

Mean input overhead: 15.12%

[Figure: share of traffic (scale 0% to 50%) by IP packet length in bytes, in bins from < 41 to > 1528; series: % total traffic, % overhead traffic, % useful traffic.]
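
The dependence of the overhead on packet length follows from AAL5 framing: the frame gets an 8-byte trailer, is padded to a multiple of 48 bytes, and every 48-byte payload travels in a 53-byte cell. The Python sketch below shows this arithmetic. How the slides' mean figure was computed is not stated, so the definition of overhead as non-IP bytes over wire bytes is an assumption, as is the default of no LLC/SNAP header (VC multiplexing).

```python
import math

CELL_SIZE = 53      # bytes per ATM cell on the wire
CELL_PAYLOAD = 48   # payload bytes per cell
AAL5_TRAILER = 8    # AAL5 trailer bytes
LLC_SNAP = 8        # optional LLC/SNAP encapsulation header bytes


def atm_wire_bytes(ip_length: int, encapsulation: int = 0) -> int:
    """Bytes on the wire for one IP packet carried in one AAL5 frame."""
    cells = math.ceil((ip_length + encapsulation + AAL5_TRAILER) / CELL_PAYLOAD)
    return cells * CELL_SIZE


def overhead_fraction(ip_length: int, encapsulation: int = 0) -> float:
    """Assumed definition: (wire bytes - IP bytes) / wire bytes."""
    wire = atm_wire_bytes(ip_length, encapsulation)
    return (wire - ip_length) / wire


for size in (40, 41, 576, 1500):
    print(size, atm_wire_bytes(size), round(overhead_fraction(size), 3))
```

A 40-byte packet fits exactly in one cell, while a 41-byte packet needs two, which is why short packets dominate the overhead.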

SLIDE 13

IP over ATM overhead (out)

Mean output overhead: 20.83%

[Figure: share of traffic (scale 0% to 50%) by IP packet length in bytes, in bins from < 41 to > 1528; series: % total traffic, % overhead traffic, % useful traffic.]

Packet Length Distribution

[Figure: % of packets (scale 0% to 60%) by packet length in bytes, in bins from < 41 to > 1528; series: input and output.]

SLIDE 14

Protocol Distribution (mean packet length)

Protocol Distribution (percentage of packets)

SLIDE 15

Traffic Distribution

[Figure: input traffic in Mbytes (scale 50 to 350) over 24 hours (0:00 to 0:00), broken down by protocol: http, ftp, nntp, smtp, telnet, dns, snmp, irc, games, EIGRP, multicastIP, others, unknown.]

Traffic origin/destination analysis module

[Figure: the Pre-processing Module (TCM) supplies IP biflows to the Traffic Origin/Destination Analysis Module (TODM). Its processor performs identification of AS using databases (subnetwork, CIDR, ASs, ...) fed from official IRR data bases, NRN BGP and other sources, and produces summary report files.]
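
Identifying the AS of an address amounts to a longest-prefix match against a prefix table. The Python sketch below uses the standard `ipaddress` module. The table contents are hypothetical (documentation prefixes and private-use AS numbers); a real table would be built from IRR or BGP data, and the module's actual lookup method is not described in the slides.

```python
import ipaddress

# Hypothetical prefix-to-AS table (documentation ranges, private-use ASNs).
PREFIX_TABLE = {
    "192.0.2.0/24": 64512,
    "198.51.100.0/24": 64513,
    "198.51.100.128/25": 64514,
}

# Longest prefixes first, so the first hit is the most specific one.
NETWORKS = sorted(
    ((ipaddress.ip_network(prefix), asn) for prefix, asn in PREFIX_TABLE.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def lookup_as(address: str):
    """Return the AS number of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    for network, asn in NETWORKS:
        if ip in network:
            return asn
    return None


print(lookup_as("198.51.100.200"), lookup_as("198.51.100.5"), lookup_as("203.0.113.1"))
```

A linear scan is enough for a sketch; a table of Internet size would call for a trie or similar structure.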
SLIDE 16

Traffic Classification by AS

❐ Maps of AS by destination
❐ Traffic statistics by outside links of RedIRIS (Ibernet, TEN-155 and USA) classified by CCAA (regions)
❐ Statistics of the most visited subnetworks within the relevant AS
❐ This output may be used by other modules

Sample of Results: Main traffic origin/destination (I)

[Figure: % bytes of input traffic (scale 0% to 100%) for each of the 17 user groups, split by origin: RedIRIS, TEN-34/155, Ibernet, Rest of Internet (through USA).]

SLIDE 17

Sample of Results: Main traffic origin/destination (II)

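[Figure: two pie charts, "Total Input traffic to RedIRIS" and "Total Output traffic from RedIRIS", with slices for RedIRIS, TEN-34/155, Ibernet and Rest of Internet (through USA). Slice values as extracted, in order: 26%, 21%, 12%, 41%, 36%, 16%, 21%, 27%.]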

Sample of Results: % of academic traffic in the link with USA (according to the IRR description)

[Figure: % of captured input traffic (scale 0% to 60%) for each of the 17 user groups.]

SLIDE 18

Sample of Results: Top 25 most visited commercial sites in one of the user groups

[Figure: % bytes of input traffic to one of the user groups (scale 0% to 45%) per sub-network.]

Sub-networks shown: TSAI, REDESTB, OLE, ES-CTV-980527, ES-TTD-951020, CAIXA-RED, CANAL-PLUS-SPAIES, ABF, ICTNET, IBERNETCOM, ES-FCR-950607, JETNET, GRN, CONEXIS, IP-MULTIMEDIA, SERVICOM2-NETS, ABCTELEMATIC, SPRITEL, SERVICOM1-NETS, DAUCOM2MEG-ES, RAN, FUT, INFASE, RS, INTERCOM. Other sub-networks: 958

Sample of Results (January-February '99): Top 25 most visited TEN-155 ASs in one of the user groups

[Figure: % bytes of input traffic to one of the user groups (scale 0% to 20%) per AS.]

ASs shown:

  • AS1275 DFN-IP service and DFN customer networks
  • AS786 The JANET IP Service
  • AS1653 SUNET Swedish University Network
  • AS1103 SURFnet
  • AS2856 BTnet UK Regional network
  • AS224 UNINETT, The Norwegian University & Research Network
  • AS1717 RENATER
  • AS3301 TeliaNet Sweden
  • AS2852 CESNET z.s.p.o. - TEN34-CZ
  • AS513 CERN
  • AS1853 ACOnet Backbone
  • AS1239
  • AS559 SWITCH, Swiss Academic and Research Network
  • AS8761 RETENET Autonomous System
  • AS1741 FUNET autonomous system
  • AS8743 HighwayOne Autonomus System
  • AS1835 DENet - Danish Network for Research and Education
  • AS3269 TELECOM ITALIA
  • AS6805 mediaWays Autonomous System
  • AS3215 RAIN
  • AS5470 AUTH-NET-AS
  • AS5556 Telenordia AB
  • AS8209 A2000 / Kabeltelevisie Amsterdam bv
  • AS2529 Demon Internet Ltd
  • AS1290 PSINet UK Ltd.
  • Other ASs: 433

SLIDE 19

Internet Header Analysis Module

[Figure: Internet Header Analysis Module (IHM). Labels: capture files, pre-processing, Internet header analysis (session oriented), data base with header patterns, summary report files (% verified traffic, % pending traffic), unknown traffic processor, summary report files for unknown traffic (remote and local servers).]

Protocol Verification

❐ Protocol verification for each packet

  • While no protocol keyword is found...
      • Look for application protocol keywords and packets/bytes accounting
  • Once a protocol keyword is found...
      • Only accounting of packets/bytes
SLIDE 20

Protocol Verification

❐ Example for http Dictionary

  • LIMIT 25 # search boundary
  • HTTP # application name
  • 80 800 8080 RANGE 8000 8999 # port list
  • A # ASCII protocol
  • NCS # not case sensitive
  • HTTP/ # protocol keywords
  • OPTIONS
  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
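
A dictionary of this kind could drive the per-packet verification roughly as in the Python sketch below. It is an illustration under assumptions, not the IHM implementation: the dictionary is transcribed by hand into a Python structure, `FlowVerifier` is a hypothetical name, and `LIMIT 25` is interpreted as "search only the first 25 payload bytes", which the slide does not confirm.

```python
HTTP_DICTIONARY = {
    "name": "HTTP",
    "limit": 25,                                        # search boundary (assumed: bytes)
    "ports": {80, 800, 8080} | set(range(8000, 9000)),  # 80 800 8080 RANGE 8000 8999
    "case_sensitive": False,                            # NCS
    "keywords": ["HTTP/", "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE"],
}


class FlowVerifier:
    """Per-flow state: search for keywords until one is found, then only count."""

    def __init__(self, dictionary):
        self.d = dictionary
        self.verified = False
        self.packets = 0
        self.bytes = 0

    def feed(self, port: int, payload: bytes) -> None:
        self.packets += 1                 # accounting always happens
        self.bytes += len(payload)
        if self.verified or port not in self.d["ports"]:
            return
        window = payload[: self.d["limit"]].decode("ascii", errors="replace")
        keywords = self.d["keywords"]
        if not self.d["case_sensitive"]:
            window = window.upper()
            keywords = [k.upper() for k in keywords]
        self.verified = any(k in window for k in keywords)


flow = FlowVerifier(HTTP_DICTIONARY)
flow.feed(8080, b"get /index.html HTTP/1.0\r\n")
flow.feed(8080, b"\x00\x01binary body")
print(flow.verified, flow.packets, flow.bytes)
```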

Sample of Results : Internet Header Verification

[Figure: pie chart of header verification results. Legend: Pending, Verified, Unknown, Rejected (non TCP/UDP). Values as extracted, in order: 0.1 %, 84.9 %, 13.5 %, 1.5 %.]

SLIDE 21

Server Location Module

❐ To identify main data sources
❐ To identify non-declared servers
❐ To identify misconfigured servers
❐ To identify servers for new application protocols

Server Location Approach

❐ Flow aggregation per server (well-known port)

  • Application protocol verification

❐ Classification of unknown traffic

  • Heuristics to detect possible servers
  • Requires further auditing
  • Are they really servers ?
  • Which type of server ?

❐ Server Location

  • Look for a Well known port from a list
  • Application protocol
  • Server side (local/remote) and IP address
SLIDE 22

Unknown Traffic Heuristics

❐ Unknown traffic (non well-known port)
❐ Classical flow accounting (source/destination)
❐ Criteria for possible server location

  • Most active IP addresses
  • Most active Ports
  • IP addresses with more clients
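
The three criteria can be computed from plain flow accounting, as in the Python sketch below. It is only illustrative: the flow tuple layout is hypothetical, and measuring "activity" in bytes is an assumption (packet or flow counts would work the same way).

```python
from collections import Counter, defaultdict


def rank_possible_servers(flows, top=3):
    """flows: iterable of (src_ip, src_port, dst_ip, dst_port, n_bytes)."""
    bytes_by_ip = Counter()
    bytes_by_port = Counter()
    peers = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port, n_bytes in flows:
        # Both endpoints are candidates, since the server side is unknown.
        for ip, port, peer in ((src_ip, src_port, dst_ip), (dst_ip, dst_port, src_ip)):
            bytes_by_ip[ip] += n_bytes
            bytes_by_port[port] += n_bytes
            peers[ip].add(peer)
    clients = Counter({ip: len(p) for ip, p in peers.items()})
    return {
        "most_active_ips": bytes_by_ip.most_common(top),
        "most_active_ports": bytes_by_port.most_common(top),
        "most_clients": clients.most_common(top),
    }


flows = [
    ("10.0.0.1", 40001, "10.0.0.9", 7777, 1000),
    ("10.0.0.2", 40002, "10.0.0.9", 7777, 2000),
    ("10.0.0.3", 40003, "10.0.0.9", 7777, 500),
]
ranking = rank_possible_servers(flows)
print(ranking["most_active_ips"][0], ranking["most_active_ports"][0], ranking["most_clients"][0])
```

An address that leads all three rankings is only a candidate; as the slides note, such results require further auditing.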

Flow Identification

SLIDE 23

Flow Aggregation

Server Location

SLIDE 24

Top 50 Verified Servers

List of Possible Servers

SLIDE 25

Traffic Classification Module

❐ Current categories:

  • LEISURE, COMMERCIAL, ACADEMIC, UNKNOWN

❐ Current heuristics (human auditing):

  • 1st: ‘known’ addresses
      • e.g.: banks (COM), playboy (LEI), sports newspapers (LEI)
  • 2nd: dominant symptoms
      • e.g.: HTTP=2, PASSWD=3, VISA=1 (COM)
      • e.g.: MAIL=1, CHAT=4, SEX=3 (LEI)
  • 3rd: non standard ports
      • e.g.: ftp over ports other than 20/21 (UNK)
  • 4th: ‘known’ ports
      • e.g.: 6969 (LEI)

Academic by default
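
The ordered heuristics can be read as a rule cascade that falls through to ACADEMIC. The Python sketch below is one possible reading, not the module's code: the address table, the symptom-to-category mapping and the "highest summed weight wins" rule for dominant symptoms are assumptions chosen so that the slide's examples come out as stated.

```python
from collections import Counter

# All tables below are hypothetical illustrations.
KNOWN_ADDRESSES = {"192.0.2.10": "COMMERCIAL", "192.0.2.20": "LEISURE"}
SYMPTOM_CATEGORY = {"PASSWD": "COMMERCIAL", "VISA": "COMMERCIAL",
                    "CHAT": "LEISURE", "SEX": "LEISURE"}
STANDARD_PORTS = {"ftp": {20, 21}}
KNOWN_PORTS = {6969: "LEISURE"}


def classify(remote_ip, port, symptoms, protocol=None):
    """Apply the four heuristics in order; ACADEMIC is the default."""
    # 1st: 'known' addresses
    if remote_ip in KNOWN_ADDRESSES:
        return KNOWN_ADDRESSES[remote_ip]
    # 2nd: dominant symptoms (assumed rule: highest summed weight wins)
    score = Counter()
    for name, weight in symptoms.items():
        if name in SYMPTOM_CATEGORY:
            score[SYMPTOM_CATEGORY[name]] += weight
    if score:
        return score.most_common(1)[0][0]
    # 3rd: verified protocol on a non standard port
    if protocol in STANDARD_PORTS and port not in STANDARD_PORTS[protocol]:
        return "UNKNOWN"
    # 4th: 'known' ports
    if port in KNOWN_PORTS:
        return KNOWN_PORTS[port]
    return "ACADEMIC"


print(classify("203.0.113.5", 80, {"HTTP": 2, "PASSWD": 3, "VISA": 1}))
print(classify("203.0.113.5", 25, {"MAIL": 1, "CHAT": 4, "SEX": 3}))
print(classify("203.0.113.5", 80, {}))
```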

Sample of Results: Traffic classification by usage (I)

[Figure: % bytes of input traffic (scale 0% to 100%) for each of the 17 user groups, split into Academic, Leisure, Commercial and Unknown.]

SLIDE 26

Sample of Results: Traffic classification by usage (II)

Total Input traffic to RedIRIS (% Bytes)

  • Academic 78%
  • Leisure 17%
  • Commercial 3%
  • Unknown 2%

Total Output traffic from RedIRIS (% Bytes)

  • Academic 84%
  • Leisure 12%
  • Commercial 2%
  • Unknown 2%

Usage Based Pricing Tool

❐ Input information (provided by other modules)

  • Traffic Volumes
  • Traffic Classification by Usage
  • Traffic Classification by origin/destination
  • Autonomous Systems
  • Sub-networks
SLIDE 27

Other Tools

❐ Filter

  • Simple spread-sheet
  • filters files based on simple rules
  • sort, merge, cut (number or %)
  • script language (batch or interactive)

❐ Graph

  • Traffic oriented graphic tool (gd based)
  • Basic graph formats (sectors, rows, lines)
  • Multiple row and accumulated results
  • Logarithmic and linear axes
  • Cut data (% or number)

Other Tools

❐ Dynamic Graphical User Interface

  • tcl/tk based
  • Language for process configuration
  • Generates a GUI for each process

❐ Data publishing

  • Secure http server
  • Hour/day/week/month stored data
  • Browse historic data
  • Filter historic data
  • Dynamic graphic generation
SLIDE 28

Summary

  • The statistical capture gives results close to reality
  • The modularity of the system allows other capture and analysis applications to be integrated
  • The flexibility of the filter allows new analysis modules to be generated and results to be crossed with other modules
  • New attributes other than those in IP headers are allowed
  • This is a tool for the network administrator

Future work

❐ Further improvements in capture capacity
❐ Applications to detect security attacks
❐ Graphical user interface
❐ Automatic reaction to incidents:

  • Alarms (mail, pager, SNMP, ...)
  • Flow blocking or re-routing
  • Flow logging for off-line human analysis

❐ Other types of statistics:

  • Traffic statistics, such as those provided by NetFlow
  • Top 100 lists of hosts/servers
  • Main origins/destinations of traffic
  • Most popular sites (web, ftp, chat servers, ...)