
Statistical Clustering of Internet Communication Patterns

Félix Hernández-Campos (UNC-Chapel Hill Computer Science)
Talk at Interface 2003, March 13, 2003
The University of North Carolina at Chapel Hill

Joint work with:
Andrew Nobel (UNC-CH Statistics)
Don Smith, Kevin Jeffay (UNC-CH Computer Science)

Motivation
Modeling Internet Traffic
[Figure: the Internet]

Motivation
Modeling Internet Traffic
[Figure: Internet traffic flowing between a web browser, a web server, an email client, and an email server]

Motivation
Experimental Networking Research
• Evaluating network technologies requires realistic experiments in a controlled laboratory environment
• A key component of these experiments is the traffic workload
  – Traffic is created by distributed applications running at the end hosts
• A natural approach for traffic generation is to simulate these applications using models of their behavior
  – This is known as source-level modeling

Internet Traffic Mixes
Internet2 Applications (Nov 4 2002)
[Figure: Internet2 traffic (packets) by application. Individual applications: NNTP (newsgroups), HTTP (web), FTP (file transfer). Groups of applications: file sharing, audio/video, misc, encrypted, games, unidentified]
• Dozens of different applications are commonly used
• There is a large percentage of unidentified traffic

Difficulties in Source-Level Modeling
• Real Internet traffic is the result of aggregating many individual applications into a traffic mix
• Requires protocol specifications
  – Closed applications have to be reverse engineered
• Applications change quickly
• Privacy considerations complicate data acquisition
→ It is simply infeasible to develop models for each application and maintain them up to date

Goals
• Develop source-level models of traffic mixes
  – Easy to populate and update
  – Derived from very large data sets
• Construct flexible traffic generators
  – Reproduce a wide range of traffic mixes

Our Approach
• Develop source-level models of traffic mixes
  – Easy to populate and update
  – Derived from very large data sets
  → Model communication patterns in an abstract manner
    – Application-independent source-level modeling
• Construct flexible traffic generators
  – Reproduce a wide range of traffic mixes
  → Find the fundamental patterns of communication
    – Cluster-based traffic generation

Data Acquisition
Inference from TCP Packet Headers
[Figure: time-sequence diagram of a TCP connection between caller and callee. Segments in order: SYN; SYN-ACK; ACK; DATA (seqno 305, ackno 1); ACK (seqno 1, ackno 305); DATA (seqno 1461, ackno 305); DATA (seqno 2876, ackno 305); ACK (seqno 305, ackno 2876); FIN; FIN-ACK; FIN; FIN-ACK]

Communication Patterns
Inference from TCP Packet Headers
[Figure: the same diagram annotated with the inferred data units: 305 bytes from caller to callee, then 2876 bytes from callee to caller]
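The diagram uses simplified sequence and acknowledgment numbers. As a minimal sketch of the same inference on raw header values (not necessarily the authors' tooling), the payload sent by one endpoint can be recovered from its initial sequence number and the peer's last acknowledgment number seen before any FIN. The function name `bytes_acknowledged` and the header values below are hypothetical, chosen only to reproduce the 305-byte and 2876-byte data units.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def bytes_acknowledged(initial_seqno: int, last_ackno: int) -> int:
    """Payload bytes sent by one endpoint, inferred from headers alone.

    initial_seqno: sequence number carried by that endpoint's SYN.
    last_ackno:    highest acknowledgment number returned by the peer
                   before any FIN is acknowledged (assumption).
    The SYN itself consumes one sequence number, hence the extra -1.
    """
    return (last_ackno - initial_seqno - 1) % SEQ_SPACE


# Hypothetical raw header values (not taken from the talk).
a = bytes_acknowledged(initial_seqno=1000, last_ackno=1306)        # caller to callee
b = bytes_acknowledged(initial_seqno=4294967000, last_ackno=2581)  # callee to caller, wraps past 2**32
print((a, b))
```

The modulo keeps the subtraction correct when a connection's sequence numbers wrap, as in the second call.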

Communication Patterns
HTTP Connection (Web Traffic)
[Figure: the same TCP connection between a web browser and a web server: the HTTP request is 305 bytes, the HTTP response is 2,876 bytes]

Communication Patterns
HTTP Connection (Web Traffic)
• Communication pattern was (a_1, b_1)
  – E.g., (305 bytes, 2,876 bytes)
[Figure: the web client sends a 305-byte HTTP request; the web server returns a 2,876-byte HTTP response]

Abstract Communication Model
The a-b-t Model
• General model (a-b-t vector):
  ((a_1, b_1, t_1), (a_2, b_2, t_2), …, (a_e, b_e, ⊥))
  where e is the number of epochs
[Figure: three epochs between caller and callee. In epoch i the caller sends a_i bytes and the callee sends b_i bytes; t_1 seconds separate epochs 1 and 2, and t_2 seconds separate epochs 2 and 3]

The a-b-t Model
Typical Communication Patterns
• SMTP (send email)
• Telnet (remote terminal)
• FTP-DATA (file download)
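One way to make the a-b-t vector concrete is the small Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the names `Epoch` and `build_abt`, the `(timestamp, sender, payload_bytes)` input format, and the rule that a new epoch starts when the caller sends data after the callee has replied are all assumptions. `None` stands in for ⊥ in the last epoch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Epoch:
    a: int              # bytes sent by the caller in this epoch
    b: int              # bytes sent by the callee in this epoch
    t: Optional[float]  # seconds until the next epoch starts; None stands for ⊥


def build_abt(segments: List[Tuple[float, str, int]]) -> List[Epoch]:
    """Group (timestamp, sender, payload_bytes) records into epochs.

    sender is "caller" or "callee". Assumption: a new epoch begins when
    the caller sends data after the callee has already sent some.
    """
    epochs: List[Epoch] = []
    a = b = 0
    last_time = 0.0
    for ts, sender, nbytes in segments:
        if sender == "caller" and b > 0:
            # The callee has replied, so this caller data opens a new epoch.
            epochs.append(Epoch(a, b, ts - last_time))
            a = b = 0
        if sender == "caller":
            a += nbytes
        else:
            b += nbytes
        last_time = ts
    if a or b:
        epochs.append(Epoch(a, b, None))  # the final epoch has no following quiet time
    return epochs


# Hypothetical trace: one request/response exchange like the HTTP example,
# followed by a second exchange two seconds later.
example = build_abt([
    (0.0, "caller", 305),
    (0.5, "callee", 1460),
    (1.0, "callee", 1416),
    (3.0, "caller", 120),
    (3.5, "callee", 500),
])
print(example)
```

With this trace the first epoch carries the (305, 2876) pattern from the HTTP slides and a quiet time of 2.0 seconds, and the second epoch ends the vector with ⊥.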
