Identifying Frequent Items in Sliding Windows over On-Line Packet Streams

Identifying Frequent Items in Sliding Windows over On-Line Packet Streams
Alejandro López-Ortiz, School of Computer Science, University of Waterloo
Joint work with Lukasz Golab (Waterloo), David DeHaan (Waterloo), Erik Demaine (MIT), and J. Ian Munro (Waterloo)

Application
• Real-time analysis of network traffic
• Find frequently appearing packet types
• Packet type: port #, protocol type, source IP
• But we are interested in recent usage trends
• E.g. for routing system analysis or anomaly detection
• So we want to find frequently appearing packets in a sliding window of the N most recent packets
IMC ’03 Miami, Florida Alejandro Lopez-Ortiz

If we could store the entire window:
• Maintain frequency counts of each category in the window
• Update the counters as new packets arrive and old packets expire from the window
• Periodically scan the counters and return the packet types corresponding to the k largest counters (and possibly the actual counts too)
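The bookkeeping described on this slide can be sketched as follows. This is a minimal illustration, not code from the talk: the class name `ExactWindowCounter` and its methods are hypothetical, and packet types are represented as plain strings.

```python
from collections import Counter, deque


class ExactWindowCounter:
    """Exact per-type counts over the N most recent packets (stores the whole window)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # packet types currently in the window, oldest first
        self.counts = Counter()  # packet type -> occurrences inside the window

    def add(self, packet_type):
        # A new packet arrives: count it, then expire the oldest one if the window is full.
        self.window.append(packet_type)
        self.counts[packet_type] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def top_k(self, k):
        # Scan the counters and return the k largest together with their counts.
        return self.counts.most_common(k)


counter = ExactWindowCounter(window_size=5)
for packet_type in "aabacbbbc":
    counter.add(packet_type)
top_two = counter.top_k(2)
print(top_two)
```

The memory cost is what motivates the rest of the talk: the deque holds all N packets of the window.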

What if we can’t store the entire window?
• Idea from [Zhu, Shasha, VLDB ’02]:
• Divide the sliding window into sub-windows, i.e. use a coarser time grain of T packets
• Store a summary for each sub-window
• Every T packets:
  • Expire the oldest sub-window
  • Add the most recent sub-window
  • Update the answer
• Space req: (window size × summary size) / T

Example: windowed SUM
5 8 4 9 11 6 8 5 3 20 8 7 3      SUM = 5 + … + 3 = 97
  8 4 9 11 6 8 5 3 20 8 7 3 7    SUM = SUM_OLD - 5 + 7 = 99
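The arithmetic in this example can be reproduced with a short sketch. It assumes that each number on the slide is the stored summary (a partial sum) of one sub-window, which is an interpretation of the slide; the class name `SubWindowSum` is hypothetical.

```python
from collections import deque


class SubWindowSum:
    """Windowed SUM maintained from per-sub-window partial sums."""

    def __init__(self, num_subwindows):
        self.num_subwindows = num_subwindows
        self.summaries = deque()  # one partial sum per sub-window, oldest first
        self.total = 0

    def push_subwindow(self, subwindow_sum):
        # Add the newest summary, then expire the oldest if the window is full.
        self.summaries.append(subwindow_sum)
        self.total += subwindow_sum
        if len(self.summaries) > self.num_subwindows:
            self.total -= self.summaries.popleft()
        return self.total


slide_values = [5, 8, 4, 9, 11, 6, 8, 5, 3, 20, 8, 7, 3]
agg = SubWindowSum(num_subwindows=len(slide_values))
for value in slide_values:
    total_before = agg.push_subwindow(value)
total_after = agg.push_subwindow(7)  # expires 5, adds 7
print(total_before, total_after)     # 97 99
```

SUM works this way because the contribution of the expired sub-window is exactly its stored summary, so it can be subtracted without looking at individual packets.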

Updating top-k counters
• T_b = current count for packets of type b
• Update: T_b = T_b - T_b(old sub-window) + T_b(new sub-window)
• Problem: T_b(old sub-window) might not be part of the summary of the old sub-window

…but let’s use the technique anyway
• Sub-window summary: IDs and counts of the k most frequent categories
• Threshold = sum, over all sub-windows, of the occurrence count of the least frequent item in each sub-window's summary
• Compute the overall occurrence count for each packet type from the sub-window summaries
• Packet types whose overall count exceeds the threshold are reported as top-k
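A minimal streaming sketch of this procedure is given below. It is an illustration under stated assumptions rather than the authors' implementation: the class name `SubWindowTopK` and its parameters are hypothetical, the current sub-window is counted exactly while it fills, and "exceeding" is read as strictly greater than the threshold.

```python
from collections import Counter, deque


class SubWindowTopK:
    """Keeps a top-k summary per sub-window and reports items above the threshold."""

    def __init__(self, subwindow_size, num_subwindows, k):
        self.subwindow_size = subwindow_size
        self.k = k
        # deque(maxlen=...) drops the oldest summary automatically when a new one is added.
        self.summaries = deque(maxlen=num_subwindows)
        self.current = Counter()  # exact counts for the sub-window being filled
        self.seen = 0

    def add(self, packet_type):
        self.current[packet_type] += 1
        self.seen += 1
        if self.seen == self.subwindow_size:
            # Close the sub-window: keep only the k most frequent (type, count) pairs.
            self.summaries.append(self.current.most_common(self.k))
            self.current = Counter()
            self.seen = 0

    def frequent(self):
        # Threshold: sum of the smallest count kept in each sub-window summary.
        threshold = sum(summary[-1][1] for summary in self.summaries)
        totals = Counter()
        for summary in self.summaries:
            for packet_type, count in summary:
                totals[packet_type] += count
        return {t: c for t, c in totals.items() if c > threshold}


tracker = SubWindowTopK(subwindow_size=4, num_subwindows=3, k=2)
for packet_type in "aaab" "aaac" "aabd":
    tracker.add(packet_type)
result = tracker.frequent()
print(result)
```

In this toy stream each summary's smallest count is 1, so the threshold is 3, and only type `a` (total 8 across the three summaries) is reported.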

The algorithm
Let a, b, c, … be distinct packet types, let k = 3. Each column is the top-3 list of one sub-window:
a:17  a:14  d:16  c:22  e:15  b:24  b:21  e:13  c:18  c:12  d:20  f:15  f:17
g:9   c:6   g:12  f:10  k:12  h:8   f:6   a:8   n:11  a:6   e:13  d:7   d:6
e:4   h:4   a:3   j:3   b:4   c:4   m:6   k:4   b:4   b:4   p:8   h:3   r:5
• Threshold = 4+4+3+…+8+3+5 = 56
• Total frequency counts from the top-k lists: a=48, b=57, c=62, d=49, e=45, f=48, g=21, h=15, j=3, k=16, m=6, n=11, p=8, r=5
• Return b and c as frequent items in this window
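The example can be checked mechanically. The sketch below transcribes the thirteen top-3 lists from the slide, reading the flattened values as three rows of thirteen entries (one column per sub-window), which is an assumption about the original layout; variable names are illustrative.

```python
from collections import Counter

# One inner list per sub-window: its top-3 (type, count) pairs, most frequent first.
summaries = [
    [("a", 17), ("g", 9), ("e", 4)],
    [("a", 14), ("c", 6), ("h", 4)],
    [("d", 16), ("g", 12), ("a", 3)],
    [("c", 22), ("f", 10), ("j", 3)],
    [("e", 15), ("k", 12), ("b", 4)],
    [("b", 24), ("h", 8), ("c", 4)],
    [("b", 21), ("f", 6), ("m", 6)],
    [("e", 13), ("a", 8), ("k", 4)],
    [("c", 18), ("n", 11), ("b", 4)],
    [("c", 12), ("a", 6), ("b", 4)],
    [("d", 20), ("e", 13), ("p", 8)],
    [("f", 15), ("d", 7), ("h", 3)],
    [("f", 17), ("d", 6), ("r", 5)],
]

# Threshold: sum of the least frequent count in each sub-window summary.
threshold = sum(min(count for _, count in summary) for summary in summaries)

# Overall count per type, using only what the summaries retain.
totals = Counter()
for summary in summaries:
    for packet_type, count in summary:
        totals[packet_type] += count

frequent = sorted(t for t, c in totals.items() if c > threshold)
print(threshold)  # 56
print(frequent)   # ['b', 'c']
```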

Hypothesis
• If categories are roughly equally distributed, the previous method may not work
• But under a power-law distribution, we expect a few heavy flows, which should register on many top-k lists
• Experimented with a TCP trace:
  • 1 month of traffic from Lawrence Berkeley Lab to the rest of the world; almost 800 000 packets in total
  • 1647 distinct source IP addresses, which we treated as distinct categories

Results: accuracy
[Figure: percentage of identified over-threshold items (0 to 100 percent) versus k (1 to 10), one curve per sub-window size b = 20, b = 100, b = 500. Window size = 100 000 packets.]

Results: precision of the reported frequencies
[Figure: relative error in the reported frequencies (0 to 14 percent) versus k (1 to 10), one curve per sub-window size b = 20, b = 100, b = 500. Window size = 100 000 packets.]

Conclusions
• Extended the sub-window model to a holistic aggregate
• Good results due to the non-uniform distribution of Internet traffic
• Low space requirements
