

SLIDE 1

Algorithms and Data Structures to Accelerate Network Analysis

Reservoir Labs

Jordi Ros-Giralt, Alan Commike, Peter Cullen, Richard Lethin
{giralt, commike, cullen, lethin}@reservoir.com

4th International Workshop on Innovating the Network for Data Intensive Science
November 12, 2017

632 Broadway Suite 803
New York, NY 10012

SLIDE 2

Roadmap

  • Problem definition
  • Optimizations
      • Long queue emulation
      • Lockless bimodal queues
      • Tail early dropping
      • LFN tables
      • Multiresolution priority queues
  • Benchmarks

SLIDE 3

Problem Definition

  • System-wide optimization of network components like routers, firewalls, or network analyzers is complex.
  • Hundreds of different SW algorithms and data structures are interrelated in subtle ways.
  • Two interrelated problems:
      • Shifting micro-bottlenecks
      • Nonlinear performance collapse

SLIDE 4

Problem Definition

Shifting Micro-Bottlenecks

It’s difficult...

SLIDE 5

Problem Definition

Shifting Micro-Bottlenecks

...to optimize...

SLIDE 6

Problem Definition

Shifting Micro-Bottlenecks

...bottlenecks...

SLIDE 7

Problem Definition

Shifting Micro-Bottlenecks

...that keep moving...

SLIDE 8

Problem Definition

Shifting Micro-Bottlenecks

...every microsecond...

SLIDE 9

Problem Definition

Shifting Micro-Bottlenecks

...or so.

SLIDE 10

Non-linear Performance Collapse

[Figure: system diagram with components Net I/O, PCIe, CPU, Disk I/O, Cache, and Memory, annotated with the capacities 40 Gbps, 64 Gbps, 1092 Gbps, 56 GHz, 10.4 Gbps; L1-I cache: 896 kB; L1-D cache: 896 kB; L2 cache: 7168 kB; L3 cache: 71680 kB]

SLIDE 11

Non-linear Performance Collapse

[Figure: same system diagram as on Slide 10]

State 1: network is the bottleneck

Healthy cache regime:

  • CPU operates out of cache
  • High cache hit ratios

SLIDE 12

Non-linear Performance Collapse

[Figure: same system diagram as on Slide 10]

State 2: network is no longer the bottleneck

Highly inefficient memory regime:

  • CPU operates out of RAM
  • High cache miss ratios

10x penalty

SLIDE 13

Non-linear Performance Collapse

[Figure: same system diagram as on Slide 10; additional labels: input, output]

State 2: network is no longer the bottleneck

Highly inefficient memory regime:

  • CPU operates out of RAM
  • High cache miss ratios

10x penalty

By removing the network bottleneck, the system spends more time processing packets that will need to be dropped anyway → net performance degradation (performance collapse).
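
The collapse mechanism can be illustrated with a toy two-stage model. This is only a sketch under assumptions that do not come from the slides: a fixed CPU budget per second, a front stage that spends `c_front` cycles on every packet that arrives, and a back stage that needs `c_back` further cycles to finish a packet. The function name `goodput` and all numbers are hypothetical.

```python
def goodput(offered, budget, c_front, c_back):
    """Packets fully processed per second in a toy two-stage pipeline.

    Assumed model (not from the slides): every offered packet costs
    c_front cycles up front, even if it is dropped later; finishing a
    packet costs c_back further cycles; budget is cycles per second.
    """
    front_cost = offered * c_front
    if front_cost >= budget:
        return 0.0  # every cycle goes to packets that are never finished
    remaining = budget - front_cost
    return min(float(offered), remaining / c_back)


if __name__ == "__main__":
    for offered in (100, 200, 400, 800, 1000):
        print(offered, goodput(offered, budget=1000, c_front=1, c_back=4))
```

In this model goodput rises with the offered load up to a knee and then falls, because cycles spent on the front stage of packets that are later dropped are no longer available to finish other packets.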

SLIDE 14

Performance Optimization: Approach

  • The process of performance optimization needs to be a meticulous one, involving small but safe steps to avoid the pitfall of pursuing short-term gains that can lead to new and bigger bottlenecks down the path.

SLIDE 15

Performance Optimization: Algorithms and Data Structures

  • Long queue emulation: reduces packet drops due to fixed-size hardware rings
  • Lockless bimodal queues: improves packet capturing performance
  • Tail early dropping: increases information entropy and extracted metadata
  • LFN tables: reduces state sharing overhead
  • Multiresolution priority queues: reduces cost of processing timers

SLIDE 16

Long Queue Emulation

Dispatcher Model: Long queue emulation Model:

  • Packet read cache penalty
  • Descriptor read cache penalty
  • Packet drop penalty under certain conditions
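
The slides do not spell out how long queue emulation is built. One plausible reading, assumed here, is that a dispatcher drains the fixed-size hardware ring into a longer software queue so that short overloads are absorbed instead of overflowing the ring. The tick-based simulation below is a hypothetical sketch of that idea; `count_drops` and its parameters are illustrative names, not the authors' design.

```python
def count_drops(arrivals, ring_size, service_per_tick, soft_queue_size=0):
    """Count packets dropped at a fixed-size ring.

    arrivals: number of packets arriving in each tick.
    soft_queue_size: capacity of an assumed software queue behind the
    ring (0 means the worker reads straight from the ring).
    """
    ring = 0     # packets waiting in the fixed-size ring
    soft = 0     # packets waiting in the software queue
    dropped = 0
    for burst in arrivals:
        # Receive side: whatever does not fit in the ring is lost.
        accepted = min(burst, ring_size - ring)
        dropped += burst - accepted
        ring += accepted
        # Dispatcher: move packets from the ring to the software queue.
        moved = min(ring, soft_queue_size - soft)
        ring -= moved
        soft += moved
        # Worker: serve the software queue first, then the ring.
        budget = service_per_tick
        served = min(soft, budget)
        soft -= served
        budget -= served
        ring -= min(ring, budget)
    return dropped


if __name__ == "__main__":
    load = [3, 3, 3, 3, 0, 0, 0, 0]
    print(count_drops(load, ring_size=4, service_per_tick=2))
    print(count_drops(load, ring_size=4, service_per_tick=2, soft_queue_size=16))
```

With this load the ring alone overflows during the four busy ticks, while the version with the software queue absorbs the same overload and catches up during the idle ticks.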

SLIDE 17

Long Queue Emulation

SLIDE 18

Long Queue Emulation

Use LQE

SLIDE 19

Long Queue Emulation

  • Optimal LQE size

SLIDE 20

Lockless Bimodal Queues

  • Goal: move packets from the memory ring to the disk without using locks

SLIDE 21

Lockless Bimodal Queues

  • Goal: move packets from the memory ring to the disk without using locks
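
The bimodal design itself is not described in the slide text. As background only, the sketch below shows the index discipline of a generic single-producer/single-consumer ring buffer, a common way to hand items between two threads without locks: the producer alone writes the tail index and the consumer alone writes the head index. The class name `SpscRing` is hypothetical, and Python is used purely for illustration; a C or C++ version would need atomic loads and stores with acquire/release ordering on the indices.

```python
class SpscRing:
    """Single-producer/single-consumer ring buffer (illustrative only)."""

    def __init__(self, capacity):
        self._buf = [None] * (capacity + 1)  # one slot always stays empty
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def push(self, item):
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False  # full: the caller decides whether to drop or retry
        self._buf[self._tail] = item
        self._tail = nxt  # publish only after the slot has been filled
        return True

    def pop(self):
        if self._head == self._tail:
            return None  # empty
        item = self._buf[self._head]
        self._buf[self._head] = None
        self._head = (self._head + 1) % len(self._buf)
        return item
```

Because each index has exactly one writer, neither side ever needs to take a lock; fullness and emptiness are detected by comparing the two indices.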

SLIDE 22

Lockless Bimodal Queues

SLIDE 23

Tail Early Dropping

[Figure: information/entropy versus connection bits in sequence of arrival]

SLIDE 24

Tail Early Dropping

[Figure: information/entropy versus connection bits in sequence of arrival]

SLIDE 25

Tail Early Dropping

SLIDE 26

LFN Tables

SLIDE 27

LFN Tables

SLIDE 28

LFN Tables

SLIDE 29

Multiresolution Priority Queues

  • Priority queue: the element at the front of the queue is the greatest of all the elements it contains, according to some total ordering defined by their priority.
  • Found at the core of important computer science problems:
      • Shortest path problem
      • Packet scheduling in Internet routers
      • Event driven engines
      • Huffman compression codes
      • Operating systems
      • Bayesian spam filtering
      • Discrete optimization
      • Simulation of colliding particles
      • Artificial intelligence

SLIDE 30

Multiresolution Priority Queues

n: number of elements in the queue
c: maximum integer priority value
r: number of resolution groups supported by the multiresolution priority queue

Year | Author | Data structure | Insert | Extract | Notes
---- | ------ | -------------- | ------ | ------- | -----
1964 | Williams [3] | Binary heap | O(log(n)) | O(log(n)) | Simple to implement.
1984 | Fredman et al. [4] | Fibonacci heaps | O(1) | O(log(n)) | More complex to implement.
1988 | Brown [8] | Calendar queues | O(1) | O(c) | Need to be balanced and resolution cannot be tuned.
2000 | Chazelle [5] | Soft heaps | O(1) | O(1) | Unbounded error.
2008 | Mehlhorn et al. [7] | Bucket queues | O(1) | O(c) | Priorities must be small integers and resolution cannot be tuned.
2017 | Ros-Giralt et al. (this work) | Multiresolution priority queue | O(1), O(r) or O(log(r)) | O(1) | Tunable/bounded resolution error. Error is zero if priority space is multi-resolutive.

SLIDE 31

Multiresolution Priority Queues

  • A multiresolution priority queue is a container data structure that at all times maintains the following invariant:

[Equation: invariant not captured in the slide text]

  • Intuitively:
      1. Discretize the priority space into a sequence of slots or resolution groups.
      2. Prioritize elements according to the slot to which they belong.
      3. Elements belonging to lower slots are given higher priority.
      4. Within a slot, ordering is not guaranteed. This provides a mechanism to control the trade-off between accuracy and performance.
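
The invariant itself did not survive in the slide text. Based on the four intuitive points above, and assuming slots of width $p_\Delta$ that start at $p_{\min}$ (the parameters used in the later worked example), one way to state it is the following. This is a reconstruction from the intuition, not necessarily the authors' exact formulation:

```latex
\text{For any two elements } e_i, e_j \text{ held in the queue at the same time,}
\text{ with priorities } p_i, p_j \in [p_{\min}, p_{\max}):
\qquad
e_i \text{ is dequeued before } e_j
\;\Longrightarrow\;
\left\lfloor \frac{p_i - p_{\min}}{p_\Delta} \right\rfloor
\;\le\;
\left\lfloor \frac{p_j - p_{\min}}{p_\Delta} \right\rfloor
```

Elements that fall in the same slot have equal floor values, so the condition places no constraint on their relative order, which matches point 4.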

SLIDE 32

Multiresolution Priority Queues

  • The larger the parameter pΔ → the lower the resolution of the queue → the higher the error → the higher the performance (and vice versa).
  • Instead of ordering the space of elements, an MR-PQ orders the space of priorities.
  • The information-theoretic barriers of the problem are broken by introducing error in a way that entropy is reduced:
      • In many real-world problems, the space of priorities has much lower entropy than the space of keys.
      • Example:
          • Space of keys is the set of real numbers (Sk)
          • Space of priorities is the set of distances between any two US cities (Sp)
          • Entropy(Sk) >> Entropy(Sp)

SLIDE 33

Multiresolution Priority Queues

  • How it works through an example.

Let a multiresolution priority queue have parameters pΔ = 3, pmin = 7 and pmax = 31, and assume we insert seven elements with priorities 19, 11, 17, 12, 24, 22 and 29 (inserted in this order). Then:

[Figure: resulting state of the queue, not captured in the slide text]
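
The example can be traced with a simplified bucket-array sketch. This is not the authors' base algorithm (whose listing is not in the slide text): here insert is O(1) and extract scans up to r slots, and each slot is kept in FIFO order as an arbitrary choice. The class name `BucketMRPQ` and its methods are hypothetical; the slot of a priority p is assumed to be floor((p - pmin) / pΔ).

```python
from collections import deque


class BucketMRPQ:
    """Simplified multiresolution priority queue (lower slot = served first)."""

    def __init__(self, p_min, p_max, p_delta):
        self.p_min, self.p_max, self.p_delta = p_min, p_max, p_delta
        n_slots = -(-(p_max - p_min) // p_delta)  # ceiling division
        self.slots = [deque() for _ in range(int(n_slots))]

    def slot_of(self, priority):
        if not self.p_min <= priority < self.p_max:
            raise ValueError("priority outside [p_min, p_max)")
        return int((priority - self.p_min) // self.p_delta)

    def insert(self, item, priority):
        self.slots[self.slot_of(priority)].append(item)  # O(1)

    def extract(self):
        for bucket in self.slots:  # O(r) scan in this sketch
            if bucket:
                return bucket.popleft()
        raise IndexError("extract from empty queue")


if __name__ == "__main__":
    q = BucketMRPQ(p_min=7, p_max=31, p_delta=3)
    priorities = [19, 11, 17, 12, 24, 22, 29]
    for p in priorities:
        q.insert(p, p)
    print([q.slot_of(p) for p in priorities])  # [4, 1, 3, 1, 5, 5, 7]
    print([q.extract() for _ in priorities])   # [11, 12, 17, 19, 24, 22, 29]
```

Priorities 24 and 22 share slot 5, so they come out in insertion order rather than priority order: the error is confined to elements within the same slot.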

SLIDE 34

Multiresolution Priority Queues: Base Algorithm

SLIDE 35

Multiresolution Priority Queues: Base Algorithm

SLIDE 36

Multiresolution Priority Queues: Base Algorithm

SLIDE 37

Multiresolution Priority Queues: Base Algorithm

SLIDE 38

Multiresolution Priority Queues: Correctness and Complexity

SLIDE 39

Multi-resolutive Priority Spaces

  • Example:

SLIDE 40

Multi-resolutive Priority Spaces

  • Problems with multi-resolutive priority spaces can be resolved by a multiresolution priority queue at a faster speed and without introducing any additional error. In this case, we achieve better performance at no cost.
  • If condition (2) does not hold, then an error is introduced but the entropy of the problem stays constant. In this case, we also achieve better performance but at the cost of losing some accuracy.

SLIDE 41

Multi-resolutive Priority Spaces

  • We can apply the rules of multi-resolutive priority spaces to optimize the performance of problems involving priority queues.
  • Example. Consider the classic shortest path problem, known to have a complexity of O((v+e)log(v)) for a graph with v vertices and e edges (see Section 24.3 of [2]). By using a multiresolution priority queue we have:
      • If the graph is such that the edge weights define a multi-resolutive priority space, then using MR-PQ we can find the exact shortest path with a cost O(v+e).
      • Otherwise, we can find the approximate shortest path with a cost O(v+e) and with a controllable error given by the parameter r.
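
As a rough illustration of the shortest path case, the sketch below replaces the binary heap in a Dijkstra-style search with an array of slots of width pΔ. It is an assumption-laden stand-in, not the authors' algorithm: the function name, the `max_dist` bound on distances of interest, and the settle-once policy are all hypothetical choices. With positive integer weights and pΔ = 1 every slot holds a single distance value, so the result is exact; with wider slots, nodes inside a slot are settled in arbitrary order and the distances may be approximate.

```python
def bucketed_shortest_paths(graph, source, p_delta, max_dist):
    """Single-source distances using slots instead of a heap.

    graph: {u: [(v, weight), ...]} with positive weights.
    max_dist: assumed upper bound on the distances of interest.
    """
    n_slots = int(max_dist // p_delta) + 1
    slots = [[] for _ in range(n_slots)]
    dist = {source: 0}
    settled = set()
    slots[0].append(source)
    for bucket in slots:          # visit slots in increasing order
        while bucket:             # order inside a slot is arbitrary
            u = bucket.pop()
            if u in settled:
                continue          # stale entry left by an earlier relaxation
            settled.add(u)
            for v, w in graph.get(u, ()):
                d = dist[u] + w
                if v in settled or d > max_dist:
                    continue
                if d < dist.get(v, float("inf")):
                    dist[v] = d
                    slots[int(d // p_delta)].append(v)
    return dist


if __name__ == "__main__":
    g = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: [(3, 5)], 3: []}
    print(bucketed_shortest_paths(g, 0, p_delta=1, max_dist=10))
```

Each edge is relaxed at most once and each slot is visited once, so the work in this sketch is proportional to v + e plus the number of slots.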

SLIDE 42

Multiresolution Priority Queues: Support for Sliding Priorities

  • The base MR-PQ algorithm assumes priorities are in the set [pmin, pmax).
  • The data structure can be generalized to support priorities in the set [pmin+d(t), pmax+d(t)), where d(t) is any monotonically increasing function of a parameter t.
  • The case of sliding priority sets is particularly relevant to applications that run event-driven engines. (See for example Section 6.5 of [2].)

SLIDE 43

Multiresolution Priority Queues: Support for Sliding Priorities

  • Sliding priorities can be supported with a few additional lines of code:
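
The original listing is not part of the slide text. As a stand-in, the following sketch shows one way a bucket-based queue can follow a sliding window, by reusing the slots circularly in the manner of a timing wheel. It is an assumption, not the authors' code: `SlidingMRPQ`, `slide_to`, and the requirement that the window start stays a multiple of pΔ are all hypothetical. Here `low` plays the role of pmin + d(t) and the window is [low, low + r·pΔ).

```python
from collections import deque


class SlidingMRPQ:
    """Bucket queue over a window [low, low + r * p_delta) that slides up."""

    def __init__(self, low, r, p_delta):
        if low % p_delta:
            raise ValueError("low must be a multiple of p_delta")
        self.low, self.r, self.p_delta = low, r, p_delta
        self.slots = [deque() for _ in range(r)]

    def _index(self, priority):
        return int(priority // self.p_delta) % self.r  # circular slot index

    def insert(self, item, priority):
        if not self.low <= priority < self.low + self.r * self.p_delta:
            raise ValueError("priority outside the current window")
        self.slots[self._index(priority)].append(item)

    def extract(self):
        start = self._index(self.low)
        for k in range(self.r):  # scan from the lowest slot of the window
            bucket = self.slots[(start + k) % self.r]
            if bucket:
                return bucket.popleft()
        raise IndexError("extract from empty queue")

    def slide_to(self, new_low):
        """Move the window up; slots that fall out of it must be empty."""
        if new_low < self.low or new_low % self.p_delta:
            raise ValueError("window must move up in multiples of p_delta")
        steps = min(self.r, (new_low - self.low) // self.p_delta)
        start = self._index(self.low)
        if any(self.slots[(start + k) % self.r] for k in range(steps)):
            raise ValueError("expired slots are not empty")
        self.low = new_low
```

A short usage example: with `low=0, r=4, p_delta=10`, inserting items at priorities 35 and 5, extracting once, sliding to 20, and then inserting at 45 and 25 yields the remaining items in the order 25, 35, 45, even though priority 45 reuses the slot that previously held priority 5.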

SLIDE 44

Multiresolution Priority Queues: Binary Heap Based Optimization

  • When not all the slots in the QLT table are filled in, performance can be improved by implementing the QLT table using a binary heap.
  • Let a multiresolution priority queue have parameters pΔ = 3, pmin = 7 and pmax = 31, and assume we insert seven elements with priorities 19, 11, 17, 12, 24, 22 and 29 (inserted in this order). Then:

SLIDE 45

Multiresolution Priority Queues: Complexity Analysis

SLIDE 46

Multiresolution Priority Queues: Benchmarks

  • We use MR-PQ to resolve a real-world HPC problem.
  • Problem statement: when running the Bro network analyzer [12] against very high-speed traffic consisting of many short-lived connections, the BubbleDown operation in the (binary heap based) priority queue used to manage Bro timers becomes a main system bottleneck.
  • Top functions in Bro according to their computational cost:

SLIDE 47

Multiresolution Priority Queues: Benchmarks

  • Call graph showing the BubbleDown function as the main bottleneck

SLIDE 48

Multiresolution Priority Queues: Benchmarks

BH-PQ: Bro with its standard binary heap based priority queue to manage timers
MR-PQ: Bro using a multiresolution priority queue to manage timers

SLIDE 49

Multiresolution Priority Queues: Benchmarks

BH-PQ: Bro with its standard binary heap based priority queue to manage timers
MR-PQ: Bro using a multiresolution priority queue to manage timers

SLIDE 50

Multiresolution Priority Queues: Benchmarks

BH-PQ: Bro with its standard binary heap based priority queue to manage timers
MR-PQ: Bro using a multiresolution priority queue to manage timers

SLIDE 51

Multiresolution Priority Queues: Benchmarks

BH-PQ: Bro with its standard binary heap based priority queue to manage timers
MR-PQ: Bro using a multiresolution priority queue to manage timers

SLIDE 52

Multiresolution Priority Queues: Benchmarks

BH-PQ: Bro with its standard binary heap based priority queue to manage timers
MR-PQ: Bro using a multiresolution priority queue to manage timers

SLIDE 53

R-Scope Network Security Sensor: System Wide Benchmarks

SLIDE 54

R-Scope Network Security Sensor: System Wide Benchmarks

SLIDE 55

R-Scope

R-Scope Providing Deep Network Visibility at SC2017 / SCinet

SLIDE 56

R-Scope Providing Deep Network Visibility at SC2017 / SCinet

  • You can visit our demo at the SCinet NOC

SLIDE 57

Thank You

632 Broadway Suite 803
New York, NY 10012

812 SW Washington St. Suite 1200
Portland, OR 97205