Recursive Lattice Search: Hierarchical Heavy Hitters Revisited

SLIDE 1

Recursive Lattice Search: Hierarchical Heavy Hitters Revisited

Kenjiro Cho

IIJ Research Laboratory

IMC’17, November 2, 2017

SLIDE 2

Hierarchical Heavy Hitters (HHHs)

  • identifying significant clusters across multiple planes
    • exploiting the underlying hierarchical IP address structure
  • e.g., (src, dst) address pairs
    • (1.2.3.4, *) → one-to-many: e.g., scanning
    • (*, 5.6.7.8) → many-to-one: e.g., DDoS
    • (1.2.3.0/24, 4.5.6.0/28) → subnet-to-subnet
  • can be extended to higher dimensions (e.g., the 5-tuple)
  • a powerful tool for traffic monitoring and anomaly detection


SLIDE 3

Unidimensional HHH

  • an HHH: an aggregate with count c ≥ φN
    • φ: threshold; N: total input (e.g., packets or bytes)
  • HHHs can be uniquely identified by a depth-first tree traversal,
    aggregating small nodes until the aggregate exceeds the threshold
    (a minimal sketch follows below)
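
A minimal sketch of this traversal in Python, assuming a simple binary prefix trie; the class, names, and threshold value are hypothetical illustrations, not code from the paper:

    PHI = 0.05  # threshold fraction φ (hypothetical value)

    class Node:
        def __init__(self, count=0, children=None):
            self.count = count              # traffic seen exactly at this prefix
            self.children = children or []  # child prefixes (at most 2 in a bin-trie)

    def hhh(node, total, result):
        """Depth-first traversal: aggregate small children into the parent
        and report HHHs. Returns the count rolled up to the parent."""
        rollup = node.count
        for child in node.children:
            rollup += hhh(child, total, result)  # non-HHH children aggregate here
        if rollup >= PHI * total:
            result.append((node, rollup))        # this aggregate is an HHH
            return 0                             # discounted: nothing rolls up
        return rollup                            # still small: pass upward

With result = [], calling hhh(root, N, result) visits every node once, so the unidimensional case is linear in the size of the trie.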


[Figure: example prefix tree over 10.1.1.4, 10.1.2.5, 10.1.1/24, 10.1.2/24, 10.1/16, 192.168.3/24, 192.168/16, and 0.0.0.0/0]

SLIDE 4

Multi-dimensional HHH

  • each node has multiple parents
  • many combinations for aggregation
    • much harder than the one-dimensional case
  • search space for 2-dimensional IPv4 addresses
    • 5×5 = 25 for bytewise aggregation (prefix lengths 0, 8, 16, 24, 32)
    • 33×33 = 1089 for bitwise aggregation (prefix lengths 0..32)


[Figure: lattice of IPv4 prefix-length pairs from (0,0) to (32,32), arranged by the sum of prefix lengths, with example aggregates for src 1.2.3.4, dst 5.6.7.8 such as [1.2.3.4/32, 5.6.7.8/32], [1.2.3.0/24, 5.6.7.0/24], and [1.2.3.4/32, 5.6.0.0/16]]

Lattice for IPv4 prefix length pair with 8-bit granularity

SLIDE 5

Challenges

  • performance
    • bitwise aggregation is costly
  • operational relevance
    • ordering: e.g., [32, *] vs. [16, 16]
    • broad and redundant aggregates (e.g., 128/4 and 128/2)
  • re-aggregation
    • useful for interactive analysis (zoom in/out)


SLIDE 6

Contributions

  • new efficient HHH algorithm for bitwise aggregation
    • matches operational needs, supports re-aggregation
  • open-source tool and open datasets
  • more broadly, transforming the existing hard problem into a tractable
    one by revisiting the commonly accepted definition


SLIDE 7

Various HHH definitions

  • discounted HHH ⬅ we also employ this
    • exclude descendant HHHs’ counts for concise outputs
      (a worked example follows after this list):

      c′ᵢ = Σⱼ c′ⱼ over { j ∈ child(i) | c′ⱼ < φN }

  • rollup rules: how to aggregate counts into the parent
    • overlap rule: allows double-counting to detect all possible HHHs
    • split rule: preserves counts ⬅ we use a simple first-found split rule
  • aggregation ordering
    • sum of prefix lengths ⬅ we’ll revisit this ordering
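
A hypothetical worked example of the discounted count (the numbers are illustrative, not from the paper): let φN = 10 and let node i have children with discounted counts 4, 3, and 12. The child with count 12 is itself an HHH (12 ≥ φN), so it is excluded from the sum:

    c′ᵢ = 4 + 3 = 7 < φN

so i itself is not an HHH, and under the split rule its 7 units roll up to i’s parent in turn.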


SLIDE 8

Previous algorithms

  • elaborate structures
    • cross-producting, grid-of-tries, rectangle search
  • theoretical analyses
    • streaming approximation algorithms w/ error bounds
  • all the existing methods are bottom-up
  • our algorithm: top-down, deterministic
    • no elaborate structure, no approximation, no parameters


SLIDE 9

HHH revisited

  • key idea: redefine child(i) to allow space partitioning
  • child(i): from bin-tree to quadtree (a sketch follows below)
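
A minimal sketch of the quadtree-style child(i) in Python, writing a 2-dimensional aggregate as a (prefix, length) pair per dimension; the function is a hypothetical illustration, not code from the paper:

    def quad_children(p0, l0, p1, l1):
        """Split the aggregate [p0/l0, p1/l1] into four disjoint children
        by extending the prefix of each dimension by one bit."""
        for b0 in (0, 1):          # next bit of the src prefix
            for b1 in (0, 1):      # next bit of the dst prefix
                yield ((p0 << 1) | b0, l0 + 1,
                       (p1 << 1) | b1, l1 + 1)

Unlike the bin-tree child(i), these four children partition the parent’s space, so each input flow falls under exactly one child; that disjointness is what enables the top-down space partitioning shown in the figure.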


[Figure: bottom-up aggregation vs. top-down space partitioning at [16, 16]; e.g., (1.2/16, 5.6/16) partitions into sub-flows such as (1.2.0/24, 5.6/16), (1.2.1/24, 5.6/16), (1.2/16, 5.6.0/24), ... at [24, 16], [16, 24], and [24, 24]]

SLIDE 10

Z-order [Morton1966]

  • a space-filling curve
    • by bit-interleaving (l0, l1)
    • prefers the largest value across dimensions
  • looks different from the standard Z-curve
    • the range [0..32] doesn’t fill a full 5-bit space (32 alone needs a 6th bit)
    • makes /32 higher in the order
    (a sketch of the bit-interleaving follows below)
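
A minimal Python sketch of the Morton bit-interleaving over a prefix-length pair, assuming 6 bits per dimension to cover the value 32; the helper name is hypothetical, not the agurim implementation:

    def morton_key(l0, l1, bits=6):
        """Interleave the bits of l0 and l1 (each in [0..32]) into one key,
        with l0's bits taking the more significant positions."""
        key = 0
        for i in reversed(range(bits)):          # from MSB down to LSB
            key = (key << 1) | ((l0 >> i) & 1)   # bit i of l0
            key = (key << 1) | ((l1 >> i) & 1)   # bit i of l1
        return key

Because 32 (0b100000) is the only value that uses the 6th bit, morton_key(32, 0) = 2048 exceeds morton_key(31, 31) = 1023, which is why /32 aggregates sort higher; and since l0’s bits come first, ties break toward the first dimension (the bias noted on slide 12).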


[Figure: Z-order over the prefix-length lattice from (0,0) to (32,32), divided into regions (I) upper sub-area, (II) right sub-area, (III) left sub-area, (IV) lower sub-area, (V) right bottom edge, and (VI) left bottom edge]

SLIDE 11

Recursive spatial partitioning

  • visit regions (VI) through (I) recursively
  • 2 bottom edges
    • (VI) left-bottom edge
    • (V) right-bottom edge
  • 4 quadrants
    • (IV) lower quadrant
    • (III) left quadrant
    • (II) right quadrant
    • (I) upper quadrant


[Figure: the prefix-length lattice from (0,0) to (32,32) partitioned into the six regions (I)-(VI)]

SLIDE 12

Recursive Lattice Search (RLS)

  • idea: recursively subdivide aggregates in Z-order (a simplified sketch follows below)
  • pros
    • recurses only into flows ≥ threshold
    • subdivision needs only the parent’s sub-flows
    • /32 becomes higher in the order
  • cons
    • bias toward the first dimension
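
A heavily simplified Python sketch of the RLS recursion, under stated assumptions: flows are (src, dst, count) tuples with 32-bit integer addresses, both dimensions are refined in lockstep (the edge regions (V)/(VI) and asymmetric aggregates such as [32, 16] are omitted), and all names are hypothetical:

    PHI = 0.05  # threshold fraction φ (hypothetical value)

    def bit(addr, i):
        """i-th most significant bit (0-indexed) of a 32-bit address."""
        return (addr >> (31 - i)) & 1

    def rls(flows, depth, total, result):
        """flows: (src, dst, count) tuples under the current aggregate;
        depth: current prefix length in both dimensions.
        Returns the flows not claimed by any descendant HHH."""
        count = sum(c for _, _, c in flows)
        if count < PHI * total:
            return flows                     # too small: aggregate upward
        if depth < 32:
            buckets = {}
            for f in flows:                  # subdivide only the parent's sub-flows
                key = (bit(f[0], depth), bit(f[1], depth))
                buckets.setdefault(key, []).append(f)
            rest = []
            for key in sorted(buckets, reverse=True):   # deepest aggregates first
                rest += rls(buckets[key], depth + 1, total, result)
            flows = rest                     # whatever no child HHH claimed
            count = sum(c for _, _, c in flows)
        if count >= PHI * total:             # discounted count still ≥ φN
            result.append((depth, count))    # a real tool would record the prefix pair
            return []                        # claimed by this HHH
        return flows

Both pros from the list above are visible here: the recursion descends only while the aggregate’s count is at least φN, and each subdivision touches only the parent’s own sub-flows.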


SLIDES 13-36

RLS Illustrated

10 inputs, HHH ≥ 2

[Figure: animation stepping through the RLS recursion over the lattice nodes (0,0), (16,0), (0,16), (16,16), (32,0), (0,32), (32,16), (16,32), (32,32), repeated with incremental highlights across slides 13-36]

4 HHHs extracted

SLIDE 37

Evaluation (in the paper)

  • ordering bias: (src, dst) vs. (dst, src) ➡ negligible
  • comparison with Space-Saving: to illustrate differences
    • outputs ➡ much more compact
      • differences due to the different definitions
    • speed ➡ 100 times faster for bitwise aggregation
      • but requires more memory (as a non-streaming algorithm)


SLIDE 38

Implementations: RLS in agurim

  • agurim: open-source tool
  • 2-level HHH
    • main attribute (src-dst addresses), sub-attribute (ports)
  • protocol-specific heuristics
    • change the depth of recursion using protocol knowledge to meet
      operational needs
  • online processing by exploiting multi-core CPUs


SLIDE 39

agurim Web UI


http://mawi.wide.ad.jp/~agurim/

SLIDE 40

Summary

  • Recursive Lattice Search algorithm for HHH
    • revisit the definition of HHH, apply Z-ordering
    • propose an efficient HHH algorithm
  • open-source tool and open datasets from 2013

http://mawi.wide.ad.jp/~agurim/about.html


SLIDE 41

Evaluation in detail

  • simulation: code from Space-Saving [Mitzenmacher2012]
    • quick hack to port agurim’s RLS
  • input: a MAWI packet trace from 2016-10-20
  • order sensitivity: (src,dst) vs. (dst,src)
    • very similar outputs: not sensitive to the order
  • comparison with SS (streaming algorithm, overlap rollup)
    • different definitions: just to illustrate the major differences
    • outputs: comparable, except for nodes in the upper lattice
    • performance: 100x faster for bitwise aggregation!


SLIDE 42

Order sensitivity: (src,dst) vs. (dst,src)

  • (1)-(12): identical
  • (13)-(15): minor differences
  • not sensitive to the src-dst order


Aggregated by (src,dst):

  region  no    src               dst                c′/N(%)
  VI      (1)   112.31.100.1/32   163.229.97.230/32    16.5
  V       (2)   64.0.0.0/2        202.203.3.13/32       5.2
          (3)   128.0.0.0/1       202.203.3.13/32       5.8
          (4)   *                 202.26.162.46/32      6.0
  III     (5)   163.229.96.0/23   *                     5.0
          (6)   203.179.128.0/20  *                     6.8
  II      (7)   *                 202.203.3.0/24        5.9
          (8)   *                 203.179.140.0/23      5.7
          (9)   *                 163.229.128.0/17      5.1
  I       (10)  0.0.0.0/1         202.192.0.0/12        5.3
          (11)  202.192.0.0/12    *                     6.7
          (12)  *                 202.0.0.0/7           7.6
          (13)  128.0.0.0/4       *                     5.0
          (14)  128.0.0.0/2       *                     6.0
          (15)  *                 128.0.0.0/2           5.4
          (*)   *                 *                     2.0
                                  total               100.0

Aggregated by (dst,src): rows (1)-(12) identical to (src,dst)

  region  no    src               dst                c′/N(%)
  I       (13)  128.0.0.0/2       0.0.0.0/2             5.7
          (14)  *                 128.0.0.0/3           5.3
          (15)  128.0.0.0/1       *                     6.4
          (*)   *                 *                     1.0

SLIDE 43

HHHs reported by RLS vs. SS

  • number of HHHs: RLS: 15, SS: 52
  • missing HHHs: not informative
    • double-counting / short prefix lengths
    • 40 missing HHHs: 35 in (I), 4 in (II), 1 in (III)
  • RLS: a concise and compact summary


  no    RLS(%)  SS(%)  missing SS HHHs with their c′/N(%)
  (1)    16.5   16.5
  (2)     5.2    5.2
  (3)     5.8    5.8
  (4)     6.0    6.0
  (5)     5.0    5.0
  (6)     6.8    6.8
  (7)     5.9   16.9
  (8)     5.7    5.7
  (9)     5.1    5.1
  (10)    5.3          (96/3,202.203/16):5.4 (0/2,202.203/16):5.6
                       (112/4,202.192/12):5.2 (64/2,202.192/12):9.0
  (11)    6.7    6.7
  (12)    7.6          (0/1,203.179.128/20):6.0 (128/2,202.203/16):5.5
                       (192/4,202/8):5.1 (*,202.192/12):25.5
                       (16/4,202/7):5.4 (128/1,202.128/9):10.6
                       (64/2,202/7):15.5 (128/1,202/7):17.7
  (13)    5.0    5.2
  (14)    6.0          (163.229/16,0/1):6.0 (144/4,128/1):5.3
                       (128/2,96/3):5.0 (128/3,0/1):5.3
                       (160/3,128/1):7.0 (128/2,0/2):5.7 (128/2,0/1):11.4
  (15)    5.4   33.1   (128/1,160/6):5.0 (192/4,128/2):5.2
                       (0/1,128/2):22.7 (*,128/3):7.1
  (*)     2.0          (202/7,0/2):5.4 (192/8,128/1):5.6 (202/8,0/1):5.7
                       (202/7,128/1):6.0 (192/3,200/5):10.5
                       (128/1,112/6):5.1 (112/5,128/1):21.8 (200/5,*):17.0
                       (192/4,128/1):13.6 (128/1,16/4):6.2 (*,200/5):42.4
                       (64/3,128/1):6.0 (96/3,128/1):29.7 (128/1,64/2):10.4
                       (0/1,128/1):46.7 (128/1,*):53.3 (*,128/1):78.3

(The companion table aggregated by (src,dst) is the same as on slide 42.)

SLIDE 44

CPU time: RLS vs. SS

  • RLS: lower cost for finer granularity
    • 100+ times faster for bitwise aggregation!


[Figure: CPU time (sec) vs. input N (million packets), log-log scale, for RLS 5x5, RLS 33x33, SS 5x5, and SS 33x33]

SLIDE 45

Memory usage: RLS vs. SS

  • RLS: proportional to inputs (ok for modern PCs)
  • SS: fixed memory usage


[Figure: memory usage (MB) vs. input N (million packets), log-log scale, for RLS 5x5, RLS 33x33, SS 5x5, and SS 33x33]