SLIDE 1

Decentralized Evaluation of Regular Expressions for Capability Discovery in Peer-to-Peer Networks

Maximilian Szengel
Advisors: C. Grothoff, R. Holz, H. Niedermayer, B. Polot
Master’s thesis

Chair for Network Architectures and Services
Technische Universität München

21 Nov 2012

Maximilian Szengel (TUM) Decentralized Evaluation of Regular Expressions

SLIDE 3

Motivation

Searching in DHT-based Peer-to-Peer Networks
  • Distributed key/value storage, typically hashes for keys
  • Range queries (PastryStrings, PHT)
  • Pattern matching (Cubit, DPMS)
  • Similarity queries (Karnstedt et al.)
  • Our approach: regular expressions

SLIDE 6

Motivation

Capability Discovery in Peer-to-Peer Networks

  • Offering Exit Node for: 192.0.3.0/24 TCP and UDP
  • Offering Exit Node for: 2001:0db8::0370:FCA0:7334/64 UDP
  • Offering Exit Node for: 192.0.4.0/24 TCP
  • Searching Exit Node for: 192.0.3.123 TCP

[Figure: offering and searching peers connected through the Distributed Hash Table]
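The matching in this picture can be sketched with prefix regular expressions. The snippet is a minimal illustration, not the GNUnet implementation: it assumes an offered IPv4 range is written as its network bits followed by (0|1)*, the form used in the merging examples later in the deck, and it leaves out the protocol prefix and IPv6. The helper names offer_regex and search_string are hypothetical.

```python
import ipaddress
import re

def offer_regex(cidr: str) -> str:
    """Network bits of an IPv4 range, followed by (0|1)* for the host bits."""
    net = ipaddress.ip_network(cidr)
    bits = format(int(net.network_address), "032b")
    return bits[: net.prefixlen] + "(0|1)*"

def search_string(address: str) -> str:
    """All 32 bits of a single IPv4 address."""
    return format(int(ipaddress.ip_address(address)), "032b")

offers = ["192.0.3.0/24", "192.0.4.0/24"]
wanted = search_string("192.0.3.123")

# An offer matches when its regular expression accepts the whole search string.
matching = [o for o in offers if re.fullmatch(offer_regex(o), wanted)]
print(matching)
```

Here the local re module stands in for the distributed evaluation; it only shows which offers the search string should reach.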

SLIDE 9

Approach: Idea

1. Offerer creates regular expression describing service
2. Regular expression is converted to a DFA
3. DFA is stored in the DHT
4. Patron matches using a string

[Figure: the offerer PUTs its DFA into the DHT; the patron issues GETs with its search string against the NFA stored in the DHT]
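The four steps can be sketched with a dictionary standing in for the DHT. This is an illustration under assumptions: the DFA for (ab|cd)e*f is written by hand instead of being derived from the regular expression, state names are used directly as keys, and put, get, offer and search are hypothetical names rather than GNUnet APIs.

```python
# In-memory stand-in for the DHT: key -> (accepting?, {symbol: next key}).
dht = {}

def put(key, accepting, edges):
    dht[key] = (accepting, edges)

def get(key):
    return dht.get(key)

def offer():
    """Offerer side: PUT every state of a hand-built DFA for (ab|cd)e*f."""
    put("q0", False, {"a": "a", "c": "c"})
    put("a", False, {"b": "(ab|cd)e*"})
    put("c", False, {"d": "(ab|cd)e*"})
    put("(ab|cd)e*", False, {"e": "(ab|cd)e*", "f": "(ab|cd)e*f"})
    put("(ab|cd)e*f", True, {})

def search(string, start="q0"):
    """Patron side: follow one transition per character.

    Returns (matched, number of GETs issued).
    """
    entry, gets = get(start), 1
    for symbol in string:
        if entry is None or symbol not in entry[1]:
            return False, gets
        entry, gets = get(entry[1][symbol]), gets + 1
    return entry is not None and entry[0], gets

offer()
print(search("abeef"))
print(search("abd"))
```

In this sketch a successful search issues one GET per character plus one for the start state, so the number of lookups grows linearly with the length of the search string.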

SLIDE 10

Problem: Mapping of States to Keys

Regular expression (ab|cd)e∗f and corresponding DFA

[Figure: DFA with states q0, a, c, (ab|cd)e*, (ab|cd)e*f and transitions labeled a, b, c, d, e, f]

A regular expression is assigned to each state as its identifier. The hash of the identifier is used as the key for DHT PUT.
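A small sketch of this key derivation follows. The hash function is an assumption made for illustration (SHA-512 from Python's hashlib); the slides only say that the hash of the identifier is used. The name state_key is hypothetical.

```python
import hashlib

def state_key(identifier: str) -> str:
    """DHT key for a state: hash of its regular-expression identifier."""
    return hashlib.sha512(identifier.encode("utf-8")).hexdigest()

# State identifiers of the DFA for (ab|cd)e*f.
identifiers = ["a", "c", "(ab|cd)e*", "(ab|cd)e*f"]
keys = {ident: state_key(ident) for ident in identifiers}

for ident, key in keys.items():
    print(f'h("{ident}") = {key[:16]}...')
```

Because the key depends only on the identifier, two offerers whose DFAs contain a state with the same identifier will PUT under the same key.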

SLIDE 11

Problem: Mapping of States to Keys

Regular expression (ab|cd)e∗f and corresponding DFA

[Figure: the same DFA, with its states stored in the DHT under the keys h("a"), h("c"), h("(ab|cd)e*"), h("(ab|cd)e*f")]

SLIDE 12

Problem: Merging of DFAs

Regular expressions (ab|cd)e∗f and (ab|cd)e∗fg∗ with corresponding DFAs

[Figure: two DFAs. The first has states q0, a, c, (ab|cd)e*, (ab|cd)e*f; the second has states q0, a, c, (ab|cd)e*, (ab|cd)e*fg* and an additional transition labeled g]

SLIDE 13

Problem: Merging of DFAs

Merged NFA for regular expressions (ab|cd)e∗fg∗ and (ab|cd)e∗f

[Figure: merged NFA with states q0, a, c, (ab|cd)e*, (ab|cd)e*f, (ab|cd)e*fg*; the state (ab|cd)e* has two outgoing transitions labeled f]
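The merge can be sketched as a union of transitions keyed by state identifier. This is a simplified model under assumptions: both DFAs are written by hand, accepting flags are omitted, and merge and nondeterminism are hypothetical helper names.

```python
from collections import defaultdict

# Hand-built DFAs; states are named by their regular-expression identifiers.
dfa_f = {
    "q0": {"a": "a", "c": "c"},
    "a": {"b": "(ab|cd)e*"},
    "c": {"d": "(ab|cd)e*"},
    "(ab|cd)e*": {"e": "(ab|cd)e*", "f": "(ab|cd)e*f"},
}
dfa_fg = {
    "q0": {"a": "a", "c": "c"},
    "a": {"b": "(ab|cd)e*"},
    "c": {"d": "(ab|cd)e*"},
    "(ab|cd)e*": {"e": "(ab|cd)e*", "f": "(ab|cd)e*fg*"},
    "(ab|cd)e*fg*": {"g": "(ab|cd)e*fg*"},
}

def merge(*dfas):
    """Union the transitions of all DFAs, keyed by state identifier."""
    nfa = defaultdict(lambda: defaultdict(set))
    for dfa in dfas:
        for state, edges in dfa.items():
            for symbol, target in edges.items():
                nfa[state][symbol].add(target)
    return nfa

def nondeterminism(nfa):
    """Largest number of targets reachable from one state on one symbol."""
    return max(len(t) for edges in nfa.values() for t in edges.values())

merged = merge(dfa_f, dfa_fg)
print(sorted(merged["(ab|cd)e*"]["f"]))
print(nondeterminism(merged))
```

States that share an identifier collapse into one entry, and the two different targets for f are what turns the merged automaton into an NFA.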

SLIDE 14

Problem: Decentralizing the Start State

Regular expression: abc∗defg∗h and k = 4.

[Figure: DFA for abc*defg*h with states q0, abc*, abc*defg*, abc*defg*h; with k = 4 the automaton is entered through the prefixes abcc, abcd and abde]
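One reading of this figure is that the single start state is replaced by one entry per prefix of length k, so a patron can begin its search from the first k characters of its string. The sketch below enumerates those prefixes for a hand-built, character-level DFA of abc*defg*h. The state numbering and the function name start_keys are assumptions, and accepted strings shorter than k are not handled.

```python
# Character-level DFA for abc*defg*h; state 6 is accepting.
dfa = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"c": 2, "d": 3},
    3: {"e": 4},
    4: {"f": 5},
    5: {"g": 5, "h": 6},
}

def start_keys(trans, start, k):
    """Map every length-k prefix readable from `start` to the state it reaches."""
    frontier = {"": start}
    for _ in range(k):
        longer = {}
        for prefix, state in frontier.items():
            for symbol, target in trans.get(state, {}).items():
                longer[prefix + symbol] = target
        frontier = longer
    return frontier

print(start_keys(dfa, 0, 4))
```

For k = 4 this yields the three prefixes abcc, abcd and abde that appear in the figure.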

SLIDE 16

Problem: Optimization (Path compression)

Compressing linear paths in the DFA. Example for abc(d∗|e)fgh.

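[Figure: uncompressed DFA with states 1 to 8 and single-character transitions a, b, c, d, e, f, g, h; compressed DFA with transitions labeled abcd, abcef, abcf, d, f, gh]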
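A simplified variant of path compression is sketched below. It only collapses states that have exactly one incoming and one outgoing transition, and it caps each label at max_len characters. The compressed figure on the slide goes further and also folds the abc prefix into the branch labels, which this sketch does not attempt. The state numbering and the function name compress are assumptions.

```python
from collections import Counter

# Character-level DFA for abc(d*|e)fgh; state 8 is accepting.
dfa = {
    0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3},
    3: {"d": 4, "e": 5, "f": 6},
    4: {"d": 4, "f": 6},
    5: {"f": 6},
    6: {"g": 7}, 7: {"h": 8},
}

def compress(trans, start, accepting, max_len):
    """Collapse linear paths into multi-character transitions."""
    indegree = Counter(t for edges in trans.values() for t in edges.values())

    def is_chain(state):
        # A state that can be skipped: one way in, one way out, no special role.
        return (state != start and state not in accepting
                and indegree[state] == 1 and len(trans.get(state, {})) == 1)

    compressed, todo, seen = {}, [start], {start}
    while todo:
        state = todo.pop()
        compressed[state] = {}
        for label, target in trans.get(state, {}).items():
            while is_chain(target) and len(label) < max_len:
                (symbol, following), = trans[target].items()
                label, target = label + symbol, following
            compressed[state][label] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return compressed

print(compress(dfa, 0, {8}, max_len=16))
print(compress(dfa, 0, {8}, max_len=2))
```

With max_len=1 nothing is collapsed, so the parameter plays the role of a maximum path compression length.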

SLIDE 17

Problem: Path compression length

Merging of DFAs with path compression

GNUNET-VPN00010000-V4TCP110000000000000000000010(0|1)*
GNUNET-VPN00010000-V4TCP110000000000000000000011(0|1)*
GNUNET-VPN00010000-V4TCP110000010000000000000011(0|1)*

[Figure: merged DFA with single-character transitions G, N, U, ... through states 1, 2, 3, ..., 27, 28, 29]

SLIDE 18

Problem: Path compression length

Merging of DFAs with maximal path compression

[Figure: merged DFA with maximal path compression. Transition labels: GNUNET-VPN00010000-V4TCP, 110000000000000000000010, 110000000000000000000011, 110000010000000000000011]

SLIDE 19

Problem: Path compression length

Merging of DFAs with limited path compression length

[Figure: merged DFA with limited path compression length. Transition labels: GNUNET-VPN00010000-V4TCP, 11000000, 11000001, 00000000, 00000010, 00000011]

SLIDE 20

Evaluation

  • Implementation in GNUnet
  • Profiling of Internet-scale routing using regular expressions to describe AS address ranges
  • CAIDA AS data set: Real AS data

SLIDE 21

Evaluation

[Figure: ASs and their address ranges, arranged around the Distributed Hash Table]

AS 12816: 129.187.0.0/16, 131.159.0.0/16, 138.244.0.0/15, 138.246.0.0/16, ..., 192.68.211.0/24, 192.68.212.0/22
AS 10001: 49.128.128.0/19, 61.195.240.0/20, 122.49.192.0/21, 123.255.240.0/21, 175.41.32.0/21, 202.75.112.0/20, 202.238.32.0/20, 210.48.128.0/21, 211.133.224.0/20, 219.124.0.0/20, 219.124.0.0/21, 219.124.8.0/21
AS 56357: 188.95.232.0/22, 192.48.107.0/24
AS 8265: 91.223.12.0/24, 195.96.192.0/19, 195.96.192.0/24, 195.96.193.0/24, 195.96.194.0/23, 195.96.196.0/22, 195.96.200.0/22, 195.96.204.0/22, 195.96.208.0/21, 195.96.216.0/21
AS 50038: 57.236.47.0/24, 57.236.48.0/24, 57.236.51.0/24, 193.104.87.0/24
AS 825: 91.221.132.0/24, 91.221.133.0/24, 192.16.240.0/20
AS 32310: 204.94.175.0/24
AS 931: 46.183.152.0/21, 103.10.233.0/24, 186.233.120.0/21, 186.233.120.0/22, 186.233.124.0/22
AS 12812: 193.188.128.0/24, 193.188.129.0/24, 193.188.130.0/24, 193.188.131.0/24
AS 7212: 129.59.0.0/16, 160.129.0.0/16, 192.111.108.0/24, 192.111.109.0/24, 192.111.110.0/24, 199.78.112.0/24, 199.78.113.0/24, 199.78.114.0/24, 199.78.115.0/24
AS 10002: 61.114.64.0/20, 61.195.128.0/20, 120.50.224.0/19, 120.72.0.0/20, 202.180.192.0/20
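One AS with several address ranges can be described by a single regular expression that alternates over the ranges. The sketch below assumes the shape of the appendix example: a literal tag followed by a group of bit prefixes, each ending in (0|1)*. The tag GNVPN-0001-PAD is copied from that example without knowing its exact meaning, and as_regex is a hypothetical helper limited to IPv4.

```python
import ipaddress

def as_regex(prefixes, tag="GNVPN-0001-PAD"):
    """Alternation of bit prefixes, one per announced IPv4 range."""
    parts = []
    for cidr in prefixes:
        net = ipaddress.ip_network(cidr)
        # Keep only the network bits; the host bits are covered by (0|1)*.
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        parts.append(bits + "(0|1)*")
    return tag + "(" + "|".join(parts) + ")"

# The two ranges listed for AS 56357.
print(as_regex(["188.95.232.0/22", "192.48.107.0/24"]))
```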

SLIDE 22

Evaluation: Results of Simulation (1)

Number of transitions and states in the merged NFA

[Figure: number of transitions / states (y-axis, 400,000 to 2,000,000) against maximum path compression length (x-axis: no compression, 2, 4, 6, 8, 16); series: transitions, states]

Dataset: all 40,696 ASs

SLIDE 23

Evaluation: Results of Simulation (2)

Degree of non-determinism at states in the merged NFA

[Figure: number of states (y-axis, log scale, 1 to 1e+07) against degree of non-determinism (x-axis: 1, 2, 3); series: max path length 1, 2, 4, 6, 8, 16]

Dataset: all 40,696 ASs

SLIDE 24

Evaluation: Results of Simulation (3)

[Figure: % of states with out-degree >= k (y-axis, log scale, 1e-06 to 1) against out-degree (x-axis, log scale, 1 to 100000); one series per setting:]

  • max. path compression length 6
  • max. path compression length 8
  • max. path compression length 16

Dataset: all 40,696 ASs

SLIDE 25

Evaluation: Results of Emulation (1)

Search duration for five runs with 500 randomly connected peers, 500 regular expressions and 500 search strings.

[Figure: number of strings matched (y-axis, 50 to 500) against search duration in s (x-axis, 20 to 100)]

SLIDE 26

Summary and Future Work

Achievements
  • Capability discovery in DHT-based P2P networks using regular expressions
  • Linear latency in the length of the search string
  • Suitable for applications that can tolerate moderate latency

Future Work
  • Use regular expression search in new applications
  • Open problem: searching using a regular expression
  • Ultra-large scale profiling (SuperMUC)

SLIDE 27

Thank you!

SLIDE 28

Appendix

SLIDE 29

Evaluation: Results of Emulation (2)

Emulation with subset of the CAIDA AS data

Average per peer              100 peers    500 peers     1,000 peers
PUT messages                  561 kB       1,231 kB      1,166 kB
GET messages                  291 kB       364 kB        430 kB
RESULT messages               20,657 kB    762,485 kB    397,840 kB
Number of states visited      9            10            9
Path length to accept state   5            4             5
Search duration               15 s         30 s          46 s

SLIDE 30

Example regular expression for AS 14259

GNVPN-0001-PAD(000000000111010011000002(0|1)*|10011100011000010010(0|1)*|101010001110011111001001(0|1)* |10111110000010000100(0|1)*|101111100000100001000010(0|1)*|101111100000100001000011(0|1)*|101111100000100001000100(0|1)* |101111100000100001000101(0|1)*|101111100000100001000111(0|1)*|101111100000100001001000(0|1)*|101111100000100001001001(0|1)* |101111100000100001001010(0|1)*|101111100000100001001011(0|1)*|101111100000100001001100(0|1)*|101111100000100001001101(0|1)* |101111100000100001001110(0|1)*|10111110000010000101(0|1)*|101111100000100001010000(0|1)*|101111100000100001010001(0|1)* |101111100000100001010011(0|1)*|101111100000100001010100(0|1)*|101111100000100001010101(0|1)*|101111100000100001010110(0|1)* |101111100000100001010111(0|1)*|101111100000100001011000(0|1)*|101111100000100001011001(0|1)*|101111100000100001011010(0|1)* |101111100000100001011011(0|1)*|101111100000100001011100(0|1)*|101111100000100001011101(0|1)*|101111100000100001011110(0|1)* |101111100000100001011111(0|1)*|10111110000010000110(0|1)*|101111100000100001100000(0|1)*|101111100000100001100001(0|1)* |101111100000100001100010(0|1)*|101111100000100001100011(0|1)*|101111100000100001100100(0|1)*|101111100000100001100110(0|1)* |101111100000100001100111(0|1)*|101111100000100001101000(0|1)*|101111100000100001101001(0|1)*|101111100000100001101010(0|1)* |101111100000100001101011(0|1)*|101111100000100001101100(0|1)*|101111100000100001101101(0|1)*|10111110000010000111(0|1)* |101111100000100001110000(0|1)*|101111100000100001110001(0|1)*|101111100000100001110010(0|1)*|101111100000100001110100(0|1)* |101111100000100001110101(0|1)*|101111100000100001110110(0|1)*|101111100000100001111000(0|1)*|101111100000100001111001(0|1)* |101111100000100001111010(0|1)*|101111100000100001111011(0|1)*|101111100000100001111100(0|1)*|101111100000100001111101(0|1)* |101111100000100001111110(0|1)*|101111100000100001111111(0|1)*|1011111001100000000(0|1)*|101111100110000000001010(0|1)* 
|101111100110000000001011(0|1)*|1011111001100000000011(0|1)*|101111100110000000010001(0|1)*|101111100110000000010011(0|1)* |101111100110000000010100(0|1)*|101111100110000000010101(0|1)*|10111110011000000001011(0|1)*|101111100110000000011000(0|1)* |101111100110000000011001(0|1)*|101111100110000000011010(0|1)*|101111100110000000011011(0|1)*|101111100110000000011100(0|1)* |101111100110000000011101(0|1)*|101111100110000000011110(0|1)*|101111100110000000011111(0|1)*|1011111001100000001(0|1)* |1011111001100000001000(0|1)*|101111100110000000100100(0|1)*|101111100110000000100101(0|1)*|10111110011000000010011(0|1)* |101111100110000000100110(0|1)*|101111100110000000100111(0|1)*|1011111001100000001010(0|1)*|101111100110000000101100(0|1)* |101111100110000000101101(0|1)*|101111100110000000101110(0|1)*|101111100110000000110(0|1)*|101111100110000000111(0|1)* |1011111001100000010(0|1)*|101111100110000001010101(0|1)*|101111100110000001010110(0|1)*|101111100110000001011000(0|1)* |101111100110000001011111(0|1)*|101111100110001011(0|1)*|101111100110001011010011(0|1)*|101111100110001011100010(0|1)* |10111110011000101110101(0|1)*|10111110011000101110111(0|1)*|101111100110001011110100(0|1)*|101111100110001011110110(0|1)* |101111100110001011111010(0|1)*|101111100110001011111011(0|1)*|101111100110001011111100(0|1)*|101111100110001011111101(0|1)* |101111100110001011111110(0|1)*|1011111001101011101100(0|1)*|10111110100110011(0|1)*|101111101001100111010111(0|1)* |101111101001100111100101(0|1)*|101111101001100111111000(0|1)*|10111110110001000(0|1)*|101111101100010000(0|1)* ... |11011100110(0|1)*|110010011110111011100111(0|1)*|110010011110111011110001(0|1)*|110010011110111011110011(0|1)* |110010011110111011110101(0|1)*|110010011110111011110110(0|1)*|110010011110111011111011(0|1)*|110010011110111011111100(0|1)* |110010011110111011111111(0|1)*)

SLIDE 31

Example DFA for regular expression AS 14259

[Figure: path-compressed DFA for AS 14259. The first transitions are labeled GNVPN-0001-PAD0000000001, GNVPN-0001-PAD1001110001, GNVPN-0001-PAD1010100011, GNVPN-0001-PAD1011111000, GNVPN-0001-PAD1011111001, GNVPN-0001-PAD1011111010, GNVPN-0001-PAD1011111011, GNVPN-0001-PAD1100000011, GNVPN-0001-PAD1100100000, GNVPN-0001-PAD1100100001, GNVPN-0001-PAD1100100111; later transitions carry short bit strings such as 1101, 1000, 0010]
