Analysis and modeling of the KAD P2P network (Bachelor thesis)



SLIDE 1

Lehrstuhl für Netzarchitekturen und Netzdienste

Analysis and modeling of the KAD P2P network

Bachelor thesis summary presentation
Maximilian Sievert

Lehrstuhl für Netzarchitekturen und Netzdienste
Institut für Informatik
Technische Universität München

May 29, 2013

M.Sievert (TU München) Analysis and modeling of the KAD P2P network 1

SLIDE 2

Outline

Introduction and context
Crawling framework and conducted crawls
Evaluation
Conclusion and future research

SLIDE 3

Part I Introduction and context

SLIDE 4

Motivation and goals

P2P network simulators
  Used to analyze interaction between P2P overlay and IP underlay
  Behavior of P2P nodes
  Geographic distribution of nodes, AS
Analysis of KAD (aMule/eMule) to determine metrics
  PlanetLab vantage points
Reasons for KAD
  one of the largest active P2P networks
  simple, open source protocol

SLIDE 5

Kademlia / Kad

Kademlia P2P distributed hash table (DHT)
Kademlia: structured P2P network, XOR distance metric
Routing table: unbalanced binary tree of k-buckets
Protocol changes in Kad (eMule, aMule):
  128-bit MD4 key/node IDs instead of 160-bit
  2 protocol versions: packet compression above a certain size
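
The XOR distance metric named above can be illustrated with a minimal Python sketch. It treats each 128-bit ID as a plain unsigned integer, which is an assumption made for illustration; the function names are hypothetical and are not taken from eMule, aMule, or the thesis.

```python
# Sketch of the Kademlia XOR metric on 128-bit Kad IDs (illustrative names).
ID_BITS = 128  # Kad uses 128-bit IDs, per the slide


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs interpreted as unsigned integers."""
    return a ^ b


def shared_prefix_len(a: int, b: int) -> int:
    """Number of leading bits two IDs have in common; more shared bits means closer."""
    return ID_BITS - xor_distance(a, b).bit_length()


if __name__ == "__main__":
    a = int("09262ce48db41838ce94c80cdaab3fab", 16)
    b = int("025e747cea687ccab41c95fa62a27a5d", 16)
    print(hex(xor_distance(a, b)), shared_prefix_len(a, b))
```

The shared prefix length is what decides which k-bucket of the routing tree a contact would fall into: the longer the common prefix, the deeper the bucket.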

SLIDE 6

Related work

Steiner 2008:
  Blizzard crawler: [IP, TCP, UDP, ID] mapping snapshot
  daily full crawls for a year, zone crawls every 5 minutes for 6 months
Jie Yu et al. 2009, 'ID Repetition in Kad': similar crawler
  ID reuse
  port aliasing
  non-persistent IDs
  silent peers

SLIDE 7

Part II Crawling framework

SLIDE 8

Crawling process

Adaptation of Steiner's crawler Blizzard. Own additions:
  Protocol version 2 decompression
  Throttle delay parameter
Data structures:
  Queue U of discovered yet uncontacted nodes
  Hashset D of all discovered nodes (stores IP, UDP port)
Process:
  Initialize U with an initial set of starting peers
  Sender-thread loop, receiver-thread loop
Abort conditions: U empty for a while, timeout, network issue
Output: binary dump of sent requests / received responses, text log
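
The interplay of the queue U and the hashset D can be sketched in a few lines of Python. This is a simplified, single-threaded illustration and not the Blizzard crawler itself: the real crawler uses separate sender and receiver threads and UDP requests, whereas here a hypothetical `query` callback stands in for the network, and `max_queries` is an assumed stand-in for the timeout abort condition.

```python
from collections import deque
from typing import Callable, Iterable

Endpoint = tuple[str, int]  # (IP, UDP port), as stored in D on the slide


def crawl(seeds: Iterable[Endpoint],
          query: Callable[[Endpoint], list[Endpoint]],
          max_queries: int = 1_000_000) -> set[Endpoint]:
    """Breadth-first discovery: U holds uncontacted nodes, D all discovered ones."""
    U: deque[Endpoint] = deque()
    D: set[Endpoint] = set()
    for seed in seeds:
        if seed not in D:
            D.add(seed)
            U.append(seed)
    sent = 0
    while U and sent < max_queries:   # abort when U is empty or the budget is used up
        node = U.popleft()
        sent += 1
        for contact in query(node):   # an unresponsive node yields no contacts
            if contact not in D:      # only new nodes are queued for contact
                D.add(contact)
                U.append(contact)
    return D


if __name__ == "__main__":
    # Toy overlay standing in for real Kad responses.
    overlay = {
        ("10.0.0.1", 4672): [("10.0.0.2", 4672), ("10.0.0.3", 4672)],
        ("10.0.0.2", 4672): [("10.0.0.1", 4672), ("10.0.0.4", 4672)],
    }
    print(len(crawl([("10.0.0.1", 4672)], lambda n: overlay.get(n, []))))
```

Because D is checked before a node enters U, every endpoint is contacted at most once per crawl.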

SLIDE 9

Crawling parameters

Limitations:
  Bandwidth
  Firewall/rate restrictions
Parameters:
  Request type: number of contacts, 1-31
  Request burst size: under 10 to avoid remote spam block
  Request throttling: limit on nodes queried per second
  Zone filter: restrict 'valid' nodes to a specific ID prefix
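
Two of these parameters, the zone filter and request throttling, lend themselves to a short Python sketch. The helper names and signatures are hypothetical, and IDs are again assumed to be plain 128-bit integers; the actual crawler's option handling is not shown in the slides.

```python
import time

ID_BITS = 128  # Kad ID length


def in_zone(node_id: int, prefix: int, prefix_bits: int) -> bool:
    """True if the top `prefix_bits` bits of a 128-bit ID equal `prefix`."""
    if not 0 <= prefix_bits <= ID_BITS:
        raise ValueError("prefix_bits out of range")
    return (node_id >> (ID_BITS - prefix_bits)) == prefix


def throttled(items, delay_s: float):
    """Yield items one at a time, sleeping `delay_s` seconds between them."""
    first = True
    for item in items:
        if not first:
            time.sleep(delay_s)
        first = False
        yield item


if __name__ == "__main__":
    ids = [int("fc" * 16, 16), int("09262ce48db41838ce94c80cdaab3fab", 16)]
    # Keep only nodes in the 8-bit zone 0xfc, pausing 3 ms between queries.
    for node_id in throttled([i for i in ids if in_zone(i, 0xFC, 8)], 0.003):
        print(f"{node_id:032x}")
```

A longer prefix narrows the zone and shrinks the crawl, which is what makes frequent zone crawls feasible compared with full crawls.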

SLIDE 10

Conducted crawls

TU Munich (net.in.tum.de) Early crawls:

 #  Start           Duration  Discovered  Queried  Responsive
 1  22/03/11 15:24  00:59:50   2.685.010      63%         26%
 2  22/03/11 21:54  00:59:56   1.950.599      65%         29%
 3  22/03/11 22:55  00:51:20   1.920.401      64%         29%
 4  05/04/11 16:27  00:59:53   3.070.211      64%         24%
 5  06/04/11 13:24  00:46:46   2.816.465      55%         29%
 6  13/04/11 01:38  00:56:56   1.960.204     100%         21%
 7  14/04/11 20:31  01:14:24   2.334.305      69%         32%
 8  18/04/11 18:11  01:59:57   2.735.059      81%         19%
 9  19/04/11 04:47  01:53:29   2.229.108     100%         16%
10  19/04/11 23:42  01:35:16   1.853.972     100%         20%
11  20/04/11 09:05  01:59:58   2.421.390      90%         15%
12  20/04/11 18:07  01:59:59   2.596.452      81%         18%
13  20/04/11 20:26  01:53:58   2.153.941     100%         20%

Adaptation and addition of features and parameters to crawler.

SLIDE 11

Conducted crawls

TU Munich (net.in.tum.de)

 #  Start           Duration  P         Discovered  Queried  Responsive
14  09/05/11 12:49  01:59:56  3 ms b8    2.733.412      77%         19%
15  10/05/11 07:34  01:39:59  3 ms       2.388.565      76%         19%
16  10/05/11 14:52  01:59:59  3 ms       2.939.943      68%         21%
17  10/05/11 19:25  02:23:21  3 ms       2.517.320     100%         20%
18  10/05/11 22:15  01:50:54  3 ms       2.033.205     100%         23%
19  18/05/11 09:50  01:59:57  2 ms       2.523.813      99%         16%
20  18/05/11 12:24  01:59:57  2 ms       2.637.730      90%         16%
21  19/05/11 10:23  02:08:34  2 ms       2.615.946     100%         16%
22  19/05/11 13:46  02:30:18  2 ms       3.002.986     100%         16%
23  19/05/11 17:24  07:18:46  2 ms       3.589.031     100%         17%
24  23/05/11 01:28  02:38:23  5 ms -n    2.229.040     100%         20%
25  26/05/11 13:11  03:54:46  10 ms -m   2.957.286      70%         20%
26  30/05/11 15:12  03:35:54  4 ms -m    2.771.051      63%         21%
27  30/05/11 20:09  03:23:33  4 ms -m    2.214.414     100%         22%
28  20/06/11 07:28  01:09:37  4 ms r7    2.359.295      37%         43%
29  20/06/11 08:52  07:22:31  4 ms r7    5.787.889     100%         18%
30  22/06/11 12:00  03:09:35  4 ms r7    3.821.829     100%         17%

SLIDE 12

Conducted crawls

PlanetLab nodes as global vantage points:

 #  Start           Duration  P         Discovered  Queried  Responsive
China (planetlab-1.sjtu.edu.cn)
31  19/05/11 13:33  07:31:44  10 ms      3.682.385     100%         15%
32  22/05/11 13:08  08:06:00  10 ms      3.875.197     100%         15%
33  20/06/11 04:27  03:09:08  10 ms r7   2.795.254      49%         28%
Brazil (plab2.larc.usp.br)
34  19/05/11 13:38  08:08:17  10 ms      4.059.290     100%         17%
35  22/05/11 23:56  02:29:54  10 ms      2.259.123     100%         21%
US: Denver (linux2.cs.du.edu)
36  20/06/11 00:50  05:12:27  10 ms      3.895.919      58%         25%
37  20/06/11 06:12  04:22:08  10 ms      3.818.514      54%         29%
38  22/06/11 04:03  09:59:58  20 ms      4.034.685      73%         19%
US: California (planet4.cs.ucsb.edu)
39  19/05/11 14:11  00:26:25  10 ms      1.934.332      12%         61%
US: Mass. (planetlab2.cs.umass.edu)
40  20/05/11 19:15  05:16:11  5 ms       2.300.916     100%         16%
Italy (planetlab2.di.unito.it)
41  19/05/11 16:42  05:24:48  10 ms      2.779.760      90%         19%
42  20/05/11 19:32  03:03:37  5 ms       2.034.142     100%         18%

SLIDE 13

Part III Evaluation

SLIDE 14

Topology: network size

SLIDE 15

Topology: ID distribution

IDs are commonly (aMule, eMule) initialized randomly once and then kept persistently.

Figure: 8-bit ID prefix histogram for nodes in crawl 20110622 M
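
A histogram like the one in the figure can be computed by bucketing IDs on their leading bits. The following Python sketch assumes IDs are available as 128-bit integers; the function name is hypothetical, and the input data is not the thesis data set.

```python
from collections import Counter


def prefix_histogram(ids, prefix_bits: int = 8, id_bits: int = 128) -> list[int]:
    """Count IDs per `prefix_bits`-bit prefix; returns 2**prefix_bits bins."""
    counts = Counter(i >> (id_bits - prefix_bits) for i in ids)
    return [counts.get(p, 0) for p in range(2 ** prefix_bits)]


if __name__ == "__main__":
    sample = [int("09262ce48db41838ce94c80cdaab3fab", 16),
              int("025e747cea687ccab41c95fa62a27a5d", 16),
              0]
    hist = prefix_histogram(sample)
    print(hist[0x00], hist[0x02], hist[0x09])
```

If IDs were drawn uniformly at random, each of the 256 bins would be expected to hold roughly n/256 of n IDs, so strong spikes in single bins point to IDs that were not chosen randomly.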

SLIDE 16

Topology: ID distribution (filtered)

Notable IDs:
  0x0000...
  0x09262ce48db41838ce94c80cdaab3fab
  0x025e747cea687ccab41c95fa62a27a5d
  0x1000000 4 byte prefix

Figure: Filtered 8-bit ID prefix histogram for nodes in crawl 20110622 M

Except for a number of client classes, the assumption of randomly initialized IDs is valid.

SLIDE 17

Topology: Node IN degree

Measured as: number of unique remote nodes having a node as a contact in their routing table

Figure: Histogram of observed in-degrees of all/responsive nodes in crawl 20110510 M2
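
The in-degree definition above can be expressed as a small Python sketch. It assumes the crawl output has already been reduced to a mapping from each queried node to the contacts it returned; that input shape and the function name are illustrative assumptions.

```python
from collections import defaultdict


def in_degrees(responses: dict[str, list[str]]) -> dict[str, int]:
    """Map each node to the number of distinct other nodes that returned it as a contact."""
    referrers: dict[str, set[str]] = defaultdict(set)
    for source, contacts in responses.items():
        for contact in contacts:
            if contact != source:            # ignore self-references
                referrers[contact].add(source)
    return {node: len(sources) for node, sources in referrers.items()}


if __name__ == "__main__":
    responses = {"a": ["b", "c", "b"], "b": ["c"], "c": ["a", "c"]}
    print(in_degrees(responses))
```

Using a set of referrers per node ensures that a node returned several times by the same peer is counted only once.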

SLIDE 18

Geographic distribution

Determined by IP mapping.

Figure: Overview of geographic location of nodes in crawl 20110622 M

SLIDE 19

Geographic distribution: continents

Figure: Distribution of observed key metrics by continent for crawls 20110510 M2 (left) and 20110622 M (right)

SLIDE 20

Geographic distribution: widely known peers

Over all crawls: stable nodes (found in at least 10 crawls) with persistently large IN degree (average over 1000)

Figure: Geographic location of widely known and stable peers.
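
The selection criterion for widely known and stable peers (seen in at least 10 crawls, average in-degree over 1000) could be applied as in the following Python sketch. The input format, one list of observed in-degrees per node with one entry per crawl in which the node was found, is an assumption, as is the function name.

```python
def widely_known_stable(in_degree_per_crawl: dict[str, list[int]],
                        min_crawls: int = 10,
                        min_avg_in_degree: float = 1000.0) -> list[str]:
    """Nodes found in at least `min_crawls` crawls whose mean in-degree exceeds the threshold."""
    selected = []
    for node, degrees in in_degree_per_crawl.items():
        if not degrees or len(degrees) < min_crawls:
            continue                                   # not stable enough
        if sum(degrees) / len(degrees) > min_avg_in_degree:
            selected.append(node)                      # stable and widely known
    return sorted(selected)


if __name__ == "__main__":
    observed = {"x": [1500] * 10, "y": [1500] * 9, "z": [10] * 12}
    print(widely_known_stable(observed))
```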

SLIDE 21

Autonomous systems

Mapping data: 38143 distinct autonomous systems
Number of participating AS:
  6658 for 20110510 M2
  7626 for 20110622 M

[IP, TCP] clients  AS in 20110510 M2  AS in 20110622 M
1                  2504               2601
2 - 10             2583               3092
11 - 100           1113               1384
101 - 1000         330                392
1001 - 10000       96                 120
10001 - 100000     29                 33
100001 - 1000000   3                  4

ASN / Name                                                    of crawl  responsive  ID uniqueness
AS4134 Chinanet                                               27.2547%  15.61%      41.08%
AS4837 CNCGROUP China169 Backbone                             19.9828%  20.80%      41.48%
AS3269 Telecom Italia S.p.a.                                   4.2992%  34.02%      82.11%
AS4808 CNCGROUP IP network China169 Beijing Province Network   3.0233%  20.37%      67.42%
AS4812 China Telecom (Group)                                   2.6414%  13.86%      74.38%
AS3462 Data Communication Business Group                       2.1754%  40.30%      83.83%
AS3352 Internet Access Network of TDE                          2.1413%  27.50%      82.02%
AS3215 France Telecom - Orange                                 1.7685%  30.37%      88.64%
AS9394 CHINA RAILWAY Internet(CRNET)                           1.7389%  27.12%      79.33%
AS1267 Infostrada S.p.A                                        1.5942%  35.96%      77.10%

SLIDE 22

Autonomous systems: characteristics

Characterization of AS by the following features:
  Responsiveness: percentage of nodes from the AS that responded
  ID uniqueness: number of unique IDs in the AS divided by the number of nodes in the AS
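
Both features follow directly from their definitions, as the Python sketch below shows. The `Node` record and its fields are hypothetical; in particular, how an IP is mapped to its AS number is assumed to have happened beforehand.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    asn: int         # autonomous system number from the IP mapping
    endpoint: str    # e.g. "IP:TCP port"
    kad_id: str      # hex Kad ID
    responded: bool  # whether the node answered the crawler


def as_metrics(nodes: list[Node]) -> dict[int, tuple[float, float]]:
    """Return {ASN: (responsiveness, ID uniqueness)}, both as fractions in [0, 1]."""
    by_as: dict[int, list[Node]] = defaultdict(list)
    for node in nodes:
        by_as[node.asn].append(node)
    result = {}
    for asn, members in by_as.items():
        responsiveness = sum(m.responded for m in members) / len(members)
        uniqueness = len({m.kad_id for m in members}) / len(members)
        result[asn] = (responsiveness, uniqueness)
    return result


if __name__ == "__main__":
    nodes = [Node(4134, "1.1.1.1:4662", "aa", True),
             Node(4134, "1.1.1.2:4662", "aa", False),
             Node(3269, "2.2.2.2:4662", "bb", True)]
    print(as_metrics(nodes))
```

An ID uniqueness well below 1 means many endpoints in that AS share the same ID, which matches the low values the table shows for the two largest AS.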

SLIDE 23

Deviations

ID repetition:
  reused ID

  ID in 20110622 M                  [IP, TCP]  of total  responsive
  09262ce48db41838ce94c80cdaab3fab      14802    0.428%      0.000%
  00000000000000000000000000000000       7363    0.213%     23.568%
  025e747cea687ccab41c95fa62a27a5d       3085    0.089%     14.325%
  fcfcfcfcfcfcfcfcfcfcfcfcfcfcfcfc       1257    0.036%     15.868%
  f8355ff93e61e08936a4fc506a105f09       1017    0.029%     14.095%
  62d9fbdd7e108116a36f962f1d53d855       1009    0.029%     46.337%
  5c8f173ba95b09b48f9cfb7fdb33ffd7        656    0.019%      0.000%
  2f2338be851ff069ef34ddbe76d914a5        425    0.012%     61.647%
  74631f49283ff82d4fecc0f3ebe85849        392    0.011%     54.592%
  02ac8fc8a3e4caba1b1b520a623d5732        289    0.008%     14.737%

  non-random ID: 0x1000000 4 byte prefix
IP repetition: IPs associated with multiple nodes
  UDP port aliasing: client feature
  IP sharing: IP reassignment, NAT
  non-persistent ID
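
Reused IDs of the kind listed in the table can be found by counting distinct [IP, TCP] endpoints per ID. The Python sketch below assumes the crawl data is available as (ID, IP, TCP port) records; the record layout and function name are illustrative.

```python
from collections import defaultdict


def reused_ids(records: list[tuple[str, str, int]],
               min_endpoints: int = 2) -> dict[str, int]:
    """Map each Kad ID seen on at least `min_endpoints` distinct (IP, TCP port) pairs to that count."""
    endpoints: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for kad_id, ip, tcp_port in records:
        endpoints[kad_id].add((ip, tcp_port))
    return {kad_id: len(eps) for kad_id, eps in endpoints.items()
            if len(eps) >= min_endpoints}


if __name__ == "__main__":
    records = [("00" * 16, "1.1.1.1", 4662),
               ("00" * 16, "1.1.1.2", 4662),
               ("ab" * 16, "2.2.2.2", 4662)]
    print(reused_ids(records))
```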

SLIDE 24

Deviation: 0x09262ce48db41838ce94c80cdaab3fab

Identified by specific ID
Silent
Located in Asia
Port usage:

SLIDE 25

Deviation: non-persistent ID

Identified within a single crawl by multiple IDs associated with a persistent IP/port combination.
Port usage:

Figure: UDP/TCP port combinations used by all (left) and 2 or more (right) nodes known under at least 2 distinct IDs
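
The identification rule above, several IDs observed for the same IP and ports within one crawl, can be sketched in Python as follows. The record layout (IP, UDP port, TCP port, ID) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def non_persistent_endpoints(records):
    """Map (IP, UDP port, TCP port) to its set of IDs, keeping endpoints seen under >= 2 distinct IDs."""
    ids_by_endpoint = defaultdict(set)
    for ip, udp_port, tcp_port, kad_id in records:
        ids_by_endpoint[(ip, udp_port, tcp_port)].add(kad_id)
    return {ep: ids for ep, ids in ids_by_endpoint.items() if len(ids) >= 2}


if __name__ == "__main__":
    records = [("1.1.1.1", 4672, 4662, "aa"),
               ("1.1.1.1", 4672, 4662, "bb"),
               ("2.2.2.2", 4672, 4662, "cc")]
    print(non_persistent_endpoints(records))
```

Grouping the result by its (UDP port, TCP port) part would give the kind of port-combination counts the figure refers to.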

SLIDE 26

Part IV Conclusion

SLIDE 27

Conclusion and future research

Learnings:
  Size of the network:
    2-4 million endpoints
    1-2 million IDs
  Topology:
    Large percentage of weakly integrated nodes
    Small number of stable nodes
  Geographic distribution:
    Majority of nodes in Asia (70%) and Europe (25%)
    Regional differences in behavior
  AS: low number of (strongly) participating AS
  Number of irregular node classes
Future research:
  Further investigate deviating network clients
  Overcome limitations of the crawling algorithm used:
    Obtain full routing tables: better interconnectivity graph
    Improve analysis and handling of response failures

SLIDE 28

Questions

Questions?

SLIDE 29

Localized clustering

Figure: Top 10 countries / cities and their key metrics

SLIDE 30

Node OUT degree

Measured as: number of unique contacts in a node's routing table.

Figure: Histogram of observed out-degrees of nodes in crawls

Limited by request type (maximum 31) and request burst size!
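
The limitation can be made concrete with a short Python sketch. It assumes that each request returns at most the number of contacts given by the request type and that the burst size is the number of requests sent to one node; that reading of the two parameters is an assumption, and the function names are hypothetical.

```python
def out_degrees(responses: dict[str, list[str]]) -> dict[str, int]:
    """Observed out-degree: number of unique contacts a node returned, excluding itself."""
    return {node: len(set(contacts) - {node}) for node, contacts in responses.items()}


def max_observable_out_degree(contacts_per_request: int, requests_per_node: int) -> int:
    """Upper bound on the out-degree a crawl can observe for a single node."""
    if not 1 <= contacts_per_request <= 31:
        raise ValueError("request type allows 1-31 contacts")
    if requests_per_node < 0:
        raise ValueError("requests_per_node must be non-negative")
    return contacts_per_request * requests_per_node


if __name__ == "__main__":
    print(out_degrees({"a": ["b", "c", "b"], "b": []}))
    print(max_observable_out_degree(31, 8))
```

Under these assumptions the observed out-degree is a lower bound on the true routing table size, which is why the slides list obtaining full routing tables as future work.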
