
Usage in the Visible Internet Xue Cai and John Heidemann - PowerPoint PPT Presentation

Understanding Block-level Address Usage in the Visible Internet. Xue Cai and John Heidemann, USC/Information Sciences Institute. Aug. 31, 2010, SIGCOMM’10. xuecai@isi.edu


  1. Understanding Block-level Address Usage in the Visible Internet. Xue Cai and John Heidemann, USC/Information Sciences Institute. Aug. 31, 2010, SIGCOMM’10. xuecai@isi.edu

  2. The Discovery of Halley's Comet

  3. The Discovery of Halley's Comet
     • Edmond Halley, an astronomer, had 3 simple observations of the comet: 2 historical records (years 1531, 1607) and 1 observation (year 1682).
     • He inferred 1 simple characteristic: “It’s the same object which returns to earth every 76 years.”
     • SIMPLE observations and an inferred SIMPLE conclusion can have TREMENDOUS value.

  4. Our Q: what can simple observations about the Internet say?
     • Observations: pings sent to the Internet and the responses received
     • Address Utilization? Dynamic Addressing? ……

  5. Key Contributions
     • Methodology: Active probing, pattern analysis, clustering, classification
     • Application: Network management, resource allocation, Internet trend study
     • Validation: USC’s network, the general Internet, consistency across time

  6. Key Contributions
     • Spatial Correlation?
       – Methodology: Group addresses into blocks by usage
       – Application: More frequent probing? Block sizes? Block-level usage?
       – Validation: USC’s network, General Internet, Consistency
     • Address Utilization?
       – Methodology: Find blocks with less than 10% time responsive
       – Application: Resource reallocation? Efficient management?
       – Validation: USC’s network, General Internet, Consistency
     • Dynamic Addressing?
       – Methodology: Blocks switching state (up/down) frequently
       – Application: Botnet detection? Spam filtering? Click fraud?
       – Validation: USC’s network, General Internet, Consistency
     • Low-bitrate Identification?
       – Methodology: Utilize standard deviation of RTTs
       – Application: Auto content serving? Network management?
       – Validation: USC’s network, General Internet

  7. Key Contributions
     • Spatial Correlation?
       – Methodology: Group addresses into blocks by usage
       – Application: More frequent probing? Block sizes? Block-level usage?
       – Validation: USC’s network, General Internet, Consistency
     • Address Utilization? Dynamic Addressing?
       – Methodology, Application, Validation: See paper
     • Low-bitrate Identification?
       – Methodology: Utilize standard deviation of RTTs
       – Application: Auto content serving? Network management?
       – Validation: USC’s network, General Internet

  8. Related Work
     • J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and Survey of the Visible Internet. In Proceedings of the ACM Internet Measurement Conference (IMC), pp. 169-182. Vouliagmeni, Greece, October 2008.
     • What’s the same?
       – Collection methodology (and datasets)
       – Error bounds on ping census accuracy: undercounts by about 40%
       – Preliminary metrics
     • What’s new? Deeper understanding; new interpretation
       – new metrics: block-level analysis, not just addresses; RTT, not just responsiveness
       – new algorithms: block identification; low-bitrate identification
       – new conclusions: evaluation of block utilization; trends of address utilization; trends of dynamic addressing

  9. Key Contributions
     • Spatial Correlation?
       – Methodology: Group addresses into blocks by usage
       – Application: More frequent probing? Block sizes? Block-level usage?
       – Validation: USC’s network, General Internet, Consistency
     • Address Utilization? Dynamic Addressing?
       – Methodology, Application, Validation: See paper
     • Low-bitrate Identification?
       – Methodology: Utilize standard deviation of RTTs
       – Application: Auto content serving? Network management?
       – Validation: USC’s network, General Internet

  10. Background: What space?
     • IPv4 address space
     • address block p/n: addresses with common n-bit prefix p
     • a.b.c.d and a.b.c.(d+1) are adjacent addresses
     • Example: a /24 block (p/24) has 256 addresses
     • Layout: the Hilbert curve keeps adjacent addresses physically near each other
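The layout idea can be illustrated with a short sketch. It assumes the standard index-to-coordinate Hilbert mapping (the slides do not say which variant or orientation the authors used), and the prefix `192.0.2.0/24` and the function name `hilbert_d2xy` are placeholders chosen for illustration.

```python
import ipaddress

def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so the curve stays continuous.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Hypothetical /24 block: 256 addresses sharing a 24-bit prefix.
block = ipaddress.ip_network("192.0.2.0/24")
addresses = list(block)

# Lay the 256 addresses out on a 16 x 16 grid (order 4), indexed by last octet.
layout = {str(addr): hilbert_d2xy(4, i) for i, addr in enumerate(addresses)}

# Adjacent addresses a.b.c.d and a.b.c.(d+1) land in neighboring grid cells.
print(layout["192.0.2.0"], layout["192.0.2.1"])
```

Each pair of consecutive addresses ends up exactly one cell apart, which is the property the slide relies on when it plots per-address metrics in 2D.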

  11. Hypothesis: Spatial Correlation
     • What is Spatial Correlation?
       – adjacent addresses are likely to be used in the same way => spatial correlation of address blocks => usage blocks
     • Usage blocks
       – are NOT allocated blocks, but correlated
         • Internet addresses are allocated in blocks (ICANN to regional registries to ISPs to you)
         • addresses in one block are usually assigned to similar users
       – are what we want to observe, if they exist
         • observable blocks ≈ usage blocks

  12. Spatial Correlation: Application
     • Why care?
       – Efficiently select representative addresses to conduct more detailed study
         • Addresses in one block are used in the same way
         • So only a few representatives need to be probed in the future

  13. Spatial Correlation: Methodology
     • Input: data for individual addresses
     • Output: addresses sharing similar usage, grouped into observable blocks
     • Steps: Data Collection, Representation, Block Identification

  14. Spatial Correlation: Data Collection
     • How? Ping each address in random /24 blocks every 11 minutes for a week and collect the probe responses (positive, negative, or non-response). 1% of the allocated IPv4 address space is probed.
     • Why? Systematic pings reveal more information.
     • Validity of ping: the IMC’08 paper established error bounds: not perfect, but often pretty good; ~40% undercount.

  15. Spatial Correlation: Data Collection
     • [Figure: ping responses (positive, negative, non-response) over time for 1 address, and address vs. time for 1 /24 block (256 consecutive addresses)]
     • 24,000 random /24s

  16. Spatial Correlation: Representation
     • Why? One survey of 24,000 random /24s yields > 5 billion ping responses; a more meaningful representation of address usage is needed.
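As a rough sanity check on the "> 5 billion" figure, the survey size can be estimated from the parameters stated on the slides (24,000 /24 blocks, one probe per address every 11 minutes, for one week). This is a back-of-the-envelope sketch: it counts probes sent and assumes every address is probed in every round, which the slides do not state explicitly.

```python
# Parameters taken from the slides.
MINUTES_PER_WEEK = 7 * 24 * 60        # 10,080 minutes
PROBE_INTERVAL_MIN = 11               # one probe per address every 11 minutes
BLOCKS = 24_000                       # random /24 blocks in one survey
ADDRESSES_PER_BLOCK = 2 ** (32 - 24)  # 256 addresses in a /24

# Assumption: whole probing rounds only, every address probed each round.
rounds = MINUTES_PER_WEEK // PROBE_INTERVAL_MIN
addresses = BLOCKS * ADDRESSES_PER_BLOCK
probes = addresses * rounds

print(f"{rounds} rounds x {addresses:,} addresses = {probes:,} probes")
```

Under these assumptions the estimate is roughly 5.6 billion probe records per survey, consistent with the order of magnitude on the slide.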

  17. Spatial Correlation: Representation
     • Given: a series of ping responses over time (positive, negative, or non-response); each response represents the period until the next probe
     • Derive: a series of up durations
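A minimal sketch of this step, assuming an up duration is a maximal run of consecutive positive responses and that negative responses and non-responses both count as "not up". The function name `up_durations` and the sample series are hypothetical.

```python
def up_durations(responses):
    """Collapse a per-probe series into lengths of consecutive positive runs.

    `responses` is an iterable of truthy (positive) / falsy (negative or
    non-response) values, one per probe; each probe stands for one interval.
    """
    durations = []
    run = 0
    for positive in responses:
        if positive:
            run += 1
        elif run:
            # A positive run just ended; record its length.
            durations.append(run)
            run = 0
    if run:
        # The series ended while the address was still up.
        durations.append(run)
    return durations

# Hypothetical series of 10 probes (1 = positive, 0 = negative or non-response).
series = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(up_durations(series))
```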

  18. Spatial Correlation: Representation
     • How? 3 metrics to capture address usage
     • Running example: probing duration of length 10, containing three up durations of lengths 1, 2, and 2
     • Availability (A) := normalized sum of up durations
       – Example: (2+2+1) / 10 = 0.5
       – Intuition: utilization efficiency
     • Volatility (V) := normalized # of up durations
       – Example: 3 / (10/2) = 0.6
       – Intuition: high V infers dynamics
     • Median-Up (U) := median up duration
       – Example: median(2,2,1) = 2
       – Intuition: typical duration
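The three metrics can be sketched directly from the slide's definitions and examples. The normalizers (probing duration for A, half the probing duration for V) are read off the slide's worked examples; the function name `usage_metrics` and the handling of an address that was never up are assumptions.

```python
from statistics import median

def usage_metrics(up_durations, probing_duration):
    """Return (A, V, U) for one address.

    A: sum of up durations / probing duration
    V: number of up durations / (probing duration / 2)
    U: median up duration
    """
    if not up_durations:
        # Assumption: an address that was never up gets all-zero metrics.
        return 0.0, 0.0, 0
    availability = sum(up_durations) / probing_duration
    volatility = len(up_durations) / (probing_duration / 2)
    median_up = median(up_durations)
    return availability, volatility, median_up

# The slide's running example: up durations 2, 2, 1 in a probing duration of 10.
a, v, u = usage_metrics([2, 2, 1], 10)
print(a, v, u)
```

Dividing by half the probing duration follows the slide's example; the intuition is that an address alternating up and down every probe produces the largest possible number of up durations.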

  19. Spatial Correlation: Block Identification
     • [Figure: 1D view (address vs. time; positive vs. negative & non-response) mapped to a 2D Hilbert curve layout, shaded by Availability (A) and Volatility (V) from low to high. White: non-response]

  20. Spatial Correlation: Block Identification
     • How? Idea: examine each block size; if the block is homogeneous, stop; else split and recurse.
     • [Figure: intra-block variance]

  21. Spatial Correlation: Block Identification
     • How? Idea: examine each block size; if the block is homogeneous, stop; else split and recurse.
     • [Figure: example recursion: not homogeneous => split; homogeneous => stop; not homogeneous => split]
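The split-and-recurse idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: homogeneity is tested here as "population variance of one per-address metric is at most a threshold", and the threshold value, the function name `identify_blocks`, and the sample data are all hypothetical.

```python
from statistics import pvariance

def identify_blocks(values, start=0, prefix_len=24, threshold=0.01):
    """Recursively split a block until each piece is homogeneous.

    `values` holds one metric (e.g. availability) per address, in address
    order; its length must be a power of two. Returns a list of
    (offset_of_first_address, prefix_length) pairs.
    """
    # Stop at single addresses (/32) or when the block looks homogeneous.
    if len(values) == 1 or pvariance(values) <= threshold:
        return [(start, prefix_len)]
    half = len(values) // 2
    # Not homogeneous: split into two halves, each one prefix bit longer.
    return (identify_blocks(values[:half], start, prefix_len + 1, threshold)
            + identify_blocks(values[half:], start + half, prefix_len + 1, threshold))

# Hypothetical /24: first half heavily used, third quarter lightly used,
# last quarter heavily used.
availability = [0.9] * 128 + [0.1] * 64 + [0.9] * 64
print(identify_blocks(availability))
```

On this sample the /24 is split once, the first /25 is kept whole, and the second /25 is split into two /26 blocks, mirroring the "split, stop, split" sequence in the slide's example.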

  22. Spatial Correlation: Validation
     • Validation is hard
       – Where to find ground truth?
         • decentralized management
         • usage block ground truth?
     • Use three complementary ways:
       – Compare to USC’s network (operator provided truth)
       – Compare to general Internet (hostname inferred truth)
       – Evaluate different samples and dates
         • is 1% of the Internet enough? yes!
         • trends change somewhat over time
         • details: paper section 5.3

  23. Spatial Correlation: USC’s Network
     • Why?
       – quite solid truth (operator provided)
       – knowledge of both allocated blocks and usage blocks
     • How?
       – compare observable blocks (result to validate) with usage blocks (ground truth)

  24. Spatial Correlation: USC’s Network
     • [Figure labels: ground truth usage blocks; non-use; false-neg.: blocks we missed to identify; false-pos.: blocks we wrongly identified]
     • approach is mostly incomplete (23%), sometimes error (20%), but what is found is correct
     • very accurate when it reaches a conclusion

  25. Spatial Correlation: General Internet
     • Why?
       – unbiased truth (randomly selected)
     • How?
       – Infer usage blocks from hostnames
         • dhcp-host-xxx.example.net
       – compare observable blocks (result to validate) with usage blocks (ground truth)
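One way hostname-based inference might look is sketched below. Only the `dhcp-host-xxx.example.net` example comes from the slide; the keyword list, the usage labels, and the function names `label` and `usage_runs` are hypothetical, and the actual inference rules are not given in the slides.

```python
import re
from itertools import groupby

# Hypothetical keyword patterns; only "dhcp" is suggested by the slide's example.
PATTERNS = [
    (re.compile(r"dhcp|dyn"), "dynamic"),
    (re.compile(r"static"), "static"),
]

def label(hostname):
    """Guess a usage label from keywords in a reverse-DNS hostname."""
    name = hostname.lower()
    for pattern, usage in PATTERNS:
        if pattern.search(name):
            return usage
    return "unknown"

def usage_runs(hostnames):
    """Group consecutive addresses with the same label.

    `hostnames` is in address order; returns (first_index, length, label).
    """
    runs = []
    position = 0
    for usage, group in groupby(hostnames, key=label):
        length = len(list(group))
        runs.append((position, length, usage))
        position += length
    return runs

# Hypothetical hostnames for four consecutive addresses.
names = [
    "dhcp-host-1.example.net",
    "dhcp-host-2.example.net",
    "static-3.example.net",
    "mail.example.net",
]
print(usage_runs(names))
```

Runs of identically labeled adjacent addresses give candidate ground-truth usage blocks to compare against the observable blocks.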

  26. Spatial Correlation: General Internet
     • mostly correct (and more than USC)
     • ground truth is hard to infer
     • methodology is more complete when evaluated with an unbiased sample

  27. Key Contributions
     • Spatial Correlation?
       – Methodology: Group addresses into blocks by usage
       – Application: More frequent probing? Block sizes? Block-level usage?
       – Validation: USC’s network, General Internet, Consistency
     • Address Utilization? Dynamic Addressing?
       – Methodology, Application, Validation: See paper
     • Low-bitrate Identification?
       – Methodology: Utilize standard deviation of RTTs
       – Application: Auto content serving? Network management?
       – Validation: USC’s network, General Internet
