
David Choffnes, EECS, Northwestern U.



  1. David Choffnes, EECS, Northwestern U.
     http://aqualab.cs.northwestern.edu/projects/EdgeScope.html
     http://aqualab.cs.northwestern.edu

  2. Internet scale data
     – BGP is useful, but has serious limitations
     – PlanetLab offers more control, but skewed view
     – End system monitors need widespread adoption
     EdgeScope
     – Make our collection of edge traces available
     – What do we have?
     – When can you have it?
     David Choffnes, EdgeScope: Exposing the view from the Edge

  3. 547,000 231,000 35,000 1,096

  4. Ono
     – Uses CDN redirections to inform peer biasing for BitTorrent
     – Installed more than 800,000 times
     NEWS
     – Uses passively gathered BT data to detect, confirm and isolate network events
     – More than 40,000 users
     Coverage
     – ~8k ASNs
     – 54k routable prefixes
     – 200 countries

  5. Data types (see website for details)
     – Per-download stats
       • Transfer rates, file-size estimates, state
     – Per-connection stats
       • Transfer rates, cumulative data transferred, seed/leech
     – Global stats
       • Overall transfer rates, session times
     – … other interesting/necessary stuff
       • IP changes
     All of this is sampled every 30 seconds
     – Per-session data sampled every hour and at end
     Traceroutes/pings
       • Uses the built-in command; we are playing with v6 traceroutes/pings
       • Limited to a maximum number of measurements in parallel
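A note on the last bullet of slide 5: capping the number of measurements in flight is commonly done with a fixed-size worker pool. The sketch below is only an illustration of that idea, not the Ono/NEWS implementation. The names `run_measurements`, `ConcurrencyProbe`, and the cap `MAX_PARALLEL = 4` are hypothetical (the slides do not state the real limit), and the probe is a stand-in that sleeps instead of invoking the system traceroute or ping command.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 4  # hypothetical cap; the slides do not give the real value


def run_measurements(targets, measure, max_parallel=MAX_PARALLEL):
    """Call measure(target) for every target, at most max_parallel at once.

    Results come back in the same order as the targets.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(measure, targets))


class ConcurrencyProbe:
    """Stand-in measurement that records the peak number of concurrent calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, target):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # placeholder for a real traceroute/ping invocation
        with self._lock:
            self._active -= 1
        return (target, "ok")
```

In use, `run_measurements(["192.0.2.1", "192.0.2.2"], ConcurrencyProbe())` returns one `(target, "ok")` pair per target, and the probe's `peak` attribute never exceeds the cap. A real client would replace the probe with a function that shells out to the platform's traceroute or ping.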

  6. A platform for controlled experiments
     – Why?
       • Security implications!
     A topology measurement tool
     – But we do have loads of traceroute data
     An arbitrarily extensible data collection system
     – Everything we collect relates to Ono/NEWS performance
     – If it fits, we can add it fairly easily
       • Needs to go through a beta process (usually about a week)
       • Once mainlined, near full adoption within about 4 days

  7. Started (proper) collection in December 2007
     Daily stats (approximate)
     – 3 to 4 GB of compressed data
     – About 10 to 20 GB raw data
     – 2.5-3M traceroutes
     – 100-150M connection samples
     [Charts: Per-Download Samples, Per-connection Samples, Traceroutes]
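To put the daily figures on slide 7 in perspective, a back-of-the-envelope calculation helps when deciding how much storage to set aside. The sketch below only scales the approximate ranges quoted on the slide. The helper names `scale_range` and `per_second`, and the assumption of a constant daily rate over a 365-day year, are illustrative and not from the talk.

```python
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def scale_range(daily_low, daily_high, days=DAYS_PER_YEAR):
    """Scale a (low, high) daily range to a period of `days` days."""
    return daily_low * days, daily_high * days


def per_second(daily_count):
    """Average rate per second for a given daily count."""
    return daily_count / SECONDS_PER_DAY


# Daily ranges taken from the slide (approximate).
compressed_gb = scale_range(3, 4)   # (1095, 1460) GB per year, compressed
raw_gb = scale_range(10, 20)        # (3650, 7300) GB per year, raw

# 100M connection samples per day is roughly 1,157 samples per second.
low_rate = per_second(100_000_000)
```

Under these assumptions a year of collection comes to roughly 1.1 to 1.5 TB compressed, or 3.7 to 7.3 TB raw.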

  8. NEWS
     SwarmScreen
     Network positioning (cool? cold? you decide.)
     Topology studies
     Fabian’s talk

  9. Preliminaries
     – CAIDA-style agreement
     Anonymization
     – AS-level detail
     – Prefix-preserving
     – (Maybe) User ids, without location info
     If you need IPs, you have to work with us
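For readers unfamiliar with the "prefix-preserving" bullet on slide 9: the property means that two addresses sharing their first k bits before anonymization share exactly their first k bits afterwards, so prefix-level structure survives while real addresses do not. The sketch below illustrates the property in the style of Crypto-PAn, but it substitutes HMAC-SHA256 for that scheme's AES-based function and is not the method used by the EdgeScope project. The function names and the key are hypothetical, and this is not meant for production use.

```python
import hashlib
import hmac
import ipaddress


def _prf_bit(key: bytes, prefix_bits: str) -> int:
    """One pseudorandom bit that depends only on the key and the bit prefix."""
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] & 1


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Flip each bit according to a keyed function of the bits before it."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i, bit in enumerate(bits):
        flip = _prf_bit(key, bits[:i])  # depends only on the preceding i bits
        out.append(str(int(bit) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

Because the flip applied to bit i depends only on the i bits before it, `common_prefix_len` gives the same value for any pair of addresses before and after anonymization under the same key. The mapping is also deterministic per key, so the same address always maps to the same anonymized value.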

  10. Ono dataset
      – Now
      AS links (CoNEXT work)
      – Now
      Everything else
      – On-demand anonymization (takes time)
      – Hardware on order
      – Quarantine period (6-12 months)

  11. Before you ask for data
      – Be sure you know what you want
      – Make sure you have space for it
      – Give us time to get it to you
      Working with data at this scale
      – Throwing hardware at the analysis doesn’t work
        • Good data structures do work
      – MapReduce isn’t always the best fit
        • Especially if you don’t have a giant cluster
      – Dynamic languages are a bad idea
        • Seriously, Perl is not your friend here
        • Thank me later

  12. Privacy is hard
      – No really, this is serious
        • Messing this up will ruin it for everyone
      – We invite new proposals/research in this area
      Scale
      – Cannot do this without good tech support, net ops cooperation, elbow grease
      – MTTFs in mirror(ed array) are closer than they appear
      Other fun stuff
      – Timestamp synchronization
      – UUIDs
      – Users can come and go at any time
        • Sessions and install/uninstall

  13. http://aqualab.cs.northwestern.edu/projects/EdgeScope.html
