  1. SVM Learning of IP Address Structure for Latency Prediction. Rob Beverly, Karen Sollins and Arthur Berger ({rbeverly,sollins,awberger}@csail.mit.edu). SIGCOMM MineNet06.

  2. SVM Learning of IP Address Structure for Latency Prediction
     • The case for Latency Prediction
     • The case for Machine Learning
     • Data and Methodology
     • Results
     • Going Forward

  3. The Case for Latency Prediction

  4. Latency Prediction (again?)
     • Significant prior work: King [Gummadi 2002], Vivaldi [Dabek 2004], Meridian [Wong 2005], and others (IDMaps, GNP, etc.)
     • Prior methods: active queries, synthetic coordinate systems, landmarks
     • Our work seeks to provide an agent-centric (single-node) alternative

  5. Why Predict Latency?
     1. Service selection: balance load, optimize performance, P2P replication
     2. User-directed routing: e.g. IPv6 with per-provider logical interfaces
     3. Resource scheduling: Grid computing, etc.
     4. Network inference: measure additional topological network properties

  6. An Agent-Centric Approach
     • Hypothesis: two hosts within the same subnetwork likely experience consistent congestion and latency
     • Registry allocation policies give network structure, but that structure is fragmented and discontinuous
     • Formulate as a supervised learning problem
     • Given latencies to a set of (random) destinations as training:
       – predict_latency(unseen IP)
       – error = |predict(IP) – actual(IP)|

  7. The Case for Machine Learning

  8. Why Machine Learning?
     • Internet-scale networks are complex (high-dimensional) and dynamic
     • Learning can accommodate and recover from infrequent errors in a probabilistic world
     • Traffic provides a large and continuous training base

  9. Candidate Tool: Support Vector Machines
     • Supervised learning (but amenable to online learning)
     • Separates the training set into two classes in the most general way
     • Main insight: find the hyperplane separator that maximizes the minimum margin between the convex hulls of the classes
     • Second insight: if the data is not linearly separable, take it to a higher dimension
     • Result: generalizes well, is fast, and accommodates unknown data structure

  10. SVMs – Maximize the Minimum Margin
      [Figure: simplest case, two linearly separable classes in two dimensions; legend marks positive examples, negative examples, and support vectors]

  11. SVMs – Redefining the Margin
      [Figure: a single new positive example redefines the margin]

  12. Non-Support-Vector Points Don't Affect the Solution
      [Figure: positive and negative examples away from the margin leave the separator unchanged]

  13. IP Latency Non-Linearity
      [Figure: 200 ms and 10 ms examples plotted against IP bit 0 and IP bit 1]

  14. Higher Dimensions for Non-Linearity
      [Figure: 200 ms and 10 ms examples; two classes in two dimensions, NOT linearly separable]

  15. Kernel Function Φ
      [Figure: after applying Φ, the two classes (200 ms and 10 ms examples) are linearly separable in three dimensions]
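The kernel idea on slide 15 can be illustrated with an explicit feature map. The sketch below is my own minimal example, not the paper's kernel: a hypothetical quadratic map stands in for Φ and lifts an XOR-style 2-D pattern, which no line can separate, into 3-D, where a single coordinate separates the classes.

```python
def phi(x1, x2):
    # Hypothetical explicit quadratic feature map standing in for the
    # slide's kernel function Phi: lifts a 2-D point into 3-D.
    return (x1 * x1, (2 ** 0.5) * x1 * x2, x2 * x2)

# XOR-style pattern: opposite-sign quadrants as the "200 ms" class,
# same-sign quadrants as the "10 ms" class -- not linearly separable in 2-D.
slow = [(-1.0, 1.0), (1.0, -1.0)]
fast = [(1.0, 1.0), (-1.0, -1.0)]

# After the lift, the middle coordinate sqrt(2)*x1*x2 alone separates the
# classes: negative for every 'slow' point, positive for every 'fast' one.
slow_lifted = [phi(x1, x2) for x1, x2 in slow]
fast_lifted = [phi(x1, x2) for x1, x2 in fast]
```

In practice a kernel computes inner products in the lifted space without materializing Φ(x); the explicit map is shown here only to make the geometry visible.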

  16. Support Vector Regression
      • Same idea as classification
      • ε-insensitive loss function
      [Figure: latency versus first octet, with the ε-insensitive tube after applying Φ]
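The ε-insensitive loss behind slide 16 is simple to state in code. A minimal sketch in plain Python (the function name is mine, not the paper's):

```python
def eps_insensitive_loss(predicted, actual, epsilon):
    # SVR's epsilon-insensitive loss: deviations inside the epsilon tube
    # cost nothing; beyond the tube, cost grows linearly with the deviation.
    return max(0.0, abs(predicted - actual) - epsilon)
```

With a 10 ms tube, predicting 105 ms for a true latency of 100 ms costs nothing, while predicting 120 ms costs 10: only errors larger than ε pull on the regression.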

  17. Data and Methodology

  18. Data Set
      • 30,000 random hosts responding to ping
      • Average latency to each over 5 pings
      • Non-trivial distribution for learning

  19. Methodology
      [Diagram: a data set of (IP, latency) pairs is permuted, then split into training and test sets]
      • Average of 5 experiments:
        – Randomly permute the data set
        – Split the data set into training / test points

  20. Methodology
      [Diagram: the permuted (IP, latency) pairs; the training set builds the SVM, the test set measures performance]
      • Average of 5 experiments:
        – Training data defines the SVM
        – Performance measured on (unseen) test points
        – Each bit of the IP is an input feature
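The per-experiment procedure on slides 19 and 20 (permute, split, and treat each address bit as a feature) can be sketched as follows. The function names and the fixed seed are illustrative, not from the paper:

```python
import random

def ip_to_bits(ip):
    # Slide 20: each bit of the IPv4 address becomes one input feature,
    # giving a 32-dimensional 0/1 vector, most significant bit first.
    a, b, c, d = (int(octet) for octet in ip.split("."))
    value = (a << 24) | (b << 16) | (c << 8) | d
    return [(value >> (31 - i)) & 1 for i in range(32)]

def permute_and_split(samples, train_fraction, seed=0):
    # Slide 19: randomly permute the data set, then split it into
    # training and test points (repeated and averaged over experiments).
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The training half of each split would then be fed, as 32-bit feature vectors with measured latencies as targets, to the regression learner.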

  21. Results

  22. Results
      • Spoiler: so, does it work?
      • Yes: within 30% for more than 75% of predictions
      • Performance varies with the selection of parameters (a multi-optimization problem):
        – Training size
        – Input dimension
        – Kernel
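The headline claim ("within 30% for more than 75% of predictions") implies a simple coverage metric: the fraction of test points whose relative error falls under a tolerance. A sketch in plain Python (the function name is mine):

```python
def fraction_within(tolerance, predictions, actuals):
    # Fraction of predictions whose relative error
    # |predicted - actual| / actual is at most the given tolerance
    # (0.30 corresponds to the slide's "within 30%").
    hits = sum(
        1 for p, a in zip(predictions, actuals) if abs(p - a) <= tolerance * a
    )
    return hits / len(actuals)
```

The slide's result then reads: fraction_within(0.30, ...) exceeded 0.75 on the held-out test points.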

  23. Training Size

  24. Question: Are MSBs Better Predictors?
      • Determine error versus the number of most significant bits of the test input IPs

  25. Question: Are MSBs Better Predictors?
      • Determine error versus the number of most significant bits of the test input IPs
      • Powerful result: the majority of the discriminatory power is in the first 8 bits

  26. Feature Selection
      • Use feature selection to determine which individual bits of the address contribute to the discriminatory power of the prediction
      • At step i, add the unused bit that minimizes the validation loss V: θ_i ← argmin_j V(f(θ, x_j), y), ∀ x_j ∉ {θ_1, …, θ_{i-1}}
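One common reading of slide 26's selection rule is greedy forward selection: at each step, add the not-yet-chosen feature that minimizes the validation loss of a model built on the chosen set. A generic sketch (plain Python; `loss_of` is a hypothetical stand-in for V(f(θ, x_j), y)):

```python
def greedy_forward_selection(features, loss_of, k):
    # Greedy forward feature selection: repeatedly add the unused feature
    # whose inclusion minimizes the validation loss of the chosen set.
    chosen = []
    for _ in range(k):
        remaining = [f for f in features if f not in chosen]
        best = min(remaining, key=lambda f: loss_of(chosen + [f]))
        chosen.append(best)
    return chosen
```

With a toy loss in which each feature has a fixed importance, the selector recovers the features in importance order, which is the shape of the slide-27 curve: cumulative bits, most informative first.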

  27. Feature Selection
      [Figure: note the x-axis shows cumulative bit "features"]

  28. Feature Selection
      [Figure: the selected bits match intuition and registry allocations]

  29. Performance
      • Given the empirically optimal training size and input features
      • How well can agents predict latency to unknown destinations?

  30. Prediction Performance
      [Figure: predicted versus actual latency, with the ideal line shown]

  31. Prediction Performance
      [Figure annotation: within 30% for >75% of predictions]

  33. Going Forward

  34. Future Research
      • How agents select training data (random, BGP prefix, registry allocation, from TCP flows, etc.)
      • How performance decays over time, and how often to retrain
      • Online, continuous learning

  35. Summary – Questions?
      • Major results:
        – An agent-centric approach to latency prediction
        – Validation of SVMs and kernel functions as a means to learn on the basis of Internet addresses
        – Feature selection analysis of the informational content of IP addresses in predicting latency
        – Latency estimation accuracy within 30% of the true value for >75% of data points

  36. Prediction Accuracy
      • Only ~25% of predictions are accurate to within 5% of the actual value

  37. Prediction Accuracy
      • But >85% of predictions are accurate to within 50% of the true value
      • About 45% of predictions are accurate to within 10% of the true value

  38. Prediction Accuracy
      • Bottom line: suitable for coarse-grained latency prediction
