WAP5: Black-box Performance Debugging for Wide-Area Distributed Systems

  1. WAP5: Black-box Performance Debugging for Wide-Area Distributed Systems
     Patrick Reynolds (reynolds@cs.duke.edu)
     With: Janet Wiener, Marcos Aguilera, Jeffrey Mogul, Amin Vahdat
     http://www.hpl.hp.com/research/project5/

  2. Motivation
     • Discover structure and performance problems in large, wide-area systems
     • Infer paths through nodes
       – One path per client request
       – Discover timing at each step
     • Focus attention on nodes that are problematic
       – First step in performance debugging
     [Figure: client, web proxy, local DHT node, remote DHT node, DNS, origin web server]
     WAP5 - WWW'06 page 2

  3. Coral example
     • Causal path: a sequence of related messages and processing, annotated with timing/delays
     • Second-level hit (4 messages)
     • Second-level miss (6 messages)
     • Also: DHT lookups
     [Figure: client, proxy, DNS server, and origin server nodes; delay labels 500ms and 250ms]

  4. Goals
     • Find bugs in wide-area applications
       – Performance bugs: too much or too little time at any point
       – Structure bugs: incorrect ordering or placement of processing or communication
     • Expose causal paths
       – Structure discovery
       – Measure latency for processing and communication
       – Unexpected structure or timing
         • Indicates possible bugs
     • Black-box approach
       – Do not require source code access
       – Allow heterogeneity

  5. Three target audiences
     • Primary programmer
       – Debugging or optimizing his/her own system
     • Secondary programmer
       – Inheriting a project or joining a programming team
       – Discovery: learning how the system behaves
     • Operator
       – Monitoring a running system for unexpected behavior
       – Performing regression tests after a change

  6. Contributions
     • New causality analysis algorithm
     • Full tool chain
       – Trace capture library
       – Causal path analysis
       – Visualization
     • Results with two PlanetLab CDNs
       – Coral and CoDeeN
     [Figure: tool chain. Trace capture → packet or socket traces → Reconciliation → message traces → Causal analysis → causal paths and timing → Visualization]

  7. Outline
     • Introduction
     • Naming
     • Trace capture
     • Reconciliation
     • Causality analysis
       – Message linking algorithm
     • Results with CoDeeN & Coral

  8. Naming
     • Message is a single read/write system call
       – May be many TCP or UDP packets
     • Node can be a process or a host
     • Endpoint can be a socket path or <IP address, port>
     [Figure: host foo.cs.duke.edu running a web proxy (pid=2297) and a DHT node (pid=2312), with a client and a web server; endpoint labels 1025, 1207, 8080, 80, and /tmp/corald…]

  9. Naming
     • Node names are causal names
       – A message into a process/host can cause messages out
     • Endpoint names guide aggregation
       – Calls to foo:8080 are different from calls to foo:53
       – Client hosts and ports can be ignored
     [Figure: same host diagram as on slide 8]
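The aggregation rule on this slide can be illustrated with a minimal sketch. The `Endpoint` type, the `is_client` flag, and the `aggregation_name` helper are hypothetical names introduced here for illustration; the slides do not say how WAP5 decides which side is the client, so the sketch simply assumes the capture step records it.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int
    is_client: bool  # assumption: the trace records which side called connect()


def aggregation_name(ep: Endpoint) -> str:
    """Name used when grouping calls.

    Server endpoints keep host:port, so foo:8080 and foo:53 stay distinct.
    Client hosts and ports are ignored by collapsing them to one name.
    """
    if ep.is_client:
        return "client"
    return f"{ep.host}:{ep.port}"


calls = [
    Endpoint("foo", 8080, False),
    Endpoint("foo", 53, False),
    Endpoint("15.1.2.3", 33250, True),
    Endpoint("15.9.9.9", 41000, True),
]
names = [aggregation_name(ep) for ep in calls]
```

Collapsing client names this way is what lets requests from many different clients fall into the same path pattern later on.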

  10. Outline
      • Introduction
      • Naming
      • Trace capture
      • Reconciliation
      • Causality analysis
        – Message linking algorithm
      • Results with CoDeeN & Coral

  11. Trace capture
      • Capture events using host/net sniffing or library interposition
        – All three choices: no modifications to applications
        – On PlanetLab: sniffing on host only, limited flexibility
      • We capture events using library interposition
        – Captures all calls that create, modify, or use a socket
      [Figure: capture points for library interposition, host sniffing, and network sniffing, shown relative to program, libc, and kernel]

  12. Outline
      • Introduction
      • Naming
      • Trace capture
      • Reconciliation
      • Causality analysis
        – Message linking algorithm
      • Results with CoDeeN & Coral

  13. Reconciliation: Convert socket calls to logical messages
      • Assign endpoint names to each call
      client (pid=5040):
        bind(fd=6, addr={15.1.2.3:33250})
        connect(fd=6, addr={16.5.6.7:80})
        send(fd=6, len=10, time=0.592)
        recv(fd=6, len=12, time=2.033)
      server (pid=8712):
        bind(fd=4, addr={16.5.6.7:80})
        accept(lfd=4, addr={15.1.2.3:33250}) = 5
        recv(fd=5, len=10, time=0.852)
        send(fd=5, len=12, time=1.705)
      Logical messages:
        sender         receiver       send time   receive time
        client/5040    server/8712    0.592       0.852
        server/8712    client/5040    1.705       2.033

  14. Reconciliation: Convert socket calls to logical messages
      • Combine send and recv events for each message
        – Detect dropped or reordered UDP packets
        – Detect differing message (buffer) boundaries
      [Figure: same socket-call trace and logical messages as on slide 13]

  15. Reconciliation: Convert socket calls to logical messages
      • Assign node (process) names to each message
      [Figure: same socket-call trace and logical messages as on slide 13]
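The three reconciliation steps can be sketched roughly as follows, using the trace from the slides. This is a simplified illustration, not the WAP5 implementation: the `Event` and `Message` types and the `reconcile` function are hypothetical, endpoint names are assumed to be already assigned, each node's events are assumed to arrive in local order, and a send/recv pair with differing lengths is simply skipped rather than handled.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    node: str      # process name, e.g. "client/5040"
    kind: str      # "send" or "recv"
    src: str       # sender endpoint
    dst: str       # receiver endpoint
    length: int
    time: float


class Message(NamedTuple):
    sender: str
    receiver: str
    send_time: float
    recv_time: float


def reconcile(events):
    """Pair the k-th send with the k-th recv on each (src, dst) flow."""
    sends, recvs = defaultdict(list), defaultdict(list)
    for ev in events:
        (sends if ev.kind == "send" else recvs)[(ev.src, ev.dst)].append(ev)
    messages = []
    for flow, sent in sends.items():
        for s, r in zip(sent, recvs.get(flow, [])):
            if s.length == r.length:  # differing boundaries are skipped in this sketch
                messages.append(Message(s.node, r.node, s.time, r.time))
    return messages


C, S = "15.1.2.3:33250", "16.5.6.7:80"
events = [
    Event("client/5040", "send", C, S, 10, 0.592),
    Event("client/5040", "recv", S, C, 12, 2.033),
    Event("server/8712", "recv", C, S, 10, 0.852),
    Event("server/8712", "send", S, C, 12, 1.705),
]
messages = reconcile(events)
```

Pairing by position within a flow avoids comparing timestamps taken on different hosts, which matters when clocks are not synchronized.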

  16. Outline
      • Introduction
      • Naming
      • Trace capture
      • Reconciliation
      • Causality analysis
        – Message linking algorithm
      • Results with CoDeeN & Coral

  17. Causal path analysis
      • Which call to B caused outgoing calls?
        – Could be spontaneous action
        – May be ambiguous
      • Make good guesses
      • Use statistics over whole trace
      • Try multiple possibilities
      • Build paths by combining calls

  18. Message linking algorithm
      Message traces → Estimate average causal delays → Score possible parents for each message → Link-probability trees → Build and aggregate paths → Causal-path patterns

  19. Estimate average causal delay
      • Look at all messages into B, plus all B → C messages
        – Take smallest delay before each B → C message
        – Trace-specific upper limit
      • D_{B→C} = average of these delays
        – Might underestimate D
      • Scaling factor λ_{B→C} = 1/D_{B→C}
      • Create exponential distribution
        – f(t) = λ e^{-λ t}
      [Figure: smallest delay for B → C]
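A minimal sketch of this estimate, under stated assumptions: `average_causal_delay` and `exponential_pdf` are hypothetical names, timestamps are taken at node B so they are directly comparable, and the upper limit is applied by discarding any gap larger than it (the slides do not say exactly how the limit is used).

```python
import math
from bisect import bisect_left


def average_causal_delay(in_times, out_times, upper_limit):
    """Average gap between each B->C send and the latest message into B before it."""
    in_times = sorted(in_times)
    delays = []
    for t_out in out_times:
        i = bisect_left(in_times, t_out)  # count of inbound messages strictly before t_out
        if i == 0:
            continue  # nothing arrived earlier, so no candidate parent
        delay = t_out - in_times[i - 1]
        if delay <= upper_limit:  # assumption: gaps above the limit are ignored
            delays.append(delay)
    if not delays:
        raise ValueError("no usable delays in trace")
    return sum(delays) / len(delays)


def exponential_pdf(t, lam):
    """f(t) = lam * exp(-lam * t), the density from the slide."""
    return lam * math.exp(-lam * t)


in_times = [0.0, 1.0, 5.0]      # messages into B
out_times = [1.25, 5.75, 20.0]  # B->C messages
D = average_causal_delay(in_times, out_times, upper_limit=2.0)
lam = 1.0 / D
```

Because only the smallest gap before each outgoing message is used, the true parent may be an earlier message, which is why the slide notes that D might be underestimated.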

  20. Find and weight possible parent messages
      • Use f(t) to find weight of link from each parent

  21. Find and weight possible parent messages
      • Normalize so sum of weights to each child = 1
      • Possible-parent trees
        – Spontaneous action has small probability, not shown
        – Links to B → D are slightly less likely
      Link weights from each possible parent:
        child    Z → B   Y → B   X → B
        B → C    0.64    0.24    0.09
        B → D    0.61    0.22    0.08
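The weighting and normalization steps might look like the sketch below. The `parent_weights` function and its `spontaneous` parameter are hypothetical: the slides say only that spontaneous action gets a small probability, so a fixed raw weight is assumed here purely for illustration.

```python
import math


def parent_weights(child_time, parents, lam, spontaneous=0.01):
    """Weight each earlier message into B as a possible parent of one child message.

    parents maps a parent name to its arrival time at B. The raw weight is
    f(t) = lam * exp(-lam * t), where t is the gap to the child. Weights are
    then normalized so that they sum to 1 for this child.
    """
    raw = {
        name: lam * math.exp(-lam * (child_time - t))
        for name, t in parents.items()
        if t < child_time  # only earlier messages can be causes
    }
    raw["spontaneous"] = spontaneous  # assumption: fixed small raw weight
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


weights = parent_weights(
    child_time=1.0,
    parents={"Z->B": 0.9, "Y->B": 0.5, "X->B": 0.1},
    lam=2.0,
)
```

The most recent arrival gets the largest weight, matching the ordering Z → B, Y → B, X → B in the slide's example. In that example each child type would use its own scaling factor, such as λ_{B→C} for B → C messages.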

  22. Build causality trees
      • Invert to get possible-child trees
      Link weights to each possible child:
        parent   B → C   B → D
        Z → B    0.64    0.61
        Y → B    0.24    0.22
        X → B    0.09    0.08
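The inversion is a regrouping of the same link weights by parent instead of by child. A small sketch using the numbers from the slides (the `invert` helper and the dictionary layout are illustrative choices, not WAP5's data structures):

```python
from collections import defaultdict

# Possible-parent trees: child message -> {possible parent: link weight}
possible_parents = {
    "B->C": {"Z->B": 0.64, "Y->B": 0.24, "X->B": 0.09},
    "B->D": {"Z->B": 0.61, "Y->B": 0.22, "X->B": 0.08},
}


def invert(parent_trees):
    """Regroup link weights so each parent lists its possible children."""
    children = defaultdict(dict)
    for child, parents in parent_trees.items():
        for parent, weight in parents.items():
            children[parent][child] = weight
    return dict(children)


possible_children = invert(possible_parents)
```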

  23. Build causality trees
      • Build trees from individual links
        – Use probability to decide whether or not to keep child
        – Some links are “try-both” and generate 2 trees
      • Tree probability is product of link probabilities
        p = 0.8 * 0.9 * (1-0.2) * (1-0.1) * (1-0.48) ≈ 0.270
      [Figure: two candidate trees rooted at A, built from links A → B, B → C, B → D, B → E, B → F, and C → G with link probabilities 0.8, 0.2, 0.1, 0.48, and 0.9; the trees have p=0.270 and p=0.249]
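The tree probability on this slide can be checked with a short worked example. Each kept link contributes its probability p and each dropped link contributes (1 - p). The `tree_probability` helper is a hypothetical name, and treating the 0.48 link as the "try-both" link is a reading of the figure rather than something the slide states.

```python
import math


def tree_probability(kept, dropped):
    """Product of p over kept links times (1 - p) over dropped links."""
    return math.prod(kept) * math.prod(1 - p for p in dropped)


# The slide's example: links with 0.8 and 0.9 are kept; 0.2, 0.1, 0.48 are dropped.
p_first = tree_probability(kept=[0.8, 0.9], dropped=[0.2, 0.1, 0.48])

# Assumed "try-both" alternative: the 0.48 link is kept instead of dropped.
p_second = tree_probability(kept=[0.8, 0.9, 0.48], dropped=[0.2, 0.1])
```

Under that assumption the two values round to 0.270 and 0.249, which agree with the two probabilities shown in the figure.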

  24. Build causality trees
      • Aggregate trees with identical structure
        – Combine client names and ports for better aggregation
      • Total probabilities for each pattern → ranking
        – Expected number of instances
        – Highlights paths that appear many times with high confidence
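Aggregation and ranking can be sketched as summing tree probabilities per structure. The nested-tuple tree encoding, the `HIT` and `MISS` patterns, and the probabilities below are made up for illustration; only the idea of totaling probabilities per identical structure comes from the slide.

```python
from collections import defaultdict


def aggregate(trees):
    """Sum probabilities of trees with identical structure, highest total first.

    trees is an iterable of (structure, probability) pairs, where structure
    is any hashable description of a tree's shape.
    """
    totals = defaultdict(float)
    for structure, probability in trees:
        totals[structure] += probability
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical patterns, encoded as (node, children) with client names already combined.
HIT = ("client", (("proxy", ()),))
MISS = ("client", (("proxy", (("origin", ()),)),))

trees = [(HIT, 0.5), (MISS, 0.5), (HIT, 0.25), (MISS, 0.25), (HIT, 0.25)]
ranking = aggregate(trees)
```

The total for a pattern is its expected number of instances, so a pattern ranks highly when it appears many times, with high confidence, or both.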

  25. Outline
      • Introduction
      • Naming
      • Trace capture
      • Reconciliation
      • Causality analysis
        – Message linking algorithm
      • Results with CoDeeN & Coral

  26. Results: Timeline vs. call tree
      • Coral miss path with DNS lookup
      [Figure labels: Coral processing, Origin server, Response]

  27. Results: Two CoDeeN miss paths
      • Different mean delays at proxies
        – 0.20 to 4.86 ms in different proxies
      • Different delays at origin web servers
      • All clients aggregated together

  28. Results: Coral DHT lookup
      • Three-level DHT lookups
      [Figure annotation: 3 calls in parallel]

  29. Conclusions
      • WAP5 exposes structure and timing of wide-area applications
        – Particularly PlanetLab applications
      • Successful analysis of CoDeeN and Coral traces
        – We found paths that match authors’ descriptions of systems
        – We characterized delays at each step and found outliers
      http://www.hpl.hp.com/research/project5/

  30. Extra slides
