
Diagnosing Missing Events in Distributed Systems with Negative Provenance

Yang Wu*, Mingchen Zhao*, Andreas Haeberlen*, Wenchao Zhou+, Boon Thau Loo*
* University of Pennsylvania, + Georgetown University


  1. Diagnosing Missing Events in Distributed Systems with Negative Provenance
     Yang Wu*, Mingchen Zhao*, Andreas Haeberlen*, Wenchao Zhou+, Boon Thau Loo*
     * University of Pennsylvania, + Georgetown University

  2. Motivation: Network debugging
     - Example: Software Defined Networks (SDN)
     - SDN offers flexibility, but can have bugs
     - Need good debuggers!
     [Figure: an SDN Controller manages a Data Center Network that connects the Internet to an HTTP Server; traffic is labeled "DNS Query" and "HTTP Request". Question posed: "Why is the HTTP server getting DNS queries?"]

  3. Approach: Provenance
     - Existing tools: SNP (SOSP '11), NetSight (NSDI '14)
     - They produce "backtraces", or provenance
     [Figure: backtrace for the question "Why is the HTTP server getting DNS queries?". The vertex "DNS Query arrived at HTTP Server" is traced back to "DNS Query received at Switch" and "Broken FlowEntry existed at Switch", and onward (…). The network diagram marks a broken Program at the SDN Controller and a broken FlowEntry in the Data Center Network.]

  4. Challenge: Missing events
     - What if an expected event does not happen?
     - Cannot be handled by existing tools
     - No starting point for a backtrace
     [Figure: the same network (SDN Controller, Internet, Data Center Network, HTTP Server) with the cause unknown (???). Question posed: "Why is the HTTP server NOT getting requests?"]

  5. Survey: How common are missing events?
     - Missing events are consistently in the majority
     - Email threads for missing events are longer
     [Chart: share of missing events vs. positive events on three mailing lists (Outages, NANOG-user, floodlight-dev). Value labels, not matched to a list in this transcript: 52% vs. 48%, 74% vs. 26%, 83% vs. 17%.]

  6. Approach: Counter-factual reasoning
     Find all the ways a missing event could have occurred, and show why each of them did not happen.
     Example: Why did Bob NOT arrive at SIGCOMM?
     [Figure: map showing Philadelphia and Chicago]

  7. Result: Debugger for missing events
     [Figure: answer to the question "Why is the HTTP server NOT getting requests?". The vertex "No HTTP Request arrived at HTTP Server" is explained by "No Forwarding-FlowEntry installed at Switch", "HTTP Request received at Switch", and "Dropping-FlowEntry existed at Switch", and onward (…). The network diagram marks the Controller Program and a Dropping-FlowEntry in the Data Center Network.]

  8. Challenge: Too many possible explanations!
     Example: Why did Bob NOT arrive at SIGCOMM?
     - When an event happens, there is one reason.
     - When an event does not happen, there can be many reasons.

  9. WHY NOT? (talk outline)
     - Overview
       - Goal: Diagnose missing events
       - Approach: Counter-factual reasoning
       - Challenge: Too many explanations
       - Background: Provenance
     - Approach
       - Generating negative provenance
       - Improving readability
     - System
       - Y!
       - R-tree indexing
     - Evaluation
       - Experiments
       - Query speed
       - Size reduction
       - Usability

     Background of the slide: Figure 3: Graph construction algorithm. Some rules have been omitted; for instance, the handling of +τ and −τ messages is

     function QUERY(EXIST([t1,t2],N,τ))
       S ← ∅
       for each (+τ,N,t,r,c) ∈ Log: t1 ≤ t ≤ t2
         S ← S ∪ {APPEAR(t,N,τ,r,c)}
       for each (−τ,N,t,r,c) ∈ Log: t1 ≤ t ≤ t2
         S ← S ∪ {DISAPPEAR(t,N,τ,r,c)}
       RETURN S

     function QUERY(APPEAR(t,N,τ,r,c))
       if BaseTuple(τ) then
         RETURN {INSERT(t,N,τ)}
       else if LocalTuple(N,τ) then
         RETURN {DERIVE(t,N,τ,r)}
       else RETURN {RECEIVE(t,N←r.N,τ)}

     function QUERY(INSERT(t,N,τ))
       RETURN ∅

     function QUERY(DERIVE(t,N,τ,τ:-τ1,τ2...))
       S ← ∅
       for each τi:
         if (+τi,N,t,r,c) ∈ Log:
           S ← S ∪ {APPEAR(t,N,τi,c)}
         else
           tx ← max t' < t: (+τ,N,t',r,1) ∈ Log
           S ← S ∪ {EXIST([tx,t],N,τi,c)}
       RETURN S

     function QUERY(RECEIVE(t,N1←N2,+τ))
       ts ← max t' < t: (+τ,N2,t',r,1) ∈ Log
       RETURN {SEND(ts,N1→N2,+τ), DELAY(ts,N2→N1,+τ,t−ts)}

     function QUERY(SEND(t,N→N',+τ))
       FIND (+τ,N,t,r,c) ∈ Log
       RETURN {APPEAR(t,N,τ,r)}

     function QUERY(NEXIST([t1,t2],N,τ))
       if ∃ t < t1: (−τ,N,t,r,1) ∈ Log then
         tx ← max t < t1: (−τ,N,t,r,1) ∈ Log
         RETURN {DISAPPEAR(tx,N,τ), NAPPEAR((tx,t2],N,τ)}
       else RETURN {NAPPEAR([0,t2],N,τ)}

     function QUERY(NAPPEAR([t1,t2],N,τ))
       if BaseTuple(τ) then
         RETURN {NINSERT([t1,t2],N,τ)}
       else if LocalTuple(N,τ) then
         RETURN ⋃ over r ∈ Rules(N) with Head(r) = τ of {NDERIVE([t1,t2],N,τ,r)}
       else RETURN {NRECEIVE([t1,t2],N,+τ)}

     function QUERY(NDERIVE([t1,t2],N,τ,r))
       S ← ∅
       for (τi,Ii) ∈ PARTITION([t1,t2],N,τ,r)
         S ← S ∪ {NEXIST(Ii,N,τi)}
       RETURN S

     function QUERY(NRECEIVE([t1,t2],N,+τ))
       S ← ∅, t0 ← t1 − Δmax
       for each N' ∈ SENDERS(τ,N):
         X ← {t0 ≤ t ≤ t2 | (+τ,N',t,r,1) ∈ Log}
         tx ← t0
         for (i=0; i < |X|; i++)
           S ← S ∪ {NSEND((tx,Xi),N',+τ), NARRIVE((t1,t2),N'→N,Xi,+τ)}
           tx ← Xi
         S ← S ∪ {NSEND([tx,t2],N',+τ)}
       RETURN S

     function QUERY(NSEND([t1,t2],N,+τ))
       if ∃ t1 < t < t2: (−τ,N,t,r,1) ∈ Log then
         RETURN {EXIST([t1,t],N,τ), NAPPEAR((t,t2],N,τ)}
       else RETURN {NAPPEAR([t1,t2],N,τ)}

     function Q(NARRIVE([t1,t2],N1→N2,t0,+τ))
       FIND (+τ,N2,t3,(N1,t0),1) ∈ Log
       RETURN {SEND(t0,N1→N2,+τ), DELAY(t0,N1→N2,+τ,t3−t0)}
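
To make one rule from Figure 3 concrete, here is a minimal Python sketch of the NEXIST query. It is an illustration under assumptions, not the authors' implementation: the `LogEntry` record, the tuple-based vertex encoding, and the name `query_nexist` are hypothetical, and the rule name `r` and count field of the log entries are left out.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical log record; the rule and count fields from the slide are omitted."""
    sign: str        # "+" for an insertion, "-" for a deletion
    node: str        # node N on which the tuple lives
    tuple_name: str  # tuple tau
    time: int        # timestamp t


def query_nexist(log, t1, t2, node, tuple_name):
    """Explain why `tuple_name` did not exist on `node` during [t1, t2].

    Mirrors the NEXIST rule: if the tuple was deleted before t1, point at the
    latest deletion and ask why it never reappeared afterwards; otherwise ask
    why it never appeared at all up to t2.
    """
    deletions = [
        e.time
        for e in log
        if e.sign == "-" and e.node == node
        and e.tuple_name == tuple_name and e.time < t1
    ]
    if deletions:
        tx = max(deletions)  # latest deletion before the queried interval
        return [
            ("DISAPPEAR", tx, node, tuple_name),
            ("NAPPEAR", (tx, t2), node, tuple_name),  # stands for (tx, t2]
        ]
    return [("NAPPEAR", (0, t2), node, tuple_name)]    # stands for [0, t2]


# Example: a FlowEntry was installed at t=1 and removed at t=3.
example_log = [
    LogEntry("+", "Switch", "FlowEntry", 1),
    LogEntry("-", "Switch", "FlowEntry", 3),
]
print(query_nexist(example_log, 5, 8, "Switch", "FlowEntry"))
```

Each returned vertex would itself be queried in turn; the sketch stops after one step to keep the rule visible.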

  10. Background: Provenance
      - Captures causality between events
      - Example: SNP (SOSP '11)
      Network Datalog (NDlog) rule: PacketSent :- PacketReceived, FlowEntry.
      [Figure: provenance graph in which vertices are events and edges are causal relationships: "DNS Query arrived at HTTP Server" is traced back to "DNS Query received at Switch" and "Broken FlowEntry existed at Switch", and onward (…)]

  11. Background: How to generate provenance?
      - Step 1: Collect events from distributed system
      - Step 2: Issue query when relevant event occurs
      - Step 3: Provenance graph is generated
      Rules:
        PacketSent :- PacketReceived, FlowEntry.
        PacketSent :- PacketOut.
      [Figure: timeline of PacketReceived, FlowEntry, PacketOut, and PacketSent events with marks t4, t5, and now. The query (???) on "PacketSent during [t4,t5]" yields the vertices "PacketReceived during [t4,t5]" and "FlowEntry during [t4,t5]".]
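
The three steps on this slide can be sketched in a few lines of Python. This is a simplified illustration, not SNP or the authors' system: the `RULES` table, the interval-based log format, and the functions `exists_at`, `provenance`, and `render` are hypothetical, and all preconditions are checked at the same instant as the event they explain.

```python
from dataclasses import dataclass, field

# The two NDlog rules from the slide, as head -> list of alternative bodies.
RULES = {
    "PacketSent": [["PacketReceived", "FlowEntry"], ["PacketOut"]],
}


@dataclass
class Vertex:
    label: str
    children: list = field(default_factory=list)


def exists_at(log, name, t):
    """Step 1 output: `log` maps a tuple name to the [start, end] intervals it existed."""
    return any(start <= t <= end for start, end in log.get(name, []))


def provenance(log, name, t):
    """Steps 2 and 3: answer a query for `name` at time t with a backtrace.

    Assumes the queried event did occur. Tuples without a rule are treated
    as base tuples and become leaves of the graph.
    """
    vertex = Vertex(f"{name} existed at t={t}")
    for body in RULES.get(name, []):
        # Use the first rule whose preconditions all held at time t.
        if all(exists_at(log, p, t) for p in body):
            vertex.children = [provenance(log, p, t) for p in body]
            break
    return vertex


def render(vertex, depth=0):
    """Flatten the graph into indented lines for display."""
    lines = ["  " * depth + vertex.label]
    for child in vertex.children:
        lines.extend(render(child, depth + 1))
    return lines


# Both preconditions of the first rule hold during [4, 5] (t4..t5 on the slide).
example_log = {"PacketReceived": [(4, 5)], "FlowEntry": [(4, 5)]}
print("\n".join(render(provenance(example_log, "PacketSent", 4))))
```

The point of the sketch is that a positive query always has a concrete event to start from, which is exactly what a missing event lacks.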

  12. WHY NOT? (talk outline, repeated)
      - Overview
        - Goal: Diagnose missing events
        - Approach: Counter-factual reasoning
        - Challenge: Too many explanations
        - Background: Provenance
      - Approach
        - Generating negative provenance
        - Improving readability
      - System
        - Y!
        - R-tree indexing
      - Evaluation
        - Experiments
        - Query speed
        - Size reduction
        - Usability
      [Background of the slide: Figure 3, graph construction algorithm, repeated from slide 9]

  13. Generating negative provenance graphs
      - Goal: Explain why something does not exist
      - Use missing preconditions to explain missing events
      Rule: PacketSent :- PacketReceived, FlowEntry.
      [Figure: timeline of PacketSent, PacketReceived, and FlowEntry with marks t1 through t5 and now. The query vertex is "No PacketSent during [t1,now]"; its explanation is still open (???).]

  14. Generating negative provenance graphs
      - Explanation can be unnecessarily complex
      [Figure: "No PacketSent during [t1,now]" is explained by five vertices: "No PacketReceived during [t1,t2]", "No FlowEntry during [t2,t3]", "No PacketReceived during [t3,t4]", "No FlowEntry during [t4,t5]", and "No PacketReceived during [t5,now]". The timeline shows PacketSent, PacketReceived, and FlowEntry with marks t1 through t5 and now.]
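
The five-interval explanation on slide 14 can be reproduced with a short Python sketch. It is a simplified stand-in for the paper's PARTITION step, not a copy of it: the greedy choice of the precondition that stays missing longest, the half-open integer intervals, and the names `missing_until` and `explain_missing` are all assumptions made for illustration.

```python
def missing_until(intervals, t, end):
    """If a tuple is absent at time t, return when it next appears (capped at `end`).

    `intervals` holds non-overlapping half-open [start, stop) presence intervals.
    Returns None if the tuple is present at t.
    """
    for start, stop in sorted(intervals):
        if start <= t < stop:
            return None            # present at t, so it explains nothing here
        if start > t:
            return min(start, end)  # absent until this interval begins
    return end                      # never present again inside the window


def explain_missing(preconditions, log, start, end):
    """Cover [start, end) with sub-intervals, each blamed on one missing precondition."""
    explanation = []
    t = start
    while t < end:
        best = None
        for p in preconditions:
            until = missing_until(log.get(p, []), t, end)
            # Greedy assumption: prefer the precondition that stays missing longest.
            if until is not None and (best is None or until > best[1]):
                best = (p, until)
        if best is None:
            raise ValueError(f"all preconditions hold at t={t}; the event is not missing")
        explanation.append((best[0], (t, best[1])))
        t = best[1]
    return explanation


# Timeline from slide 14 with t1..t5 = 1..5 and now = 6.
presence = {
    "PacketReceived": [(2, 3), (4, 5)],
    "FlowEntry": [(1, 2), (3, 4), (5, 6)],
}
for precondition, interval in explain_missing(["PacketReceived", "FlowEntry"], presence, 1, 6):
    print(f"No {precondition} during {interval}")
```

With this input the sketch emits five vertices alternating between PacketReceived and FlowEntry, matching the slide: the two preconditions are never present at the same time, so each sub-interval needs its own explanation.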
