Principled workflow-centric tracing of distributed systems



SLIDE 1

Principled workflow-centric tracing of distributed systems

Raja Sambasivan

Ilari Shafer, Jonathan Mace, Ben Sigelman, Rodrigo Fonseca, Greg Ganger

SLIDE 2

Today’s distributed systems

E.g., Twitter

Twitter “death star”: https://twitter.com/adrianco/status/441883572618948608

SLIDE 3

Today’s distributed systems

E.g., Twitter; e.g., Netflix

Machine-centric tools insufficient: GDB, gprof, strace, Linux perf. counters

Amazingly complex

Netflix “death star”: http://www.slideshare.net/adriancockcroft/fast-delivery-devops-israel

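SLIDE 4

Workflow-centric tracing

Provides the needed coherent view

[Figure: a Get request traced across a client and server, with components labeled App Server, Distributed FS, and Table store. Trace points (e.g., at functions) are marked along the request's path, metadata (e.g., IDs) travels with the request, and the latency labels read 27 ms, 25 ms, and 17 µs.]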
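A minimal sketch of the idea on this slide, not the implementation of any system named in the talk. It assumes a hypothetical `trace_point` helper and an in-memory `RECORDS` list; the component and trace-point names are also made up for illustration. Each component logs at its trace points, and a request ID carried as metadata ties the records from different components into one workflow.

```python
import itertools
import time
from collections import defaultdict

# Hypothetical in-memory trace store; a real infrastructure would persist records.
RECORDS = []
_next_id = itertools.count(1)


def new_request_id():
    """Allocate the metadata (here, a request ID) that travels with a request."""
    return next(_next_id)


def trace_point(request_id, component, name):
    """Record that `request_id` passed the trace point `name` in `component`."""
    RECORDS.append((request_id, component, name, time.perf_counter()))


def table_store_get(request_id):
    trace_point(request_id, "table_store", "GET_START")
    trace_point(request_id, "table_store", "GET_END")


def app_server_get(request_id):
    trace_point(request_id, "app_server", "GET_START")
    table_store_get(request_id)  # the request ID is propagated with the call
    trace_point(request_id, "app_server", "GET_END")


def workflows(records):
    """Group records by request ID to recover each request's workflow."""
    grouped = defaultdict(list)
    for request_id, component, name, _timestamp in records:
        grouped[request_id].append((component, name))
    return dict(grouped)


rid = new_request_id()
app_server_get(rid)
```

Calling `workflows(RECORDS)[rid]` returns the trace points in the order the request visited them, across both components, which is the coherent view that machine-centric tools cannot give.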

SLIDE 5

It is useful / being adopted

Category          Management task
Resource mgmt.    Attribution
                  Performance tuning
Diagnosis         ID anomalous workflows
                  ID workflows w/ steady-state problems
                  Profiling
Multiple          Dynamic monitoring

Stardust [SIGM’06], Stardust✚ [NSDI’11], X-Trace [NSDI’07], X-Trace✚ [WREN’10], Pip [NSDI’06], Pinpoint [NSDI’04], Mace [PLDI’07], PivotTrace [SOSP’15], Retro [NSDI’15]

Dapper [TR10-14], HTrace, Zipkin, UberTrace

But, no clarity for tracing developers

SLIDE 6

But, no clarity for tracing developers

[Figure labels. Reality: Stardust, Stardust✚, Spectroscope. Expectation: Stardust, Stardust, Spectroscope.]

SLIDE 7

We provide clarity for tracing developers

Methodology:

Use experiences to distill design axes

ID design choices best for different tasks

Compare to existing infrastructures

[Figure: a tracing infrastructure with design choices numbered 1 through 6, shown against Task A, Task B (marked "?"), Task C, and Task D.]

SLIDE 8

Key results

1. Different design decisions needed for diagnosis and resource management

2. Batching causes some design decisions across some axes to interact poorly

3. Existing tracing infrastructures suited to a task make similar choices to our suggestions

SLIDE 9

Anatomy & design axes

[Figure: a tracing infrastructure spanning an App Server, a Table store, and a File system, feeding management tasks. Labeled parts: In-band, Out-of-band, Trace construction, Trace storage.]

In-band / out-of-band?

How will trace points be added?

What to use to reduce overhead? Sample?

How to define a request?

Concurrency / synchronization needed?

Causal relationships?

Inter-request needed?
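One way to answer the "Sample?" question is to make a single keep-or-drop decision per request, so that every trace point of a kept request is recorded and the resulting trace stays complete. The sketch below is an illustration under stated assumptions, not the scheme of any system named in the talk: `SAMPLE_PERCENT`, `sampled`, and `trace_point` are hypothetical names, and the decision is derived from a CRC32 hash of the request ID so that every component reaches the same answer without extra coordination.

```python
import zlib

SAMPLE_PERCENT = 10  # hypothetical sampling rate, chosen only for illustration


def sampled(request_id, percent=SAMPLE_PERCENT):
    """Decide deterministically, per request, whether to trace it.

    Every component that hashes the same request ID reaches the same
    decision, so a request is traced at all of its trace points or at none.
    """
    return zlib.crc32(request_id.encode("utf-8")) % 100 < percent


def trace_point(records, request_id, name, percent=SAMPLE_PERCENT):
    """Append a record only if this request was selected for tracing."""
    if sampled(request_id, percent):
        records.append((request_id, name))


def kept_requests(request_ids, percent=SAMPLE_PERCENT):
    """Return the subset of request IDs that would be traced."""
    return [r for r in request_ids if sampled(r, percent)]
```

Lowering `percent` trades trace coverage for overhead; because the decision is per request rather than per trace point, the traces that are kept are never partial.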

SLIDE 10

How original Stardust defined requests

[Figure: a trace with trace points WRITE START, CACHE WRITE, WRITE REPLY, and INSERT BLOCK; latency labels of 10 µs, 20 ms, and 2 µs; a span marked "Unaccounted latency".]

Response time: ~20 ms

Trace not useful for diagnosis tasks

SLIDE 11

Two valid ways to define a request’s workflow

[Figure: two request traces. First trace: WRITE START, CACHE WRITE, WRITE REPLY, INSERT BLOCK, with latency labels 10 µs, 9 µs, 15 µs. Second trace: WRITE START, CACHE WRITE, WRITE REPLY, INSERT BLOCK, EVICT BLOCK, DISK START, DISK END, with latency labels 10 µs, ~20 ms, 2 µs, 9 µs, 20,000 µs, 5 µs. The latent work is marked.]

Resource management: Assign latent work to original submitter

SLIDE 12

Two valid ways to define a request’s workflow

[Figure: two request traces. First trace: WRITE START, CACHE WRITE, WRITE REPLY, INSERT BLOCK, with latency labels 10 µs, 9 µs, 15 µs. Second trace: WRITE, CACHE WRITE, WRITE REPLY, INSERT BLOCK, EVICT BLOCK, DISK START, DISK END, with latency labels 10 µs, 2 µs, 5 µs, 5 µs, 20,000 µs, 5 µs. The latent work is marked.]

Diagnosis: Assign latent work to request on whose critical path it is executed
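The two attribution rules on slides 11 and 12 can be sketched as a single function with a policy switch. This is an illustrative sketch under assumptions, not code from Stardust or any other system in the talk: the `LatentWork` record, the `attribute` function, and the request names "A" and "B" are hypothetical. Only the 20,000 µs figure is taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class LatentWork:
    submitter: str    # request that originally created the work (e.g., wrote the block)
    trigger: str      # request on whose critical path the work was executed
    duration_us: int  # time spent doing the latent work, in microseconds


def attribute(work_items, policy):
    """Sum latent-work time per request under the chosen policy.

    policy="resource_management": charge the original submitter.
    policy="diagnosis": charge the request whose critical path ran the work.
    """
    if policy not in ("resource_management", "diagnosis"):
        raise ValueError("unknown policy: %r" % (policy,))
    charged = {}
    for work in work_items:
        owner = work.submitter if policy == "resource_management" else work.trigger
        charged[owner] = charged.get(owner, 0) + work.duration_us
    return charged


# Hypothetical scenario: request B's write forces eviction of a block that
# request A wrote earlier, so B waits on disk work that A created.
evictions = [LatentWork(submitter="A", trigger="B", duration_us=20_000)]
```

Under `"resource_management"` the 20,000 µs is charged to A, which created the work; under `"diagnosis"` it is charged to B, whose response time it actually lengthened. Both answers are valid, which is why the choice has to follow from the management task.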

SLIDE 13

Future research directions

Exploring new analyses

Reducing difficulty of adding trace points

Lowering overhead when identifying anomalous workflows

SLIDE 14

Summary

Key design choices dictate workflow-centric tracing's utility for different tasks

We identify choices best suited for different tasks