RPC Metrics at Google
JBD, Google (@rakyll)


SLIDE 1

RPC Metrics at Google

JBD, Google (@rakyll)

SLIDE 2

gRPC Metrics at Google

JBD, Google (@rakyll)

SLIDE 3

Request Metrics at Google

JBD, Google (@rakyll)

SLIDE 4

"100% is the wrong reliability target for basically everything."

-- Benjamin Treynor Sloss, VP of Engineering, Google

SLIDE 5

"A service is available if users cannot tell that there was an outage."

SLIDE 6

SLOs

A principled way of stating what level of downtime is acceptable:

  • Error rate
  • Latency expectations
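To make the two SLO dimensions concrete, here is a minimal Go sketch that checks a batch of observed requests against an error-rate and latency target. The `SLO` and `Request` types, the `Meets` method, and the thresholds are assumptions made for illustration; they are not taken from the talk.

```go
package main

import (
	"fmt"
	"time"
)

// SLO is a hypothetical type for this sketch.
type SLO struct {
	MaxErrorRate float64       // e.g. 0.01 allows 1% of requests to be bad
	MaxLatency   time.Duration // latency expectation for a single request
}

// Request is one observed request outcome.
type Request struct {
	Latency time.Duration
	Failed  bool
}

// Meets reports whether the observed requests stay within the SLO.
// A request counts as bad if it failed or exceeded the latency target.
func (s SLO) Meets(reqs []Request) bool {
	if len(reqs) == 0 {
		return true
	}
	bad := 0
	for _, r := range reqs {
		if r.Failed || r.Latency > s.MaxLatency {
			bad++
		}
	}
	return float64(bad)/float64(len(reqs)) <= s.MaxErrorRate
}

func main() {
	slo := SLO{MaxErrorRate: 0.01, MaxLatency: 100 * time.Millisecond}

	// 1000 fast, successful requests, then make two of them bad.
	reqs := make([]Request, 1000)
	for i := range reqs {
		reqs[i] = Request{Latency: 20 * time.Millisecond}
	}
	reqs[0].Failed = true
	reqs[1].Latency = 250 * time.Millisecond

	fmt.Println("SLO met:", slo.Meets(reqs))
}
```

Treating a slow request the same as a failed one is a design choice of this sketch; a real SLO may track the two separately.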

SLIDE 7

[Diagram of services: Analytics frontend server, Authentication, Reporting, Users, ..., Spanner, Blob Store]

SLIDE 8

Questions infra teams want to ask:

  • Are we meeting the SLO for the other team?
  • What’s the impact of a product on infra?
  • How much do we need to scale up if product grows 10%?

SLIDE 9

High-Cardinality

Breaking down the metrics data...

SLIDE 10

Query the collected data in various ways:

  • Latency distribution for RPCs originated at Google Analytics.
  • Requests that took more than 100ms for customer #123.
  • Compare the request latency initiated at web vs mobile frontend.
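The queries above all amount to filtering recorded measurements by their tags. The following Go sketch illustrates that idea; the `Measurement` type, the tag keys (`originator`, `customer`, `frontend`), and the helper names are assumptions for illustration and do not describe Google's internal tooling.

```go
package main

import (
	"fmt"
	"time"
)

// Measurement is one recorded RPC latency together with its tags.
type Measurement struct {
	Tags    map[string]string
	Latency time.Duration
}

// Filter returns the measurements whose tags include every key/value in want.
func Filter(ms []Measurement, want map[string]string) []Measurement {
	var out []Measurement
	for _, m := range ms {
		match := true
		for k, v := range want {
			if m.Tags[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, m)
		}
	}
	return out
}

// SlowerThan keeps only the measurements above the threshold.
func SlowerThan(ms []Measurement, threshold time.Duration) []Measurement {
	var out []Measurement
	for _, m := range ms {
		if m.Latency > threshold {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	ms := []Measurement{
		{Tags: map[string]string{"originator": "analytics", "customer": "123", "frontend": "web"}, Latency: 140 * time.Millisecond},
		{Tags: map[string]string{"originator": "analytics", "customer": "123", "frontend": "mobile"}, Latency: 60 * time.Millisecond},
		{Tags: map[string]string{"originator": "reporting", "customer": "456", "frontend": "web"}, Latency: 210 * time.Millisecond},
	}

	// Requests that took more than 100ms for customer 123.
	slow := SlowerThan(Filter(ms, map[string]string{"customer": "123"}), 100*time.Millisecond)
	fmt.Println("slow requests for customer 123:", len(slow))

	// Compare web vs mobile by selecting each frontend separately.
	fmt.Println("web samples:", len(Filter(ms, map[string]string{"frontend": "web"})))
	fmt.Println("mobile samples:", len(Filter(ms, map[string]string{"frontend": "mobile"})))
}
```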

SLIDE 11

[Diagram of services: Analytics frontend server, Authentication, Reporting, Users, ..., Spanner, Blob Store]

  • originator=analytics;

...
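One way to picture the `originator=analytics` tag travelling with a request is to carry it in a Go `context.Context`. This is a minimal in-process sketch; in a real system the tag would also cross process boundaries in request metadata, which is not shown here. All type and function names below are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// originatorKey is an unexported context key type, which avoids key collisions.
type originatorKey struct{}

// WithOriginator attaches the originator tag to the request context.
func WithOriginator(ctx context.Context, name string) context.Context {
	return context.WithValue(ctx, originatorKey{}, name)
}

// OriginatorFrom reads the tag back, or "unknown" if none was set.
func OriginatorFrom(ctx context.Context) string {
	if v, ok := ctx.Value(originatorKey{}).(string); ok {
		return v
	}
	return "unknown"
}

// BlobStore stands in for a backend that counts read errors per originator.
type BlobStore struct {
	ReadErrors map[string]int
}

// Read records an error against the originator found in ctx when fail is true.
func (b *BlobStore) Read(ctx context.Context, fail bool) {
	if fail {
		b.ReadErrors[OriginatorFrom(ctx)]++
	}
}

// Authenticate stands in for an intermediate service. It passes ctx through
// unchanged, so the backend still sees the original tag.
func Authenticate(ctx context.Context, store *BlobStore, fail bool) {
	store.Read(ctx, fail)
}

// Simulate runs a few requests and returns read errors keyed by originator.
func Simulate() map[string]int {
	store := &BlobStore{ReadErrors: map[string]int{}}
	analytics := WithOriginator(context.Background(), "analytics")
	reporting := WithOriginator(context.Background(), "reporting")

	Authenticate(analytics, store, true)
	Authenticate(analytics, store, true)
	Authenticate(reporting, store, true)
	Authenticate(reporting, store, false)
	store.Read(context.Background(), true) // untagged request

	return store.ReadErrors
}

func main() {
	for originator, n := range Simulate() {
		fmt.Printf("%s: %d read errors\n", originator, n)
	}
}
```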

SLIDE 12

Blob store read errors by originator

SLIDE 13

Dynamically choose aggregation

(split between recording and aggregation)
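The split between recording and aggregation can be sketched as follows: instrumented code only records raw values, and the aggregation (count, sum, distribution) is chosen separately afterwards. The `Recorder` type and the aggregation functions are assumptions for illustration, not an actual library API.

```go
package main

import (
	"fmt"
	"sort"
)

// Recorder only stores raw measurements; it knows nothing about aggregation.
type Recorder struct {
	values []float64
}

// Record appends one raw measurement.
func (r *Recorder) Record(v float64) {
	r.values = append(r.values, v)
}

// Values returns a copy of the raw measurements recorded so far.
func (r *Recorder) Values() []float64 {
	return append([]float64(nil), r.values...)
}

// Count aggregates to the number of measurements.
func Count(values []float64) int {
	return len(values)
}

// Sum aggregates to the total of all measurements.
func Sum(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

// Distribution aggregates into buckets. bounds must be sorted ascending.
// Bucket i counts values in (bounds[i-1], bounds[i]]; the final bucket
// counts values above the last bound.
func Distribution(values, bounds []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, v := range values {
		counts[sort.SearchFloat64s(bounds, v)]++
	}
	return counts
}

func main() {
	var latencyMs Recorder
	for _, v := range []float64{5, 10, 50, 150, 100} {
		latencyMs.Record(v)
	}

	// The same recorded data can be aggregated in different ways later.
	vals := latencyMs.Values()
	fmt.Println("count:", Count(vals))
	fmt.Println("sum:", Sum(vals))
	fmt.Println("distribution:", Distribution(vals, []float64{10, 100}))
}
```

Because the recorder keeps no opinion about aggregation, a new view can be added without touching the instrumented code.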

SLIDE 14

Exemplars
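An exemplar is a sample observation kept alongside an aggregated bucket, typically with a trace identifier, so that an outlier bucket can be linked back to a concrete request. The sketch below assumes a simple latency histogram that keeps the most recent observation per bucket; the types, bucket bounds, and trace IDs are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Exemplar is one concrete observation kept as an example for a bucket.
type Exemplar struct {
	Value   float64
	TraceID string
}

// Bucket counts observations up to and including UpperBound.
type Bucket struct {
	UpperBound float64
	Count      int
	Exemplar   *Exemplar // most recent observation that fell in this bucket
}

// Histogram is a latency distribution with one exemplar per bucket.
type Histogram struct {
	Buckets []Bucket
}

// NewHistogram builds buckets for the given ascending bounds plus a final
// catch-all bucket with an upper bound of +Inf.
func NewHistogram(bounds ...float64) *Histogram {
	h := &Histogram{}
	for _, b := range bounds {
		h.Buckets = append(h.Buckets, Bucket{UpperBound: b})
	}
	h.Buckets = append(h.Buckets, Bucket{UpperBound: math.Inf(1)})
	return h
}

// Observe adds v to the first bucket whose bound covers it and remembers
// the trace ID as that bucket's exemplar.
func (h *Histogram) Observe(v float64, traceID string) {
	for i := range h.Buckets {
		if v <= h.Buckets[i].UpperBound {
			h.Buckets[i].Count++
			h.Buckets[i].Exemplar = &Exemplar{Value: v, TraceID: traceID}
			return
		}
	}
}

func main() {
	h := NewHistogram(10, 100)
	h.Observe(4, "trace-a")
	h.Observe(42, "trace-b")
	h.Observe(250, "trace-c")

	for _, b := range h.Buckets {
		if b.Exemplar != nil {
			fmt.Printf("<= %v: count=%d exemplar=%s (%vms)\n",
				b.UpperBound, b.Count, b.Exemplar.TraceID, b.Exemplar.Value)
		}
	}
}
```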

SLIDE 15

/rpcz and /statz

SLIDE 16

http://server:7777/debug/rpcz

SLIDE 17

Export?

Monarch, Prometheus, and more.
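Supporting several backends usually means putting an interface between the collected metrics and the exporters. The Go sketch below assumes a hypothetical `Exporter` interface and an in-memory exporter standing in for a real backend; it does not show the actual Monarch or Prometheus client APIs.

```go
package main

import "fmt"

// Metric is one aggregated data point ready for export.
type Metric struct {
	Name  string
	Tags  map[string]string
	Value float64
}

// Exporter is a hypothetical interface. A real backend would implement it
// by translating Metric into its own wire format.
type Exporter interface {
	Export(m Metric) error
}

// MemoryExporter keeps metrics in memory, standing in for a real backend.
type MemoryExporter struct {
	Got []Metric
}

// Export stores the metric and never fails.
func (e *MemoryExporter) Export(m Metric) error {
	e.Got = append(e.Got, m)
	return nil
}

// ExportAll sends every metric to every exporter and stops at the first error.
func ExportAll(exporters []Exporter, metrics []Metric) error {
	for _, m := range metrics {
		for _, e := range exporters {
			if err := e.Export(m); err != nil {
				return fmt.Errorf("export %q: %w", m.Name, err)
			}
		}
	}
	return nil
}

func main() {
	backendA := &MemoryExporter{}
	backendB := &MemoryExporter{}

	metrics := []Metric{
		{Name: "rpc_latency_ms", Tags: map[string]string{"originator": "analytics"}, Value: 42},
	}
	if err := ExportAll([]Exporter{backendA, backendB}, metrics); err != nil {
		fmt.Println("export failed:", err)
		return
	}
	fmt.Println("exported to A:", len(backendA.Got), "to B:", len(backendB.Got))
}
```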

SLIDE 18

import "cloud.google.com/go/pubsub"

SLIDE 19

+

SLIDE 20

Thank you!

JBD, Google jbd@google.com @rakyll