Culture and the Games People Play, by Roy Rapoport (rsr@netflix.com)

slide-1
SLIDE 1

Culture and the Games People Play

Roy Rapoport
rsr@netflix.com
@royrapoport
November 18, 2015

slide-2
SLIDE 2

SHALL WE PLAY A GAME?

slide-3
SLIDE 3

What We Want

(And How We Get It)

Outcomes
Actions
Decisions
What environment says
What environment does

slide-4
SLIDE 4

What We Want

(And How We Get It)

Outcomes
Actions
Decisions
What environment says
What environment does

slide-5
SLIDE 5

What We Want

(And How We Get It)

Decisions
What environment says
What environment does

slide-6
SLIDE 6

What We Want

(And How We Don’t Get It)

slide-7
SLIDE 7

What We Want

(And How We Don’t Get It)

slide-8
SLIDE 8

Test #1 Attendance Award

slide-9
SLIDE 9

A Word About Netflix …

  • Clear Priorities
      1. Innovation
      2. Availability
      3. Cost
  • Hire smart, experienced people
  • Get out of the way
  • Anti-process bias

Culture

slide-10
SLIDE 10

In Practice …

slide-11
SLIDE 11

  • Dozens of SSL Certificates
      • Decentralized
      • Kept Expiring
      • Hilarity would ensue
  • Amazon Resources
      • “No Preset Limit”
      • You know when you hit it
      • Hilarity would ensue

The Before Time

slide-12
SLIDE 12

  • Well-developed Developer Ecosystem
      • Service Discovery
      • DB Client
      • Credentials Management
      • Memory Object Cache
      • Server Infrastructure
      • Telemetry
  • You wanted that for Java, right?

The Before Time

slide-13
SLIDE 13

  • Just moved from IT/Ops
  • Formally tasked with SSL cert issue as quarterly goal
  • Limits issue “tacked” on
  • “Effective” in Python
  • Didn’t know Java

Presenter Selfie

The Before Time

slide-14
SLIDE 14

  • Ported necessary libraries to Python
  • Boss was dubious. Really dubious.
  • Ran into security problem
  • Introducing Jay

No Problem!

slide-15
SLIDE 15

Democratized Innovation

What would you say you do around here?

Story Time: Shark Tank

slide-16
SLIDE 16

  • Conceived by Reliability Engineer
  • Remote Telemetry Network
  • Teams involved:
      • Reliability Engineering
      • Insight Engineering
      • Performance Engineering
      • Some others …

Surprise!

“Proof-of-concept work on Ansible configuration management for Gulo and Hammerhead.”

slide-17
SLIDE 17

  • Avoid Zero-Sum Games
      • Stack ranking
      • Fixed bonus / raise pools
  • No ranking/quantifying
  • Reviews != raises
  • Decentralize collaboration
  • Align goals

I want:

Collaboration and Selflessness

slide-18
SLIDE 18

Act In Netflix’s Best Interests

slide-19
SLIDE 19

Test #2 Early Birds, Late Worms

slide-20
SLIDE 20

I want:

Decentralized Innovation
Autonomy and Independence

Bets and Risk Tolerance: a Story of Failures

slide-21
SLIDE 21

Losing Bets

18 month report card (estimated)

Security Monkey              Success
Howler Monkey                Success
Exploit Monkey               Failure
Python                       Success
Service SLA Dashboard        Failure
Alert Outsourcing            Success
Alert Response Analytics     Failure
Alert Gateway                Success
Alerting GUI                 Success
Latency Monkey               Adoption Fizzle
Stateful Alerting            Failure
Open Application Alerting    Failure

50% Failure Rate
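
The 50% figure can be checked against the report card with a quick tally. The sketch below is an illustration, not something from the deck: it assumes the one "Adoption Fizzle" counts as a non-success alongside the outright failures, which is the reading that gives exactly 6 of 12. The names `REPORT_CARD` and `failure_rate` are hypothetical.

```python
from collections import Counter

# Outcomes transcribed from the 18-month report card slide.
REPORT_CARD = {
    "Security Monkey": "Success",
    "Howler Monkey": "Success",
    "Exploit Monkey": "Failure",
    "Python": "Success",
    "Service SLA Dashboard": "Failure",
    "Alert Outsourcing": "Success",
    "Alert Response Analytics": "Failure",
    "Alert Gateway": "Success",
    "Alerting GUI": "Success",
    "Latency Monkey": "Adoption Fizzle",
    "Stateful Alerting": "Failure",
    "Open Application Alerting": "Failure",
}


def failure_rate(card):
    """Share of bets that did not end in 'Success'.

    Assumption: anything other than 'Success' (including
    'Adoption Fizzle') counts as a losing bet.
    """
    counts = Counter(card.values())
    return (len(card) - counts["Success"]) / len(card)


if __name__ == "__main__":
    print(Counter(REPORT_CARD.values()))
    print(f"{failure_rate(REPORT_CARD):.0%}")
```

Counting only the five outright failures would give 5 of 12 instead, which fits the slide's own "(estimated)" label.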

slide-22
SLIDE 22

I want:

Decentralized Innovation
Autonomy and Independence

An Engineering Manager Walks Into an Override Bar …

slide-23
SLIDE 23

The Override Bar

  • Asgard: Full-fledged cloud orchestration
  • GUI-driven
  • Region-and-account specific

slide-24
SLIDE 24

The Override Bar

  • Four regions
  • Eight accounts
  • Hundreds of clusters

slide-25
SLIDE 25

The Override Bar

  • A Bold Proposal
      • Totally duplicates functionality
      • Customized fit
  • Failed the override bar:
      • Am I sure this is the wrong thing?
      • If I’m right, will this be very expensive for us?

slide-26
SLIDE 26

The Override Bar

  • Accomplished predicted results
      • Massively simplified operational processes
      • Improved resiliency and velocity
  • Unpredictable results
      • Used by other teams
      • Inspiration
  • Will retire

slide-27
SLIDE 27

I want:

Decentralized Innovation
Autonomy and Independence

Spheres of Autonomy: Staying DRI

slide-28
SLIDE 28

Yury’s SoA
Yury’s SoA
Yury’s SoA
Josh’s SoA
Roy’s Sphere of autonomy

Concentric Spheres of Autonomy

Fang’s Sphere of autonomy

slide-29
SLIDE 29

Reed’s Sphere of Autonomy
Neil’s Sphere of Autonomy
Yury’s Sphere of Autonomy
Josh’s Sphere of Autonomy
Roy’s Sphere of autonomy

Spheres of Autonomy: A New Model

Fang’s sphere of autonomy

slide-30
SLIDE 30

Spheres of Autonomy: A New Model

Set context. Not control.

slide-31
SLIDE 31

Spheres of Autonomy: A New Model

Keeping Peers DRI

slide-32
SLIDE 32

Test #3 Lucy and the Ball

slide-33
SLIDE 33

Literally* no downsides!

* For very non-literal definitions of the word “literally”

  • Predictability tradeoffs
  • Locality optimization
  • Duplication
  • Duplication

slide-34
SLIDE 34

Agility vs Predictability

  • Neither is bad
  • Probably need some of both
  • Do you know how much you want?
  • Do you have it?

Agility Predictability

slide-35
SLIDE 35

Agility vs Predictability

  • Optimize for agility
  • Constrain predictability
  • Some things are important to predict
      • Public KPIs
      • Big product plans
  • Fewer are important than you may think

Agility Predictability

slide-36
SLIDE 36

  • If a Thing can be built anywhere
      • Not always in the best place
      • Extra work

Locality Optimization

Or lack thereof

slide-37
SLIDE 37

Locality Optimization

Or lack thereof

Story Time: Scryer

slide-38
SLIDE 38

Scryer: Start State

  • Real-Time Telemetry System (2 weeks of data)

slide-39
SLIDE 39

Scryer: Goal

  • Real-Time Telemetry System (2 weeks of data)
  • Predictor
  • Signal Predictions Today
  • Product Value-add Process

slide-40
SLIDE 40

Scryer Architecture, v1

  • Real-Time Telemetry System (2 weeks of data)
  • Telemetry Extractor
  • Telemetry Persistence (4 weeks of data)
  • Predictor
  • Signal Predictions Today
  • Product Value-add Process
  • Waste of Time
  • Pain in the [REDACTED]

slide-41
SLIDE 41

The Thing Is …

  • Real-Time Telemetry System (2 weeks of data)
  • Cloud Storage (All telemetry, forever)
  • ETL

slide-42
SLIDE 42

Scryer Architecture, v2

  • Real-Time Telemetry System (2 weeks of data)
  • Predicted Signal Today
  • Predictor
  • Product Value-add Process
  • Cloud Storage (All telemetry, forever)
  • ETL

slide-43
SLIDE 43

Test #4

Making Friends $100 At a Time

slide-44
SLIDE 44
slide-45
SLIDE 45

“I only want to ride the wind and walk the waves, slay the big whales of the Eastern sea, clean up frontiers, and save the people from drowning. Why should I imitate others, bow my head, stoop over and be a slave?” - Lady Triệu

slide-46
SLIDE 46

rsr@netflix.com
@royrapoport

Attributions:

https://www.flickr.com/photos/cseeman/
http://www.flickr.com/photos/watchsmart
http://www.flickr.com/photos/yaketyyakyak/
https://www.flickr.com/photos/gfreeman23/
https://www.flickr.com/photos/dotcode
https://www.flickr.com/photos/tlindfors
And the Rands Leadership Slack