It’s About How We Work - Randy Shoup @randyshoup - PowerPoint Presentation

slide-1
SLIDE 1

DevOps

It’s About How We Work

Randy Shoup @randyshoup linkedin.com/in/randyshoup

slide-2
SLIDE 2

Background

  • VP Engineering at Stitch Fix
  • Using technology and data science to revolutionize clothing retail
  • Consulting “CTO as a service”
  • Helping companies move fast at scale
  • Director of Engineering for Google App Engine
  • World’s largest Platform-as-a-Service
  • Chief Engineer at eBay
  • Evolving multiple generations of eBay’s infrastructure
slide-3
SLIDE 3

Time to Value

slide-4
SLIDE 4

Speed vs. Stability?

slide-5
SLIDE 5

High-Performing Organizations

  • Multiple deploys per day vs. one per month
  • Commit to deploy in less than 1 hour vs. one week
  • Recover from failure in less than 1 hour vs. one day
  • Change failure rate of 0-15% vs. 31-45%
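The four measures above can be computed from ordinary deployment records. The sketch below is a minimal illustration, assuming a hypothetical `Deployment` record that stores commit time, deploy time, whether the change failed, and when service was restored. Measuring time to restore from the failed deploy until restoration is one convention among several, and is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; real pipelines store this differently."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def delivery_metrics(deploys: List[Deployment], period_days: int) -> dict:
    """Deploy frequency, lead time, time to restore, change failure rate.

    Assumes at least one deploy in the period.
    """
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Assumption: restore time runs from the failed deploy until service is restored.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "median_time_to_restore": median(restore_times) if restore_times else None,
        "change_failure_rate": len(failures) / len(deploys),
    }


# Example: four deploys over two days, one of which failed and was restored.
t0 = datetime(2018, 5, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0, t0 + timedelta(minutes=50), failed=True,
               restored_at=t0 + timedelta(minutes=70)),
    Deployment(t0, t0 + timedelta(minutes=40)),
    Deployment(t0, t0 + timedelta(minutes=20)),
]
metrics = delivery_metrics(deploys, period_days=2)
print(metrics)
```

A team would then read `deploys_per_day`, the median lead time (under 1 hour vs. one week), the median time to restore (under 1 hour vs. one day), and `change_failure_rate` (0-15% vs. 31-45%) against the figures on the slide.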
slide-6
SLIDE 6

High-Performing Organizations

  • Multiple deploys per day vs. one per month
  • Commit to deploy in less than 1 hour vs. one week
  • Recover from failure in less than 1 hour vs. one day
  • Change failure rate of 0-15% vs. 31-45%

Speed Stability

slide-7
SLIDE 7

Speed AND Stability!

slide-8
SLIDE 8

High-Performing Organizations

→ 2.5x more likely to exceed goals

  • Profitability
  • Market share
  • Productivity
slide-9
SLIDE 9

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-10
SLIDE 10

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-11
SLIDE 11

Conway’s Law

  • Organization determines architecture
  • Design of a system will be a reflection of the communication paths within the organization

  • Modular system requires modular organization
  • Small, independent teams lead to more flexible, composable systems
  • Larger, interdependent teams lead to larger systems
  • We can engineer the system we want by engineering the organization

@randyshoup linkedin.com/in/randyshoup

slide-12
SLIDE 12

Small “Service” Teams

  • Full-Stack, “2 Pizza” Teams
  • No team should be larger than can be fed by 2 large pizzas
  • Typically 4-6 people
  • All disciplines required for the team to function
  • Aligned to Business Domains
  • Clear, well-defined area of responsibility
  • Single service or set of related services
  • Deep understanding of business problems
  • Growth through “cellular mitosis”

slide-13
SLIDE 13

Ideally, 80% of project work should be within a team boundary.
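One rough way to check this guideline is to tag each project work item with the teams whose code or services it touches, then count the share that stays within a single team. The sketch assumes such tags already exist; the item keys and team names are made up.

```python
from typing import Dict, Set


def within_team_fraction(work_items: Dict[str, Set[str]]) -> float:
    """Share of work items whose changes stay inside exactly one team."""
    if not work_items:
        return 0.0
    single_team = sum(1 for teams in work_items.values() if len(teams) == 1)
    return single_team / len(work_items)


# Hypothetical work items mapped to the teams they touch.
items = {
    "PROJ-1": {"search"},
    "PROJ-2": {"search"},
    "PROJ-3": {"checkout", "payments"},
    "PROJ-4": {"payments"},
    "PROJ-5": {"search"},
}
print(within_team_fraction(items))  # 0.8
```

A persistently low value suggests team boundaries that do not match how work actually flows, which is the Conway's Law point above.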

slide-14
SLIDE 14

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-15
SLIDE 15

What problem are you trying to solve?

slide-16
SLIDE 16

“Building the wrong thing is the biggest waste in software development.”

- Mary and Tom Poppendieck, Lean Software Development

slide-17
SLIDE 17

What Problem Are You Trying to Solve?

  • Focus on what is important for your business
  • Problem might be solved without any technology at all

  • Redefine the problem
  • Change the business process
  • Continue manually before automating in an application

slide-18
SLIDE 18

“A problem well-stated is a problem half-solved.”

- Charles Kettering, former head of research for General Motors

slide-19
SLIDE 19

Buy, Not Build

  • Use Cloud Infrastructure
  • Faster, cheaper, better than we can do ourselves
  • Stitch Fix has no owned physical infrastructure anywhere in the world
  • Prefer Open Source
  • Kubernetes, Docker, Istio
  • MySQL, Postgres, Redis, Elasticsearch
  • Machine learning models
  • Etc.
  • Usually better than the commercial alternatives (!)

slide-20
SLIDE 20

Buy, Not Build

  • Third-Party Services
  • Stitch Fix uses more than 50 third-party services
  • Logging, monitoring, alerting
  • Project management, bug tracking
  • Payments, billing, fraud detection
  • Etc.
  • Focus on your core competency
  • Consider third-party services for everything else (!)

slide-21
SLIDE 21

Soon it will be just as common to run your own data center as it is to run your own electrical power generation.

slide-22
SLIDE 22

Experimental Discipline

  • State your hypothesis
  • What metrics do you expect to move and why
  • Understand your baseline
  • Run a real A/B test
  • Sample size
  • Isolated treatment and control groups
  • No peeking or quitting early!
  • Obsessively log and measure
  • Understand customer and system behavior
  • Understand why this experiment worked or did not
  • Develop insights for next experiment
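Two of these steps, fixing the sample size before the experiment starts and testing only once it is reached, can be sketched with the standard normal-approximation formulas for comparing two conversion rates. This is a minimal illustration rather than a full experimentation framework; the 10% baseline, 11% target, and conversion counts are made-up inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each group to detect the difference with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Fix the sample size up front ("no peeking or quitting early").
n = sample_size_per_group(0.10, 0.11)
print(n)  # about 14,750 users per group

# Evaluate once, after both groups have reached n users (counts are made up).
z, p = two_proportion_z_test(conv_a=1_480, n_a=n, conv_b=1_650, n_b=n)
print(round(z, 2), p < 0.05)
```

Computing the p-value repeatedly while data arrives and stopping at the first "significant" result inflates the false-positive rate, which is why the check happens only once here.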
slide-23
SLIDE 23

eBay Machine-Learned Ranking

  • Ranking function for search results
  • Which item should appear 1st, 10th, 100th, 1000th
  • Before: Small number of hand-tuned factors
  • Goal: Thousands of factors
  • Incremental Experimentation
  • Predictive models: query->view, view->purchase, etc.
  • Hundreds of parallel A/B tests
  • Full year of steady, incremental improvements

→ 2% increase in eBay revenue (~$120M / year)
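The idea of chaining predictive models (query → view, view → purchase) into a single ranking score can be sketched as follows. This is a toy illustration, not eBay's system: the two probabilities per item are assumed to come from hypothetical, already-trained models, and the real ranking function combined thousands of factors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    item_id: str
    p_view: float      # hypothetical model output: P(view | query, item)
    p_purchase: float  # hypothetical model output: P(purchase | view, item)

    @property
    def expected_purchase(self) -> float:
        # Assuming a purchase requires a view: P(purchase | query) = P(view) * P(purchase | view)
        return self.p_view * self.p_purchase


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order search results by chained purchase probability, highest first."""
    return sorted(candidates, key=lambda c: c.expected_purchase, reverse=True)


results = rank([
    Candidate("A", p_view=0.30, p_purchase=0.05),
    Candidate("B", p_view=0.10, p_purchase=0.20),
    Candidate("C", p_view=0.50, p_purchase=0.01),
])
print([c.item_id for c in results])  # ['B', 'A', 'C']
```

Note that item C, the most likely to be viewed, ranks last once purchase likelihood is factored in. In the incremental workflow described above, each change to such a scoring function would be evaluated in its own A/B test.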

slide-24
SLIDE 24

eBay Site Speed

  • Reduce user-experienced latency for search results
  • Iterative Process
  • Implement a potential improvement
  • Release to the site in an A/B test
  • Monitor metrics: time to first byte, time to click, click rate, purchase rate

→ 2% increase in eBay revenue (~$120M / year)
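Latency between control and treatment groups is commonly compared at percentiles rather than by averages, since an average can hide slow tail requests. A minimal sketch using Python's standard `statistics.quantiles`; the latency samples are synthetic.

```python
from statistics import quantiles
from typing import Dict, Sequence


def latency_percentiles(samples_ms: Sequence[float]) -> Dict[str, float]:
    """p50 and p95 of latency samples (e.g. time to first byte, in ms)."""
    # n=100 yields 99 cut points: index 49 is the 50th percentile, 94 the 95th.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


def compare(control_ms: Sequence[float], treatment_ms: Sequence[float]) -> Dict[str, float]:
    """Milliseconds saved by the treatment at each percentile (positive = faster)."""
    c, t = latency_percentiles(control_ms), latency_percentiles(treatment_ms)
    return {k: c[k] - t[k] for k in c}


# Synthetic samples: the treatment shaves 10 ms off every request.
control = [float(ms) for ms in range(100, 200)]
treatment = [ms - 10 for ms in control]
print(compare(control, treatment))
```

In the iterative process above, a latency gain like this would be read alongside click and purchase rates from the same A/B test before keeping the change.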

slide-25
SLIDE 25

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-26
SLIDE 26

Prioritization

  • We always have more to do than resources to do it
  • Scarce resources require prioritization
  • Opportunity cost: deciding to do X means deciding not to do Y
  • Every decision is a tradeoff
  • Priority ← Return on Investment
  • Business Value / Effort
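Ranking work by business value divided by effort is simple arithmetic. The sketch below uses made-up initiatives; the units (estimated dollars per year, engineer-weeks) are assumptions, and in practice both numbers are rough estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Initiative:
    name: str
    business_value: float  # assumed unit: estimated $ per year
    effort: float          # assumed unit: engineer-weeks

    @property
    def roi(self) -> float:
        return self.business_value / self.effort


def prioritize(backlog: List[Initiative]) -> List[Initiative]:
    """Highest return on investment first."""
    return sorted(backlog, key=lambda i: i.roi, reverse=True)


backlog = [
    Initiative("new checkout flow", business_value=100_000, effort=10),
    Initiative("faster search page", business_value=40_000, effort=2),
    Initiative("platform rewrite", business_value=300_000, effort=50),
]
for item in prioritize(backlog):
    print(f"{item.name}: {item.roi:,.0f} per engineer-week")
```

The opportunity cost shows up directly: the rewrite has the largest absolute value but the lowest return per unit of effort, so it lands last.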

slide-27
SLIDE 27

Fewer Things, More Done

slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30

Fewer Things, More Done

  • Deliver Full Value Earlier
  • Time Value of Money
  • Benefit now is worth more than benefit in the future
  • Incremental Delivery
  • Deliver increments along the way instead of everything at the end
  • Tasks often take less time
  • Multiple engineers can unblock one another
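The time-value argument can be made concrete with a small model. Assume, purely for illustration, three projects that each take one month of the whole team's effort and each earn one unit of value per month once finished; compare finishing them one at a time with splitting the team evenly across all three. The model ignores context-switching costs, which would make the parallel case worse.

```python
from typing import List


def finish_months_sequential(n_projects: int, months_each: float) -> List[float]:
    """Whole team works on one project at a time."""
    return [months_each * (k + 1) for k in range(n_projects)]


def finish_months_parallel(n_projects: int, months_each: float) -> List[float]:
    """Team effort split evenly, so every project finishes at the same time."""
    return [months_each * n_projects] * n_projects


def value_by(horizon_months: float, finish_months: List[float],
             value_per_month: float = 1.0) -> float:
    """Total value earned by the horizon; a project earns only after it ships."""
    return sum(max(0.0, horizon_months - f) * value_per_month for f in finish_months)


seq = value_by(6, finish_months_sequential(3, 1))
par = value_by(6, finish_months_parallel(3, 1))
print(seq, par)  # 12.0 9.0
```

Both schedules finish all three projects by month 3, but the sequential one starts earning after month 1, so it has delivered a third more value by month 6.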

slide-31
SLIDE 31

“When you solve problem one, problem two gets a promotion.”

slide-32
SLIDE 32

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-33
SLIDE 33

Quality Discipline

  • Quality and Reliability are “Priority-0 features”
  • Equally important to users as product features and engaging user experience

  • Developers responsible for
  • Features
  • Quality
  • Performance
  • Reliability
  • Manageability
slide-34
SLIDE 34

Test-Driven Development

  • Tests make better code
  • Confidence to break things
  • Courage to refactor mercilessly
  • Tests make better systems
  • Catch bugs earlier, fail faster
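A minimal test-first example using Python's built-in `unittest`. The function `apply_discount` is hypothetical; in a test-driven flow the three tests would be written first, fail, and then drive the implementation below them.

```python
import unittest


class ApplyDiscountTest(unittest.TestCase):
    # Written first: these tests describe the behavior before it exists.
    def test_ten_percent_off_rounds_down_to_the_cent(self):
        self.assertEqual(apply_discount(1999, 10), 1799)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(500, 0), 500)

    def test_rejects_percent_outside_0_to_100(self):
        with self.assertRaises(ValueError):
            apply_discount(500, 150)


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: discounted price in whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100
```

Saved as a module, the tests run with `python -m unittest`. Once they pass, the implementation can be refactored freely, because any behavior change shows up as a failing test.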

slide-35
SLIDE 35

Optimizing Developer Effort

  • 75% reading existing code
  • 20% modifying existing code
  • 5% writing new code

https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/

slide-36
SLIDE 36

slide-37
SLIDE 37

“Do you have time to do it twice?” “We don’t have time to do it right!”

slide-38
SLIDE 38

The more constrained you are on time or resources, the more important it is to build it right the first time.

slide-39
SLIDE 39

Build It Right (Enough) The First Time

  • Build one great thing instead of two half-finished things
  • Right ≠ Perfect (80 / 20 Rule)
  • → Basically no bug tracking system (!)
  • Bugs are fixed as they come up
  • Backlog contains features we want to build
  • Backlog contains technical debt we want to repay

slide-40
SLIDE 40

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-41
SLIDE 41

You Build It, You Run It.

- Werner Vogels
slide-42
SLIDE 42

End-to-End Ownership

  • All disciplines required for the team’s function
  • Design
  • Development
  • Quality and Performance
  • Maintenance
  • Operations
  • Teams take long-term ownership
  • Team owns service from design to deployment to retirement
  • No separate maintenance or sustaining engineering team
slide-43
SLIDE 43

Continuous Delivery

  • Repeatable Deployment Pipeline
  • Low-risk, push-button deployment
  • Rapid release cadence
  • Rapid rollback and recovery
  • Most applications deployed multiple times per day
  • More solid systems
  • Release smaller units of work
  • Smaller changes to roll back or roll forward
  • Faster to repair, easier to understand, simpler to diagnose
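The rapid-rollback idea can be sketched as a deploy step that releases a candidate version, checks health, and automatically redeploys the last known-good version if the check fails. The `release` and `healthy` callables stand in for whatever a real pipeline uses; they are assumptions, not a specific tool's API.

```python
from typing import Callable


def deploy_with_rollback(current: str, candidate: str,
                         release: Callable[[str], None],
                         healthy: Callable[[], bool]) -> str:
    """Release `candidate`; on a failed health check, roll back to `current`.

    Returns the version left running.
    """
    release(candidate)
    if healthy():
        return candidate
    # Rapid rollback: redeploy the last known-good version.
    release(current)
    return current


# Example with a fake environment that records each release.
released = []
running = deploy_with_rollback("v41", "v42", released.append, healthy=lambda: False)
print(running, released)  # v41 ['v42', 'v41']
```

Because each release is a small unit of work, reverting it is cheap and the failing change is easy to isolate.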

slide-44
SLIDE 44

Blameless Post-Mortems

  • Post-mortem After Every Incident
  • Document exactly what happened
  • What went right
  • What went wrong
  • Open and Honest Discussion
  • What contributed to the incident?
  • What could we have done better?

→ Engineers compete to take personal responsibility (!)

slide-45
SLIDE 45

“Finally we can prioritize fixing that broken system!”

slide-46
SLIDE 46

Blameless Post-Mortems

  • Action Items
  • How will we change process, technology, documentation, etc.
  • How could we have automated the problems away?
  • How could we have diagnosed more quickly?
  • How could we have restored service more quickly?
  • Follow up (!)
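Following up is easier when each action item is recorded with an owner and a due date. A minimal sketch of a post-mortem record, assuming a hypothetical in-house structure rather than any particular incident tool; the owner field assigns follow-up work, not blame.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str  # who drives the follow-up; assigns work, not blame
    due: date
    done: bool = False


@dataclass
class PostMortem:
    incident: str
    what_happened: str
    went_right: List[str] = field(default_factory=list)
    went_wrong: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)

    def overdue(self, today: date) -> List[ActionItem]:
        """Open action items past their due date: the ones that need follow-up."""
        return [a for a in self.action_items if not a.done and a.due < today]


pm = PostMortem(
    incident="Search latency spike",  # hypothetical incident
    what_happened="A config change disabled the results cache.",
    went_right=["Alert fired within minutes"],
    went_wrong=["Rollback steps were undocumented"],
    action_items=[
        ActionItem("Add config validation to the pipeline", "search-team", date(2018, 5, 1)),
        ActionItem("Document rollback runbook", "search-team", date(2018, 6, 1)),
        ActionItem("Alert on cache hit rate", "search-team", date(2018, 4, 1), done=True),
    ],
)
print([a.description for a in pm.overdue(date(2018, 5, 15))])
```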

slide-47
SLIDE 47

Failure is not falling down, but refusing to get back up.

- Theodore Roosevelt
slide-48
SLIDE 48

DevOps How We Work

  • Organizing for DevOps
  • What to Build / What NOT to Build
  • When to Build
  • How to Build
  • Delivering and Operating
slide-49
SLIDE 49

High-Performing Organizations

→ 2.5x more likely to exceed business goals

  • Profitability
  • Market share
  • Productivity

https://puppet.com/resources/whitepaper/state-of-devops-report

slide-50
SLIDE 50

Time to Value

slide-51
SLIDE 51

Thank you very much!

  • @randyshoup
  • linkedin.com/in/randyshoup
slide-52
SLIDE 52

DevOps Resources

  • Books
  • The Phoenix Project, 2013
  • The DevOps Handbook, 2016
  • Making Work Visible, 2017
  • Leading the Transformation, 2015

  • Continuous Delivery, 2010
  • Lean Software Development, 2003

  • Inspirations
  • The Goal, 1984
  • Toyota Production System, 1978
  • Toyota Kata, 2009

  • Conferences
  • DevOpsDays (everywhere; Zürich, May 2-3, 2018)

  • DevOps Enterprise Summit (London, June 25-26, 2018)

  • Podcasts
  • DevOps Café
  • Arrested DevOps