Getting a System to Production ... and keeping it there - Eoin Woods


slide-1
SLIDE 1

Getting a System to Production

... and keeping it there


Eoin Woods
 Endava

SATURN 2016

slide-2
SLIDE 2

Who Am I?

Eoin Woods - CTO at Endava

2005 - 2014 in capital markets (UBS, BGI)
2000 - 2004 in product engineering & consultancy (Bull, Sybase, InterTrust, independent)

Author, editor, speaker, community-guy


slide-3
SLIDE 3

Who are Endava?

Software Engineering & IT Services Firm

2800+ people
UK, US, Germany, Romania, Moldova, Serbia, Macedonia

Agile and Digital Transformation

Consulting, Architecture, Development, Testing
Data and Analytics
Application Management, Infrastructure, DevOps

slide-4
SLIDE 4

Content

Introducing Production Systems
What Goes Wrong in Production?
Solutions for Production Systems
Conclusions

slide-5
SLIDE 5

Production Systems


slide-6
SLIDE 6

What is a production system?

Any system being used for real work

slide-7
SLIDE 7

Why is Productionisation Hard?

No one teaches you about production

who do you talk to?
what do they want?
what is the definition of “done”?

Production is difficult for developers

hard to access, interrogate, debug, change, ...


slide-8
SLIDE 8

A new cast of characters

Development: Developers, Users

slide-9
SLIDE 9

A new cast of characters

Production: Users, Developers, Auditors, Operations, Acquirers, Infrastructure, Business Management

slide-10
SLIDE 10

Production is constrained

Highly controlled
Content is all valuable
Change can be difficult

slide-11
SLIDE 11

Production is unpredictable


slide-12
SLIDE 12

Production is highly visible!


slide-13
SLIDE 13

You don’t own production


slide-14
SLIDE 14

What goes wrong?


slide-15
SLIDE 15

Performance surprises

Interactive load
Batch time surprises
System abusers!

“all transactions this year”, “average since 1967”, ...

slide-16
SLIDE 16

Environment bombshells

Constraints and contention
Unexpected behaviour
Integration points

slide-17
SLIDE 17

Failures happen

Software defects
Platform failures
Environment failures

slide-18
SLIDE 18

Security tangles

Security is simple in Development
Much more complex in Production!

slide-19
SLIDE 19

Finding Solutions


slide-20
SLIDE 20

Architects Know This - Right?

scalability, deployability, monitorability, operability, availability, interoperability, performance, security, testability, capacity, reliability

TOO HARD

slide-21
SLIDE 21

Architectural Heresy

Architects obsess about system qualities

usually results in good production characteristics

However, teams just find this all a bit hard

too many qualities, need to get functions delivered

… and we must empower teams

architects can’t be responsible for all of the software being “production ready”


slide-22
SLIDE 22

Key requirements for production

Functionally correct

does what the business process requires

Stability

behaves predictably in all situations

Capacity

can process the workload required (at all times)

Security

limits access to those who are authorised to have it


slide-23
SLIDE 23

Solution Framework

Requirements: Correctness, Stability, Capacity, Security
Solution areas: Design Principles, Technology, Practices
Example entries: Simplicity, Resource Governor, Threat Modelling

Our focus today

slide-28
SLIDE 28

General Principles

One Team
Automate
Measure and Improve (feedback loops)
Good Enough over Perfection

Timeless principles … that led to CD and DevOps

slide-29
SLIDE 29

So How About DevOps?

DevOps helps get code to production

not much about whether it is ready for production

Developers still need to “productionise”

make sure the software meets the requirements for production operation

Relatively few developers get much training to prepare them for this


slide-30
SLIDE 30

DevOps Principles

Communication
Automation
Lean thinking
Measurement
Sharing

CALMS - itrevolution.com/devops-culture-part-1

slide-31
SLIDE 31

Solutions: Achieving Stability


slide-32
SLIDE 32

Stability - design principles

Fail quickly

fail fast, timeouts

Isolate problems

flow control, circuit breakers, bulkheads, asynchronous integration

Ensure steady state operation

housekeeping, predictable resource allocation, governors, throttling
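
The "fail quickly" and "isolate problems" principles above can be sketched together as a bulkhead. This is a minimal illustration under stated assumptions, not code from the talk: the names Bulkhead and BulkheadFull are hypothetical, and a semaphore is only one of several ways to cap concurrency per dependency.

```python
import threading


class BulkheadFull(Exception):
    """Raised immediately when the compartment has no free slots."""


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared threads."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Fail fast: reject the call instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("no free slots in this compartment")
        try:
            return func(*args, **kwargs)
        finally:
            # Always give the slot back, even if func raised.
            self._slots.release()
```

A caller would typically hold one Bulkhead per downstream system and treat BulkheadFull as a signal to degrade or retry later.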


slide-33
SLIDE 33

Stability - technology solutions

Timeouts
Circuit Breaker
Fail fast
Bulkhead
Governor
Housekeeping
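
One way to read "Governor" from the list above is a limiter that keeps work at a steady rate. The sketch below uses a token bucket, which is an assumption about the technique rather than something the slides specify; the class name Governor, its parameters, and the injectable clock are all hypothetical.

```python
import time


class Governor:
    """Token bucket: about `rate` operations per second, bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock            # injectable so the behaviour can be tested
        self._tokens = float(burst)    # start with a full bucket
        self._last = clock()

    def try_acquire(self):
        """Return True if one operation may proceed now, False if it should be refused."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

Refusing immediately, rather than blocking, keeps the governor consistent with the fail-fast principle.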

slide-40
SLIDE 40

Example - Circuit Breaker

States: Normal, Checking, Tripped

Transition labels: err_returned; timeout; err_returned && err_count > 10; err_returned
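
The state diagram can be sketched in code as follows. The mapping of labels to transitions is an assumption, since the diagram's arrows are not recoverable from the text: Normal counts errors and trips when err_count exceeds 10, Tripped moves to Checking after a timeout, and Checking either returns to Normal on success or trips again on an error. The class names, the 30 second default, and resetting the count on success are hypothetical choices.

```python
import time


class CircuitOpen(Exception):
    """Raised when the breaker is tripped and calls are being rejected."""


class CircuitBreaker:
    NORMAL, CHECKING, TRIPPED = "Normal", "Checking", "Tripped"

    def __init__(self, max_errors=10, reset_timeout=30.0, clock=time.monotonic):
        self.max_errors = max_errors
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = self.NORMAL
        self.err_count = 0
        self._tripped_at = None

    def call(self, func, *args, **kwargs):
        if self.state == self.TRIPPED:
            if self._clock() - self._tripped_at < self.reset_timeout:
                raise CircuitOpen("breaker tripped, call rejected")
            # Timeout elapsed: let one trial call through.
            self.state = self.CHECKING
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_error()
            raise
        # Any success closes the breaker and clears the error count (assumption).
        self.state = self.NORMAL
        self.err_count = 0
        return result

    def _on_error(self):
        self.err_count += 1
        # A failed trial call, or more than max_errors errors, trips the breaker.
        if self.state == self.CHECKING or self.err_count > self.max_errors:
            self.state = self.TRIPPED
            self._tripped_at = self._clock()
```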

slide-41
SLIDE 41

Stability - practices

Repeatability

defined processes, practice scenarios, prelive environments

Automation

automate the routine, automate the difficult
allow the human back in the loop on demand

Transparency

logging, monitoring, alerts, trends


slide-42
SLIDE 42

Stability - process automation

Logging & Metrics
Monitoring
Automation

slide-43
SLIDE 43

Stability - environments

Development, UAT, Prelive, Production

Labels: “Controlled”, “Uncontrolled”, The DevOps Zone

slide-47
SLIDE 47

Stability - production runbooks

Security, Audit, Compliance, ...
Production Operations
Developers
System design
Experience
Constraints

  • Overview
  • Install
  • Backout
  • Op Procs
  • Investigation
  • Recovery


slide-48
SLIDE 48

Solutions: Achieving Capacity


slide-49
SLIDE 49

Capacity - design principles

Minimise workload

efficiency is important

Flatten the peaks

move workload around

Design for the large (scalability)

understand where the time goes
multiply by a million
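
"Understand where the time goes, multiply by a million" can be made concrete with a small timing helper. This is a sketch only: SegmentTimer, its method names, and the linear projection are hypothetical, and real workloads rarely scale perfectly linearly.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SegmentTimer:
    """Accumulates elapsed time per named segment of a request or batch step."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def segment(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the timed block raised.
            self.totals[name] += self._clock() - start

    def projected(self, scale):
        """Naive linear projection: each segment's seconds times `scale`."""
        return {name: secs * scale for name, secs in self.totals.items()}
```

For example, a segment measured at 2 ms per item projects to roughly 2000 seconds (about 33 minutes) at a million items, which is the kind of number worth knowing before production finds it.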

slide-50
SLIDE 50

Capacity - technology solutions

Measure and minimise

understand where the work is

Caching and pre-computing

reduce the work to be done

Sharding and partitioning

separate workload to allow scale
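
Sharding can be illustrated with a stable hash that assigns each key to one of a fixed number of partitions. The function name shard_for and the example key format are hypothetical; the sketch deliberately avoids Python's built-in hash(), which is randomised per process for strings and so would not route consistently across restarts.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index in [0, shard_count) deterministically."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % shard_count
```

Note that simple modulo sharding moves most keys when shard_count changes, so it suits cases where the partition count is fixed up front.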


slide-51
SLIDE 51

Capacity - solutions

Segment Timings
Static cache
Lookaside cache
Precompute
Result set caching
Phased batch
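
Of the caching options listed, the lookaside cache is the easiest to show briefly: the caller checks the cache first and only goes to the underlying source on a miss. The class below is a hypothetical in-memory sketch with no expiry or size limit, both of which a production cache would normally need.

```python
class LookasideCache:
    """Check the cache first; on a miss, load from the source and remember the value."""

    def __init__(self, loader):
        self._loader = loader   # function that fetches a value from the slow source
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value

    def invalidate(self, key):
        # Call this when the underlying data changes, so stale values are not served.
        self._store.pop(key, None)
```

Tracking hits and misses ties back to the measurement theme: a cache with a poor hit ratio adds complexity without reducing work.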

slide-58
SLIDE 58

Moving Work Around

[Two charts: Utilisation (vertical axis 25 to 100) plotted across 1 to 23 on the horizontal axis]
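
Moving work around to flatten peaks can be sketched as placing deferrable work into the quietest periods first. Everything here is illustrative: the function name, the unit-by-unit greedy approach, and the sample figures are assumptions, not data from the charts.

```python
def place_deferrable(fixed_load, deferrable_units, capacity=100):
    """Add deferrable work one unit at a time to whichever period is least loaded.

    fixed_load: utilisation per period that cannot be moved.
    Returns the resulting utilisation per period.
    """
    load = list(fixed_load)
    for _ in range(deferrable_units):
        # Pick the quietest period (the earliest one on a tie).
        quietest = min(range(len(load)), key=lambda i: load[i])
        if load[quietest] >= capacity:
            raise ValueError("not enough headroom for the deferrable work")
        load[quietest] += 1
    return load


# Hypothetical example: a 90% peak stays untouched, quiet periods absorb the batch.
print(place_deferrable([90, 20, 20, 50], 40))  # [90, 40, 40, 50]
```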

slide-59
SLIDE 59

Capacity - practices

Model and estimate
Test capacity on realistic environments

allows model calibration

Monitoring and trend analysis

tests theory against reality
spots impending storms before they hit
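
Trend analysis can be as simple as fitting a straight line to recent utilisation and asking when it crosses a limit. The sketch below assumes equally spaced daily samples and a linear trend, both simplifications; the function name and sample values are hypothetical.

```python
def days_until_limit(samples, limit):
    """Least-squares line through daily samples; days after the last sample
    until the line reaches `limit`. Returns None if the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    # Clamp at zero: the fitted line may already be past the limit.
    return max(0.0, day_at_limit - (n - 1))
```

With samples of 50, 52, 54, 56 percent and a limit of 80 percent, the fitted line gains 2 points a day and reaches the limit 12 days after the last sample.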

slide-60
SLIDE 60

Solutions: Achieving Security


slide-61
SLIDE 61

Security - key design principles

What they don’t have won’t hurt you

least privilege - grant the minimum needed

Security needs simplicity

what you can’t analyse you can’t be sure about

Don’t put your eggs in one basket

separate privileges to avoid total breaches

Fail safely
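
Least privilege and failing safely can be shown together in a deny-by-default permission check. The role and permission names below are invented for illustration; the point is only that anything not explicitly granted, including an unknown role, is refused.

```python
# Hypothetical roles: each is granted only the permissions it needs.
ROLE_PERMISSIONS = {
    "viewer": frozenset({"report:read"}),
    "operator": frozenset({"report:read", "job:restart"}),
}


def is_allowed(role, permission):
    """Deny by default: unknown roles and ungranted permissions both return False."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

Keeping the table small and explicit also serves the simplicity principle, since a short grant list is easy to review.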


slide-62
SLIDE 62

Security - solutions

Authentication & Roles
Least privilege / separation
Privacy (TLS)
Isolation (firewalls & zones)
Trust (certs)
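
For "Privacy (TLS)" and "Trust (certs)", a client-side sketch in Python's standard library might look like the following. The function name is hypothetical and the TLS 1.2 floor is an assumed policy choice; ssl.create_default_context itself enables certificate and hostname verification by default.

```python
import ssl


def make_client_context():
    """TLS client context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()
    # Assumed policy: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can then be passed to standard library clients, for example as the `context` argument of `http.client.HTTPSConnection`.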

slide-68
SLIDE 68

Security - key practices

Model threats to identify mitigation
Define policy to know what to protect
Apply mechanisms to mitigate threats
Test security as well as functions

slide-69
SLIDE 69

Security - techniques

Security Model
Threat Model

slide-70
SLIDE 70

Summary


slide-71
SLIDE 71

Summary

Production is just different

it’s not yours and you need to respect that

Production is demanding

Correctness, Stability, Capacity, Security

slide-72
SLIDE 72

Summary (ii)

Identify solutions by requirement & area

principles, technologies, practices

slide-73
SLIDE 73

Summary (iii)

Production requirements and principles go back to the age of the mainframe
CD and DevOps make another step
welcome attention from developers
new tech enabling new possibilities
breaking down silos to make it happen

slide-74
SLIDE 74

Books

Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives, Second Edition

Nick Rozanski, Eoin Woods

slide-75
SLIDE 75

Eoin Woods


eoin.woods@endava.com
 www.eoinwoods.info
 @eoinwoodz

Thank you. Questions?


Acknowledgements

http://www.icons-land.com
http://www.alamy.com/
http://www.42u.com