What they don’t tell you about µ-services (QCon NY, June 2016)

slide-1
SLIDE 1

What they don’t tell you about µ-services…

QCon NY – June 2016

Daniel Rolnick

Chief Technology Officer

slide-2
SLIDE 2

Daniel Rolnick

Chief Technology Officer

daniel.rolnick@yodle.com

slide-3
SLIDE 3

Story Time

slide-4
SLIDE 4

September 2014

Story Time

slide-5
SLIDE 5

June 2016

Story Time

slide-6
SLIDE 6

Something’s gotta give

▶ Changing environments cause stress
▶ Existing processes need to be revisited
▶ Processes need to be created
▶ New technology needs to be integrated
▶ Businesses are built on trade-offs

Evolution Requires Adaptation

slide-7
SLIDE 7

Expected developmental needs

▶ Platform as a Service
▶ Service Discovery
▶ Testing
▶ Containerization
▶ Monitoring

Eyes Wide Open

slide-8
SLIDE 8

Unexpected implications of micro-services

▶ Impact on data access
▶ Build and Deploy Tooling
▶ Source Repository Complexity
▶ Cross-application monitoring

Expect the Unexpected

slide-9
SLIDE 9

Bring on the complexity

Story Time

[Chart: Yodle Service Count, axis ticks 50 to 250]

slide-10
SLIDE 10

Data access patterns

slide-11
SLIDE 11

Independent Data Domains

▶ Isolated data ownership per micro-service
▶ Options: Physical Databases, Schemas, Polyglot
▶ Ideal state for new things, but what about the old stuff?
▶ Can’t get there in one move

Microservices Macroproblems

slide-12
SLIDE 12

Baby Steps to Freedom

▶ Central data stores are leaky abstractions

Microservices Macroproblems

slide-13
SLIDE 13

Baby Steps to Freedom

Microservices Macroproblems

▶ Central data stores are leaky abstractions
▶ Enforce data ownership through access patterns

slide-14
SLIDE 14

Baby Steps to Freedom

▶ Central data stores are leaky abstractions
▶ Enforce data ownership through access patterns
▶ Façade for decoupling

Microservices Macroproblems

slide-15
SLIDE 15

Baby Steps to Freedom

▶ Central data stores are leaky abstractions
▶ Enforce data ownership through access patterns
▶ Façade for decoupling
▶ Multi-step process

Microservices Macroproblems
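A minimal sketch of the façade step, assuming an in-memory SQLite database stands in for the central data store. `CustomerFacade`, `build_shared_store`, and the `customers` table are hypothetical names, not from the talk. The point is that only the façade touches the table, so ownership is enforced by the access pattern before any data physically moves.

```python
import sqlite3
from typing import Optional


class CustomerFacade:
    """Hypothetical facade: the only code allowed to query the customers table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add_customer(self, name: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO customers (name) VALUES (?)", (name,)
        )
        self._conn.commit()
        return cur.lastrowid

    def get_name(self, customer_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def build_shared_store() -> sqlite3.Connection:
    # Stand-in for the central data store that several services still share.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    return conn
```

Callers depend only on `add_customer` and `get_name`, so a later step can repoint the façade at a dedicated database without changing them.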

slide-16
SLIDE 16

Shared Containers Simplify Things

Microservices Macroproblems

▶ Services in the same container reuse connections
▶ Connection pooling goes away
▶ Base connection count starts adding up
▶ You could always go to a minimum idle of zero
▶ What could go wrong?
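A rough way to see how the base connection count adds up once every service instance keeps its own pool. The fleet size and pool settings below are illustrative assumptions, not Yodle's figures.

```python
def baseline_connections(services: int, instances_per_service: int, min_idle: int) -> int:
    """Connections held open against the database with no traffic at all."""
    return services * instances_per_service * min_idle


# Hypothetical fleet: 200 services, 2 instances each, 5 idle connections per pool.
with_idle_pool = baseline_connections(200, 2, 5)

# Same fleet with a minimum idle of zero: no baseline connections are held.
with_zero_idle = baseline_connections(200, 2, 0)
```

A minimum idle of zero removes the baseline, but each request after a quiet period then pays the cost of opening a connection, and a burst of traffic can open many at once.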

slide-17
SLIDE 17

Microservices Macroproblems

[Chart: Yodle Service Count, axis ticks 50 to 250]

slide-18
SLIDE 18

External Connection Pooling

▶ Connection pooling outside of the container
▶ Add visibility while you’re at it
▶ Better logging, cleaner visualizations

Microservices Macroproblems

slide-19
SLIDE 19

Microservices Macroproblems

slide-20
SLIDE 20

Tooling for empowerment

▶ Server spin-up
▶ Schema and account creation
▶ Ensure you externalize your configurations

Microservices Macroproblems

slide-21
SLIDE 21

Platform as a Service

slide-22
SLIDE 22

Static Configurations

▶ Every application deployed to a fixed set of hosts on a set of known ports
▶ Monitoring was done at a gross system synthetic level
▶ Only complete outages were easily detectable
▶ Manual restarts required
▶ PS-Watcher and Docker restart help but are not sufficient
▶ This was not going to scale

A Place for Everything and Everything…

slide-23
SLIDE 23

Keeping services alive by hand is problematic

▶ Researched PaaS platforms available in late 2014

  • Mesos / Marathon
  • CoreOS

▶ What about:

  • Kubernetes
  • Swarm
  • AWS Elastic Container Service

This Ain’t Gonna Scale

slide-24
SLIDE 24

Mesos and Marathon

▶ Deploy applications to Marathon
▶ Marathon decides what host and port to run applications on
▶ Health checks are built in to ensure application up-time
▶ Mesos ensures the applications run and are contained

Platform as a Service
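A sketch of what deploying to Marathon with a built-in health check can look like, written as a Python function that builds the JSON app definition. The field names follow the Marathon app definition format of that era as best I know it, so check them against your Marathon version. The service name, image, resource sizes, and `/health` path are hypothetical.

```python
import json
from typing import Any, Dict


def marathon_app(name: str, image: str, health_path: str = "/health") -> Dict[str, Any]:
    """Build an app definition; all concrete values here are illustrative."""
    return {
        "id": f"/{name}",
        "cpus": 0.5,
        "mem": 512,
        "instances": 2,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick the host port itself.
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": health_path,
                "portIndex": 0,
                "gracePeriodSeconds": 60,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(marathon_app("svcb", "registry.example.com/svcb:1.0"), indent=2))
```

The application never names a host or a fixed port; the scheduler places the task, and the health check lets it replace instances that stop responding.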

slide-25
SLIDE 25

Platform as a Service

[Chart: Yodle Service Count, axis ticks 50 to 250]

Pace of Innovation Increases

slide-26
SLIDE 26

Service Discovery

slide-27
SLIDE 27

Aware Apps vs. Smart Pipes

▶ Service discovery can be baked into your application

Dynamic Topologies Require Service Discovery

slide-28
SLIDE 28

Aware Apps vs. Smart Pipes

▶ Plumbing can take care of it for you
▶ Smart Pipes allow
  • Easier path to polyglot ecosystem
  • Decoupling applications from service discovery
▶ We chose the latter but we had to iterate a few times to get there

Dynamic Topologies Require Service Discovery

slide-29
SLIDE 29

Curator already in place

▶ Already used ZooKeeper/Curator for our Thrift-based macro-services
▶ Made our micro-services self-register and do discovery via Curator
▶ You can’t solve everything at once
▶ Not our desired end state

Use What You Know

slide-30
SLIDE 30

Hipache by dotCloud

▶ URLs looked like https://svcb.services.prod.yodle.com
▶ Utilized dedicated routing servers

Service Discovery V2

slide-31
SLIDE 31

Hipache by dotCloud

▶ Pros: Decoupled service discovery from applications
▶ Cons: Services had to be environment aware

Service Discovery V2
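To illustrate the "environment aware" drawback: with hostname-based routing, a caller has to know which environment it runs in to build the URL of a dependency. The sketch below assumes other environments follow the same pattern as the production URL shown on the earlier slide; `service_url` and the `qa` environment name are hypothetical.

```python
def service_url(service: str, environment: str) -> str:
    """Build a routed URL; the caller must supply its own environment."""
    return f"https://{service}.services.{environment}.yodle.com"


# The same client code resolves a different address per environment.
prod_url = service_url("svcb", "prod")
qa_url = service_url("svcb", "qa")  # "qa" is an assumed environment name
```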

slide-32
SLIDE 32

PaaS’s built-in routing layer

▶ Marathon has a built-in routing layer using HAProxy
▶ Simple command to generate an HAProxy config
▶ Basic listener (Qubit Bamboo) keeps HAProxy files up-to-date
▶ Hipache could have worked

Service Discovery V3
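A simplified sketch of the idea behind generating an HAProxy config from the scheduler's view of running tasks. This is not Marathon's or Bamboo's actual generator; `render_haproxy`, the service port, and the task addresses are hypothetical. Each app gets a fixed service port, and the backends are whatever hosts and ports the scheduler assigned.

```python
from typing import Dict, List, Tuple

# app name -> (service port, list of (host, port) for running tasks)
AppTable = Dict[str, Tuple[int, List[Tuple[str, int]]]]


def render_haproxy(apps: AppTable) -> str:
    """Render one HAProxy `listen` section per application."""
    lines: List[str] = []
    for app in sorted(apps):
        service_port, tasks = apps[app]
        lines.append(f"listen {app}")
        lines.append(f"  bind 0.0.0.0:{service_port}")
        lines.append("  mode http")
        lines.append("  balance roundrobin")
        for index, (host, port) in enumerate(tasks):
            # `check` enables HAProxy's own health checking of the backend.
            lines.append(f"  server {app}-{index} {host}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_haproxy({"svcb": (10001, [("10.0.0.5", 31001), ("10.0.0.6", 31744)])}))
```

A listener that regenerates this file and reloads HAProxy whenever tasks change keeps routing current without the applications knowing about it.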

slide-33
SLIDE 33

Discovery was simpler

Service Discovery V3 Continued

slide-34
SLIDE 34

Discovery was simpler

▶ Service discovery is now fully externalized
▶ Iterate on routing and discovery independently
▶ Created tech debt for the applications

Service Discovery V3 Continued

slide-35
SLIDE 35

Service Discovery V4

[Chart: Yodle Service Count, axis ticks 50 to 250]

Scale Problems

slide-36
SLIDE 36

Many to Many Problems

▶ As the number of slave nodes in our PaaS grew so did our problems
▶ Health checks from every host to every container
▶ Ensuring the HAProxy file was up-to-date on all hosts was annoying
▶ Centralized onto a small cluster of routing boxes

Service Discovery V4
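The many-to-many problem is easy to quantify: if every host runs its own proxy and checks every container, the number of health checks per round is the product of the two. The host and container counts below are illustrative assumptions.

```python
def health_checks_per_round(checkers: int, containers: int) -> int:
    """Each checker probes every container once per check interval."""
    return checkers * containers


# Hypothetical cluster: 50 hosts each running a proxy, 400 containers.
proxy_on_every_host = health_checks_per_round(50, 400)

# Same containers, checked only from a 3-node routing cluster.
central_routing_cluster = health_checks_per_round(3, 400)
```

Centralizing the proxies makes check traffic grow with the container count alone, and leaves only a handful of config files to keep current.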

slide-37
SLIDE 37

Testing

slide-38
SLIDE 38

Regressions give comfort

▶ Monolithic releases are understandable
▶ We tested everything
▶ Everything works

Continuous Integration

slide-39
SLIDE 39

Release code as it is written

Continuous Delivery Pipeline

Develop → Commit to Branch → Continuous Integration → Merge → Continuous Delivery

slide-40
SLIDE 40

Regressions take time

▶ Empower continuous delivery
▶ Broke apart our monolithic regression suite
▶ Same methodology for macro and micro-services

Continuous Integration

slide-41
SLIDE 41

Enter the Canary

▶ Landscape is in flux
▶ If we test a subset of things, how can we be sure everything works?
▶ Canary ensures
  • Dependencies are met
  • Existing contracts are satisfied
  • Production load is handled

Continuous Delivery Pipeline

slide-42
SLIDE 42

Continuous Delivery Pipeline

▶ Special canary routing in our service discovery layer
▶ Test anywhere in the service mesh
▶ Discoverable tests using a /tests endpoint
▶ Monitor canary health in New Relic
▶ Promote to Canary Partial
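One way canary routing in a discovery layer could work, as a sketch only. The talk does not say how Yodle's routing decided; the `X-Canary` header, the `choose_backend` function, and the weight parameter are assumptions. Tagged test traffic always reaches the canary, and a weight of zero keeps production traffic away until the canary is promoted to partial load.

```python
import random
from typing import Mapping, Optional


def choose_backend(
    headers: Mapping[str, str],
    canary_weight: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Return "canary" or "stable" for one request."""
    source = rng if rng is not None else random.Random()
    # Explicitly tagged requests (for example, test runs) always hit the canary.
    if headers.get("X-Canary", "").lower() == "true":
        return "canary"
    # Otherwise send roughly `canary_weight` of traffic to the canary.
    if source.random() < canary_weight:
        return "canary"
    return "stable"
```

With `canary_weight=0.0` only tagged requests reach the new version; raising the weight corresponds to the "Canary Partial" stage.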

slide-43
SLIDE 43

Continuous Delivery Pipeline

▶ Receive partial production load
▶ Monitor canary health in New Relic
▶ Validate response codes
▶ Measure throughput
▶ Promote to general availability
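A sketch of a promotion gate that combines the two checks named above, response codes and throughput. The thresholds, the function name `should_promote`, and the choice to count only 5xx responses as errors are assumptions for illustration, not the criteria used in the talk.

```python
from typing import Iterable


def should_promote(
    status_codes: Iterable[int],
    canary_rps: float,
    baseline_rps: float,
    max_error_rate: float = 0.01,
    min_throughput_ratio: float = 0.9,
) -> bool:
    """Decide whether a canary under partial load may go to general availability."""
    codes = list(status_codes)
    if not codes:
        # No observed traffic means no evidence; do not promote.
        return False
    server_errors = sum(1 for code in codes if code >= 500)
    error_rate = server_errors / len(codes)
    throughput_ok = canary_rps >= min_throughput_ratio * baseline_rps
    return error_rate <= max_error_rate and throughput_ok
```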

slide-44
SLIDE 44

Sentinel

Continuous Delivery Pipeline

slide-45
SLIDE 45

Sentinel

Continuous Delivery Pipeline

slide-46
SLIDE 46

Sentinel

Continuous Delivery Pipeline

slide-47
SLIDE 47

Sentinel

Continuous Delivery Pipeline

slide-48
SLIDE 48

Sentinel

▶ INSERT SCREENSHOTS OF SENTINEL

Continuous Delivery Pipeline

slide-49
SLIDE 49

Sentinel

▶ INSERT SCREENSHOTS OF SENTINEL

Continuous Delivery Pipeline

slide-50
SLIDE 50

Sentinel

▶ INSERT SCREENSHOTS OF SENTINEL

Continuous Delivery Pipeline

slide-51
SLIDE 51

Containers

slide-52
SLIDE 52

Standardization is required

▶ Polyglot environments buck standardization
▶ Micro-service environments increase complexity
▶ Operational complexity can grow unbounded
▶ Developers own the runtime
▶ Common runtime from an operator’s standpoint
▶ Tooling provides consistent deployments

Containers Bring Simplicity

slide-53
SLIDE 53

Hierarchical Container Images

▶ How do you roll out environmental changes when you have 200 different container builds?

Containers Bring Simplicity
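With hierarchical images, an environmental change is made once in a base image and then flows to everything built on top of it. The sketch below computes which images need rebuilding, parents before children, after a base image changes. The image names and the `rebuild_order` function are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List


def rebuild_order(parents: Dict[str, str], changed: str) -> List[str]:
    """Return `changed` and all images derived from it, parents first."""
    children: Dict[str, List[str]] = defaultdict(list)
    for image, parent in parents.items():
        children[parent].append(image)

    order: List[str] = []
    queue = deque([changed])
    while queue:
        image = queue.popleft()
        order.append(image)
        # Breadth-first walk, so a parent is always rebuilt before its children.
        queue.extend(sorted(children[image]))
    return order


# Hypothetical hierarchy: image -> the image it is built FROM.
PARENTS = {
    "java-base": "os-base",
    "node-base": "os-base",
    "svc-a": "java-base",
    "svc-b": "java-base",
    "svc-c": "node-base",
}
```

A change to `java-base` touches only the Java services, while a change to `os-base` triggers a rebuild of the whole tree.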

slide-54
SLIDE 54

Containers make a mess

▶ Docker host machines were littered
▶ Docker registry is littered with old images
▶ Developed a tagging process

Containers Bring Simplicity

slide-55
SLIDE 55

Monitoring

slide-56
SLIDE 56

Legacy Monitoring not cutting it

▶ Designed for testing and monitoring infrastructure
▶ Needed application performance management
▶ Wanted something that would scale with us with little effort

Increased Complexity Increased Requirements

slide-57
SLIDE 57

Graphite and Grafana

▶ Dropwizard metrics to report data
▶ Teams built custom dashboards
▶ Too much manual effort
▶ No alerting

Increased Complexity Increased Requirements

slide-58
SLIDE 58

Enter the Hackathon

▶ New Relic Monitoring For Microservices
▶ Simple – just add an agent
▶ Detailed per application dashboards out of the box
▶ Single score to focus attention (Useful for initial canary implementation)
▶ Basic alerting

Increased Complexity Increased Requirements

slide-59
SLIDE 59

100 Apps in 100 Days

▶ Made use of our base containers
▶ Rolled out monitoring to every application in the fleet
▶ Suddenly we had visibility everywhere
▶ Some limitations
  • No good Docker support (this is better now)
  • Service graphs aren’t dynamically generated

Increased Complexity Increased Requirements

slide-60
SLIDE 60

Finding root causes

▶ Hundreds of Dashboards
▶ Hundreds of Individual Service Nodes
▶ Finding root causes in complex service graphs is difficult
▶ Anomalies from individual service nodes difficult to detect
▶ Still looking for a good solution

Increased Complexity Increased Requirements

slide-61
SLIDE 61

Source Repository Complexity

slide-62
SLIDE 62

Source Code Management

▶ Organizational scheme to help think about it
▶ Hound to help with code searching
▶ Repo tool to help keep up-to-date
▶ Upgrading libraries is a challenge

Source Repository Complexity

slide-63
SLIDE 63

Dependency Management

▶ INSERT IMAGE OF VANTAGE

Source Repository Complexity

slide-64
SLIDE 64

Dependency Management

Source Repository Complexity

slide-65
SLIDE 65

Dependency Management

Source Repository Complexity

slide-66
SLIDE 66

Dependency Management

Source Repository Complexity

slide-67
SLIDE 67

Build tooling

▶ Many build systems don’t directly allow scripting
▶ Bamboo definitely doesn’t
▶ Build tooling iterations are painful
▶ Managing Bamboo build and deploy plans at scale is hard

Source Repository Complexity

slide-68
SLIDE 68

Existing Build Tooling

Build and Deploy Tooling

[Chart: Yodle Service Count, axis ticks 50 to 250]

slide-69
SLIDE 69

Directly to Marathon configurations in Bamboo

Build and Deploy Tooling

[Chart: Yodle Service Count, axis ticks 50 to 250]

slide-70
SLIDE 70

LaunchPad as an Abstraction Layer

Build and Deploy Tooling

[Chart: Yodle Service Count, axis ticks 50 to 250]

slide-71
SLIDE 71

Build and Deploy Tooling

slide-72
SLIDE 72

Sentinel For Human Service Discovery

Source Repository Complexity

slide-73
SLIDE 73

Sentinel For Human Service Discovery

Source Repository Complexity

slide-74
SLIDE 74

Sentinel For Human Service Discovery

Source Repository Complexity

slide-75
SLIDE 75

Conclusion

slide-76
SLIDE 76

Even if you aren’t on the bleeding edge …

▶ Every environment is different
▶ Legacy Applications present unique challenges
▶ Different business requirements
▶ Different trade-offs

Plan for Challenges

slide-77
SLIDE 77

Every Hurdle Was Worth It

Improved Agility

[Chart: Monthly Deployments, axis ticks 200 to 1400]

slide-78
SLIDE 78

We make it easy to grow and manage profitable customer relationships

It’s success simplified!