Reactive Microservices Using the Netflix Stack on AWS



SLIDE 1

Reactive Microservices Using the Netflix Stack on AWS

Diego Pacheco Principal Software Architect at ilegra.com @diego_pacheco

SLIDE 2
SLIDE 3

www.ilegra.com

SLIDE 4

NetflixOSS Stack

SLIDE 5

Why Netflix?

Netflix
• Billions of Requests Per Day
• 1/3 of US internet bandwidth
• ~10k EC2 Instances
• Multi-Region
• 100s of Microservices
• Innovation + Solid Service
• SOA, Microservices and DevOps Benchmark

My Problem
• Social Product
• Social Network
• Video
• Docs
• Apps
• Chat
• Scalability
• Distributed Teams
• Could reach some Web Scale

SLIDE 6

AWS

SLIDE 7

Cloud Native

SLIDE 8

Principles

• Stateless Services
• Ephemeral Instances
• Everything fails all the time
• Auto Scaling / Down Scaling
• Multi AZ and multi Region
• No SPOF
• Design for Failure (expected)
• SOA
• Microservices
• No Central Database
• NoSQL
• Lightweight Serializable Objects
• Latency tolerant protocols
• DevOps Enabler
• Immutable Infrastructure
• Anti-Fragility

SLIDE 9

Right Set of Assumptions

SLIDE 10

Microservices

SLIDE 11

Reactive

SLIDE 12

Java Drivers vs. REST

SLIDE 13

Simple View of the Architecture

Diagram components: Zuul, UI, Microservice, Cassandra Cluster

SLIDE 14

Stack

SLIDE 15

OSS

SLIDE 16

Zuul

SLIDE 17

Zuul

SLIDE 18

Karyon: Microbiology - Nucleus

SLIDE 19

RxNetty

• Reactive Extensions + Netty Server
• Lower Latency under Heavy Load
• Fewer Locks, Fewer Thread Migrations
• Consumes Less CPU
• Lower Object Allocation Rate

SLIDE 20

Karyon: CODE

SLIDE 21

Karyon: Reactive

SLIDE 22

Karyon: Reactive

SLIDE 23

Eureka and Service Discovery

http://microservices.io/patterns/server-side-discovery.html

SLIDE 24

Eureka

• AWS Service Registry for Mid-tier
• Load balancing and Failover
• REST based
• Karyon and Ribbon Integration
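To make the registry idea concrete, here is a minimal in-memory sketch of a service registry with lease renewal. It is written in Python purely as a language-neutral illustration (the stack itself is Java) and it is not Eureka's API: the class name `ServiceRegistry`, its methods, the 30-second lease and the application name `catalog` are all assumptions made for this example.

```python
import time


class ServiceRegistry:
    """Toy registry (hypothetical, not Eureka): register, renew, expire."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds  # assumed value, for illustration only
        self.clock = clock
        self._leases = {}  # (app, instance_id) -> (address, last_heartbeat)

    def register(self, app, instance_id, address):
        self._leases[(app, instance_id)] = (address, self.clock())

    def heartbeat(self, app, instance_id):
        """Renew a lease. Returns False for an unknown instance."""
        key = (app, instance_id)
        if key not in self._leases:
            return False  # the caller should register again
        address, _ = self._leases[key]
        self._leases[key] = (address, self.clock())
        return True

    def lookup(self, app):
        """Return the addresses of instances whose lease has not expired."""
        now = self.clock()
        return sorted(
            address
            for (name, _), (address, seen) in self._leases.items()
            if name == app and now - seen <= self.lease_seconds
        )
```

A client-side load balancer would call something like `lookup("catalog")` and pick among the returned addresses; instances that stop sending heartbeats simply drop out of the result.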

SLIDE 25

Eureka

SLIDE 26

Eureka and Service Discovery

SLIDE 27

Availability

SLIDE 28

Hystrix

SLIDE 29

Ribbon

• IPC Library
• Client Side Load Balancing
• Multi-Protocol (HTTP, TCP, UDP)
• Caching*
• Batching
• Reactive
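Client-side load balancing means the caller, not a central proxy, picks which server instance receives each request. The sketch below shows one possible policy, round robin with a health check, in Python as a language-neutral illustration. It does not use Ribbon's API; `RoundRobinBalancer`, `choose` and `is_up` are hypothetical names.

```python
import itertools


class RoundRobinBalancer:
    """Toy client-side balancer (hypothetical, not Ribbon's API)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def choose(self, is_up=lambda server: True):
        """Return the next server that passes the health check, or None."""
        # Look at each server at most once per call, so the loop always ends.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if is_up(server):
                return server
        return None
```

In a real deployment the server list would typically come from a registry rather than a hard-coded list.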

SLIDE 30

Ribbon CODE

SLIDE 31

Ribbon CODE

SLIDE 32

RxJava

• Reactive Extensions for the JVM
• Async/Event based programming
• Observer Pattern
• Less than 1 MB
• Heavily used by the Netflix OSS Stack
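The observer pattern behind this style of programming can be sketched in a few lines. The example below is a Python illustration only and is not RxJava's API: the class name `Subject` and the methods `subscribe`, `on_next`, `on_completed` and `map` are assumptions loosely modelled on the vocabulary of reactive libraries.

```python
class Subject:
    """Toy push-based stream (hypothetical, not RxJava's API)."""

    def __init__(self):
        self._observers = []  # list of (on_next, on_completed) callbacks

    def subscribe(self, on_next, on_completed=lambda: None):
        self._observers.append((on_next, on_completed))

    def on_next(self, value):
        # Push the value to every subscriber instead of letting them poll.
        for handler, _ in self._observers:
            handler(value)

    def on_completed(self):
        for _, handler in self._observers:
            handler()

    def map(self, fn):
        """Return a new Subject that emits fn(value) for each upstream value."""
        mapped = Subject()
        self.subscribe(lambda value: mapped.on_next(fn(value)), mapped.on_completed)
        return mapped
```

Operators such as `map` compose because each one is itself both an observer of the upstream and a source for the downstream.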

SLIDE 33

Archaius

• Configuration Management Solution
• Dynamic and Typed Properties
• High Throughput and Thread Safety
• Callbacks: Notifications of config changes
• JMX Beans
• Dynamic Config Sources: File, DB, DynamoDB, ZooKeeper
• Based on Apache Commons Configuration
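The combination of dynamic typed properties and change callbacks can be illustrated with a small sketch. It is Python used as a language-neutral illustration and does not reproduce Archaius's API: `ConfigSource`, `TypedProperty`, `reload` and the key `http.timeout.ms` are hypothetical names chosen for this example.

```python
class ConfigSource:
    """Toy reloadable config source (hypothetical, not Archaius's API)."""

    def __init__(self, values=None):
        self._values = dict(values or {})
        self._callbacks = {}  # key -> list of zero-argument callables

    def get(self, key, default=None):
        return self._values.get(key, default)

    def on_change(self, key, callback):
        self._callbacks.setdefault(key, []).append(callback)

    def reload(self, new_values):
        """Swap in new values and notify listeners of the keys that changed."""
        old, self._values = self._values, dict(new_values)
        for key in set(old) | set(self._values):
            if old.get(key) != self._values.get(key):
                for callback in self._callbacks.get(key, []):
                    callback()


class TypedProperty:
    """Reads the current value on every get(), so reloads are seen at once."""

    def __init__(self, source, key, default, cast):
        self._source, self._key = source, key
        self._default, self._cast = default, cast

    def get(self):
        raw = self._source.get(self._key)
        return self._default if raw is None else self._cast(raw)
```

Because the property reads through to the source each time, a service can pick up a new timeout or feature flag without a restart.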

SLIDE 34

Archaius + Git

Diagram labels: Central Internal GIT, Property Files; three instances, each showing Microservice, Slave Side Car, File System

SLIDE 35

Asgard

SLIDE 36

Asgard

SLIDE 37

Packer

JOB Create

Bake/Provision

Launch

Deploys

SLIDE 38

Dynomite: Distributed Cache

https://github.com/Netflix/dynomite

SLIDE 39

Dynomite

• Implements Amazon Dynamo
• Similar to Cassandra, Riak and DynamoDB
• Strong Consistency: quorum-like, no data loss
• Pluggable
• Scalable
• Redis / Memcached
• Multi-Clients with Dyno
• Can use most Redis commands
• Integrated with Eureka via Prana
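The "quorum-like" idea can be shown with a generic Dynamo-style sketch: a write succeeds once W replicas acknowledge it, and a read needs answers from at least R replicas and returns the newest version it sees. This is Python used for illustration and is not Dynomite's implementation or API; `Replica`, `quorum_write`, `quorum_read` and the explicit version numbers are all assumptions.

```python
class Replica:
    """Toy replica holding versioned values (hypothetical)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)
        self.up = True

    def write(self, key, version, value):
        if not self.up:
            raise ConnectionError("replica down")
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)

    def read(self, key):
        if not self.up:
            raise ConnectionError("replica down")
        return self.data.get(key, (0, None))


def quorum_write(replicas, key, version, value, w):
    """Return True if at least w replicas accepted the write (no rollback)."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, version, value)
            acks += 1
        except ConnectionError:
            pass
    return acks >= w


def quorum_read(replicas, key, r):
    """Return the newest value among the answers; require at least r answers."""
    answers = []
    for replica in replicas:
        try:
            answers.append(replica.read(key))
        except ConnectionError:
            pass
    if len(answers) < r:
        raise ConnectionError("not enough replicas answered")
    return max(answers, key=lambda pair: pair[0])[1]
```

With three replicas and W = R = 2, any read quorum overlaps any write quorum in at least one replica, which is why the read in this sketch still sees the latest acknowledged write when one node is down.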

SLIDE 40

Dynomite: Distributed Cache

• Isolate Failure: avoid cascading
• Redundancy: NO SPOF
• Auto-Scaling
• Fault Tolerance and Isolation
• Recovery
• Fallbacks and Degraded Experience
• Protect the customer from failures: don't throw failures (failures vs. errors)
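Failure isolation with a fallback is commonly implemented as a circuit breaker: after repeated failures the caller stops touching the broken dependency for a while and serves a degraded answer instead. The sketch below is a Python illustration, not the API of Hystrix or any other library; `CircuitBreaker`, its thresholds and its method names are hypothetical.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker with fallback (hypothetical, not a library API)."""

    def __init__(self, failure_threshold=3, reset_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, action, fallback):
        if self._opened_at is not None:
            if self.clock() - self._opened_at < self.reset_seconds:
                return fallback()  # open: fail fast, skip the dependency
            self._opened_at = None  # cool-down elapsed: allow one trial call
        try:
            result = action()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self.clock()
            return fallback()  # degraded experience instead of an error
        self._failures = 0
        return result
```

The customer sees the fallback value (for example cached data) rather than an exception, and the failing dependency gets time to recover.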

SLIDE 41

Dynomite: Internals

SLIDE 42

Multi-Region Cluster

Diagram labels: Oregon D1, Oregon D2, N California D3; two Eureka Servers; three Prana instances

SLIDE 43

Dynomite: CODE

SLIDE 44

Dynomite Contributions

https://github.com/Netflix/dynomite
https://github.com/Netflix/dynomite/pull/207
https://github.com/Netflix/dynomite/pull/200

SLIDE 45

Chaos Engineering

SLIDE 46

Gatling

• Stress Testing Tool
• Scala DSL
• Runs on top of Akka
• Simple to use

SLIDE 47

Chaos Arch

Diagram components: ELB, Zuul (two instances), Eureka, Microservice N1, Microservice N2, Cassandra Cluster

SLIDE 48

Running…

SLIDE 49

Chaos Results and Learnings

• Retry configuration and timeouts in Ribbon
• Right class in Zuul 1.x (the default retries only on SocketException)
  • RequestSpecificRetryHandler (HttpClient exceptions)
  • zuul.client.ribbon.MaxAutoRetries=1
  • zuul.client.ribbon.MaxAutoRetriesNextServer=1
  • zuul.client.ribbon.OkToRetryOnAllOperations=true
• Eureka timeouts
• It works
• Everything needs to have redundancy
• ASG is your friend :-)
• Stateless Service FTW
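The two retry settings listed on this slide can be read as "retries on the same server" and "additional servers to try". The sketch below illustrates that reading in Python; it is not Ribbon's or Zuul's code, and the function name `call_with_retries`, its parameters and the choice of `ConnectionError` as the retryable error are assumptions. In this sketch, setting both values to 1 allows at most (1 + 1) × (1 + 1) = 4 attempts.

```python
def call_with_retries(servers, send, max_auto_retries=1, max_auto_retries_next_server=1):
    """Try send(server); retry on the same server, then move to the next one.

    Hypothetical illustration of the two retry knobs, not library code.
    Returns (result, attempts). Raises the last error if every attempt fails.
    """
    if not servers:
        raise ValueError("at least one server is required")
    attempts = 0
    last_error = None
    # The first server plus up to `max_auto_retries_next_server` more servers.
    for server in servers[: max_auto_retries_next_server + 1]:
        # The first try plus up to `max_auto_retries` retries on this server.
        for _ in range(max_auto_retries + 1):
            attempts += 1
            try:
                return send(server), attempts
            except ConnectionError as error:
                last_error = error
    raise last_error
```

Retrying every operation, as the third setting on the slide enables, is only safe when the operations are idempotent; otherwise a retried write may be applied twice.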

SLIDE 50

Microservice Producer Kafka / Storm :: Event System

SLIDE 51

Chaos Results and Learnings

Before:

• Data was not in Elasticsearch
• Producers were losing data

After:

• No Data Loss
• It Works

Changes:

• No logging on the Microservice :( (logging was added)
• Code that publishes events wrapped in a try-catch
• Retry config in the Kafka producer raised from 0 to 5
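The three changes listed above (logging, a try-catch around publishing, and retries raised from 0 to 5) can be pictured together in one small sketch. It is Python for illustration only and does not use a Kafka client: in the setup described on the slide the retry count is a producer configuration value, whereas here it is an explicit loop, and `publish_with_retries` and `send` are hypothetical names.

```python
import logging

logger = logging.getLogger("event-producer")


def publish_with_retries(send, event, retries=5):
    """Call send(event); on failure, log and retry up to `retries` more times.

    Hypothetical illustration. Returns True on success, False after giving up.
    """
    for attempt in range(retries + 1):
        try:
            send(event)
            return True
        except Exception as error:
            # Without this log line a lost event would go unnoticed.
            logger.warning(
                "publish failed (attempt %d of %d): %s",
                attempt + 1, retries + 1, error,
            )
    logger.error("giving up on event %r", event)
    return False
```

With `retries=0` a single transient broker error loses the event, which matches the "before" behaviour described on the slide.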

SLIDE 52
SLIDE 53

Main Challenges

SLIDE 54

Hacker Mindset

SLIDE 55
SLIDE 56

Next Steps

• IPC
• Spinnaker
• Containers
• Client side Aggregation
• DevOps 2.0 -> Remediation / Skynet

SLIDE 57

PoCs

https://github.com/diegopacheco/netflixoss-pocs
http://diego-pacheco.blogspot.com.br/search/label/netflix?max-results=30

SLIDE 58

Reactive Microservices Using the Netflix Stack on AWS

Diego Pacheco Principal Software Architect at ilegra.com @diego_pacheco

Thank you!