Reactive Microservices using the Netflix stack on AWS - Diego Pacheco - PowerPoint PPT Presentation
Reactive Microservices using the Netflix stack on AWS
Diego Pacheco, Principal Software Architect at ilegra.com, @diego_pacheco
www.ilegra.com
NetflixOSS Stack
Why Netflix? Billions of Requests Per Day
1/3 of US internet bandwidth
~10k EC2 Instances
Multi-Region
100s of Microservices
Innovation + Solid Service
SOA, Microservices and DevOps Benchmark
My Problem
Social Product: Social Network, Video, Docs, Apps, Chat
Scalability
Distributed Teams
Could reach some Web Scale
AWS
Cloud Native
Principles
Stateless Services
Ephemeral Instances
Everything fails all the time
Auto Scaling / Down Scaling
Multi-AZ and Multi-Region
No SPOF
Design for Failure (expected)
SOA / Microservices
No Central Database
NoSQL
Lightweight Serializable Objects
Latency-tolerant protocols
DevOps Enabler
Immutable Infrastructure Anti-Fragility
Right Set of Assumptions
Microservices
Reactive
Java Drivers vs. REST
Simple View of the Architecture
UI → Zuul → Microservice → Cassandra Cluster
Stack
OSS
Zuul
Karyon: Microbiology - Nucleus
Reactive Extensions + Netty Server
Lower Latency under Heavy Load
Fewer Locks, Fewer Thread Migrations
Consumes Less CPU
Lower Object Allocation Rate
RxNetty
Karyon: CODE
Karyon: Reactive
Eureka and Service Discovery http://microservices.io/patterns/server-side-discovery.html
Eureka
AWS Service Registry for Mid-tier Load Balancing and Failover
REST based
Karyon and Ribbon Integration
Eureka
Eureka and Service Discovery
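The service-discovery pattern behind Eureka can be reduced to a tiny sketch. The class and method names below are hypothetical, not the real Eureka API: the point is only that instances register under a logical service name and clients resolve that name to a live instance list at call time.

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal sketch of a service registry (hypothetical class, not Eureka's API).
public class SimpleRegistry {
    private final Map<String, Set<String>> services = new ConcurrentHashMap<>();

    // An instance registers its host:port under a logical service name.
    public void register(String service, String instance) {
        services.computeIfAbsent(service, k -> ConcurrentHashMap.newKeySet())
                .add(instance);
    }

    // Instances that stop heartbeating would be evicted like this.
    public void deregister(String service, String instance) {
        services.getOrDefault(service, Collections.emptySet()).remove(instance);
    }

    // Clients resolve the service name to live instances at call time.
    public List<String> lookup(String service) {
        return new ArrayList<>(services.getOrDefault(service, Collections.emptySet()));
    }
}
```

In the real stack, Karyon registers the service with Eureka on startup and Ribbon pulls the instance list from it for client-side load balancing.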
Availability
Hystrix
IPC Library
Client-Side Load Balancing
Multi-Protocol (HTTP, TCP, UDP)
Caching*
Batching
Reactive
Ribbon
Ribbon CODE
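The Ribbon code slides are not captured in this transcript. As a substitute, here is a minimal plain-Java sketch of the core idea, client-side load balancing: the caller holds the server list itself and rotates through it, instead of going through a central load balancer. The class name is hypothetical, not Ribbon's API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of client-side round-robin load balancing (hypothetical class).
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Each call rotates to the next server in the list.
    public String choose() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

Ribbon layers retries, server stats, and Eureka-fed server lists on top of this basic rotation.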
Reactive Extensions for the JVM
Async/Event-based programming
Observer Pattern
Less than 1 MB
Heavy usage across the Netflix OSS Stack
RX-Java
Archaius
Configuration Management Solution
Dynamic and Typed Properties
High Throughput and Thread Safety
Callbacks: Notifications of config changes
JMX Beans
Dynamic Config Sources: File, DB, DynamoDB, Zookeeper
Based on Apache Commons Configuration
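The two Archaius features listed above, dynamic typed properties and change callbacks, can be sketched in plain Java (hypothetical class and method names, not the Archaius API): values are re-read at runtime and listeners are notified when the config source changes.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of dynamic, typed properties with change callbacks
// (hypothetical class, not Archaius).
public class DynamicConfig {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final List<Runnable> listeners = new ArrayList<>();

    // Typed read with a default, evaluated on every call.
    public int getInt(String key, int defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    // Callbacks fire whenever the config changes.
    public void onChange(Runnable listener) {
        listeners.add(listener);
    }

    // A config-source poller would call this when a value changes.
    public void set(String key, String value) {
        values.put(key, value);
        for (Runnable l : listeners) l.run();
    }
}
```

In Archaius the `set` side is driven by polling the configured sources (file, DB, DynamoDB, Zookeeper) rather than by direct calls.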
Archaius + Git: a central internal Git repo holds the property files; a sidecar on each microservice host syncs them to the local file system, where Archaius picks them up.
Asgard
Packer
Deploy pipeline: Create Job → Bake/Provision → Launch
Dynomite: Distributed Cache https://github.com/Netflix/dynomite
Dynomite
Implements the Amazon Dynamo paper
Similar to Cassandra, Riak and DynamoDB
Strong Consistency – Quorum-like – No Data Loss
Pluggable storage: Redis / Memcached
Scalable
Multi-language clients with Dyno
Can use most Redis commands
Integrated with Eureka via Prana
Fault Tolerance and Isolation
Isolate Failure – Avoid cascading failures
Redundancy – No SPOF
Auto-Scaling
Recovery: Fallbacks and Degraded Experience
Protect the customer from failures – don't throw failures at them
Failures vs. Errors
Dynomite: Distributed Cache
Dynomite: Internals
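A core internal of Dynamo-style systems like Dynomite is token-ring placement; here is a minimal sketch (hypothetical class, not Dynomite's implementation): each node owns a token on a hash ring, and a key lands on the first node whose token is at or past the key's hash, wrapping around.

```java
import java.util.*;

// Sketch of Dynamo-style token-ring key placement (hypothetical class).
public class TokenRing {
    // Token -> node, kept sorted so we can walk the ring clockwise.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node, int token) {
        ring.put(token, node);
    }

    // Find the owning node: first token >= hash, wrapping to the start.
    public String ownerOf(String key) {
        int h = Math.floorMod(key.hashCode(), Integer.MAX_VALUE);
        Map.Entry<Integer, String> e = ring.ceilingEntry(h);
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Real deployments add replication (the next N nodes on the ring also hold the key) and quorum reads/writes across them, which is where the "quorum-like, no data loss" consistency above comes from.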
Multi-Region Cluster: Dynomite nodes D1 and D2 in Oregon and D3 in N. California, each with a Prana sidecar, registered with Eureka Servers.
Dynomite: CODE
Dynomite Contributions
https://github.com/Netflix/dynomite
https://github.com/Netflix/dynomite/pull/207
https://github.com/Netflix/dynomite/pull/200
Chaos Engineering
Gatling
Stress Testing Tool
Scala DSL
Runs on top of Akka
Simple to use
Chaos Arch
ELB in front of two Zuul instances, routing to Microservice N1 and Microservice N2, backed by a Cassandra Cluster; Eureka handles discovery.
Running…
Chaos Results and Learnings
Retry configuration and Timeouts in Ribbon. Use the right retry class in Zuul 1.x: by default it retries only on SocketException; RequestSpecificRetryHandler also retries on HttpClient exceptions.
zuul.client.ribbon.MaxAutoRetries=1
zuul.client.ribbon.MaxAutoRetriesNextServer=1
zuul.client.ribbon.OkToRetryOnAllOperations=true
Eureka Timeouts
It Works
Everything needs to have redundancy
ASG is your friend :-)
Stateless Services FTW
Event System: Microservice producer → Kafka / Storm
Chaos Results and Learnings
Before:
Data was not in Elasticsearch
Producers were losing data
After:
No Data Loss It Works
Changes:
No logging on the microservice :( (logging was added)
Event-publishing code wrapped in a try-catch
Retry config in the Kafka producer raised from 0 to 5
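The fix described above can be sketched as follows (hypothetical code, not the service's actual implementation): wrap the publish call in a try-catch, log each failure instead of silently dropping the event, and retry a bounded number of times.

```java
// Sketch of a bounded-retry event publisher (hypothetical class).
public class RetryingPublisher {
    // Stand-in for a Kafka producer's send path.
    public interface Sink {
        void publish(String event) throws Exception;
    }

    private final Sink sink;
    private final int maxRetries;

    public RetryingPublisher(Sink sink, int maxRetries) {
        this.sink = sink;
        this.maxRetries = maxRetries;
    }

    // Returns true if the event was eventually published.
    public boolean publish(String event) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                sink.publish(event);
                return true;
            } catch (Exception e) {
                // Log the failure instead of swallowing it silently.
                System.err.println("publish failed, attempt " + attempt + ": " + e);
            }
        }
        return false; // caller can dead-letter or alert here
    }
}
```

With the real Kafka producer the equivalent is the `retries` producer config (here raised from 0 to 5), plus application-level logging around the send.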
Main Challenges
Hacker Mindset
Next Steps
IPC
Spinnaker
Containers
Client-side Aggregation
DevOps 2.0 -> Remediation / Skynet
PoCs
https://github.com/diegopacheco/netflixoss-pocs http://diego-pacheco.blogspot.com.br/search/label/netflix?max-results=30