

SLIDE 1: The Hardware & Software Implications of Microservices and How Big Data Can Help

Christina Delimitrou
Cornell University

with Yu Gan, Yanqi Zhang, Shuang Chen, Neeraj Kulkarni, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Ankitha Shetty, Nayan Katarki, Brett Clancy, Chris Colen, Dailun Cheng, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Meghna Pancholi, Siyuan Hu

ASBD Workshop – June 2nd, 2018

SLIDE 2: Executive Summary

- Shift from monoliths to microservices:
  - Modularity, specialization, simplicity, accelerated development
  - Changes assumptions about datacenter server design
  - Complicates scheduling and resource management
  - Amplifies tail-at-scale effects
- Revisit architectural design decisions for microservices
- Highlight the management challenges of microservices
- Motivate the need for data-driven approaches for systems whose scale & complexity keep increasing

SLIDE 3: From Monoliths to Microservices

SLIDE 4: Motivation

- Advantages of microservices:
  - Ease & speed of code development & deployment
  - Security, error isolation
  - PL/framework heterogeneity
- Challenges of microservices:
  - Change server design assumptions
  - Complicate resource management → dependencies
  - Amplify tail-at-scale effects
  - More sensitive to performance unpredictability
  - No representative end-to-end apps built with microservices

SLIDE 5: An End-to-End Suite for Cloud & IoT Microservices

- 4 end-to-end applications built from popular open-source microservices → ~30-40 microservices per app
  - Social Network
  - Movie Reviewing/Renting/Streaming
  - E-commerce
  - Drone control service
- Programming languages and frameworks:
  - node.js, Python, C/C++, Java/Javascript, Scala, PHP, and Go
  - Nginx, memcached, MongoDB, CockroachDB, Mahout, Xapian
  - Apache Thrift RPC, RESTful APIs
  - Docker containers
  - Lightweight RPC-level distributed tracing

SLIDE 6: Movie Streaming

[Architecture diagram: a client request enters through an nginx front end (http → fastcgi → php-fpm) behind a load balancer. A Compose phase fans out over microservices such as uniqueID, Login, text processing, movieID, AssignRating, ComposeReview, UpdateMovie, and UpdateUser; a Store phase then writes to ReviewStorage, MovieReviewDB, and UserReviewDB, each backed by memcached and mongoDB. All arrows are Thrift RPCs.]
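
Since every edge in this diagram is a Thrift RPC, a client-side call looks roughly like the minimal sketch below (assumptions: the generated ComposeReview stub, its UploadReview method, the host name, and the port are illustrative and not from the deck; only the thrift library calls are the real API):

    # Minimal sketch of one Thrift RPC edge from the diagram above.
    # The compose_review module would come from `thrift --gen py` on a
    # hypothetical IDL; it is not something this deck specifies.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from compose_review import ComposeReview  # hypothetical generated stub

    def send_review(req_id: int, user: str, movie_id: str, text: str, rating: int):
        transport = TTransport.TFramedTransport(TSocket.TSocket("compose-review", 9090))
        client = ComposeReview.Client(TBinaryProtocol.TBinaryProtocol(transport))
        transport.open()
        try:
            # Server-side, this call would fan out to uniqueID, AssignRating,
            # UpdateMovie, UpdateUser, and the storage tiers.
            client.UploadReview(req_id, user, movie_id, text, rating)
        finally:
            transport.close()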

SLIDE 7: Movie Streaming

- Browse movie info (movie plot, photos, videos, cast, stats, etc.)
- ML widgets:
  - Recommender for movies to watch
  - Recommender for ads
- User authentication / payment
- Search:
  - Xapian: search the movie DB
- Analytics:
  - Mahout: user analytics based on input stored in HDFS
  - Spark MLlib: in-memory ML analytics

SLIDE 8: Architectural Implications [CAL'18]

- Big vs. small servers:
  - Power management using RAPL
  - More pressure on single-thread performance and low tail latency

SLIDE 9: Architectural Implications

- Big vs. small servers:
  - Power management using RAPL
  - More pressure on single-thread performance and low tail latency
  - Low-power SoCs, e.g., Cavium ThunderX2
  - Similar latency, but earlier saturation

[Figure: four panels (Movie Streaming, Social Network, E-commerce, IoT-Cloud) plotting tail latency against load (QPS) for Xeon vs. ThunderX; the low-power SoC tracks the Xeon's latency at low load but saturates at lower QPS.]

SLIDE 10: Architectural Implications

- Computation:communication ratio:
  - Monolithic service → 70:30 at high load
  - Microservices → 50:50 at high load

[Figure: Movie Streaming tail latency (msec) per tier (nginx, AssignR, MovieID, ProcText, ReviewID, Compose, RevStorage, UserReview, MovReview, memcached, mongoDB), plus End-to-End and Monolith bars, each split into application processing vs. TCP processing for RPCs.]

SLIDE 11: Architectural Implications

- Computation:communication ratio:
  - Monolithic service → 70:30 at high load
  - Microservices → 50:50 at high load
  - RPC/REST acceleration → NIC offloads, FPGAs

[Diagram: an FPGA-based RPC accelerator; a Virtex7 board with its own DRAM and QSFP 10Gbps ports attaches over PCIe Gen3 alongside the NIC, with two CPU sockets (each with DRAM) connected by QPI.]

SLIDE 12: Architectural Implications

- L1 instruction cache pressure:
  - Monoliths → large code footprints → L1i thrashing
  - Microservices → small footprint per microservice (assuming dedicated cores)

[Figure: L1i MPKI per tier. Social Network: nginx, text, image, msgID, tagUser, urlShorten, video, compose, msgStore, wrTimeline, wrGraph, memcached, mongodb. E-Commerce: frontend, login, orders, search, cart, wishlist, catalogue, recommend, shipping, payment, invoice, qMaster, memcached, mongodb.]

SLIDE 13: End-to-End Latency Breakdown

- Measured post-rightsizing (resource ratios chosen to avoid glaring bottlenecks)
- Bottlenecks shift with load
- Need online, dynamic decisions

SLIDE 14: Resource Management Implications

- Challenges of microservices:
  - Change server design assumptions
  - Dependencies complicate resource management

[Microservice dependency graphs: Netflix, Twitter, Amazon, and the Movie Streaming app.]

SLIDE 15: Dependencies & Backpressure

[Diagram: many nginx front-end instances issuing http read <k,v> requests that fan into a single memcached tier.]

SLIDE 16: Dependencies & Backpressure

- Traditional techniques like autoscaling may help, or penalize, the wrong microservice
- Dependencies change at runtime → difficult to infer impact

[Diagram: the same fan-in, now with multiple memcached instances and http1/http2 read <k,v> flows; backpressure builds in the RX/TX queues.]

SLIDE 17: Determine Per-Tier QoS

- Queueing models (see the sketch below)
- Queueing network simulation
  - Needed for complex microservice graphs, blocking, cyclic dependencies, etc.

[Diagram: the end-to-end QoS target is decomposed into per-tier targets, e.g., QoS1 for nginx and QoS2 for memcached.]
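
As a minimal illustration of the queueing-model approach (assumptions: the deck does not specify the model; treating each tier as an independent M/M/1 queue ignores exactly the blocking and cyclic dependencies the second bullet warns about, which is why simulation is also listed; all rates and targets below are made-up numbers):

    # Minimal M/M/1 sketch for checking an end-to-end QoS target against
    # per-tier latencies, assuming independent tiers traversed sequentially.

    def mm1_latency(arrival_rate: float, service_rate: float) -> float:
        """Mean latency of an M/M/1 queue: 1 / (mu - lambda)."""
        if arrival_rate >= service_rate:
            return float("inf")   # tier is saturated at this load
        return 1.0 / (service_rate - arrival_rate)

    tiers = {"nginx": 12.0, "memcached": 20.0}   # service rates mu (req/ms)
    load = 8.0                                   # arrival rate lambda (req/ms)
    qos_ms = 1.0                                 # end-to-end latency budget

    per_tier = {name: mm1_latency(load, mu) for name, mu in tiers.items()}
    total = sum(per_tier.values())
    print(per_tier, "end-to-end:", round(total, 3), "meets QoS:", total <= qos_ms)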

SLIDE 18: Power Management for Microservices

- Two types of latency slack (a sketch of exploiting it follows below):
  - Microservices off the critical path
  - Microservices on the critical path but with relaxed QoS

[Figure: three time-series panels of frequency, end-to-end latency, and utilization; frequency is scaled down from 2.2GHz whenever there is slack, while end-to-end latency stays within QoS.]
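
A minimal sketch of a controller that exploits such slack (assumptions: the deck names RAPL as its power-management mechanism, but this policy, its thresholds, and the latency source are illustrative; the sysfs path is the standard Linux powercap interface and requires root):

    # Sketch: lower the socket power cap while measured tail latency has
    # slack against QoS; restore power when a violation threatens.
    import time

    RAPL = "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"
    QOS_MS, STEP_UW = 10.0, 5_000_000
    MIN_UW, MAX_UW = 40_000_000, 100_000_000

    def set_power_cap(microwatts: int) -> None:
        with open(RAPL, "w") as f:
            f.write(str(microwatts))

    def tail_latency_ms() -> float:
        # Placeholder: would come from the RPC-level tracing framework.
        return 8.0

    cap = MAX_UW
    for _ in range(60):                    # one adjustment per second
        lat = tail_latency_ms()
        if lat < 0.8 * QOS_MS:             # slack: trade power for latency
            cap = max(MIN_UW, cap - STEP_UW)
        elif lat > QOS_MS:                 # danger: restore power
            cap = min(MAX_UW, cap + STEP_UW)
        set_power_cap(cap)
        time.sleep(1)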

SLIDE 19: Scalability Challenges

- Determining per-tier QoS for 1000s of microservices → intractable

SLIDE 20: Tail at Scale Effects

- Microservices add an extra dimension to tail-at-scale effects
  - A single slow microservice affects end-to-end latency
  - Much more pressure on performance predictability & availability
  - Monitoring at the edge
- Determining per-tier QoS for 10,000s of microservices is intractable
  - Requires a scalable, data-driven approach
- Need for online performance debugging

SLIDE 21: Proactive Performance Debugging

- Dependencies between microservices propagate & amplify QoS violations
  - Finding the culprit of a QoS violation is difficult
  - After a QoS violation, returning to nominal operation is hard
- Goal: anticipate QoS violations & identify culprits
- Seer: data-driven performance debugging for microservices
  - Combines lightweight RPC-level distributed tracing with hardware monitoring
  - Leverages scalable deep learning to signal QoS violations with enough slack to apply corrective action

SLIDES 22-23: Performance Implications

[Animation frames: per-microservice status across CPU, memory, network, disk, and queue depth.]

SLIDE 24: Seer: Data-Driven Performance Debugging [HotCloud'18]

- Leverage the massive amount of traces collected over time (the three steps form the control loop sketched below):
  1. Apply online, practical data mining techniques that identify the culprit of an upcoming QoS violation
  2. Use per-server hardware monitoring to determine the cause of the QoS violation
  3. Take corrective action to prevent the QoS violation from occurring
- Need to predict 100s of msec to a few sec into the future
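
A structural sketch of that loop (assumptions: every helper name, the polling period, and the trace-window size are illustrative stand-ins for Seer's components, not its actual code):

    # Sketch of Seer's detect -> diagnose -> act loop (steps 1-3 above).
    import time

    def predict_culprit(trace_window):
        """Step 1: mine recent RPC traces; return the microservice predicted
        to cause an upcoming QoS violation, or None."""
        return None  # placeholder

    def diagnose_cause(server):
        """Step 2: per-server hardware monitoring (counters, utilization);
        return 'cpu', 'cache', 'memory', 'network', or 'disk'."""
        return "cpu"  # placeholder

    def take_corrective_action(service, cause):
        """Step 3: apply the matching knob (see the Restoring QoS slide)."""

    def control_loop(recent_traces, server_of):
        while True:
            culprit = predict_culprit(recent_traces())
            if culprit is not None:
                take_corrective_action(culprit, diagnose_cause(server_of(culprit)))
            time.sleep(0.1)  # horizon is 100s of msec, so poll frequently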

SLIDE 25: Tracing Framework

- RPC-level tracing, based on Apache Thrift
- Timestamps the start and end of each microservice per request (sketched below)
- Stores traces in a centralized DB (Cassandra)
- Records all requests → no sampling
- Overhead: <0.1% in throughput and <0.2% in tail latency

[Diagram: a zTracer module alongside each microservice records RPC TX/RX timestamps and separates TCP processing from application processing; a tracing collector stores spans in Cassandra, and a query engine plus WebUI render per-microservice latency Gantt charts.]
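
A minimal sketch of that per-RPC timestamping (assumptions: the keyspace, table schema, decorator approach, and names are illustrative; Seer's zTracer hooks Thrift's TX/RX path in-process rather than wrapping handlers like this; the cassandra-driver calls are the real client API):

    # Sketch: record start/end timestamps for every RPC handler and write
    # them to Cassandra, per the "no sampling, centralized DB" design above.
    import time
    import functools
    from cassandra.cluster import Cluster  # pip install cassandra-driver

    session = Cluster(["tracing-db"]).connect("traces")  # hypothetical keyspace
    INSERT = session.prepare(
        "INSERT INTO rpc_spans (request_id, service, ts_start, ts_end) "
        "VALUES (?, ?, ?, ?)")

    def traced(service_name):
        """Wrap an RPC handler so every invocation is recorded."""
        def decorator(handler):
            @functools.wraps(handler)
            def wrapper(request_id, *args, **kwargs):
                start = time.time_ns()
                try:
                    return handler(request_id, *args, **kwargs)
                finally:
                    # Fire-and-forget write keeps overhead low (<0.1%
                    # throughput on the slide above).
                    session.execute_async(
                        INSERT, (request_id, service_name, start, time.time_ns()))
            return wrapper
        return decorator

    @traced("ComposeReview")
    def compose_review(request_id, user, movie, text):
        ...  # application logic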

SLIDE 26: Deep Learning to the Rescue

- Why deep learning?
  - Architecture-agnostic
  - Adjusts to changes in dependencies over time
  - High accuracy, good scalability
  - Inference within the required window

SLIDES 27-28: DNN Configuration

- Input signal: per-microservice container utilization, latency, and queue depth
- Output signal: which microservice will cause a QoS violation in the near future?
- A sketch of a network with this interface follows below.
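
A minimal sketch of a model with exactly this input/output contract (assumptions: the deck gives no architecture, layer sizes, or window length, so the fully connected layers and all dimensions below are placeholders; PyTorch is used only for illustration):

    # Sketch: map a window of per-microservice signals (utilization, latency,
    # queue depth) to a per-microservice probability of an upcoming QoS
    # violation. Sizes and architecture are illustrative, not Seer's.
    import torch
    import torch.nn as nn

    N_SERVICES = 30   # ~30-40 microservices per app (slide 5)
    N_SIGNALS = 3     # container utilization, latency, queue depth
    WINDOW = 100      # timesteps of history per inference (assumed)

    model = nn.Sequential(
        nn.Flatten(),                                  # (B, W, S, F) -> (B, W*S*F)
        nn.Linear(WINDOW * N_SERVICES * N_SIGNALS, 512),
        nn.ReLU(),
        nn.Linear(512, N_SERVICES),                    # one logit per microservice
    )

    x = torch.randn(8, WINDOW, N_SERVICES, N_SIGNALS)  # a batch of trace windows
    probs = torch.sigmoid(model(x))                    # P(upcoming violation)
    culprit = probs.argmax(dim=1)                      # most likely culprit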

SLIDE 29: DNN Configuration

- Train once: slow (hours to days); a training-loop sketch follows below
  - Across load levels, load distributions, and request types
  - On distributed queue traces, annotated with QoS violations
  - Weight/bias inference with SGD
  - Retraining happens in the background
- Infer continuously: on streaming trace data

93% accuracy in signaling upcoming QoS violations; 91% accuracy in attributing a QoS violation to the correct microservice.
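
Continuing the model sketch from the previous slide (assumptions: the labels, loss, and hyperparameters are made up; the deck only states that weights and biases are trained with SGD on traces annotated with QoS violations; `model` and `N_SERVICES` come from the sketch above):

    # Sketch: SGD training on trace windows labeled with which microservices
    # violated QoS. Hyperparameters are illustrative.
    import torch
    import torch.nn as nn

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.BCEWithLogitsLoss()     # per-service violation labels in {0,1}

    def train_epoch(loader):
        for window, labels in loader:    # labels: (batch, N_SERVICES) floats
            optimizer.zero_grad()
            loss = loss_fn(model(window), labels)
            loss.backward()
            optimizer.step()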

SLIDE 30: DNN Configuration

- Challenges:
  - In large clusters, inference is too slow to prevent QoS violations
  - Offloading inference to TPUs brings a 10-100x improvement: 10ms for 90th-percentile inference
  - Fast enough for most corrective actions to take effect (network bandwidth partitioning, RAPL, cache partitioning, scale-up/out, etc.)
- Accuracy is stable or increasing with cluster size

SLIDE 31: Experimental Setup

- 40 dedicated servers
- ~1000 single-concern containers
- Machine utilization: 80-85%
- Interference injected to cause QoS violations
  - Using microbenchmarks (CPU, cache, memory, network, disk I/O)

SLIDE 32: Restoring QoS

- Identify the cause of the QoS violation:
  - Private cluster: performance counters & utilization monitors
  - Public cluster: contentious microbenchmarks
- Adjust the resource allocation (see the sketch below):
  - RAPL (fine-grain DVFS) & scale-up for CPU contention
  - Cache partitioning (CAT) for cache contention
  - Memory capacity partitioning for memory contention
  - Network bandwidth partitioning (HTB) for network contention
  - Storage bandwidth partitioning for I/O contention
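
A sketch of how those knobs map onto standard Linux interfaces (assumptions: tc/HTB, resctrl, and powercap are the stock kernel mechanisms behind the HTB, CAT, and RAPL bullets, but the device names, group names, and values below are illustrative; the deck does not give Seer's actual commands):

    # Sketch: dispatch a corrective action based on the diagnosed contention
    # source. Paths/commands are the standard Linux knobs; values are made up.
    import subprocess

    def throttle_network(rate="100mbit", dev="eth0", classid="1:10"):
        # HTB: cap the offending container's traffic class
        # (assumes an HTB qdisc with class 1:10 was set up beforehand).
        subprocess.run(["tc", "class", "change", "dev", dev, "parent", "1:",
                        "classid", classid, "htb", "rate", rate], check=True)

    def partition_cache(group="aggressor", mask="0x00f"):
        # Intel CAT via resctrl: shrink the aggressor's share of the LLC.
        with open(f"/sys/fs/resctrl/{group}/schemata", "w") as f:
            f.write(f"L3:0={mask}\n")

    def cap_cpu_power(limit_uw=50_000_000):
        # RAPL power cap, the fine-grain DVFS knob on the slide above.
        with open("/sys/class/powercap/intel-rapl:0/"
                  "constraint_0_power_limit_uw", "w") as f:
            f.write(str(limit_uw))

    ACTIONS = {"network": throttle_network, "cache": partition_cache,
               "cpu": cap_cpu_power}

    def restore_qos(cause: str) -> None:
        ACTIONS[cause]()   # memory/disk partitioning would slot in similarly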

SLIDE 33: Demo

[Demo: live per-microservice view of CPU, memory, network, disk, and queue status across the cluster.]

SLIDES 34-36: A New Cloud Stack

[Diagram: a cloud stack spanning applications, programming frameworks, cluster management, and hardware design, annotated with the corresponding work: CAL'18a; CAL'18b; HotCloud'18; and several papers in submission.]

Thank you!