
From Development to Production: Many Uses of Serverless Observability
Emrah Samdan | September 9, 2019 | Community Day 2019

Who am I?
● Developer for 6+ years
● Product manager for 2 years
● VP of Product for Thundra
● Organizing committee, ServerlessDays İstanbul: on October 3rd!
@emrahsamdan

Agenda
● Let’s define serverless (yes, once again!)
● Is observability a buzzword or a real thing?
● Observability challenges in serverless
● Observability-Driven Development
● How and when to test serverless applications
● What to check when monitoring the serverless stack
● Troubleshooting serverless applications

What’s serverless?
Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity.
Source: Wikipedia, https://en.wikipedia.org/wiki/Serverless_computing

What’s serverless? An operational construct?
Things that run perfectly and that I don’t need to manage. Are Stripe and Auth0 serverless?

What’s serverless? Utility computing?
I pay only for what I use.

What’s serverless? A doctrine?
A thought model that helps you deliver faster and focus on the value you provide to your customers. (Ben Kehoe, Paul Johnston)

I agree with all of you. But! All the upsides can disappear when you don’t pay attention to what’s really happening in your serverless application.

Shared Responsibility Model
The cloud vendor handles scalability and reliability. But performance and security ARE still ON US.

Serverless Observability
● Serverless is full of hidden traps that can undermine its promise.
● It can be very costly.
● It can perform really poorly.
● You need to check what’s going on.

What’s observability?
See: https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c

Are we ready for unknown unknowns?
● Known knowns: things that we understand and are aware of. "Follow the metric charts."
● Known unknowns: things that we are aware of but don’t understand at a glance. "Yes, there is a peak over there. Let me dig into the traces and logs."
● Unknown knowns: things that we can understand but are not aware of. "I would have fixed this if I had that metric chart :("
● Unknown unknowns: things that we neither understand nor are aware of. "That thing is kaputt and I have no freaking idea why, with what I have."

The pillars
● Alerts
● Machine learning and insights
● Visualization
● Traces, metrics, logs

Observability Challenges in Serverless
● No access to the underlying infrastructure.
● You either take whatever the cloud vendor provides or accept that there will be an overhead.
● Overhead?
  ○ Gathering the data (should be acceptable)
  ○ Transporting it to where it is needed (you only have the invocation lifetime to get the data out)
● Everything is event-driven and distributed.
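To make the "invocation lifetime" point concrete, here is a minimal Python sketch, assuming a Lambda-style `handler(event, context)`. `TelemetryBuffer`, `make_handler` and the `sink` callable are hypothetical names for illustration, not any vendor's API. Telemetry is buffered in memory and shipped in one batch inside a `finally` block, before the handler returns, because platforms such as AWS Lambda may freeze the execution environment once the handler has returned.

```python
import json
import time
from typing import Any, Callable, Dict, List


class TelemetryBuffer:
    """Hypothetical helper: collects telemetry in memory during one invocation."""

    def __init__(self, sink: Callable[[str], None]) -> None:
        self._events: List[Dict[str, Any]] = []
        self._sink = sink  # e.g. an exporter; here any callable that takes a string

    def record(self, name: str, **fields: Any) -> None:
        self._events.append({"name": name, "ts": time.time(), **fields})

    def flush(self) -> int:
        """Send all buffered events as one JSON batch and return how many were sent."""
        if not self._events:
            return 0
        self._sink(json.dumps(self._events))
        sent = len(self._events)
        self._events.clear()
        return sent


def make_handler(sink: Callable[[str], None]):
    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        telemetry = TelemetryBuffer(sink)
        start = time.perf_counter()
        telemetry.record("invocation_start", request_id=event.get("id"))
        try:
            return {"status": "ok", "items": len(event.get("items", []))}
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            telemetry.record("invocation_end", duration_ms=elapsed_ms)
            # Flush synchronously: the data has to leave while the invocation is still running.
            telemetry.flush()

    return handler
```

The synchronous flush is exactly the "transporting" overhead from the slide: it adds to the measured duration of every invocation.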

Fix the Charizard. If you can!

Observability-Driven Development

Observability-Driven Development
● Recall the unknown unknowns.
● What do you need when you have to troubleshoot an unknown unknown?
● Can an observability tool know your unknowns?
● If you don’t know what you need to know, what can you do?

ASK
Your observability tool should give you automatic answers. But it should also let you ask wise questions.

Observability-Driven Development
● Not a replacement for test-driven development.
● Think about the answers you should be able to give to any type of question.
● If you keep thinking of questions your tool can’t answer, request that feature from your tool.
● Structured logging and manual instrumentation are the key.
(Image retrieved from: https://dzone.com/articles/what-is-structured-logging)
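A minimal sketch of structured logging with Python's standard `logging` module: every record is emitted as one JSON object, so fields such as an order ID can be queried later instead of parsed out of free text. `JsonFormatter`, `build_logger`, the `ctx` key and the order fields are illustrative choices, not taken from the talk.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Context passed as logger.info(..., extra={"ctx": {...}}) becomes record.ctx.
        payload.update(getattr(record, "ctx", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if build_logger is called again
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's plain-text output
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("order placed", extra={"ctx": {"order_id": "o-42", "amount_cents": 1999}})
print(buffer.getvalue().strip())
# {"amount_cents": 1999, "level": "INFO", "logger": "orders", "message": "order placed", "order_id": "o-42"}
```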

Observability-Driven Development (Cons)
● Observability coverage?
● Hard to get accustomed to.
● You can’t sample anything.

TESTING

Testing Challenges in Serverless
● Local testing is a pain.
  ○ How do you mock the cloud resources? Is it even correct to mock them?
  ○ How should you test a chain of many invocations?
  ○ How do you integrate it with CI/CD tools?
● Integration testing with real resources is still the best option, but again: how?

Integration Testing
● Serverless != functions.
● Test your business logic against the resources.
● See how your messages are transformed along the flow.
● Async events can cause problems that you could never guess.

Integration Testing (Cons)
● You are still dealing only with known knowns.
● Resources that are not pay-per-use.
  ○ Setting up a test environment. Still?
● Not with the production data.

Chaos Testing Serverless Applications
● Serverless is a great fit for chaos engineering because it is:
  ○ Distributed
  ○ Full of possible failures in an async environment
  ○ Event-driven (so, poison events)
  ○ Built on roles and permissions so granular that access can slip away

Chaos testing on serverless: what?
● What would happen if an inner Lambda starts to respond slowly?
● Are you sure that you tuned your timeouts properly?
● Test by injecting latency.
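A minimal Python sketch of latency injection, assuming you can wrap the inner function's handler. `inject_latency` and `within_budget` are hypothetical helpers written for this example; the chaos libraries linked in this deck provide their own, different APIs. The check only measures whether the caller's time budget would have been exceeded; it does not enforce a timeout.

```python
import random
import time
from functools import wraps
from typing import Any, Callable, Optional


def inject_latency(delay_s: float, rate: float = 1.0,
                   rng: Optional[random.Random] = None):
    """Hypothetical chaos decorator: delay roughly `rate` of calls by `delay_s` seconds."""
    rng = rng or random.Random()

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if rng.random() < rate:  # random() is in [0, 1), so rate=1.0 always injects
                time.sleep(delay_s)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@inject_latency(delay_s=0.2)
def inner_service(order_id: str) -> dict:
    """Stand-in for an inner Lambda that the caller depends on."""
    return {"order_id": order_id, "status": "reserved"}


def within_budget(func: Callable[..., Any], budget_s: float, *args: Any) -> bool:
    """Call `func` and report whether it finished inside the caller's time budget."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start <= budget_s
```

With a 0.1 s budget the injected 0.2 s delay blows the budget, which is the kind of mistuned timeout this experiment is meant to surface.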

Chaos testing on serverless: what?
● What if we lose the connection to Redis?
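One way to run this experiment, sketched in Python: swap the cache client for a stand-in that raises `ConnectionError`, then check that the code falls back to the database instead of failing the invocation. `FlakyCache` is a hypothetical in-memory substitute, not the API of any Redis client library, and `get_profile` and `load_from_db` are illustrative names.

```python
import random
from typing import Callable, Dict, Optional


class FlakyCache:
    """Hypothetical in-memory stand-in for a Redis client with fault injection."""

    def __init__(self, failure_rate: float = 0.0,
                 rng: Optional[random.Random] = None) -> None:
        self._data: Dict[str, str] = {}
        self.failure_rate = failure_rate
        self._rng = rng or random.Random()

    def _maybe_fail(self) -> None:
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("injected fault: lost connection to cache")

    def get(self, key: str) -> Optional[str]:
        self._maybe_fail()
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._maybe_fail()
        self._data[key] = value


def get_profile(user_id: str, cache: FlakyCache,
                load_from_db: Callable[[str], str]) -> str:
    try:
        cached = cache.get(user_id)
    except ConnectionError:
        cached = None  # cache unreachable: treat it as a miss
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    try:
        cache.set(user_id, value)
    except ConnectionError:
        pass  # a failed cache write should not fail the request
    return value
```

Setting `failure_rate=1.0` simulates a total loss of the connection; values between 0 and 1 simulate an intermittent one.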

Chaos testing on serverless: how?
● https://github.com/adhorn/aws-lambda-chaos-injection (Adrian Hornsby)
● https://github.com/gunnargrosch/ (Gunnar Grosch)

MONITORING
How large should my screen be to see the charts for thousands of functions?

How do you discover an anomaly in serverless?

Serverless is more than functions, and so is monitoring.
● Issues can stay local before you notice them.
● It is slow. Why?
  ○ API slowdown?
  ○ Throttling on some resource?
  ○ Bad coding practice?
● Invocation counts go crazy.
  ○ Seasonal peak?
  ○ Successful product?
  ○ Retry storms?

Monitoring Latency
● Abnormal latency is mostly not related to the function code.
  ○ Idly waiting for a third-party API
  ○ Throttled resource
● Set aggregated alerts.
  ○ Alert on transaction duration
  ○ Alert on function duration
  ○ Alert on operation duration
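A minimal Python sketch of an aggregated duration alert: instead of watching thousands of charts, compute a high percentile per transaction, function or operation and fire only when it crosses a threshold. The nearest-rank percentile, the 99th-percentile default, the names and the thresholds are all assumptions for illustration; monitoring tools may compute percentiles differently.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def firing_alerts(durations_ms: Dict[str, List[float]],
                  thresholds_ms: Dict[str, float],
                  pct: float = 99.0) -> List[str]:
    """Return the names whose pct-th percentile duration exceeds their threshold."""
    firing = []
    for name, samples in durations_ms.items():
        limit = thresholds_ms.get(name)
        if limit is not None and samples and percentile(samples, pct) > limit:
            firing.append(name)
    return sorted(firing)


# Hypothetical samples at the three levels the slide lists.
durations = {
    "transaction:checkout": [120, 130, 140, 2500],
    "function:reserve-stock": [40, 45, 50],
    "operation:orders-db.query": [5, 6, 7],
}
thresholds = {
    "transaction:checkout": 1000,
    "function:reserve-stock": 100,
    "operation:orders-db.query": 20,
}
print(firing_alerts(durations, thresholds))  # ['transaction:checkout']
```

Alerting on a percentile rather than the average keeps one slow outlier per thousand requests visible instead of averaging it away.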

Storm of Retries and Errors
● When your code fails for some reason, your function may be retried several times.
  ○ Sync events: you should control the retries yourself.
  ○ Async events: different retry mechanisms apply.
  ○ Stream-based events: risk of losing data.
● Does retrying solve the problem?
● Check:
  ○ Iterator age
  ○ Number of retries
  ○ Number of errors
  ○ Memory usage
  ○ Cold starts
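Why retries can turn into a storm, as a small worked example in Python: if each layer of a call chain retries independently, the worst-case number of calls reaching the innermost dependency is the product of (retries + 1) across the layers. The layer counts below are hypothetical, and real platforms apply their own retry policies per event source.

```python
from math import prod
from typing import Sequence


def worst_case_attempts(retries_per_layer: Sequence[int]) -> int:
    """Upper bound on calls reaching the innermost dependency when every layer retries independently."""
    return prod(r + 1 for r in retries_per_layer)


def retry_ratio(invocations: int, retries: int) -> float:
    """Share of invocations that were retries; a sudden jump suggests a retry storm."""
    return retries / invocations if invocations else 0.0


# Three layers (e.g. API client -> function -> SDK call), each retrying twice.
print(worst_case_attempts([2, 2, 2]))  # 27
```

Three modest retry policies stacked together can multiply one failing request into 27 calls against an already struggling dependency, which is why the number of retries belongs on the checklist above.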

TROUBLESHOOTING
Bad things happen in serverless, too. Now it’s time to battle!

Failure modes of serverless
● https://github.com/adhorn/aws-lambda-chaos-injection
● https://github.com/gunnargrosch/

Failure modes of serverless
● Badly tuned memory
● Timeout
● Error in the code or in a managed resource
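Why badly tuned memory shows up on the bill: Lambda-style pricing charges compute roughly as memory (GB) times duration (seconds), i.e. GB-seconds. A minimal Python sketch of that arithmetic; the price per GB-second is a parameter you must supply, the durations are made up, and the sketch ignores per-request fees, free tiers and billing granularity.

```python
def gb_seconds(memory_mb: float, duration_ms: float, invocations: int) -> float:
    """Compute consumed, in GB-seconds: memory (GB) x duration (s) x invocation count."""
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations


def compute_cost(memory_mb: float, duration_ms: float, invocations: int,
                 price_per_gb_second: float) -> float:
    """Compute-only cost; the price is an input, not a real vendor rate."""
    return gb_seconds(memory_mb, duration_ms, invocations) * price_per_gb_second


# Hypothetical measurements for one million invocations of the same function.
# On AWS Lambda, CPU is allocated in proportion to memory, so duration may drop.
small = gb_seconds(memory_mb=128, duration_ms=2000, invocations=1_000_000)
large = gb_seconds(memory_mb=1024, duration_ms=200, invocations=1_000_000)
```

With these made-up durations the larger setting consumes fewer GB-seconds (200,000 vs 250,000); with other durations the opposite holds, which is why memory should be tuned from measurements rather than guessed.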

Consequences
● Downtime
● Huge bills
● Unhappy customers

Challenges of Troubleshooting
● How do you trace distributed async events when traces, metrics and logs are not aggregated?
● How do you trace requests to external resources?
● How do you trace async distributed events?

Distributed Tracing
● Trace distributed transactions: chains of multiple invocations.
● Understand what is wrong at a glance.
● But?! What if I have a bad coding practice in the code?
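A minimal Python sketch of what distributed tracing relies on underneath: each invocation opens a span, and the trace context travels downstream inside the event payload, so child spans share the trace ID and point at their parent. `start_span`, `finish_span`, the in-memory `SPANS` list and the `trace` payload key are hypothetical; real systems use a tracing SDK and a standard propagation format such as W3C Trace Context.

```python
import time
import uuid
from typing import Any, Dict, List, Optional

SPANS: List[Dict[str, Any]] = []  # hypothetical stand-in for a tracing backend


def start_span(name: str, parent: Optional[Dict[str, str]]) -> Dict[str, Any]:
    """Open a span; reuse the parent's trace ID, or start a new trace if there is none."""
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "start": time.perf_counter(),
    }


def finish_span(span: Dict[str, Any]) -> None:
    span["duration_ms"] = (time.perf_counter() - span["start"]) * 1000
    SPANS.append(span)


def reserve_stock(event: Dict[str, Any]) -> Dict[str, Any]:
    """Downstream function: picks up the trace context from its event."""
    span = start_span("reserve-stock", event.get("trace"))
    try:
        return {"reserved": event["items"]}
    finally:
        finish_span(span)


def checkout(event: Dict[str, Any]) -> Dict[str, Any]:
    """Entry function: starts the trace and forwards its context downstream."""
    span = start_span("checkout", event.get("trace"))
    try:
        # In a real system this payload would travel through a queue, stream or HTTP call.
        downstream = {"items": event["items"],
                      "trace": {"trace_id": span["trace_id"], "span_id": span["span_id"]}}
        return reserve_stock(downstream)
    finally:
        finish_span(span)


result = checkout({"items": 3})
```

Spans like these show which invocation in the chain is slow, but not which line inside it; that is the gap the slide's "But?!" points at.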

Local Tracing
● Instrument the code itself and check it against code quality.
● Good for:
  ○ Discovering bad coding practices
  ○ Seeing the values of local variables in the code
  ○ Debugging the code without breakpoints

Actionable Alerts in Serverless
● Alert on code errors, including:
  ○ Stack trace
  ○ The line of code that caused it
  ○ Values of local variables
● Alert on latencies and timeout errors:
  ○ Slow API communication
  ○ Slow DB interactions caused by bad queries
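A minimal Python sketch of an actionable error alert built only with the standard `traceback` module: `TracebackException.from_exception(..., capture_locals=True)` records the stack, the failing line and the `repr` of each frame's local variables. `build_error_alert`, `average_order_value` and `handler` are illustrative names; in practice, scrub secrets from captured locals before sending them anywhere.

```python
import traceback


def build_error_alert(exc: BaseException) -> dict:
    """Assemble an alert payload with stack trace, failing line and local variables."""
    tb = traceback.TracebackException.from_exception(exc, capture_locals=True)
    innermost = tb.stack[-1]  # the frame where the exception was raised
    return {
        "error": f"{type(exc).__name__}: {exc}",
        "function": innermost.name,
        "line_number": innermost.lineno,
        "source_line": innermost.line,
        "locals": dict(innermost.locals or {}),  # values are repr() strings
        "stack_trace": "".join(tb.format()),
    }


def average_order_value(total_cents: int, order_count: int) -> int:
    return total_cents // order_count  # fails when there are no orders


def handler(event: dict) -> dict:
    try:
        return {"aov": average_order_value(event["total_cents"], event["order_count"])}
    except ZeroDivisionError as exc:
        # A real handler would ship this payload to an alerting backend and re-raise.
        return {"alert": build_error_alert(exc)}


alert = handler({"total_cents": 5000, "order_count": 0})["alert"]
```

The alert then says not only that a `ZeroDivisionError` happened, but that it happened in `average_order_value` with `order_count` equal to `'0'`.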

How to respond to issues in serverless
● The issue may not be in your code.
● Check the third parties.
● Check other metrics:
  ○ Iterator age of streams
  ○ Throttles on resources
● Have some runbooks:
  ○ Exponential backoff for APIs, alternative APIs
  ○ Healthy on-call structures
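The "exponential backoff, alternative APIs" runbook item, sketched in Python: retry a flaky call with capped exponential backoff and full jitter, and switch to a fallback once the attempts run out. `retry_with_backoff`, its defaults, and the `flaky_api`/`always_down` stand-ins are hypothetical; only `ConnectionError` is treated as retryable here, which is an assumption you would adapt to your client library.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                       base_s: float = 0.1, cap_s: float = 5.0,
                       fallback: Optional[Callable[[], T]] = None,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Retry `call` on ConnectionError; after the last attempt use `fallback` or re-raise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                if fallback is not None:
                    return fallback()  # e.g. an alternative API
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            sleep(rng.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise AssertionError("unreachable")


# Usage with a hypothetical API that fails twice and then succeeds.
_outcomes = iter([ConnectionError("down"), ConnectionError("down"), {"status": 200}])


def flaky_api() -> dict:
    outcome = next(_outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


def always_down() -> dict:
    raise ConnectionError("down")


delays = []
result = retry_with_backoff(flaky_api, sleep=delays.append, rng=random.Random(7))
degraded = retry_with_backoff(always_down, max_attempts=3, sleep=delays.append,
                              fallback=lambda: {"status": 200, "source": "alternative API"})
```

The jitter spreads retries from many concurrent invocations over time, so they do not hit the recovering API in lockstep.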
