From Development to Production: Many Uses of Serverless Observability
EMRAH SAMDAN | SEPTEMBER 9, 2019
Community Day 2019 Sponsors
Who am I?
○ Developer for 6+ years ○ Product manager for 2 years ○ VP of Product for Thundra
@emrahsamdan
Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity.
Wikipedia: https://en.wikipedia.org/wiki/Serverless_computing
Operational construct? Things that run perfectly and that I don’t need to manage. Are Stripe and Auth0 serverless?
Utility computing? I only pay for what I use.
A doctrine? A thought model that helps you deliver faster and focus on the value you provide to your customers.
Ben Kehoe, Paul Johnston
traps that can harm its promise.
going on.
https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c
Known knowns: things that we understand and are aware of. Follow the metric charts.
Known unknowns: things that we are aware of but don’t understand at a glance. Yes, there is a peak over there. Let me dig into the traces and logs.
Unknown knowns: things that we could understand but are not aware of. I would have fixed this if I had that metric chart :(
Unknown unknowns: things we neither understand nor are aware of. Things are kaput and, with what I have, I have no freaking idea why.
○ Traces ○ Metrics ○ Logs ○ Visualization ○ Machine learning and insights ○ Alerts
be an overhead.
○ Gathering intelligence (should be acceptable) ○ Transporting it to where it is needed (you only have the invocation lifetime to send the data)
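The point above is that observability must not become an overhead: gathering data should be cheap, and the data has to leave the function within the invocation's lifetime. A minimal Python sketch of that idea, assuming a hypothetical `send_batch` sink in place of a real telemetry backend: the wrapper only appends to an in-memory buffer while the handler runs and flushes once in a `finally` block, so the data is shipped even when the handler raises.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a real telemetry backend.
SENT_BATCHES: List[List[Dict[str, Any]]] = []


def send_batch(batch: List[Dict[str, Any]]) -> None:
    """Pretend to ship one batch of telemetry; a real version would do network I/O."""
    SENT_BATCHES.append(list(batch))


def observed(handler: Callable[[dict, Any], Any]) -> Callable[[dict, Any], Any]:
    """Buffer telemetry during an invocation and flush it before returning."""

    @functools.wraps(handler)
    def wrapper(event: dict, context: Any) -> Any:
        buffer: List[Dict[str, Any]] = []
        start = time.perf_counter()
        try:
            return handler(event, context)
        except Exception as exc:
            buffer.append({"type": "error", "error": repr(exc)})
            raise
        finally:
            # Cheap work during the invocation: only append to a list.
            elapsed_ms = (time.perf_counter() - start) * 1000
            buffer.append({"type": "duration_ms", "value": elapsed_ms})
            # A single flush per invocation, before the environment may be frozen.
            send_batch(buffer)

    return wrapper


@observed
def handler(event: dict, context: Any) -> dict:
    return {"status": 200, "echo": event.get("msg")}
```

Batching into one flush keeps the per-call cost low; the trade-off is that data buffered by a process that crashes outright is lost.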
troubleshoot an unknown-unknown?
unknowns?
can you do?
Retrieved from: https://dzone.com/articles/what-is-structured-logging
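The cited article concerns structured logging. As an illustration only (the `order_id` field is hypothetical), here is a sketch using Python's standard `logging` module that emits one JSON object per line, so logs can be queried by field instead of grepped as free text.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger("structured-demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate, unstructured lines from the root logger
    return logger
```

Usage: `make_logger().info("order processed", extra={"fields": {"order_id": "o-123"}})` prints a single JSON line containing `message`, `level`, `timestamp` and `order_id`.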
○ How to mock cloud resources? Is it actually correct to mock them? ○ How should you test a chain of many invocations? ○ How to integrate it with CI/CD tools?
how?
resources.
transformed in the flow.
that you can never guess.
○ Setting up a test environment. Still?
○ Distributed ○ Many possible points of failure in an async environment ○ Event-driven (so, poison events) ○ Roles and permissions are so granular that access can slip away.
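Event-driven designs are exposed to poison events: a single malformed record that fails every time it is retried. A minimal sketch of one common mitigation, assuming a hypothetical `parse_amount` rule and an in-memory list standing in for a real dead-letter queue: process records individually and set failing ones aside with their error, so one bad event neither blocks nor endlessly retries the whole batch.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def process_batch(
    records: List[Record],
    process: Callable[[Record], None],
) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Process records one by one, setting poison events aside instead of failing the batch."""
    succeeded: List[Record] = []
    dead_letters: List[Tuple[Record, str]] = []  # stand-in for a real dead-letter queue
    for record in records:
        try:
            process(record)
            succeeded.append(record)
        except Exception as exc:  # a malformed record must not block the rest
            dead_letters.append((record, repr(exc)))
    return succeeded, dead_letters


def parse_amount(record: Record) -> None:
    # Hypothetical business rule: every record must carry a numeric "amount".
    float(record["amount"])
```

Keeping the error text next to each dead-lettered record is what makes the poison event diagnosable later.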
Lambda starts to respond slowly?
tuned timeouts?
to Redis?
Adrian Hornsby, Gunnar Grosch
○ API slowdown? ○ Throttling on any resource? ○ Bad coding practice?
○ Seasonal peak? ○ Successful product? ○ Retry storms?
with the function code.
○ Idly waiting for a third-party API ○ Throttled resource
○ Alert on transaction duration ○ Alert on function duration ○ Alert on operation duration
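The three alert levels above can be sketched as nested timers, each with its own threshold. This is a simplified, single-process illustration: the thresholds and span names are hypothetical, `ALERTS` stands in for a real alerting channel, and in a real deployment a transaction spans several invocations, so its duration would come from trace data rather than one `with` block.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List

ALERTS: List[str] = []  # stand-in for a real alerting channel


@contextmanager
def timed(name: str, threshold_ms: float) -> Iterator[None]:
    """Measure a block's duration and record an alert if it exceeds threshold_ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > threshold_ms:
            ALERTS.append(f"{name} took {elapsed_ms:.0f} ms (threshold {threshold_ms:.0f} ms)")


def handle_order() -> None:
    # Hypothetical names and thresholds for the transaction, function and operation levels.
    with timed("transaction:checkout", threshold_ms=1000):
        with timed("function:charge-card", threshold_ms=500):
            with timed("operation:payment-api-call", threshold_ms=50):
                time.sleep(0.1)  # simulate idly waiting on a slow third-party API
```

Here only the operation-level alert fires, which points directly at the slow API call instead of just reporting a slow function.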
times.
○ Sync events: you have to control retries yourself. ○ Async events: different retry mechanisms apply. ○ Stream-based events: risk of losing data.
○ Iterator age ○ Number of retries ○ Number of errors ○ Memory usage ○ Cold starts
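Of the metrics listed, cold starts are the easiest to capture inside the function itself. A common pattern, sketched here in Python under the assumption of a Lambda-style runtime that reuses the execution environment between invocations, is a module-level flag: module code runs once when the environment is initialized, so only the first invocation sees the flag set. The returned field names are hypothetical.

```python
import time
from typing import Any, Dict

# Module-level code runs once per execution environment, i.e. on a cold start.
_INIT_TIME = time.time()
_is_cold_start = True


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False  # every later invocation in this environment is warm
    # Hypothetical output fields; a real setup would emit these as metrics.
    return {"cold_start": cold, "environment_age_s": round(time.time() - _INIT_TIME, 3)}
```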
○ Badly tuned memory ○ Timeout ○ Error in code or in a managed resource
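Timeouts are easier to diagnose, and often to avoid, when the function watches its own deadline. In AWS Lambda's Python runtime the context object exposes `get_remaining_time_in_millis()`; the sketch below uses it to stop early and report unfinished work instead of being cut off mid-item. The `SAFETY_MARGIN_MS` value and the `items` event shape are assumptions for illustration.

```python
from typing import Any, Dict, List

SAFETY_MARGIN_MS = 2000  # hypothetical margin left for flushing state and telemetry


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Process items until the remaining time drops below a safety margin."""
    items: List[Any] = event.get("items", [])  # hypothetical event shape
    processed = 0
    for _item in items:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Stop cleanly instead of being killed mid-item by the timeout.
            break
        processed += 1  # real work on _item would go here
    return {"processed": processed, "remaining": items[processed:]}
```

Returning the unprocessed items makes a near-timeout visible and lets a follow-up invocation pick up the rest.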
○ Downtime ○ Huge bills ○ Unhappy customers
How to trace distributed async events when traces, metrics and logs are not aggregated? How to trace requests to external resources?
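The core of tracing across async hops is propagating a shared identifier through the events themselves. Real tracers (for example AWS X-Ray or OpenTelemetry) define their own propagation formats; this simplified sketch uses a hypothetical `trace_id` field and an in-memory `SPANS` list in place of a tracing backend, just to show the inject/extract pattern that lets a producer's and a consumer's spans be joined into one transaction.

```python
import uuid
from typing import Any, Dict, List, Optional

TRACE_KEY = "trace_id"  # hypothetical field name carried inside each event
SPANS: List[Dict[str, str]] = []  # stand-in for a tracing backend


def inject(payload: Dict[str, Any], trace_id: Optional[str] = None) -> Dict[str, Any]:
    """Attach a trace ID to an outgoing async event, starting a new trace if needed."""
    return {**payload, TRACE_KEY: trace_id or uuid.uuid4().hex}


def extract(event: Dict[str, Any]) -> str:
    """Read the trace ID from an incoming event, or start a new trace."""
    return event.get(TRACE_KEY) or uuid.uuid4().hex


def producer(order: Dict[str, Any]) -> Dict[str, Any]:
    event = inject({"order": order})
    SPANS.append({"trace_id": event[TRACE_KEY], "span": "producer"})
    return event  # in a real system: published to a queue or topic


def consumer(event: Dict[str, Any]) -> None:
    SPANS.append({"trace_id": extract(event), "span": "consumer"})
```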
transactions: chain of multiple invocations
wrong at a glance
have a bad coding practice in the code?
and check against code quality.
○ Bad coding practices ○ Values of local variables in the code ○ Debugging the code without breakpoints
○ Stack trace ○ Line of code that caused it ○ Values of local variables
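The three items above (stack trace, failing line, local variables) can all be recovered from a Python exception object via the standard `traceback` module and the traceback's frame. A minimal sketch, where `average_order_value` is a hypothetical function with a latent bug; real observability tools would also mask sensitive values before shipping them.

```python
import traceback
from typing import Any, Dict


def capture_error(exc: BaseException) -> Dict[str, Any]:
    """Summarize an exception: stack trace, failing line, and innermost local variables."""
    tb = exc.__traceback__
    if tb is None:
        return {"error": repr(exc), "stacktrace": [], "line": None, "function": None, "locals": {}}
    while tb.tb_next is not None:  # walk to the frame where the error was raised
        tb = tb.tb_next
    frame = tb.tb_frame
    return {
        "error": repr(exc),
        "stacktrace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "line": tb.tb_lineno,
        "function": frame.f_code.co_name,
        # repr() keeps the snapshot serializable; secrets should be masked in practice.
        "locals": {name: repr(value) for name, value in frame.f_locals.items()},
    }


def average_order_value(total: float, orders: int) -> float:
    # Hypothetical business function with a latent bug: no guard for zero orders.
    avg = total / orders
    return round(avg, 2)
```

Calling `average_order_value(50.0, 0)` and passing the resulting `ZeroDivisionError` to `capture_error` yields the function name, line number and `{"total": "50.0", "orders": "0"}`, which is the "debugging without breakpoints" view the slide describes.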
errors
○ Slow API communication ○ Slow DB interactions due to bad queries
○ Iterator age of streams ○ Throttles on resources
○ Exponential backoff for API calls, alternative APIs ○ Healthy on-call structures
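Exponential backoff for API calls also addresses the retry storms mentioned earlier. A minimal sketch of capped exponential backoff with "full jitter" (a random delay between zero and the capped exponential value); the defaults are illustrative assumptions, not values from the talk, and `sleep` and `rng` are injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry fn with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up; a fallback API or an alert takes over from here
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Full jitter spreads retries from many clients apart, avoiding retry storms.
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Without jitter, every client that failed at the same moment would retry at the same moment too, which is exactly how a retry storm builds up.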
engineering before your engineering runs into chaos!
discover local bottlenecks with an architectural view!
resources and APIs. Full tracing required!
that you can predict.