@aaronrinehart @verica_io #chaosengineering @aaronrinehart - PowerPoint PPT Presentation


  1. @aaronrinehart @verica_io #chaosengineering

  2. @aaronrinehart @verica_io #chaosengineering

  3. CONFIDENTIAL

  4. Security Precognition

  5. “Resilience is the story of the outage that never happened.” - John Allspaw

  6. About A.A.Ron
      ● CTO of Stealthy Startup
      ● Former Chief Security Architect @UnitedHealth responsible for security engineering strategy
      ● Led the DevOps and Open Source Transformation at UnitedHealth Group
      ● Former (DOD, NASA, DHS, CollegeBoard)
      ● Frequent speaker and author on Chaos Engineering & Security
      ● Pioneer behind Security Chaos Engineering
      ● Led ChaoSlingr team at UnitedHealth

  7. In this Session we will cover

  8. Our systems have evolved beyond human ability to mentally model their behavior.

  9. Our systems have evolved beyond human ability to mentally model their behavior. everyone

  10. Complex?
      ● Microservice Architectures
      ● Continuous Delivery
      ● Distributed Systems
      ● Automation Pipelines
      ● Containers
      ● Continuous Integration
      ● Blue/Green Deployments
      ● DevOps
      ● Immutable Infrastructure
      ● Cloud Computing
      ● Infracode
      ● CI/CD
      ● Service Mesh
      ● Auto Canaries
      ● API
      ● Circuit Breaker Patterns

  11. Security?
      ● Stateful in nature
      ● Mostly Monolithic
      ● Expert Systems
      ● Prevention focused
      ● Adversary Focused
      ● Poorly Aligned
      ● DevSecOps not widely adopted
      ● Defense in Depth
      ● Requires Domain Knowledge

  12. Simplify?

  13. Software Only Increases in Complexity

  14. Software Complexity: Essential Complexity ● Accidental Complexity

  15. Woods' Theorem: “As the complexity of a system increases, the accuracy of any single agent’s own model of that system decreases” - Dr. David Woods

  16. How well do you really understand how your system works?

  17. Difficult to Grok behavior

  18. So what does all of this have to do with Security?

  19. Failure Happens.

  20. Incidents & System Outages are Expensive

  21. Security Incidents are Subjective in Nature

  22. We really don't know very much Who? Where? Why? What? How?

  23. Let's face it, when outages happen…

  24. Teams spend too much time reacting to outages instead of building more resilient systems.

  25. “Response” is the problem with Incident Response

  26. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s ability to withstand turbulent conditions”

  27. [Diagram: Control vs. Experiment]

  28. Who is doing Chaos?

  29. Security Engineering

  30. Security Engineering

  31. People Operate Differently when they expect things to fail

  32. The Normal Condition of Humans & the Systems They Build is to FAIL

  33. We need failure to Learn & Grow

  34. Let's Flip the Model: Post Mortem = Preparation

  35. Bring Order through Chaos

  36. Use Chaos Engineering to initiate Objective Feedback Loops about Security Effectiveness

  37. Proactively Manage & Measure
      ● Validate Runbooks
      ● Measure Team Skills
      ● Determine Control Effectiveness
      ● Learn new insights into system behavior
      ● Transfer knowledge
      ● Build a learning culture

  38. Testing vs. Experimentation

  39. Security Crayon Differences
      ● Noisy distributed system behavior
      ● Not geared for Cascading Events
      ● Point-in-time even if Automated
      ● Performed by Security Teams with Specialized skill sets

  40. Security Chaos Differences
      ● Distributed Systems Focus
      ● Goal: Experimentation
      ● Human Factors focused
      ● Small Isolated Scope
      ● Focus on Cascading Events
      ● Performed by Mixed Engineering Teams in Gameday
      ● During business hours

  41. 2018 Causes of Data Breaches

  42. 2018 Causes of Data Breaches

  43. 2018 Causes of Data Breaches

  44. 2018 Causes of Data Breaches

  45. ‘Human Error’, Root Cause, & Blame Culture

  46. Proactively Manage & Measure

  47. Continuous SECURITY Validation

  48. Build Confidence in What Actually Works

  49. So how does it work?

  50. Stop looking for better answers and start asking better questions. - John Allspaw

  51. What is the system actually doing?

  52. What is the system actually doing? Has it done this before?

  53. What is the system actually doing? Has it done this before? Why is it behaving that way?

  54. What is the system actually doing? Has it done this before? Why is it behaving that way? What is it supposed to do next?

  55. What is the system actually doing? Has it done this before? Why is it behaving that way? What is it supposed to do next? How did it get into this state?

  56. How does My Security Really Work?

  57. What evidence do I have to prove it?

  58. An Open Source Tool

  59. ChaoSlingr Product Features
      • Serverless App in AWS
      • 100% Native AWS
      • Configurable Operational Mode & Frequency
      • Opt-In | Opt-Out Model
      • ChatOps Integration
      • Configuration-as-Code
      • Example Code & Open Framework
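
The Opt-In | Opt-Out model and Configuration-as-Code features listed on this slide can be illustrated with a small sketch. This is not ChaoSlingr's actual code: the tag names (`chaos-opt-in`, `chaos-opt-out`), the `Instance` type, and the `select_targets` function are all hypothetical, chosen only to show how an experiment runner might decide which resources it is allowed to touch.

```python
from dataclasses import dataclass

# Hypothetical tag names; the slides do not say how ChaoSlingr marks targets.
OPT_IN_TAG = "chaos-opt-in"
OPT_OUT_TAG = "chaos-opt-out"


@dataclass(frozen=True)
class Instance:
    """A stand-in for a cloud resource with key/value tags."""
    instance_id: str
    tags: dict


def select_targets(instances, mode="opt-in"):
    """Return the instances an experiment may run against.

    opt-in:  only resources explicitly tagged as willing participants.
    opt-out: every resource except those explicitly tagged as excluded.
    """
    if mode == "opt-in":
        return [i for i in instances if i.tags.get(OPT_IN_TAG) == "true"]
    if mode == "opt-out":
        return [i for i in instances if i.tags.get(OPT_OUT_TAG) != "true"]
    raise ValueError(f"unknown mode: {mode!r}")


inventory = [
    Instance("i-portal-1", {OPT_IN_TAG: "true"}),
    Instance("i-billing-1", {OPT_OUT_TAG: "true"}),
    Instance("i-batch-1", {}),
]

opted_in = [i.instance_id for i in select_targets(inventory, "opt-in")]
not_opted_out = [i.instance_id for i in select_targets(inventory, "opt-out")]
print("opt-in targets: ", opted_in)
print("opt-out targets:", not_opted_out)
```

Opt-in is the more conservative default for a first Gameday, since untagged resources are never touched; opt-out widens the scope once teams trust the experiments.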

  60. Misconfigured Port Injection
      Firewall? ● Config Mgmt? ● Log data? ● Alert SOC? ● IR Triage ● Wait...
      Hypothesis: If someone accidentally or maliciously introduced a misconfigured port, then we would immediately detect, block, and alert on the event.

  61. Misconfigured Port Injection
      Firewall? ● Config Mgmt? ● Log data? ● Alert SOC? ● IR Triage ● Wait...
      Result: Hypothesis disproved. Firewall did not detect or block the change on all instances. Standard Port AAA security policy out of sync on the Portal Team instances. Port change did not trigger an alert and log data indicated successful change audit. However, we unexpectedly learned the configuration mgmt tool caught the change and alerted the SOC.
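
The hypothesis and result on the two slides above can be expressed as a small, self-contained sketch of a hypothesis-driven experiment. It is an illustration only, not ChaoSlingr's implementation: the `Observation` and `Experiment` types and their field names are assumptions, and the observed values are simply the outcome the slide reports, entered by hand rather than collected from real infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """What each control did after the port change was injected (hypothetical fields)."""
    firewall_detected: bool
    firewall_blocked: bool
    firewall_alerted_soc: bool
    config_mgmt_alerted_soc: bool


@dataclass
class Experiment:
    name: str
    hypothesis: str
    findings: list = field(default_factory=list)

    def evaluate(self, obs: Observation) -> bool:
        """Return True only if every expectation in the hypothesis held."""
        expectations = {
            "firewall detected the change": obs.firewall_detected,
            "firewall blocked the change": obs.firewall_blocked,
            "port change triggered an alert": obs.firewall_alerted_soc,
        }
        # Record each expectation that failed, then any surprise we learned from.
        self.findings = [f"FAILED: {what}" for what, ok in expectations.items() if not ok]
        if obs.config_mgmt_alerted_soc:
            self.findings.append(
                "UNEXPECTED: configuration mgmt tool caught the change and alerted the SOC"
            )
        return all(expectations.values())


experiment = Experiment(
    name="Misconfigured Port Injection",
    hypothesis="A misconfigured port is immediately detected, blocked, and alerted on.",
)

# Values transcribed from the result slide, not measured by this script.
observed = Observation(
    firewall_detected=False,
    firewall_blocked=False,
    firewall_alerted_soc=False,
    config_mgmt_alerted_soc=True,
)

confirmed = experiment.evaluate(observed)
print(experiment.name, "->", "hypothesis confirmed" if confirmed else "hypothesis disproved")
for finding in experiment.findings:
    print(" -", finding)
```

Keeping the unexpected finding separate from the failed expectations mirrors the slide: the experiment disproved the hypothesis and still produced new knowledge about which control actually worked.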

  62. More Experiment Examples
      ● Software Secret Clear Text Disclosure
      ● Internet exposed Kubernetes API
      ● Permission collision in Shared IAM Role Policy
      ● Unauthorized Bad Container Repo
      ● Disabled Service Event Logging
      ● Unencrypted S3 Bucket
      ● Disable MFA
      ● Introduce Latency on Security Controls
      ● Bad AWS Automated Block Rule
      ● API Gateway Shutdown

  63. Q&A @aaronrinehart aaron@verica.io

  64. Thank you! @aaronrinehart aaron@verica.io

  65. CONFIDENTIAL
