SLIDE 1

Distributed Authorization System: A Netflix case study

Manish Mehta

  • Chief Security Architect @ Volterra

Torin Sandall

  • Co-founder of Open Policy Agent project
  • Software Engineer @ Styra

Velocity 2018

June 12-14

SLIDE 2

Velocity San Jose '18

Manish Mehta

Senior Security Engineer @ Netflix
Chief Security Architect @ Volterra
manish@ves.io

Projects:

  • Bootstrapping Identities
  • Secrets Management
  • PKI
  • Authentication
  • Authorization

Torin Sandall

Co-founder of the OPA project
Software Engineer @ Styra

Projects:

  • Open Policy Agent
  • Kubernetes
  • Istio (security SIG)
  • Likes: Go, Quality, Good abstractions

@sometorin @OpenPolicyAgent

SLIDE 3

Background - Definitions

Me → My Bank: "Transfer $1000 from Account X to Account Y"

  • 1. Verify the identity of the requester (Authentication, or AuthN)
  • 2. Verify that the requester is authorized to perform the requested operation (Authorization, or AuthZ)

These two steps do not need to be tied together!
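
The separation of the two steps can be sketched in a few lines of Python. This is only an illustration: the session table, the permission table, and every name in it (`SESSIONS`, `PERMISSIONS`, `handle`) are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass

# Hypothetical credential store and permission table, for illustration only.
SESSIONS = {"token-123": "me"}
PERMISSIONS = {("me", "transfer", "account-x")}


@dataclass(frozen=True)
class Request:
    token: str
    operation: str
    resource: str


def authenticate(token):
    """AuthN: map a credential to an identity, or None if it is unknown."""
    return SESSIONS.get(token)


def authorize(identity, operation, resource):
    """AuthZ: decide whether a known identity may perform the operation."""
    return (identity, operation, resource) in PERMISSIONS


def handle(request):
    # The two checks are independent functions, so either can be replaced
    # (for example, by a call to an external agent) without touching the other.
    identity = authenticate(request.token)
    if identity is None:
        return "401 unauthenticated"
    if not authorize(identity, request.operation, request.resource):
        return "403 forbidden"
    return "200 ok"
```

Because `authorize` never sees the credential, the authorization step can be moved elsewhere while authentication stays where it is.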

SLIDE 4

Background - Netflix Architecture

[Diagram: Customer, Employee, Netflix Backend (Internal Resources), Cloud Provider Resources, Partner Resources, CDN]

SLIDE 6

AuthZ Problem

A (simple) way to define and enforce rules that read:

  • Identity I
  • can/cannot perform
  • Operation O
  • on
  • Resource R

For ALL combinations of I, O, and R in the ecosystem.
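
One way to picture the (I, O, R) rule shape is as a small data structure. The sketch below is an assumption-laden illustration, not the system described in the talk: the `Rule` type, the first-match semantics, and the deny-by-default fallback are all choices made here for the example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    identity: str
    operation: str
    resource: str
    allowed: bool  # True for "can", False for "cannot"


def is_allowed(rules, identity, operation, resource):
    """Return the verdict of the first matching rule.

    Assumption for this sketch: when no rule matches, the request is denied.
    """
    for rule in rules:
        if (rule.identity, rule.operation, rule.resource) == (identity, operation, resource):
            return rule.allowed
    return False
```

An exhaustive table like this grows with every combination of I, O, and R, which is one reason a rule language is attractive at ecosystem scale.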

SLIDE 7

Design Considerations

Company Culture

  • Freedom and Responsibility

Resource Types

  • REST endpoints, gRPC methods, SSH, Crypto Keys, Kafka Topics, …

Identity Types

  • VM/Container Services, Batch Jobs, Employees, Contractors, …

Underlying Protocols

  • HTTP(S), gRPC, Custom/Binary, …

Implementation Languages

  • Java, Node.js, Python, Ruby, …

Latency

  • Call depth and Service rate

Flexibility of Rules

  • Hard-coded structure vs. language-based

Capture Intent

  • Did you actually do what you think you did?
  • Don’t just trust, verify!

SLIDE 8

High-level Architecture

[Architecture diagram, built up step by step in the original deck. Components shown: Policy Portal, Policy DB, Aggregator, Employee Management System, Build Manifest, Application Ownership DB, several Distributors, SSH, and Service A and Service B, each with App Code and an AuthZ Agent.]

SLIDE 14

AuthZ Agent Internals

[Diagram: the AuthZ Agent, containing the AuthZ Agent API, Stager, Updater, and Open Policy Agent Engine; a Request goes in and a Decision comes out.]

Periodic updates on policies and associated data
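
The slide names the agent's parts but not their interfaces, so the following Python sketch is purely illustrative. It assumes that an updater hands the agent a complete new policy-and-data snapshot, and that requests are always answered from whatever snapshot is currently in memory; the class and method names are hypothetical, and a plain callable stands in for the OPA engine.

```python
import threading


class AuthZAgent:
    """Toy agent: answers requests from an in-memory policy snapshot."""

    def __init__(self, policy, data):
        self._lock = threading.Lock()
        self._snapshot = (policy, data)

    def update(self, policy, data):
        """Called by a (hypothetical) updater when a new snapshot arrives."""
        with self._lock:
            self._snapshot = (policy, data)

    def decide(self, request):
        """Evaluate the request against the current snapshot only."""
        with self._lock:
            policy, data = self._snapshot
        # Evaluation happens outside the lock so updates are never blocked
        # by a slow policy.
        return policy(request, data)
```

Swapping the whole snapshot at once means a request never sees a new policy paired with stale data, or the reverse.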

SLIDE 15

Example Setup

[Diagram: the Payroll Service (App Code plus AuthZ Agent) exposes GET /getSalary/{user} and POST /updateSalary/{user}. Callers shown: Bob, Alice, the Report Generator, and the Performance Review application.]

Authorization Policy

  • 1. Employees can read their own salary and the salary of anyone who reports to them.
  • 2. The Report Generator job should be able to read all users' salaries.
  • 3. The Performance Review application should be able to update all users' salaries.

Requests shown: /getSalary/alice, /getSalary/bob, /getSalary/bob, /getSalary/*, /updateSalary/*
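
The three policy statements can be modeled directly in Python to check that they say what is intended. This is a sketch under stated assumptions: the reporting relation (Bob reports to Alice) is not given on this slide, and the identity strings for the two non-human callers are made up for the example.

```python
# Assumed reporting relation (not stated on this slide): Bob reports to Alice.
REPORTS_TO = {"bob": "alice"}
# Hypothetical identity names for the two non-human callers.
REPORT_GENERATOR = "report-generator"
PERFORMANCE_REVIEW = "performance-review"


def allow(identity, method, path):
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return False
    endpoint, user = parts
    if method == "GET" and endpoint == "getSalary":
        # Rule 1: own salary, or the salary of a direct report.
        if identity == user or REPORTS_TO.get(user) == identity:
            return True
        # Rule 2: the report generator may read every salary.
        return identity == REPORT_GENERATOR
    if method == "POST" and endpoint == "updateSalary":
        # Rule 3: the performance review application may update every salary.
        return identity == PERFORMANCE_REVIEW
    return False
```

Note that the model gives the report generator read access only; nothing in the three statements lets it update a salary.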

SLIDE 16

Open Policy Agent

SLIDE 17

What about RBAC?

SLIDE 18

RBAC solves XX% of the problem.

SLIDE 19

RBAC is not enough.

  • "QA must sign-off on images deployed to the production namespace."
  • "Analysts can read client data but PII must be redacted."
  • "Restrict employees from accessing the service outside of work hours."
  • "Allow all HTTP requests from 10.1.2.0/24."
  • "Restrict ELB changes to senior SREs that are on-call."
  • "Give developers SSH access to machines listed in JIRA tickets assigned to them."
  • "Prevent developers from running containers with privileged security contexts in the production namespace."
  • "Workloads for euro-bank must be deployed on PCI-certified clusters in the EU."

SLIDE 20

OPA is a general-purpose policy engine.

[Diagram: a Service sends a Policy Query to OPA and receives a Policy Decision; OPA is loaded with Policy (Rego) and Data (JSON).]

SLIDE 21

Decisions are decoupled from enforcement.

[Diagram: a Service sends a Policy Query to OPA and receives a Policy Decision; Enforcement happens in the Service. OPA is loaded with Policy (Rego) and Data (JSON).]
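
The split between deciding and enforcing can be sketched as two functions that share only a decision document. Everything here is illustrative: `decide` is a stand-in for a policy engine with a made-up rule (only GET is allowed), and the function names are not part of any OPA API.

```python
def decide(query):
    """Stand-in for the policy engine: returns a decision, takes no action."""
    return {"allow": query["method"] == "GET"}


def enforce(decision, handler):
    """Service-side enforcement: act on whatever decision was returned."""
    if decision.get("allow", False):
        return handler()
    return (403, "forbidden")


def serve(method, path):
    decision = decide({"method": method, "path": path})
    return enforce(decision, lambda: (200, "contents of " + path))
```

Since `enforce` only reads the decision, the rule inside `decide` can change without any change to the service code.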

SLIDE 22

Evaluate policies locally.

  • Daemon (HTTP API)
  • Library (Go)
  • Service Mesh (Istio)

[Diagram: two nodes, each running a Service alongside its own OPA.]
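
For the daemon option, a service talks to a local OPA over HTTP. The sketch below builds a query for OPA's Data API (`POST /v1/data/<path>` with the input wrapped in an `"input"` field) using only the Python standard library. The base URL and the policy path in the usage note are assumptions for illustration; they depend on how OPA is deployed and how the policy package is named.

```python
import json
import urllib.request


def build_query(base_url, policy_path, input_doc):
    """Build a POST to OPA's Data API: /v1/data/<path> with an "input" body."""
    url = base_url.rstrip("/") + "/v1/data/" + policy_path.strip("/")
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_opa(request, timeout=1.0):
    """Send the request; OPA wraps the decision in a "result" field.

    Returns None when the response carries no "result" (undefined decision).
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("result")
```

A call might look like `query_opa(build_query("http://localhost:8181", "payroll/allow", {"method": "GET", "path": ["salaries", "bob"], "user": "bob"}))`, where `payroll/allow` is a hypothetical rule path.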

SLIDE 23

[Diagram: each Service runs on the same node as its OPA, so host failures and network partitions affect both together.]

Fate Sharing

  ✔ Low latency
  ✔ High availability

SLIDE 24

Policy and data are stored in-memory. No external dependencies during enforcement.

[Diagram: a Service sends a Policy Query to OPA and receives a Policy Decision; OPA is loaded with Policy (Rego) and Data (JSON).]

SLIDE 25

Declarative Language (Rego)

  • Is Identity I allowed to perform Operation O on Resource R?
  • What labels must be applied to Deployment X?
  • Which users can SSH into production servers?

[Diagram: a Service sends a Policy Query to OPA and receives a Policy Decision; OPA is loaded with Policy (Rego) and Data (JSON).]

SLIDE 26

"Employees can read their own salaries and the salaries of their subordinates."

SLIDE 27

"Employees can read their own salaries [...]"

SLIDE 28

"Employees can read their own salaries [...]"

Input {"method": "GET", "path": ["salaries", "bob"], "user": "bob"}

SLIDE 29

"Employees can read their own salaries [...]"

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "bob"}
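
The rule works because `employee_id` is a variable: matching `input.path` binds it, and the last line then requires `input.user` to equal the same value. A rough Python model of that behavior is shown below; it is not how OPA evaluates Rego internally, only a hand-written equivalent for this one rule.

```python
def allow(request):
    """Model of the rule: GET on ["salaries", X] is allowed when user == X."""
    if request.get("method") != "GET":
        return False
    path = request.get("path", [])
    if len(path) != 2 or path[0] != "salaries":
        return False
    employee_id = path[1]  # the variable is bound by matching the path
    return request.get("user") == employee_id
```

With the slide's input (`user` is `"bob"`, path ends in `"bob"`) the model returns `True`; changing `user` to `"alice"` makes the final comparison fail.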

SLIDE 30

allow = true {
    input.method = "GET"
    input.path = ["salaries", "bob"]
    input.user = "bob"
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "bob"}

"Employees can read their own salaries [...]"

SLIDE 31

allow = true {
    input.method = "GET"              # OK
    input.path = ["salaries", "bob"]  # OK
    input.user = "bob"                # OK
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "bob"}

"Employees can read their own salaries [...]"

SLIDE 32

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

"Employees can read their own salaries [...]"

"alice" instead of "bob"

SLIDE 33

allow = true {
    input.method = "GET"              # OK
    input.path = ["salaries", "bob"]  # OK
    "alice" = "bob"                   # FAIL
}

"Employees can read their own salaries [...]"

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

"alice" instead of "bob"

SLIDE 34

allow = true {
    input.method = "GET"              # OK
    input.path = ["salaries", "bob"]  # OK
    "alice" = "bob"                   # FAIL
}

"Employees can read [...] the salaries of their subordinates."

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

"alice" instead of "bob"

SLIDE 35

"Employees can read [...] the salaries of their subordinates."

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

Data (in-memory) {"manager_of": {"bob": "alice", "alice": "janet"}}

SLIDE 36

"Employees can read [...] the salaries of their subordinates."

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = data.manager_of[employee_id]
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

Data (in-memory) {"manager_of": {"bob": "alice", "alice": "janet"}}
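
Two rule bodies with the same head behave like a logical OR: the request is allowed if either body succeeds. The Python model below mirrors that, using the `manager_of` data from the slide. It is a hand-written approximation of this specific policy, not a general Rego evaluator.

```python
MANAGER_OF = {"bob": "alice", "alice": "janet"}  # data shown on the slide


def allow(request, manager_of=MANAGER_OF):
    if request.get("method") != "GET":
        return False
    path = request.get("path", [])
    if len(path) != 2 or path[0] != "salaries":
        return False
    employee_id = path[1]
    user = request.get("user")
    # First rule body: the caller is the employee.
    if user == employee_id:
        return True
    # Second rule body: the caller is the employee's manager.
    # If the employee has no entry, the lookup is undefined and the body fails.
    return employee_id in manager_of and manager_of[employee_id] == user
```

As written, the second body checks one level of the hierarchy only: in this model Janet can read Alice's salary but not Bob's.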

SLIDE 37

"Employees can read [...] the salaries of their subordinates."

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

allow = true {
    input.method = "GET"
    input.path = ["salaries", "bob"]
    input.user = data.manager_of["bob"]
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

Data (in-memory) {"manager_of": {"bob": "alice", "alice": "janet"}}

SLIDE 38

"Employees can read [...] the salaries of their subordinates."

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

allow = true {
    input.method = "GET"
    input.path = ["salaries", "bob"]
    input.user = "alice"
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

Data (in-memory) {"manager_of": {"bob": "alice", "alice": "janet"}}

SLIDE 39

"Employees can read [...] the salaries of their subordinates."

allow = true {
    input.method = "GET"
    input.path = ["salaries", employee_id]
    input.user = employee_id
}

allow = true {
    input.method = "GET"              # OK
    input.path = ["salaries", "bob"]  # OK
    input.user = "alice"              # OK
}

Input {"method": "GET", "path": ["salaries", "bob"], "user": "alice"}

Data (in-memory) {"manager_of": {"bob": "alice", "alice": "janet"}}

SLIDE 40

deny {
    is_read_operation
    is_pii_topic
    not in_pii_consumer_whitelist
}

operation: Read
resource:
  name: credit-scores
  resourceType: Topic
session:
  principal:
    principalType: User
    name: CN=anon_producer,O=OPA
  clientAddress: 172.21.0.5

deny {
    not metadata.labels["qa-signoff"]
    metadata.namespace == "prod"
    spec.containers[_].privileged
}

metadata:
  name: nginx-149353-bvl8q
  namespace: production
spec:
  containers:
  - image: nginx
    name: nginx
    securityContext:
      privileged: true
  nodeName: minikube

allow {
    input.method = "GET"
    input.path = ["salary", user]
    input.user = user
}

method: GET
path: /salary/bob
service.source:
  namespace: production
  service: landing_page
service.target:
  namespace: production
  service: details
user: alice

allow {
    risk_score <= risk_budget
    count(plan_names["aws_iam"]) == 0
    blast_radius < 500
}

aws_autoscaling_group.lamb:
  availability_zones#: '1'
  availability_zones.3205: us-west-1a
  desired_capacity: '4'
  launch_configuration: kitten
  wait_for_capacity_timeout: 10m
aws_instance.puppy:
  ami: ami-09b4b74c
  instance_type: t2.micro

OPA enables flexible

  • RBAC
  • ABAC
  • Admission Control
  • Data Protection
  • Risk Management
  • ...

OPA supports any

  • Resource Type
  • Identity Type
  • Implementation Language
  • Underlying Protocol
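
To make one of these examples concrete, here is a Python model of the admission-control `deny` rule from this slide (missing `qa-signoff` label, namespace `prod`, at least one privileged container). It follows the rule text literally and is only a sketch: the helper name `_defined_and_not_false` is invented here to approximate how a Rego expression succeeds when a value is defined and not `false`.

```python
def _defined_and_not_false(value):
    """Approximate Rego truthiness: undefined (None) and false both fail."""
    return value is not None and value is not False


def deny(pod):
    metadata = pod.get("metadata", {})
    containers = pod.get("spec", {}).get("containers", [])
    # not metadata.labels["qa-signoff"]
    missing_signoff = not _defined_and_not_false(
        metadata.get("labels", {}).get("qa-signoff")
    )
    # metadata.namespace == "prod"
    in_prod = metadata.get("namespace") == "prod"
    # spec.containers[_].privileged: true if any container satisfies it
    any_privileged = any(
        _defined_and_not_false(c.get("privileged")) for c in containers
    )
    return missing_signoff and in_prod and any_privileged
```

Under this literal reading, a pod whose namespace is spelled `production` rather than `prod` would not be denied, so the exact string a policy compares against matters.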

SLIDE 41

  • Submillisecond Latency
  • Composition
  • External Context
  • Partial Evaluation
  • Rule Indexing
  • Tracing
  • Interactive Shell (REPL)
  • IDE Integrations (VS Code)
  • Test Framework
  • Coverage
  • Dependency Analysis

SLIDE 42

open-policy-agent/opa

SLIDE 43

Capturing Intent

SLIDE 45

Summary

Resource types

REST, gRPC method, SSH Login, Keys, Kafka Topics

Identity types

VM/Container Services, Batch Jobs, FTEs, Contractors

Underlying Protocols

HTTP, gRPC, SSH, Kafka Protocol

Implementation Languages

Java, Node.js, Ruby, Python

Latency

< 0.2 ms for basic policies

Flexibility of Rules

OPA Policy Engine

Company Culture

Policy Portal - Exercising Freedom, Responsibly

Capture Intent

Policy Portal UI hides Policy Syntax

SLIDE 46

Take Away

  • AuthZ is a fundamental security problem
  • Comprehensive solution gives better Control and Visibility
  • Get there faster with Open Source Tools (like OPA)
  • Get involved in communities (like PADME)

SLIDE 47

Questions?

(Volterra is hiring!)

Torin Sandall

@sometorin

Manish Mehta

manish@ves.io