SLIDE 1

Monitoring Containers with BPF
Jonathan Perry, Flowmill

SLIDE 2-10

Agenda / Claims
  • 1. Visibility into connections between services facilitates SRE/DevOps.
  • 2. Effective triage requires visibility into how network infrastructure affects services.
  • 3. It is easy to navigate large deployments by looking at neighborhoods.
  • 4. Connection visibility can point to failure domains: version, instance, zone.
  • 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
  • 6. Linux CLI provides great visibility without per-application changes.
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.
  • 8. BPF can handle encrypted connections (with uprobes)

SLIDE 11

Hi! I’m Jonathan Perry

jperry@flowmill.com | www.flowmill.com

  • Government: large-scale deployments
  • MIT: extreme monitoring systems
    ○ prod at Facebook
  • Flowmill: founder

SLIDE 12

Demo application

github.com/GoogleCloudPlatform/microservices-demo

SLIDE 13
SLIDE 14
SLIDE 15
  • 1. Visibility into connections between services facilitates SRE/DevOps.
  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Self-DDoS

Your Services

SLIDE 16-17
  • 1. Visibility into connections between services facilitates SRE/DevOps.

v3→v4

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Self-DDoS

Your Services

SLIDE 18
  • 1. Visibility into connections between services facilitates SRE/DevOps.

this shows the components, but how do they interact?

SLIDE 19-20
  • 1. Visibility into connections between services facilitates SRE/DevOps.

SLIDE 21-22
  • 1. Visibility into connections between services facilitates SRE/DevOps.

API v1 API v2

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Self-DDoS

Your Services

SLIDE 23-24
  • 1. Visibility into connections between services facilitates SRE/DevOps.

us-east us-west

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Self-DDoS

Your Services

SLIDE 25-26
  • 1. Visibility into connections between services facilitates SRE/DevOps.

v3 v4
 canary

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Pinpoint Self-DDoS

Your Services

SLIDE 27-28
  • 1. Visibility into connections between services facilitates SRE/DevOps.

Database

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Pinpoint Self-DDoS

Your Services

SLIDE 29
  • 1. Visibility into connections between services facilitates SRE/DevOps.
  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Pinpoint Self-DDoS

Your Services

SLIDE 30
  • 1. Visibility into connections between services facilitates SRE/DevOps.

Discover every service dependency
Rate/Error/Duration for all service pairs

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Pinpoint Self-DDoS

Your Services

SLIDE 31
  • 1. Visibility into connections between services facilitates SRE/DevOps.

Add lots of logging, tracing, metrics and documentation
Discover every service dependency
Rate/Error/Duration for all service pairs

  • Share architecture knowledge
  • Deprecate, migrate services
  • Verify HA deployment
  • Track SLOs between services
  • Pinpoint Self-DDoS

Your Services

SLIDE 32-34
  • 2. Effective triage requires visibility into how network infrastructure affects services.
  • Network connectivity failures
  • Incorrect security rules
  • Service discovery
  • Expensive cross zone traffic

Network Infrastructure

SLIDE 35
  • 2. Effective triage requires visibility into how network infrastructure affects services.

DNS

  • Network connectivity failures
  • Incorrect security rules
  • Service discovery
  • Expensive cross zone traffic

Network Infrastructure

SLIDE 36-37
  • 2. Effective triage requires visibility into how network infrastructure affects services.

us-east us-west

  • Network connectivity failures
  • Incorrect security rules
  • Service discovery
  • Expensive cross zone traffic

Network Infrastructure

SLIDE 38
  • 2. Effective triage requires visibility into how network infrastructure affects services.
  • Network connectivity failures
  • Incorrect security rules
  • Service discovery
  • Expensive cross zone traffic

Network Infrastructure

SLIDE 39
  • 2. Effective triage requires visibility into how network infrastructure affects services.

Detect network infrastructure problems

  • Network connectivity failures
  • Incorrect security rules
  • Service discovery
  • Expensive cross zone traffic

Network Infrastructure

SLIDE 40
  • 2. Effective triage requires visibility into how network infrastructure affects services.

ssh, tcpdump, netstat, traceroute
Detect network infrastructure problems

  • Network connectivity failures
  • Incorrect security rules
  • Service discovery
  • Expensive cross zone traffic

Network Infrastructure

SLIDE 41-43
  • 3. It is easy to navigate large deployments by looking at neighborhoods.

Even small deployments can have
 complex connectivity

SLIDE 44
SLIDE 45
SLIDE 46-49
  • 3. It is easy to navigate large deployments by looking at neighborhoods.

Neighborhood:
 All the services up to N hops from the selection

  • 1. Search
  • 2. Detected anomalies
  • 3. Alerts (Slack/PagerDuty etc.)
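
To make the "up to N hops" definition concrete, here is a small illustrative sketch in Python: a breadth-first walk over a service-dependency graph. The edge list of observed service-to-service connections is hypothetical, not part of the talk.

    from collections import deque

    def neighborhood(edges, start, hops):
        """Return all services within `hops` hops of `start`, treating edges as undirected."""
        adj = {}
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == hops:
                continue
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return seen

    # e.g. neighborhood([("frontend", "checkout"), ("checkout", "payment")], "frontend", 1)
    # -> {"frontend", "checkout"}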

SLIDE 50
  • 4. Connection visibility can point to failure domains: version, instance, zone.

Got an alert / anomaly. Now what?

Common causes:
  • New version deploy
  • Overloaded / borked instance
  • Geo / zone failure

Or, it is helpful to know whether the failure is concentrated on a single:
  • Container spec
  • Process
  • Port

SLIDE 51
SLIDE 52
SLIDE 53-61

Agenda / Claims
  • 1. Visibility into connections between services facilitates SRE/DevOps.
  • 2. Effective triage requires visibility into how network infrastructure affects services.
  • 3. It is easy to navigate large deployments by looking at neighborhoods.
  • 4. Connection visibility can point to failure domains: version, instance, zone.
  • 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
  • 6. Linux CLI provides great visibility without per-application changes.
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.
  • 8. BPF can handle encrypted connections (with uprobes)

SLIDE 62

Each is great for its own use case!

  • Logs: low barrier, app internals
  • Metrics: dashboards on internals & business metrics
  • Tracing: cross-service examples of bad cases
  • Service mesh: aggregated connectivity, security, circuit breaking

But…

  • 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.

Logs, metrics, tracing, service meshes

SLIDE 63

Cons:

  • Engineering time: requires per-service work (and maintenance)
  • Performance and cost
  • No infra visibility (drops, RTT)
  • Logs+Metrics: service-centric, not connection-centric
  • Tracing: sampling, cost
  • 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
SLIDE 64-72
  • 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.

Service mesh caveats

[Diagram: two services, each with an Envoy sidecar (HTTP connection manager, clusters)]

  • misconfigured mesh → broken telemetry
    ○ want telemetry from a different source to debug the mesh
  • partial deployments & managed services
  • no transport layer data (packet drops, RTT)
  • doesn’t solve the analysis part. Data is either
    ○ too aggregated - missing info on failure domains (version, zone, node)
    ○ too detailed (access logs) - still need to process 100k+ events/sec
  • eBPF user probes can efficiently get data from mesh and transport layer

SLIDE 73-75
  • 6. Linux CLI provides great visibility without per-application changes.

Socket:
  Timestamp    Source          Destination    Ports      Bytes  Drops  RTT
  1418530010   172.31.16.139   172.31.16.21   20641→22   4249   2      4 ms

Protocol:
  Method  Endpoint           Code
  GET     checkout?q=hrz4N   200

K8s:
  IP              Pod               Image            Tag      Zone
  172.31.16.139   frontend          frontend-image   v1.16    us-west-1c
  172.31.16.21    checkoutservice   checkout-image   v2.12a   us-west-1a

Joined:
  Timestamp    Source     Destination   Ports      Bytes  Drops  RTT
  1418530010   frontend   checkout      20641→22   4249   2      4 ms
               frontend-image v1.16 us-west-1c → checkout-image v2.12a us-west-1a
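
A rough sketch of the "Joined" step in Python, only to illustrate the enrichment: the record and the pod index are hypothetical dictionaries shaped like the Socket and K8s tables above, and field names are made up.

    # Hypothetical socket record and K8s pod index, shaped like the tables above.
    flow = {"timestamp": 1418530010, "src": "172.31.16.139", "dst": "172.31.16.21",
            "sport": 20641, "dport": 22, "bytes": 4249, "drops": 2, "rtt_ms": 4}

    pods_by_ip = {
        "172.31.16.139": {"pod": "frontend", "image": "frontend-image", "tag": "v1.16", "zone": "us-west-1c"},
        "172.31.16.21": {"pod": "checkoutservice", "image": "checkout-image", "tag": "v2.12a", "zone": "us-west-1a"},
    }

    def enrich(flow, pods_by_ip):
        """Attach pod/image/version/zone metadata to both ends of a socket record,
        so flows can be grouped by failure domain (version, instance, zone)."""
        enriched = dict(flow)
        enriched["src_meta"] = pods_by_ip.get(flow["src"], {"pod": "unknown"})
        enriched["dst_meta"] = pods_by_ip.get(flow["dst"], {"pod": "unknown"})
        return enriched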

SLIDE 76-86
  • 6. Linux CLI provides great visibility without per-application changes.

Getting Flow Data

iptables

A  B  X     A→X     A→B
(A,X) ~ (A,B)

$ kubectl describe pod $POD
Name:           A
Namespace:      staging
...
Status:         Running
IP:             100.101.198.137
Controlled By:  ReplicaSet/A

# PID=`docker inspect -f '{{.State.Pid}}' $CONTAINER` \
    nsenter -t $PID -n ss -ti
ESTAB 0 0 100.101.198.137:34940 100.65.61.118:8000
      cubic wscale:9,9 rto:204 rtt:0.003/0 mss:1448 cwnd:19 ssthresh:19 bytes_acked:2525112
      segs_out:15664 segs_in:15578 data_segs_out:15662 send 73365.3Mbps lastsnd:384
      lastrcv:10265960 lastack:384 rcv_space:29200 minrtt:0.002

# conntrack -L
tcp 6 86399 ESTABLISHED src=100.101.198.137 dst=100.65.61.118 sport=34940 dport=8000
      src=100.101.198.147 dst=100.101.198.137 sport=8000 dport=34940 [ASSURED] mark=0 use=1
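
A rough Python sketch of using conntrack to undo the iptables translation: it maps the connection as the client socket sees it (A→X) to the address that actually replied (B). It assumes the conntrack CLI is installed and prints ESTABLISHED lines in the format shown above; field order can differ across versions.

    import re
    import subprocess

    # Each line carries the original direction (what A's socket sees) followed by
    # the reply direction (the real peer B answering back to A).
    TUPLES = re.compile(
        r"src=(?P<osrc>\S+) dst=(?P<odst>\S+) sport=(?P<osport>\d+) dport=(?P<odport>\d+).*?"
        r"src=(?P<rsrc>\S+) dst=(?P<rdst>\S+) sport=(?P<rsport>\d+) dport=(?P<rdport>\d+)")

    def nat_map():
        """Map (src, dst, sport, dport) as seen by the client to the real peer (ip, port)."""
        out = subprocess.run(["conntrack", "-L"], capture_output=True, text=True).stdout
        table = {}
        for line in out.splitlines():
            m = TUPLES.search(line)
            if not m:
                continue
            orig = (m["osrc"], m["odst"], int(m["osport"]), int(m["odport"]))
            table[orig] = (m["rsrc"], int(m["rsport"]))   # e.g. (A, X, 34940, 8000) -> (B, 8000)
        return table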

SLIDE 87-91
  • 6. Linux CLI provides great visibility without per-application changes.

CLI tools have disadvantages
  • Performance:
    ○ iterates over all sockets
    ○ built for CLI use (printfs)
  • Coverage: Linux CLI tools are polling based

[Diagram: timeline of repeated polls; a socket that opens and closes between two polls is never seen]

→ Misses events between polls

SLIDE 92-93

Enter eBPF
  • Linux bpf() system call since 3.18
  • Run code on kernel events
  • Only changes, more data
  • Safe: In-kernel verifier, read-only
  • Fast: JIT-compiled

Unofficial BPF mascot by Deirdré Straughan

→ 100% coverage + no app changes + low overhead ftw!

SLIDE 94-99
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.

Using eBPF

tcptop:
  • instruments tcp_sendmsg and tcp_cleanup_rbuf
  • need to be careful of races:

    # IPv4: build dict of all seen keys
    ipv4_throughput = defaultdict(lambda: [0, 0])
    for k, v in ipv4_send_bytes.items():
        key = get_ipv4_session_key(k)
        ipv4_throughput[key][0] = v.value
    ipv4_send_bytes.clear()

    While the for loop is running, the kernel continues to update the map; clear() throws those updates out.
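
For illustration, a minimal BCC sketch in the same spirit as tcptop (not the Flowmill agent): a kprobe on tcp_sendmsg accumulates per-process send bytes in a BPF hash map, and user space drains it once per second. The read-then-clear at the bottom has the same race discussed above.

    from bcc import BPF
    from time import sleep

    prog = r"""
    #include <uapi/linux/ptrace.h>
    #include <net/sock.h>

    BPF_HASH(send_bytes, u32, u64);   // pid -> bytes sent since last drain

    int on_tcp_sendmsg(struct pt_regs *ctx, struct sock *sk, struct msghdr *msg, size_t size) {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        send_bytes.increment(pid, size);
        return 0;
    }
    """

    b = BPF(text=prog)
    b.attach_kprobe(event="tcp_sendmsg", fn_name="on_tcp_sendmsg")

    while True:
        sleep(1)
        table = b["send_bytes"]
        for pid, nbytes in sorted(table.items(), key=lambda kv: -kv[1].value):
            print(f"pid {pid.value}: {nbytes.value} bytes in the last second")
        table.clear()   # racy: updates that land during the loop above are dropped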

SLIDE 100-103
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.

System architecture

[Diagram components:]
  • Agent on each Linux host: containers, processes, sockets, NAT; orchestrator metadata from ECS / Kubernetes / Docker
  • Flow Collection
  • Flow Analysis: Match, Enrich, Aggregate
  • Statistics Engine; Alerting / Webhooks
  • TSDB (Prometheus); API client (REST, gRPC)
  • UI API: timeseries, autocomplete, map, monitors, events, dashboards

SLIDE 104-105
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.

Evaluation: CPU overhead

using perf and FlameGraph [1]
  • To record: perf record -a -g -e cycles -c 5000000 -- sleep 60
  • Post-process: perf script | FlameGraph/stackcollapse-perf.pl > raw.txt
  • Analyze: grep -E '(cleanup_module|flowmill_agent)' raw.txt | FlameGraph/flamegraph.pl > flame.svg

→ observed 0.1% - 0.25% CPU overhead across deployments

Most aggressive customer load test:

                 Node             Application     TCP stack       Collector
  M cycles (%)   480,000 (100%)   220,775 (46%)   27,135 (5.6%)   4,120 (0.86%)

[1] github.com/brendangregg/FlameGraph

SLIDE 106-107
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.

Evaluation: Network overhead

Connection visibility → also for telemetry connections

Megabytes / second:
                      App throughput   Flow telemetry   %
  Cluster 1           186.2            0.85             0.46%
  Cluster 2           217.1            2.49             1.15%
  Cluster 3           249.6            0.25             0.10%
  Cluster 4 (batch)   522.0            0.16             0.031%
  Cluster 5           183.0            0.02             0.013%

→ Usually < 0.5% network overhead, outliers ~1%

SLIDE 108-110
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.

Evaluation: Backend QPS

Agent event counts (per second):
                      TCP      UDP    NAT    process   container   DNS    Total events/s per agent
  Company A           1429.2   82.0   20.8   146.5     0.014       10.5   1689.014
  Company B           4017.3   89.0   —      1562.1    1.98        —      5670.38
  Company C (batch)   51.0     28.8   1.05   43.8      0.55        0.5    125.7

→ For a 50-node cluster, need to process 84.4k-283.5k QPS
  (~20x less for batch workloads)
→ C++ analysis pipeline: hundreds of nodes w/ 2-second latency
  (thousands soon)
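
The cluster-level figures are just the per-agent totals from the table scaled by node count; a quick check of the 50-node numbers:

    # Per-agent totals from the table above, scaled to a 50-node cluster.
    per_agent = {"Company A": 1689.014, "Company B": 5670.38, "Company C (batch)": 125.7}
    for name, events_per_sec in per_agent.items():
        print(f"{name}: {events_per_sec * 50:,.0f} events/s")
    # Company A: 84,451   Company B: 283,519   Company C (batch): 6,285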

SLIDE 111
  • 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.

Evaluation
  • CPU:
    ○ observed 0.1% - 0.25% CPU overhead across deployments
    ○ 0.86% max load test
  • Network:
    ○ Usually < 0.5% network overhead, outliers ~1%
  • QPS:
    ○ ~100k QPS for 50-node cluster
    ○ can handle 100s of nodes with 2-second latency

SLIDE 112
  • 8. BPF can handle encrypted connections (with uprobes)

Visibility for encrypted traffic (TLS/mTLS)
  • eBPF supports user probes → get payload info from userspace
  • Demo

$ go tool nm /root/hello | grep 'net/http\.'
  690a40 t net/http.Error
  64eee0 t net/http.Get
  6929e0 t net/http.HandleFunc
  6b6230 t net/http.Handler.ServeHTTP-fm
  6909e0 t net/http.HandlerFunc.ServeHTTP
  6805b0 t net/http.Header.Add
  680700 t net/http.Header.Del
  680690 t net/http.Header.Get
  680620 t net/http.Header.Set
  680750 t net/http.Header.Write
  681190 t net/http.Header.WriteSubset
  680840 t net/http.Header.clone

$ /funccount -p 31328 '/root/hello:net/http.*Header*'
Tracing 111 functions for "/root/hello:net/http.*Header*"... Hit Ctrl-C to end.
^C
FUNC                                           COUNT
net/http.Header.Del                                3
net/http.Header.sortedKeyValues                    3
net/http.Header.WriteSubset                        3
net/http.(*response).WriteHeader                   3
net/http.extraHeader.Write                         3
net/http.(*chunkWriter).writeHeader                3
net/http.(*chunkWriter).writeHeader.func1          3
Detaching...
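
A minimal BCC uprobe sketch in the same direction as the funccount run above: it counts calls to one net/http symbol inside the demo binary, before any TLS encryption happens. The binary path and symbol are taken from the `go tool nm /root/hello` output on this slide and assume the binary keeps its symbol table; a fuller version would also read the function's arguments rather than just count calls.

    from bcc import BPF
    from time import sleep

    prog = r"""
    #include <uapi/linux/ptrace.h>

    BPF_HASH(hits, u32, u64);    // pid -> call count

    int on_serve(struct pt_regs *ctx) {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        hits.increment(pid);
        return 0;
    }
    """

    b = BPF(text=prog)
    # Symbol from the `go tool nm` output above; adjust for your own binary.
    b.attach_uprobe(name="/root/hello", sym="net/http.HandlerFunc.ServeHTTP", fn_name="on_serve")

    while True:
        sleep(1)
        for pid, count in b["hits"].items():
            print(f"pid {pid.value}: {count.value} handler calls")
        b["hits"].clear()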

SLIDE 113
SLIDE 114
SLIDE 115-118

Connection+Infra / BPF monitoring
(Summary)

No code changes
Negligible overhead

Discover every service dependency
Rate/Error/Duration for all service pairs
Detect network infrastructure problems

Questions? Please come say hi at the booth!