Auto-sizing for Stream Processing Applications at LinkedIn
Rayman Preet Singh, Bharath Kumarasubramanian, Prateek Maheshwari, and Samarth Shetty
Stream Processing @ LinkedIn
[Figure: stream processing example; labels: App, Skills, Jobs, Top Skills]
Samza: Stateful Scalable Stream Processing at LinkedIn
Stream Processing App Example
Real-time distributed tracing for web performance and efficiency optimizations (LinkedIn Engineering Blog)
[Figure: service call graph; labels: Front-end, Feed Service, URN Resolution Service, Profile Service, Notification Service, DB Service, Mini-profile Service, Graph Service]
Stream Processing App Example
LinkedIn Sales and EMEA Blog
Stream Processing App Example
Notification, monitoring, recommendation, fraud-detection, search, …
Stream Processing as a Service
[Figure: App-1, App-2, App-3 running on the service]
Users: app developers, data scientists, …
APIs, capacity provisioning, security & privacy, operational ease, scalability, fault-tolerance, efficiency, performance, …
Throughput, latency
Parallelism: CPU cores, #threads, …
Memory: heap, native, …
Specialized hardware: GPUs, RDMA, …
Over-provisioning: 50% of users by approx. 50%, Google-Autopilot [EuroSys ’20], …
Under-provisioning: OOMs, stalls, failures, under-performing, …
App Controller
Throughput, latency goals
Input load
App internals
Environmental conditions: dependency-service latencies, network latencies, …
Hardware, software evolution
…
Sizing parameters
Apps are DAGs of cataloged operators
SoCC ’17, VLDB ’17, ToN ’17, OSDI ’18, ICDE ’15, ICDE ’20, IC2E ’16, …
Tune parallelism
Optimize throughput, latency, utilization, time-taken
Arrival rates, service-times follow specific distributions
ToN ’17, ICDE ’15, …
Poisson, exponential, …
Tune parallelism: queuing theory, hill-climb, …
[Figure: example DAG of operators; labels: Map, Join, Filter]
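To make the distributional assumption above concrete, here is a minimal sketch of queuing-theory-based parallelism tuning. It assumes Poisson arrivals and exponential service times (an M/M/c queue) and uses the standard Erlang C formula; it is not taken from any of the cited systems. The function names and the example rates are illustrative.

```python
import math


def mean_queue_wait(arrival_rate, service_rate, servers):
    """Mean queueing delay (seconds) in an M/M/c queue, via the Erlang C formula.

    arrival_rate: messages per second (lambda)
    service_rate: messages per second that one worker can process (mu)
    servers:      number of parallel workers (c)
    """
    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / c
    if utilization >= 1.0:
        return math.inf                          # queue grows without bound

    # Accumulate sum_{k=0}^{c-1} a^k / k! iteratively (avoids large factorials).
    term = 1.0
    partial_sum = 0.0
    for k in range(servers):
        partial_sum += term
        term *= offered_load / (k + 1)
    # Here term == a^c / c!.
    tail = term / (1.0 - utilization)
    prob_wait = tail / (partial_sum + tail)      # probability a message must queue
    return prob_wait / (servers * service_rate - arrival_rate)


def min_parallelism(arrival_rate, service_rate, max_wait, limit=10_000):
    """Smallest worker count whose mean queueing delay is at most max_wait."""
    servers = max(1, math.floor(arrival_rate / service_rate) + 1)
    while servers <= limit:
        if mean_queue_wait(arrival_rate, service_rate, servers) <= max_wait:
            return servers
        servers += 1
    raise ValueError("no feasible parallelism within limit")


if __name__ == "__main__":
    # Illustrative numbers only: 900 msg/s input, 100 msg/s per worker, 5 ms wait budget.
    print(min_parallelism(900.0, 100.0, 0.005))
```

A model like this is only as good as its assumptions: it needs a single well-defined service rate per worker and memoryless arrivals and service times.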
[Figure: app DAG (Op1, Op2, Op3, UDF) calling remote services: Web Service, Blob Storage, KV Store, …]
[Figure: CDF of service time (ms) for App 1, App 2, App 3, App 4; x-axis: service time (ms), 1 to 1000]
Service time depends on remote services’ latencies, error-rates & retries, network latencies, …
No specific distribution of service-times
Throughput depends on input load variation and remote services’ throughput
[Figure: time-series of input load (messages per sec) for sample apps]
No specific distributions of arrival-rates
[Figure: CDF of arrival rate (messages/sec) for App 1, App 2, App 3, App 4; x-axis: arrival rate (messages/sec), 10^5 to 10^6]
Additional functionalities
External frameworks: TensorFlow, DL4j, …
Out-of-order processing
Input priorities
State
User-defined functions (UDFs)
Customized input checkpointing
…
[Figure: app DAG (Op1, Op2, Op3, UDF) with Periodic UDF, Client Cache, State, and External Frameworks, calling Web Service, Blob Storage, KV Store, …]
Heterogeneous combinations of functionalities: apps combine operators and functionalities in different ways
DAG-only models are insufficient
CPU bottleneck → input buffering → memory use → lowered throughput → …
Java-based apps: Apache Flink, Samza, …
Memory bottleneck → GC overhead → low throughput, high latency & CPU utilization
[Figure: CDF of application p50 service time (ms); x-axis: application p50 service time (ms), 0.01 to 100000; y-axis: fraction of applications]
[Figure: CDF of number of input streams (per application); x-axis: number of input streams, 1 to 1000; y-axis: fraction of applications]
[Figure: CDF of application state size (MB); x-axis: application state (in MB), 0.1 to 1x10^7; y-axis: fraction of applications]
App Controller
Right size vs. optimal size
Operational ease: interpretable, safe-trajectory
Minimize time-taken
Scalable, fault-tolerant, efficient, …
Sizing parameters
Black-box approaches
Azure-VMSS, AWS-EC2 autoscale, Dhalion [VLDB ’17], …
Interpretable
Right sizing
Time-taken, oscillations [DS2 OSDI ’18, Turbine ICDE ’20]
Undo, redo, refine, …
Optimization approaches
Bilal et al. SoCC ’17, Gencer et al. Middleware ’15, …
Training data (trial runs), parameter & criteria tuning, assumptions, …
Optimal sizing, minimize time-taken
Operability (interpretable actions), service dependencies, network, …
Feedback control system
Policies encapsulate strategies for sizing a single resource
Policies are applied in priority order
Applied periodically on all apps
Applied only if there is no in-flight action on the app
Policy priority order
Deterministic: interpretable, modifiable, …
Programmability for policies
Tailored to continuous-operator systems like Apache Samza, Flink, …
P1: Memory scale-up
P2: CPU scale-up
P3: Parallelism tuning
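The priority-ordered loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production controller: the metric names (heap_used_fraction, cpu_used_fraction, backlog_growth), the thresholds, and the AppState and Policy types are all hypothetical. Only the P1, P2, P3 ordering and the "skip apps with an in-flight action" rule come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AppState:
    name: str
    metrics: Dict[str, float]                # hypothetical metric names, see policies below
    inflight_action: Optional[str] = None    # set while a sizing action is being applied


@dataclass
class Policy:
    name: str
    propose: Callable[[AppState], Optional[str]]  # returns an action, or None if not triggered


# Each policy sizes exactly one resource. Thresholds are illustrative assumptions.
def memory_scale_up(app: AppState) -> Optional[str]:
    return "increase memory" if app.metrics.get("heap_used_fraction", 0.0) > 0.9 else None


def cpu_scale_up(app: AppState) -> Optional[str]:
    return "increase CPU" if app.metrics.get("cpu_used_fraction", 0.0) > 0.9 else None


def parallelism_tuning(app: AppState) -> Optional[str]:
    return "increase parallelism" if app.metrics.get("backlog_growth", 0.0) > 0.0 else None


# Fixed, deterministic priority order: P1 before P2 before P3.
POLICIES: List[Policy] = [
    Policy("P1: memory scale-up", memory_scale_up),
    Policy("P2: CPU scale-up", cpu_scale_up),
    Policy("P3: parallelism tuning", parallelism_tuning),
]


def control_pass(apps: List[AppState], policies: List[Policy] = POLICIES) -> Dict[str, str]:
    """One periodic pass over all apps; returns the action chosen per app, if any."""
    actions: Dict[str, str] = {}
    for app in apps:
        if app.inflight_action is not None:
            continue                         # never stack actions on the same app
        for policy in policies:
            action = policy.propose(app)
            if action is not None:
                app.inflight_action = action
                actions[app.name] = action
                break                        # highest-priority triggered policy wins
    return actions
```

Because the order is fixed and each policy is a plain function, the reason for any action can be read off directly, and a policy can be modified or replaced without touching the loop.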
Straggling app
Increase memory? CPU? Parallelism?
Bounded buffers
Tuning memory before CPU
Tuning parallelism
Triggered by backlog increases (after P1, P2)
Correlation with remote service metrics? TLCC (time-lagged cross-correlation)
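As an illustration of time-lagged cross-correlation, the sketch below computes the Pearson correlation between a remote-service metric and the app's backlog at several lags and picks the strongest one. The slides do not say how the result feeds into the parallelism decision, so the function names, the sample series, and the choice of Pearson correlation are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def tlcc(driver: Sequence[float], response: Sequence[float], max_lag: int) -> Dict[int, float]:
    """Correlation of driver[t] with response[t + lag], for lag = 0..max_lag."""
    result: Dict[int, float] = {}
    for lag in range(max_lag + 1):
        xs = driver[: len(driver) - lag]
        ys = response[lag:]
        n = min(len(xs), len(ys))
        if n < 2:
            break                            # not enough overlap to correlate
        result[lag] = pearson(xs[:n], ys[:n])
    return result


def best_lag(driver: Sequence[float], response: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Lag with the highest correlation, and that correlation."""
    correlations = tlcc(driver, response, max_lag)
    lag = max(correlations, key=correlations.get)
    return lag, correlations[lag]


if __name__ == "__main__":
    # Hypothetical samples: backlog follows remote-service latency two samples later.
    remote_latency = [1, 1, 5, 1, 1, 1, 6, 1, 1, 1, 1, 1]
    backlog = [1, 1] + remote_latency[:-2]
    print(best_lag(remote_latency, backlog, max_lag=4))
```

A high correlation at some lag suggests that the backlog increase tracks the remote service rather than a shortage of parallelism in the app itself.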
Work in progress
Implemented as a stream processing app
Used for a mix of hundreds of production apps
14% larger size vs. hand-tuned optimal (selected apps)
At most one scale-down for each resource
Streaming apps go beyond DAGs of operators
Use remote services
Customize functionalities, heterogeneous
Widely varying workloads
Multiple resource-use, performance, cost, operability trade-offs
Sage: a rule-based solution to navigate them in production