Releasing Cloud Databases from the Chains of Prediction Models
Ryan Marcus and Olga Papaemmanouil
Brandeis University
Cloud Databases Landscape
[Diagram: the cloud, Infrastructure as a Service (IaaS), IaaS provider]
Deployment Challenges
- Cost management
- Performance management
- Resource provisioning
- Workload scheduling
- NP-hard problem
[Diagram: a data management application, queries (Q), and VMs]
State-of-the-art
Tasks: placement, provisioning, scheduling
Systems: PMAX (Liu et al.), Auto (Rogers et al.), SmartSLA (Xiong et al.), Shepherd (Chi et al.), SLATree (Chi et al.), Multi-tenant SLOs (Lang et al.), iCBS (Chi et al.), Delphi / Pythia (Elmore et al.), Hypergraph (Çatalyürek et al.), SCOPE (Chaiken et al.), Bazaar (Jalaparti et al.), many traditional methods ...
Performance goals: query deadline, workload deadline, piecewise linear, average latency, percentile deadline
Performance Prediction Models
- DBMS-related challenges
  - isolated vs. concurrent query execution
  - known vs. unseen query types ("templates")
  - extensive off-line training
  - state-of-the-art: 15-20% prediction error
- Cloud-related challenges
  - numerous resource configurations
  - dynamic environment: "noisy neighbors"
Wish List
- End-to-end cost-aware service (resource provisioning, workload scheduling)
- Agnostic to workload characteristics (templates, arrival rates, execution times)
- Application-defined performance goals (per-query deadline, percentile, average latency, max latency)
- Dynamic resource availability
Challenges
- complex interactions
- arbitrary workloads
- arbitrary goals
- arbitrary resources
ML approach: model dynamic, complex decisions
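
The application-defined performance goals named above (per-query deadline, percentile, average latency, max latency) can each be written as a check over observed query latencies. The following is a minimal sketch, not the authors' implementation: the function names, the nearest-rank percentile definition, and the use of one threshold per goal are all assumptions made for illustration.

```python
from statistics import mean


def percentile_latency(latencies, pct):
    """Nearest-rank percentile: the smallest observed latency such that
    at least pct percent of queries finished at or below it."""
    ordered = sorted(latencies)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_per_query_deadlines(latencies, deadlines):
    """Per-query deadline: every query finishes by its own deadline."""
    return all(t <= d for t, d in zip(latencies, deadlines))


def meets_goal(latencies, kind, threshold, pct=90):
    """Check one aggregate goal (hypothetical goal names) against a threshold."""
    if kind == "percentile":
        return percentile_latency(latencies, pct) <= threshold
    if kind == "average_latency":
        return mean(latencies) <= threshold
    if kind == "max_latency":
        return max(latencies) <= threshold
    raise ValueError(f"unknown goal kind: {kind}")


latencies = [10, 20, 30, 40, 100]  # made-up latencies in seconds
print(meets_goal(latencies, "average_latency", 40))
print(meets_goal(latencies, "max_latency", 60))
print(meets_goal(latencies, "percentile", 40, pct=80))
```

The same workload can satisfy one goal and violate another, which is why a service that accepts arbitrary goals cannot hard-code a single notion of "fast enough".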
Bandit: ML-Based Cost Management
- Cost management
- SLA management
- Resource provisioning
- Workload scheduling
[Diagram: a data management application, queries (Q), and VMs at the IaaS provider]
Reinforcement Learning
[Diagram: an agent with internal state (past experiences) sends an action to the environment (VMs at an IaaS provider) and receives a reward and an observation]
CMABs
(Contextual Multi-Armed Bandits)
[Diagram: an agent sends an action to the environment (VMs at an IaaS provider) and receives a reward and an observation]
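
The observe, act, receive-reward loop of a contextual multi-armed bandit can be sketched in a few lines. This is a generic illustration under stated assumptions, not the learner used in Bandit: the class name `EpsilonGreedyCMAB`, the epsilon-greedy exploration rule, and the tabular per-context averages are all choices made here for brevity.

```python
import random


class EpsilonGreedyCMAB:
    """Tabular contextual bandit: one running reward average per
    (context, action) pair, with epsilon-greedy exploration."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {}  # (context, action) -> summed reward
        self.counts = {}  # (context, action) -> number of observations

    def estimate(self, context, action):
        """Average reward seen so far; 0.0 if the pair was never tried."""
        n = self.counts.get((context, action), 0)
        return self.totals.get((context, action), 0.0) / n if n else 0.0

    def choose(self, context):
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.estimate(context, a))

    def update(self, context, action, reward):
        """Fold one observed reward into the running average."""
        key = (context, action)
        self.totals[key] = self.totals.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1


# Toy environment (an assumption): in context "big" arm "b" pays off,
# in context "small" arm "a" pays off.
agent = EpsilonGreedyCMAB(["a", "b"], epsilon=0.2)
best = {"big": "b", "small": "a"}
for step in range(500):
    context = "big" if step % 2 else "small"
    action = agent.choose(context)
    agent.update(context, action, 1.0 if action == best[context] else 0.0)
```

Unlike full reinforcement learning, the chosen action here affects only the immediate reward, not the next context, which keeps the learning problem much simpler.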
CMABs in Bandit
(Contextual Multi-Armed Bandits)
[Diagram, built up over several slides: the data management application submits queries (Q) under an SLA to VMs at the IaaS provider, arranged in tiers (Tier 1, Tier 2). The agent keeps internal state (past experiences), receives an observation, and chooses an action; the reward is replaced by a cost ($$).]
Actions: pass, down, accept
Experiences recorded: (pass, context, $$), (down, context, $$), (accept, context, $)
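
The (action, context, cost) experience tuples above can be illustrated with a small routing sketch. Everything beyond the three action names and the tuple shape is an assumption made for illustration: here "pass" is taken to hand the query to the next VM in the same tier, "down" to hand it to the next tier, and the final execution cost is credited to every VM that touched the query. The `VM` class, the `route` function, and the `(template, queued)` context are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Context = Tuple[str, int]  # hypothetical context: (query template, queued queries)


@dataclass
class VM:
    name: str
    policy: Callable[[Context], str]  # returns "accept", "pass", or "down"
    queued: int = 0
    experiences: List[tuple] = field(default_factory=list)


def route(template: str, tiers: List[List[VM]],
          cost_of: Callable[[VM, str], float]):
    """Offer one query to VMs tier by tier until one accepts it.
    Returns (accepting VM, cost) and records an experience tuple on
    every VM that made a decision about this query."""
    path = []
    for tier in tiers:
        for vm in tier:
            context = (template, vm.queued)
            action = vm.policy(context)
            path.append((vm, action, context))
            if action == "accept":
                vm.queued += 1
                cost = cost_of(vm, template)
                for seen_vm, seen_action, seen_context in path:
                    seen_vm.experiences.append((seen_action, seen_context, cost))
                return vm, cost
            if action == "down":
                break  # skip the rest of this tier, move to the next one
            # "pass": fall through to the next VM in the same tier
    raise RuntimeError("no VM accepted the query")


# Fixed toy policies stand in for learned ones.
tiers = [
    [VM("t1-a", lambda c: "pass"), VM("t1-b", lambda c: "down"),
     VM("t1-c", lambda c: "accept")],
    [VM("t2-a", lambda c: "accept")],
]
chosen, cost = route("Q1", tiers, lambda vm, template: 5.0)
print(chosen.name, cost)
```

In a learned version, each `policy` would be driven by the experiences accumulated on that VM rather than by a fixed rule.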
Feature Selection
[Diagram: Data Management Application, Model Generator, Context Collector, Experience Collector, queries (Q), VMs at the IaaS Provider]
Probabilistic Action Selection
[Diagram: Data Management Application, Model Generator, Context Collector, Experience Collector, action, queries (Q), VMs at the IaaS Provider]
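
One common way to select actions probabilistically is to sample each action with a probability that decreases as its estimated cost grows. The sketch below uses a softmax (Boltzmann) rule over estimated costs; the slides do not state which selection scheme Bandit uses, so the rule, the `temperature` parameter, and the function names are assumptions for illustration only.

```python
import math
import random


def selection_probabilities(estimated_costs, temperature=1.0):
    """Map {action: estimated cost} to {action: probability}, giving
    cheaper actions higher probability (softmax over negated costs)."""
    lowest = min(estimated_costs.values())
    # Subtracting the lowest cost keeps the exponentials numerically stable.
    weights = {a: math.exp(-(c - lowest) / temperature)
               for a, c in estimated_costs.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(estimated_costs, temperature=1.0, rng=random):
    """Sample one action according to the softmax probabilities."""
    probs = selection_probabilities(estimated_costs, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Made-up cost estimates for the three actions, in tenths of a cent.
costs = {"accept": 2.0, "pass": 3.0, "down": 5.0}
probs = selection_probabilities(costs)
action = select_action(costs, rng=random.Random(0))
```

A lower `temperature` concentrates probability on the cheapest action, while a higher one explores more evenly. Sampling rather than always taking the cheapest estimate lets the learner keep testing actions whose cost estimates may be stale.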
Evaluation
[Figure: average cost per query (1/10 cent) vs. queries processed. Series: Bandit and Clairvoyant, each with one query at a time, one query per vCPU, and two queries per vCPU]
[Figure: average cost per query (1/10 cent) vs. queries processed. Series: all new templates at once; new templates over time]
[Figure: average cost per query (1/10 cent) vs. queries processed. Series: 8 templates; 80 templates; 800 templates]
[Figure: converged cost (1/10 cent) by segmentation type (value-based, hash-based). Series: Round-robin, Clairvoyant, PO2, Bandit]
- Cost within 4% of solutions that use a perfect prediction model
- Adapts quickly to new, unseen query templates
- Converges after a few thousand queries over hundreds of templates
- Learns the best execution site for partitioned data
Conclusions
- Cost vs. performance trade-offs are complex
  - human ability to derive insight is not improving
- Benefits of an ML-driven approach
  - discover customized solutions
  - automate decision making
  - adapt to dynamic environments
- Future Steps
  - alternative learning techniques
  - more advanced tasks: scheduling, data movement
  - learning-based database as a service (DaaS) systems