Cortex: Prometheus as a Service, One Year On Tom Wilkie, PromCon - PowerPoint PPT Presentation

Cortex: Prometheus as a Service, One Year On
Tom Wilkie, PromCon 2017
tom.wilkie@gmail.com
https://github.com/weaveworks/cortex

https://www.youtube.com/watch?v=3Tb4Wc0kfCM

Cortex: Prometheus as a Service
[Diagram components: Your Jobs, Prometheus (HA), Cortex, Grafana, Alertmanager]
• Natively multi-tenant: isolate different customers in the same services.
• A different story around scaling and HA.
• “Virtually infinite” retention and durability.
• Opportunities for performance enhancements.

Cortex Architecture
[Diagram legend: write requests, read requests, control requests. Components: Your Jobs, Prometheus, Frontend, Distributor, Consul, Ingester, DynamoDB, S3, Memcache]
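The diagram does not show how the Distributor decides which Ingester receives a series. Cortex distributors use a consistent hash ring whose state is kept in Consul, and each series is replicated to several ingesters. The following is a minimal Python sketch of that idea under stated assumptions, not Cortex's code: the token count, the SHA-256 hash, the replication factor of 3 and the `Ring`/`lookup` names are all hypothetical.

```python
import bisect
import hashlib

# Hypothetical sketch of a consistent hash ring for placing series on
# ingesters. Token count, hash function and replication factor are
# illustrative assumptions, not Cortex's actual values.


def _hash(key: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, ingesters, tokens_per_ingester=128):
        # Each ingester claims several tokens so load spreads evenly.
        self._tokens = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._positions = [pos for pos, _ in self._tokens]

    def lookup(self, series_key: str, replication_factor: int = 3):
        """Walk clockwise from the key's hash, collecting distinct ingesters."""
        distinct = len({name for _, name in self._tokens})
        want = min(replication_factor, distinct)
        i = bisect.bisect_right(self._positions, _hash(series_key))
        chosen = []
        while len(chosen) < want:
            _, name = self._tokens[i % len(self._tokens)]
            if name not in chosen:
                chosen.append(name)
            i += 1
        return chosen


ring = Ring([f"ingester-{n}" for n in range(1, 6)])
print(ring.lookup("user1:http_requests_total"))
```

Because every distributor computes the same ring from the shared state in Consul, any distributor can route a write for a given key to the same set of ingesters without coordinating with the others.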

A Year’s Evolution

Problem #1: DynamoDB Write Throughput

https://github.com/weaveworks/cortex/issues/254

Cortex Architecture
[Diagram legend: write requests, read requests, control requests. Components: Your Jobs, Prometheus, Frontend, Distributor, Consul, Ingester, Table Manager, DynamoDB, S3, Memcache]
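The revised diagram adds a Table Manager. Cortex's table manager maintains time-bucketed ("periodic") tables, so that only the table receiving current writes needs high provisioned write throughput while older tables can be scaled down. The sketch below is one hedged reading of that idea: the weekly period, the `cortex_` prefix, the capacity figures and the helpers `table_for` and `provisioned_write_capacity` are hypothetical.

```python
# Hypothetical sketch of periodic tables: weekly buckets named
# <prefix><period index>. Capacity numbers are made up for illustration.
WEEK_SECS = 7 * 24 * 3600


def table_for(ts_secs: int, prefix: str = "cortex_", period_secs: int = WEEK_SECS) -> str:
    """Name of the table that holds writes for the given Unix timestamp."""
    return f"{prefix}{ts_secs // period_secs}"


def provisioned_write_capacity(table: str, now_secs: int, active: int = 1000,
                               inactive: int = 1, prefix: str = "cortex_",
                               period_secs: int = WEEK_SECS) -> int:
    """Only the table receiving current writes gets high write capacity."""
    current = table_for(now_secs, prefix, period_secs)
    return active if table == current else inactive


now = WEEK_SECS + 5
print(table_for(now), provisioned_write_capacity("cortex_0", now))
```

A real deployment would also need to keep the previous table writable for a short grace period around each boundary; the sketch ignores that.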

Problem #2: DynamoDB Write Throughput, again

Original schema:
• Hash Key: <user ID>:<hour>:<metric name>
• Range Key: <label name>:<label value>:<chunk ID>
New schema:
• Hash Key: <user ID>:<day>:<metric name>:<label name>
• Range Key: <chunk ID>:<chunk end time>
https://github.com/weaveworks/cortex/pull/262
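DynamoDB distributes load by hash (partition) key, so a hash key that receives many writes becomes a hot partition. The sketch below builds the index entries for one chunk under both schemas: the original schema puts every label of a metric under a single hash key per hour, while the new schema gives each label name its own hash key per day. The ":" joining, the integer-division bucketing and the function names are assumptions for illustration.

```python
# Hypothetical sketch of the two index schemas listed above.
HOUR, DAY = 3600, 86400


def old_schema_entries(user, ts, metric, labels, chunk_id):
    """One hash key per (user, hour, metric); labels go in the range key."""
    hash_key = f"{user}:{ts // HOUR}:{metric}"
    return [(hash_key, f"{name}:{value}:{chunk_id}")
            for name, value in sorted(labels.items())]


def new_schema_entries(user, ts, metric, labels, chunk_id, chunk_end):
    """One hash key per (user, day, metric, label name)."""
    return [(f"{user}:{ts // DAY}:{metric}:{name}", f"{chunk_id}:{chunk_end}")
            for name in sorted(labels)]


labels = {"job": "api", "instance": "a:9090", "method": "GET"}
old = old_schema_entries("u1", 7200, "http_requests_total", labels, "c1")
new = new_schema_entries("u1", 7200, "http_requests_total", labels, "c1", 7300)
print(len({h for h, _ in old}), "hash key(s) vs", len({h for h, _ in new}))
```

For a popular metric the old schema concentrates all writes for that metric on one hash key each hour; the new schema spreads them across as many hash keys as the metric has label names.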

Problem #3: Queries of Death

Cortex Architecture
[Diagram legend: write requests, read requests, control requests. Components: Your Jobs, Prometheus, Frontend, Distributor, Querier, Consul, Ingester, Table Manager, DynamoDB, S3, Memcache]

Problem #3: Recording rules and alerts

Cortex Architecture
[Diagram legend: write requests, read requests, control requests. Components: Your Jobs, Prometheus, Frontend, Distributor, Querier, Consul, Ingester, Ruler, Table Manager, DynamoDB, S3, Memcache]

Problem #4: Long tail

https://www.weave.works/blog/the-long-tail-tools-to-investigate-high-long-tail-latency/

Problem #5: Cost

                              S3            DynamoDB
IOP cost ($/IOP)              5 × 10^-6     2 × 10^-7
Storage cost ($/GB/month)     0.023         0.250
https://github.com/weaveworks/cortex/issues/141

[Chart: cost ($) versus object size (GB, 0 to 0.06) for DynamoDB and S3]
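The prices in the table imply a break-even object size: S3 has the higher per-operation cost but the lower storage cost. The sketch below uses a simplified model, which is an assumption and not necessarily how the chart was produced: one write operation plus one month of storage per object. It also ignores that DynamoDB bills writes per 1 KB capacity unit. The `PRICES` table and helper names are hypothetical.

```python
# Simplified cost model (assumption): one write operation plus one month
# of storage, using the per-IOP and per-GB-month prices from the table above.
PRICES = {
    "S3": {"iop": 5e-6, "gb_month": 0.023},
    "DynamoDB": {"iop": 2e-7, "gb_month": 0.250},
}


def monthly_cost(store: str, size_gb: float) -> float:
    """Cost of writing one object and storing it for one month."""
    p = PRICES[store]
    return p["iop"] + p["gb_month"] * size_gb


def crossover_gb() -> float:
    """Object size at which both stores cost the same under this model."""
    s3, ddb = PRICES["S3"], PRICES["DynamoDB"]
    return (s3["iop"] - ddb["iop"]) / (ddb["gb_month"] - s3["gb_month"])


print(f"break-even size ~ {crossover_gb() * 1e6:.1f} KB")  # decimal KB
```

Under this model the break-even size is roughly 21 KB. Objects smaller than that are cheaper in DynamoDB, which fits the next diagram dropping S3 if chunks are only around a kilobyte in size.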

Cortex Architecture
[Diagram legend: write requests, read requests, control requests. Components: Your Jobs, Prometheus, Frontend, Distributor, Querier, Consul, Ingester, Ruler, Table Manager, DynamoDB, Memcache]

Problem #6: DynamoDB, again

Cortex Architecture
[Diagram legend: write requests, read requests, control requests. Components: Your Jobs, Prometheus, Frontend, Distributor, Querier, Consul, Ingester, Ruler, Table Manager, BigTable, Memcache]

                                       DynamoDB     BigTable
99th percentile write latency (ms)     70-100       50-150
99th percentile read latency (ms)      100-2500     ~250
Lines of code                          ~2000        ~400
DynamoDB numbers courtesy of Weaveworks.

Closing thoughts

1. DynamoDB Write Throughput
2. DynamoDB Write Throughput, again
3. Recording rules and alerts
4. Long tail
5. Cost
6. DynamoDB, again

Running for >12 months:
• Availability: querier unavailable for <12 hours (~99.9%)
• Durability: lost <2 days of data (>99.5%)
• 99th percentile write latency ~60ms
• 99th percentile query latency <200ms
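The availability and durability percentages are simple ratios of time lost to the measurement window. The sketch below assumes exactly a 365-day window; the deployment ran for longer, which would only raise these percentages. The helper names are hypothetical.

```python
# Availability/durability as 1 - (time lost / measurement window),
# assuming a 365-day window.
HOURS_PER_YEAR = 365 * 24


def fraction_up(lost_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the window that was not lost."""
    return 1 - lost_hours / period_hours


def max_lost_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Largest loss (in hours) compatible with a target fraction."""
    return (1 - target) * period_hours


print(f"{fraction_up(12):.4%}")   # 12 hours of querier downtime
print(f"{fraction_up(48):.4%}")   # 2 days of lost data
print(max_lost_hours(0.995))      # loss budget for 99.5%
```

Twelve hours of downtime gives about 99.86%, which rounds to the ~99.9% above. Losing a full 2 days would give about 99.45%, so the >99.5% durability figure implies that somewhat less than 2 days (under about 43.8 hours) of data was lost.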

Future
• Direct chunk writes from Prometheus to the Cortex chunk store
• Separate ingester index for better load balancing
• Use prometheus/tsdb for the ingesters
• etcd and gossip for ring storage
• Chunks in Google Cloud Storage

One more thing…

I left Weaveworks at the beginning of June to focus on Prometheus and Cortex development. Since then I’ve teamed up with David to develop some ideas around Prometheus, logging, and tracing. We’re available for Prometheus hosting, consulting, training and support.
email: hello@kausal.co

Metrics

Traces

Thank you! Questions?
