Scaling Automated Database Monitoring at Uber with M3 and Prometheus

Richard Artoul


SLIDE 1

Scaling Automated Database Monitoring at Uber with M3 and Prometheus

Richard Artoul

SLIDE 2

Agenda

01 Automated database monitoring
02 Why scaling automated monitoring is hard
03 M3 architecture and why it scales so well
04 How you can use M3

SLIDE 3

Uber’s “Architecture”

[Diagram: Uber's architecture in 2015 vs. 2019]

  • 4000+ microservices
○ Most of which directly or indirectly depend on storage
  • 22+ storage technologies
○ Ranging from C* to MySQL
  • 1000s of dedicated servers running databases
○ Monitoring all of these is hard

SLIDE 4

Monitoring Databases

[Diagram: monitoring layers: Hardware, Technology, Application]

SLIDE 5

Hardware Level Metrics

SLIDE 6

Technology Level Metrics


SLIDE 7

Application Level Metrics

  • Number of successes, errors, and latency, broken down by:

○ All queries against a given database
○ All queries issued by a specific service
○ A specific query
■ SELECT * FROM TABLE WHERE USER_ID = ?
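As a rough sketch of what such instrumentation can look like (my illustration using the Prometheus Go client, not Uber's actual code; the metric and label names are assumptions):

    package main

    import (
        "fmt"
        "time"

        "github.com/prometheus/client_golang/prometheus"
    )

    // Per-query metrics, labeled so they can be rolled up by database,
    // calling service, or individual (normalized) query fingerprint.
    var queryDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "db_query_duration_seconds",
            Help: "Latency of database queries.",
        },
        []string{"database", "service", "query", "status"},
    )

    func main() {
        prometheus.MustRegister(queryDuration)

        // Record one successful run of a normalized query fingerprint.
        queryDuration.WithLabelValues(
            "users-db", "trip-service",
            "SELECT * FROM TABLE WHERE USER_ID = ?", "success",
        ).Observe((42 * time.Millisecond).Seconds())

        fmt.Println("recorded 1 observation")
    }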

SLIDE 8

SLIDE 9

What’s so hard about that?

SLIDE 10

Monitoring Applications at Scale

1200 microservices w/ dedicated storage
× 100 instances per service
× 20 instances per DB cluster
× 20 queries per service
× 10 metrics per query
= 480 million unique time series

480 million unique time series × per-series pricing = 100+ million dollars!

SLIDE 11

Workload

  • Writes per second (post-replication): 110M
  • Datapoints emitted pre-aggregation: 800M
  • Unique metric IDs: 9B
  • Datapoints read per second: 200B

SLIDE 12

How do we do it?

SLIDE 13

A Brief History of M3

  • 2014-2015: Graphite + WhisperDB

○ No replication, operations were ‘cumbersome’

  • 2015-2016: Cassandra

○ Solved operational issues
○ 16x YoY growth
○ Expensive (> 1500 Cassandra hosts)
○ Compactions => had to run at R.F.=2

  • 2016-Today: M3DB
SLIDE 14

M3DB Overview

SLIDE 15

M3DB

An open source distributed time series database

  • Store data points of arbitrary timestamp precision, at any resolution, for any retention
  • Tag (key/value pair) based inverted index
  • Optimized file-system storage with minimal need for compactions of time series data for real-time workloads

SLIDE 16

High-Level Architecture

Like a Log-Structured Merge Tree (LSM), except that where a typical LSM has leveled or size-based compaction, M3DB has time-window compaction.
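A minimal sketch of the time-window idea (my illustration, not M3DB source; the two-hour block size is an assumption, in M3DB it is configurable per namespace):

    package main

    import (
        "fmt"
        "time"
    )

    // blockSize is an assumed block size for the sketch.
    const blockSize = 2 * time.Hour

    // blockStart maps a datapoint timestamp to the start of its
    // time-window block. All datapoints in the same window land in the
    // same block, so sealing a block once its window has passed takes
    // the place of LSM-style leveled or size-based compaction.
    func blockStart(ts time.Time) time.Time {
        return ts.Truncate(blockSize)
    }

    func main() {
        t := time.Date(2019, 5, 1, 13, 37, 0, 0, time.UTC)
        fmt.Println(blockStart(t)) // 2019-05-01 12:00:00 +0000 UTC
    }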

SLIDE 17

Topology and Consistency

  • Strongly consistent topology (using etcd)

○ No gossip
○ Replicated with zone/rack-aware layout and configurable replication factor

  • Consistency managed via synchronous quorum writes and reads (see the sketch below)

○ Configurable consistency level for both reads and writes
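As a sketch of the quorum arithmetic (my illustration): with a given replication factor, a majority quorum needs acknowledgements from more than half the replicas, so every read overlaps at least one fully acknowledged write.

    package main

    import "fmt"

    // quorum returns the acknowledgements required for a majority
    // quorum given a replication factor: more than half the replicas.
    func quorum(replicationFactor int) int {
        return replicationFactor/2 + 1
    }

    func main() {
        fmt.Println(quorum(3)) // 2: one replica can be down while
        // quorum reads and writes still succeed
    }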

SLIDE 18

Cost Savings and Performance

  • Disk usage in 2017

○ ~1.4PB for Cassandra at R.F.=2
○ ~200TB for M3DB at R.F.=3

  • Much higher throughput per node with M3DB

○ Hundreds of thousands of writes/s on commodity hardware

SLIDE 19

But what about the index?

SLIDE 20

Centralized Elasticsearch Index

  • Actual time series data was stored in Cassandra and later M3DB
  • But indexing of the data (for querying) was handled by Elasticsearch
  • Worked for us for a long time
  • … but scaling writes and reads required a lot of engineering
SLIDE 21

Elasticsearch Index - Write Path

[Diagram: write path. m3agg feeds the E.S Indexer, which writes to Elasticsearch; a Redis "don't write" cache in front of the indexer suppresses re-indexing of recently seen metrics (sketched below); the time series data itself is written to M3DB.]

An influx of new metrics is caused by:

1. Large service deployments
2. Datacenter failovers
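A minimal sketch of the "don't write" cache idea (my reconstruction; the production system used Redis rather than an in-process map):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // dontWriteCache remembers recently indexed metric IDs so the
    // indexing tier is only hit the first time a metric is seen
    // within the TTL window.
    type dontWriteCache struct {
        mu   sync.Mutex
        ttl  time.Duration
        seen map[string]time.Time
    }

    // shouldIndex reports whether the metric still needs indexing and
    // marks it as indexed for the next TTL window.
    func (c *dontWriteCache) shouldIndex(id string, now time.Time) bool {
        c.mu.Lock()
        defer c.mu.Unlock()
        if t, ok := c.seen[id]; ok && now.Sub(t) < c.ttl {
            return false // indexed recently: don't write again
        }
        c.seen[id] = now
        return true
    }

    func main() {
        c := &dontWriteCache{ttl: time.Hour, seen: map[string]time.Time{}}
        now := time.Now()
        fmt.Println(c.shouldIndex("db_query_duration_seconds{...}", now)) // true
        fmt.Println(c.shouldIndex("db_query_duration_seconds{...}", now)) // false
    }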

SLIDE 22

Elasticsearch Index - Read Path

[Diagram: read path. A query (1) hits Elasticsearch to resolve which time series match, then (2) fetches the matching data from M3DB.]

SLIDE 23

Elasticsearch Index - Read Path

[Diagram: a Redis query cache sits between the query path and Elasticsearch.] The cache needs a high T.T.L to prevent overwhelming E.S … but a high T.T.L means a long delay before new time series become queryable.

SLIDE 24

Elasticsearch Index - Read Path

[Diagram: the query fans out to two Elasticsearch clusters, each fronted by its own Redis query cache: "E.S Short" with a short-TTL cache for recently created series, and "E.S Long" with a long-TTL cache. Results are merged on read.]
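A sketch of the merge-on-read step (my illustration; the function names are assumptions): query both indexes and return the deduplicated union of matching series IDs.

    package main

    import "fmt"

    // mergeOnRead queries the short-term and long-term indexes and
    // returns the deduplicated union of matching series IDs.
    func mergeOnRead(queryShort, queryLong func(q string) []string, q string) []string {
        seen := make(map[string]struct{})
        var out []string
        for _, ids := range [][]string{queryShort(q), queryLong(q)} {
            for _, id := range ids {
                if _, ok := seen[id]; !ok {
                    seen[id] = struct{}{}
                    out = append(out, id)
                }
            }
        }
        return out
    }

    func main() {
        short := func(q string) []string { return []string{"series-a", "series-b"} }
        long := func(q string) []string { return []string{"series-b", "series-c"} }
        fmt.Println(mergeOnRead(short, long, `service="foo"`)) // [series-a series-b series-c]
    }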

SLIDE 25

Elasticsearch Index - Final Breakdown

  • Two Elasticsearch clusters with separate configuration
  • Two query caches
  • Two don't-write caches
  • A stateful indexing tier that requires consistent hashing, an in-memory cache, and breaks everything if you deploy it too quickly
  • Another service just for automatically rotating the short-term Elasticsearch cluster indexes
  • A background process that's always running, trying to delete stale documents from the long-term index

SLIDE 26

M3 Inverted Index

  • Not nearly as expressive or feature-rich as Elasticsearch
  • … but like M3DB, it’s designed for high throughput
  • Temporal by nature (T.T.Ls are cheap)
  • Fewer moving parts
SLIDE 27

M3 Inverted Index

  • Primary use case: support queries of the form

○ service = "foo" AND
○ endpoint = "bar" AND
○ client_version regexp matches "3.*"

[Diagram: query plan tree. client_version="3.*" expands to the OR (union) of matching values such as client_version="3.1" and client_version="3.2"; the three terms are then combined with AND (intersection).]

AND → Intersection
OR → Union
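For comparison, this is the same shape of query a Prometheus user writes as a label selector (standard PromQL matcher syntax; the label names follow the slide's example):

    {service="foo", endpoint="bar", client_version=~"3.*"}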

SLIDE 28

M3 Inverted Index - F.S.Ts

  • The inverted index is similar to Lucene in that it uses F.S.T segments to build an efficient and compressed structure for fast regexp matching.
  • Each label has its own F.S.T that can be searched to find the offset at which to unpack another data structure containing the set of metric IDs associated with a particular label value (the postings list).

Encoded relationships:

are --> 4
ate --> 2
see --> 3

Compressed + fast regexp!
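The slide's example can be reproduced with vellum, an open source Go FST library in the same family as Lucene's FSTs (a sketch; error handling mostly elided):

    package main

    import (
        "bytes"
        "fmt"

        "github.com/couchbase/vellum"
    )

    func main() {
        // Build an FST mapping terms to integer outputs (think:
        // postings list offsets). Keys must be inserted in sorted order.
        var buf bytes.Buffer
        b, err := vellum.New(&buf, nil)
        if err != nil {
            panic(err)
        }
        b.Insert([]byte("are"), 4)
        b.Insert([]byte("ate"), 2)
        b.Insert([]byte("see"), 3)
        b.Close()

        // Shared prefixes ("a") and suffixes ("e") are stored once, so
        // the structure is compressed, and it can be walked with an
        // automaton for fast regexp matching.
        fst, _ := vellum.Load(buf.Bytes())
        val, exists, _ := fst.Get([]byte("are"))
        fmt.Println(val, exists) // 4 true
    }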

SLIDE 29

M3 Inverted Index - Postings List

  • For every term of the form service="foo" we need to store the set of metric IDs (integers) that match; this set is called a postings list.
  • The index is broken into blocks and segments, so we need to be able to calculate intersections (AND, across terms) and unions (OR, within a term).

[Diagram: a 12 P.M. -> 2 P.M. index block in which service="foo", endpoint="bar", and client_version="3.*" are INTERSECTed; client_version="3.*" is itself the UNION of client_version="3.1" and client_version="3.2".]

Intersect → AND
Union → OR

The primary data structure for the postings list in M3DB is the roaring bitmap (example below).
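A sketch of those set operations using an open source roaring bitmap library for Go (the metric IDs are made up; M3DB's internal bitmap implementation differs):

    package main

    import (
        "fmt"

        "github.com/RoaringBitmap/roaring"
    )

    func main() {
        // Postings lists: the metric IDs matching each label value.
        serviceFoo := roaring.BitmapOf(1, 2, 3, 7, 9)
        endpointBar := roaring.BitmapOf(2, 3, 9, 12)
        version31 := roaring.BitmapOf(3, 9)
        version32 := roaring.BitmapOf(2, 14)

        // client_version="3.*" is the union (OR) of matching values...
        version3x := roaring.Or(version31, version32)

        // ...and the full query is the intersection (AND) across terms.
        result := roaring.And(serviceFoo, endpointBar)
        result.And(version3x)

        fmt.Println(result.ToArray()) // [2 3 9]
    }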

SLIDE 30

M3 Inverted Index - File Structure

────────────────────────────── Time ──────────────────────────────▶

/var/lib/m3db/data/namespace-a/shard-0
┌───────────────┬───────────────┬───────────────┬───────────────┐
│ Fileset File  │ Fileset File  │ Fileset File  │ Fileset File  │
│ Block         │ Block         │ Block         │ Block         │
└───────────────┴───────────────┴───────────────┴───────────────┘

/var/lib/m3db/index/namespace-a
┌───────────────────────────────┬───────────────────────────────┐
│ Index Fileset File            │ Index Fileset File            │
│ Block                         │ Block                         │
└───────────────────────────────┴───────────────────────────────┘

(Data fileset files are written per shard, index fileset files per namespace; index blocks span a larger time window than data blocks.)

SLIDE 31

M3 Summary

  • Time series compression + temporal data and file structures
  • Distributed and horizontally scalable by default
  • Complexity pushed to the storage layer
SLIDE 32

How you can use M3

SLIDE 33

M3 and Prometheus

Scalable monitoring

SLIDE 34

M3 and Prometheus

[Diagram: My App is scraped by Prometheus; Prometheus remote-writes through the M3 Coordinator to three M3DB nodes; Grafana dashboards and alerting sit on top.]

Directly query M3 using the coordinator as a single Grafana datasource.
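A minimal prometheus.yml sketch of that integration (the m3coordinator hostname is a placeholder; 7201 is the coordinator's default port):

    # Ship Prometheus data into M3 and read it back through the coordinator.
    remote_write:
      - url: "http://m3coordinator:7201/api/v1/prom/remote/write"

    remote_read:
      - url: "http://m3coordinator:7201/api/v1/prom/remote/read"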

SLIDE 35

Roadmap

SLIDE 36

Roadmap

  • Bring the Kubernetes operator out of alpha with further lifecycle management
  • Arbitrary out-of-order writes, for writing data into the past and backfilling
  • Asynchronous cross-region replication (for disaster recovery)
  • Evolving M3DB into a generic time series database (think event store)

○ Efficient compression of events in the form of Protobuf messages

SLIDE 37

We’re Hiring!

  • Want to work on M3DB? We’re hiring!

○ Reach out to me at rartoul@uber.com