Altinity: Building Multi-Petabyte Data Warehouses with ClickHouse - PowerPoint PPT Presentation



SLIDE 1

Building Multi-Petabyte Data Warehouses with ClickHouse

Alexander Zaitsev, LifeStreet / Altinity
Percona Live Dublin, 2017

Altinity

SLIDE 2

Who am I

  • Graduated Moscow State University in 1999
  • Software engineer since 1997
  • Developed distributed systems since 2002
  • Focused on high performance analytics since 2007
  • Director of Engineering at LifeStreet
  • Co-founder of Altinity
SLIDE 3

SLIDE 4
  • Ad Tech company since 2006 (ad exchange, ad server, RTB, DMP, etc.)
  • 10,000,000,000+ events/day
  • ~2K/event
  • 3 months retention (90-120 days)

10B events/day * 2K/event * [90-120] days = [1.8-2.4] PB
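The retention math above can be sanity-checked with a quick script (a sketch; the figures are the slide's round numbers):

```python
# Back-of-the-envelope check of the slide's storage estimate.
events_per_day = 10_000_000_000   # 10B+ events/day
bytes_per_event = 2_000           # ~2K per event
for days in (90, 120):            # 3 months retention
    petabytes = events_per_day * bytes_per_event * days / 1e15
    print(f"{days} days -> {petabytes:.1f} PB")
# 90 days -> 1.8 PB, 120 days -> 2.4 PB
```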

SLIDE 5
  • Tried/used/evaluated:

    – MySQL (TokuDB, ShardQuery)
    – InfiniDB
    – MonetDB
    – InfoBright EE
    – ParAccel (now RedShift)
    – Oracle
    – Greenplum
    – Snowflake DB
    – Vertica

…and settled on ClickHouse

SLIDE 6

Flashback: ClickHouse at 08/2016

  • 1-2 months in Open Source
  • Internal Yandex product – no other installations
  • No support, roadmap, or communicated plans
  • 3 official devs
  • A number of visible limitations (and many invisible)
  • Stories of other doomed open-sourced DBs
SLIDE 7

Develop a production system with “that”?

SLIDE 8

ClickHouse is/was missing:

  • Transactions
  • Constraints
  • Consistency
  • UPDATE/DELETE
  • NULLs (added a few months ago)
  • Milliseconds
  • Implicit type conversions
  • Standard SQL support
  • Partitioning by any column (only by date at the time)
  • Enterprise operation tools
SLIDE 9

SQL developers reaction:

SLIDE 10

But we tried and succeeded

SLIDE 11

Before you go:

  ✓ Confirm your use case
  ✓ Check benchmarks
  ✓ Run your own
  ✓ Consider limitations, not features
  ✓ Make a POC

SLIDE 12

Migration problem: basic things do not …

SLIDE 13

Main Challenges

  • Efficient schema

    – Use ClickHouse strengths
    – Work around limitations

  • Reliable data ingestion
  • Sharding and replication
  • Client interfaces
SLIDE 14

LifeStreet Use Case

  • Publisher/Advertiser performance
  • Campaign/Creative performance prediction
  • Realtime algorithmic bidding
  • DMP
SLIDE 15

LifeStreet Requirements

  • Load 10B rows/day, 500 dimensions/row
  • Ad-hoc reports on 3 months of data
  • Low data and query latency
  • High Availability
SLIDE 16

Multi-Dimensional Analysis

N-dimensional cube → M-dimensional projection (slice)

OLAP query: aggregation + filter + group by

[Figure: range filter applied to the cube, producing the query result. Disclaimer: averages lie]

SLIDE 17

Typical schema: “star”

  • Facts
  • Dimensions
  • Metrics
  • Projections
SLIDE 18

Star Schema Approach

  De-normalized (dimensions in the fact table)  | Normalized (dimension keys in the fact table, separate dimension tables)
  Single table, simple                          | Multiple tables
  Simple queries, no joins                      | More complex queries with joins
  Data can not be changed                       | Data in dimension tables can be changed
  Sub-efficient storage                         | Efficient storage
  Sub-efficient queries                         | More efficient queries

SLIDE 19

Normalized schema: traditional approach - joins

  • Limited support in ClickHouse (1 level only; cascade sub-selects for multiple levels)

  • Dimension tables are not updatable
SLIDE 20

Dictionaries - ClickHouse dimensions approach

  • Lookup service: key -> value
  • Supports different external sources (files, databases, etc.)

  • Refreshable
SLIDE 21
Dictionaries. Example

SELECT country_name, sum(imps)
FROM T
ANY INNER JOIN dim_geo USING (geo_key)
GROUP BY country_name;

vs

SELECT dictGetString('dim_geo', 'country_name', geo_key) AS country_name, sum(imps)
FROM T
GROUP BY country_name;

SLIDE 22
Dictionaries. Configuration

<dictionary>
    <name></name>
    <source> … </source>
    <lifetime> ... </lifetime>
    <layout> … </layout>
    <structure>
        <id> ... </id>
        <attribute> ... </attribute>
        <attribute> ... </attribute>
        ...
    </structure>
</dictionary>

SLIDE 23
Dictionaries. Sources
  • file
  • mysql table
  • clickhouse table
  • odbc data source
  • executable script
  • http service
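As an illustration, a mysql source might be configured like this (a sketch following the `<source>` element from the configuration slide; host, credentials, and table names are placeholders):

```xml
<!-- Hypothetical MySQL dictionary source; all values are placeholders -->
<source>
    <mysql>
        <port>3306</port>
        <user>dict_reader</user>
        <password>secret</password>
        <replica>
            <host>mysql-host</host>
            <priority>1</priority>
        </replica>
        <db>dim</db>
        <table>dim_geo</table>
    </mysql>
</source>
```

ClickHouse polls the source on the configured `<lifetime>` to refresh the dictionary contents.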
SLIDE 24
Dictionaries. Layouts
  • flat
  • hashed
  • cache
  • complex_key_hashed
  • range_hashed
SLIDE 25
Dictionaries. range_hashed

  • ‘Effective Dated’ queries

<layout>
    <range_hashed />
</layout>
<structure>
    <id> <name>id</name> </id>
    <range_min> <name>start_date</name> </range_min>
    <range_max> <name>end_date</name> </range_max>
    …

dictGetFloat32('srv_ad_serving_costs', 'ad_imps_cpm', toUInt64(0), event_day)

SLIDE 26
Dictionaries. Update values
  • By timer (default)
  • Automatic for MySQL MyISAM
  • Using ‘invalidate_query’
  • Manually touching config file
  • Warning: N dicts * M nodes = N * M DB connections

SLIDE 27
Dictionaries. Restrictions
  • ‘Normal’ keys are only UInt64
  • No on-demand update (added in Sep 2017, v1.1.54289)

  • Every cluster node has its own copy
  • XML config (DDL would be better)
SLIDE 28

Dictionaries vs. Tables

  + No JOINs
  + Updatable
  + Always in memory for flat/hashed (faster)

  – Not a part of the schema
  – Somewhat inconvenient syntax
SLIDE 29

Tables

  • Engines
  • Sharding/Distribution
  • Replication
SLIDE 30

Engine = ?

  • In memory:
    – Memory
    – Buffer
    – Join
    – Set

  • On disk:
    – Log, TinyLog
    – MergeTree family

  • Interface:
    – Distributed
    – Merge
    – Dictionary

  • Special purpose:
    – View
    – Materialized View
    – Null

SLIDE 31

Merge tree

  • What is ‘merge’
  • PK sorting and index
  • Date partitioning
  • Query performance

[Figure: Block 1 and Block 2 merged into a single block, with PK index]

See details at: https://medium.com/@f1yegor/clickhouse-primary-keys-2cf2a45d7324
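A minimal table definition in the 2017-era positional MergeTree syntax (a sketch; the database, table, and column names are hypothetical):

```sql
-- Hypothetical fact table. Positional arguments: date column for monthly
-- partitioning, primary key tuple for sorting/index, index granularity.
CREATE TABLE dw.fact_event_local
(
    event_date Date,
    campaign_key UInt64,
    imps UInt64
) ENGINE = MergeTree(event_date, (campaign_key, event_date), 8192);
```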

SLIDE 32

MergeTree family

[Replicated] + [Replacing | Collapsing | Summing | Aggregating | Graphite] + MergeTree

SLIDE 33

Data Load

  • Multiple formats are supported, including CSV, TSV, JSONs, native binary

  • Error handling
  • Simple Transformations
  • Load locally (better) or distributed (possible)
  • Temp tables help
  • Replicated tables help with de-dup
SLIDE 34

The power of Materialized Views

  • MV is a table, i.e. engine, replication etc.
  • Updated synchronously
  • Summing/AggregatingMergeTree – consistent aggregation

  • Alters are problematic
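The pattern above can be sketched in SQL (a hypothetical example in the 2017-era positional syntax; table and column names are made up):

```sql
-- Hypothetical: a materialized view that maintains daily per-campaign
-- totals, updated synchronously on every INSERT into the fact table.
CREATE MATERIALIZED VIEW dw.daily_campaign_mv
ENGINE = SummingMergeTree(event_date, (event_date, campaign_key), 8192)
AS SELECT event_date, campaign_key, sum(imps) AS imps
FROM dw.fact_event_local
GROUP BY event_date, campaign_key;
```

Because the MV is itself a table, it can be sharded and replicated like any other.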
SLIDE 35

Data Load Diagram

[Diagram: CLICKHOUSE NODE. Log files INSERT into local temp tables; realtime producers INSERT into local Buffer tables, which flush into sharded fact tables; materialized views populate SummingMergeTree tables (shard); MySQL feeds dictionaries]

SLIDE 36

Updates and deletes

  • Dictionaries are refreshable
  • Replacing and Collapsing merge trees

    – eventual updates
    – SELECT … FINAL

  • Partitions
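The Replacing pattern can be sketched as follows (a hypothetical example; table and column names are made up):

```sql
-- Hypothetical: ReplacingMergeTree keeps the row with the highest 'version'
-- per primary key, but only after background merges run.
CREATE TABLE dw.user_profile
(
    event_date Date,
    user_id UInt64,
    status String,
    version UInt32
) ENGINE = ReplacingMergeTree(event_date, (user_id), 8192, version);

-- Without FINAL, not-yet-merged duplicates may appear in results;
-- FINAL applies merge semantics at SELECT time (slower, but consistent).
SELECT * FROM dw.user_profile FINAL WHERE user_id = 42;
```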
SLIDE 37

Sharding and Replication

  • Sharding and Distribution => Performance

    – Fact tables and MVs are distributed over multiple shards
    – Dimension tables and dicts are replicated at every node (local joins and filters)

  • Replication => Reliability

    – 2-3 replicas per shard
    – Cross-DC
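The distribution layer is typically a Distributed "umbrella" table over per-shard local tables (a sketch; 'my_cluster' and the table names are hypothetical, and the cluster must be defined in the server's remote_servers configuration):

```sql
-- Hypothetical: INSERTs are spread across shards by rand();
-- SELECTs fan out to all shards and merge the partial results.
CREATE TABLE dw.fact_event AS dw.fact_event_local
ENGINE = Distributed(my_cluster, dw, fact_event_local, rand());
```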

SLIDE 38

Distributed Query

SELECT foo FROM distributed_table GROUP BY col1
  (received by server 1, 2, or 3, which fans the query out)

  • Server 1: SELECT foo FROM local_table GROUP BY col1
  • Server 2: SELECT foo FROM local_table GROUP BY col1
  • Server 3: SELECT foo FROM local_table GROUP BY col1
SLIDE 39

Replication

  • Per-table topology configuration:

    – Dimension tables: replicate to any node
    – Fact tables: replicate to a mirror replica

  • ZooKeeper to communicate the state

    – State: what blocks/parts to replicate

  • Asynchronous => faster and reliable enough
  • Synchronous => slower
  • Isolate queries to a replica
  • Replication queues
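A replicated table is declared per node (a sketch in the 2017-era positional syntax; names are hypothetical, and {shard}/{replica} come from each server's macros configuration):

```sql
-- Hypothetical: all replicas of a shard share the same ZooKeeper path;
-- each replica registers under its own '{replica}' name.
CREATE TABLE dw.fact_event_replicated
(
    event_date Date,
    campaign_key UInt64,
    imps UInt64
) ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/fact_event',
    '{replica}',
    event_date, (campaign_key, event_date), 8192);
```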
SLIDE 40

SQL

  • Supports basic SQL syntax
  • Non-standard JOINs implementation:

– 1 level only – ANY vs ALL – only USING

  • Aliasing everywhere
  • Array and nested data types, lambda-expressions, ARRAY JOIN
  • GLOBAL IN, GLOBAL JOIN
  • Approximate queries
  • Some analytical functions
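The array and lambda features above can be illustrated briefly (a sketch; the events table and its columns are hypothetical):

```sql
-- Lambda expression with a higher-order array function:
SELECT arrayFilter(x -> x % 2 = 0, [1, 2, 3, 4, 5]) AS evens;  -- returns [2, 4]

-- ARRAY JOIN unrolls an array column into one row per element:
SELECT tag, count() AS cnt
FROM events
ARRAY JOIN tags AS tag
GROUP BY tag
ORDER BY cnt DESC;
```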
SLIDE 41

Hardware and Deployment

  • Load is CPU intensive => more cores
  • Query is disk intensive => faster disks
  • 10-12 SATA disks in RAID10

    – SAS/SSD => 2x performance for 2x price at 0.5x capacity

  • 10 TB/server seems optimal
  • ZooKeeper: keep it in one DC for fast quorum
  • Remote DCs work badly (e.g. East and West coast in the US)
SLIDE 42

Main Challenges Revisited

  • Design an efficient schema

    – Use ClickHouse strengths
    – Work around limitations

  • Design sharding and replication
  • Reliable data ingestion
  • Client interfaces
SLIDE 43

Migration project timelines

  • August 2016: POC
  • October 2016: first test runs
  • December 2016: production-scale data load

    – 10-50B events/day, 20TB data/day
    – 12 x 2 servers with 12x4TB RAID10

  • March 2017: Client API ready, starting migration

    – 30+ client types, 20 req/s query load

  • May 2017: extension to 20 x 3 servers
  • June 2017: migration completed!

    – 2-2.5PB uncompressed data

SLIDE 44

A few examples

:) select count(*) from dw.ad8_fact_event where access_day=today()-1;

SELECT count(*)
FROM dw.ad8_fact_event
WHERE access_day = (today() - 1)

┌────count()─┐
│ 7585106796 │
└────────────┘

1 rows in set. Elapsed: 0.503 sec. Processed 12.78 billion rows, 25.57 GB (25.41 billion rows/s., 50.82 GB/s.)

SLIDE 45

:) select dictGetString('dim_country', 'country_code', toUInt64(country_key)) country_code, count(*) cnt from dw.ad8_fact_event where access_day=today()-1 group by country_code order by cnt desc limit 5;

SELECT
    dictGetString('dim_country', 'country_code', toUInt64(country_key)) AS country_code,
    count(*) AS cnt
FROM dw.ad8_fact_event
WHERE access_day = (today() - 1)
GROUP BY country_code
ORDER BY cnt DESC
LIMIT 5

┌─country_code─┬────────cnt─┐
│ US           │ 2159011287 │
│ MX           │  448561730 │
│ FR           │  433144172 │
│ GB           │  352344184 │
│ DE           │  336479374 │
└──────────────┴────────────┘

5 rows in set. Elapsed: 2.478 sec. Processed 12.78 billion rows, 55.91 GB (5.16 billion rows/s., 22.57 GB/s.)

SLIDE 46

:) SELECT
    dictGetString('dim_country', 'country_code', toUInt64(country_key)) AS country_code,
    sum(cnt) AS cnt
FROM
(
    SELECT country_key, count(*) AS cnt
    FROM dw.ad8_fact_event
    WHERE access_day = (today() - 1)
    GROUP BY country_key
    ORDER BY cnt DESC
    LIMIT 5
)
GROUP BY country_code
ORDER BY cnt DESC

┌─country_code─┬────────cnt─┐
│ US           │ 2159011287 │
│ MX           │  448561730 │
│ FR           │  433144172 │
│ GB           │  352344184 │
│ DE           │  336479374 │
└──────────────┴────────────┘

5 rows in set. Elapsed: 1.471 sec. Processed 12.80 billion rows, 55.94 GB (8.70 billion rows/s., 38.02 GB/s.)

SLIDE 47

:) SELECT
    countDistinct(name) AS num_cols,
    formatReadableSize(sum(data_compressed_bytes) AS c) AS comp,
    formatReadableSize(sum(data_uncompressed_bytes) AS r) AS raw,
    c / r AS comp_ratio
FROM lf.columns
WHERE table = 'ad8_fact_event_shard'

┌─num_cols─┬─comp───────┬─raw──────┬──────────comp_ratio─┐
│      308 │ 325.98 TiB │ 4.71 PiB │ 0.06757640834769944 │
└──────────┴────────────┴──────────┴─────────────────────┘

1 rows in set. Elapsed: 0.289 sec. Processed 281.46 thousand rows, 33.92 MB (973.22 thousand rows/s., 117.28 MB/s.)

SLIDE 48

ClickHouse at fall 2017

  • 1+ year Open Source
  • 100+ prod installs worldwide
  • Public changelogs, roadmap, and plans
  • 5+2 Yandex devs, a few community contributors
  • Active community, blogs, case studies
  • A lot of features added by community requests
  • Support by Altinity
SLIDE 49

So now it is much easier

SLIDE 50

ClickHouse and MySQL

  • MySQL is widespread but weak for analytics

    – TokuDB, InfiniDB somewhat help

  • ClickHouse is best in analytics

How to combine?

SLIDE 51

Imagine

MySQL flexibility at ClickHouse speed?

SLIDE 52

Dreams….

SLIDE 53

ClickHouse with MySQL

  • ProxySQL to access ClickHouse data via the MySQL protocol (more at the next session)
  • Binlog integration to load MySQL data into ClickHouse in realtime (in progress)

[Diagram: MySQL → binlog consumer → ClickHouse; ProxySQL in front of ClickHouse]

SLIDE 54

ClickHouse instead of MySQL

  • Web logs analytics
  • Monitoring data collection and analysis

    – Percona’s PMM
    – Infinidat InfiniMetrics

  • Other time series apps
  • .. and more!
SLIDE 55

Questions?

Contact me: alexander.zaitsev@lifestreet.com alz@altinity.com skype: alex.zaitsev

Altinity