SLIDE 1: ClickHouse Deep Dive

Aleksei Milovidov

SLIDE 2: ClickHouse use cases

A stream of events:

› Actions of website visitors
› Ad impressions
› DNS queries
› E-commerce transactions
› …

We want to save info about these events and then glean some insights from it.

SLIDE 3: ClickHouse philosophy

› Interactive queries on data updated in real time
› Cleaned, structured data is needed
› Try hard not to pre-aggregate anything
› Query language: a dialect of SQL + extensions

SLIDE 4: Sample query in a web analytics system

Top 10 referers for a website for the last week:

SELECT Referer, count(*) AS count
FROM hits
WHERE CounterID = 111
  AND Date BETWEEN '2018-04-18' AND '2018-04-24'
GROUP BY Referer
ORDER BY count DESC
LIMIT 10

SLIDE 5: How to execute a query fast?

Read data fast:

› Only needed columns: CounterID, Date, Referer
› Locality of reads (an index is needed!)
› Data compression

SLIDE 6: How to execute a query fast?

Read data fast:

› Only needed columns: CounterID, Date, Referer
› Locality of reads (an index is needed!)
› Data compression

Process data fast:

› Vectorized execution (block-based processing)
› Parallelize to all available cores and machines
› Specialization and low-level optimizations

SLIDE 7: Index needed!

The principle is the same as with classic DBMSes. A majority of queries will contain conditions on CounterID and (possibly) Date, so (CounterID, Date) fits the bill. Check this by mentally sorting the table by primary key.

Differences:

› The table will be physically sorted on disk
› It is not a unique constraint
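In DDL terms, a table like hits from slide 4 could be declared as follows (a minimal sketch with only the three columns seen so far; ORDER BY here defines the primary key):

CREATE TABLE hits
(
    CounterID UInt32,  -- website identifier
    Date Date,
    Referer String
)
ENGINE = MergeTree
ORDER BY (CounterID, Date)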

SLIDE 8: Index internals

[Diagram: primary.idx stores one (CounterID, Date) entry per 8192 rows, with marks at rows N, N+8192, N+16384 (e.g. CounterID values 111, 111, 222 and Dates 2017-10-04, 2018-04-20, 2013-03-12). Each column (CounterID, Date, Referer) has its own .mrk marks file pointing into its .bin data file.]

SLIDE 9: Things to remember about indexes

The index is sparse:

› Must fit into memory
› The default granularity value (8192) is good enough
› Does not create a unique constraint
› Performance of point queries is not stellar

The table is sorted according to the index:

› There can be only one
› Using the index is always beneficial
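For a rough sense of scale (my arithmetic, not from the slides): a table with ~8.2 billion rows needs only ~1 million index entries (one per 8192-row granule), so the primary index comfortably fits in memory; the flip side is that even a query for a single row has to read and scan at least one whole granule.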

SLIDE 10: How to keep the table sorted

Inserted events are (almost) sorted by time, but we need them sorted by primary key! MergeTree maintains a small set of sorted parts, an idea similar to an LSM tree.

SLIDE 11: How to keep the table sorted

[Diagram: one part on disk, sorted by primary key, plus a new block of rows to insert; parts are ordered by insertion number (M, N, N+1).]

SLIDE 12: How to keep the table sorted

[Diagram: the insert has created a second sorted part on disk; the table now consists of two parts, covering insertion numbers up to N and N+1.]

SLIDE 13: How to keep the table sorted

[Diagram: parts [M, N] and [N+1] are merged in the background.]

SLIDE 14: How to keep the table sorted

[Diagram: the merge has produced a single sorted part [M, N+1].]

SLIDE 15: Things to do while merging

Replace/update records:

› ReplacingMergeTree
› CollapsingMergeTree

Pre-aggregate data:

› AggregatingMergeTree

Metrics rollup:

› GraphiteMergeTree
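A sketch of the first option (the user_state table is hypothetical; ReplacingMergeTree keeps the row with the highest Version per sorting key, but only when parts get merged):

CREATE TABLE user_state
(
    UserID UInt64,
    State String,
    Version UInt32   -- row with the max Version survives a merge
)
ENGINE = ReplacingMergeTree(Version)
ORDER BY UserID

Until a merge happens, queries can still see superseded rows; SELECT … FINAL forces deduplication at read time, at a cost.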

SLIDE 16: MergeTree partitioning

ENGINE = MergeTree … PARTITION BY toYYYYMM(Date)

› The table can be partitioned by any expression (default: by month)
› Parts from different partitions are not merged
› Easy manipulation of partitions:
  ALTER TABLE … DROP PARTITION
  ALTER TABLE … DETACH/ATTACH PARTITION
› MinMax index by partition columns
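For instance, with the monthly partitioning above, retiring or re-loading a month is a cheap partition-level operation (the hits table name and partition IDs are illustrative):

ALTER TABLE hits DROP PARTITION 201804    -- delete the whole month
ALTER TABLE hits DETACH PARTITION 201805  -- keep on disk, hide from queries
ALTER TABLE hits ATTACH PARTITION 201805  -- bring it back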

SLIDE 17: Things to remember about MergeTree

Merging runs in the background:

› Even when there are no queries!

Control the total number of parts:

› Rate of INSERTs
› The MaxPartsCountForPartition and DelayedInserts metrics are your friends
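A quick sketch of how to watch them (assuming, as in the versions I know, that MaxPartsCountForPartition is exposed in system.asynchronous_metrics and DelayedInserts in system.events):

SELECT value
FROM system.asynchronous_metrics
WHERE metric = 'MaxPartsCountForPartition'

SELECT value
FROM system.events
WHERE event = 'DelayedInserts'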

SLIDE 18: When one server is not enough

› The data won’t fit on a single server…
› You want to increase performance by adding more servers…
› Multiple simultaneous queries are competing for resources…

SLIDE 19: When one server is not enough

› The data won’t fit on a single server…
› You want to increase performance by adding more servers…
› Multiple simultaneous queries are competing for resources…

ClickHouse: Sharding + Distributed tables!
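A minimal sketch of how such a table is declared (the cluster name my_cluster, database default, and the rand() sharding key are illustrative assumptions):

CREATE TABLE distributed_table AS local_table
ENGINE = Distributed(my_cluster, default, local_table, rand())

The sharding key only matters for INSERTs routed through the Distributed table; reads always fan out to all shards.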

SLIDE 20: Reading from a Distributed table

[Diagram: the initiator receives SELECT … FROM distributed_table GROUP BY column and forwards SELECT … FROM local_table GROUP BY column to shards 1, 2 and 3.]

SLIDE 21: Reading from a Distributed table

[Diagram: each shard returns a partially aggregated result; the initiator merges them into the full result.]

SLIDE 22: NYC taxi benchmark

227 GB of CSV, ~1.3 billion rows:

SELECT passenger_count, avg(total_amount)
FROM trips
GROUP BY passenger_count

Shards | Time, s | Speedup
1      | 1.224   |
3      | 0.438   | x2.8
140    | 0.043   | x28.5

SLIDE 23: Inserting into a Distributed table

[Diagram: the client issues INSERT INTO distributed_table on one node; shards 1, 2 and 3 will receive the data.]

SLIDE 24: Inserting into a Distributed table

[Diagram: the Distributed table asynchronously inserts each row into shard number sharding_key % 3 via INSERT INTO local_table.]

SLIDE 25: Inserting into a Distributed table

[Diagram: the data is split by sharding_key and inserted into the shards synchronously.]

SET insert_distributed_sync = 1;
INSERT INTO distributed_table …;

SLIDE 26: Things to remember about Distributed tables

It is just a view:

› Doesn’t store any data by itself
› Will always query all shards

Ensure that the data is divided into shards uniformly:

› either by inserting directly into local tables
› or by letting the Distributed table do it (but beware of async inserts by default)

SLIDE 27: When failure is not an option

› Protection against hardware failure
› Data must always be available for reading and writing

SLIDE 28: When failure is not an option

› Protection against hardware failure
› Data must always be available for reading and writing

ClickHouse: the ReplicatedMergeTree engine!

› Async master-master replication
› Works on a per-table basis
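A minimal declaration sketch (the ZooKeeper path convention and the {shard}/{replica} macros come from server configuration; the table and column names are illustrative):

CREATE TABLE replicated_table
(
    CounterID UInt32,
    Date Date,
    Referer String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replicated_table', '{replica}')
ORDER BY (CounterID, Date)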

SLIDE 29: Replication internals

[Diagram: an INSERT into Replica 1 registers the inserted block number in the replication queue in ZooKeeper; Replicas 2 and 3 fetch the new block, and merges go through the same queue so every replica performs them.]

SLIDE 30: Replication and the CAP theorem

What happens in case of network failure (partition)?

› Not consistent (as is any system with async replication, but you can turn linearizability on)
› Highly available, almost (tolerates the failure of one datacenter, provided the ClickHouse replicas are in at least 2 DCs and the ZK replicas in 3 DCs; a server partitioned from the ZK quorum is unavailable for writes)

SLIDE 31: Putting it all together

[Diagram: six servers hold shards 1-3, each with replicas 1 and 2. A SELECT FROM distributed_table fans out as SELECT FROM replicated_table to one replica of each shard.]

SLIDE 32: Things to remember about replication

Use it!

› Replicas check each other
› Unsure if an INSERT went through? Simply retry: the blocks will be deduplicated
› ZooKeeper is needed, but only for INSERTs (no added latency for SELECTs)

Monitor replica lag:

› The system.replicas and system.replication_queue tables are your friends
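For example, a lag check might look like this (a sketch; column names such as absolute_delay and queue_size are from system.replicas as I know it, so verify them against your version):

SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE absolute_delay > 60   -- replicas more than a minute behind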

SLIDE 33: Brief recap

› Column-oriented
› Fast interactive queries on real-time data
› SQL dialect + extensions
› Bad fit for OLTP, key-value, and blob storage
› Scales linearly
› Fault tolerant
› Open source!

SLIDE 34: Thank you

Questions? Or reach us at:

› clickhouse-feedback@yandex-team.com
› Telegram: https://t.me/clickhouse_en
› GitHub: https://github.com/yandex/ClickHouse/
› Google group: https://groups.google.com/group/clickhouse