Clickhouse at MessageBird Analysing billions of events in real-time* - - PowerPoint PPT Presentation

clickhouse at messagebird
SMART_READER_LITE
LIVE PREVIEW

Clickhouse at MessageBird Analysing billions of events in real-time* - - PowerPoint PPT Presentation

Clickhouse at MessageBird Analysing billions of events in real-time* Aleksandar Aleksandrov & Flix Mattrat NOVEMBER 2018 About us Data engineers & Team leads Aleksandar Flix Aleksandrov Mattrat 2 225+ Agreements ABOUT


slide-1
SLIDE 1

NOVEMBER 2018

Aleksandar Aleksandrov & Félix Mattrat

Clickhouse at MessageBird

Analysing billions of events in real-time*

slide-2
SLIDE 2

2

About us

Aleksandar Aleksandrov Félix Mattrat

Data engineers & Team leads

slide-3
SLIDE 3

ABOUT

Introducing MessageBird

MessageBird is a cloud communications platform that empowers consumers to communicate with your business in the same way they communicate with their friends - seamlessly, on their own timeline and with the context of previous conversations.

225+ Agreements

We have 225+ direct-to-carrier agreements with operators worldwide.

15,000+ Customers

Customers in over 60+ countries, across a great variety of industries.

180+ Employees

More than 180 employees speaking over 20 languages based in the Americas, Europe & Asia.

3

For additional information visit: www.messagebird.com

slide-4
SLIDE 4

Data at MessageBird The past - Age Of Darkness Enlightenment - ClickHouse use case What’s next? - Nirvana

What’s on the menu?

01. 02. 03. 4 04.

slide-5
SLIDE 5

5

Internal needs External needs

  • State of the system
  • Routing SMS
  • Training algorithms
  • ML Models
  • Customer dashboard
  • Reporting API

Mostly about statistics and reporting

DATA AT MESSAGEBIRD

Needs

slide-6
SLIDE 6

6

DATA AT MESSAGEBIRD

The landscape

  • Multiple carriers is messy - no uniformity of the data
  • SMS messages go through many state changes up to months into

the past

  • Pricing (both carrier and customer) changes retro-actively
slide-7
SLIDE 7

7

Age of Darkness

slide-8
SLIDE 8

8

  • MySQL based
  • Aggregates re-computed every X period
  • f time
  • Served us well for +5 years

AGE OF DARKNESS

Hello CRON my old friend

slide-9
SLIDE 9

9

AGE OF DARKNESS

Scaling problems

  • The system had difficulty scaling and was often lagging
  • Loss of granularity with pre-aggregation
  • Performed poorly while doing analytical queries
  • Inaccuracies
slide-10
SLIDE 10

10

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

Re-thinking data collection

  • Able to keep up with continuously changing SMS message states
  • In real time*
  • Scalable to handle MessageBird’s global growth
  • More flexible to accommodate wider use of data
slide-11
SLIDE 11

11

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

Introducing event sourcing

  • Event sourcing, fairly common technique
  • An immutable stream of events from which all states can be derivate
slide-12
SLIDE 12

12

slide-13
SLIDE 13

13

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

Introducing event sourcing

  • Problem: now we have increased our data by an order of magnitude.
  • How can we query this efficiently?
slide-14
SLIDE 14

14

slide-15
SLIDE 15

15

  • Able to ingest large amount of data
  • Data available immediately after ingestion
  • No loss of granularity
  • Flexible querying capabilities
  • Sub-second response time
  • Horizontally scalable

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

What is our unicorn database?

slide-16
SLIDE 16

16

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

Vitess

  • Let’s shard the data
  • Now we have N shards of problems
  • Still has the limitations of MySQL
  • Poor analytical support (at the time)
slide-17
SLIDE 17

17

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

Kudu/Impala

  • Promising, very clean and well defined SQL

interface

  • Compatible with HDFS & Parquets
  • Column oriented
  • But unable to reach sub-second querying time
  • ver billions of rows
slide-18
SLIDE 18

18

ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE

Google BigQuery

  • Scale well, millions or billions doesn’t matter
  • Fully managed: it’s someone else problem
  • Standard SQL support
  • Not open source
  • Not made for sub-second querying
slide-19
SLIDE 19

19

ENLIGHTENMENT - CLICKHOUSE USE CASE

ClickHouse

slide-20
SLIDE 20

20

slide-21
SLIDE 21

21

ENLIGHTENMENT - CLICKHOUSE USE CASE

ClickHouse

  • Able to ingest a huge amount of data
  • Sub-second on large dataset of non-aggregated data
  • Flexible query capabilities: SQLish dialect
  • Column oriented
  • Scales very well vertically
  • Horizontally scalable
  • Open source
slide-22
SLIDE 22

SELECT toStartOfQuarter(created_at) AS Quarter, mcc AS Country, floor(sum(sign * rate)) AS Total, sum(sign) AS MessageCount FROM messages WHERE created_at >= '2018-01-01' AND customer = 666 GROUP BY Quarter, Country

30 rows in set. Elapsed: 0.33sec. Processed 497.91 million rows, 4.95 GB (1.42 billions rows/s., 14.39 GB/s.)

22

ENLIGHTENMENT - CLICKHOUSE USE CASE

ClickHouse

slide-23
SLIDE 23

23

ENLIGHTENMENT - CLICKHOUSE USE CASE

ClickHouse what’s the trick?

  • Column oriented, you only pay for what you select
  • Each column can potentially be processed in parallel
  • Carefully crafted code makes use of vectorisation instructions
  • Different table engines fit for different needs
  • Horizontally scalable
slide-24
SLIDE 24

24

So, how to ingest ever changing data into ClickHouse

slide-25
SLIDE 25

25

ENLIGHTENMENT - CLICKHOUSE USE CASE

CollapsingMergeTree

  • You write twice the amount of data, but eventually end up with a

single row per PK

  • Based on the idea of log compaction
  • Excels at analytical queries on a large amount of data
slide-26
SLIDE 26

26

sign date id status price

1 2018-10-08 666 ACCEPTED 0.01

ENLIGHTENMENT - CLICKHOUSE USE CASE

Collapsing what?

Primary key style

slide-27
SLIDE 27

27

sign date id status price

1 2018-10-08 666 ACCEPTED 0.01

  • 1

2018-10-08 666 ACCEPTED 0.01

ENLIGHTENMENT - CLICKHOUSE USE CASE

Collapsing what?

slide-28
SLIDE 28

28

sign date id status price

1 2018-10-08 666 ACCEPTED 0.01

  • 1

2018-10-08 666 ACCEPTED 0.01 1 2018-10-08 666 DELIVERED 0.05

ENLIGHTENMENT - CLICKHOUSE USE CASE

Collapsing what?

slide-29
SLIDE 29

29

SELECT sum(sign * price) AS total FROM dataset

sign date id status price

1 2018-10-08 666 ACCEPTED 0.01

  • 1

2018-10-08 666 ACCEPTED 0.01 1 2018-10-08 666 DELIVERED 0.05

ENLIGHTENMENT - CLICKHOUSE USE CASE

Collapsing what?

slide-30
SLIDE 30

30

sign price sign * price

1 0.01 0.01

  • 1

0.01

  • 0.01

1 0.05 0.05

0.05 0.05

ENLIGHTENMENT - CLICKHOUSE USE CASE

Collapsing what?

SELECT sum(sign * price) AS total FROM dataset

slide-31
SLIDE 31

sign date id status price

1 2018-10-08 666 ACCEPTED 0.01

  • 1

2018-10-08 666 ACCEPTED 0.01 1 2018-10-08 666 DELIVERED 0.05

ENLIGHTENMENT - CLICKHOUSE USE CASE

Collapsing what?

slide-32
SLIDE 32

32

How to insert the proper “negative” row?

slide-33
SLIDE 33

33

ENLIGHTENMENT - CLICKHOUSE USE CASE

CollapsingMergeTree, keeping track

  • f states
  • Need to be aware of the previous row to properly negate it
  • ClickHouse is not made for random access of single rows
slide-34
SLIDE 34

34

slide-35
SLIDE 35

35

What about availability?

slide-36
SLIDE 36

36

ENLIGHTENMENT - CLICKHOUSE USE CASE

Replication

  • High availability and reliability
  • To bring data closer to consumer
  • More than one way to do it with ClickHouse
slide-37
SLIDE 37

37

  • Is supported by the MergeTree table family
  • ReplicatedCollapsingMergeTree
  • ReplicatedAggregatingMergeTree
  • Uses Zookeeper to coordinate the replication between nodes

ENLIGHTENMENT - CLICKHOUSE USE CASE

ReplicatedMergeTree*

slide-38
SLIDE 38

38

slide-39
SLIDE 39

39

ClickHouse scalability?

slide-40
SLIDE 40

40

ENLIGHTENMENT - CLICKHOUSE USE CASE

Horizontal scalability

  • Distributed engine
  • Dispatch read queries to all the nodes
  • Shard the data and dispatch it to the right node
  • Flexible sharding capabilities
  • Let ClickHouse do the work
  • Shard manually: inserting directly into the wanted node and 

  • nly use the distributed engine to dispatch read queries
slide-41
SLIDE 41

41

slide-42
SLIDE 42

42

ENLIGHTENMENT - CLICKHOUSE USE CASE

Vertical scalability

  • Very efficient use of available CPU
  • Data is on one machine (or even in memory) makes queries even

faster

  • You don’t care about sharding of the data, operations can be done on

local table

  • Generally accepted to have more CPU, rather than more servers
slide-43
SLIDE 43

43

ClickHouse in production?

slide-44
SLIDE 44

44

ENLIGHTENMENT - CLICKHOUSE USE CASE

Our setup

  • Single region
  • Two availability zones
  • 8 CPU/30 GB RAM
  • 2TB+ compressed
  • 10 nodes
  • Replica factor 2
slide-45
SLIDE 45

45

ENLIGHTENMENT - CLICKHOUSE USE CASE

How far ClickHouse took us

  • Between the moment we designed and implemented our fist data

pipeline with ClickHouse from an average of 1000 events/s to 10000+ event/s without having to scale the cluster.

  • Most of MessageBird products’ data is in ClickHouse
slide-46
SLIDE 46

46

ClickHouse is a skyscraper without guard-rails: it will take you far but be cautious.

slide-47
SLIDE 47

47

ENLIGHTENMENT - CLICKHOUSE USE CASE

Don’t forget it’s not a RDBMS

  • Eventual consistency *
  • No transactions
  • A single non unique primary index
  • Limited support of JOIN
  • Experimental features are experimental FOR REAL that stuff will break
  • Resharding isn’t out-of-the-box
  • Not made for deleting/updating random rows
slide-48
SLIDE 48

48

ENLIGHTENMENT - CLICKHOUSE USE CASE

ClickHouse among many

  • ClickHouse is still one among many
  • Dictionaries: periodically refreshed view of external databases
  • JDBC/OBDC drivers, remote/local file, custom executable
  • Non standard SQL can make third party like business

intelligence tools integration can be challenging

slide-49
SLIDE 49

49

ENLIGHTENMENT - CLICKHOUSE USE CASE

PostgreSQL + ClickHouse

slide-50
SLIDE 50

50

SELECT sum(sign * price) AS total FROM dataset WHERE dataset = 666 SELECT sign, price AS total FROM dataset ENLIGHTENMENT - CLICKHOUSE USE CASE

Query forwarding

slide-51
SLIDE 51

51

SELECT sum(sign * price) AS total FROM dataset WHERE dataset = 666 SELECT sign, price AS total FROM dataset WHERE dataset = 666 ENLIGHTENMENT - CLICKHOUSE USE CASE

Query forwarding and push down

slide-52
SLIDE 52

52

ENLIGHTENMENT - CLICKHOUSE USE CASE

PostgreSQL + ClickHouse, looping the loop

  • Instantly gain to one of the most standard SQL interface
  • Still leverage the most important feature of ClickHouse by pushing

down the filters and aggregations

  • Bastion like approach to share data with third-party BI tools
slide-53
SLIDE 53

53

slide-54
SLIDE 54

54

ENLIGHTENMENT - CLICKHOUSE USE CASE

PostgreSQL + ClickHouse, looping the loop

  • Almost out-of-the-box data federation
  • But only a PoC, we are still dreaming of production

Did we say we are hiring?

slide-55
SLIDE 55

55

FUTURE - WRAPPING WORDS

Even more possibilities

  • ML features with catboost
  • Kafka base table engine
  • Upcoming better JOIN supports
  • Cap’n Proto and upcoming Protobuf / Parquet support
slide-56
SLIDE 56

Questions

Late questions? Come say hello or drop us an email. aleksandar@messagebird.com & felix@messagebird.com

www.messagebird.com/careers

slide-57
SLIDE 57

Rate the session