The State of Databases in 2019, Dinesh A. Joshi (@dineshjoshi)

slide-1
SLIDE 1

The State of Databases in 2019

Dinesh A. Joshi

@dineshjoshi dinesh.joshi@gatech.edu

apache cassandra

slide-2
SLIDE 2

About Me

  • Senior Software Engineer
  • Apache Cassandra Committer
  • 10+ years of experience in Distributed Systems
  • MS CS (Distributed Systems), Georgia Tech, Atlanta, USA
slide-3
SLIDE 3

Data Trends 📋

slide-4
SLIDE 4

Data Growth

Source: https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf
slide-5
SLIDE 5

Data Criticality

Source: https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf
slide-6
SLIDE 6

Data Growth Fuel ⛽

  • Embedded Devices
  • IoT
  • Sensors
  • Wearables

Time Series!

slide-7
SLIDE 7
slide-8
SLIDE 8

Source: https://www.datameer.com/blog/big-data-ecosystem/
slide-9
SLIDE 9

Database Landscape 2019

slide-10
SLIDE 10

Choices? 🧑

350+ !!!

slide-11
SLIDE 11

Operators & Developers

slide-12
SLIDE 12

Operators & Developers

Developers, Operators, Both

slide-13
SLIDE 13

Not always aligned!

slide-14
SLIDE 14

Cascading Costs

  • DB
  • Access Layer
  • Services (REST, gRPC)
  • UI / Presentation

Cost scale: $$$ to $

slide-15
SLIDE 15
slide-16
SLIDE 16

Polyglot Persistence

Source: https://en.wikipedia.org/wiki/Polyglot_persistence

Polyglot persistence is the concept of using different data storage technologies to handle different data storage needs within a given software application – Wikipedia
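
The definition above can be illustrated with a minimal sketch. It assumes one application that keeps sessions, orders, and metrics in three different kinds of store. The class name `PolyglotRepository` and its methods are hypothetical, and each store is an in-memory stand-in rather than a real database client.

```python
from dataclasses import dataclass, field


@dataclass
class PolyglotRepository:
    """Routes each kind of data to a store suited to it.

    All three stores are in-memory stand-ins (assumptions for illustration).
    """
    sessions: dict = field(default_factory=dict)  # stand-in for a key-value store
    orders: list = field(default_factory=list)    # stand-in for a relational table
    metrics: list = field(default_factory=list)   # stand-in for a time series store

    def save_session(self, session_id, payload):
        # Key-value access: lookup by a single key.
        self.sessions[session_id] = payload

    def save_order(self, order_id, customer, total):
        # Row-shaped data that would live in a relational table.
        self.orders.append((order_id, customer, total))

    def record_metric(self, timestamp, name, value):
        # Time series data, kept ordered by timestamp.
        self.metrics.append((timestamp, name, value))
        self.metrics.sort(key=lambda m: m[0])


repo = PolyglotRepository()
repo.save_session("s-1", {"user": "alice"})
repo.save_order(1, "alice", 42.0)
repo.record_metric(20, "latency_ms", 7.5)
repo.record_metric(10, "latency_ms", 9.1)
```

In a real system each method would call a different database, which is where the operational cost of running several technologies comes from.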

slide-17
SLIDE 17

Polyglot Persistence

Source: https://www.infoq.com/presentations/microservices-polyglot-persistence
slide-18
SLIDE 18

Polyglot Persistence

Source: https://www.infoq.com/presentations/microservices-polyglot-persistence
slide-19
SLIDE 19

Database Landscape 2019

slide-20
SLIDE 20

Landscape 2019

  • Relational
  • NoSQL
  • NewSQL
  • Graph
  • Time Series
  • Document Stores
  • Search Engines
  • In Memory
slide-21
SLIDE 21

Relational Databases

slide-22
SLIDE 22

Relational Databases

slide-23
SLIDE 23

Relational Databases

  • Data is Relational
  • Joins
  • Transactions
  • SQL is well known
  • Dataset fits
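
Two of the points above, joins and transactions, can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables are hypothetical examples, not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)

# Transaction: both inserts commit together when the block exits cleanly.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 42.0)")

# A failing transaction is rolled back as a whole, so order 11 is not kept.
try:
    with conn:
        conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 1, 5.0)")
        # Violates the primary key on customers.id and aborts the transaction.
        conn.execute("INSERT INTO customers (id, name) VALUES (1, 'duplicate')")
except sqlite3.IntegrityError:
    pass

# Join: combine rows from both tables through the customer_id relationship.
rows = conn.execute(
    "SELECT c.name, o.total FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Here `rows` holds only the committed order, since the second transaction left no trace.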
slide-24
SLIDE 24

NoSQL Databases

slide-25
SLIDE 25

NoSQL Databases

  • Key-Value Stores
  • Wide Column Stores
  • Document Stores
  • Graph DBMS
  • RDF Stores
  • Native XML DBMS
  • Content Stores
  • Search Engines
slide-26
SLIDE 26

NoSQL Databases

LevelDB

slide-27
SLIDE 27

Industry Trends

slide-28
SLIDE 28

SQL

Source: Google Trends
slide-29
SLIDE 29

Relational

Source: https://db-engines.com/en/ranking_categories
slide-30
SLIDE 30

Graph vs Relational

Source: https://db-engines.com/en/ranking_categories
slide-31
SLIDE 31

Time Series, Wide Column DBs

Source: https://db-engines.com/en/ranking_categories
slide-32
SLIDE 32

Popularity Trends

Source: https://db-engines.com/en/ranking_categories
slide-33
SLIDE 33

Popularity Trends (All)

Source: https://db-engines.com/en/ranking_categories
slide-34
SLIDE 34

Apache Cassandra

slide-35
SLIDE 35

Manage massive amounts of data, fast, without losing sleep!

Source: http://cassandra.apache.org/
slide-36
SLIDE 36

1500+

slide-37
SLIDE 37

What is MASSIVE Scale?

  • 75000+ nodes
  • 10+ PBs of data
  • Over 1 Trillion requests / day

DURABLE

Source: http://cassandra.apache.org/
slide-38
SLIDE 38

What is FAST?

Source: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

LINEAR SCALABILITY

slide-39
SLIDE 39

RELIABILITY?

  • No SPoFs
  • Decentralized
  • Shared-nothing architecture

slide-40
SLIDE 40

Cassandra Origins

Dynamo, BigTable

slide-41
SLIDE 41

CAP Theorem

Consistency, Availability, Partition Tolerance

slide-42
SLIDE 42

Apache Cassandra 4.0

What's new?

slide-43
SLIDE 43

300+!

Cassandra 4.0 Changes

slide-44
SLIDE 44

Reliability & Stability 🐏

  • Checksummed Transport
  • Checksummed Storage
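
The general idea behind checksummed transport and storage can be sketched as follows. This is not Cassandra's actual wire or on-disk format: the frame layout (4-byte length, payload, 4-byte CRC32) and the names `encode_frame` and `decode_frame` are assumptions made for illustration.

```python
import struct
import zlib


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and append a CRC32 of the payload."""
    header = struct.pack(">I", len(payload))
    trailer = struct.pack(">I", zlib.crc32(payload))
    return header + payload + trailer


def decode_frame(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    (length,) = struct.unpack(">I", frame[:4])
    payload = frame[4:4 + length]
    (expected,) = struct.unpack(">I", frame[4 + length:8 + length])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: frame is corrupt")
    return payload


frame = encode_frame(b"hello")

# Simulate a single flipped bit in transit or on disk.
corrupted = bytearray(frame)
corrupted[5] ^= 0x01
```

Decoding `frame` returns the original payload, while decoding `bytes(corrupted)` raises `ValueError`, so the reader detects the corruption instead of silently accepting bad data.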
slide-45
SLIDE 45

Scalability

  • Zero Copy Data Streaming
  • New internode messaging
  • Transient Replication
slide-46
SLIDE 46

Throughput vs Cluster Size

Axes: Throughput (RPS) vs. Cluster Size (# of nodes)

~1000 nodes

slide-47
SLIDE 47

Time to recover (4.0 vs 3.x)

Chart: time to recover (minutes) by AWS instance type (i3.2xl, i3.4xl, i3.8xl), comparing trunk and 3.x

Source: https://issues.apache.org/jira/browse/CASSANDRA-14765
slide-48
SLIDE 48

Time to recover (4.0 vs 3.x)

Source: https://issues.apache.org/jira/browse/CASSANDRA-14765
slide-49
SLIDE 49

Netty OpenSSL vs JDK SSL

Source: https://speakerdeck.com/normanmaurer/netty-one-framework-to-rule-them-all?slide=29
slide-50
SLIDE 50

Cassandra Networking (4.0 vs Pre 4.0)

  • Lower Latencies (40% lower avg, 60% lower p99)
  • Memory Efficiency (~10x reduction)
  • Scalable internode encryption (~4x throughput)
  • Better throughput & response times (~2x vs 3.0)
slide-51
SLIDE 51

Contribute

  • https://cassandra.apache.org
  • dev@cassandra.apache.org, user@cassandra.apache.org
  • #cassandra-dev (irc.freenode.net)
slide-52
SLIDE 52

Questions?