NoSQL performance in the real world David Mytton Woop Japan! - - PowerPoint PPT Presentation

Scaling reads • Many SSTables • Locate the right one(s) There can be many SSTables, so reads use Bloom filters to find the correct SSTable without having to load it from disk. Bloom filters are a very efficient in-memory structure.
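To make the lookup concrete, here is a minimal Bloom filter sketch in Python. It assumes double hashing over a SHA-256 digest; Cassandra’s own implementation uses different hash functions and sizing, and the SSTable names below are hypothetical. A `False` answer means the key is definitely absent, so that SSTable never has to be read from disk.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A minimal Bloom filter: k bit positions per key in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False: definitely absent. True: possibly present (false positives allowed).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Hypothetical SSTables, each summarised in memory by its own filter.
sstables = {"sstable-1": ["alice", "bob"], "sstable-2": ["carol"]}
filters = {}
for name, keys in sstables.items():
    bf = BloomFilter(num_bits=1024, num_hashes=7)
    for k in keys:
        bf.add(k)
    filters[name] = bf

# Only SSTables whose filter says "maybe" need to be read from disk.
candidates = [name for name, bf in filters.items() if bf.might_contain("carol")]
print(candidates)
```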

Scaling reads • Many SSTables • Locate the right one(s) • Fragmentation This causes fragmentation and a lot of files. Although Cassandra does compact SSTables, compaction is not immediate. There is one Bloom filter per SSTable. This works well and scales by simply adding nodes, which means less data per node.
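Compaction merges several SSTables into one. The sketch below is a simplified model, assuming each SSTable is just a dict of row key to `(timestamp, value)` and ignoring tombstones; Cassandra actually reconciles individual columns, but the newest-timestamp-wins rule is the same idea.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an SSTable maps row key -> (write timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several SSTables into one, keeping the newest version of each key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # SSTables are stored sorted by key, so emit the merged result in key order.
    return dict(sorted(merged.items()))


older = {"user:1": (100, "alice@old.example"), "user:2": (105, "bob")}
newer = {"user:1": (200, "alice@new.example"), "user:3": (210, "carol")}
compacted = compact([older, newer])
print(compacted)
```

After compaction a read for `user:1` touches one file instead of two, which is why read performance depends on compaction keeping up.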

Scaling reads Image: www.acunu.com Range queries, however, require every SSTable to be queried, because Bloom filters cannot be used for them. Performance is therefore directly related to how many SSTables there are, which makes it reliant on compaction.
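The difference in cost between a point read and a range read can be sketched with a toy model. Here a `frozenset` of keys stands in for each SSTable’s Bloom filter (a real filter can also return false positives), and the class and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SSTable:
    """Toy SSTable: sorted rows plus an in-memory membership filter."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(sorted(rows.items()))
        self.filter = frozenset(rows)  # stand-in for a Bloom filter


def point_read(tables: List[SSTable], key: str) -> Tuple[Optional[str], int]:
    """Return (value, number of SSTables actually read)."""
    touched = 0
    value = None
    for table in tables:
        if key not in table.filter:  # filter rules this SSTable out: no disk I/O
            continue
        touched += 1
        value = table.rows[key]  # later tables are treated as newer here
    return value, touched


def range_read(tables: List[SSTable], start: str, end: str) -> Tuple[Dict[str, str], int]:
    """A filter cannot answer 'any key in [start, end)?', so every SSTable is read."""
    result: Dict[str, str] = {}
    for table in tables:
        for key, value in table.rows.items():
            if start <= key < end:
                result[key] = value
    return dict(sorted(result.items())), len(tables)


tables = [SSTable({"a": "1", "m": "2"}), SSTable({"b": "3"}), SSTable({"z": "4"})]
print(point_read(tables, "b"))       # ('3', 1)
print(range_read(tables, "a", "n"))  # ({'a': '1', 'b': '3', 'm': '2'}, 3)
```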

Bottlenecks • RAM http://www.flickr.com/photos/comedynose/4388430444/ RAM is not as directly correlated with performance as it is with MongoDB, because Bloom filters are memory efficient and fit into RAM easily. This means there is no disk I/O until it is actually needed. As always, though, the more RAM the better: enough RAM avoids disk I/O altogether.
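Why Bloom filters fit comfortably in RAM follows from the standard sizing formula for an optimal filter: about -ln(p) / (ln 2)^2 bits per key for a false-positive rate p, with (bits per key) × ln 2 hash functions. The sketch below applies that textbook formula; Cassandra’s exact allocation may differ, and the 100 million key figure is just an illustrative assumption.

```python
import math


def bloom_bits_per_key(false_positive_rate: float) -> float:
    """Optimal Bloom filter size in bits per key for a target false-positive rate."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def bloom_filter_bytes(num_keys: int, false_positive_rate: float) -> float:
    return num_keys * bloom_bits_per_key(false_positive_rate) / 8


# Illustrative assumption: 100 million row keys at a 1% false-positive rate.
size_mib = bloom_filter_bytes(100_000_000, 0.01) / 1024**2
hashes = round(bloom_bits_per_key(0.01) * math.log(2))
print(f"{bloom_bits_per_key(0.01):.2f} bits/key, {size_mib:.0f} MiB, {hashes} hashes")
```

At a 1% false-positive rate this works out to roughly 9.6 bits per key, or about 114 MiB for 100 million keys, using 7 hash functions.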

Bottlenecks • RAM • Compression • 2x-4x reduction in data size • 25-35% performance improvement on reads • 5-10% performance improvement on writes Compression in Cassandra 1.0 helps with both reads and writes: it reduces SSTable size, so less memory is required. It works well on column families where many rows have the same columns.
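Compression pays off when many rows share the same columns because the repeated column names and values compress very well. This demo uses Python’s `zlib` (DEFLATE) purely as a stand-in; it is not Cassandra’s on-disk format, the rows are hypothetical, and the ratio on this highly repetitive synthetic data will be much higher than the 2x-4x quoted for real workloads.

```python
import json
import zlib

# Hypothetical column family: many rows sharing the same column names.
rows = [
    {"user_id": i, "country": "JP", "plan": "free", "signup_source": "web"}
    for i in range(1000)
]
raw = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
compressed = zlib.compress(raw, 6)  # DEFLATE at a middle compression level
ratio = len(raw) / len(compressed)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.1f}x)")
```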

Bottlenecks • RAM • Compression • Wide rows Using Bloom filters, Cassandra knows which SSTables a row is located in and so can reduce disk I/O. However, for wide rows, or rows written to over time, the row may exist in every SSTable. Compaction can mitigate this, but it requires multiple passes and eventually degrades to random I/O, which defeats the whole point of compacting: sequential I/O.
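A toy model shows how a row written over time ends up spread across SSTables. It assumes the memtable is flushed after a fixed number of writes (real flushes are driven by memory thresholds rather than a write count), and the row keys and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Write = Tuple[str, str, str]  # (row key, column name, value)
Table = Dict[str, Dict[str, str]]  # row key -> {column name: value}


def write_over_time(writes: Iterable[Write], flush_every: int) -> List[Table]:
    """Apply writes to a memtable, flushing it to a new SSTable every `flush_every` writes."""
    sstables: List[Table] = []
    memtable: Table = {}
    for count, (row_key, column, value) in enumerate(writes, start=1):
        memtable.setdefault(row_key, {})[column] = value
        if count % flush_every == 0:
            sstables.append(memtable)  # flush: the memtable becomes an immutable SSTable
            memtable = {}
    if memtable:
        sstables.append(memtable)
    return sstables


def sstables_holding(sstables: List[Table], row_key: str) -> int:
    """How many SSTables contain a fragment of `row_key`."""
    return sum(1 for table in sstables if row_key in table)


writes: List[Write] = []
for t in range(6):
    writes.append(("timeline:42", f"event-{t}", "..."))  # wide row: one new column per event
    writes.append((f"user:{t}", "name", f"user {t}"))    # narrow rows: written once

sstables = write_over_time(writes, flush_every=2)
print(sstables_holding(sstables, "timeline:42"), sstables_holding(sstables, "user:3"))  # 6 1
```

A point read of `timeline:42` has to touch all six SSTables, while `user:3` lives in just one.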

Bottlenecks • Node size No larger than a few hundred GB per node, and less with many small values. Disk operations become very slow because of the previously mentioned issue of accessing every Bloom filter / SSTable. Schema changes take locks, and the time they take is related to data size.

Bottlenecks • Node size • Startup time Startup time is proportional to data size, so a restart could take hours while data is loaded into memory.

Bottlenecks • Node size • Startup time • Heap All the Bloom filters and indexes must fit into the JVM heap, which you can't make larger than ~8GB, because beyond that various GC issues start to kill performance (and introduce random, long pauses of up to 35 seconds!).

Failover • Replication Replication is core to Cassandra and is required.

Failover • SimpleStrategy Image: www.datastax.com Data is evenly distributed around all the nodes.
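SimpleStrategy placement can be sketched as a walk around the token ring: the first replica goes to the node owning the key’s token, and the remaining replicas go to the next nodes clockwise, with no regard for racks or data centres. The four-node ring, the toy token range and the function names are assumptions for illustration; the MD5-based `token_for` only loosely mirrors RandomPartitioner.

```python
import bisect
import hashlib
from typing import List, Tuple

# Hypothetical 4-node ring; tokens are picked by hand for readability.
RING: List[Tuple[int, str]] = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
TOKEN_SPACE = 100  # toy range; real partitioners use a far larger token space


def token_for(key: str) -> int:
    """Stand-in partitioner: hash the key onto the toy token range."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % TOKEN_SPACE


def replicas_for_token(token: int, replication_factor: int) -> List[str]:
    """First replica: the first node whose token is >= the key's token (wrapping
    around the ring). Remaining replicas: the next nodes clockwise."""
    if replication_factor > len(RING):
        raise ValueError("replication factor exceeds the number of nodes")
    tokens = [t for t, _ in RING]
    start = bisect.bisect_left(tokens, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


def simple_strategy_replicas(key: str, replication_factor: int) -> List[str]:
    return replicas_for_token(token_for(key), replication_factor)


print(replicas_for_token(30, 3))  # ['node-c', 'node-d', 'node-a']
print(simple_strategy_replicas("user:42", 3))
```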

Failover • NetworkTopologyStrategy Image: www.datastax.com Local reads: no need to go across data centres. Redundancy: allows for a full data centre failure. Data centre and rack aware.
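A simplified sketch of NetworkTopologyStrategy: replicas are chosen separately for each data centre by walking the ring clockwise, preferring nodes on racks not already used in that data centre. The ring layout, data centre names, rack names and the `nts_replicas` function are hypothetical, and Cassandra’s real algorithm handles more edge cases.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical ring entries: (token, node, data centre, rack).
RING: List[Tuple[int, str, str, str]] = [
    (0, "ldn-1", "london", "r1"),
    (10, "nyc-1", "newyork", "r1"),
    (20, "ldn-2", "london", "r1"),
    (30, "nyc-2", "newyork", "r2"),
    (40, "ldn-3", "london", "r2"),
    (50, "nyc-3", "newyork", "r1"),
]


def nts_replicas(token: int, replicas_per_dc: Dict[str, int]) -> Dict[str, List[str]]:
    """Pick replicas per data centre, walking the ring clockwise from `token`."""
    tokens = [t for t, *_ in RING]
    start = bisect.bisect_left(tokens, token) % len(RING)
    ordered = [RING[(start + i) % len(RING)] for i in range(len(RING))]
    placement: Dict[str, List[str]] = {}
    for dc, wanted in replicas_per_dc.items():
        in_dc = [(node, rack) for _, node, node_dc, rack in ordered if node_dc == dc]
        chosen: List[str] = []
        racks_used = set()
        # First pass: at most one replica per rack, so a rack failure loses one copy.
        for node, rack in in_dc:
            if len(chosen) < wanted and rack not in racks_used:
                chosen.append(node)
                racks_used.add(rack)
        # Second pass: if there are fewer racks than replicas, reuse racks.
        for node, _ in in_dc:
            if len(chosen) < wanted and node not in chosen:
                chosen.append(node)
        placement[dc] = chosen
    return placement


print(nts_replicas(15, {"london": 2, "newyork": 2}))
# {'london': ['ldn-2', 'ldn-3'], 'newyork': ['nyc-2', 'nyc-3']}
```

Each data centre holds its own copies, so reads can be served locally and either data centre can fail completely without losing data.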

Failover • Replication • Consistency Each query defines its consistency level, so a write must reach a minimum number of nodes, and a read must likewise get responses from a minimum number of nodes. Where the same data exists on multiple nodes, the most recent copy takes priority. Reads can be direct (not necessarily consistent) or use read repair (consistent).
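The consistency rules can be sketched in a few functions. `ONE`, `QUORUM` and `ALL` are real Cassandra consistency levels; the helper names here are hypothetical, and the model treats each replica’s answer as a single `(timestamp, value)` pair rather than per-column data.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, str]  # (write timestamp, value) held by one replica


def quorum(replication_factor: int) -> int:
    """Number of replicas a QUORUM read or write must hear from."""
    return replication_factor // 2 + 1


def overlaps(n: int, r: int, w: int) -> bool:
    """Read and write replica sets must intersect (R + W > N) for a read to be
    guaranteed to see the latest acknowledged write."""
    return r + w > n


def resolve(responses: List[Optional[Version]]) -> Optional[Version]:
    """Most recent copy wins: pick the response with the newest timestamp."""
    present = [resp for resp in responses if resp is not None]
    return max(present, key=lambda version: version[0], default=None)


def needs_repair(responses: List[Optional[Version]], winner: Optional[Version]) -> List[int]:
    """Indexes of replicas that a read repair would update with the winning version."""
    return [i for i, resp in enumerate(responses) if resp != winner]


responses = [(100, "old"), None, (200, "new")]  # replica 1 missed the write
winner = resolve(responses)
print(winner, needs_repair(responses, winner))  # (200, 'new') [0, 1]
```

With a replication factor of 3, QUORUM reads and writes (2 + 2 > 3) always overlap, whereas ONE for both (1 + 1) does not.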

Case Study

Case Study • Britain’s Got Talent • RDS m1.large = 300/s • 10k votes/s • 2 nodes Originally on Amazon RDS, where an m1.large managed 300 votes/s. Peak load was 10k votes/s, and the votes had to be atomic. Switched to 2 Cassandra nodes.

Scaling www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek) 3 things



Scaling • Replication • Replication • Replication Each CouchDB node is independent and stands on its own. Replication is configured at the node level. A master/slave configuration is up to you, or it can be master/master with two-way replication.
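Since replication is configured per node, a two-way (master/master) setup amounts to two one-way replications. This sketch builds the requests for CouchDB’s `POST /_replicate` endpoint with `continuous: true`. The host names, database name and helper functions are hypothetical; newer CouchDB releases may require credentials and also offer a `_replicator` database for persistent jobs, and nothing is sent until you call `urlopen`.

```python
import json
import urllib.request
from typing import List

# Hypothetical hosts and database name; replace with your own.
NODE_A = "http://couch-a.example:5984"
NODE_B = "http://couch-b.example:5984"
DB = "votes"


def replication_request(source: str, target: str) -> urllib.request.Request:
    """Build a continuous one-way replication request for CouchDB's /_replicate."""
    body = {"source": f"{source}/{DB}", "target": f"{target}/{DB}", "continuous": True}
    return urllib.request.Request(
        f"{source}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def master_master(a: str, b: str) -> List[urllib.request.Request]:
    # Two-way ("master/master") replication is just two one-way replications.
    return [replication_request(a, b), replication_request(b, a)]


jobs = master_master(NODE_A, NODE_B)
# To actually start replication against real servers:
# for job in jobs:
#     urllib.request.urlopen(job)
```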

Scaling Picture is unrelated! Mmm, ice cream.

Scaling • HTTP Access is over HTTP/REST, so it’s down to you to implement it. How much overhead does HTTP add compared with a wire protocol?

Scaling • HTTP • Load balancer This means you can use load balancing just as you would for any normal HTTP service.
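In production you would more likely put a standard HTTP load balancer in front of the nodes; this sketch shows the same round-robin idea, with failover to the next node, as client-side Python. The node URLs and the `RoundRobin` class are hypothetical, and `fetch` is injectable so the logic can be tested without a server. It only sends GETs, since spreading writes across master/master nodes can create replication conflicts.

```python
import itertools
import urllib.request
from typing import Callable, List

# Hypothetical CouchDB nodes.
NODES = ["http://couch-a.example:5984", "http://couch-b.example:5984"]


class RoundRobin:
    def __init__(self, nodes: List[str], fetch: Callable[[str], bytes]) -> None:
        self.nodes = nodes
        self.fetch = fetch
        self._cycle = itertools.cycle(range(len(nodes)))

    def get(self, path: str) -> bytes:
        """Send a GET to the next node in turn, trying the others if it fails."""
        start = next(self._cycle)
        last_error = None
        for offset in range(len(self.nodes)):
            node = self.nodes[(start + offset) % len(self.nodes)]
            try:
                return self.fetch(node + path)
            except OSError as err:  # urllib.error.URLError is a subclass of OSError
                last_error = err
        raise ConnectionError(f"all nodes failed for {path}") from last_error


def http_fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read()


balancer = RoundRobin(NODES, http_fetch)  # e.g. balancer.get("/votes/some-doc-id")
```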

Bottlenecks www.flickr.com/photos/daddo83/3406962115/

Bottlenecks • Disk space Disk space usage inflates quickly. We found CouchDB using hundreds of GB for data that fit into just a few GB in MongoDB. Compaction doesn’t help much. There is an option not to store the full document when building queries (views).

Bottlenecks • Disk space • No ad-hoc You have to know all your queries up front. Building a new query is very slow because it requires a full map/reduce job.
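Queries in CouchDB are views defined in advance in a design document. This hypothetical design document (database, view and field names are assumptions) declares one view whose map function emits `null` rather than the whole document, which keeps the index small; `_count` is one of CouchDB’s built-in reduce functions. Building the index for a new view like this is the full map/reduce pass that makes new queries slow.

```python
import json

# Hypothetical design document for a "votes" database. Every query has to be
# declared as a view like this before it can be used.
design_doc = {
    "_id": "_design/votes",
    "views": {
        # Emit null instead of the whole document so the index stays small;
        # query with reduce=false&include_docs=true when full documents are needed.
        "by_contestant": {
            "map": "function (doc) { if (doc.contestant) { emit(doc.contestant, null); } }",
            "reduce": "_count",
        }
    },
}

# PUT this body to /votes/_design/votes, then query for example:
# GET /votes/_design/votes/_view/by_contestant?group=true
body = json.dumps(design_doc)
print(body)
```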

Bottlenecks • Disk space • No ad-hoc • Append only Lots of updates can cause merge conflicts on replication. The namespace also inflates significantly, and compaction is extremely intensive.
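A toy append-only store illustrates why the file grows with every update and why compaction is expensive: every update appends a full new revision, and compaction has to rewrite the whole file to keep only the latest revision of each document. The `AppendOnlyStore` class is hypothetical and ignores CouchDB’s revision tree and B-tree structure.

```python
import json
from typing import Dict, List


class AppendOnlyStore:
    """Toy append-only document file: every update appends a new revision."""

    def __init__(self) -> None:
        self.log: List[str] = []  # stands in for the on-disk file

    def put(self, doc_id: str, doc: Dict) -> None:
        # Nothing is overwritten in place; the new revision is appended.
        self.log.append(json.dumps({"_id": doc_id, **doc}))

    def size(self) -> int:
        return sum(len(line) for line in self.log)

    def compact(self) -> None:
        """Rewrite the whole file, keeping only the latest revision of each doc."""
        latest: Dict[str, str] = {}
        for line in self.log:
            latest[json.loads(line)["_id"]] = line
        self.log = list(latest.values())


store = AppendOnlyStore()
for i in range(1000):
    store.put("counter", {"value": i})  # 1,000 updates to one document
before = store.size()
store.compact()
print(before, store.size())
```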

Failover Master/master, so it’s up to you to decide which node is the slave.


Failover • Replication • Eventual consistency Unlike MongoDB and Cassandra, there are no built-in consistency features.

Failover • Replication • Eventual consistency • DNS Failover is handled at the DNS level.

DIY • Replication Replication works very well, but it’s up to you to define the node roles.

DIY • Replication • Failover There is no built-in failover handling.

DIY • Replication • Failover • Queries You can’t query anything without defining it in advance.

Case Study

Case Study • BBC • Eventual consistency • 8 nodes per DC • DNS failover Master/master pairing across DCs. Eventual consistency is handled by replication. Failover is done at the DNS level.

Case Study • BBC • 500 GET/s • 24 PUT/s • Max 1k PUT/s/node Hardware benchmarked to 1k PUT/s maximum


NoSQL performance in the real world, David Mytton, Woop Japan! Examining each database in turn to look at three important factors for production: scaling reads and writes, where bottlenecks can occur, and how to deal with redundancy and failover.
