NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. - PDF document

Università degli Studi di Roma “ Tor Vergata ” Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference Big Data stack High-level Interfaces Support / Integration Data Processing Data Storage Resource Management Valeria Cardellini - SABD 2017/18 1

Traditional RDBMSs • RDBMSs: the traditional technology for storing structured data in web and business applications • SQL is good – Rich language and toolset – Easy to use and integrate – Many vendors • RDBMSs promise ACID guarantees Valeria Cardellini - SABD 2017/18 2 ACID properties • A tomicity – All included statements in a transaction are either executed or the whole transaction is aborted without affecting the database (“all or nothing” principle) • C onsistency – A database is in a consistent state before and after a transaction • I solation – Transactions cannot see uncommitted changes in the database (i.e., the results of incomplete transactions are not visible to other transactions) • D urability – Changes are written to a disk before a database commits a transaction so that committed data cannot be lost through a power failure. Valeria Cardellini - SABD 2017/18 3

RDBMS constraints • Domain constraints – Restricts the domain of each attribute or the set of possible values for the attribute • Entity integrity constraint – No primary key value can be null • Referential integrity constraint – To maintain consistency among the tuples in two relations: every value of one attribute of a relation should exist as a value of another attribute in another relation • Foreign key – To cross-reference between multiple relations: it is a key in a relation that matches the primary key of another relation Valeria Cardellini - SABD 2017/18 4 Pros and cons of RDBMS Pros Cons • Well-defined consistency • Performance as major constraint, scaling is difficult model • Limited support for complex • ACID guarantees data structures • Relational integrity • Complete knowledge of DB maintained through entity structure required to create and referential integrity ad hoc queries constraints • Commercial DBMSs are • Well suited for OLTP apps expensive OLTP : OnLine Transaction • Some DBMSs have limits on Processing fields size • Sound theoretical foundation • Data integration from • Stable and standardized multiple RDBMSs can be cumbersome DBMSs available • Well understood Valeria Cardellini - SABD 2017/18 5

RDBMS challenges • Web-based applications caused spikes – Internet-scale data size – High read-write rates – Frequent schema changes • Let’s scale RDBMSs – RDBMS were not designed to be distributed • Possible solutions: – Replication – Sharding Valeria Cardellini - SABD 2017/18 6 Replication • Primary backup with master/worker architecture • Replication improves read scalability • Write operations? Valeria Cardellini - SABD 2017/18 7

Sharding • Horizontal partitioning of data across many separate servers • Scales read and write operations • Cannot execute transactions across shards (partitions) • Consistent hashing is one form of sharding - Hash both data and nodes using the same hash function in a same ID space Valeria Cardellini - SABD 2017/18 8 Scaling RDBMSs is expensive and inefficient Source: Couchbase technical report Valeria Cardellini - SABD 2017/18 9

NoSQL data stores • NoSQL = Not Only SQL – SQL-style querying is not the crucial objective • Main features of NoSQL data stores – Support flexible schema • No requirement for fixed rows in a table schema – Scale horizontally • Partitioning of data and processing over multiple nodes – Provide scalability and high availability by replicating data in multiple nodes, often across datacenters – Multiprocessor support – Mainly utilize shared-nothing architecture • With exception of graph-based database Valeria Cardellini - SABD 2017/18 10 NoSQL data stores (2) • Main features of NoSQL data stores (continued) – Avoid unneeded complexity • E.g., elimination of join operations – Useful when working with Big data when the data’s nature does not require a relational model – Support weaker concurrency models than the standard ACID transaction model • Rather BASE: compromising reliability for better performance 11 Valeria Cardellini - SABD 2017/18

ACID vs BASE • Two design philosophies at opposite ends of the consistency-availability spectrum - Keep in mind the CAP theorem ! Pick two of Consistency, Availability and Partition tolerance • ACID: the traditional approach to address the consistency issue in RDBMS – A pessimistic approach: prevent conflicts from occurring • Usually implemented with write locks managed by the system – But ACID does not scale well when handling petabytes of data (remember of latency!) Valeria Cardellini - SABD 2017/18 12 ACID vs BASE (2) • BASE stands for B asically A vailable, S oft state, E ventual consistency – An optimistic approach • Lets conflicts occur, but detects them and takes action to sort the out • Approaches: • conditional updates: test the value just before updating • save both updates: record that they are in conflict and then merge them – Basically Available: the system is available most of the time and there could exist a subsystem temporarily unavailable – Soft state: data is not durable in the sense that its persistence is in the hand of the user that must take care of refresh them – Eventually consistent: the system eventually converge to a consistent state • Usually adopted in NoSQL databases Valeria Cardellini - SABD 2017/18 13

Consistency • Biggest change from a centralized RDBMS to a cluster-oriented NoSQL • RDBMS: strong consistency – Traditional RDBMS are CA systems (or CP systems, depending on the configuration) • NoSQL systems: mostly eventual consistency – AP systems Valeria Cardellini - SABD 2017/18 14 Consistency: an example • Ann is trying to book a room of the Ace Hotel in New York on a node located in London of a booking system • Pathin is trying to do the same on a node located in Mumbai • The booking system uses a replicated database with the master located in Mumbai and the slave in London • There is only a room available • The network link between the two servers breaks Pathin Ann London Mumbay Valeria Cardellini - SABD 2017/18 15

Consistency: an example • CA system: neither user can book any hotel room – No tolerance to network partitions • CP system: – Pathin can make the reservation – Ann can see the inconsistent room information but cannot book the room • AP: both nodes accept the hotel reservation – Overbooking! • Remember that the tolerance to this situation depends on the application type – Blog, financial exchange, shopping chart, … Valeria Cardellini - SABD 2017/18 16 Pessimistic vs. optimistic approach • Concurrency involves a fundamental tradeoff between: - Safety (avoiding errors such as update conflicts) and - Liveness (responding quickly to clients) • Pessimistic approaches often: - Severely degrade the responsiveness of a system - Leads to deadlocks, which are hard to prevent and debug Valeria Cardellini - SABD 2017/18 17

NoSQL cost and performance Source: Couchbase technical report Valeria Cardellini - SABD 2017/18 18 Pros and cons of NoSQL Pros Cons • Easy to scale-out • Do not provide ACID guarantees, less suitable for • Higher performance for OLTP apps massive data scale • No fixed schema, no • Allows sharing of data common data storage model across multiple servers • Limited support for • Most solutions are either aggregation (sum, avg, open-source or cheaper count, group by) • HA and fault tolerance • Performance for complex join is poor Valeria Cardellini - SABD 2017/18 provided by data replication • No well defined approach for • Supports complex data DB design (different structures and objetcs solutions have different data • No fixed schema, supportrs models) unstructured data • Lack of consistent model • Very fast retrieval of data, can lead to solution lock-in suitable for real-time apps 19

Barriers to NoSQL • Main barriers to NoSQL adoption – No full ACID transaction support – Lack of standardized interfaces – Huge investments already made in existing RDBMSs • A commercial example – AWS launched two NoSQL services (SimpleDB in 2007 and later DynamoDB in 2012) and one RDBMS service (RDS in 2009) Valeria Cardellini - SABD 2017/18 20 NoSQL data models • A number of largely diverse data stores not based on the relational data model Valeria Cardellini - SABD 2017/18 21

NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. - PDF document

Universit degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference Big Data stack

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Zrich |

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Basel |

NoSQL and MongoDB 1 2 Introduction to NoSQL Based on a presentation by Traversy Media 3 What

NoSQL Source: Pramod J. Sadalage and Martin Fowler NoSQL Distilled: A Brief Guide to the

NoSQL like There is No Tomorrow Khawaja Head of Engineering, NoSQL Swaminathan Sivasubramanian

NoSQL Terje Gjster, Ph.D. UiA, Grimstad 16. November 2015 Overview Introduction and

Scaling Log-Structured KV-Stores featuring Monkey and Dostoevsky SIGMOD17 / SIGMOD18 Niv Dayan

NoSQL CS226 Big-data Management 1 Based on a presentation by Traversy Media 2 What is

1 2 What is covered in this presentation? A brief history of databases NoSQL WHY, WHAT

The NoSQL Ecosystem 7-21-10 Wednesday, July 21, 2010 Executive summary NoSQL is about using

NoSQL Concepts, Techniques & Systems Part 1 Valentina Ivanova IDA, Linkping University

Tarantool - a NoSQL Tarantool - a NoSQL database with SQL database with SQL Pavel Lapaev,

NoSQL Concepts, Techniques & Systems Part 2 Valentina Ivanova IDA, Linkping University

Why NoSQL? Why Riak? Justin Sheehy justin@basho.com 1 What's all of this NoSQL nonsense?

NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. 2019/20 Valeria Cardellini

Key-Value Stores Key-value stores are popular. web searching, social networks, e-commerce,

Redis and Memcached Speaker: Vladimir Zivkovic, Manager, IT June, 2019 Problem Scenario

Mondex / Alloy Last Updates Tahina Ramananandro cole Normale Suprieure Paris, France

Model-Based Testing of ETCS RBCs Aled Rhys Walters Swansea University An iCASE PhD in

Web Mining Mining content Simple rank is confused by rank sinks, e.g. two pages that

Building Your Own BaaS With Apache Usergrid & Docker : Lessons Learned At Scale Sungju

AWS Solutions Architect -- Associate Certification Review Brent Tuggle, Chris Kuehn, Phil Winans,

Introduction to NoSQL Instructor: Ekpe Okorafor 1. Big Data Academy - Accenture 2. Computer

Get ready to be whats next. Jared Shockley http://jaredontech.com Senior Service Engineer

NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. - PDF document

Universit degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference Big Data stack

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Zrich |

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Basel |

NoSQL and MongoDB 1 2 Introduction to NoSQL Based on a presentation by Traversy Media 3 What

NoSQL Source: Pramod J. Sadalage and Martin Fowler NoSQL Distilled: A Brief Guide to the

NoSQL like There is No Tomorrow Khawaja Head of Engineering, NoSQL Swaminathan Sivasubramanian

NoSQL Terje Gjster, Ph.D. UiA, Grimstad 16. November 2015 Overview Introduction and

Scaling Log-Structured KV-Stores featuring Monkey and Dostoevsky SIGMOD17 / SIGMOD18 Niv Dayan

NoSQL CS226 Big-data Management 1 Based on a presentation by Traversy Media 2 What is

1 2 What is covered in this presentation? A brief history of databases NoSQL WHY, WHAT

The NoSQL Ecosystem 7-21-10 Wednesday, July 21, 2010 Executive summary NoSQL is about using

NoSQL Concepts, Techniques &amp; Systems Part 1 Valentina Ivanova IDA, Linkping University

Tarantool - a NoSQL Tarantool - a NoSQL database with SQL database with SQL Pavel Lapaev,

NoSQL Concepts, Techniques &amp; Systems Part 2 Valentina Ivanova IDA, Linkping University

Why NoSQL? Why Riak? Justin Sheehy justin@basho.com 1 What's all of this NoSQL nonsense?

NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. 2019/20 Valeria Cardellini

Key-Value Stores Key-value stores are popular. web searching, social networks, e-commerce,

Redis and Memcached Speaker: Vladimir Zivkovic, Manager, IT June, 2019 Problem Scenario

Mondex / Alloy Last Updates Tahina Ramananandro cole Normale Suprieure Paris, France

Model-Based Testing of ETCS RBCs Aled Rhys Walters Swansea University An iCASE PhD in

Web Mining Mining content Simple rank is confused by rank sinks, e.g. two pages that

Building Your Own BaaS With Apache Usergrid &amp; Docker : Lessons Learned At Scale Sungju

AWS Solutions Architect -- Associate Certification Review Brent Tuggle, Chris Kuehn, Phil Winans,

Introduction to NoSQL Instructor: Ekpe Okorafor 1. Big Data Academy - Accenture 2. Computer

Get ready to be whats next. Jared Shockley http://jaredontech.com Senior Service Engineer

NoSQL Concepts, Techniques & Systems Part 1 Valentina Ivanova IDA, Linkping University

NoSQL Concepts, Techniques & Systems Part 2 Valentina Ivanova IDA, Linkping University

Building Your Own BaaS With Apache Usergrid & Docker : Lessons Learned At Scale Sungju