NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. - PDF document

Università degli Studi di Roma “ Tor Vergata ” Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference Big Data stack High-level Interfaces Support / Integration Data Processing Data Storage Resource Management Valeria Cardellini - SABD 2016/17 1

Traditional RDBMSs • RDMBSs: the traditional technology for storing structured data in web and business applications • SQL is good – Rich language and toolset – Easy to use and integrate – Many vendors • They promise ACID guarantees Valeria Cardellini - SABD 2016/17 2 ACID properties • A tomicity – All included statements in a transaction are either executed or the whole transaction is aborted without affecting the database (“all or nothing” principle) • C onsistency – A database is in a consistent state before and after a transaction • I solation – Transactions cannot see uncommitted changes in the database (i.e., the results of incomplete transactions are not visible to other transactions) • D urability – Changes are written to a disk before a database commits a transaction so that committed data cannot be lost through a power failure. Valeria Cardellini - SABD 2016/17 3

RDBMS constraints • Domain constraints – Restricts the domain of each attribute or the set of possible values for the attribute • Entity integrity constraint – No primary key value can be null • Referential integrity constraint – To maintain consistency among the tuples in two relations: every value of one attribute of a relation should exist as a value of another attribute in another relation • Foreign key – To cross-reference between multiple relations: it is a key in a relation that matches the primary key of another relation Valeria Cardellini - SABD 2016/17 4 Pros and cons of RDBMS Pros Cons • Well-defined consistency • Performance as major constraint, scaling is difficult model • Limited support for complex • ACID guarantees data structures • Relational integrity • Complete knowledge of DB maintained through entity structure required to create and referential integrity ad hoc queries constraints • Commercial DBMSs are • Well suited for OLTP apps expensive – OLTP: online transaction • Some DBMSs have limits on processing fields size • Sound theoretical foundation • Data integration from • Stable and standardized multiple RDBMSs can be cumbersome DBMSs available • Well understood Valeria Cardellini - SABD 2016/17 5

RDBMS challenges • Web-based applications caused spikes – Internet-scale data size – High read-write rates – Frequent schema changes • Let’s scale RDBMSs – RDBMS were not designed to be distributed • Possible solutions: – Replication – Sharding Valeria Cardellini - SABD 2016/17 6 Replication • Master/slave architecture • Scales read operations • Write operations? Valeria Cardellini - SABD 2016/17 7

Sharding • Horizontal partitioning of data across many separate servers • Scales read and write operations • Cannot execute transactions across shards (partitions) • Consistent hashing is one form of sharding - Hash both data and nodes using the same hash function in a same ID space Valeria Cardellini - SABD 2016/17 8 Scaling RDBMSs is expensive and inefficient Source: Couchbase technical report Valeria Cardellini - SABD 2016/17 9

NoSQL data stores • NoSQL = Not Only SQL – SQL-style querying is not the crucial objective • Main features of NoSQL data stores – Avoid unneeded complexity – Support flexible schema – Scale horizontally – Provide scalability and high availability by storing and replicating data in distributed systems, often across datacenters – Useful when working with Big data when the data’s nature does not require a relational model • Traditional join operations cannot be used – Do not typically support ACID properties, but rather BASE • Compromising reliability for better performance 10 Valeria Cardellini - SABD 2016/17 ACID vs BASE • Two design philosophies at opposite ends of the consistency-availability spectrum - Keep in mind the CAP theorem ! Pick two of Consistency, Availability and Partition tolerance • ACID: the traditional approach to address the consistency issue in RDBMS – A pessimistic approach: prevent conflicts from occurring • Usually implemented with write locks managed by the system – But ACID does not scale well when handling petabytes of data (remember of latency!) Valeria Cardellini - SABD 2016/17 11

ACID vs BASE (2) • BASE stands for B asically A vailable, S oft state, E ventual consistency – An optimistic approach • Lets conflicts occur, but detects them and takes action to sort the out • Approaches: • conditional updates: test the value just before updating • save both updates: record that they are in conflict and then merge them – Basically Available: the system is available most of the time and there could exist a subsystem temporarily unavailable – Soft state: data is not durable in the sense that its persistence is in the hand of the user that must take care of refresh them – Eventually consistent: the system eventually converge to a consistent state • Usually adopted in NoSQL databases Valeria Cardellini - SABD 2016/17 12 Consistency • Biggest change from a centralized relational database to a cluster-oriented NoSQL • RDBMS: strong consistency – Traditional RDBMS are CA systems • NoSQL systems: mostly eventual consistency Valeria Cardellini - SABD 2016/17 13

Consistency: an example • Ann is trying to book a room of the Ace Hotel in New York on a node located in London of a booking system • Pathin is trying to do the same on a node located in Mumbai • The booking system uses a replicated database with the master located in Mumbai and the slave in London • There is only a room available • The network link between the two servers breaks Pathin Ann London Mumbay Valeria Cardellini - SABD 2016/17 14 Consistency: an example • CA system: neither user can book any hotel room – No tolerance to network partitions • CP system: – Pathin can make the reservation – Ann can see the inconsistent room information but cannot book the room • AP: both nodes accept the hotel reservation – Overbooking! • Remember that the tolerance to this situation depends on the application type – Blog, financial exchange, shopping chart, … Valeria Cardellini - SABD 2016/17 15

Pessimistic vs. optimistic approach • Concurrency involves a fundamental tradeoff between: - Safety (avoiding errors such as update conflicts) and - Liveness (responding quickly to clients) • Pessimistic approaches often: - Severely degrade the responsiveness of a system - Leads to deadlocks, which are hard to prevent and debug Valeria Cardellini - SDCC 2016/17 16 NoSQL cost and performance Source: Couchbase technical report Valeria Cardellini - SABD 2016/17 17

Pros and cons of NoSQL Pros Cons • Easy to scale-out • Do not provide ACID guarantees, less suitable for • Higher performance for OLTP apps massive data scale • No fixed schema, no • Allows sharing of data common data storage model across multiple servers • Limited support for • Most solutions are either aggregation (sum, avg, open-source or cheaper count, group by) • HA and fault tolerance • Performance for complex join is poor Valeria Cardellini - SABD 2016/17 provided by data replication • No well defined approach for • Supports complex data DB design (different structures and objetcs solutions have different data • No fixed schema, supportrs models) unstructured data • Lack of consistent model • Very fast retrieval of data, can lead to solution lock-in suitable for real-time apps 18 Barriers to NoSQL • Main barriers to NoSQL adoption – No full ACID transaction support – Lack of standardized interfaces – Huge investments already made in existing RDBMSs • A commercial example – AWS launched two NoSQL services (SimpleDB in 2007 and later DynamoDB in 2012) and one RDBMS service (RDS in 2009) Valeria Cardellini - SABD 2016/17 19

NoSQL data models • A number of largely diverse data stores not based on the relational data model Valeria Cardellini - SABD 2016/17 20 NoSQL data models • A data model is a set of constructs for representing the information – Relational model: tables, columns and rows • Storage model: how the DBMS stores and manipulates the data internally • A data model is usually independent of the storage model • Data models for NoSQL systems: – Aggregate-oriented models: key-value , document , and column-family – Graph-based models Valeria Cardellini - SABD 2016/17 21

NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. - PDF document

Universit degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference Big Data stack

NoSQL Source: Pramod J. Sadalage and Martin Fowler NoSQL Distilled: A Brief Guide to the

1 2 What is covered in this presentation? A brief history of databases NoSQL WHY, WHAT

NoSQL and MongoDB 1 2 Introduction to NoSQL Based on a presentation by Traversy Media 3 What

NoSQL like There is No Tomorrow Khawaja Head of Engineering, NoSQL Swaminathan Sivasubramanian

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Zrich |

NoSQL Terje Gjster, Ph.D. UiA, Grimstad 16. November 2015 Overview Introduction and

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Basel |

Why NoSQL? Why Riak? Justin Sheehy justin@basho.com 1 What's all of this NoSQL nonsense?

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Security and Performance Analysis of Encrypted NoSQL Databases M.W. Grim BSc., Abe Wiersma BSc.

CS 61: Database Systems NoSQL/Mongo CRUD Adapted from mongodb.com unless otherwise noted Agenda

4th Generation 4th Generation Obj Object Databases t D t b (we are not alone 3 more nosql events

The NoSQL Ecosystem 7-21-10 Wednesday, July 21, 2010 Executive summary NoSQL is about using

NoSQL Concepts, Techniques & Systems Part 1 Valentina Ivanova IDA, Linkping University

Student(sid, name, addr, age, GPA) sid name addr age GPA 301 John Ki#Bu!GK.$@q 19 2.1

The Relational Model Chapter 3 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Why

Database Integrity Constraints and Procedural SQL (Elmasri/Navathe ch. 5.2, 8.2 and 9.1, 9.6) (

CS 327E Class 4 September 30, 2019 1) What type of relationship do we have between the Actor and

CS 61: Database Systems Intermediate SQL Adapted from Silberschatz, Korth, and Sundarshan unless

PDG Database David W. Robertson Computational Research Division Lawrence Berkeley National

Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults Xiaotian Zou, Lei Chu Outline

Information Systems Relational Databases Nikolaj Popov Research Institute for Symbolic

NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. - PDF document

Universit degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference Big Data stack

NoSQL Source: Pramod J. Sadalage and Martin Fowler NoSQL Distilled: A Brief Guide to the

1 2 What is covered in this presentation? A brief history of databases NoSQL WHY, WHAT

NoSQL and MongoDB 1 2 Introduction to NoSQL Based on a presentation by Traversy Media 3 What

NoSQL like There is No Tomorrow Khawaja Head of Engineering, NoSQL Swaminathan Sivasubramanian

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Zrich |

NoSQL Terje Gjster, Ph.D. UiA, Grimstad 16. November 2015 Overview Introduction and

How to Use NoSQL in Enterprise Java Applications Patrick Baumgartner NoSQL Roadshow | Basel |

Why NoSQL? Why Riak? Justin Sheehy justin@basho.com 1 What's all of this NoSQL nonsense?

Creating Databases and Tables Introduction to Databases in Python Creating Databases

Inductive Inductive Inductive Inductive Databases Databases Databases Databases and

Lecture 11: Persistent Memory Databases 1 / 71 Persistent Memory Databases Recap

Security and Performance Analysis of Encrypted NoSQL Databases M.W. Grim BSc., Abe Wiersma BSc.

CS 61: Database Systems NoSQL/Mongo CRUD Adapted from mongodb.com unless otherwise noted Agenda

4th Generation 4th Generation Obj Object Databases t D t b (we are not alone 3 more nosql events

The NoSQL Ecosystem 7-21-10 Wednesday, July 21, 2010 Executive summary NoSQL is about using

NoSQL Concepts, Techniques &amp; Systems Part 1 Valentina Ivanova IDA, Linkping University

Student(sid, name, addr, age, GPA) sid name addr age GPA 301 John Ki#Bu!GK.$@q 19 2.1

The Relational Model Chapter 3 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Why

Database Integrity Constraints and Procedural SQL (Elmasri/Navathe ch. 5.2, 8.2 and 9.1, 9.6) (

CS 327E Class 4 September 30, 2019 1) What type of relationship do we have between the Actor and

CS 61: Database Systems Intermediate SQL Adapted from Silberschatz, Korth, and Sundarshan unless

PDG Database David W. Robertson Computational Research Division Lawrence Berkeley National

Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults Xiaotian Zou, Lei Chu Outline

Information Systems Relational Databases Nikolaj Popov Research Institute for Symbolic

NoSQL Concepts, Techniques & Systems Part 1 Valentina Ivanova IDA, Linkping University