
NoSQL Databases, Sistemi e Architetture per Big Data (Systems and Architectures for Big Data) course, academic year 2016/17



  1. Università degli Studi di Roma “Tor Vergata”
     Dipartimento di Ingegneria Civile e Ingegneria Informatica
     NoSQL Databases
     Course: Sistemi e Architetture per Big Data (Systems and Architectures for Big Data), academic year 2016/17
     Valeria Cardellini

     The reference Big Data stack
     [Figure: layers High-level Interfaces, Support / Integration, Data Processing, Data Storage, Resource Management]
     Valeria Cardellini - SABD 2016/17

  2. Traditional RDBMSs
     • RDBMSs: the traditional technology for storing structured data in web and business applications
     • SQL is good
       – Rich language and toolset
       – Easy to use and integrate
       – Many vendors
     • They promise ACID guarantees

     ACID properties
     • Atomicity
       – Either all the statements included in a transaction are executed, or the whole transaction is aborted without affecting the database ("all or nothing" principle)
     • Consistency
       – The database is in a consistent state before and after a transaction
     • Isolation
       – Transactions cannot see uncommitted changes in the database (i.e., the results of incomplete transactions are not visible to other transactions)
     • Durability
       – Changes are written to disk before the database commits a transaction, so that committed data cannot be lost through a power failure
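
Atomicity can be observed with Python's built-in `sqlite3` module. The sketch below is illustrative only: the `accounts` table and the `transfer` helper are hypothetical, not from the slides. A transfer that fails after its first `UPDATE` is rolled back as a whole, so the database never exposes a half-done transfer.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one transaction."""
    # A sqlite3 connection used as a context manager commits the open
    # transaction on success and rolls it back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Aborting here also undoes the debit above: "all or nothing".
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY NOT NULL, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)       # both updates are committed
try:
    transfer(conn, "alice", "bob", 500)  # fails midway: the debit is rolled back
except ValueError:
    pass

print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 30}
```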

  3. RDBMS constraints
     • Domain constraints
       – Restrict the domain of each attribute, i.e., the set of possible values for the attribute
     • Entity integrity constraint
       – No primary key value can be null
     • Referential integrity constraint
       – Maintains consistency among the tuples in two relations: every value of one attribute of a relation should exist as a value of another attribute in another relation
     • Foreign key
       – Used to cross-reference multiple relations: it is a key in a relation that matches the primary key of another relation

     Pros and cons of RDBMS
     Pros:
     • Well-defined consistency model
     • ACID guarantees
     • Relational integrity maintained through entity and referential integrity constraints
     • Well suited for OLTP apps
       – OLTP: online transaction processing
     • Sound theoretical foundation
     • Stable and standardized DBMSs available
     • Well understood
     Cons:
     • Performance is a major constraint; scaling is difficult
     • Limited support for complex data structures
     • Complete knowledge of the DB structure is required to create ad hoc queries
     • Commercial DBMSs are expensive
     • Some DBMSs have limits on field size
     • Data integration from multiple RDBMSs can be cumbersome
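
The constraint types above map directly onto SQL column declarations. A minimal sketch using `sqlite3`, assuming a hypothetical `department`/`employee` schema: each deliberately bad insert is rejected with `sqlite3.IntegrityError`. Two SQLite specifics matter here: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a non-`INTEGER` primary key needs an explicit `NOT NULL` to reject nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (it is off by default).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE department (
        dept_id TEXT PRIMARY KEY NOT NULL                -- entity integrity
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  TEXT PRIMARY KEY NOT NULL,               -- entity integrity
        age     INTEGER CHECK (age BETWEEN 18 AND 70),   -- domain constraint
        dept_id TEXT REFERENCES department(dept_id)      -- foreign key
    )""")
conn.execute("INSERT INTO department VALUES ('sales')")
conn.execute("INSERT INTO employee VALUES ('e1', 30, 'sales')")  # satisfies every constraint

violations = {
    "null primary key": "INSERT INTO employee VALUES (NULL, 30, 'sales')",
    "domain": "INSERT INTO employee VALUES ('e2', 150, 'sales')",
    "referential integrity": "INSERT INTO employee VALUES ('e3', 30, 'hr')",  # no 'hr' department
}
rejected = []
for name, sql in violations.items():
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected.append(name)

print(rejected)  # ['null primary key', 'domain', 'referential integrity']
```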

  4. RDBMS challenges
     • Web-based applications caused spikes:
       – Internet-scale data size
       – High read-write rates
       – Frequent schema changes
     • Let's scale RDBMSs
       – RDBMSs were not designed to be distributed
     • Possible solutions:
       – Replication
       – Sharding

     Replication
     • Master/slave architecture
     • Scales read operations
     • Write operations? (every write must still go through the master)
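
A toy model of master/slave replication, assuming a single master and synchronous propagation (real systems often replicate asynchronously, which this sketch ignores). Counting client requests per server shows why reads scale with the number of slaves while every write still lands on the master. `MasterSlaveStore` is a hypothetical name.

```python
import itertools

class MasterSlaveStore:
    """Toy single-master replication: writes go to the master and are copied
    to every slave synchronously; reads are spread round-robin over the slaves."""

    def __init__(self, n_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self._next_slave = itertools.cycle(range(n_slaves))
        # Client requests served by each server.
        self.requests = {"master": 0, **{f"slave{i}": 0 for i in range(n_slaves)}}

    def write(self, key, value):
        # Every write is handled by the master, so adding slaves adds no write capacity.
        self.master[key] = value
        self.requests["master"] += 1
        for slave in self.slaves:  # propagate the update to every replica
            slave[key] = value

    def read(self, key):
        # Any replica can answer a read, so read capacity grows with the number of slaves.
        i = next(self._next_slave)
        self.requests[f"slave{i}"] += 1
        return self.slaves[i].get(key)

store = MasterSlaveStore(n_slaves=3)
for k in range(6):
    store.write(k, k * k)
for k in range(6):
    store.read(k)
print(store.requests)  # {'master': 6, 'slave0': 2, 'slave1': 2, 'slave2': 2}
```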

  5. Sharding
     • Horizontal partitioning of data across many separate servers
     • Scales read and write operations
     • Cannot execute transactions across shards (partitions)
     • Consistent hashing is one form of sharding
       – Hash both data and nodes using the same hash function into the same ID space

     Scaling RDBMSs is expensive and inefficient
     [Figure; source: Couchbase technical report]
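
Consistent hashing can be sketched with a sorted list of ring positions and binary search. Assumptions not taken from the slides: MD5 truncated to 64 bits as the shared hash function, and 50 virtual positions per node to even out the load. The usage part checks the property that motivates the technique: when a node joins, only the keys that now map to it change owner.

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    # One hash function for both nodes and data keys, mapping into a 64-bit ID space.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=50):
        self.vnodes = vnodes  # virtual positions per node, to spread load more evenly
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # A key belongs to the first node position at or after its own position,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["node-A", "node-B", "node-C"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-D")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys that now fall to node-D change owner; all other keys stay put.
print(all(after[k] == "node-D" for k in moved), len(moved))
```

With plain modulo sharding (`hash(key) % n_servers`), going from three to four servers would instead remap most keys.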

  6. NoSQL data stores
     • NoSQL = Not Only SQL
       – SQL-style querying is not the crucial objective
     • Main features of NoSQL data stores
       – Avoid unneeded complexity
       – Support flexible schema
       – Scale horizontally
       – Provide scalability and high availability by storing and replicating data in distributed systems, often across datacenters
       – Useful when working with Big Data whose nature does not require a relational model
       – Traditional join operations typically cannot be used
       – Do not typically support ACID properties, but rather BASE
         • Compromising reliability for better performance

     ACID vs BASE
     • Two design philosophies at opposite ends of the consistency-availability spectrum
       – Keep in mind the CAP theorem: pick two of Consistency, Availability and Partition tolerance
     • ACID: the traditional approach to address the consistency issue in RDBMSs
       – A pessimistic approach: prevent conflicts from occurring
         • Usually implemented with write locks managed by the system
       – But ACID does not scale well when handling petabytes of data (remember latency!)
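
Flexible schema and the lack of joins can be illustrated without any particular product. In this sketch plain Python dicts stand in for a schemaless store (an assumption; the data and the `bookings_with_user_names` helper are hypothetical): records in one collection carry different fields, and the join is done by the application through key lookups.

```python
# Plain dicts stand in for a schemaless key-value/document store (hypothetical data).
users = {
    "u1": {"name": "Alice", "city": "London"},
    "u2": {"name": "Bob", "languages": ["en", "hi"]},  # different fields, same collection
}
bookings = {
    "b1": {"user_id": "u1", "hotel": "Ace Hotel"},
    "b2": {"user_id": "u2", "hotel": "Ace Hotel", "nights": 3},  # extra field, no schema change
}

def bookings_with_user_names(bookings, users):
    """Application-side join: the store only offers lookups by key, so the
    client fetches each referenced user and combines the records itself."""
    result = []
    for booking_id, booking in bookings.items():
        user = users.get(booking["user_id"], {})
        result.append((booking_id, user.get("name"), booking["hotel"]))
    return result

print(bookings_with_user_names(bookings, users))
# [('b1', 'Alice', 'Ace Hotel'), ('b2', 'Bob', 'Ace Hotel')]
```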

  7. ACID vs BASE (2)
     • BASE stands for Basically Available, Soft state, Eventual consistency
       – An optimistic approach
         • Lets conflicts occur, but detects them and takes action to sort them out
         • Approaches:
           – conditional updates: test the value just before updating
           – save both updates: record that they are in conflict and then merge them
       – Basically Available: the system is available most of the time, although some subsystem may be temporarily unavailable
       – Soft state: data is not durable, in the sense that its persistence is in the hands of the user, who must take care of refreshing it
       – Eventually consistent: the system eventually converges to a consistent state
     • Usually adopted in NoSQL databases

     Consistency
     • Biggest change from a centralized relational database to a cluster-oriented NoSQL database
     • RDBMS: strong consistency
       – Traditional RDBMSs are CA systems
     • NoSQL systems: mostly eventual consistency
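
Both optimistic approaches listed above can be sketched in a few lines. `VersionedStore` is a hypothetical toy store that keeps a version number per key: a conditional update succeeds only if the version is unchanged since the client read it. `merge_siblings` illustrates "save both updates", assuming set union is an acceptable merge policy (plausible for a shopping cart, not in general).

```python
class VersionedStore:
    """Toy store that keeps a version number next to each value (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        return self._data.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Conditional update: write only if the key still has the version the
        caller read; otherwise report the conflict. (A real store must perform
        this check-and-write atomically.)"""
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            return False
        self._data[key] = (version + 1, value)
        return True

def merge_siblings(cart_a: set, cart_b: set) -> set:
    # "Save both updates": keep the conflicting versions, then merge them.
    # For a shopping cart, set union is one possible merge policy.
    return cart_a | cart_b

store = VersionedStore()
store.put_if_version("rooms_left", 0, 1)

# Two clients read the same version, then both try to take the last room.
v1, rooms1 = store.get("rooms_left")
v2, rooms2 = store.get("rooms_left")
first = store.put_if_version("rooms_left", v1, rooms1 - 1)   # applied
second = store.put_if_version("rooms_left", v2, rooms2 - 1)  # stale version detected
print(first, second, store.get("rooms_left"))                # True False (2, 0)

print(sorted(merge_siblings({"book", "pen"}, {"book", "lamp"})))  # ['book', 'lamp', 'pen']
```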

  8. Consistency: an example
     • Ann is trying to book a room at the Ace Hotel in New York through the booking system's node located in London
     • Pathin is trying to do the same through the node located in Mumbai
     • The booking system uses a replicated database with the master located in Mumbai and the slave in London
     • Only one room is available
     • The network link between the two servers breaks
     [Figure: Ann at the London node, Pathin at the Mumbai node]

     Consistency: an example
     • CA system: neither user can book any hotel room
       – No tolerance to network partitions
     • CP system:
       – Pathin can make the reservation
       – Ann can see the inconsistent room information but cannot book the room
     • AP system: both nodes accept the hotel reservation
       – Overbooking!
     • Remember that the tolerance to this situation depends on the application type
       – Blog, financial exchange, shopping cart, …
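
The hotel scenario can be replayed as a small simulation. This is a simplification under stated assumptions: each replica decides using only its local copy while the link is down, the CA case is modeled as refusing all bookings, and the CP case lets only the master accept writes. The function name `book_during_partition` is hypothetical.

```python
def book_during_partition(mode, rooms_available=1):
    """Both users try to book while the London-Mumbai link is down.
    Mumbai holds the master replica, London the slave. Returns confirmed bookings."""
    mumbai = {"rooms": rooms_available}  # master replica
    london = {"rooms": rooms_available}  # slave replica, cut off from the master
    confirmed = []

    def try_book(user, replica, is_master):
        if mode == "CA":
            return  # no partition tolerance: nobody can book during the partition
        if mode == "CP" and not is_master:
            return  # the slave cannot reach the master, so it refuses the write
        if replica["rooms"] > 0:  # decide using the local copy only
            replica["rooms"] -= 1
            confirmed.append(user)

    try_book("Pathin", mumbai, is_master=True)
    try_book("Ann", london, is_master=False)
    return confirmed

for mode in ("CA", "CP", "AP"):
    print(mode, book_during_partition(mode))
# CA []
# CP ['Pathin']
# AP ['Pathin', 'Ann']   (two bookings for a single room: overbooking)
```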

  9. Pessimistic vs. optimistic approach
     • Concurrency involves a fundamental tradeoff between:
       – Safety (avoiding errors such as update conflicts) and
       – Liveness (responding quickly to clients)
     • Pessimistic approaches often:
       – Severely degrade the responsiveness of a system
       – Lead to deadlocks, which are hard to prevent and debug

     NoSQL cost and performance
     [Figure; source: Couchbase technical report]
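
The liveness cost of the pessimistic approach can be shown with a lock and a timeout. In this hypothetical sketch, a request must take a write lock before reading; while a long-running writer holds it (simulated here by acquiring the lock directly), the request gives up and the client gets no answer in time.

```python
import threading

room_lock = threading.Lock()  # pessimistic concurrency: a write lock on the room record

def read_rooms(timeout_s):
    """A client request that must hold the lock to read the record. If the
    lock is not free within `timeout_s` seconds, it gives up without an answer."""
    if room_lock.acquire(timeout=timeout_s):
        try:
            return "rooms_left=1"
        finally:
            room_lock.release()
    return None  # liveness lost: the client is not answered in time

room_lock.acquire()               # a long-running writer now holds the lock
print(read_rooms(timeout_s=0.1))  # None: the request is blocked behind the writer
room_lock.release()               # the writer finishes
print(read_rooms(timeout_s=0.1))  # rooms_left=1
```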

  10. Pros and cons of NoSQL
      Pros:
      • Easy to scale out
      • Higher performance for massive data scale
      • Allows sharing of data across multiple servers
      • Most solutions are either open source or cheaper
      • HA and fault tolerance provided by data replication
      • Supports complex data structures and objects
      • No fixed schema, supports unstructured data
      • Very fast retrieval of data, suitable for real-time apps
      Cons:
      • Do not provide ACID guarantees, less suitable for OLTP apps
      • No fixed schema, no common data storage model
      • Limited support for aggregation (sum, avg, count, group by)
      • Performance for complex joins is poor
      • No well-defined approach for DB design (different solutions have different data models)
      • Lack of a consistent model can lead to solution lock-in

      Barriers to NoSQL
      • Main barriers to NoSQL adoption
        – No full ACID transaction support
        – Lack of standardized interfaces
        – Huge investments already made in existing RDBMSs
      • A commercial example
        – AWS launched two NoSQL services (SimpleDB in 2007 and later DynamoDB in 2012) and one RDBMS service (RDS in 2009)

  11. NoSQL data models
      • A number of largely diverse data stores not based on the relational data model

      NoSQL data models
      • A data model is a set of constructs for representing the information
        – Relational model: tables, columns and rows
      • Storage model: how the DBMS stores and manipulates the data internally
      • A data model is usually independent of the storage model
      • Data models for NoSQL systems:
        – Aggregate-oriented models: key-value, document, and column-family
        – Graph-based models
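
To make the model families concrete, the sketch below encodes the same booking fact in each of them. Plain Python structures stand in for the stores; this illustrates the shapes of the data, not any product's API, and all field names are hypothetical.

```python
import json

# Key-value: the value is an opaque blob, retrievable only by its key.
key_value = {
    "booking:b1": json.dumps({"guest": "Ann", "hotel": "Ace Hotel", "city": "New York"}),
}

# Document: the value is a structured (possibly nested) document whose fields are visible.
document = {
    "_id": "b1",
    "guest": {"name": "Ann"},
    "hotel": {"name": "Ace Hotel", "city": "New York"},
}

# Column-family: row key -> column family -> column -> value.
column_family = {
    "b1": {
        "guest": {"name": "Ann"},
        "hotel": {"name": "Ace Hotel", "city": "New York"},
    },
}

# Graph: entities are nodes and relationships are first-class edges.
graph = {
    "nodes": {"ann": {"type": "person"}, "ace": {"type": "hotel", "city": "New York"}},
    "edges": [("ann", "BOOKED", "ace")],
}

# Reading the hotel's city in each model.
city_kv = json.loads(key_value["booking:b1"])["city"]  # the client must decode the blob
city_doc = document["hotel"]["city"]
city_cf = column_family["b1"]["hotel"]["city"]
_, _, hotel_node = graph["edges"][0]                    # follow the BOOKED edge
city_graph = graph["nodes"][hotel_node]["city"]
print(city_kv, city_doc, city_cf, city_graph)
```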
