THE NOSQL MOVEMENT
Genoveva Vargas Solar
French Council of Scientific Research, LIG-LAFMIA, France
Genoveva.Vargas@imag.fr
http://www.vargas-solar.com/bigdata-managment

STORING AND ACCESSING HUGE AMOUNTS OF DATA
• Data formats
• Data storage supports
• Data collection sizes
• Data delivery mechanisms
[Figure: data volume scale, Peta (10^15), Exa (10^18), Zetta (10^21), Yotta (10^24), shown alongside storage supports: Disk, RAID, Cloud]

DEALING WITH HUGE AMOUNTS OF DATA
[Figure: the same scale, Peta (10^15) to Yotta (10^24), with Disk, RAID and Cloud, labelled with: Relational, Graph, Key value, Columns; Concurrency, Consistency, Atomicity]

NOSQL STORES CHARACTERISTICS
• Simple operations
  • Key lookups, reads and writes of one record or a small number of records
  • No complex queries or joins
  • Ability to dynamically add new attributes to data records
• Horizontal scalability
  • Distribute data and operations over many servers
  • Replicate and distribute data over many servers
  • No shared memory or disk
• High performance
  • Efficient use of distributed indexes and RAM for data storage
  • Weak consistency model
  • Limited transactions
"Next generation databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable" [http://nosql-database.org]
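The horizontal scalability points above (distribute and replicate data over many servers, with nothing shared) can be illustrated with a minimal sketch. It assumes hash-based placement and replication to the next nodes in ring order; the names `node_for`, `replicas_for` and `Cluster` are hypothetical, and real stores use more elaborate schemes.

```python
import hashlib


def node_for(key: str, n_nodes: int) -> int:
    """Map a key to a node index with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes


def replicas_for(key: str, n_nodes: int, n_replicas: int) -> list[int]:
    """Primary node followed by the next nodes in ring order (an assumed placement rule)."""
    primary = node_for(key, n_nodes)
    return [(primary + i) % n_nodes for i in range(n_replicas)]


class Cluster:
    """Each node is a private dict: no memory or disk is shared between nodes."""

    def __init__(self, n_nodes: int = 4, n_replicas: int = 2) -> None:
        self.nodes = [dict() for _ in range(n_nodes)]
        self.n_replicas = n_replicas

    def put(self, key: str, value) -> None:
        # Write the record to every replica node.
        for i in replicas_for(key, len(self.nodes), self.n_replicas):
            self.nodes[i][key] = value

    def get(self, key: str):
        # Read from the first replica that holds the key.
        for i in replicas_for(key, len(self.nodes), self.n_replicas):
            if key in self.nodes[i]:
                return self.nodes[i][key]
        return None


cluster = Cluster(n_nodes=4, n_replicas=2)
cluster.put("user:1", {"name": "Ana"})
print(cluster.get("user:1"))
```

Adding servers in this sketch means changing `n_nodes`, which remaps most keys; that limitation is one reason production systems tend to prefer consistent hashing or range splitting.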

• Data model
• Availability
• Consistency
• Query support
• Storage
• Durability
Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands or millions of users

DATA MODELS
• Tuple
  • Row in a relational table, where attributes are pre-defined in a schema, and the values are scalar
• Document
  • Allows values to be nested documents or lists, as well as scalar values
  • Attributes are not defined in a global schema
• Extensible record
  • Hybrid between tuple and document, where families of attributes are defined in a schema, but new attributes can be added on a per-record basis
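To make the three data models concrete, here is a small sketch in Python. All names (`UserRow`, `COLUMN_FAMILIES`, `put_attribute`) and the sample values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


# Tuple: attributes are fixed by a schema and every value is scalar.
class UserRow(NamedTuple):
    uid: int
    name: str
    city: str


row = UserRow(uid=1, name="Ana", city="Grenoble")

# Document: values may be nested documents or lists; no global schema is declared.
doc = {
    "uid": 1,
    "name": "Ana",
    "addresses": [{"city": "Grenoble"}, {"city": "Puebla"}],
}

# Extensible record: the schema fixes the attribute families,
# while attributes inside a family can be added per record.
COLUMN_FAMILIES = {"profile", "contact"}


def put_attribute(record: dict, family: str, attribute: str, value) -> None:
    """Add an attribute to a record, rejecting families the schema does not define."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    record.setdefault(family, {})[attribute] = value


record = {}
put_attribute(record, "profile", "name", "Ana")
put_attribute(record, "profile", "nickname", "ana_v")  # new attribute, this record only
print(row, doc, record)
```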

DATA STORES
• Key-value
  • Systems that store values and an index to find them, based on a key
• Document
  • Systems that store documents, providing index and simple query mechanisms
• Extensible record
  • Systems that store extensible records that can be partitioned vertically and horizontally across nodes
• Graph
  • Systems that model and store data as graphs, where nodes can represent content modelled as document or key-value structures and arcs represent a relation between the data modelled by the nodes
• Relational
  • Systems that store, index and query tuples

KEY-VALUE STORES
• "Simplest data stores": use a data model similar to the memcached distributed in-memory cache
• Single key-value index for all data
• Provide a persistence mechanism
• Replication, versioning, locking, transactions, sorting
• API: inserts, deletes, index lookups
• No secondary indices or keys

SYSTEM      ADDRESS
Redis       code.google.com/p/redis
Scalaris    code.google.com/p/scalaris
Tokyo       tokyocabinet.sourceforge.net
Voldemort   project-voldemort.com
Riak        riak.basho.com
Membrain    schoonerinfotech.com/products
Membase     membase.com
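The API listed above (inserts, deletes, index lookups over a single key index) can be sketched as a toy in-memory class. `KeyValueStore` and its method names are assumptions for illustration and do not correspond to the API of any system in the table.

```python
from typing import Any, Callable, Optional


class KeyValueStore:
    """Toy in-memory store whose only index is the key."""

    def __init__(self) -> None:
        self._index: dict[str, Any] = {}

    def insert(self, key: str, value: Any) -> None:
        self._index[key] = value

    def lookup(self, key: str) -> Optional[Any]:
        return self._index.get(key)

    def delete(self, key: str) -> bool:
        """Remove a key; return True if it was present."""
        if key in self._index:
            del self._index[key]
            return True
        return False

    def scan(self, predicate: Callable[[Any], bool]) -> list[str]:
        """Without a secondary index, filtering on values visits every entry."""
        return [k for k, v in self._index.items() if predicate(v)]


store = KeyValueStore()
store.insert("user:805114856", {"name": "Ana"})
print(store.lookup("user:805114856"))
print(store.scan(lambda v: v["name"] == "Ana"))
```

The `scan` method is there to show the cost of the last bullet: any query that is not a key lookup degrades to a full pass over the data.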

SELECT name, pic, profile_url
FROM user
WHERE uid = me()

SELECT message, attachment
FROM stream
WHERE source_id = me() AND type = 80

SELECT name
FROM friendlist
WHERE owner = me()

SELECT name, pic
FROM user
WHERE online_presence = "active"
  AND uid IN ( SELECT uid2 FROM friend WHERE uid1 = me() )

SELECT name
FROM group
WHERE gid IN ( SELECT gid FROM group_member WHERE uid = me() )

https://developers.facebook.com/docs/reference/fql/

<805114856, >

DOCUMENT STORES
• Support more complex data: pointerless objects, i.e., documents
• Secondary indexes, multiple types of documents (objects) per database, nested documents and lists, e.g. B-trees
• Automatic sharding (scale writes), no explicit locks, weaker concurrency (eventual for scaling reads) and atomicity properties
• API: select, delete, getAttributes, putAttributes on documents
• Queries can be distributed in parallel over multiple nodes using a map-reduce mechanism

SYSTEM      ADDRESS
SimpleDB    amazon.com/simpledb
CouchDB     couchdb.apache.org
MongoDB     mongodb.org
Terrastore  code.google.com/terrastore
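The map-reduce bullet above can be sketched as follows. This is a single-process illustration in which each inner list stands in for the documents held by one node; the helper names and the `city` attribute are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable


def map_city(doc: dict) -> Iterable[tuple[str, int]]:
    """Map step: emit (city, 1); documents lacking the attribute are still accepted."""
    yield doc.get("city", "unknown"), 1


def reduce_count(key: str, values: list[int]) -> int:
    """Reduce step: sum the counts emitted for one key."""
    return sum(values)


def map_reduce(partitions: list[list[dict]], mapper: Callable, reducer: Callable) -> dict:
    grouped = defaultdict(list)
    # In a real store each partition would be mapped on its own node, in parallel.
    for partition in partitions:
        for doc in partition:
            for key, value in mapper(doc):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


partitions = [
    [{"name": "Ana", "city": "Grenoble"}, {"name": "Luis", "city": "Puebla"}],
    [{"name": "Eva", "city": "Grenoble"}, {"name": "Noa"}],  # no "city": schema-free
]
print(map_reduce(partitions, map_city, reduce_count))
```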


EXTENSIBLE RECORD STORES
• Basic data model is rows and columns
• Basic scalability model is splitting rows and columns over multiple nodes
• Rows split across nodes through sharding on the primary key
  • Split by range rather than hash function
  • Rows analogous to documents: variable number of attributes, attribute names must be unique
  • Grouped into collections (tables)
  • Queries on ranges of values do not go to every node
• Columns are distributed over multiple nodes using "column groups"
  • Which columns are best stored together
  • Column groups must be pre-defined with the extensible record stores

SYSTEM      ADDRESS
HBase       hbase.apache.com
HyperTable  hypertable.org
Cassandra   incubator.apache.org/cassandra
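Why splitting rows by range means that range queries "do not go to every node" can be seen in a short sketch. `RangeShardedTable` and the split points are hypothetical; the example only assumes that primary keys are ordered and that each shard owns a contiguous key interval.

```python
import bisect


class RangeShardedTable:
    """Rows are placed by primary-key range; shard i holds keys between split points i-1 and i."""

    def __init__(self, split_points: list[str]) -> None:
        self.split_points = sorted(split_points)
        self.shards = [dict() for _ in range(len(self.split_points) + 1)]

    def shard_index(self, key: str) -> int:
        # Number of split points <= key gives the owning shard.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key: str, row: dict) -> None:
        self.shards[self.shard_index(key)][key] = row

    def range_query(self, low: str, high: str) -> tuple[dict, list[int]]:
        """Return matching rows and the list of shards that had to be contacted."""
        touched = list(range(self.shard_index(low), self.shard_index(high) + 1))
        rows = {
            key: row
            for i in touched
            for key, row in self.shards[i].items()
            if low <= key <= high
        }
        return rows, touched


table = RangeShardedTable(split_points=["g", "n", "t"])  # 4 shards
for name in ["alice", "bob", "henry", "zoe"]:
    table.put(name, {"name": name})
rows, touched = table.range_query("a", "c")
print(sorted(rows), touched)
```

With a hash function instead of ranges, adjacent keys would be scattered and the same query would have to contact all four shards.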

SCALABLE RELATIONAL SYSTEMS
• SQL: rich declarative query language
• Databases enforce referential integrity
• ACID semantics
• Well understood operations: configuration, care and feeding, backups, tuning, failure and recovery, performance characteristics
• Use small-scope operations
  • Challenge: joins that do not scale with sharding
• Use small-scope transactions
  • ACID transactions inefficient with communication and 2PC overhead
• Shared nothing architecture for scalability
• Avoid cross-node operations

SYSTEM         ADDRESS
MySQL Cluster  mysql.com/cluster
VoltDB         voltdb.com
Clustrix       clustrix.com
ScaleDB        scaledb.com
ScaleBase      scalebase.com
NimbusDB       nimbusdb.com
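The communication overhead of two-phase commit (2PC) mentioned above can be made visible with a simplified sketch. It models only the message pattern (prepare, vote, decision, acknowledgement) and omits logging, timeouts and recovery; `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """One node taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> tuple[bool, int]:
    """Run both phases and count the messages exchanged with the coordinator."""
    messages = 0
    votes = []
    for p in participants:
        votes.append(p.prepare())
        messages += 2  # prepare request + vote
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
        messages += 2  # decision + acknowledgement
    return decision, messages


nodes = [Participant("n1"), Participant("n2"), Participant("n3")]
print(two_phase_commit(nodes))
```

In this model every participating node costs four messages and two round trips per transaction, which is why the slide recommends small-scope transactions that avoid cross-node operations.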

NOSQL DESIGN AND CONSTRUCTION PROCESS
[Figure: database population, database querying, database organization; labels: Memcached, Index, Stored, Replicated]
• Data reside in RAM (memcached) and are eventually replicated and stored
• Querying = designing a database according to the type of queries / map-reduce model
• "On demand" data management: the database is virtually organized per view (external schema) on cache, and some views are made persistent
• An elastic, easy to evolve and explicitly configurable architecture
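The first bullet, data living in RAM and being stored only later, resembles a write-behind cache. The sketch below assumes that reading of the slide; `WriteBehindCache` is a hypothetical name and a plain dict stands in for the persistent store.

```python
class WriteBehindCache:
    """Writes go to RAM first; a later flush copies dirty entries to the backing store."""

    def __init__(self, backing_store: dict) -> None:
        self.ram = {}
        self.dirty = set()
        self.backing_store = backing_store  # stand-in for disk or a replica

    def put(self, key, value) -> None:
        self.ram[key] = value
        self.dirty.add(key)  # not yet persistent

    def get(self, key):
        if key in self.ram:
            return self.ram[key]
        value = self.backing_store.get(key)
        if value is not None:
            self.ram[key] = value  # load into the cache on demand
        return value

    def flush(self) -> int:
        """Persist all dirty entries and return how many were written."""
        flushed = len(self.dirty)
        for key in self.dirty:
            self.backing_store[key] = self.ram[key]
        self.dirty.clear()
        return flushed


disk = {}
cache = WriteBehindCache(disk)
cache.put("view:friends:1", ["Luis", "Eva"])
print(disk)           # still empty: the data lives only in RAM
print(cache.flush())  # number of entries made persistent
print(disk)
```

Between `put` and `flush` a crash would lose the update, which is the durability trade-off behind the weak consistency model listed earlier.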

Use the right tool for the right job…
How do I know which is the right tool for the right job? (Katsov-2012)

Genoveva.Vargas@imag.fr http://www.vargas-solar.com/bigdata-management

REFERENCES
• Eric A. Brewer, "Towards robust distributed systems", PODC, 2000
• Rick Cattell, "Scalable SQL and NoSQL data stores", ACM SIGMOD Record 39.4 (2011): 12-27
• Juan Castrejon, Genoveva Vargas-Solar, Christine Collet and Rafael Lozano, "ExSchema: Discovering and Maintaining Schemas from Polyglot Persistence Applications", in Proceedings of the International Conference on Software Maintenance, Demo Paper, IEEE, 2013
• M. Fowler and P. Sadalage, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, Pearson Education, Limited, 2012
• C. Richardson, "Developing polyglot persistence applications", http://fr.slideshare.net/chris.e.richardson/developing-polyglotpersistenceapplications-gluecon2013
