NoSQL, Terje Gjøsæter, Ph.D., UiA, Grimstad, 16. November 2015

IKT437 Knowledge Engineering and Representation NoSQL Terje Gjøsæter, Ph.D. UiA, Grimstad – 16. November 2015

Overview • Introduction and Motivation • History of NoSQL • Categories of NoSQL • Examples of NoSQL systems • Encodings • Querying • Examples • Summary 2

Introduction • NoSQL has become increasingly popular and important lately. • NoSQL – No SQL, or Not Only SQL? • Many different variants, covering many different needs and use cases. • So what is NoSQL? Every data store that is not an SQL-based RDBMS? • Q: Opinions? 3

Typical characteristics • Non-relational • Flexible schema • Less structured data • Supports big data • Other or additional query languages than SQL • Distributed – horizontal scaling • Eventual consistency – tradeoff due to CAP theorem • Q: Are you all familiar with the CAP theorem and consistency models? 4

CAP Theorem • It is impossible for a distributed system to provide all three of the following at the same time: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response) • Partition tolerance (the system continues to operate despite partitioning due to network failures) 5

Consistency Models (from distributed computing) • Eventual consistency • A weak consistency model for systems in which replicas are not updated simultaneously. • If no new updates are made for a sufficiently long time, all replicas will eventually become consistent. • Strict consistency • The strongest consistency model. • Requires that if a process reads any memory location, the value returned by the read operation is the value written by the most recent write operation to that location. 6
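To make eventual consistency concrete, here is a minimal sketch. It is an illustration only and not taken from any particular NoSQL system: the names Replica and anti_entropy are hypothetical, and conflicts are resolved with a simple "newest timestamp wins" rule, which is just one of several possible strategies.

```python
class Replica:
    """One copy of the data; keeps a (timestamp, value) pair per key."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, timestamp):
        # Accept the write only if it is newer than what we already hold.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Let every replica push its entries to every other replica."""
    for source in replicas:
        for target in replicas:
            for key, (timestamp, value) in list(source.store.items()):
                target.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("post", "hello", timestamp=1)   # the update reaches replica a only
stale = b.read("post")                  # b has not seen the update yet
anti_entropy([a, b])                    # no new updates; replicas synchronise
fresh = b.read("post")                  # b now agrees with a
```

Between the write and the synchronisation, a reader of replica b sees old data. This is the kind of temporary inconsistency that an eventually consistent system accepts.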

Motivation • Why NoSQL? • Less structured databases needed. • Not all data fit into a relational, table-based structure. • Social Media and Big Data are the big drivers for new database types. • Data tends to be less structured and too big for a traditional RDBMS. • Let’s briefly introduce the data storage needs of Social Media and Big Data. 7

Social Media and Web 2.0 – Example of Big Data • Google, Facebook, Twitter, Instagram, Amazon and Yahoo, among others, need to store and handle enormous amounts of data. • These data tend to have different characteristics and requirements compared to «typical» structured database data. • Less strict structure in the data. • Need for a way to distribute data across clusters that is easy to manage and use. • Different requirements for consistency (see CAP theorem) • Example: sometimes we see a post on Facebook disappearing and then showing up again later 8

Big Data – Early Definitions • “Large data sets, taxing the capacities of main memory, local disk, and even remote disk” (1997) • “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges” • “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze” (McKinsey) 9

HANDLING AND STORAGE OF «TOO BIG» DATA Source: Georgia Tech Library (http://d7.library.gatech.edu) 10

Meanwhile: Big Data – Opportunity-Enablers • Data Warehousing: Central repository of integrated (and highly structured) data for reporting and analysis. • Business Intelligence: Analysing data to make informed business decisions. • Data Mining: Searching for interesting trends and patterns in data. 11

Defining Big Data as an Opportunity • Mayer-Schönberger & Cukier 2013: “The ability of society to harness information in novel ways to produce useful insights…” and “…things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.” License: CC0 Public Domain 12

Big Data can be… SECURITY AND PRIVACY Source: Wikimedia Commons, Camelia Boban, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported 13

Big Data Aspects and Life Cycle Overview • Selection, Harvesting, Integration: Store all? Include external data from different sources? • Structuring and Data Storage: Structured or unstructured? With meta-information? SQL or NoSQL? • Analysis, Visualisation: Machine learning; graphs, maps • Protection and Usage Policy: Security, privacy; sharing policy 14

NoSQL to the Rescue! • NoSQL is able to cover the needs of Social Media and Big Data • But different variants of NoSQL also support • Small data • Simple data • Awkwardly shaped data • Funny data • Odd data 15

Typical Features of NoSQL • Running well on clusters • Mostly open-source • Schema-less • No need to convert your data to and from a relational data model; you can use the data model of your software directly. 16

History of NoSQL • Q: When did people first start talking about NoSQL? • The term NoSQL was used by Carlo Strozzi in 1998 to name his lightweight, open-source relational database, Strozzi NoSQL, which did not expose the standard SQL interface but was still relational. • Johan Oskarsson of Last.fm reintroduced the term NoSQL in early 2009 when he organised an event to discuss "open source distributed, non relational databases". • Most early NoSQL systems did not support ACID transactions or joins. This has been changing lately… • Q: Are you all familiar with the ACID requirements for databases? 19

ACID • Atomicity means that database modifications must follow an all or nothing rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails. • Consistency means that only valid data will be written to the database. • Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. • Durability ensures that any transaction committed to the database will not be lost. 20
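The all-or-nothing rule can be tried out with a small sketch. It uses SQLite from the Python standard library purely as a convenient ACID-compliant store. The account table, the names and the amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdraft
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money in one transaction; return True if it was committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit made
        # earlier in the same transaction has been rolled back as well.
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, bob's balance does not include the 500 that was credited first. The partial change was undone together with the rest of the transaction.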

Categories of NoSQL – Key-value-based • Key-value-based • Supports a dictionary or map of key-value pairs. • The value may be a simple value or an (un-/semi-)structured blob of data. • Often used as the basis for more complex data models. • Wide Column Store • A type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. 22
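The two ideas on this slide can be sketched with plain Python dictionaries. This is only an in-memory illustration: KeyValueStore, wide_table and all keys and values are hypothetical and do not correspond to the API of any real product.

```python
class KeyValueStore:
    """A dictionary of key-value pairs; the store never inspects the value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("user:42", b'{"name": "Alice"}')  # an opaque blob, as far as the store knows
kv.put("counter", 7)                     # a simple value

# Wide column store: each row key maps to its own set of named columns,
# so two rows in the same table need not share column names.
wide_table = {
    "row1": {"name": "Alice", "email": "alice@example.org"},
    "row2": {"name": "Bob", "phone": "555-0100", "city": "Grimstad"},
}
columns_per_row = {row: sorted(cols) for row, cols in wide_table.items()}
```

In a relational table, row1 would need NULL placeholders for phone and city. Here the columns are simply absent.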

Categories of NoSQL – Document-oriented • Document-oriented • Supports storing, retrieving, and managing document-oriented information. • Documents encapsulate and encode data in some standard format or encoding. • XML • A subclass of document-oriented databases that are optimized to extract their metadata from XML documents. • Object store • An object includes the data itself, a variable amount of metadata, and a globally unique identifier. • Examples: storing photos on Facebook, songs on Spotify, or files in an online collaboration service such as Dropbox. 23
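A rough sketch of the document-oriented idea follows, assuming JSON as the encoding. The documents, the field names and the find helper are invented for illustration. Real document databases offer much richer query languages.

```python
import json

# Each document is self-describing; documents need not share the same fields.
documents = [
    '{"_id": 1, "type": "photo", "owner": "alice", "tags": ["cat", "garden"]}',
    '{"_id": 2, "type": "song", "title": "Example", "length_s": 215}',
    '{"_id": 3, "type": "photo", "owner": "bob"}',
]

# Decode the documents and index them by their identifier.
collection = {doc["_id"]: doc for doc in map(json.loads, documents)}


def find(collection, **criteria):
    """Return the documents whose fields equal all the given criteria."""
    return [
        doc for doc in collection.values()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


photos = find(collection, type="photo")
bobs = find(collection, owner="bob")
```

Unlike a pure key-value store, the database can look inside the stored value, which is what makes a query on the type or owner field possible.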

Categories of NoSQL – Graph-based • Graph-based • Uses graph structures for semantic queries, with nodes, edges and properties to represent and store data. • Triplestore (RDF) • A variant of graph-based. • Stores triples: subject-predicate-object • Alice knows Bob; Bob has Cat; Cat catches Mouse; Alice fears Mouse • Adding a name to the triple makes a "quad store" or named graph. 24
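The triples from the slide can be stored and queried with a very small sketch. The match function is a hypothetical helper that treats None as a wildcard. A real triplestore would typically be queried with SPARQL instead.

```python
# The four example triples, each as (subject, predicate, object).
triples = {
    ("Alice", "knows", "Bob"),
    ("Bob", "has", "Cat"),
    ("Cat", "catches", "Mouse"),
    ("Alice", "fears", "Mouse"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return, in sorted order, all triples that fit the pattern."""
    pattern = (subject, predicate, obj)
    return sorted(
        triple for triple in triples
        if all(p is None or p == v for p, v in zip(pattern, triple))
    )


about_mouse = match(triples, obj="Mouse")      # who relates to Mouse, and how?
alice_facts = match(triples, subject="Alice")  # everything stated about Alice
```

Extending each tuple with a fourth element that names the graph it belongs to would turn this into the "quad store" mentioned above.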

Categories of NoSQL – Hybrids • Multi-model • Supports multiple data models against a single, integrated backend. • May also contain relational elements. • MultiValue • Differs from an RDBMS in that it supports and encourages the use of attributes which can take a list of values, rather than all attributes being single-valued. 25

Key-Value Databases • Key-value systems treat the data as a single opaque collection which may have different fields for every record. • This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. • Because optional values are not represented by placeholders as in most RDBs, key-value stores often use far less memory to store the same database, which can lead to large performance gains in certain workloads. • Examples: • CouchDB, • Oracle NoSQL Database, • Dynamo, • MemcacheDB, • Redis 26

Column-oriented Databases • Wide Column-store • The names and format of the columns can vary from row to row in the same table. • A column has three elements: • Unique name: Used to reference the column. • Value: The content of the column. Simple type. • Timestamp: The system timestamp used to determine the valid content. • The timestamp is used to differentiate valid content from stale content. • Examples: • Accumulo, • Cassandra, • Druid, • HBase, • Vertica 27
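The role of the timestamp can be shown with a short sketch. It assumes that the newest timestamp always wins. The Column class, the merge function and the example values are hypothetical and are not the API of any of the systems listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str       # unique name, used to reference the column
    value: str      # content of the column
    timestamp: int  # used to tell valid content from stale content


def merge(row, column):
    """Store the incoming column only if it is newer than the stored one."""
    current = row.get(column.name)
    if current is None or column.timestamp > current.timestamp:
        row[column.name] = column


row = {}
merge(row, Column("email", "old@example.org", timestamp=1))
merge(row, Column("email", "new@example.org", timestamp=3))
merge(row, Column("email", "stale@example.org", timestamp=2))  # arrives late
```

The late write with timestamp 2 is ignored, so the row still holds the value written at timestamp 3, regardless of the order in which the writes arrived.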
