IKT437 Knowledge Engineering and Representation
NoSQL
Terje Gjøsæter, Ph.D. UiA, Grimstad – 16. November 2015
Overview
Introduction and Motivation
History of NoSQL
Categories of NoSQL
Examples of NoSQL systems
Encodings
2
3
Typical characteristics
4
CAP Theorem
5
Consistency Models (from distributed computing)
6
Motivation
7
store and handle enormous amounts of data
«typical» structured database data.
showing up again later
8
9
Large data sets, taxing the capacities of main memory, local disk, and even remote disk (1997)
Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges
Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze (McKinsey)
10
Source: Georgia Tech Library (http://d7.library.gatech.edu)
11
Business Intelligence: Analysing data to make informed business decisions.
Data Warehousing: Central repository of integrated (and highly structured) data for reporting and analysis.
Data Mining: Searching for interesting trends and patterns in data.
Mayer-Schönberger & Cukier 2013: “The ability of society to harness information in novel ways to produce useful insights…” and “…things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”
12
License: CC0 Public Domain
13
Source: Wikimedia Commons, Camelia Boban, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license
SECURITY AND PRIVACY
14
learning; graphs, maps
unstructured?
information?
Include external data from different sources?
Selection, Harvesting, Data Integration
Structuring and Storage
Analysis, Visualisation
Protection and Usage Policy
15
data model of your software directly.
16
17
18
source relational database that did not expose the standard SQL interface, but was still relational.
event to discuss "open source distributed, non relational databases".
19
is said to be atomic. If one part of the transaction fails, the entire transaction fails.
execution.
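The atomicity property above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for any transactional database; the `accounts` table, the `transfer` helper and the account names are hypothetical and chosen only for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # One part of the transaction fails, so the whole thing fails.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` debits alice first, then detects the negative balance and raises; the rollback undoes the debit, so neither account changes.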
20
21
data.
columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.
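The point that column names can vary from row to row can be sketched with nested dictionaries. This is only a toy in-memory model, not the API of any real wide-column store; the class name `ColumnFamily` and its methods are assumptions made for illustration.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy wide-column table: each row key maps to its own set of columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Any row may introduce a column that no other row has.
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        """Return the column names present in this particular row."""
        return sorted(self._rows.get(row_key, {}))


users = ColumnFamily()
users.put("user:1", "name", "Ada")
users.put("user:1", "email", "ada@example.org")
users.put("user:2", "name", "Grace")
users.put("user:2", "last_login", "2015-11-16")
```

Here `user:1` has an `email` column and `user:2` has a `last_login` column, yet both rows live in the same table without any schema change.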
22
document-oriented information.
some standard formats or encodings.
are optimized to extract their metadata from XML documents.
as Dropbox.
23
store data.
24
list of values, rather than all attributes being single-valued.
25
every record.
programming.
in certain workloads.
26
27
the database engine uses for optimization.
extract their metadata from XML documents.
28
properties, and edges
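A minimal sketch of the graph data model, with nodes, properties and edges, is shown here. It is a plain in-memory structure rather than the interface of any graph database; the names `Node`, `Edge` and `PropertyGraph` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: labelled, directed edges between nodes."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        self.edges.append(Edge(source, target, label, properties))

    def neighbours(self, node_id, label=None):
        """Follow outgoing edges, optionally filtered by edge label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


g = PropertyGraph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_node("oslo", kind="city")
g.add_edge("alice", "bob", "KNOWS", since=2010)
g.add_edge("alice", "oslo", "LIVES_IN")
```

A query such as `g.neighbours("alice", "KNOWS")` walks relationships directly instead of joining tables, which is the access pattern graph databases are built around.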
29
How do we Choose a NoSQL Database for our Project?
preferences, shopping cart data. We would avoid using key-value databases when we need to query by data, have relationships between the data being stored, or need to operate on multiple keys at the same time.
platforms, maintaining counters, expiring usage, heavy write volume such as log aggregation. We would avoid using column-family databases for systems that are in early development or have changing query patterns.
web analytics, real-time analytics, e-commerce applications. We would avoid using document databases for systems that need complex transactions spanning multiple operations or queries against varying aggregate structures.
social networks, spatial data, routing information for goods and money, recommendation engines.
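The advice on key-value databases can be made concrete with a small sketch: lookups by key are direct, while querying by data means inspecting every stored value. The `KeyValueStore` class and the shopping-cart keys are hypothetical and only illustrate the trade-off.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque to the store itself."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # The intended access path: one direct lookup by key.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, predicate):
        """Query by data: no index exists, so every entry is examined."""
        return [key for key, value in self._data.items() if predicate(value)]


carts = KeyValueStore()
carts.put("cart:alice", ["book", "pen"])
carts.put("cart:bob", ["laptop"])
carts.put("cart:carol", ["pen"])
```

Fetching `carts.get("cart:alice")` touches a single entry, whereas finding every cart that contains a pen requires `carts.scan(lambda items: "pen" in items)`, which visits all of them.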
30
Data Model               Performance  Scalability      Flexibility  Complexity  Functionality
Key–Value Store          high         high             high         none        variable (none)
Column-Oriented Store    high         high             moderate     low         minimal
Document-Oriented Store  high         variable (high)  high         low         variable (low)
Graph Database           variable     variable         high         high        graph theory
Relational Database      variable     variable         low          moderate    relational algebra
31
Source: Ben Scofield http://www.slideshare.net/bscofield/nosql-codemash-2010
32
33
documents and also include user-defined JavaScript functions.
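The idea of queries that select documents and apply a user-defined function can be sketched in plain Python. This is not the API of any document database: the `find` helper, its parameters and the sample documents are assumptions, and a Python callable stands in for the user-defined JavaScript function mentioned above.

```python
def find(collection, criteria=None, fields=None, where=None):
    """Return documents matching `criteria` and the optional `where` function.

    criteria: dict of field -> required value (simple equality match)
    fields:   optional list of field names to keep in each result
    where:    optional user-defined predicate applied to each document
    """
    criteria = criteria or {}
    results = []
    for doc in collection:
        if any(k not in doc or doc[k] != v for k, v in criteria.items()):
            continue
        if where is not None and not where(doc):
            continue
        if fields is None:
            results.append(dict(doc))
        else:
            results.append({f: doc[f] for f in fields if f in doc})
    return results


# Hypothetical sample documents; note that their fields differ.
posts = [
    {"title": "Intro to NoSQL", "author": "terje", "comments": 3},
    {"title": "CAP Theorem", "author": "terje", "comments": 0, "tags": ["theory"]},
    {"title": "Graphs", "author": "ada", "comments": 5},
]
```

For example, `find(posts, {"author": "terje"}, fields=["title"], where=lambda d: d["comments"] > 0)` combines an equality match, a field projection and a custom predicate in one call.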
34
35
36
37
38
available if needed. In the case of a complete system failure on default settings, only a few seconds of data would be lost.
Haskell, Haxe, Io, Java, JavaScript (Node.js), Julia, Lua, Objective-C, Perl, PHP, Pure Data, Python, R, Racket, Ruby, Rust, Scala, Smalltalk and Tcl.
39
query language through a transactional HTTP endpoint.
40
41
42
additional features, such as the ability to define inference rules.
43
44
45
46
47
with all comments included.
48
49
at a given time.