Data Modeling in the NoSQL World (PowerPoint presentation by Ashutosh Kale, Adham Kamel, et al.)

SLIDE 1

Data Modeling in the NoSQL World

By: Ashutosh Kale, Adham Kamel, Jordan Mercado, Kevin Kim, Pratyusha Pogaru, Edgar Velazquez

SLIDE 2

Link to paper: https://hal.archives-ouvertes.fr/hal-01611628/document

Parts & names

1. Introduction & NoSQL Data Models (1) - Adham
2. The NoAM data model (2) - Edgar, Pratyusha
3. System-independent design of NoSQL databases with NoAM (2) - Ashutosh, Jordan
4. Related Works & Conclusion (1) - Kevin

SLIDE 3

Introduction

NoSQL systems are an effective way to manage large data sets distributed across multiple servers

Interest: they support next-generation Web applications, for which relational DBMSs are not well suited
Data have a structure that does not fit well into the rigid tables of a typical RDBMS
Access to data is based on simple read-write operations
Quality requirements include scalability, performance, and a certain level of consistency
Main Categories of NoSQL Systems
Key-Value stores
Document stores
Extensible Record stores

SLIDE 4

Key-Value Stores

Example: Oracle NoSQL
Database is a schemaless collection of key-value pairs; operations can access data in a single key-value pair or in a group of related pairs

Keys are structured: each key is composed of a major key and a minor key
Major key: non-empty sequence of strings
Minor key: sequence of strings
Component: each element of a key
‘/’ separates key components
‘-’ separates the major key from the minor key
The distinction between major and minor keys is important for controlling data distribution and sharding

Value: uninterpreted binary string
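To illustrate why the major/minor split matters for distribution, here is a minimal Python sketch. It renders a key in the `/major/.../-/minor/...` style described above and picks a shard by hashing only the major-key components, so every pair that shares a major key lands on the same shard. The helper names `format_key` and `shard_for`, the MD5-based hash, and the shard count are illustrative assumptions, not Oracle NoSQL's actual API or partitioning function.

```python
import hashlib

def format_key(major: list[str], minor: list[str]) -> str:
    """Render a structured key: '/' separates components, '-' separates major from minor."""
    if not major:
        raise ValueError("the major key must be a non-empty sequence of strings")
    key = "/" + "/".join(major)
    if minor:
        key += "/-/" + "/".join(minor)
    return key

def shard_for(major: list[str], num_shards: int) -> int:
    """Hypothetical placement rule: hash only the major key, never the minor key."""
    digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Two pairs of a hypothetical Player aggregate: same major key, different minor keys.
first = format_key(["Player", "mary"], ["firstName"])
last = format_key(["Player", "mary"], ["lastName"])
print(first)  # /Player/mary/-/firstName
print(last)   # /Player/mary/-/lastName
print(shard_for(["Player", "mary"], num_shards=8))  # one shard serves both pairs
```

Because the minor key never enters the hash, related pairs stay co-located, which is what allows operations over a group of related pairs to stay on a single shard.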


SLIDE 5


Key-Value Stores

Two common representations of aggregates

1. Representation using a simple key-value pair

Major key is the aggregate identifier
Value is the complex value of the aggregate

2. Representation using multiple key-value pairs

Aggregate is split into different parts, each represented by a distinct key-value pair

Major key of each part is the aggregate identifier
Minor key identifies the individual part within the aggregate
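Both representations can be sketched in Python by modelling the store as a dictionary from `(major key, minor key)` tuples to serialized bytes. The Player aggregate, its fields, and the JSON serialization are hypothetical choices for illustration; a real key-value store would expose its own put/get operations.

```python
import json

# Hypothetical Player aggregate; its identifier forms the major key.
MAJOR = ("Player", "mary")
player = {"firstName": "Mary", "lastName": "Wilson", "games": ["2345", "2611"]}

def as_single_pair(major: tuple, value: dict) -> dict:
    """Representation 1: one pair whose value is the whole (serialized) aggregate."""
    return {(major, ()): json.dumps(value).encode("utf-8")}

def as_multiple_pairs(major: tuple, value: dict) -> dict:
    """Representation 2: one pair per part; all pairs share the major key,
    and the minor key names the part."""
    return {(major, (part,)): json.dumps(v).encode("utf-8") for part, v in value.items()}

single = as_single_pair(MAJOR, player)    # 1 pair
multi = as_multiple_pairs(MAJOR, player)  # 3 pairs: firstName, lastName, games
```

With the second representation, an update to `games` only rewrites one small value, at the cost of reading several pairs when the whole aggregate is needed.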

SLIDE 6


Document Stores

Example: MongoDB
Database is a set of documents, each having a complex structure and value
Each document is a complex value: a set of attribute-value pairs, whose values can be simple values, lists, or nested documents

Documents are schemaless, so each document can have its own attributes, defined at runtime

Main document: a top-level document with a unique identifier, represented by the “_id” attribute, whose value is of type ObjectId

Aggregate is represented by a single document
Document ID is the aggregate identifier
Content is the complex value of the aggregate in JSON/BSON
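A minimal sketch of an aggregate stored as one document, using plain Python dictionaries in place of a MongoDB driver. The Game aggregate, its fields, and the `get_path` helper are hypothetical; a string stands in for the ObjectId that MongoDB would normally generate for `_id`.

```python
import json

# Hypothetical Game aggregate stored as a single document. MongoDB would normally
# generate an ObjectId for "_id"; a plain string keeps the sketch self-contained.
game_doc = {
    "_id": "2345",
    "firstPlayer": "mary",
    "secondPlayer": "rick",
    "rounds": [                                   # list of nested documents
        {"moves": ["rock", "scissors"], "comments": []},
        {"moves": ["paper", "paper"], "comments": ["tie"]},
    ],
}

def get_path(doc, path):
    """Follow a path of field names and list indices into a nested document."""
    node = doc
    for step in path:
        node = node[step]
    return node

print(get_path(game_doc, ["rounds", 1, "comments", 0]))  # tie
as_json = json.dumps(game_doc)  # the whole aggregate serializes as one JSON value
```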

SLIDE 7


Extensible Record Stores

Example: Amazon DynamoDB
Database is a set of tables; each table is a set of rows, and each row contains a set of columns

Rows in a table are not required to have the same attributes
Operations to access data are typically over individual rows
Each table designates an attribute as a primary key
The primary key is composed of a partition key and an optional sort key
Aggregates can be represented by a record/row/item
The primary key (partition key) is the aggregate identifier
An item can have a distinct attribute-value pair for each attribute of the aggregate's complex value
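The sketch below builds one item per aggregate in DynamoDB's low-level attribute-value format (`{"S": ...}`, `{"N": ...}`, `{"BOOL": ...}`), the shape expected by the `Item` argument of boto3's low-level `put_item`; it does not call AWS. The `to_item` helper, the `username` partition key, and the choice to store complex values as JSON strings (rather than DynamoDB's native list/map types) are assumptions for illustration.

```python
import json

def to_item(key_name: str, key_value: str, value: dict) -> dict:
    """Build an item in DynamoDB's low-level attribute-value format."""
    item = {key_name: {"S": key_value}}          # partition key = aggregate identifier
    for attr, v in value.items():
        if isinstance(v, bool):                  # check bool first: bool is an int subtype
            item[attr] = {"BOOL": v}
        elif isinstance(v, (int, float)):
            item[attr] = {"N": str(v)}           # numbers are sent as strings
        elif isinstance(v, str):
            item[attr] = {"S": v}
        else:
            item[attr] = {"S": json.dumps(v)}    # assumption: complex values as JSON text
    return item

item = to_item("username", "mary",
               {"firstName": "Mary", "score": 42, "active": True, "games": ["2345"]})
```

Each top-level attribute of the aggregate becomes its own attribute of the item, matching the last bullet above.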

SLIDE 8

The NoAM Data Model

NoAM stands for NoSQL Abstract Data Model
System-independent data model for NoSQL databases
Intended to support scalability, performance, and consistency

SLIDE 9

In most NoSQL databases, the distribution unit is often:

1. Group of related key-value pairs, in key value stores;
2. Document, in document stores;
3. Record/row/item, in extensible record stores.

In NoAM, this distribution unit is modeled as a BLOCK

SLIDE 10

Blocks

A block represents a maximal data unit for which atomic, efficient, and scalable access operations are provided. In NoSQL databases it is easy to manipulate one block at a time, but problems arise with operations that span multiple blocks, such as joins.

SLIDE 11

NoSQL databases can also access smaller data units: (i) an individual key-value pair, in key-value stores; (ii) a field, in document stores; (iii) a column, in extensible record stores. In NoAM, such a unit is called an ENTRY. Collections keep their name: COLLECTIONS.

SLIDE 12

We can now summarize the NoAM characteristics:

A database is a set of collections. Each collection has a distinct name.

A collection is a set of blocks. Each block in a collection is identified by a block key, which is unique within that collection.

A block is a non-empty set of entries. Each entry is a pair <ek, ev>, where ek is the entry key (which is unique within its block) and ev is its value
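The three definitions above map naturally onto nested dictionaries. This minimal Python sketch uses dataclasses: dictionary keys enforce the uniqueness of collection names, block keys, and entry keys, and `Block` rejects an empty set of entries. The class and method names, and the use of tuples as entry keys, are assumptions for illustration, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Block:
    key: str                           # block key, unique within its collection
    entries: dict[Any, Any]            # entry key ek -> entry value ev (dict keeps ek unique)

    def __post_init__(self) -> None:
        if not self.entries:
            raise ValueError("a block must be a non-empty set of entries")

@dataclass
class Collection:
    name: str                          # distinct within the database
    blocks: dict[str, Block] = field(default_factory=dict)

    def put(self, block: Block) -> None:
        self.blocks[block.key] = block  # a block with the same key is replaced

@dataclass
class Database:
    collections: dict[str, Collection] = field(default_factory=dict)

    def collection(self, name: str) -> Collection:
        """Return the named collection, creating it on first use."""
        return self.collections.setdefault(name, Collection(name))

db = Database()
db.collection("Player").put(Block("mary", {("firstName",): "Mary", ("lastName",): "Wilson"}))
```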


SLIDE 13

Representation of aggregates in the NoAM model

SLIDE 14

Another way to represent a NoAM database

SLIDE 15

System-independent design of NoSQL databases with NoAM

The main goal of NoAM is to support a design methodology for NoSQL databases that is independent of any specific system

By abstracting features common to NoSQL systems (data access units and distribution units), we can design an intermediate, system-independent representation of the data

This eases the design process and helps support the scalability and consistency qualities of the database

SLIDE 16

System design following the NoAM approach uses these steps:

1. Conceptual data modeling & aggregate design: identifying the necessary entities and relationships, and grouping related entities into aggregates
2. Aggregate partitioning & high-level NoSQL database design: partitioning aggregates into smaller data elements, then mapping them to the NoAM intermediate data model
3. Implementation: mapping the intermediate data representation to the specific features of a target database system

SLIDE 17

Conceptual data modeling & aggregate design

Following domain-driven design (as in the paper's running example), the first step is to design a UML class diagram defining the entities, value objects, and relationships of the application
Next, identify how entities and value objects are grouped into aggregates, based on data access patterns and scalability/consistency needs

Aggregates should be designed as units on which atomicity can be guaranteed
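As a sketch of what this step might produce, here is a hypothetical, simplified version of an online-game domain with two aggregates: `Game` owns its `Round` value objects, while `Player` refers to games only by identifier. All class and field names are assumptions for illustration, not the paper's exact diagram.

```python
from dataclasses import dataclass, field

@dataclass
class Round:                 # value object: only exists inside a Game aggregate
    moves: list[str]
    comments: list[str] = field(default_factory=list)

@dataclass
class Game:                  # aggregate root: a game together with all its rounds
    id: str
    first_player: str        # reference to a Player aggregate by identifier
    second_player: str
    rounds: list[Round] = field(default_factory=list)

    def add_round(self, r: Round) -> None:
        # The update touches only this aggregate, so it can be applied atomically.
        self.rounds.append(r)

@dataclass
class Player:                # aggregate root: a player and the ids of their games
    username: str
    first_name: str
    last_name: str
    game_ids: list[str] = field(default_factory=list)

g = Game("2345", "mary", "rick")
g.add_round(Round(["rock", "paper"]))
p = Player("mary", "Mary", "Wilson", [g.id])
```

Referencing other aggregates by identifier (rather than embedding them) keeps each aggregate a self-contained unit of atomic update.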


SLIDE 18

Properties of good aggregate design

Each aggregate should be large enough to include all the data required by a relevant data access operation, but as small as possible

Small aggregates reduce concurrency collisions and support performance and scalability requirements

Each aggregate should include all the data involved in some integrity constraint or rule

This supports strong consistency (atomicity) of update operations

SLIDE 19

Data representation in NoAM

In the NoAM example:
Each class of aggregates is represented by a distinct collection
Each individual aggregate is represented by a block
This works because aggregates and blocks both represent units of data access and distribution, at different abstraction levels

Thus, aggregates receive the same operational benefits (scalability, efficiency, atomicity) as blocks

SLIDE 20

In General...

A dataset of aggregates can be represented in a NoAM database in many different ways

Other examples include:
Entry per Aggregate Object (EAO): each individual aggregate is represented using a single entry

Entry per Top-level Field (ETF): each aggregate is represented by multiple entries, one for each top-level field
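The difference between the two strategies fits in two small Python functions that turn an aggregate into the entries of its block. Entry keys are encoded as tuples of access-path steps, with the empty tuple standing in for the empty entry key used by EAO; this encoding and the sample Player aggregate are assumptions for illustration.

```python
EMPTY_KEY = ()  # hypothetical encoding of the empty entry key used by EAO

def eao(aggregate: dict) -> dict:
    """Entry per Aggregate Object: the whole aggregate becomes a single entry."""
    return {EMPTY_KEY: aggregate}

def etf(aggregate: dict) -> dict:
    """Entry per Top-level Field: one entry per top-level field, keyed by the field name."""
    return {(name,): value for name, value in aggregate.items()}

player = {"username": "mary", "firstName": "Mary", "lastName": "Wilson", "games": ["2345"]}
eao_entries = eao(player)  # 1 entry
etf_entries = etf(player)  # 4 entries
```

In both cases the block key (for example `"mary"`) stays the same; only the granularity of the entries inside the block changes.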


SLIDE 21

EAO vs. ETF

(Figure panels: EAO, ETF)

SLIDE 22

Aggregate partitioning

Aggregate partitioning is usually based on the following guidelines:

If an aggregate is small, or all or most of its data are accessed or modified together, it should be represented by a single entry

If an aggregate is large and there are operations that access or modify specific portions of it, it should be partitioned into multiple entries

Data elements should belong to the same entry if they are usually accessed or modified together

Data elements should belong to distinct entries if they are usually accessed or modified separately

The access path (the sequence of steps needed to reach an element within the aggregate) affects how data elements are accessed or modified relative to one another
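The guidelines can be turned into a small partitioning function. In this minimal sketch, the caller states which list-valued fields hold elements that are accessed separately (`split_fields`); each such element gets its own entry keyed by its access path, and all remaining fields share a single entry under the empty path. The function name, the `split_fields` parameter, and the tuple encoding of access paths are hypothetical.

```python
def partition(aggregate: dict, split_fields: set[str]) -> dict:
    """Build NoAM entries keyed by access path (a tuple of field names and indices)."""
    entries = {}
    together = {}
    for name, value in aggregate.items():
        if name in split_fields and isinstance(value, list):
            # Elements accessed separately: one entry each, keyed by (field, index).
            for i, element in enumerate(value):
                entries[(name, i)] = element
        else:
            together[name] = value
    if together:
        # Fields accessed together share one entry with the empty access path.
        entries[()] = together
    return entries

game = {"firstPlayer": "mary", "secondPlayer": "rick",
        "rounds": [{"moves": ["rock", "paper"]}, {"moves": ["paper", "paper"]}]}
by_round = partition(game, {"rounds"})  # entries (), ("rounds", 0), ("rounds", 1)
whole = partition(game, set())          # one entry: the whole aggregate
```

Passing an empty `split_fields` reproduces the single-entry case from the first guideline.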


SLIDE 23

General implementation

Mapping from the intermediate representation to specific systems differs slightly for each type of NoSQL system (key-value, document, extensible record)

The NoAM intermediate representation for each example is described in Figure 8

SLIDE 24

Key-Value Store Implementation: Oracle NoSQL

In the Oracle NoSQL example, each entry is represented by a key-value pair

The key is composed of a major key (collection name and block key) and a minor key (an encoding of the entry's access path)

The major key controls distribution and sharding

The value can be a simple value or a formatted entry (JSON)
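A minimal sketch of this mapping: every entry of a NoAM block becomes one key-value pair whose major key is `[collection, block key]`, whose minor key encodes the entry's access path, and whose value is the entry serialized as JSON. The key string follows the `/major/-/minor` convention from the key-value slide; the helper name and the tuple-based access paths are assumptions, and no Oracle NoSQL driver calls are made.

```python
import json

def block_to_kv(collection: str, block_key: str, entries: dict) -> dict[str, bytes]:
    """Map every entry of a NoAM block to one key-value pair."""
    pairs = {}
    for path, value in entries.items():
        major = [collection, block_key]        # controls distribution and sharding
        minor = [str(step) for step in path]   # coding of the access path
        key = "/" + "/".join(major)
        if minor:
            key += "/-/" + "/".join(minor)
        pairs[key] = json.dumps(value).encode("utf-8")  # value serialized as JSON
    return pairs

kv = block_to_kv("Game", "2345", {
    (): {"firstPlayer": "mary", "secondPlayer": "rick"},
    ("rounds", 0): {"moves": ["rock", "paper"]},
})
```

All pairs of the block share the major key `/Game/2345`, so they are stored together and can be accessed atomically as a group.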


SLIDE 25

Extensible Record Store Implementation: DynamoDB

In the DynamoDB example, each collection is represented by a distinct table, and each block by an individual item

The collection name becomes the table name, the block key becomes the item's primary key, and the entries of the block become the attribute-value pairs of the item
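A minimal sketch of this mapping, returning the table name and an item in DynamoDB's low-level attribute-value format. The primary-key attribute name `id`, the joining of access-path steps with `.` to form attribute names, the fallback name `value` for the empty entry key, and storing every entry as a JSON string are all hypothetical choices; the code does not contact DynamoDB.

```python
import json

def block_to_item(collection: str, block_key: str, entries: dict, key_attr: str = "id"):
    """Map a NoAM block to (table name, item) in DynamoDB's low-level format."""
    item = {key_attr: {"S": block_key}}          # block key -> primary key
    for path, value in entries.items():
        # Hypothetical encoding: ("rounds", 0) -> "rounds.0"; empty key -> "value".
        name = ".".join(str(step) for step in path) or "value"
        item[name] = {"S": json.dumps(value)}    # assumption: entries stored as JSON text
    return collection, item                      # collection name -> table name

table, item = block_to_item("Player", "mary", {("firstName",): "Mary", ("games",): ["2345"]})
```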


SLIDE 26

Document Store Implementation: MongoDB

In the MongoDB example, each collection of blocks is represented by a distinct MongoDB collection, and each block by an individual document

The block collection name is used as the MongoDB collection name, the block key becomes the document's special _id field, and each entry in a block fills a field of the document
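A minimal sketch of this mapping, returning the collection name and a plain dictionary; with pymongo, such a document could then be stored with `db[name].insert_one(doc)`, though that call is not made here. Merging an empty-key (EAO-style) entry into the top level and joining access-path steps with `_` for other entries are hypothetical encoding choices.

```python
def block_to_document(collection: str, block_key: str, entries: dict):
    """Map a NoAM block to (MongoDB collection name, document)."""
    doc = {"_id": block_key}                 # block key -> special _id field
    for path, value in entries.items():
        if path == ():
            # Empty entry key (EAO-style): the entry's fields become top-level fields.
            doc.update(value)
        else:
            # Hypothetical encoding: join access-path steps into one field name.
            doc["_".join(str(step) for step in path)] = value
    return collection, doc

name, doc = block_to_document("Player", "mary", {("firstName",): "Mary", ("games",): ["2345"]})
```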


SLIDE 27

Experiment on the Performance of Different DB Designs

The paper concludes its discussion of NoAM designs by comparing the performance of two different NoAM designs (EAO vs. Rounds) on the running example

EAO uses a single entry for each game
Rounds splits each game into a group of entries, one for each round, along with the other relevant fields
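One intuition for why the two designs can behave differently is the amount of data written when a round is added. This toy cost model is not the paper's experiment: it only counts serialized JSON bytes written per round addition, assumes the game is stored as JSON, and ignores reads, indexing, and network costs. The function names and the sample game are hypothetical.

```python
import json

def bytes_written_eao(game: dict, new_round: dict) -> int:
    """EAO: the game is one entry, so adding a round rewrites the whole serialized game."""
    updated = dict(game, rounds=game["rounds"] + [new_round])  # copy; original untouched
    return len(json.dumps(updated).encode("utf-8"))

def bytes_written_rounds(game: dict, new_round: dict) -> int:
    """Rounds: each round is its own entry, so only the new round's entry is written."""
    return len(json.dumps(new_round).encode("utf-8"))

game = {"id": "2345", "rounds": [{"moves": ["rock", "paper"]} for _ in range(100)]}
r = {"moves": ["paper", "paper"]}
eao_cost = bytes_written_eao(game, r)        # grows with the number of existing rounds
rounds_cost = bytes_written_rounds(game, r)  # independent of the game's size
```

The trade-off runs the other way for retrieval: reading a whole game under Rounds means fetching many entries instead of one, which is consistent with neither design being best for every workload.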

(Figure panels: EAO vs. Rounds)

SLIDE 28

Experiment Results

The experimenters tested three different workloads (retrieval of games, addition of rounds, and a 50-50 mix), measuring runtime in milliseconds against database size in GB

Results showed that each design was superior in some respects, with performance depending on the workload and the database size

SLIDE 29

Experiment Results Takeaways

The results on the previous slide emphasize how much the design of a NoSQL database affects the performance and consistency of data access operations
This methodology provides an effective tool for choosing among different NoAM alternatives

SLIDE 30

Related Works

Related works recognize the demand for data modeling approaches
Proposed solutions only cover:

○ Specific problems
○ Limited scenarios
○ Specific databases
○ Specific systems

None of them addresses the problem from a general, system-independent perspective

SLIDE 31

Related Works

A data aggregate is “application data grouped in atomic units that are accessed and manipulated together.”

A similar notion of aggregates is found in related works

○ In Domain Driven Design, related objects are treated as a unit for the purpose of data changes.
○ Entities are “units of distribution and consistency.”
○ In Bigtable, entity groups are manipulated atomically.

SLIDE 32

Related Works

A similar notion of determining data units is found in related works

○ Vertical partitioning & clustering
○ Relational storage of XML documents

SLIDE 33

Related Works

A high-level representation of data makes it possible to use different systems and technologies.

Save Our System (SOS) is a uniform programming interface for NoSQL systems that allows for simple CRUD operations.

The issue of “tools for data access is complementary to data models and design issues.”

SLIDE 34

Conclusion

The paper proposes:

○ Viewing data from the perspective of aggregates
○ An intermediate data model that is system independent
○ An implementation that considers the specific features of specific NoSQL databases

SLIDE 35

Citation

Paolo Atzeni, Francesca Bugiotti, Luca Cabibbo, Riccardo Torlone. Data Modeling in the NoSQL World. Computer Standards and Interfaces, Elsevier, 2020, 67, pp.103149. ⟨10.1016/j.csi.2016.10.003⟩. ⟨hal-01611628⟩