Data Modeling in the NoSQL World By: Ashutosh Kale, Adham Kamel, - - PowerPoint PPT Presentation



SLIDE 1

Data Modeling in the NoSQL World

By: Ashutosh Kale, Adham Kamel, Jordan Mercado, Kevin Kim, Pratyusha Pogaru, Edgar Velazquez

SLIDE 2

Link to paper: https://hal.archives-ouvertes.fr/hal-01611628/document

Parts & names

  • 1. Introduction & NoSQL Data Models (1) - Adham
  • 2. The NoAM data model (2) - Edgar, Pratyusha
  • 3. System-independent design of NoSQL databases with NoAM (2) - Ashutosh, Jordan
  • 4. Related Works & Conclusion (1) - Kevin

SLIDE 3

Introduction

  • NoSQL systems are an effective way to manage large sets of data across multiple servers
  • Interest: NoSQL supports next-generation web technologies that relational DBMSs do not
    ○ Data has a structure that does not fit the typical RDBMS
    ○ Access to data is based on read-write operations
    ○ Quality requirements include scalability, performance, and consistency
  • Main categories of NoSQL systems
    ○ Key-value stores
    ○ Document stores
    ○ Extensible record stores

SLIDE 4

Key-Value Stores

  • Example: Oracle NoSQL
  • Database is a schemaless collection of key-value pairs, where operations can access data from a single key-value pair or from a group of related pairs
  • Keys are structured and contain both a major key and a minor key
    ○ Major key: non-empty sequence of strings
    ○ Minor key: sequence of strings
    ○ Component: each element of a key
    ○ ‘/’ separates key components
    ○ ‘-’ separates the major key from the minor key
  • The distinction between major and minor keys is important for controlling data distribution and sharding
  • Value: uninterpreted binary string
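
The key structure on this slide can be sketched in a few lines of Python. This is only an illustration of the ‘/’ and ‘-’ conventions listed above, not the Oracle NoSQL client API; the function name `parse_key` and the sample key are hypothetical.

```python
from typing import List, Tuple

def parse_key(path: str) -> Tuple[List[str], List[str]]:
    """Split a key path into its major and minor key components.

    '/' separates components; a lone '-' component separates the
    major key from the minor key (conventions taken from the slide).
    """
    components = [c for c in path.split("/") if c]
    if "-" in components:
        i = components.index("-")
        major, minor = components[:i], components[i + 1:]
    else:
        major, minor = components, []
    if not major:
        raise ValueError("the major key must be a non-empty sequence of strings")
    return major, minor

# Hypothetical key: major key Player/mary, minor key username.
major, minor = parse_key("/Player/mary/-/username")
print(major, minor)
```

All pairs that share a major key form a group of related pairs, which is why the major/minor split matters for distribution.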

SLIDE 5

Key-Value Stores

  • Two common representations of aggregates

1. Representation using a single key-value pair

  • Major key is the aggregate identifier
  • Value is the complex value of the aggregate

2. Representation using multiple key-value pairs

  • Aggregate is split into parts, each represented by a distinct key-value pair
  • Major key is the aggregate identifier, shared by every part
  • Minor key identifies the individual part within the aggregate

SLIDE 6

Document Stores

  • Example: MongoDB
  • Database is a set of documents, each having a complex structure and value
  • Each document is structured: it is a complex value made of a set of attribute-value pairs, whose values can be simple values, lists, or nested documents
  • Documents are schemaless: each document can have its own attributes, defined at runtime
  • Main document: top-level document with a unique identifier, represented by the “_id” attribute, which is associated with a value of type ObjectId
  • Aggregate is represented by a single document
    ○ Document ID is the aggregate identifier
    ○ Content is the complex value of the aggregate in JSON/BSON

SLIDE 7

Extensible Record Stores

  • Example: Amazon DynamoDB
  • Database is a set of tables, where each table is a set of rows, and each row contains a set of columns
  • Rows in a table are not required to have the same attributes
  • Operations to access data are typically over individual rows
  • Each table has a primary key, composed of a partition key and an optional sort key
  • Aggregates can be represented by a record/row/item
    ○ The primary key (partition key) is the aggregate identifier
    ○ The item can have a distinct attribute-value pair for each attribute of the aggregate's value

SLIDE 8

The NoAM Data Model

  • NoAM stands for NoSQL Abstract Data Model
  • System independent data model for NoSQL databases
  • Intended to support scalability, performance, and consistency

SLIDE 9

In NoSQL databases, the distribution unit is typically:

  • 1. A group of related key-value pairs, in key-value stores;
  • 2. A document, in document stores;
  • 3. A record/row/item, in extensible record stores.

In NoAM, the distribution unit is modeled as a BLOCK.

SLIDE 10

Blocks

A block represents a maximal data unit for which atomic, efficient, and scalable access operations are provided. In NoSQL databases, it is easy to manipulate one block at a time, but problems arise when an operation has to manipulate multiple blocks, as joins do.

SLIDE 11

NoSQL databases can access:

  • (i) an individual key-value pair, in key-value stores;
  • (ii) a field, in document stores;
  • (iii) a column, in extensible record stores.

In NoAM, each of these is called an ENTRY. Collections keep their name: COLLECTIONS.

SLIDE 12

We can now summarize the NoAM characteristics:

  • A database is a set of collections. Each collection has a distinct name.
  • A collection is a set of blocks. Each block in a collection is identified by a block key, which is unique within that collection.
  • A block is a non-empty set of entries. Each entry is a pair <ek, ev>, where ek is the entry key (which is unique within its block) and ev is its value.
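
The three definitions above map naturally onto nested dictionaries. The sketch below is one possible rendering in Python, not the paper's implementation; the alias names, the `add_block` helper, and the sample data are assumptions made for illustration.

```python
from typing import Any, Dict

# Assumed type aliases (illustrative names, not from the paper).
Block = Dict[str, Any]            # entry key (ek) -> entry value (ev)
Collection = Dict[str, Block]     # block key -> block
Database = Dict[str, Collection]  # collection name -> collection

def add_block(db: Database, collection: str, block_key: str, entries: Block) -> None:
    """Insert a block, enforcing the NoAM constraints listed above."""
    if not entries:
        raise ValueError("a block must be a non-empty set of entries")
    blocks = db.setdefault(collection, {})
    if block_key in blocks:
        raise KeyError(f"block key {block_key!r} already exists in {collection!r}")
    blocks[block_key] = dict(entries)

db: Database = {}
add_block(db, "Player", "mary", {"username": "mary", "firstName": "Mary"})
```

Dictionary keys give uniqueness of collection names, block keys, and entry keys for free; only the non-empty rule and duplicate block keys need explicit checks.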

SLIDE 13

Representation of aggregates in the NoAM model

SLIDE 14

Another way to represent a NoAM database

SLIDE 15

System-independent design of NoSQL databases with NoAM

  • The main goal of NoAM is to support a design methodology for NoSQL databases that is independent of any specific system
  • By abstracting common features of NoSQL systems (data access units & distribution units), we can design an intermediate, system-independent representation of data
  • This eases the design process & helps support the scalability and consistency qualities of the DB

SLIDE 16

System design following the NoAM approach uses these steps:

  • 1. Conceptual data modeling & aggregate design: identifying necessary entities and relationships & grouping related entities into aggregates
  • 2. Aggregate partitioning & high-level NoSQL database design: partitioning aggregates into smaller data elements and then mapping to the NoAM intermediate data model
  • 3. Implementation: mapping the intermediate data representation to the specific features of a target database system

SLIDE 17

Conceptual data modeling & aggregate design

  • Following domain-driven design (as described in the running example of the paper), the first step is to design a UML class diagram defining the entities, value objects, and relationships of the application
  • Next, identify the grouping of entities and values into aggregates based on data access patterns or scalability/consistency needs
  • Aggregates should be designed as units where atomicity can be guaranteed

SLIDE 18

Properties of good aggregate design

  • Each aggregate should be large enough, but as small as possible, to include all the data required by a relevant data access operation
    ○ Small aggregates reduce concurrency collisions and support performance and scalability requirements
  • Each aggregate should include all the data involved in some integrity constraints or rules
    ○ This supports strong consistency/atomicity of update operations

SLIDE 19

Data representation in NoAM

  • In the NoAM example:
    ○ Each class of aggregates is represented by a distinct collection
    ○ Each individual aggregate is represented by a block
  • This representation works because aggregates and blocks are both units of data access & distribution, at different abstraction levels
  • Thus, aggregates receive the same operational benefits (scalability, efficiency, atomicity) as blocks

SLIDE 20

In General...

  • A dataset of aggregates can be represented in NoAM databases in many different ways
  • Examples include:
    ○ Entry per Aggregate Object (EAO): each individual aggregate is represented using a single entry
    ○ Entry per Top-level Field (ETF): each aggregate is represented by multiple entries, one for each top-level field
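
The difference between the two strategies can be shown on a toy aggregate. This is a minimal sketch, assuming an aggregate is a plain Python dict and a block is a dict of entries; the function names, the sample player, and the use of an empty string as the single EAO entry key are assumptions for illustration.

```python
from typing import Any, Dict

def eao(aggregate: Dict[str, Any]) -> Dict[str, Any]:
    """Entry per Aggregate Object: one entry holds the whole complex value.

    The empty string stands in for the single entry key (an assumption).
    """
    return {"": aggregate}

def etf(aggregate: Dict[str, Any]) -> Dict[str, Any]:
    """Entry per Top-level Field: one entry per top-level field."""
    return dict(aggregate)

# Hypothetical aggregate.
player = {"username": "mary", "firstName": "Mary", "games": ["Game:2345"]}

eao_block = eao(player)   # a block with a single entry
etf_block = etf(player)   # a block with three entries
print(len(eao_block), len(etf_block))
```

With EAO, any update rewrites the whole value; with ETF, an update to one field touches only that field's entry.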

SLIDE 21

EAO vs. ETF

Figure: EAO and ETF representations

SLIDE 22

Aggregate partitioning

  • Aggregate partitioning is usually based on the following guidelines:
    ○ If an aggregate is small, or all/most of its data are accessed or modified together, it should be represented by a single entry
    ○ If an aggregate is large and there are operations that access or modify specific portions of the aggregate, it should be partitioned into multiple entries
    ○ Data elements should belong to the same entry if they are usually accessed or modified together
    ○ Data elements should belong to distinct entries if they are usually accessed or modified separately
  • The access path, or sequence of steps to reach an element, affects how data is accessed/modified in relation to one another
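
One way to apply these guidelines is to list which fields are used together and build one entry per group. The sketch below is a simplification that only partitions top-level fields (nested access paths are ignored); the `partition` function, the group names, and the sample game are hypothetical.

```python
from typing import Any, Dict, List

def partition(aggregate: Dict[str, Any], groups: Dict[str, List[str]]) -> Dict[str, Any]:
    """Split an aggregate into entries.

    `groups` maps each entry key to the top-level fields that are usually
    accessed or modified together. Every field must be in exactly one group.
    """
    assigned = [f for fields in groups.values() for f in fields]
    if sorted(assigned) != sorted(aggregate):
        raise ValueError("groups must cover every field exactly once")
    return {ek: {f: aggregate[f] for f in fields} for ek, fields in groups.items()}

# Hypothetical aggregate: player info is read together, rounds are updated separately.
game = {"id": "2345", "firstPlayer": "mary", "secondPlayer": "rick", "rounds": []}
block = partition(game, {
    "info": ["id", "firstPlayer", "secondPlayer"],
    "rounds": ["rounds"],
})
```

Passing a single group that lists every field yields the single-entry case from the first guideline.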

SLIDE 23

General implementation

  • Mapping from the intermediate representation to specific systems will differ slightly with each type of NoSQL system (Key-Value, Document, Extensible Record)
  • The NoAM intermediate model for each example is described in Figure 8 of the paper

SLIDE 24

Key-Value Store Implementation: Oracle NoSQL

  • In the Oracle NoSQL example, each entry will be represented by a key-value pair
  • The key is composed of a major key (collection name & block ID) and a minor key (coding of the access path)
  • The major key controls distribution and sharding
  • The value can be a simple value or a formatted entry (JSON)
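
The mapping described on this slide can be sketched as a function from one NoAM block to a set of key-value pairs. This is an illustration only and does not use the Oracle NoSQL client library; the function name, the sample block, and the choice to serialize every value as JSON are assumptions.

```python
import json
from typing import Any, Dict

def block_to_pairs(collection: str, block_key: str, block: Dict[str, Any]) -> Dict[str, str]:
    """Map one NoAM block to key-value pairs.

    Major key: collection name and block key.
    Minor key: the entry key, standing in for the coded access path.
    Values are serialized as JSON strings (an assumption of this sketch).
    """
    return {
        f"/{collection}/{block_key}/-/{entry_key}": json.dumps(value)
        for entry_key, value in block.items()
    }

# Hypothetical block with two entries.
pairs = block_to_pairs("Player", "mary", {"username": "mary", "games": ["Game:2345"]})
for key, value in pairs.items():
    print(key, "->", value)
```

Every pair produced for a block shares the same major key, so the entries of a block stay together when the store distributes data by major key.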

SLIDE 25

Extensible Record Store Implementation: DynamoDB

  • In the DynamoDB example, a distinct table represents each collection, and an individual item represents each block
  • The collection name becomes the table name, the block key becomes the table's primary key, and the set of entries in a block becomes the set of attribute-value pairs of the item
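
A rough sketch of this mapping follows, using plain dictionaries in place of real DynamoDB tables and items. No AWS SDK is involved; the function name, the primary key attribute name `id`, and the sample data are assumptions.

```python
from typing import Any, Dict, List

Block = Dict[str, Any]

def collections_to_tables(
    db: Dict[str, Dict[str, Block]], key_attr: str = "id"
) -> Dict[str, List[Dict[str, Any]]]:
    """Map each NoAM collection to a table and each block to an item.

    The block key is stored under `key_attr` (an assumed attribute name);
    each entry of the block becomes one attribute-value pair of the item.
    """
    tables: Dict[str, List[Dict[str, Any]]] = {}
    for collection, blocks in db.items():
        items = []
        for block_key, block in blocks.items():
            if key_attr in block:
                raise ValueError(f"entry key {key_attr!r} clashes with the primary key")
            items.append({key_attr: block_key, **block})
        tables[collection] = items
    return tables

# Hypothetical NoAM database: collection name -> block key -> entries.
db = {"Player": {"mary": {"username": "mary", "firstName": "Mary"}}}
tables = collections_to_tables(db)
```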

SLIDE 26

Document Store Implementation: MongoDB

  • In the MongoDB example, a distinct MongoDB collection represents each collection of blocks, and an individual document represents each block
  • The block collection name becomes the MongoDB collection name, the block key becomes the special _id field of the document, and each entry in a block fills a field of the document
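
The block-to-document step can be sketched with a plain dictionary printed as JSON. This does not call a MongoDB driver; the function name and sample block are hypothetical, and the block key is used directly as the `_id` value, which is an assumption of this sketch.

```python
import json
from typing import Any, Dict

def block_to_document(block_key: str, block: Dict[str, Any]) -> Dict[str, Any]:
    """Map one NoAM block to a MongoDB-style document (a plain dict here).

    The block key becomes the `_id` field; each entry becomes a field.
    """
    if "_id" in block:
        raise ValueError("'_id' is reserved for the block key")
    return {"_id": block_key, **block}

# Hypothetical block from a "Player" collection.
doc = block_to_document("mary", {"username": "mary", "games": ["Game:2345"]})
print(json.dumps(doc, indent=2))
```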

SLIDE 27

Experiment on Performance of Different DB Design

  • The paper concludes its discussion of NoAM designs by comparing the performance of two different NoAM designs (EAO vs. Rounds) on the running example
  • EAO uses a single entry for a game
  • Rounds splits each game into a group of entries, one for each round, along with other relevant fields
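
The Rounds design can be sketched as follows. The slide does not spell out how the remaining fields are stored, so this sketch assumes one entry per remaining top-level field; the entry key format `rounds[i]`, the function names, and the sample game are also assumptions.

```python
from typing import Any, Dict

def rounds_design(game: Dict[str, Any]) -> Dict[str, Any]:
    """Rounds: one entry per round, plus one entry per remaining field."""
    block = {k: v for k, v in game.items() if k != "rounds"}
    for i, rnd in enumerate(game.get("rounds", [])):
        block[f"rounds[{i}]"] = rnd
    return block

def add_round(block: Dict[str, Any], rnd: Dict[str, Any]) -> None:
    """Adding a round writes one new entry and leaves the others untouched."""
    n = sum(1 for k in block if k.startswith("rounds["))
    block[f"rounds[{n}]"] = rnd

# Hypothetical game aggregate with two rounds.
game = {"id": "2345", "firstPlayer": "mary", "secondPlayer": "rick",
        "rounds": [{"moves": ["a"]}, {"moves": ["b"]}]}
block = rounds_design(game)
add_round(block, {"moves": ["c"]})
```

Under EAO the same round addition would rewrite the single entry holding the whole game, while retrieving a full game under Rounds means reading several entries. That trade-off is what the two workloads in the experiment exercise.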

Figure: EAO vs. Rounds

SLIDE 28

Experiment Results

  • The experimenters tested three workloads (retrieval of games, round additions, and a 50-50 mix), measuring run time in milliseconds against database size (GB)
  • Results showed that each design was superior in some regard & performed differently based on workload & size

SLIDE 29

Experiment Results Takeaways

  • The results in the previous slide emphasize how much the design of a NoSQL database affects the performance & consistency of data access operations
  • This methodology provides an effective tool for choosing among different NoAM alternatives

SLIDE 30

Related Works

  • Related works recognize the demand for data modeling approaches
  • Proposed solutions only cover:

    ○ Specific problems
    ○ Limited scenarios
    ○ Specific databases
    ○ Specific systems

  • They do not take a general, system-independent perspective

SLIDE 31

Related Works

  • A data aggregate is “application data grouped in atomic units that are accessed and manipulated together.”
  • A similar notion of aggregates is found in related works

    ○ In Domain Driven Design, related objects are treated as a unit for the purpose of data changes.
    ○ Entities are “units of distribution and consistency.”
    ○ In Bigtable, entity groups are manipulated atomically.

SLIDE 32

Related Works

  • A similar notion of determining data units is found in related works

    ○ Vertical partitioning & clustering
    ○ Relational storage of XML documents

SLIDE 33

Related Works

  • High-level representation of data makes it possible to use different systems and technologies.
  • Save Our Systems (SOS) is a uniform programming interface for NoSQL systems that allows for simple CRUD operations.
  • The issue of “tools for data access is complementary to data models and design issues.”

SLIDE 34

Conclusion

  • The paper proposes:

    ○ Viewing data from the perspective of aggregates
    ○ An intermediate data model that is system independent
    ○ An implementation that considers specific features of specific NoSQL databases

SLIDE 35

Citation

  • Paolo Atzeni, Francesca Bugiotti, Luca Cabibbo, Riccardo Torlone. Data Modeling in the NoSQL World. Computer Standards and Interfaces, Elsevier, 2020, 67, pp.103149. doi:10.1016/j.csi.2016.10.003. hal-01611628
