NoSQL and MongoDB

SLIDE 1

NoSQL and MongoDB

SLIDE 2

SLIDE 3

Introduction to NoSQL

Based on a presentation by Traversy Media

SLIDE 4

What is NoSQL?

- NoSQL stands for "Not only SQL"
- SQL means:
  - Relational model
  - Strong typing
  - ACID compliance
  - Normalization
  - …
- NoSQL means more freedom or flexibility

SLIDE 5

Relevance to Big Data

- Data keeps getting bigger
- A traditional RDBMS cannot scale well
- An RDBMS is tied to its data model and its query processing model
- NoSQL relaxes some of the restrictions of an RDBMS to provide better performance

SLIDE 6

Advantages of NoSQL

- Handles Big Data
- Data models: no predefined schema
- Data structure: NoSQL handles semi-structured data
- Cheaper to manage
- Scaling: scale out (horizontal scaling)

SLIDE 7

Advantages of RDBMS

- Better for relational data
- Data normalization
- Well-established query language (SQL)
- Data integrity
- ACID compliance

SLIDE 8

Types of NoSQL Databases

- Document databases [MongoDB, CouchDB]
- Column databases [Apache Cassandra]
- Key-value stores [Redis, Couchbase Server]
- Cache systems [Redis, Memcached]
- Graph databases [Neo4j]
- Streaming systems [Flink, Storm]

SLIDE 9

Structured/Semi-structured

Structured (relational table):

ID  Name  Email             …
1   Jack  jack@example.com
2   Jill  jill@example.net
3   Alex  alex@example.org

Semi-structured (documents):

Document 1
{ "id": 1, "name": "Jack", "email": "jack@example.com",
  "address": { "street": "900 university ave", "city": "Riverside", "state": "CA" },
  "friend_ids": [3, 55, 123] }

Document 2
{ "id": 2, "name": "Jill", "email": "jill@example.net",
  "hobbies": ["hiking", "cooking"] }
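As a rough illustration of what "semi-structured" means here, the sketch below keeps the two documents from this slide in a plain JavaScript array (no MongoDB involved) and works out which fields every document has and which appear only in some. The helper name `fieldNames` is made up for this example.

```javascript
// The two documents from the slide, as plain JavaScript objects.
const documents = [
  { id: 1, name: "Jack", email: "jack@example.com",
    address: { street: "900 university ave", city: "Riverside", state: "CA" },
    friend_ids: [3, 55, 123] },
  { id: 2, name: "Jill", email: "jill@example.net",
    hobbies: ["hiking", "cooking"] },
];

// Hypothetical helper: the top-level field names of one document.
function fieldNames(doc) {
  return Object.keys(doc);
}

// Every field name seen in any document, in first-seen order.
const allFields = new Set(documents.flatMap(fieldNames));

// Fields present in every document vs. fields present in only some.
const sharedFields = [...allFields].filter((f) =>
  documents.every((doc) => f in doc)
);
const optionalFields = [...allFields].filter((f) => !sharedFields.includes(f));

console.log("shared:", sharedFields);
console.log("optional:", optionalFields);
```

A relational table would need a column (mostly empty) for each of the optional fields; here each document simply carries the fields it has.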

SLIDE 10

Columnar Data Store

Row-oriented table:

ID  Name  Email             …
1   Jack  jack@example.com
2   Jill  jill@example.net
3   Alex  alex@example.org

Column-oriented layout:

ID     1     2     3
Name   Jack  Jill  Alex
Email  …     …     …
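The difference between the two layouts can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration, not how any particular column store lays out data on disk, and the helper name `toColumns` is an assumption of this example.

```javascript
// Row-oriented: one object per record.
const rows = [
  { id: 1, name: "Jack", email: "jack@example.com" },
  { id: 2, name: "Jill", email: "jill@example.net" },
  { id: 3, name: "Alex", email: "alex@example.org" },
];

// Hypothetical helper: pivot the rows into one array per column.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!(field in columns)) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// Reading a single attribute for all records touches only one array.
console.log(columns.name);
```

The point of the columnar layout is that a query reading one attribute (for example, all names) does not have to touch the other attributes.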

SLIDE 11

Key-value Stores

Row-oriented table:

ID  Name  Email             …
1   Jack  jack@example.com
2   Jill  jill@example.net
3   Alex  alex@example.org

Key-value layout:

1 → Jack jack@example.com …
2 → Jill jill@example.net …
3 → Alex alex@example.org …
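A minimal sketch of the key-value idea, using a JavaScript `Map` as a stand-in for a real store such as Redis. Serializing the value as a JSON string is an assumption made here to stress that the store does not look inside the value.

```javascript
// A Map stands in for the key-value store; values are opaque strings.
const store = new Map();
store.set(1, JSON.stringify({ name: "Jack", email: "jack@example.com" }));
store.set(2, JSON.stringify({ name: "Jill", email: "jill@example.net" }));
store.set(3, JSON.stringify({ name: "Alex", email: "alex@example.org" }));

// Lookup by key is the one operation the store is built for.
const record = JSON.parse(store.get(2));
console.log(record.name);

// A missing key simply yields nothing.
const missing = store.get(99);
console.log(missing);
```

Finding a record by anything other than its key (say, by email) would require scanning and decoding every value, which is what the other data models try to avoid.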

SLIDE 12

Document Database

MongoDB

SLIDE 13

Document Data Model

Relational model (RDBMS)
- Database
  - Relation (Table): Schema
    - Record (Tuple): Data

Document model
- Database
  - Collection: No predefined schema
    - Document: Schema + data

- No need to define or update a schema
- No need to create collections

Document 1
{ "id": 1, "name": "Jack", "email": "jack@example.com",
  "address": { "street": "900 university ave", "city": "Riverside", "state": "CA" },
  "friend_ids": [3, 55, 123] }

SLIDE 14

Document Format

- MongoDB natively works with JSON documents
- For efficiency, documents are stored in a binary format called BSON (i.e., binary JSON)
- As in JSON, both schema and data are stored in each document

SLIDE 15

How to Use MongoDB

Install: check the MongoDB website
https://docs.mongodb.com/manual/installation/

Create a collection and insert a document
  db.users.insert({name: "Jack", email: "jack@example.com"});

Retrieve all or some documents
  db.users.find();
  db.users.find({name: "Jack"});

Update
  db.users.update({name: "Jack"}, {$set: {hobby: "cooking"}});
  Also: updateOne, updateMany, replaceOne

Delete
  db.users.remove({name: "Alex"});
  Also: deleteOne, deleteMany

Note: insert, update, and remove are deprecated in the current shell (mongosh); prefer insertOne/insertMany, updateOne/updateMany, and deleteOne/deleteMany.

https://docs.mongodb.com/manual/crud/
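To make the semantics of the four operations concrete without a running server, here is a toy in-memory stand-in. `ToyCollection` is a made-up class for this example, not the MongoDB shell or driver; it borrows the method names `insertOne`, `find`, `updateOne`, and `deleteOne`, and supports only equality filters and the `$set` operator.

```javascript
// Toy in-memory stand-in for a collection; NOT the MongoDB shell or driver.
class ToyCollection {
  constructor() {
    this.docs = [];
  }

  // Equality-only filter: every field named in the filter must match exactly.
  static matches(doc, filter) {
    return Object.entries(filter).every(([key, value]) => doc[key] === value);
  }

  insertOne(doc) {
    this.docs.push({ ...doc });
  }

  // An empty filter matches every document.
  find(filter = {}) {
    return this.docs.filter((doc) => ToyCollection.matches(doc, filter));
  }

  // Only the $set operator is supported in this sketch.
  updateOne(filter, update) {
    const doc = this.docs.find((d) => ToyCollection.matches(d, filter));
    if (doc) Object.assign(doc, update.$set);
    return { matchedCount: doc ? 1 : 0 };
  }

  deleteOne(filter) {
    const index = this.docs.findIndex((d) => ToyCollection.matches(d, filter));
    if (index >= 0) this.docs.splice(index, 1);
    return { deletedCount: index >= 0 ? 1 : 0 };
  }
}

const users = new ToyCollection();
users.insertOne({ name: "Jack", email: "jack@example.com" });
users.insertOne({ name: "Alex", email: "alex@example.org" });
users.updateOne({ name: "Jack" }, { $set: { hobby: "cooking" } });
users.deleteOne({ name: "Alex" });

console.log(users.find());
```

Against a real deployment the same four calls would be issued on `db.users` in the shell; the toy class only mirrors their basic behavior.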

SLIDE 16

Schema Validation

You can still explicitly create collections and enforce schema validation

db.createCollection("students", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "year", "major", "address" ],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        …
      }
    }
  }
})

https://docs.mongodb.com/manual/core/schema-validation/
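What such a validator checks can be sketched with a small stand-alone function. This is a simplified imitation written for this example, not MongoDB's `$jsonSchema` implementation: it handles only the `required` list and a `"string"` type rule, covers only the `name` property shown on the slide, and the sample student values are invented.

```javascript
// Simplified schema: only the parts of the slide's validator that are shown.
const schema = {
  required: ["name", "year", "major", "address"],
  properties: { name: { bsonType: "string" } },
};

// Hypothetical helper: returns a list of problems (empty means valid).
function validate(doc, rules) {
  const errors = [];
  for (const field of rules.required) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(rules.properties)) {
    const isPresent = field in doc;
    if (isPresent && rule.bsonType === "string" && typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  return errors;
}

// Invented sample documents.
const ok = validate(
  { name: "Jack", year: 2, major: "CS", address: "Riverside" },
  schema
);
const bad = validate({ name: 42, year: 2 }, schema);

console.log(ok);
console.log(bad);
```

With validation enabled on a real collection, a document like the second one would be rejected on insert rather than returned with a list of errors.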

SLIDE 17

Storage Layer

- Prior to MongoDB 3.2, only B-tree was available in the storage layer
- To increase its scalability, MongoDB added LSM tree in later versions, after it acquired WiredTiger

Override the default configuration:

  mongod --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib"

SLIDE 18

LSM vs. B-tree

https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM

SLIDE 19

Indexing

- Like an RDBMS, document databases use indexes to speed up some queries
- MongoDB uses B-tree as an index structure

https://docs.mongodb.com/manual/indexes/

SLIDE 20

Index Types

- Default unique _id index
- Single field index
    db.collection.createIndex({name: -1});
- Compound index (multiple fields)
    db.collection.createIndex({name: 1, score: -1});
- Multikey indexes (for array fields)
  - Creates an index entry for each value
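The "one index entry per array value" idea behind multikey indexes can be sketched as follows. A JavaScript `Map` stands in for the real B-tree, the helper name `buildMultikeyIndex` is made up, and Jill's `friend_ids` value is invented for the example.

```javascript
// Sample documents; Jill's friend_ids is invented for this illustration.
const docs = [
  { _id: 1, name: "Jack", friend_ids: [3, 55, 123] },
  { _id: 2, name: "Jill", friend_ids: [3] },
];

// Hypothetical helper: map each indexed value to the _ids that contain it.
function buildMultikeyIndex(documents, field) {
  const index = new Map();
  for (const doc of documents) {
    // An array field contributes one entry per element.
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const index = buildMultikeyIndex(docs, "friend_ids");

// Which documents list user 3 as a friend?
console.log(index.get(3));
```

Because every element gets its own entry, a query such as "who has 3 in friend_ids" becomes a single index lookup instead of a scan over every array.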

https://docs.mongodb.com/manual/indexes/

SLIDE 21

Index Types

- Geospatial index (for geospatial points)
  - Uses geohash to convert two dimensions to one dimension
  - 2d indexes: for Euclidean spaces
  - 2dsphere indexes: for spherical (earth) geometry
  - Works with multikey indexes for multiple locations (e.g., pickup and drop-off locations for taxis)
- Text indexes (for string fields)
  - Automatically removes stop words
  - Stems the words to store the root only
- Hashed indexes (for point lookups)

SLIDE 22

Geohashes
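A geohash turns a (latitude, longitude) pair into a single string by repeatedly halving the coordinate ranges and interleaving the resulting bits. The sketch below follows the widely used public geohash scheme with its base-32 alphabet; it is meant to show the idea and is not MongoDB's internal code. The function name `geohashEncode` is an assumption of this example.

```javascript
// Base-32 alphabet used by the public geohash scheme (no a, i, l, o).
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

function geohashEncode(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = "";
  let bits = 0;      // number of bits collected for the current character
  let value = 0;     // the bits collected so far, as an integer
  let useLon = true; // bits alternate, starting with longitude

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const coord = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (coord >= mid) {
      value = value * 2 + 1; // upper half: emit a 1 bit
      range[0] = mid;
    } else {
      value = value * 2;     // lower half: emit a 0 bit
      range[1] = mid;
    }
    useLon = !useLon;
    bits += 1;
    if (bits === 5) {        // every 5 bits become one base-32 character
      hash += BASE32[value];
      bits = 0;
      value = 0;
    }
  }
  return hash;
}

console.log(geohashEncode(57.64911, 10.40744, 6)); // prints "u4pruy"
```

Because each extra character only refines the same cell, a shorter hash of the same point is always a prefix of the longer one, which is what lets a one-dimensional index answer many two-dimensional proximity queries.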

SLIDE 23

Additional Index Features

- Unique indexes: reject duplicate keys
- Sparse indexes: skip documents that lack the indexed field
  - In contrast, non-sparse indexes assume a null value if the indexed field does not exist
- Partial indexes: index only a subset of documents, selected by a filter

    db.restaurants.createIndex(
      { cuisine: 1, name: 1 },
      { partialFilterExpression: { rating: { $gt: 5 } } }
    )
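Which documents end up in each kind of index can be sketched with plain array filters. The three restaurant documents are invented for this example, and the code only mimics the selection rule; it does not build a real index.

```javascript
// Invented sample data; the third document has no cuisine field.
const restaurants = [
  { name: "A", cuisine: "thai", rating: 8 },
  { name: "B", cuisine: "thai", rating: 3 },
  { name: "C", rating: 9 },
];

// Sparse index on cuisine: documents without the field are skipped.
const sparseEntries = restaurants.filter((doc) => "cuisine" in doc);

// Non-sparse index on cuisine: a missing field is indexed as null.
const nonSparseKeys = restaurants.map((doc) =>
  "cuisine" in doc ? doc.cuisine : null
);

// Partial index with the filter { rating: { $gt: 5 } }.
const partialEntries = restaurants.filter((doc) => doc.rating > 5);

console.log(sparseEntries.map((doc) => doc.name));
console.log(nonSparseKeys);
console.log(partialEntries.map((doc) => doc.name));
```

A sparse index selects on whether the field exists, while a partial index selects on an arbitrary filter, so the two can cover different subsets of the same collection.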

SLIDE 24

Comparison of data types

Comparison order of types, from lowest to highest:

1. Min key (internal type)
2. Null
3. Numbers (32-bit integer, 64-bit integer, double)
4. Symbol, String
5. Object
6. Array
7. Binary data
8. Object ID
9. Boolean
10. Date, timestamp
11. Regular expression
12. Max key (internal type)

https://docs.mongodb.com/v3.6/reference/bson-type-comparison-order/

SLIDE 25

Comparison of data types

- Numbers: all converted to a common type
- Strings
  - Alphabetically (default)
  - Collation (i.e., locale and language)
- Arrays
  - <: smallest value of the array
  - >: largest value of the array
  - Empty arrays are treated as null
- Objects
  - Compare fields in the order of appearance
  - Compare <name, value> for each field
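Comparing by type first and by value second can be sketched as a sort comparator. This is a heavily simplified illustration, not MongoDB's comparison code: it covers only null, numbers, strings, and booleans, the rank numbers are arbitrary as long as they follow the type order from the "Comparison of data types" slide, and the function names are made up.

```javascript
// Rank of each supported type; only the relative order matters.
function typeRank(value) {
  if (value === null || value === undefined) return 1; // null
  if (typeof value === "number") return 2;              // numbers
  if (typeof value === "string") return 3;              // strings
  if (Array.isArray(value)) return 5;                   // arrays
  if (typeof value === "object") return 4;              // objects
  if (typeof value === "boolean") return 6;             // booleans
  throw new Error("type not covered by this sketch");
}

function compareValues(a, b) {
  const rankA = typeRank(a);
  const rankB = typeRank(b);
  if (rankA !== rankB) return rankA - rankB; // different types: type decides
  if (rankA === 1) return 0;                 // null equals null
  if (rankA === 2) return a - b;             // numbers share one numeric scale
  if (rankA === 3) return a < b ? -1 : a > b ? 1 : 0;
  if (rankA === 6) return Number(a) - Number(b); // false sorts before true
  throw new Error("object and array comparison omitted from this sketch");
}

const values = ["b", true, 10, null, "a", 2.5, false];
values.sort(compareValues);

console.log(values);
```

This is why a query over a field with mixed types still produces a well-defined order: values of different types never need to be converted to each other, only ranked.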

SLIDE 26

Distributed Processing

Two methods for distributed processing:

- Replication (similar to MySQL)
  https://docs.mongodb.com/manual/replication/
- Sharding (true horizontal scaling)
  https://docs.mongodb.com/manual/sharding/

SLIDE 27

Distributed Index Structure

Log-structured Merge Tree (LSM)

SLIDE 28

Big Data Indexing

- Hadoop and Spark are good at scanning large files
- We would like to speed up point and range queries on big data
- HDFS limitation: random updates are not allowed
- The log-structured merge tree (LSM tree) is adopted to address this problem

SLIDE 29

RDBMS Indexing

[Figure labels: New record, Index, Log]

SLIDE 30

Index Update

[Figure labels: New record; Randomly updated disk page(s); Append a disk page]

SLIDE 31

LSM Tree

- Key idea: use the log as the index
- Regularly: merge the logs to consolidate the index (i.e., remove redundant entries)

[Figure labels: New records, Flush, Log (five logs), Merge, Bigger log]

O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.

SLIDE 32

LSM in Big Data

First major application: BigTable (Google)

[Figure: citations per year, 1997 to 2018; y-axis "Citations", ticks 20 to 120. Annotations: "First report from Google mentioning LSM" and "BigTable paper"]

SLIDE 33

LSM in Big Data

- Buffer data in memory (memory component)
- Flush records to disk into an LSM as a disk component (sequential write)
- Disk components are sorted by key
- Compact (merge) disk components in the background (sequential read/write)
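The four steps above can be sketched as a toy in-memory structure. `ToyLsm` is a made-up class for this example: arrays stand in for on-disk components, lookups use a linear scan where a real system would use binary search or a per-component index, and compaction rebuilds everything at once instead of streaming a merge of sorted runs in the background.

```javascript
// Sort helper for [key, value] pairs.
const byKey = ([a], [b]) => (a < b ? -1 : a > b ? 1 : 0);

class ToyLsm {
  constructor(memoryLimit) {
    this.memoryLimit = memoryLimit;
    this.memory = new Map();   // memory component
    this.diskComponents = [];  // newest first; each is sorted by key
  }

  // Writes go to memory only; a full buffer is flushed as one sorted run.
  put(key, value) {
    this.memory.set(key, value);
    if (this.memory.size >= this.memoryLimit) this.flush();
  }

  flush() {
    const sorted = [...this.memory.entries()].sort(byKey);
    this.diskComponents.unshift(sorted); // append-only: never edits old runs
    this.memory = new Map();
  }

  // Reads check memory first, then components from newest to oldest.
  get(key) {
    if (this.memory.has(key)) return this.memory.get(key);
    for (const component of this.diskComponents) {
      const hit = component.find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }

  // Merge all components into one, keeping only the newest value per key.
  compact() {
    const merged = new Map();
    for (const component of [...this.diskComponents].reverse()) {
      for (const [k, v] of component) merged.set(k, v); // newer overwrites older
    }
    this.diskComponents = [[...merged.entries()].sort(byKey)];
  }
}

const lsm = new ToyLsm(2);
lsm.put("b", 1);
lsm.put("a", 2); // buffer full: flushed as [["a", 2], ["b", 1]]
lsm.put("a", 3);
lsm.put("c", 4); // buffer full: flushed as [["a", 3], ["c", 4]]
lsm.compact();   // the stale entry for "a" is dropped

console.log(lsm.diskComponents);
console.log(lsm.get("a"));
```

Every write path here is sequential (append a new run, or rewrite runs into a bigger one), which is what makes the approach compatible with storage that does not allow random updates.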

SLIDE 34

Conclusion

- MongoDB is a document database geared towards high update rates and transactional queries
- It adopts JSON as its data model
- It provides the flexibility to insert any kind of data without a schema definition
- LSM tree is used for indexing
- Weak types are handled using a special comparison method for all types