NoSQL and MongoDB

Introduction to NoSQL
  Based on a presentation by Traversy Media

What is NoSQL?
  Not only SQL
  SQL means
    Relational model
    Strong typing
    ACID compliance
    Normalization
    …
  NoSQL means more freedom or flexibility

Relevance to Big Data
  Data gets bigger
  Traditional RDBMS cannot scale well
  RDBMS is tied to its data and query processing models
  NoSQL relaxes some of the restrictions of RDBMS to provide better performance

Advantages of NoSQL
  Handles Big Data
  Data Models: No predefined schema
  Data Structure: NoSQL handles semi-structured data
  Cheaper to manage
  Scaling: Scale out / horizontal scaling

Advantages of RDBMS
  Better for relational data
  Data normalization
  Well-established query language (SQL)
  Data Integrity
  ACID Compliance

Types of NoSQL Databases
  Document Databases [MongoDB, CouchDB]
  Column Databases [Apache Cassandra]
  Key-Value Stores [Redis, Couchbase Server]
  Cache Systems [Redis, Memcached]
  Graph Databases [Neo4J]
  Streaming Systems [Apache Flink, Apache Storm]

Structured/Semi-structured
  Structured (table):
    ID  Name  Email             …
    1   Jack  jack@example.com
    2   Jill  jill@example.net
    3   Alex  alex@example.org
  Semi-structured (documents):
    Document 1: { "id": 1, "name": "Jack", "email": "jack@example.com", "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"}, "friend_ids": [3, 55, 123] }
    Document 2: { "id": 2, "name": "Jill", "email": "jill@example.net", "hobbies": ["hiking", "cooking"] }

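Columnar Data Store
  Row layout:
    ID  Name  Email             …
    1   Jack  jack@example.com
    2   Jill  jill@example.net
    3   Alex  alex@example.org
  Column layout (each column stored separately):
    ID:    1, 2, 3
    Name:  Jack, Jill, Alex
    Email: jack@example.com, jill@example.net, alex@example.org
    …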
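The row-to-column layout change can be sketched in plain JavaScript. This is only an in-memory illustration, not how any particular column store is implemented; toColumns is a hypothetical helper name, and the sketch assumes every record has the same fields.

```javascript
// Rows as a row store would keep them (values taken from the slide's table).
const rows = [
  { id: 1, name: "Jack", email: "jack@example.com" },
  { id: 2, name: "Jill", email: "jill@example.net" },
  { id: 3, name: "Alex", email: "alex@example.org" },
];

// Hypothetical helper: pivots an array of records into one array per field.
// Assumes all records share the same fields, so positions stay aligned.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!(field in columns)) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// Reading a single attribute now touches only that column's array.
console.log(columns.email);
```

A query that needs only the Email attribute reads one array here, whereas the row layout would have to walk every full record.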

Key-value Stores
  Table:
    ID  Name  Email             …
    1   Jack  jack@example.com
    2   Jill  jill@example.net
    3   Alex  alex@example.org
  Key → value:
    1 → Jack jack@example.com …
    2 → Jill jill@example.net …
    3 → Alex alex@example.org …

Document Database
  MongoDB

Document Data Model
  Relational model (RDBMS)
    Database
    Relation (Table) : Schema
    Record (Tuple) : Data
  Document Model
    Database
    Collection : No predefined schema
    Document : Schema+data
  Document 1: { "id": 1, "name": "Jack", "email": "jack@example.com", "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"}, "friend_ids": [3, 55, 123] }
  No need to define/update schema
  No need to create collections

Document Format
  MongoDB natively works with JSON documents
  For efficiency, documents are stored in a binary format called BSON (i.e., binary JSON)
  Like JSON, both schema and data are stored in each document

How to Use MongoDB
  Install: Check the MongoDB website https://docs.mongodb.com/manual/installation/
  Create collection and insert a document
    db.users.insert({name: "Jack", email: "jack@example.com"});
  Retrieve all/some documents
    db.users.find();
    db.users.find({name: "Jack"});
  Update
    db.users.update({name: "Jack"}, {$set: {hobby: "cooking"}});
    Also: updateOne, updateMany, replaceOne
  Delete
    db.users.remove({name: "Alex"});
    Also: deleteOne, deleteMany
  Reference: https://docs.mongodb.com/manual/crud/
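To make the semantics of these operations concrete without a running server, here is a minimal in-memory sketch in plain JavaScript. It is not the MongoDB driver or shell: the functions insertOne, find, updateOne and deleteOne below are hypothetical stand-ins that only borrow the names, and the filter supports equality on top-level fields only.

```javascript
// In-memory stand-in for a collection (an assumption for illustration only).
const users = [];

// A document matches when every field in the filter equals the document's field.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

function insertOne(collection, doc) {
  collection.push({ ...doc }); // store a shallow copy
}

function find(collection, filter = {}) {
  return collection.filter((doc) => matches(doc, filter));
}

// Applies the $set part of the update to the first matching document.
function updateOne(collection, filter, update) {
  const doc = collection.find((d) => matches(d, filter));
  if (doc) Object.assign(doc, update.$set);
  return doc ? 1 : 0; // number of modified documents
}

// Removes the first matching document.
function deleteOne(collection, filter) {
  const index = collection.findIndex((d) => matches(d, filter));
  if (index >= 0) collection.splice(index, 1);
  return index >= 0 ? 1 : 0; // number of deleted documents
}

insertOne(users, { name: "Jack", email: "jack@example.com" });
insertOne(users, { name: "Alex", email: "alex@example.org" });
updateOne(users, { name: "Jack" }, { $set: { hobby: "cooking" } });
deleteOne(users, { name: "Alex" });
console.log(find(users, { name: "Jack" }));
```

Note that the update adds a hobby field that no other document has; nothing had to be declared beforehand, which is the "no predefined schema" point in practice.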

Schema Validation
  You can still explicitly create collections and enforce schema validation
    db.createCollection("students", { validator: { $jsonSchema: { bsonType: "object", required: [ "name", "year", "major", "address" ], properties: { name: { bsonType: "string", description: "must be a string and is required" }, … } } } })
  Reference: https://docs.mongodb.com/manual/core/schema-validation/
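What such a validator checks can be sketched in plain JavaScript. This is a deliberately reduced illustration, not MongoDB's $jsonSchema implementation: it handles only the required list and a "string" bsonType, and the names studentSchema and validate are assumptions.

```javascript
// Reduced schema mirroring the slide: required fields plus one typed property.
const studentSchema = {
  required: ["name", "year", "major", "address"],
  properties: { name: { bsonType: "string" } },
};

// Hypothetical validator: returns a list of error messages (empty means valid).
function validate(doc, schema) {
  const errors = [];
  for (const field of schema.required) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties)) {
    // Only the "string" type is checked in this sketch.
    if (field in doc && rule.bsonType === "string" && typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  return errors;
}

const accepted = validate(
  { name: "Jack", year: 2, major: "CS", address: { city: "Riverside" } },
  studentSchema
);
const rejected = validate({ name: 42, year: 2 }, studentSchema);
console.log(accepted, rejected);
```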

Storage Layer
  Before WiredTiger became the default storage engine in MongoDB 3.2, only B-tree was available in the default storage layer
  To increase its scalability, MongoDB added an LSM tree option after it acquired WiredTiger
  Override default configuration
    mongod --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib"

LSM vs. B-tree
  https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM

Indexing
  Like RDBMS, document databases use indexes to speed up some queries
  MongoDB uses B-tree as an index structure
  Reference: https://docs.mongodb.com/manual/indexes/

Index Types
  Default unique _id index
  Single field index
    db.collection.createIndex({name: -1});
  Compound index (multiple fields)
    db.collection.createIndex({name: 1, score: -1});
  Multikey indexes (for array fields)
    Creates an index entry for each value
  Reference: https://docs.mongodb.com/manual/indexes/
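The "one index entry per array value" behaviour of a multikey index can be sketched as follows. This is an in-memory illustration in plain JavaScript, not MongoDB's index code; indexEntries is a hypothetical helper, document 3 is invented sample data, and the sketch assumes every document has the indexed field.

```javascript
// Sample documents; the hobbies of document 2 come from the earlier slide,
// document 3 is made up for illustration.
const people = [
  { id: 2, name: "Jill", hobbies: ["hiking", "cooking"] },
  { id: 3, name: "Alex", hobbies: ["cooking"] },
];

// Hypothetical helper: emits one (key, id) entry per value of the field.
// An array field contributes one entry per element; a scalar contributes one.
function indexEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, id: doc.id });
  }
  // Keep entries ordered by key (then id), as a sorted index would.
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id - b.id
  );
}

const hobbyIndex = indexEntries(people, "hobbies");
console.log(hobbyIndex);
```

Looking up "cooking" in the sorted entries finds both documents, even though the value sits at different array positions in each.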

Index Types (continued)
  Geospatial index (for geospatial points)
    Uses geohash to convert two dimensions to one dimension
    2d indexes: For Euclidean spaces
    2dsphere: spherical (earth) geometry
    Works with multikey indexes for multiple locations (e.g., pickup and dropoff locations for taxis)
  Text Indexes (for string fields)
    Automatically removes stop words
    Stems the words to store the root only
  Hashed Indexes (for point lookups)

Geohashes
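A geohash maps a two-dimensional point to a one-dimensional key by repeatedly halving the longitude and latitude ranges and interleaving the resulting bits. The sketch below follows the public Geohash scheme with its base-32 alphabet; it is an assumption that this matches the idea on the slide, and MongoDB's internal encoding may differ. The function name geohash and its precision parameter are illustrative.

```javascript
// Base-32 alphabet of the public Geohash scheme (no a, i, l, o).
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encodes a latitude/longitude pair into a geohash string of `precision` characters.
function geohash(lat, lon, precision = 9) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = "";
  let bits = 0;      // bits collected for the current character
  let bitCount = 0;  // how many bits are in `bits`
  let useLon = true; // bits alternate, starting with longitude

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = (bits << 1) | 1; // upper half: emit 1 and narrow the range upward
      range[0] = mid;
    } else {
      bits = bits << 1;       // lower half: emit 0 and narrow the range downward
      range[1] = mid;
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) {     // every 5 bits become one base-32 character
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

console.log(geohash(57.64911, 10.40744, 3));
```

Because each extra character only subdivides the current cell, nearby points tend to share a prefix, which lets a one-dimensional ordered index serve as a two-dimensional one.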

Additional Index Features
  Unique indexes: Rejects duplicate keys
  Sparse Indexes: Skips documents without the index field
    In contrast, non-sparse indexes assume a null value if the index field does not exist
  Partial indexes: Indexes only a subset of records based on a filter.
    db.restaurants.createIndex({ cuisine: 1, name: 1 }, { partialFilterExpression: { rating: { $gt: 5 } } })
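The difference between non-sparse, sparse and partial indexes comes down to which documents get an entry. A minimal in-memory sketch, assuming invented sample data and a hypothetical buildIndex helper (the partial filter is a plain predicate function here, not a MongoDB filter expression):

```javascript
// Made-up sample data; document 3 has no cuisine field.
const restaurants = [
  { id: 1, name: "A", cuisine: "thai", rating: 8 },
  { id: 2, name: "B", cuisine: "thai", rating: 3 },
  { id: 3, name: "C", rating: 9 },
];

// Hypothetical helper: returns the index entries that would be created.
function buildIndex(docs, field, { sparse = false, partialFilter = null } = {}) {
  const entries = [];
  for (const doc of docs) {
    // Partial index: skip documents that fail the filter.
    if (partialFilter && !partialFilter(doc)) continue;
    if (!(field in doc)) {
      // Sparse index: skip documents without the field.
      if (sparse) continue;
      // Non-sparse index: index the missing field as null.
      entries.push({ key: null, id: doc.id });
    } else {
      entries.push({ key: doc[field], id: doc.id });
    }
  }
  return entries;
}

const regular = buildIndex(restaurants, "cuisine");
const sparse = buildIndex(restaurants, "cuisine", { sparse: true });
const partial = buildIndex(restaurants, "cuisine", {
  partialFilter: (doc) => doc.rating > 5,
});
console.log(regular.length, sparse.length, partial.length);
```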

Comparison of data types
  Order from lowest to highest:
    Min key (internal type)
    Null
    Numbers (32-bit integer, 64-bit integer, double)
    Symbol, String
    Object
    Array
    Binary data
    Object ID
    Boolean
    Date, timestamp
    Regular expression
    Max key (internal type)
  Reference: https://docs.mongodb.com/v3.6/reference/bson-type-comparison-order/

Comparison of data types (continued)
  Numbers: All converted to a common type
  Strings
    Alphabetically (default)
    Collation (i.e., locale and language)
  Arrays
    <: Smallest value of the array
    >: Largest value of the array
    Empty arrays are treated as null
  Object
    Compare fields in the order of appearance
    Compare <name,value> for each field
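The cross-type ordering can be sketched as a comparator that first ranks values by type and only then compares within a type. This is a simplified illustration in plain JavaScript, not MongoDB's comparison code: it covers only the types JavaScript can distinguish directly, the rank numbers are arbitrary (only their relative order follows the list above), and arrays, objects and regular expressions are ranked but not compared internally.

```javascript
// Rank of a value's type; relative order follows the list above.
function typeRank(value) {
  if (value === null || value === undefined) return 1; // Null
  if (typeof value === "number") return 2;             // Numbers
  if (typeof value === "string") return 3;             // String
  if (typeof value === "boolean") return 6;            // Boolean
  if (Array.isArray(value)) return 5;                  // Array
  if (value instanceof Date) return 7;                 // Date
  if (value instanceof RegExp) return 8;               // Regular expression
  return 4;                                            // Object
}

// Compares by type rank first, then by value within the same type.
function compareValues(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff;
  if (typeof a === "number") return a - b;
  if (typeof a === "string") return a < b ? -1 : a > b ? 1 : 0;
  if (typeof a === "boolean") return Number(a) - Number(b);
  if (a instanceof Date) return a.getTime() - b.getTime();
  return 0; // nulls, objects, arrays, regexes: not compared further in this sketch
}

const mixed = ["b", 3, null, true, "a", 1.5];
const sorted = [...mixed].sort(compareValues);
console.log(sorted);
```

Sorting a field that holds mixed types therefore never fails; values of different types simply fall into separate bands of the ordering.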

Distributed Processing
  Two methods for distributed processing
    Replication (Similar to MySQL)
    Sharding (True horizontal scaling)
  Replication: https://docs.mongodb.com/manual/replication/
  Sharding: https://docs.mongodb.com/manual/sharding/

Distributed Index Structure
  Log-structured Merge Tree (LSM)

Big Data Indexing
  Hadoop and Spark are good at scanning large files
  We would like to speed up point and range queries on big data
  HDFS limitation: Random updates are not allowed
  Log-structured Merge Tree (LSM-Tree) is adopted to address this problem.

RDBMS Indexing
  [Diagram labels: New record; Index; Log]

Index Update
  [Diagram labels: New record; Randomly updated disk page(s); Append a disk page]

LSM Tree
  Key idea: Use the log as the index
  Regularly: Merge the logs to consolidate the index (i.e., remove redundant entries)
  [Diagram labels: New records; Flush; Log; Merge; Bigger log]
  Reference: O'Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.

LSM in Big Data
  First major application: BigTable (Google)
  [Chart: citations per year, 1997 to 2019, annotated with "BigTable paper" and "First report from Google mentioning LSM"]

LSM in Big Data (continued)
  Buffer data in memory (memory component)
  Flush records to disk into an LSM as a disk component (sequential write)
  Disk components are sorted by key
  Compact (merge) disk components in the background (sequential read/write)
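The buffer, flush and compact steps can be sketched as a toy in-memory LSM tree. This is an illustration under simplifying assumptions, not WiredTiger's or BigTable's implementation: "disk components" are plain sorted arrays, lookups scan linearly where a real system would binary search, deletes (tombstones) are omitted, and the class name LsmTree and its memtableLimit parameter are hypothetical.

```javascript
// Orders [key, value] pairs by key.
const byKey = ([a], [b]) => (a < b ? -1 : a > b ? 1 : 0);

class LsmTree {
  constructor(memtableLimit = 2) {
    this.memtable = new Map();  // memory component (unsorted buffer)
    this.memtableLimit = memtableLimit;
    this.diskComponents = [];   // newest first; each is sorted by key
  }

  // Writes go to memory only; a full buffer is flushed as one sorted run.
  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Flush: sort the buffer once and append it as a new disk component.
  flush() {
    if (this.memtable.size === 0) return;
    this.diskComponents.unshift([...this.memtable.entries()].sort(byKey));
    this.memtable = new Map();
  }

  // Reads check memory first, then components from newest to oldest.
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const component of this.diskComponents) {
      const hit = component.find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }

  // Compaction: merge all components, keeping only the newest value per key.
  compact() {
    const merged = new Map();
    for (let i = this.diskComponents.length - 1; i >= 0; i--) {
      for (const [k, v] of this.diskComponents[i]) merged.set(k, v); // newer overwrites older
    }
    const sorted = [...merged.entries()].sort(byKey);
    this.diskComponents = sorted.length > 0 ? [sorted] : [];
  }
}

const tree = new LsmTree(2);
tree.put("b", 1);
tree.put("a", 2); // buffer full: flushed as the first disk component
tree.put("a", 3);
tree.put("c", 4); // buffer full: flushed as the second disk component
const componentsBeforeCompaction = tree.diskComponents.length;
tree.compact();
console.log(componentsBeforeCompaction, tree.diskComponents.length, tree.get("a"));
```

The stale entry for "a" in the older component is never updated in place; it is simply dropped when the components are merged, which is why all disk writes stay sequential.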

Conclusion
  MongoDB is a document database that is geared towards high update rates and transactional queries
  It adopts JSON as a data model
  It provides the flexibility to insert any kind of data without schema definition
  An LSM tree can be used for indexing (B-tree is the default index structure)
  Weak types are handled using a special comparison method for all types
