CS 61: Database Systems
MongoDB Schema Design
Adapted from mongodb.com unless otherwise noted

Agenda
1. Data relationships
2. Accessing embedded documents

The big schema design question is whether to embed documents or normalize

Embedded vs. normalized
• Embedded: each document in a collection has embedded documents (e.g., each Students document embeds the grades for each class taken)
• Normalized: each entity stands alone; query a second collection for details on the primary collection

Embedded data model moves all fields into one document

```
> db.Students.find().pretty()
{
  "_id" : ObjectId("ABC"),
  "name" : "Alice",
  "year" : 20,
  "GPA" : 3.5,
  "grades" : [
    { "class" : "CS1", "grade" : "A" },
    { "class" : "CS10", "grade" : "A-" }
  ]
}
```

• Also known as a denormalized data model
• Allows applications to store related pieces of information in the same database record
• Improves read performance: fewer database queries and updates (no joins needed)
• Writes to a single document are atomic

Use when:
• There is a "contains" or "has" relationship between entities
• A 1:M relationship has M ≲ 1000 and the "many" side always appears with the "one" side (it does not stand alone)
• The document stays smaller than 16 MB (MongoDB's maximum document size)
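A minimal sketch of why the embedded model needs only one read, assuming a plain JavaScript array stands in for the Students collection (so it runs under Node.js without a MongoDB server) and a string replaces the ObjectId. `findStudentByName`, `fitsInOneDocument`, and `MAX_DOC_BYTES` are hypothetical helpers; the size check measures JSON text, which only approximates the BSON size MongoDB actually limits.

```javascript
// Stand-in for the Students collection: a plain array instead of a MongoDB
// server (assumption), with a placeholder string id instead of an ObjectId.
const students = [
  {
    _id: "ABC",
    name: "Alice",
    year: 20,
    GPA: 3.5,
    // Grades are embedded directly in the student document.
    grades: [
      { class: "CS1", grade: "A" },
      { class: "CS10", grade: "A-" },
    ],
  },
];

// One lookup returns the student together with every embedded grade:
// no second collection and no join. Returns null if there is no match.
function findStudentByName(name) {
  return students.find((s) => s.name === name) ?? null;
}

// Hypothetical size guard. MongoDB limits BSON documents to 16 MB; the
// UTF-8 length of the JSON text is only a rough proxy for BSON size.
const MAX_DOC_BYTES = 16 * 1024 * 1024;
function fitsInOneDocument(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8") < MAX_DOC_BYTES;
}

const alice = findStudentByName("Alice");
console.log(alice.grades.map((g) => `${g.class}: ${g.grade}`).join(", "));
// prints CS1: A, CS10: A-
console.log(fitsInOneDocument(alice)); // prints true
```

Against a real server, the same single read would be `db.Students.findOne({ name: "Alice" })`.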

Normalized data model references other documents, like a relational database

Students collection:
```
> db.Students1.find().pretty()
{
  "_id" : ObjectId("ABC"),
  "name" : "Alice",
  "year" : 20
}
```

Grades collection:
```
> db.Grades.find().pretty()
{
  "_id" : ObjectId("123"),
  "student_id" : ObjectId("ABC"),
  "class" : "CS1",
  "grade" : "A"
}
{
  "_id" : ObjectId("124"),
  "student_id" : ObjectId("ABC"),
  "class" : "CS10",
  "grade" : "A-"
}
```

• Like normalized tables in an RDBMS
• Referential integrity is not enforced
• Writes are not atomic across collections, but MongoDB has transactions

Use when:
• Embedding would duplicate data without improving read performance enough to outweigh the duplication
• Representing complex M:N relationships
• Modeling large hierarchical datasets

Source: https://docs.mongodb.com/manual/core/data-model-design/
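For contrast, a sketch of the normalized model under the same assumptions (plain arrays instead of a server, string ids instead of ObjectIds). `studentWithGrades` performs the two-step read described above, and `orphanGrades` shows what "referential integrity is not enforced" means in practice; both helpers, and the orphaned grade, are hypothetical.

```javascript
// Stand-ins for the Students1 and Grades collections (plain arrays instead of
// a MongoDB server; string ids replace ObjectId values).
const students1 = [{ _id: "ABC", name: "Alice", year: 20 }];
const gradesColl = [
  { _id: "123", student_id: "ABC", class: "CS1", grade: "A" },
  { _id: "124", student_id: "ABC", class: "CS10", grade: "A-" },
];

// A normalized read takes two queries: first the student, then every grade
// whose student_id references that student. Returns null for an unknown id.
function studentWithGrades(studentId) {
  const student = students1.find((s) => s._id === studentId);
  if (!student) return null;
  const grades = gradesColl
    .filter((g) => g.student_id === studentId)
    .map((g) => ({ class: g.class, grade: g.grade }));
  return { ...student, grades };
}

// Referential integrity is not enforced, so nothing stops a grade from
// pointing at a missing student; the application has to look for orphans.
function orphanGrades() {
  const ids = new Set(students1.map((s) => s._id));
  return gradesColl.filter((g) => !ids.has(g.student_id));
}

console.log(studentWithGrades("ABC").grades.length); // prints 2
// Hypothetical orphan: a grade for a student id that does not exist.
gradesColl.push({ _id: "125", student_id: "XYZ", class: "CS1", grade: "B" });
console.log(orphanGrades().map((g) => g._id)); // prints [ '125' ]
```

On a server, the `$lookup` aggregation stage can perform a similar join in one query, e.g. `db.Students1.aggregate([{ $lookup: { from: "Grades", localField: "_id", foreignField: "student_id", as: "grades" } }])`; nothing on the server rejects the orphaned grade, however.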

1:1 relationships often suggest using embedded documents

Normalized:
```
// patron collection
{
  _id: "joe",
  name: "Joe Bookreader"
}

// address collection
{
  patron_id: "joe", // patron
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
```

Embedded:
```
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake Street",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}
```

• Embed address into patron document
• Now one database read gets both patron and address info vs. two reads for the normalized approach
• Embedding is the preferred approach

Source: https://docs.mongodb.com/manual/core/data-model-design/
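A sketch of converting the normalized 1:1 data into the embedded shape, with plain arrays standing in for the patron and address collections. `embedAddresses` is a hypothetical migration helper; it drops the `patron_id` back-reference because the address no longer lives in its own document.

```javascript
// Normalized 1:1 data from the slide, held in plain arrays (assumption).
const patrons = [{ _id: "joe", name: "Joe Bookreader" }];
const addresses = [
  { patron_id: "joe", street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
];

// Fold each patron's address into the patron document. The patron_id
// back-reference is dropped because the address is no longer a separate
// document. Patrons without an address are copied unchanged.
function embedAddresses(patronDocs, addressDocs) {
  const byPatron = new Map(
    addressDocs.map(({ patron_id, ...address }) => [patron_id, address])
  );
  return patronDocs.map((p) =>
    byPatron.has(p._id) ? { ...p, address: byPatron.get(p._id) } : { ...p }
  );
}

const embedded = embedAddresses(patrons, addresses);
// One read of the embedded document now yields both patron and address.
console.log(embedded[0].name, "/", embedded[0].address.city);
// prints Joe Bookreader / Faketon
```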

1:1 relationship counter-example: the subset problem (use the normalized approach)

```
{
  "_id": 1,
  "title": "The Arrival of a Train",
  "year": 1896,
  "plot": "A train is seen pulling into a station",
  "fullplot": "A group of people are standing in a straight line along…",
  "type": "movie",
  "directors": [ "Auguste Lumière", "Louis Lumière" ],
  "imdb": {
    "rating": 7.3,
    "votes": 5043,
    "id": 12
  },
  "countries": [ "France" ],
  "genres": [ "Documentary", "Short" ],
  "tomatoes": {
    "viewer": {
      "rating": 3.7,
      "numReviews": 59
    }
  }
}
```

• If you normally only need summary data about a movie, then storing plot and fullplot in the same document means more disk block reads
• Create a separate collection for movie details
• Leave summary fields in the main collection
• Only read details when needed
• Arrays (directors, countries, genres) easily store multiple values; this data would require multiple tables and JOINs in an RDBMS

Source: https://docs.mongodb.com/manual/core/data-model-design/
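A sketch of the subset split, assuming the application decides which fields count as "details". `DETAIL_FIELDS`, `splitMovie`, `movieDetails`, and `loadDetails` are hypothetical; here `plot` and `fullplot` move to the details document (following the slide's point about those two fields), and both halves keep the same `_id` so details can be fetched only when needed. The movie document is trimmed to a few fields for brevity.

```javascript
// Hypothetical choice of "detail" fields: the long text fields move out of
// the summary document. Pick this list from the application's read patterns.
const DETAIL_FIELDS = ["plot", "fullplot"];

// Split one movie into a summary document (main collection) and a details
// document (separate collection) that share the same _id.
function splitMovie(movie) {
  const summary = {};
  const details = { _id: movie._id };
  for (const [key, value] of Object.entries(movie)) {
    if (DETAIL_FIELDS.includes(key)) details[key] = value;
    else summary[key] = value;
  }
  return { summary, details };
}

// The slide's movie, trimmed to a few fields for brevity.
const movie = {
  _id: 1,
  title: "The Arrival of a Train",
  year: 1896,
  plot: "A train is seen pulling into a station",
  fullplot: "A group of people are standing in a straight line along…",
  type: "movie",
};

const { summary, details } = splitMovie(movie);

// Details live in their own "collection" and are read only on demand.
const movieDetails = new Map([[details._id, details]]);
const loadDetails = (id) => movieDetails.get(id) ?? null;

console.log(Object.keys(summary)); // prints [ '_id', 'title', 'year', 'type' ]
console.log(Object.keys(details)); // prints [ '_id', 'plot', 'fullplot' ]
```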

1:M relationships: embed documents if the number of embedded documents is small

Max document size is 16 MB

Normalized:
```
// patron collection
{
  _id: "joe",
  name: "Joe Bookreader"
}

// address collection
{
  patron_id: "joe", // patron
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
{
  patron_id: "joe", // patron
  street: "1 Some Other Street",
  city: "Boston",
  state: "MA",
  zip: "12345"
}
```

Embedded:
```
{
  "_id": "joe",
  "name": "Joe Bookreader",
  "addresses": [
    {
      "street": "123 Fake Street",
      "city": "Faketon",
      "state": "MA",
      "zip": "12345"
    },
    {
      "street": "1 Some Other Street",
      "city": "Boston",
      "state": "MA",
      "zip": "12345"
    }
  ]
}
```

• All addresses are read in with one read of the document
• No need for a JOIN operation to get addresses
• The subset problem applies here too
• Use if an address does not need to stand alone

Source: https://docs.mongodb.com/manual/core/data-model-design/
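A sketch of appending to the embedded `addresses` array while keeping it bounded, assuming an in-memory patron document. `MAX_ADDRESSES` and `addAddress` are hypothetical; the cap of 1000 borrows the M ≲ 1000 rule of thumb from the embedded-model slide rather than any MongoDB setting.

```javascript
// Hypothetical cap on the embedded array, borrowed from the M ≲ 1000 rule of
// thumb; the real hard limit is the 16 MB document size.
const MAX_ADDRESSES = 1000;

// In-memory patron document with one embedded address (assumption).
const patron = {
  _id: "joe",
  name: "Joe Bookreader",
  addresses: [
    { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
  ],
};

// Append an address to the embedded array, refusing to grow it without bound.
// Returns true if the address was added, false if the cap was reached.
function addAddress(doc, address) {
  if (doc.addresses.length >= MAX_ADDRESSES) return false;
  doc.addresses.push(address);
  return true;
}

addAddress(patron, { street: "1 Some Other Street", city: "Boston", state: "MA", zip: "12345" });
// One read of the patron document now yields both addresses.
console.log(patron.addresses.map((a) => a.city)); // prints [ 'Faketon', 'Boston' ]
```

On a server, the append itself would be an update with the `$push` operator, e.g. `db.patrons.updateOne({ _id: "joe" }, { $push: { addresses: newAddress } })`; the collection name `patrons` and the variable `newAddress` are assumed.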

1:M relationships: use normalized references to avoid duplication

Embedding the publisher in each book repeats the same publisher data in every book document:
```
//books collection
{
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
{
  title: "50 Tips and Tricks for MongoDB",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
```

Storing the publisher once and referencing it by publisher_id avoids the duplication:
```
//publisher collection
{
  _id: "oreilly",
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA"
}

//books collection
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher_id: "oreilly"
}
```

Source: https://docs.mongodb.com/manual/core/data-model-design/
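A sketch of the normalization step itself: pull the repeated publisher out of each book, store it once, and leave a `publisher_id` reference behind. Books are trimmed to `_id` and `title`, and plain arrays stand in for the collections; `normalizePublishers` and the `PUBLISHER_IDS` mapping are hypothetical, with the mapping supplying the hand-picked `_id` "oreilly" used on the slide.

```javascript
// Books with an embedded publisher, trimmed to _id and title for brevity
// (plain arrays stand in for MongoDB collections).
const booksEmbedded = [
  {
    _id: 123456789,
    title: "MongoDB: The Definitive Guide",
    publisher: { name: "O'Reilly Media", founded: 1980, location: "CA" },
  },
  {
    _id: 234567890,
    title: "50 Tips and Tricks for MongoDB",
    publisher: { name: "O'Reilly Media", founded: 1980, location: "CA" },
  },
];

// Hypothetical mapping from publisher name to the hand-picked _id on the slide.
const PUBLISHER_IDS = { "O'Reilly Media": "oreilly" };

// Store each distinct publisher once and replace every embedded copy with a
// publisher_id reference.
function normalizePublishers(bookDocs) {
  const byId = new Map();
  const refBooks = bookDocs.map(({ publisher, ...book }) => {
    const id = PUBLISHER_IDS[publisher.name];
    if (!byId.has(id)) byId.set(id, { _id: id, ...publisher });
    return { ...book, publisher_id: id };
  });
  return { publishers: [...byId.values()], books: refBooks };
}

const { publishers, books } = normalizePublishers(booksEmbedded);
console.log(publishers.length); // prints 1
console.log(books.map((b) => b.publisher_id)); // prints [ 'oreilly', 'oreilly' ]
```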

M:N relationships can be easily implemented with two-way referencing

Two collections:
• One person is assigned many tasks
• One task is assigned to many people

Create arrays of references:
• Person to task
• Task to person

```
db.person.findOne()
{
  _id: ObjectId("ABC"),
  name: "Alice",
  tasks: [ // Alice is assigned three tasks
    ObjectId("123"), // write lesson plan (below)
    ObjectId("124"), // another task
    ObjectId("125")  // Alice's third task
  ]
}

db.tasks.findOne()
{
  _id: ObjectId("123"),
  description: "Write lesson plan",
  due_date: ISODate("2014-04-01"),
  assigned: [
    ObjectId("ABC"), // reference to Alice
    ObjectId("DEF")  // reference to another person assigned to this task
  ]
}
```

• Advantage: easy to find who is assigned to a task, and which tasks a person is assigned
• Disadvantage: if a person is added to or removed from a task, both collections must be updated

Adapted from https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
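A sketch of the maintenance cost of two-way references, with Maps standing in for the person and tasks collections and string ids replacing ObjectIds. `assign` and `unassign` are hypothetical helpers, and Bob's single-task array is made up so that both sides of the relationship exist. Each change writes to both collections, which on a real server means two separate updates unless they run in a multi-document transaction.

```javascript
// Stand-ins for the person and tasks collections (Maps keyed by _id;
// string ids replace ObjectId values; Bob's task list is hypothetical).
const personColl = new Map([
  ["ABC", { _id: "ABC", name: "Alice", tasks: ["123", "124", "125"] }],
  ["DEF", { _id: "DEF", name: "Bob", tasks: ["123"] }],
]);
const taskColl = new Map([
  ["123", { _id: "123", description: "Write lesson plan", assigned: ["ABC", "DEF"] }],
]);

// Adding an assignment must update BOTH sides of the relationship.
// The includes() checks keep the reference arrays free of duplicates.
function assign(personId, taskId) {
  const person = personColl.get(personId);
  const task = taskColl.get(taskId);
  if (!person.tasks.includes(taskId)) person.tasks.push(taskId);
  if (!task.assigned.includes(personId)) task.assigned.push(personId);
}

// Removing an assignment likewise touches both collections.
function unassign(personId, taskId) {
  const person = personColl.get(personId);
  const task = taskColl.get(taskId);
  person.tasks = person.tasks.filter((t) => t !== taskId);
  task.assigned = task.assigned.filter((p) => p !== personId);
}

unassign("DEF", "123");
console.log(taskColl.get("123").assigned); // prints [ 'ABC' ]
console.log(personColl.get("DEF").tasks); // prints []
```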

Sometimes it is useful to denormalize M:N relationships

Two collections:
• One person is assigned many tasks
• One task is assigned to many people

```
db.person.findOne()
{
  _id: ObjectId("ABC"),
  name: "Alice",
  tasks: [ // Alice is assigned three tasks
    ObjectId("123"), // write lesson plan (below)
    ObjectId("124"), // another task
    ObjectId("125")  // still another task
  ]
}

db.tasks.findOne()
{
  _id: ObjectId("123"),
  description: "Write lesson plan",
  due_date: ISODate("2014-04-01"),
  assigned: [
    { person_id: ObjectId("ABC"), name: "Alice" }, // now have Alice's name
    { person_id: ObjectId("DEF"), name: "Bob" }    // also have Bob's name
  ]
}
```

• Denormalize to include the names of assigned people in the tasks collection
• Now there is no need to look up the names of people assigned to tasks
• Use denormalization if there are many more reads than writes
• Do not denormalize something that changes frequently!
• Advantage: no need to look up people's names when finding tasks
• Disadvantage: if Alice's name changes, you must update the person collection and every entry for her in the tasks collection

Adapted from https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
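A sketch of the read/write trade-off of the denormalized tasks collection, under the same in-memory assumptions. `namesFor` and `renamePerson` are hypothetical, and the new name "Alicia" is invented for the example: reads get names without a second lookup, but a rename has to touch every task that copied the old name.

```javascript
// Stand-ins for the person and denormalized tasks collections
// (string ids replace ObjectId values).
const persons = new Map([
  ["ABC", { _id: "ABC", name: "Alice" }],
  ["DEF", { _id: "DEF", name: "Bob" }],
]);
const taskDocs = [
  {
    _id: "123",
    description: "Write lesson plan",
    assigned: [
      { person_id: "ABC", name: "Alice" },
      { person_id: "DEF", name: "Bob" },
    ],
  },
];

// Reads are cheap: names come straight from the task document.
const namesFor = (task) => task.assigned.map((a) => a.name);

// Writes are the cost: a rename must update the person document AND every
// copy of the name inside tasks. Returns how many task entries changed.
function renamePerson(personId, newName) {
  persons.get(personId).name = newName;
  let changed = 0;
  for (const task of taskDocs) {
    for (const entry of task.assigned) {
      if (entry.person_id === personId) {
        entry.name = newName;
        changed += 1;
      }
    }
  }
  return changed;
}

console.log(namesFor(taskDocs[0])); // prints [ 'Alice', 'Bob' ]
renamePerson("ABC", "Alicia"); // hypothetical new name
console.log(namesFor(taskDocs[0])); // prints [ 'Alicia', 'Bob' ]
```

On a server, the task-side update could be a single `updateMany` using the filtered positional operator, e.g. `db.tasks.updateMany({ "assigned.person_id": id }, { $set: { "assigned.$[a].name": newName } }, { arrayFilters: [{ "a.person_id": id }] })`, alongside a separate update to the person collection.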
