cs 61 database systems
play

CS 61: Database Systems MongoDB Schema Design Adapted mongodb.com - PowerPoint PPT Presentation

CS 61: Database Systems MongoDB Schema Design Adapted mongodb.com unless otherwise noted Agenda 1. Data relationships 2. Accessing embedded documents 2 The big schema design question is whether to embed documents or normalize Embedded vs


  1. CS 61: Database Systems MongoDB Schema Design Adapted mongodb.com unless otherwise noted

  2. Agenda 1. Data relationships 2. Accessing embedded documents 2

  3. The big schema design question is whether to embed documents or normalize Embedded vs normalized Embedded Normalized Students Grades for each class taken Each document in a collection has Each entity stands alone embedded documents Query second collection for details on primary collection 3

  4. Embedded data model moves all fields into one document > db.Students.find().pretty() { Also known as a denormalized data model "_id" : ObjectId(”ABC"), "name" : "Alice", Allows applications to store related pieces "year" : 20, of information in the same database record "GPA" : 3.5, "grades" : [ Improves read performance { "class" : "CS1", Result is fewer database queries and "grade" : "A" updates (no joins needed) }, { "class" : "CS10", Writes to documents are atomic "grade" : "A-" } ] } Use when: • Have a “contains” or “has” relationship between entities • 1:M relationship when M ≲ 1000 and when many side will always appear with one side (not stand alone) 4 • Document must be < 16 MB in size

  5. Normalized data model references other documents, like a relational database Normalized data model Grades collection > db.Grades.find().pretty() Like normalized { Students collection tables in RDBMS "_id" : ObjectId(”123"), > "student_id" : ObjectId(”ABC"), db.Students1.find().pretty() "class" : "CS1", { "grade" : "A" "_id" : ObjectId(”ABC"), } "name" : "Alice", "year" : 20 { } "_id" : ObjectId(”124"), "student_id" : ObjectId(”ABC"), Referential integrity "class" : "CS10", is not enforced "grade" : "A-" } Use when: • Embedding would result in duplication of data, but would not Writes are not improve read performance enough to outweigh duplication atomic across collections, but • To represent complex M:N relationships MongoDB has • To model large hierarchical datasets transactions 5 Source: https://docs.mongodb.com/manual/core/data-model-design/

  6. 1:1 relationships often suggest using embedded documents 1:1 relationships Normalized Embedded // patron collection { { _id: "joe", _id: "joe", name: "Joe Bookreader", name: "Joe Bookreader" address: { } street: "123 Fake Street", city: "Faketon", // address collection state: "MA", { zip: "12345" patron_id: "joe", //patron } street: "123 Fake Street", } city: "Faketon", Embed address into patron document • state: "MA", Now one database read gets both • zip: "12345" patron and address info vs. two reads } for normalized approach 6 Embedding is the preferred approach • Source: https://docs.mongodb.com/manual/core/data-model-design/

  7. 1:1 relationship counter-example is the subset problem, use normalized approach 1:1 relationship subset problem If you normally only { "_id": 1, need summary data "title": "The Arrival of a Train", about a movie, then "year": 1896, having plot and "plot": "A train is seen pulling into a station” fullplot means more "fullplot": "A group of people are standing in a straight line along… disk block reads "type": "movie", "directors": [ "Auguste Lumière", "Louis Lumière" ], Create separate "imdb": { collection for movie "rating": 7.3, "votes": 5043, "id": 12 details }, Easily store multiple "countries": [ "France" ], values in array "genres": [ "Documentary", "Short" ], Leave summary fields "tomatoes": { Would require in main collection "viewer": { multiple tables and "rating": 3.7, "numReviews": 59 JOINs in RDBMS Only read details } when needed } 7 Source: https://docs.mongodb.com/manual/core/data-model-design/

  8. 1:M relationships: embed documents if number of embedded document is small 1:M embedded relationships Max document size is 16MB Normalized Embedded // patron collection { "_id": "joe", { _id: "joe", "name": "Joe Bookreader", name: "Joe Bookreader” } "addresses": [ { "street": "123 Fake Street", // address collection "city": "Faketon", { patron_id: "joe", //patron "state": "MA", street: "123 Fake Street", "zip": "12345” }, city: "Faketon", state: "MA", { "street": "1 Some Other Street", zip: "12345” } "city": "Boston", "state": "MA", { patron_id: "joe", //patron "zip": "12345” } street: "1 Some Other Street", ] city: "Boston", } state: "MA", All addresses read in with one read of document • zip: "12345 } No need for a JOIN operation to get addresses • Subset problem applies here too • 8 Use if address does not need to stand alone • Source: https://docs.mongodb.com/manual/core/data-model-design/

  9. 1:M relationships: use normalized references to avoid duplication 1:M normalized relationships //publisher collection { _id: "oreilly", //books collection name: "O'Reilly Media", { title: "MongoDB: The Definitive Guide", founded: 1980, author: [ "Kristina Chodorow", "Mike Dirolf"], location: "CA” } published_date: ISODate("2010-09-24"), //books collection pages: 216, { _id: 123456789, language: "English", title: "MongoDB: The Definitive Guide", publisher: { name: "O'Reilly Media", author: [ "Kristina Chodorow", "Mike Dirolf"], founded: 1980, location: "CA” } published_date: ISODate("2010-09-24"), } pages: 216, language: "English", { title: "50 Tips and Tricks for MongoDB ", publisher_id: "oreilly” } author: "Kristina Chodorow", { _id: 234567890, published_date: ISODate("2011-05-06"), title: "50 Tips and Tricks for MongoDB ", pages: 68, author: "Kristina Chodorow", language: "English”, published_date: ISODate("2011-05-06"), publisher: { name: "O'Reilly Media", pages: 68, founded: 1980, location: "CA” } language: "English", } 9 publisher_id: "oreilly"} Source: https://docs.mongodb.com/manual/core/data-model-design/

  10. M:N relationships can be easily implemented with two-way referencing M:N Two collections One person is assigned many tasks db.person.findOne() One task is assigned to many people { _id: ObjectId(”ABC"), name: ”Alice", tasks [ // Alice is assigned three tasks ObjectId(”123"), //write lesson plan below ObjectId(”124"), //another task ObjectId(”125") //Alice’s third task Create array of references ] • Person to task } • Task to person db.tasks.findOne() { _id: ObjectID(”123"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), assigned: [ObjectId(”ABC") // Reference to Alice ObjectId(“DEF”) //Reference to another person assigned to this task ] } Advantage: Easy to find who is assigned to tasks, and which tasks a person is assigned • Disadvantage: If person added to removed from task, must update two tables • 10 Adapted from https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

  11. Sometimes it is useful to denormalize M:N Two collections One person is assigned many tasks db.person.findOne() One task is assigned to many people { _id: ObjectId(”ABC"), name: ”Alice", tasks [ // Alice is assigned three tasks ObjectId(”123"), //write lesson plan below ObjectId(”124"), //another task Denormalize to include person’s name ObjectId(”125") //still another task in tasks collection of assigned people ] Now do not need to look up the } names of people assigned to tasks db.tasks.findOne() { _id: ObjectID(”123"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), assigned: [{person _id: ObjectId(”ABC"), name: “Alice”}, // now have Alice’s name {person_id: ObjectId(“DEF”), name: “Bob”} //also have Bob’s name Use denormaliztion if many ] more reads than writes } Advantage: No need to lookup people’s name when finding tasks • Do not denormalize something that changes frequently! Disadvantage: If Alice’s name changes, must update person collection and all entries in task collection • 11 Adapted from https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend