csc 369 distributed computing
play

CSC 369: Distributed Computing Alex Dekhtyar April 17 Day 6: The - PowerPoint PPT Presentation

CSC 369: Distributed Computing Alex Dekhtyar April 17 Day 6: The Algebra Of Data Transformations Part II db.collection.aggregate() Housekeeping Lab 2: Submit from unix1-2-3-4-5 handin dekhtyar lab02 <files> Lab 3:


  1. CSC 369: Distributed Computing Alex Dekhtyar April 17 Day 6: The Algebra Of Data Transformations Part II db.collection.aggregate()

  2. Housekeeping ● Lab 2: ○ Submit from unix1-2-3-4-5 ○ handin dekhtyar lab02 <files> ● Lab 3: ● Lab 4: Python application, teams of 2.

  3. Recall from last class Q1: Find all CSSE faculty with highest {name:”Alex”, total enrollments, report name, teaches:[“CSC 369”, “DATA 452”], number of sections taught, total enrollment department:”CSSE”, enrollments:[28,20], position: “professor”, Q2: Report a list of instructors for each “CSC”, office:{building:14, room:210} “CPE” and “DATA” course. For each instructor, } list name and department. Decomposition into atomic operations in “Generalized” Data Algebra

  4. Q1: Find all CSSE faculty with highest total enrollments, report name, number of sections taught, total enrollment Keep only CSSE instructors Remove unnecessary data Find the total enrollment for each CSSE instructor and number of sections taught Find the largest total enrollment for a CSSE instructor Compare each instructor’s total enrollment to the largest; keep only instructors with largest enrollment

  5. Q2: Report a list of instructors for each Q1: Find all CSSE faculty with highest “CSC”, “CPE” and “DATA” course. For each total enrollments, report name, instructor, list name and department. number of sections taught, total enrollment Deconstruct “teaches” arrays, create one Keep only CSSE instructors object per instructor-course pairing Remove unnecessary data Keep information about only “CSC”, “CPE”, and “DATA” courses. Find the total enrollment for each CSSE instructor and number of sections taught Remove unnecessary data Find the largest total enrollment for a CSSE instructor For each course, combine instructors teaching it into a list Compare each instructor’s total enrollment to the largest; keep only instructors with largest enrollment Sort?

  6. Express using these operations…. Relational Algebra Generalized Algebra Selection Filtering Projection Projection/Transformation Set Operations Join Join Grouping/Aggregation Grouping/Aggregation Sort Sort

  7. Very Tersely Given a condition - keep only objects that satisfy it Filtering Modify the contents of its object based solely on Projection what’s in the object itself Transformation Break collection into groups, each representing objects with same values of some keys Grouping Compute an aggregate value over a set of objects Aggregation Combine objects from two different collections based on matches in values of some keys Join Return objects in a specific order Sort

  8. … and a few more Opposite of grouping - build an object for each Ungrouping element of an array Unwinding Return a specific number of documents Limit Return documents after skipping a specified Skip number Return a random sample of documents Sample

  9. Let’s use our intuition

  10. Q1: Find all CSSE faculty with highest total enrollments, report name, number of sections taught, total enrollment Keep only CSSE instructors Exercise Time!! Remove unnecessary data Find the total enrollment for each CSSE instructor and number of sections taught Find the largest total enrollment for a CSSE instructor File posted to Compare each instructor’s total enrollment to Slack/chat. the largest; keep only instructors with largest enrollment

  11. Q1: Find all CSSE faculty with highest total enrollments, report name, number of sections taught, total enrollment Keep only CSSE instructors Filtering Remove unnecessary data Projection Find the total enrollment for each CSSE Projection instructor and number of sections taught Aggregation Find the largest total enrollment for a CSSE Aggregation instructor Projection Compare each instructor’s total enrollment to the largest; keep only instructors with largest Filtering enrollment

  12. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one object per instructor-course pairing Exercise Time!! Keep information about only “CSC”, “CPE”, and “DATA” courses. Remove unnecessary data For each course, combine instructors teaching it into a list Sort?

  13. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one Unwinding object per instructor-course pairing Keep information about only “CSC”, Filtering “CPE”, and “DATA” courses. Remove unnecessary data Projection For each course, combine instructors Grouping teaching it into a list Sort? Sort

  14. Now, let’s learn all this for real... db.collection.aggregate(<aggregation pipeline>)

  15. Now, let’s learn all this for real... db.collection.aggregate({$operation:{<doc>}}, {$operation:{<doc>}}, …, })

  16. Now, let’s learn all this for real... db.collection.aggregate({$operation:{<doc>}}, {$operation:{<doc>}}, …, }) In Lecture: basic ideas In Lab : all the syntax you can handle

  17. $operation $redact Filtering $match Projection $project $set $unset $addFields $replaceRoot Aggregation $bucket $group Grouping $bucketAuto $sort $sortByCount Sort $lookup $graphLookup Join

  18. $operation Filtering $match Projection $project Aggregation $group Grouping $sort Sort $lookup Join

  19. $operation Filtering $match Unwinding $unwind Projection $project $limit Limit Aggregation Skip $skip $group Grouping Sample $sample $sort Sort $lookup Join

  20. Let’s Learn By Doing Query 2 first (it is simpler) Query 1 second (it has layers )

  21. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one Unwinding object per instructor-course pairing Keep information about only “CSC”, Filtering “CPE”, and “DATA” courses. Remove unnecessary data Projection For each course, combine instructors Grouping teaching it into a list Sort? Sort

  22. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one object per instructor-course pairing $unwind Keep information about only “CSC”, “CPE”, and “DATA” courses. $match Remove unnecessary data $project For each course, combine instructors teaching it into a list $group Sort? $sort

  23. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one object per instructor-course pairing $unwind Keep information about only “CSC”, “CPE”, and “DATA” courses. $match Remove unnecessary data $project For each course, combine instructors teaching it into a list $group Sort? $sort

  24. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one object per instructor-course pairing {$unwind: “$teaches”} $unwind Keep information about only “CSC”, “CPE”, and “DATA” courses. $match Remove unnecessary data $project For each course, combine instructors teaching it into a list $group Sort? $sort

  25. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one object per instructor-course pairing {$unwind: “$teaches”} $unwind Keep information about only “CSC”, “CPE”, and “DATA” courses. {$match: $match {teaches: {$in: [“CSC”, Remove unnecessary data “CPE”,”DATA”]} $project }} For each course, combine instructors teaching it into a list $group Sort? $sort

  26. Q2: Report a list of instructors for each “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. Deconstruct “teaches” arrays, create one object per instructor-course pairing {$unwind: “$teaches”} $unwind Keep information about only “CSC”, “CPE”, and “DATA” courses. {$match: $match {teaches: {$in: [/^CSC/, Remove unnecessary data /^CPE/,/^DATA/]} $project }} For each course, combine instructors teaching it into a list $group Regular Expressions!!! Sort? $sort

  27. Q2: Report a list of instructors for each {$unwind: “$teaches”} “CSC”, “CPE” and “DATA” course. For each instructor, list name and department. {$match: Deconstruct “teaches” arrays, create one {teaches: {$in: [/^CSC/, object per instructor-course pairing /^CPE/,/^DATA/]}}} $unwind Keep information about only “CSC”, “CPE”, and “DATA” courses. $match {"name" : "Alex", Remove unnecessary data "teaches" : "CSC 369", "department" : "CSSE", $project "enrollments" : [28,20], For each course, combine instructors "position" : "professor", "office" : {"building" : 14, teaching it into a list "room" : 210 } $group } Sort? $sort

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend