
COMP9313: Big Data Management Course Introduction



  1. COMP9313: Big Data Management Course Introduction

  2. Lecturer in Charge
     • Lecturer: Yifang Sun
       • office: used to be K17-208, at home now…
       • email: yifangs@cse.unsw.edu.au
         • use [comp9313] in subject
     • Research interests
       • Database
       • High dimensional data
       • Machine learning (Natural language processing)
       • Integration of DB and AI

  3. Course Aims
     • Introduce the concepts behind Big Data
     • Introduce the core technologies used in managing large-scale data sets
       • MapReduce
       • Spark
       • …
     • Introduce technologies for developing solutions to large-scale data analytics problems
       • nearest neighbor search
       • machine learning with big data
       • …

  4. Course Aims - cont.
     • Not possible to cover every aspect of big data management
     • We will focus on
       • concepts
       • algorithms
       • principles
     • We will not focus on
       • programming languages and APIs
       • specific platforms
     • Make use of tutorials and documents on the Internet

  5. Lectures
     • Delivered through pre-recorded videos
       • location: anywhere you like
       • time: anytime you like
       • links to videos available on Piazza every Mon and Wed
       • email the LiC ASAP if you have no access to Piazza
     • Slides on course website
     • No Q&A sessions during lectures
       • Ask in Piazza or online consultations
     • Schedule and length of lectures may vary based on the progress of the course
     • Note: watching every lecture is assumed.

  6. Resources
     • Books
       • Hadoop: The Definitive Guide. Tom White. 4th Edition - O’Reilly Media
       • Learning PySpark. Tomasz Drabas and Denny Lee. Packt Publishing
       • Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. University of Maryland, College Park.
       • Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. 3rd edition - Cambridge University Press
     • Online resources:
       • PySpark Tutorial
       • Spark Python API Docs
       • Online courses/tutorials on YouTube, Coursera, …

  7. Prerequisites
     • Official prerequisites
       • Data Structures and Algorithms
       • Database Systems
     • Before commencing this course, you should
       • have experience with and good knowledge of algorithm design
       • have a solid background in database systems
       • have solid programming skills in Python
       • be familiar with Linux operating systems
       • have basic knowledge of linear algebra, probability theory and statistics
     • No previous experience necessary in
       • MapReduce/Spark
       • Parallel and distributed programming

  8. Please do not enrol if you…
     • Don’t have COMP9024/9311 knowledge
     • Cannot produce a correct Python program on your own
     • Have poor time management
     • Are too busy to watch the lecture videos or do the labs
     • If any of these apply, you are likely to perform badly in this subject

  9. Assessment
     • One written assignment (20%)
     • Two programming projects (25% each)
     • Final exam (30%)
     • There’s no hurdle for any of the above components
     • All are individual tasks
     • All are submitted through give
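As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is marked out of 100 and that the final mark is a plain weighted sum; the slides give only the percentages, so the function name `final_mark` and the component keys are hypothetical.

```python
# Component weights in percent, taken from the Assessment slide.
WEIGHTS = {
    "assignment": 20,  # one written assignment
    "project1": 25,    # first programming project
    "project2": 25,    # second programming project
    "exam": 30,        # final exam
}


def final_mark(marks: dict) -> float:
    """Weighted sum of component marks (assumption: each is out of 100)."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {"assignment": 80, "project1": 70, "project2": 90, "exam": 60}
    print(final_mark(example))
```

Because there is no hurdle on any component, a weak result in one part can be offset by the others under this reading.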

  10. Written Assignment
     • Exam-style questions
       • Computational, short answer
       • no essay, no multiple choice
     • Regarding the lecture contents
       • algorithms, principles, …
       • to assess your understanding, not memory
     • Late penalty
       • firm deadline
       • zero marks for late submission

  11. Programming projects
     • Tentative topics
       • One on MapReduce + nearest neighbor search
       • One on PySpark + machine learning
     • Both results and source code will be checked.
     • Zero marks if your code cannot be run because of bugs.
     • Late penalty
       • 10% reduction of raw marks for the 1st day, 30% reduction per day for the following 3 days
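The late penalty for projects can be worked through with a short sketch. It assumes the reductions accumulate as percentages of the raw mark (10% after day 1, then 40%, 70%, 100%) and that lateness is counted in whole days; the slide does not spell out either point, and the function names are hypothetical.

```python
def late_reduction_percent(days_late: int) -> int:
    """Cumulative reduction in percent of the raw mark.

    Assumption: 10% for the first late day, plus 30% for each of the
    following three days, capped at 100%.
    """
    if days_late <= 0:
        return 0
    extra_days = min(days_late - 1, 3)  # only the next 3 days add 30% each
    return min(10 + 30 * extra_days, 100)


def penalised_mark(raw_mark: float, days_late: int) -> float:
    """Raw mark after applying the late reduction."""
    return raw_mark * (100 - late_reduction_percent(days_late)) / 100


if __name__ == "__main__":
    for days in range(6):
        print(days, late_reduction_percent(days), penalised_mark(80, days))
```

Under this reading, a project submitted four or more days late receives zero marks.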

  12. Final exam
     • Open book exam
     • Firm deadline
     • No supplementary exam will be given
     • Special consideration must be submitted prior to the start of the exam
     • More details on the way

  13. Academic honesty and plagiarism
     • Zero tolerance for plagiarism
       • You will get 0 marks
     • Examples of misconduct:
       • Copying other students’ work
       • Letting other students copy your work
       • Copying from GitHub
       • Finding a ghostwriter
       • …
     • I will not accept the following excuses:
       • “I left the lab with my screen unlocked”
       • “He stole it from my computer”
       • “I only gave my code to A. A didn’t use it but gave it to B”
       • …

  14. Tentative course schedule

     Week  Topic                                                Assignment/Project
     1     Course Introduction and Introduction to Big Data
     2     Hadoop MapReduce
     3     Hadoop MapReduce
     4     Nearest Neighbor Search                              Project 1
     5     Spark                                                Assignment
     6     Flexibility Week (no lecture)
     7     Spark                                                Project 2
     8     Machine Learning with PySpark
     9     Data Stream + NoSQL
     10    Revision and Exam Preparation

  15. Labs
     • Labs to help you with programming and projects
       • nothing to submit, no marks
       • using IPython notebooks
     • Contents
       • 1 lab on setting up the environment
       • 1 lab on PySpark and MapReduce
       • 1 lab on NNS with MapReduce
       • 1 lab on machine learning with PySpark

  16. Consultations
     • Online Q&A discussions in Piazza
       • you are all encouraged to participate
     • Online consultation with tutor
       • 1pm – 2pm every Friday
       • using Zoom
       • room number and password in Piazza
     • Private online consultation with me
       • please book an appointment by email, with a brief description of your questions and [comp9313] in the subject

  17. General Recommendations
     • Make use of the LiC and tutors
       • don’t hesitate to ask questions
     • Make use of Piazza
       • read the notices on the course website and Piazza
       • participate in the discussions in Piazza
     • Make use of course materials
       • understand lecture slides
       • read specifications carefully
       • try the labs although they are not compulsory
     • Do not commit academic misconduct

  18. Your Feedback is Always Welcome
     • Please advise where I can improve after each lecture, through Piazza or by email
     • myExperience system
