SLIDE 1

COMP9313: Big Data Management

Course Introduction

SLIDE 2

Lecturer in Charge

  • Lecturer: Yifang Sun
  • Office: formerly K17-208; currently working from home
  • Email: yifangs@cse.unsw.edu.au
      • use [comp9313] in the subject line
  • Research interests
      • Databases
      • High-dimensional data
      • Machine learning (natural language processing)
      • Integration of DB and AI

SLIDE 3

Course Aims

  • Introduce the concepts behind Big Data
  • Introduce the core technologies used to manage large-scale data sets
      • MapReduce
      • Spark
  • Introduce technologies for developing solutions to large-scale data analytics problems
      • Nearest neighbor search
      • Machine learning with big data

SLIDE 4

Course Aims - cont.

  • It is not possible to cover every aspect of big data management
  • We will focus on
      • concepts
      • algorithms
      • principles
  • We will not focus on
      • programming languages and APIs
      • specific platforms
  • Make use of tutorials and documentation on the Internet

SLIDE 5

Lectures

  • Delivered through pre-recorded videos
      • location: anywhere you like
      • time: any time you like
      • links to the videos are posted on Piazza every Monday and Wednesday
      • email the LiC (lecturer in charge) as soon as possible if you cannot access Piazza
  • Slides are available on the course website
  • No Q&A sessions during lectures
      • ask questions on Piazza or in online consultations
  • The schedule and length of lectures may vary depending on the progress of the course
  • Note: you are expected to watch every lecture.

SLIDE 6

Resources

  • Books
      • Hadoop: The Definitive Guide. Tom White. 4th edition. O’Reilly Media
      • Learning PySpark. Tomasz Drabas and Denny Lee. O’Reilly Media
      • Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. University of Maryland, College Park
      • Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. 3rd edition. Cambridge University Press
  • Online resources
      • PySpark tutorial
      • Spark Python API docs
      • Online courses/tutorials on YouTube, Coursera, …

SLIDE 7

Prerequisites

  • Official prerequisites
      • Data Structures and Algorithms
      • Database Systems
  • Before commencing this course, you should
      • have experience with, and good knowledge of, algorithm design
      • have a solid background in database systems
      • have solid programming skills in Python
      • be familiar with the Linux operating system
      • have basic knowledge of linear algebra, probability theory and statistics
  • No previous experience is necessary in
      • MapReduce/Spark
      • parallel and distributed programming

SLIDE 8

Please do not enrol if you…

  • Do not have COMP9024/COMP9311 knowledge
  • Cannot write a correct Python program on your own
  • Have poor time management
  • Are too busy to watch the lecture videos or do the labs
  • If any of these apply and you enrol anyway, you are likely to perform poorly in this course

SLIDE 9

Assessment

  • One written assignment (20%)
  • Two programming projects (25% each)
  • Final exam (30%)
  • There is no hurdle for any of the above components
  • All are individual tasks
  • All are submitted through give

SLIDE 10

Written Assignment

  • Exam-style questions
      • computational and short-answer
      • no essays, no multiple choice
  • Based on the lecture content
      • algorithms, principles, …
      • designed to assess your understanding, not your memory
  • Late penalty
      • firm deadline
      • zero marks for late submissions

SLIDE 11

Programming projects

  • Tentative topics
      • one on MapReduce + nearest neighbor search
      • one on PySpark + machine learning
  • Both results and source code will be checked
  • You will receive zero marks if your code cannot run because of bugs
  • Late penalty
      • 10% reduction of raw marks for the 1st day, then a 30% reduction per day for the following 3 days

SLIDE 12

Final exam

  • Open-book exam
  • Firm deadline
  • No supplementary exam will be given
  • Special consideration must be submitted before the exam starts
  • More details to come

SLIDE 13

Academic honesty and plagiarism

  • Zero tolerance for plagiarism
      • you will get 0 marks
  • Examples of misconduct:
      • copying other students’ work
      • letting other students copy your work
      • copying from GitHub
      • using a ghost writer
  • I will not accept the following excuses:
      • “I’ve left the lab with my screen unlocked”
      • “He stole it from my computer”
      • “I only gave my code to A. A didn’t use it but gave it to B”

SLIDE 14

Tentative course schedule

Week  Topic                                               Assignment/Project
1     Course Introduction and Introduction to Big Data
2     Hadoop MapReduce
3     Hadoop MapReduce
4     Nearest Neighbor Search                             Project 1
5     Spark                                               Assignment
6     Flexibility Week (no lecture)
7     Spark                                               Project 2
8     Machine Learning with PySpark
9     Data Stream + NoSQL
10    Revision and Exam Preparation

SLIDE 15

Labs

  • Labs help you with programming and the projects
      • nothing to submit, no marks
      • using IPython notebooks
  • Contents
      • 1 lab on setting up the environment
      • 1 lab on PySpark and MapReduce
      • 1 lab on NNS (nearest neighbor search) with MapReduce
      • 1 lab on machine learning with PySpark

SLIDE 16

Consultations

  • Online Q&A discussions on Piazza
      • you are all encouraged to participate
  • Online consultation with a tutor
      • 1pm – 2pm every Friday
      • using Zoom
      • meeting number and password are on Piazza
  • Private online consultation with me
      • please book an appointment with a brief description of your questions, with [comp9313] in the subject line

SLIDE 17

General Recommendations

  • Make use of the LiC and tutors
      • don’t hesitate to ask questions
  • Make use of Piazza
      • read the notices on the course website and Piazza
      • participate in the discussions on Piazza
  • Make use of the course materials
      • understand the lecture slides
      • read the specifications carefully
      • try the labs, even though they are not compulsory
  • Do not engage in academic misconduct

SLIDE 18

Your Feedback is Always Welcome

  • Please let me know where I can improve after each lecture, through Piazza or by email
  • Through the myExperience system
