CS535 Big Data 1/30/2019 Week 2- B Sangmi Lee Pallickara http://www.cs.colostate.edu/~cs535 Spring 2019 Colorado State University 1
CS535 BIG DATA
PART A. BIG DATA TECHNOLOGY
- 3. DISTRIBUTED COMPUTING MODELS FOR SCALABLE BATCH COMPUTING
Sangmi Lee Pallickara Computer Science, Colorado State University http://www.cs.colostate.edu/~cs535
FAQs
- Term project deliverable 0
- Item 1: Your team members
- Item 2: Tentative project titles (up to 3)
- Submission deadline: Feb. 1
- Via email or Canvas
- PA1
- Hadoop and Spark installation guides are posted
- If you would like to start your homework, please send me an email with your team information. I will assign the port range for your team.
- Quiz 1: February 4, 2019, in class
1/30/2019 Colorado State University, Spring 2019 Week 2-A-1
Topics of Today's Class
- Overview of Programming Assignment 1
- 3. Distributed Computing Models for Scalable Batch Computing
- MapReduce
Programming Assignment 1: Hyperlink-Induced Topic Search (HITS)
This material is based on
- Kleinberg, Jon. "Authoritative sources in a hyperlinked environment". Journal of the ACM. 46 (5): 604–632, 1999
Types of Web queries
- Yes/No queries
- Does Chrome support the .ogv video format?
- Broad topic queries
- Find information about “polar vortex”
- Similar-page query
- Find pages similar to ‘https://stackoverflow.com’
Image credit: https://www.cnn.com/2019/01/30/weather/winter-weather-wednesday-wxc/index.html