CS535 Big Data 1/27/2020 Week 2-A Sangmi Lee Pallickara http://www.cs.colostate.edu/~cs535 Spring 2020 Colorado State University, Page 1
CS535 BIG DATA
PART A. BIG DATA TECHNOLOGY
- 2. DATA PROCESSING PARADIGMS FOR BIG DATA
Sangmi Lee Pallickara Computer Science, Colorado State University http://www.cs.colostate.edu/~cs535
FAQs
- Slides are available on the course web
- Canvas Discussion Board is available: Find your teammates!
- PA1
- Hadoop and Spark installation guides are posted
- Questions or need help? Send an email to cs535@cs.colostate.edu or post your question on Piazza!
CS535 Big Data | Computer Science | Colorado State University
Overview of Part A
- Duration: Week 1 ~ Week 4
- 1. Introduction to Big Data (W1)
- 2. Data Processing Paradigms for Big Data (W2)
- 3. Distributed Computing Models for Scalable Batch Computing
Part 1. MapReduce (W2)
Part 2. In-Memory Cluster Computing Model: Apache Spark (W3, W4)
- 4. Real-time Streaming Computing Models (W4)
Apache Storm and Twitter Heron
- 2. Data Processing Paradigms for Big Data
Lambda Architecture
This material is based on
- Nathan Marz and James Warren, “Big Data: Principles and Best Practices of Scalable Real-Time Data Systems”, 2015, Manning Publications, ISBN 9781617290343
Why do we need Big Data Technologies?
- To perform large-scale analytics over voluminous data, we need a high-level architecture that provides:
- Robustness
- Fault tolerance: against both hardware failures and human mistakes
- Support for a wide range of workloads and use cases
- Low-latency reads and updates
- Batch analytics jobs
- Scalability
- Scale-out capabilities with minimal maintenance