CS226 Big-Data Management
Instructor: Ahmed Eldawy
Welcome (back) to UCR!
Class information
Classes: Monday, Wednesday, Friday 1:00-1:50 PM at Humanities and Social Sciences 1501
Instructor: Ahmed Eldawy
TA: Saheli Ghosh
Office
http://www.cs.ucr.edu/~eldawy/19FCS226/ iLearn (Any UCRX students?)
Subject: “[CS226] …”
Group Selection
Project proposal (5%)
Literature survey (10%)
Report outline (5%)
Class presentation (5%)
Final report (15%)
Poster presentation (10%)
unveils BIG DATA initiative: $200 Million in R&D investment
Washington Post is calling Obama “The Big Data President”
March 2014: David Cameron and Angela Merkel talking about Big Data at a computer expo in Hannover, Germany
Web search
Marketing and advertising
Data cleaning
Knowledge base
Information retrieval
Internet of Things (IoT)
Visualization
Behavioral studies
http://mattturck.com/2012/06/29/a-chart-of-the-big-data-ecosystem/
http://mattturck.com/2014/05/11/the-state-of-big-data-in-2014-a-chart/
http://mattturck.com/2016/02/01/big-data-landscape/
(HDFS)
128MB 128MB 128MB 128MB 128MB 128MB …
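As a rough illustration of the block layout above, the sketch below computes how a file of a given size would be cut into fixed-size 128 MB blocks. The function name `split_into_blocks` and the 300 MB example file are assumptions made for illustration; the code does not call any HDFS API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size shown on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# Hypothetical 300 MB file: two full blocks followed by one partial block.
blocks = split_into_blocks(300 * 1024 * 1024)
for i, (offset, length) in enumerate(blocks):
    print(f"block {i}: offset={offset} length={length}")
```

Only the last block can be smaller than the block size, so a file occupies roughly its size divided by 128 MB, rounded up, in blocks.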
Big volume
HDFS limitation
New programming paradigms
Ad-hoc indexes
Global index
Local indexes
…1000100010101011101110101010110111010111011101110100… Processing window
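The processing-window idea can be sketched as follows: the stream is consumed one item at a time and only the most recent items are kept in memory. This is a minimal sketch, assuming a count-based sliding window and a running count of 1 bits as the query; the function name `windowed_ones` and the window size of 4 are hypothetical choices, not taken from the slides.

```python
from collections import deque


def windowed_ones(stream, window_size):
    """Yield the number of 1 bits among the last window_size bits seen so far."""
    window = deque(maxlen=window_size)  # the oldest bit is dropped automatically
    ones = 0
    for bit in stream:
        if len(window) == window_size:
            ones -= window[0]  # subtract the bit that is about to be evicted
        window.append(bit)
        ones += bit
        yield ones


# A short prefix of the bit stream shown above.
bits = [int(c) for c in "1000100010101011"]
counts = list(windowed_ones(bits, window_size=4))
print(counts)
```

Memory use depends only on the window size, not on the length of the stream, which is what makes this style of processing suitable for unbounded input.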
Map-Shuffle-Reduce
Resiliency through materialization
Directed-Acyclic-Graph (DAG)
In-memory processing
Resiliency through lineages
M1 M2 … Mm R1 R2 Rn
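The map, shuffle, and reduce steps can be sketched in plain Python with a word count. This is a single-process illustration under stated assumptions: the function names and the two input lines are hypothetical, and the sketch shows neither the materialization of intermediate results nor lineage-based recovery.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


lines = ["big data management", "big data systems"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
result = dict(reduce_phase(k, v) for k, v in grouped.items())
print(result)
```

In a distributed setting each call to `map_phase` would run on a separate mapper (M1 … Mm) and each key group would be routed to one reducer; here everything runs in one process to keep the data flow visible.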
Agg Agg Agg Merge Merge Partition Partition Partition Agg Agg
Hadoop Distributed File System (HDFS) Yet Another Resource Negotiator (YARN) MapReduce Query Engine Administration Pig
Hadoop Distributed File System (HDFS)
Yet Another Resource Negotiator (YARN)
Resilient Distributed Dataset (RDD) a.k.a. Spark Core
Data Frames
MLlib
GraphX
SparkR
Spark Streaming
Spark SQL
Kubernetes
Hyracks Data-parallel Platform Algebricks Algebra Layer Hadoop MapReduce Compatibility Pregelix Hivesterix AsterixDB Other compilers Hyracks jobs Pregel Jobs MapReduce Jobs PigLatin HiveQL AsterixQL
Hadoop Distributed File System (HDFS)
Yet Another Resource Negotiator (YARN)
Query Executor
Query Planner
Query Parser
Hadoop Distributed File System (HDFS) + Spatial Indexing
Yet Another Resource Negotiator (YARN)
MapReduce Processing + Spatial Query Processing
Spatial Visualization
Pig Latin + Pigeon