

SLIDE 1

Big Data I: Graph Processing, Distributed Machine Learning

CS 240: Computing Systems and Concurrency Lecture 21 Marco Canini

Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected content adapted from J. Gonzalez.

SLIDE 2

Patient presents with abdominal pain.

Diagnosis?

[Diagram: a dependency graph. The patient ate something which contains a pathogen, purchased from a store that also sold to other patients, who were diagnosed with the same illness.]

  • E. coli infection

SLIDE 3

Big Data is Everywhere

  • Machine learning is a reality
  • How will we design and implement “Big Learning” systems?

72 hours of video uploaded to YouTube every minute, 28 million Wikipedia pages, 900 million Facebook users, 6 billion Flickr photos

SLIDE 4

Threads, Locks, & Messages

We could use… “low-level parallel primitives.”

SLIDE 5

Shift Towards Use Of Parallelism in ML

  • Programmers repeatedly solve the same parallel design challenges:

– Race conditions, distributed state, communication…

  • Resulting code is very specialized:

– Difficult to maintain, extend, debug…

(GPUs, multicore, clusters, clouds, supercomputers)

Idea: Avoid these problems by using high-level abstractions

SLIDE 6

MapReduce / Hadoop

A better answer: build learning algorithms on top of high-level parallel abstractions.

SLIDE 7

MapReduce – Map Phase

Embarrassingly parallel, independent computation; no communication needed.

[Figure: four CPUs (CPU 1 through CPU 4) each independently map one input record to a numeric result.]

SLIDE 8

MapReduce – Map Phase

[Figure: each CPU applies the map function to an image and emits image features.]

SLIDE 9

MapReduce – Map Phase

Embarrassingly parallel, independent computation

[Figure: the CPUs continue mapping further images, still with no communication.]

SLIDE 10

MapReduce – Reduce Phase

[Figure: map outputs (image features) are labeled indoor (I) or outdoor (O); two CPUs aggregate them into indoor-picture and outdoor-picture statistics.]
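
The two phases above form the classic data-parallel pattern: an embarrassingly parallel map over independent records, then a reduce that aggregates per key. A minimal single-machine sketch of that pattern in Python (the feature extraction and the indoor/outdoor labels are invented stand-ins for the slide's image example):

    from collections import defaultdict

    def map_phase(images):
        # Each record is processed independently; no communication needed.
        for label, pixels in images:                  # label: "indoor" or "outdoor"
            feature = sum(pixels) / len(pixels)       # stand-in for feature extraction
            yield label, feature

    def reduce_phase(mapped):
        # Group map outputs by key and aggregate into per-class statistics.
        groups = defaultdict(list)
        for label, feature in mapped:
            groups[label].append(feature)
        return {k: sum(v) / len(v) for k, v in groups.items()}

    images = [("outdoor", [12, 14, 13]), ("indoor", [42, 40, 44]),
              ("outdoor", [21, 22, 20]), ("indoor", [25, 26, 27])]
    print(reduce_phase(map_phase(images)))            # per-class mean feature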

SLIDE 11

Map-Reduce for Data-Parallel ML

  • Excellent for large data-parallel tasks!

Data-Parallel (MapReduce): feature extraction, algorithm tuning, basic data processing

Graph-Parallel: belief propagation, label propagation, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, lasso

Is there more to machine learning?

SLIDE 12

Exploiting Dependencies

SLIDE 13

Graphs are Everywhere

  • Collaborative filtering: Netflix users–movies graph
  • Text analysis: Wiki docs–words graph
  • Probabilistic analysis: social networks

SLIDE 14

Concrete Example

Label Propagation

SLIDE 15


Label Propagation Algorithm

  • Social Arithmetic:
  • Recurrence Algorithm:

– iterate until convergence

  • Parallelism:

– Compute all Likes[i] in parallel

Example (social arithmetic): I Like = 50% × (what I list on my profile: 50% cameras, 50% biking) + 40% × (what Sue Ann likes: 80% cameras, 20% biking) + 10% × (what Carlos likes: 30% cameras, 70% biking) = 60% cameras, 40% biking

Likes[i] = Σ_{j ∈ Friends[i]} Wij × Likes[j]
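
A small runnable sketch of this recurrence (the friend weights and initial interest vectors are taken from the slide's example; the dictionary-based representation is an illustrative choice, not GraphLab's API):

    def label_propagation(weights, likes, iters):
        # Iterate Likes[i] = sum over friends j of W[i][j] * Likes[j].
        for _ in range(iters):
            new = {}
            for i, friends in weights.items():        # every Likes[i] is independent,
                new[i] = {}                           # so this loop can run in parallel
                for j, w in friends.items():
                    for label, p in likes[j].items():
                        new[i][label] = new[i].get(label, 0.0) + w * p
            likes = new
        return likes

    weights = {"me": {"profile": 0.5, "sue_ann": 0.4, "carlos": 0.1},
               "profile": {"profile": 1.0}, "sue_ann": {"sue_ann": 1.0},
               "carlos": {"carlos": 1.0}}
    likes = {"me": {},
             "profile": {"cameras": 0.5, "biking": 0.5},
             "sue_ann": {"cameras": 0.8, "biking": 0.2},
             "carlos":  {"cameras": 0.3, "biking": 0.7}}
    print(label_propagation(weights, likes, iters=1)["me"])   # ~60% cameras, 40% biking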

SLIDE 16

Properties of Graph Parallel Algorithms

  • Dependency graph
  • Factored computation
  • Iterative computation: what I like depends on what my friends like

SLIDE 17

Map-Reduce for Data-Parallel ML

  • Excellent for large data-parallel tasks!

Data-Parallel (MapReduce): feature extraction, algorithm tuning, basic data processing

Graph-Parallel (MapReduce?): belief propagation, label propagation, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, lasso

SLIDE 18

Problem: Data Dependencies

  • MapReduce doesn’t efficiently express data dependencies

– User must code substantial data transformations
– Costly data replication

[Figure: MapReduce assumes independent rows of data.]

SLIDE 19

Iterative Algorithms

  • MapReduce doesn’t efficiently express iterative algorithms:

[Figure: each iteration re-processes all data partitions across CPUs and ends at a global barrier, so one slow processor stalls every iteration.]

SLIDE 20

MapAbuse: Iterative MapReduce

  • Only a subset of data needs computation:

[Figure: every iteration still touches all data partitions and waits at a barrier, even when only a subset changed.]

SLIDE 21

MapAbuse: Iterative MapReduce

  • System is not optimized for iteration:

[Figure: each iteration pays a startup penalty and a disk penalty as state is written to and re-read from disk between MapReduce jobs, with barriers in between.]

SLIDE 22

ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce): cross validation, feature extraction, computing sufficient statistics

Graph-Parallel (?): graphical models (Gibbs sampling, belief propagation, variational optimization), semi-supervised learning (label propagation, CoEM), graph analysis (PageRank, triangle counting), collaborative filtering (tensor factorization)

SLIDE 23

ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce): cross validation, feature extraction, computing sufficient statistics

Graph-Parallel: Pregel

SLIDE 24


  • Limited CPU Power
  • Limited Memory
  • Limited Scalability
SLIDE 25

Distributed Cloud

Challenges:

  • Distribute state
  • Keep data consistent
  • Provide fault tolerance


Scale up computational resources!

SLIDE 26

The GraphLab Framework

1. Graph-based data representation
2. Update functions (user computation)
3. Consistency model


SLIDE 27

Data Graph

Data is associated with both vertices and edges

Vertex data:
  • User profile
  • Current interest estimates

Edge data:
  • Relationship (friend, classmate, relative)

Graph:
  • Social network

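A rough sketch of such a data graph in Python (the field names and adjacency representation are illustrative, not GraphLab's actual data structures):

    class DataGraph:
        # User-defined data attached to both vertices and edges.
        def __init__(self):
            self.vertex_data = {}   # vertex -> data (e.g., profile, interest estimates)
            self.edge_data = {}     # (u, v) -> data (e.g., relationship type)
            self.adj = {}           # vertex -> set of neighbors

        def add_vertex(self, v, data):
            self.vertex_data[v] = data
            self.adj.setdefault(v, set())

        def add_edge(self, u, v, data):
            self.edge_data[(u, v)] = data
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)

    g = DataGraph()
    g.add_vertex("alice", {"profile": "...", "interests": {"cameras": 0.5, "biking": 0.5}})
    g.add_vertex("bob",   {"profile": "...", "interests": {"cameras": 0.8, "biking": 0.2}})
    g.add_edge("alice", "bob", {"relationship": "friend"})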

SLIDE 28

Distributed Data Graph


Partition the graph across multiple machines:

SLIDE 29

Distributed Data Graph

  • Ghost vertices maintain adjacency structure and replicate remote data.

[Figure: a partitioned graph with “ghost” vertices mirrored at partition boundaries.]

SLIDE 30

Distributed Data Graph

  • Cut efficiently using HPC graph-partitioning tools (ParMetis / Scotch / …)

[Figure: the graph cut across machines, with “ghost” vertices along the cut.]
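
A toy sketch of the partitioning idea (the hard-coded two-machine placement below is only for illustration; the slide's point is that a real deployment computes a better cut with ParMetis or Scotch):

    from collections import defaultdict

    def partition_with_ghosts(edges, owner):
        # Each machine owns its assigned vertices and keeps read-only "ghost"
        # copies of remote neighbors, preserving the local adjacency structure.
        local, ghosts = defaultdict(set), defaultdict(set)
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                m = owner(a)
                local[m].add(a)
                if owner(b) != m:
                    ghosts[m].add(b)      # b's data is replicated and kept in sync
        return local, ghosts

    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    owner = lambda vertex: 0 if vertex in ("a", "b") else 1   # toy 2-machine placement
    print(partition_with_ghosts(edges, owner))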

SLIDE 31

The GraphLab Framework

1. Graph-based data representation
2. Update functions (user computation)
3. Consistency model


SLIDE 32

Update Function

A user-defined program, applied to a vertex; transforms data in scope of vertex

    Pagerank(scope) {
      // Update the current vertex data
      vertex.PageRank = α
      ForEach inPage:
        vertex.PageRank += (1 − α) × inPage.PageRank

      // Reschedule neighbors if needed
      if vertex.PageRank changes then
        reschedule_all_neighbors;
    }

Selectively triggers computation at neighbors

Update function applied (asynchronously) in parallel until convergence

Many schedulers available to prioritize computation
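
A rough, runnable sketch of this vertex-program model in Python (the graph encoding, the scheduling queue, and the tolerance are assumptions for illustration, and unlike the simplified slide formula it uses the standard 1/out-degree weighting so the example actually converges; this is not GraphLab's real API):

    from collections import deque

    def pagerank_engine(in_nbrs, out_nbrs, alpha=0.15, tol=1e-6):
        # Apply the update function vertex by vertex; when a vertex's value
        # changes noticeably, selectively reschedule its out-neighbors.
        rank = {v: 1.0 for v in in_nbrs}
        queue, queued = deque(in_nbrs), set(in_nbrs)
        while queue:
            v = queue.popleft(); queued.discard(v)
            old = rank[v]
            rank[v] = alpha + (1 - alpha) * sum(rank[u] / len(out_nbrs[u])
                                                for u in in_nbrs[v])
            if abs(rank[v] - old) > tol:              # reschedule neighbors if needed
                for w in out_nbrs[v]:
                    if w not in queued:
                        queue.append(w); queued.add(w)
        return rank

    in_nbrs  = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}   # who links to me
    out_nbrs = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # whom I link to
    print(pagerank_engine(in_nbrs, out_nbrs))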

SLIDE 33

Distributed Scheduling

[Figure: the graph is split across two machines; each machine keeps a queue of the vertices it owns that are scheduled for updates.]

Each machine maintains a schedule over the vertices it owns


Distributed Consensus used to identify completion

SLIDE 34

Ensuring Race-Free Code

  • How much can computation overlap?


SLIDE 35

The GraphLab Framework

1. Graph-based data representation
2. Update functions (user computation)
3. Consistency model


SLIDE 36

PageRank Revisited


    Pagerank(scope) {
      …
      vertex.PageRank = α
      ForEach inPage:
        vertex.PageRank += (1 − α) × inPage.PageRank
      …
    }

SLIDE 37

PageRank data races confound convergence


SLIDE 38

Racing PageRank: Bug


    Pagerank(scope) {
      …
      vertex.PageRank = α
      ForEach inPage:
        vertex.PageRank += (1 − α) × inPage.PageRank
      …
    }

The update writes vertex.PageRank as it accumulates, so a neighbor reading it concurrently can observe a partially accumulated value.

SLIDE 39

Racing PageRank: Bug Fix


    Pagerank(scope) {
      …
      tmp = α
      ForEach inPage:
        tmp += (1 − α) × inPage.PageRank
      vertex.PageRank = tmp
      …
    }

Accumulate into a local tmp and publish the result with a single write to vertex.PageRank.

SLIDE 40

Throughput != Performance

No consistency → higher throughput (#updates/sec), but potentially slower convergence of the ML algorithm.

SLIDE 41

Serializability


For every parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: a parallel schedule on CPU 1 and CPU 2 and an equivalent sequential schedule on a single CPU, over time.]

SLIDE 42

Serializability Example


Edge consistency: overlapping regions are only read.

Update functions one vertex apart can be run in parallel.

Stronger / weaker consistency levels are available; user-tunable consistency levels trade off parallelism vs. consistency.

[Figure: two update scopes whose write regions (center vertex and incident edges) do not overlap; the vertex shared between them is only read.]
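
As a minimal illustration of the edge-consistency rule (the adjacency-dict representation is an assumption for illustration):

    def parallel_ok_under_edge_consistency(adj, u, v):
        # Under edge consistency an update on x writes x and its incident edges
        # and only reads x's neighbors. Two updates therefore conflict only if
        # their center vertices are equal or adjacent; centers one vertex apart
        # overlap only in a read region and may run in parallel.
        return u != v and v not in adj[u]

    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(parallel_ok_under_edge_consistency(adj, "a", "b"))  # False: adjacent
    print(parallel_ok_under_edge_consistency(adj, "a", "c"))  # True: one vertex apart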

SLIDE 43

Distributed Consistency

  • Solution 1: Chromatic Engine

– Edge Consistency via Graph Coloring

  • Solution 2: Distributed Locking
SLIDE 44

Chromatic Distributed Engine

[Timeline, per machine: execute tasks on all vertices of color 0 → ghost synchronization → completion + barrier → execute tasks on all vertices of color 1 → ghost synchronization → completion + barrier → …]

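A single-machine sketch of the chromatic engine's schedule (the greedy coloring and the update callback are illustrative; in the distributed engine each color phase ends with ghost synchronization and a barrier, as shown above):

    def greedy_color(adj):
        # Assign colors so that adjacent vertices never share a color.
        color = {}
        for v in adj:
            used = {color[u] for u in adj[v] if u in color}
            color[v] = next(c for c in range(len(adj) + 1) if c not in used)
        return color

    def chromatic_sweep(adj, update):
        color = greedy_color(adj)
        for c in sorted(set(color.values())):
            batch = [v for v in adj if color[v] == c]
            for v in batch:        # same-colored vertices are never adjacent, so
                update(v)          # they could all run in parallel without locks
            # ...ghost synchronization + completion barrier would go here...

    adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
    chromatic_sweep(adj, lambda v: print("update", v))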

SLIDE 45

Matrix Factorization

  • Netflix Collaborative Filtering

– Alternating Least Squares Matrix Factorization

Model: 0.5 million nodes, 99 million edges

[Figure: bipartite Netflix users–movies graph; each user and each movie is assigned a d-dimensional latent factor.]
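
For concreteness, a compact single-machine NumPy sketch of alternating least squares on such a bipartite ratings graph (the regularization constant, dimension d, and random initialization are illustrative choices, not the slide's exact setup):

    import numpy as np

    def als(ratings, n_users, n_movies, d=5, reg=0.1, iters=10):
        # ratings: list of (user, movie, value) edges of the bipartite graph.
        U = np.random.rand(n_users, d)
        M = np.random.rand(n_movies, d)
        for _ in range(iters):
            # Fix one side's factors; each vertex on the other side solves an
            # independent least-squares problem over its neighbors (graph-parallel).
            for F, G, idx in ((U, M, 0), (M, U, 1)):
                for i in range(F.shape[0]):
                    obs = [(e[1 - idx], e[2]) for e in ratings if e[idx] == i]
                    if obs:
                        A = np.array([G[j] for j, _ in obs])
                        b = np.array([r for _, r in obs])
                        F[i] = np.linalg.solve(A.T @ A + reg * np.eye(d), A.T @ b)
        return U, M

    ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 2.0)]
    U, M = als(ratings, n_users=2, n_movies=3, d=2)
    print(U @ M.T)    # reconstructed rating estimates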

SLIDE 46

Netflix Collaborative Filtering


[Plots: (left) speedup vs. number of machines (4 to 64) against ideal, for d = 5 (1.0M cycles), d = 20 (2.1M cycles), d = 50 (7.7M cycles), and d = 100 (30M cycles). (right) Runtime in seconds (log scale) vs. number of machines at d = 20 for Hadoop, MPI, and GraphLab.]

SLIDE 47

Distributed Consistency

  • Solution 1: Chromatic Engine

– Edge consistency via graph coloring
– Requires a graph coloring to be available
– Frequent barriers → inefficient when only some vertices are active

  • Solution 2: Distributed Locking
SLIDE 48

Distributed Locking

Edge Consistency can be guaranteed through locking.

[Figure: each vertex is protected by a reader/writer (RW) lock.]


SLIDE 49

Consistency Through Locking

Acquire a write-lock on the center vertex and read-locks on adjacent vertices.


Performance problem: Acquiring a lock from a neighboring machine incurs a latency penalty
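
A thread-based sketch of this locking discipline (plain mutexes stand in for the per-vertex reader/writer locks, and the class below is an illustration, not GraphLab's interface). Taking locks in a canonical order keeps overlapping scopes from deadlocking; in the distributed setting each acquire on a remote vertex costs a network round trip, which motivates the pipelining on the next slides:

    import threading

    class ScopeLocker:
        def __init__(self, vertices):
            self.locks = {v: threading.Lock() for v in vertices}

        def acquire(self, vertex, neighbors):
            # Scope of an update: the center vertex (write) plus its neighbors
            # (read). Locks are acquired in sorted order to avoid deadlock.
            scope = sorted({vertex, *neighbors})
            for v in scope:
                self.locks[v].acquire()     # a remote vertex would add latency here
            return scope

        def release(self, scope):
            for v in reversed(scope):
                self.locks[v].release()

    locker = ScopeLocker(["a", "b", "c", "d"])
    scope = locker.acquire("b", ["a", "c"])
    # ... run the update function on b's scope ...
    locker.release(scope)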

SLIDE 50

Simple locking

[Timeline: lock scope 1 → process request 1 → scope 1 acquired → update_function 1 → release scope 1 → process release 1.]

SLIDE 51

Pipelining hides latency

GraphLab Idea: Hide latency using pipelining

[Timeline: lock requests for scopes 1, 2, and 3 are issued back to back; while update_function 1 runs and scope 1 is released, requests 2 and 3 are already being processed, so update_function 2 starts as soon as scope 2 is acquired.]

SLIDE 52

Distributed Consistency

  • Solution 1: Chromatic Engine

– Edge consistency via graph coloring
– Requires a graph coloring to be available
– Frequent barriers → inefficient when only some vertices are active

  • Solution 2: Distributed Locking

– Residual BP on 190K-vertex/560K-edge graph, 4 machines
– No pipelining: 472 sec; with pipelining: 10 sec

SLIDE 53

How to handle machine failure?

  • What happens when machines fail? How do we provide fault tolerance?

  • Strawman scheme: synchronous snapshot checkpointing

  1. Stop the world
  2. Write each machine’s state to disk
SLIDE 54

Snapshot Performance

[Plot: vertices updated (×10⁸) vs. time elapsed (s) for no snapshot, synchronous snapshot, and asynchronous snapshot, with one slow machine; the synchronous snapshot stalls progress for the duration of the snapshot.]

How can we do better, leveraging GraphLab’s consistency mechanisms?


SLIDE 55

Chandy-Lamport checkpointing

Step 1. Atomically, one initiator: (a) turns red, (b) records its own state, (c) sends markers to its neighbors.

Step 2. On receiving a marker, a non-red node atomically: (a) turns red, (b) records its own state, (c) sends markers along all outgoing channels.

Assumes first-in, first-out (FIFO) channels between nodes.

Implemented within GraphLab as an Update Function
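
A bare-bones sketch of the marker rule above (the names and callback-style channels are illustrative; recording of in-flight channel messages, part of the full Chandy-Lamport algorithm, is omitted here for brevity):

    class SnapshotNode:
        def __init__(self, name, state, out_channels):
            self.name, self.state = name, state
            self.out_channels = out_channels   # FIFO channels to neighbors
            self.red = False                   # red = already snapshotted
            self.recorded_state = None

        def on_marker(self):
            if self.red:
                return                         # ignore duplicate markers
            self.red = True                    # (a) turn red
            self.recorded_state = self.state   # (b) record own state
            for send_marker in self.out_channels:
                send_marker()                  # (c) forward markers downstream

        def initiate_snapshot(self):
            self.on_marker()                   # the initiator runs the same steps

    # Tiny two-node example with callback "channels":
    b = SnapshotNode("b", state=42, out_channels=[])
    a = SnapshotNode("a", state=7, out_channels=[b.on_marker])
    a.initiate_snapshot()
    print(a.recorded_state, b.recorded_state)  # 7 42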

SLIDE 56

Async. Snapshot Performance

[Plot: vertices updated (×10⁸) vs. time elapsed (s) for no snapshot, synchronous snapshot, and asynchronous snapshot, with one slow machine.]

No system performance penalty incurred from the slow machine!

SLIDE 57

Summary

  • Two different methods of achieving consistency

– Graph coloring
– Distributed locking with pipelining

  • Efficient implementations
  • Asynchronous fault tolerance with fine-grained Chandy-Lamport snapshots


(Performance, usability, efficiency, scalability)

SLIDE 58

Sunday topic: Streaming Data Processing and Cluster Coordination
