Apache SystemML: Declarative Machine Learning


  1. Apache Big Data Seville 2016: Apache SystemML, Declarative Machine Learning. Luciano Resende, IBM Spark Technology Center

  2. About Me. Luciano Resende (lresende@apache.org)
      • Architect and community liaison at IBM Spark Technology Center
      • Contributing to open source at the ASF for over 10 years
      • Currently contributing to the Apache Bahir, Apache Spark, Apache Zeppelin, and Apache SystemML (incubating) projects
      @lresende1975 | lresende | http://lresende.blogspot.com/ | http://slideshare.net/luckbr1975 | https://www.linkedin.com/in/lresende

  3. Origins of the SystemML Project. 2007-2008: multiple projects at IBM Research – Almaden involving machine learning on Hadoop. 2009: a dedicated team for scalable ML was created. 2009-2010: through engagements with customers, we observed how data scientists create machine learning algorithms.

  4. State-of-the-Art: Small Data. [Diagram: Data → Data Scientist (R or Python) → Personal Computer → Results]

  5. State-of-the-Art: Big Data. [Diagram: Data → Data Scientist (R or Python) → Systems Programmer (Scala) → Results]

  6. State-of-the-Art: Big Data. [Same diagram: Data → Data Scientist (R or Python) → Systems Programmer (Scala) → Results] 😟 Days or weeks per iteration. 😟 Errors while translating algorithms.

  7. The SystemML Vision. [Diagram: Data → Data Scientist (R or Python) → SystemML → Results]

  8. The SystemML Vision. [Same diagram: Data → Data Scientist (R or Python) → SystemML → Results] 😄 Fast iteration. 😄 Same answer.

  9. Running Example: Alternating Least Squares. Problem: movie recommendations. [Diagram: a sparse Users × Movies matrix, where a nonzero cell (i, j) means "User i liked movie j", is factored into a Users factor and a Movies factor. Multiplying these two factors produces a less-sparse matrix; new nonzero values become movie suggestions.]
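
     To make the factorization concrete, here is a minimal NumPy sketch of the idea (the toy data and the plain alternating gradient steps are assumptions for illustration; the talk's actual script, shown on the next slide, uses a conjugate-gradient solver):

      import numpy as np

      # Toy ratings matrix: rows = users, columns = movies, 0 = not rated.
      X = np.array([[5, 0, 1],
                    [4, 0, 0],
                    [0, 1, 5]], dtype=float)
      W = (X != 0).astype(float)      # weight matrix: only observed entries count
      r, lam, step = 2, 0.05, 0.01    # rank, regularization, step size

      rng = np.random.default_rng(0)
      U = rng.uniform(-1.0, 1.0, (X.shape[0], r))   # Users factor
      V = rng.uniform(-1.0, 1.0, (r, X.shape[1]))   # Movies factor

      for _ in range(2000):
          # Step toward a better Users factor with the Movies factor held fixed.
          U -= step * ((W * (U @ V - X)) @ V.T + lam * U)
          # Step toward a better Movies factor with the Users factor held fixed.
          V -= step * (U.T @ (W * (U @ V - X)) + lam * V)

      # Previously-zero cells of U @ V now hold predicted ratings: the suggestions.
      print(np.round(U @ V, 1))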

  10. Alternating Least Squares (in R):

      U = rand(nrow(X), r, min = -1.0, max = 1.0);
      V = rand(r, ncol(X), min = -1.0, max = 1.0);
      while (i < mi) {
        i = i + 1; ii = 1;
        if (is_U)
          G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
        else
          G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
        norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
        R = -G; S = R;
        while (norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
          if (is_U) {
            HS = (W * (S %*% V)) %*% t(V) + lambda * S;
            alpha = norm_R2 / sum(S * HS);
            U = U + alpha * S;
          } else {
            HS = t(U) %*% (W * (U %*% S)) + lambda * S;
            alpha = norm_R2 / sum(S * HS);
            V = V + alpha * S;
          }
          R = R - alpha * HS;
          old_norm_R2 = norm_R2;
          norm_R2 = sum(R ^ 2);
          S = R + (norm_R2 / old_norm_R2) * S;
          ii = ii + 1;
        }
        is_U = ! is_U;
      }

  11. Alternating Least Squares (in R), annotated. The script above breaks down into four steps:
      1. Start with random factors (the two rand() calls).
      2. Hold the Movies factor constant and find the best value for the Users factor, i.e. the value that most closely approximates the original matrix (the is_U branch).
      3. Hold the Users factor constant and find the best value for the Movies factor (the else branch).
      4. Repeat steps 2-3 until convergence (the outer while loop).
      Every line has a clear purpose!

  12.-15. Alternating Least Squares (spark.ml). [Four slides of screenshots of the spark.ml ALS implementation source code]
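
     For contrast with those screenshots, calling spark.ml's ALS from Python takes only a few lines; the hundreds of lines of performance code live inside the library. A minimal sketch (the column names and toy ratings are made up for illustration):

      from pyspark.sql import SparkSession
      from pyspark.ml.recommendation import ALS

      spark = SparkSession.builder.appName("als-demo").getOrCreate()

      # Hypothetical (userId, movieId, rating) triples.
      ratings = spark.createDataFrame(
          [(0, 0, 5.0), (0, 2, 1.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
          ["userId", "movieId", "rating"])

      als = ALS(rank=10, maxIter=10, regParam=0.1,
                userCol="userId", itemCol="movieId", ratingCol="rating")
      model = als.fit(ratings)
      model.transform(ratings).show()   # predicted ratings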

  16. 25 lines’ worth of algorithm… …mixed with 800 lines of performance code

  17. Alternating Least Squares (in R). [The same 25-line script from slide 10, shown again]

  18. Alternating Least Squares (in SystemML's subset of R). The same script, unchanged: SystemML can compile and run this algorithm at scale. No additional performance code needed!
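
     As a sketch of what "compile and run at scale" means in practice, a DML script (the R-like code above) can be executed from PySpark through SystemML's MLContext API. This assumes the systemml Python package and the MLContext/dml helpers roughly as documented for SystemML; treat the exact signatures as approximate:

      from pyspark.sql import SparkSession
      from systemml import MLContext, dml   # assumed: Apache SystemML's Python API

      spark = SparkSession.builder.appName("systemml-demo").getOrCreate()
      ml = MLContext(spark.sparkContext)

      # A tiny DataFrame of doubles as a stand-in for a large ratings matrix X.
      X = spark.createDataFrame([(5.0, 0.0, 1.0), (4.0, 0.0, 0.0), (0.0, 1.0, 5.0)])

      # A trivial DML stand-in for the full ALS script; note the R-like syntax.
      src = """
      U = rand(rows=nrow(X), cols=r, min=-1.0, max=1.0);
      V = rand(rows=r, cols=ncol(X), min=-1.0, max=1.0);
      """
      script = dml(src).input(X=X, r=2).output("U", "V")
      U, V = ml.execute(script).get("U", "V")   # SystemML picks local vs. distributed plans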

  19. How fast does it run? Running time comparisons between machine learning algorithms are problematic:
      • Different, equally valid answers
      • Different convergence rates on different data
      • But we'll do one anyway

  20. Performance Comparison: ALS. [Bar chart: running time in seconds (0 to 20000) for R, MLlib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs; two configurations exceed 24 hours and two fail with out-of-memory (OOM) errors.]
      Details: synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
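
     The synthetic-data recipe from the details line can be approximated as follows (a NumPy sketch at toy scale; the noise scale is an assumption, and the benchmark used 10^5 products and up to 10^7 users):

      import numpy as np

      rng = np.random.default_rng(42)
      n_products, n_users, rank, sparsity = 1000, 1000, 50, 0.01

      # Multiply two rank-50 matrices of normally-distributed data...
      dense = rng.normal(size=(n_products, rank)) @ rng.normal(size=(rank, n_users))

      # ...sample from the resulting product, then add Gaussian noise.
      mask = rng.random((n_products, n_users)) < sparsity
      X = np.where(mask, dense + rng.normal(scale=0.1, size=dense.shape), 0.0)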

  21. Takeaway Points. SystemML runs the R script in parallel:
      • Same answer as the original R script
      • Performance comparable to a low-level RDD-based implementation
      How does SystemML achieve this result?

  22. The SystemML Runtime for Spark. Automates critical performance decisions:
      • Distributed or local computation?
      • How to partition the data?
      • To persist or not to persist?
      [Architecture diagram: high-level language front-ends → High-Level Operations (HOPs), a general representation of statements in the data analysis language → cost-based optimizer → Low-Level Operations (LOPs), a general representation of operations in the runtime framework → multiple execution environments]
      Distributed vs. local: a hybrid runtime
      • Multithreaded computation in the Spark Driver
      • Distributed computation in Spark Executors
      • The optimizer makes a cost-based choice
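
     To see these decisions, SystemML can print its compiled plan and runtime statistics. Continuing the MLContext sketch from slide 18 (the setExplain/setStatistics method names are assumptions based on the MLContext API docs; verify against your SystemML version):

      ml.setExplain(True)      # print the optimizer's HOP/LOP execution plan
      ml.setStatistics(True)   # print runtime statistics after execution
      ml.execute(script)       # same script object as in the earlier sketch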

  23. But wait, there's more!
      • Many other rewrites
      • Cost-based selection of physical operators
      • Dynamic recompilation for accurate stats
      • Parallel FOR (ParFor) optimizer
      • Direct operations on RDD partitions
      • YARN and MapReduce support

  24. Summary. Cost-based compilation of machine learning algorithms generates execution plans:
      • for single-node in-memory, cluster, and hybrid execution
      • for varying data characteristics: number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
      • for varying cluster characteristics (memory configurations, degree of parallelism)
      Out-of-the-box, scalable machine learning algorithms: e.g. descriptive statistics, regression, clustering, and classification.
      "Roll-your-own" algorithms:
      • Programmer productivity (no need to worry about scalability, numeric stability, or optimizations)
      • Fast turn-around for new algorithms
      A higher-level language shields the algorithm-development investment from platform progression:
      • YARN for resource negotiation and elasticity
      • Spark for in-memory, iterative processing
