Apache SystemML: Declarative Machine Learning


  1. Apache Big Data Seville 2016: Apache SystemML, Declarative Machine Learning. Luciano Resende, IBM Spark Technology Center

  2. About Me. Luciano Resende (lresende@apache.org)
      • Architect and community liaison at IBM Spark Technology Center
      • Contributing to open source at the ASF for over 10 years
      • Currently contributing to the Apache Bahir, Apache Spark, Apache Zeppelin, and Apache SystemML (incubating) projects
      @lresende1975 | lresende | http://lresende.blogspot.com/ | http://slideshare.net/luckbr1975 | https://www.linkedin.com/in/lresende

  3. Origins of the SystemML Project. 2007-2008: multiple projects at IBM Research – Almaden involving machine learning on Hadoop. 2009: a dedicated team for scalable ML was created. 2009-2010: through engagements with customers, we observed how data scientists create machine learning algorithms.

  4. State-of-the-Art: Small Data. [Diagram: Data → Data Scientist (R or Python) → Personal Computer → Results]

  5. State-of-the-Art: Big Data. [Diagram: Data → Data Scientist (R or Python) → Systems Programmer (Scala) → Results]

  6. State-of-the-Art: Big Data. [Same diagram: Data → Data Scientist (R or Python) → Systems Programmer (Scala) → Results] 😟 Days or weeks per iteration. 😟 Errors while translating algorithms.

  7. The SystemML Vision. [Diagram: Data → Data Scientist (R or Python) → SystemML → Results]

  8. The SystemML Vision. [Same diagram: Data → Data Scientist (R or Python) → SystemML → Results] 😄 Fast iteration. 😄 Same answer.

  9. Running Example: Alternating Least Squares. Problem: movie recommendations. [Diagram: a sparse Users × Movies matrix, where a nonzero cell (i, j) means "User i liked movie j", is factored into a Users factor and a Movies factor. Multiplying these two factors produces a less-sparse matrix; new nonzero values become movie suggestions.]
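
     To make the factorization concrete, here is a minimal NumPy sketch of the idea (the toy data and the plain alternating gradient steps are assumptions for illustration; the talk's actual script, shown on the next slide, uses a conjugate-gradient solver):

      import numpy as np

      # Toy ratings matrix: rows = users, columns = movies, 0 = not rated.
      X = np.array([[5, 0, 1],
                    [4, 0, 0],
                    [0, 1, 5]], dtype=float)
      W = (X != 0).astype(float)      # weight matrix: only observed entries count
      r, lam, step = 2, 0.05, 0.01    # rank, regularization, step size

      rng = np.random.default_rng(0)
      U = rng.uniform(-1.0, 1.0, (X.shape[0], r))   # Users factor
      V = rng.uniform(-1.0, 1.0, (r, X.shape[1]))   # Movies factor

      for _ in range(2000):
          # Step toward a better Users factor with the Movies factor held fixed.
          U -= step * ((W * (U @ V - X)) @ V.T + lam * U)
          # Step toward a better Movies factor with the Users factor held fixed.
          V -= step * (U.T @ (W * (U @ V - X)) + lam * V)

      # Previously-zero cells of U @ V now hold predicted ratings: the suggestions.
      print(np.round(U @ V, 1))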

  10. Alternating Least Squares (in R):

      U = rand(nrow(X), r, min = -1.0, max = 1.0);
      V = rand(r, ncol(X), min = -1.0, max = 1.0);
      while (i < mi) {
        i = i + 1; ii = 1;
        if (is_U)
          G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
        else
          G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
        norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
        R = -G; S = R;
        while (norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
          if (is_U) {
            HS = (W * (S %*% V)) %*% t(V) + lambda * S;
            alpha = norm_R2 / sum(S * HS);
            U = U + alpha * S;
          } else {
            HS = t(U) %*% (W * (U %*% S)) + lambda * S;
            alpha = norm_R2 / sum(S * HS);
            V = V + alpha * S;
          }
          R = R - alpha * HS;
          old_norm_R2 = norm_R2;
          norm_R2 = sum(R ^ 2);
          S = R + (norm_R2 / old_norm_R2) * S;
          ii = ii + 1;
        }
        is_U = ! is_U;
      }

  11. Alternating Least Squares (in R), annotated. The script above breaks down into four steps:
      1. Start with random factors (the two rand() calls).
      2. Hold the Movies factor constant and find the best value for the Users factor, i.e. the value that most closely approximates the original matrix (the is_U branch).
      3. Hold the Users factor constant and find the best value for the Movies factor (the else branch).
      4. Repeat steps 2-3 until convergence (the outer while loop).
      Every line has a clear purpose!

  12.-15. Alternating Least Squares (spark.ml). [Four slides of screenshots of the spark.ml ALS implementation source code]
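
     For contrast with those screenshots, calling spark.ml's ALS from Python takes only a few lines; the hundreds of lines of performance code live inside the library. A minimal sketch (the column names and toy ratings are made up for illustration):

      from pyspark.sql import SparkSession
      from pyspark.ml.recommendation import ALS

      spark = SparkSession.builder.appName("als-demo").getOrCreate()

      # Hypothetical (userId, movieId, rating) triples.
      ratings = spark.createDataFrame(
          [(0, 0, 5.0), (0, 2, 1.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
          ["userId", "movieId", "rating"])

      als = ALS(rank=10, maxIter=10, regParam=0.1,
                userCol="userId", itemCol="movieId", ratingCol="rating")
      model = als.fit(ratings)
      model.transform(ratings).show()   # predicted ratings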

  16. 25 lines’ worth of algorithm… …mixed with 800 lines of performance code

  17. Alternating Least Squares (in R). [The same 25-line script from slide 10, shown again]

  18. Alternating Least Squares (in SystemML's subset of R). The same script, unchanged: SystemML can compile and run this algorithm at scale. No additional performance code needed!
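
     As a sketch of what "compile and run at scale" means in practice, a DML script (the R-like code above) can be executed from PySpark through SystemML's MLContext API. This assumes the systemml Python package and the MLContext/dml helpers roughly as documented for SystemML; treat the exact signatures as approximate:

      from pyspark.sql import SparkSession
      from systemml import MLContext, dml   # assumed: Apache SystemML's Python API

      spark = SparkSession.builder.appName("systemml-demo").getOrCreate()
      ml = MLContext(spark.sparkContext)

      # A tiny DataFrame of doubles as a stand-in for a large ratings matrix X.
      X = spark.createDataFrame([(5.0, 0.0, 1.0), (4.0, 0.0, 0.0), (0.0, 1.0, 5.0)])

      # A trivial DML stand-in for the full ALS script; note the R-like syntax.
      src = """
      U = rand(rows=nrow(X), cols=r, min=-1.0, max=1.0);
      V = rand(rows=r, cols=ncol(X), min=-1.0, max=1.0);
      """
      script = dml(src).input(X=X, r=2).output("U", "V")
      U, V = ml.execute(script).get("U", "V")   # SystemML picks local vs. distributed plans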

  19. How fast does it run? Running time comparisons between machine learning algorithms are problematic:
      • Different, equally valid answers
      • Different convergence rates on different data
      • But we'll do one anyway

  20. Performance Comparison: ALS. [Bar chart: running time in seconds (0 to 20000) for R, MLlib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs; two configurations exceed 24 hours and two fail with out-of-memory (OOM) errors.]
      Details: synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
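
     The synthetic-data recipe from the details line can be approximated as follows (a NumPy sketch at toy scale; the noise scale is an assumption, and the benchmark used 10^5 products and up to 10^7 users):

      import numpy as np

      rng = np.random.default_rng(42)
      n_products, n_users, rank, sparsity = 1000, 1000, 50, 0.01

      # Multiply two rank-50 matrices of normally-distributed data...
      dense = rng.normal(size=(n_products, rank)) @ rng.normal(size=(rank, n_users))

      # ...sample from the resulting product, then add Gaussian noise.
      mask = rng.random((n_products, n_users)) < sparsity
      X = np.where(mask, dense + rng.normal(scale=0.1, size=dense.shape), 0.0)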

  21. Takeaway Points. SystemML runs the R script in parallel:
      • Same answer as the original R script
      • Performance comparable to a low-level RDD-based implementation
      How does SystemML achieve this result?

  22. The SystemML Runtime for Spark. Automates critical performance decisions:
      • Distributed or local computation?
      • How to partition the data?
      • To persist or not to persist?
      [Architecture diagram: high-level language front-ends → High-Level Operations (HOPs), a general representation of statements in the data analysis language → cost-based optimizer → Low-Level Operations (LOPs), a general representation of operations in the runtime framework → multiple execution environments]
      Distributed vs. local: a hybrid runtime
      • Multithreaded computation in the Spark Driver
      • Distributed computation in Spark Executors
      • The optimizer makes a cost-based choice
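
     To see these decisions, SystemML can print its compiled plan and runtime statistics. Continuing the MLContext sketch from slide 18 (the setExplain/setStatistics method names are assumptions based on the MLContext API docs; verify against your SystemML version):

      ml.setExplain(True)      # print the optimizer's HOP/LOP execution plan
      ml.setStatistics(True)   # print runtime statistics after execution
      ml.execute(script)       # same script object as in the earlier sketch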

  23. But wait, there's more!
      • Many other rewrites
      • Cost-based selection of physical operators
      • Dynamic recompilation for accurate stats
      • Parallel FOR (ParFor) optimizer
      • Direct operations on RDD partitions
      • YARN and MapReduce support

  24. Summary. Cost-based compilation of machine learning algorithms generates execution plans:
      • for single-node in-memory, cluster, and hybrid execution
      • for varying data characteristics: number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
      • for varying cluster characteristics (memory configurations, degree of parallelism)
      Out-of-the-box, scalable machine learning algorithms: e.g. descriptive statistics, regression, clustering, and classification.
      "Roll-your-own" algorithms:
      • Programmer productivity (no need to worry about scalability, numeric stability, or optimizations)
      • Fast turn-around for new algorithms
      A higher-level language shields the algorithm-development investment from platform progression:
      • YARN for resource negotiation and elasticity
      • Spark for in-memory, iterative processing
