SystemML: Declarative Machine Learning on Spark

SLIDE 1

SystemML: Declarative Machine Learning on Spark

Presented by: Juan Carrillo
Candidate for MASc. in Computer Software
Department of Electrical & Computer Engineering
University of Waterloo

05/03/19

SLIDE 2

Agenda

  • 1. Introduction
  • 2. SystemML core features
  • 3. Experiments
  • 4. Conclusions
  • 5. Discussion

SLIDE 3

1. Introduction

SLIDE 4

Machine Learning for Big Data Analytics

SLIDE 5

The problem, and the SystemML approach

Usual workflow

  • Time consuming
  • Error prone

SystemML approach (DML)

  • Accelerates model development
  • Simplifies deployment

Source: Spark Summit. Inside Apache SystemML

SLIDE 6

SystemML background

  • 2010: Creation (by researchers at the IBM Almaden Research Center)
  • 2015: Open-source (Spark Summit in San Francisco)
  • 2017: Top Level Project (Apache Software Foundation Board)
  • 2018: Current release 1.2 (deep learning functions, ultra-sparse data)

SLIDE 7

2. SystemML core features

SLIDE 8

Optimizer integration

Source: Spark Summit. Inside Apache SystemML

SLIDE 9

Optimizer integration

Source: Spark Summit. Inside Apache SystemML

SLIDE 10

Optimizer integration

Source: Spark Summit. Inside Apache SystemML

SLIDE 11

Runtime integration

  • Distributed Matrix Representation
  • Buffer Pool Integration
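
As a rough illustration of the distributed matrix representation named on this slide: a large matrix can be cut into fixed-size blocks, each keyed by its (row-block, column-block) index, so that a key-value collection such as a Spark pair RDD can hold it. The sketch below is plain Python with no Spark dependency. The names `to_blocks` and `from_blocks`, the dense list-of-lists tiles, and the tiny block size are assumptions made for illustration, not SystemML's API.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
BlockKey = Tuple[int, int]  # (row-block index, column-block index), 0-based in this sketch


def to_blocks(matrix: Matrix, block_size: int) -> Dict[BlockKey, Matrix]:
    """Split a dense matrix into tiles of at most block_size x block_size, keyed by block index."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    blocks: Dict[BlockKey, Matrix] = {}
    for r0 in range(0, n_rows, block_size):
        for c0 in range(0, n_cols, block_size):
            tile = [row[c0:c0 + block_size] for row in matrix[r0:r0 + block_size]]
            blocks[(r0 // block_size, c0 // block_size)] = tile
    return blocks


def from_blocks(blocks: Dict[BlockKey, Matrix], block_size: int) -> Matrix:
    """Reassemble the full matrix from its keyed tiles (inverse of to_blocks)."""
    if not blocks:
        return []
    # Row count: heights of the first block column; column count: widths of the first block row.
    n_rows = sum(len(t) for (bi, bj), t in blocks.items() if bj == 0)
    n_cols = sum(len(t[0]) for (bi, bj), t in blocks.items() if bi == 0)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (bi, bj), tile in blocks.items():
        for i, row in enumerate(tile):
            for j, value in enumerate(row):
                out[bi * block_size + i][bj * block_size + j] = value
    return out


# A 3 x 5 matrix cut into 2 x 2 tiles; edge tiles are smaller than the block size.
matrix = [[float(10 * i + j) for j in range(5)] for i in range(3)]
blocks = to_blocks(matrix, block_size=2)
restored = from_blocks(blocks, block_size=2)
```

Keying tiles by block index is what lets block-wise operators (for example a blocked matrix multiply) find the tiles they need by key rather than by scanning the whole matrix.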

SLIDE 12

Runtime integration

Specific Runtime Optimizations

  • Lazy Spark-Context Creation
  • Short-Circuit Read
  • Short-Circuit Collect

Dynamic recompilation

  • Adapt the runtime plan to changing or initially unknown data characteristics
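
One way to picture dynamic recompilation is as a plan choice that is revisited once sizes become known. The sketch below assumes a deliberately simple rule: an operation runs locally if a dense-size estimate of its input fits a memory budget, and falls back to the distributed backend when dimensions are unknown. `MatrixStats`, `estimate_bytes`, `choose_backend`, and the budget value are hypothetical names and numbers for illustration, not SystemML's actual cost model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatrixStats:
    rows: Optional[int]  # None means "unknown at compile time"
    cols: Optional[int]


def estimate_bytes(stats: MatrixStats) -> Optional[float]:
    """Dense estimate at 8 bytes per cell; None when a dimension is unknown."""
    if stats.rows is None or stats.cols is None:
        return None
    return 8.0 * stats.rows * stats.cols


def choose_backend(stats: MatrixStats, memory_budget_bytes: float) -> str:
    """Pick an execution backend for one operation (assumed rule, see lead-in)."""
    size = estimate_bytes(stats)
    if size is None:
        # Unknown size: stay conservative and plan for distributed execution.
        return "distributed"
    return "local" if size <= memory_budget_bytes else "distributed"


BUDGET = 1_000_000.0  # hypothetical memory budget in bytes

# Initial compilation: the intermediate's size is not known yet.
initial_plan = choose_backend(MatrixStats(rows=None, cols=None), BUDGET)

# At runtime the intermediate turns out to be 1000 x 10, so the plan is recompiled.
recompiled_plan = choose_backend(MatrixStats(rows=1000, cols=10), BUDGET)

# A larger intermediate (1,000,000 x 10) would still exceed the budget.
large_plan = choose_backend(MatrixStats(rows=1_000_000, cols=10), BUDGET)
```

Here the initial plan is distributed, the recompiled plan is local because the 80,000-byte estimate fits the budget, and the large case stays distributed.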

Partitioning Operations

  • Partitioning-Preserving Operations
  • Partitioning-Exploiting Operations
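
The two kinds of partitioning operations listed above can be sketched without Spark. In this toy model, a value-only transformation keeps keys in place and so preserves the partitioning, while a transformation that may rewrite keys drops that guarantee; a join then exploits the partitioning by working partition by partition when both inputs are co-partitioned, and repartitions (a "shuffle") otherwise. The class `PartitionedBlocks`, the toy partitioner, and `join_add` are hypothetical names for illustration. The analogy to Spark is that `mapValues` keeps an RDD's partitioner while `map` does not.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # block index (row-block, column-block)


def partition_of(key: Key, num_partitions: int) -> int:
    """Toy deterministic partitioner (an assumption, not Spark's HashPartitioner)."""
    return (31 * key[0] + key[1]) % num_partitions


class PartitionedBlocks:
    def __init__(self, partitions: List[Dict[Key, float]], key_partitioned: bool):
        self.partitions = partitions
        self.key_partitioned = key_partitioned  # True if placement follows partition_of

    @classmethod
    def from_items(cls, items: Dict[Key, float], num_partitions: int) -> "PartitionedBlocks":
        parts: List[Dict[Key, float]] = [{} for _ in range(num_partitions)]
        for key, value in items.items():
            parts[partition_of(key, num_partitions)][key] = value
        return cls(parts, key_partitioned=True)

    def map_values(self, fn: Callable[[float], float]) -> "PartitionedBlocks":
        """Partitioning-preserving: keys are untouched, so placement stays valid."""
        parts = [{k: fn(v) for k, v in p.items()} for p in self.partitions]
        return PartitionedBlocks(parts, self.key_partitioned)

    def map_items(self, fn: Callable[[Key, float], Tuple[Key, float]]) -> "PartitionedBlocks":
        """Not preserving: keys may change, so the placement guarantee is dropped."""
        parts = [dict(fn(k, v) for k, v in p.items()) for p in self.partitions]
        return PartitionedBlocks(parts, key_partitioned=False)

    def collect(self) -> Dict[Key, float]:
        return {k: v for p in self.partitions for k, v in p.items()}


def join_add(a: PartitionedBlocks, b: PartitionedBlocks) -> Tuple[PartitionedBlocks, bool]:
    """Partitioning-exploiting join that adds values with equal keys.

    Returns the result and a flag saying whether a repartition (shuffle) was needed.
    """
    n = len(a.partitions)
    shuffled = False
    if not (a.key_partitioned and b.key_partitioned and len(b.partitions) == n):
        shuffled = True
        a = PartitionedBlocks.from_items(a.collect(), n)
        b = PartitionedBlocks.from_items(b.collect(), n)
    parts = [{k: v + pb[k] for k, v in pa.items() if k in pb}
             for pa, pb in zip(a.partitions, b.partitions)]
    return PartitionedBlocks(parts, key_partitioned=True), shuffled


x = PartitionedBlocks.from_items({(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0}, num_partitions=2)

y = x.map_values(lambda v: v * 10)            # keeps the partitioning
sum_xy, shuffled_preserved = join_add(x, y)   # joins in place, no shuffle

z = x.map_items(lambda k, v: (k, v * 10))     # same keys, but the guarantee is lost
sum_xz, shuffled_lost = join_add(x, z)        # has to repartition first
```

Both joins produce the same values; the difference is only whether the second input had to be repartitioned before the per-partition join.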

SLIDE 13

3. Experiments

SLIDE 14

End-to-End Performance

SLIDE 15

Runtime per Iteration

SLIDE 16

4. Conclusions

SLIDE 17

Takeaways and paper contributions

✓ Importance of DML as a high-level language to improve interoperability and scalability of Machine Learning models on Spark

✓ Multiple layers of abstraction and optimizations make SystemML a powerful tool for accelerating the development of Machine Learning models over Big Data

✓ Experimental evaluation on multiple ML models and datasets

SLIDE 18

Thanks for your attention

SLIDE 19

5. Discussion

SLIDE 20
Research

  • 1. Optimizer. How can ML models be optimized over data streams?
  • 2. Runtime. In dynamic recompilation, which data characteristics could be unknown?
  • 3. Experiments. How might SystemML perform for the KNN algorithm?

Industry

  • 5. Current capabilities compared to other tools such as NumPy, scikit-learn, or TensorFlow?
  • 6. Adoption in the current ML and Big Data user base?
  • 7. SystemML in cloud computing infrastructure. Beyond IBM?