Health Cloud Project Integrated Media Systems Center University of - - PowerPoint PPT Presentation




SLIDE 1

Integrated Media Systems Center
University of Southern California
Dimitrios Stripelis
stripeli@usc.edu

Health Cloud Project

SLIDE 2

  • Compute Machine Learning models from independent Spark clusters
  • Combine partial models to construct a unified ML model

Purpose

SLIDE 3

Framework Schematically

  • One main Portal for submitting requests
  • Independent Spark clusters, each residing on a remote hospital network

SLIDE 4

  • The user accesses the Portal (Server 1) and requests the construction of an ML model from each remote Spark cluster
  • Each cluster receives the request and computes the model through Spark MLlib
  • Once computation finishes, every model, along with algorithm-specific auxiliary data, is returned to the Portal in JSON format for unification

Framework Operations
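
The slides do not show the layout of the JSON that each cluster returns, so the following Python sketch only illustrates the kind of round trip described above. Every field name (`server`, `algorithm`, `model`, `auxiliary`) and every value is hypothetical.

```python
import json

def build_payload(server, algorithm, model, auxiliary):
    """Cluster side: serialize one result. All field names are assumptions."""
    return json.dumps({
        "server": server,
        "algorithm": algorithm,
        "model": model,          # e.g. weights or log-probabilities
        "auxiliary": auxiliary,  # algorithm-specific extras, e.g. row counts
    })

def parse_payload(text):
    """Portal side: decode a payload and check the expected keys are present."""
    payload = json.loads(text)
    missing = {"server", "algorithm", "model", "auxiliary"} - payload.keys()
    if missing:
        raise ValueError(f"payload is missing fields: {sorted(missing)}")
    return payload

# Made-up model values, for illustration only.
text = build_payload(
    "instance-trans3.cloudapp.net",
    "LinearRegression",
    {"weights": [0.5, -1.25], "intercept": 0.0},
    {"num_training_rows": 1000},
)
result = parse_payload(text)
```

Validating the keys on the Portal side is one way to fail early if a cluster returns an incomplete model before unification starts.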

SLIDE 5

Currently the Framework supports two principal algorithms:

  • Naive Bayes
  • Linear Regression with Stochastic Gradient Descent (SGD)

Extensible for:

  • classification & regression: SVM, decision trees
  • collaborative filtering: alternating least squares (ALS)
  • clustering: k-means, Gaussian Mixture, Latent Dirichlet Allocation (LDA)
  • optimization: limited-memory BFGS (L-BFGS)

ML Algorithms

SLIDE 6

We evaluated the Framework’s efficiency on medical datasets available at the UCI Machine Learning Repository. The datasets were related to:

  • Single Photon Emission Computed Tomography (SPECT) images
  • Diabetes 130
  • Parkinsons Telemonitoring Data Set

Datasets

SLIDE 7

We developed the Health Cloud Framework’s infrastructure on Microsoft Azure, on three type D1 servers.

Portal Role

We use the main Portal (server 1) to submit the machine learning computation requests to each remote server (servers 2, 3) by passing the following arguments:

  1. Accessible external hostname of each server
  2. Name of the Machine Learning algorithm to be computed
  3. Path to the training data file on the remote server
  4. Path to the testing data file on the remote server
  5. Algorithm-specific parameters for model computation

Implementation
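
As a rough illustration of the five arguments, the Python sketch below groups them into one request object and flattens it into command-line tokens. The class name `ModelRequest` and its field names are hypothetical, and the flag spellings (`--server`, `--training-file`, and so on) are assumed from the `ml_cluster_exec.sh` call on the Real Execution slide.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRequest:
    """One Portal request to a remote server (hypothetical structure)."""
    hostname: str
    algorithm: str
    training_file: str
    testing_file: str
    parameters: dict = field(default_factory=dict)

    def to_args(self):
        """Flatten the request into command-line tokens (assumed flag names)."""
        args = [
            "--server", self.hostname,
            "--algorithm", self.algorithm,
            "--training-file", self.training_file,
            "--testing-file", self.testing_file,
        ]
        if self.parameters:
            args.append("--parameters")
            args.extend(f"{key}={value}" for key, value in self.parameters.items())
        return args

request = ModelRequest(
    hostname="instance-trans2.cloudapp.net",
    algorithm="NaiveBayes",
    training_file="/u01/health_data/2servers_data/SPECT.train.part1.csv",
    testing_file="/u01/health_data/SPECT.test.csv",
    parameters={"type": "bernoulli", "smoothing": 0.01},
)
args = request.to_args()
```

Keeping one object per server makes it straightforward to concatenate several `to_args()` results into a single multi-server call.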

SLIDE 8

  • After submitting the request to the Framework, we initialize a Spark cluster on each server, i.e. a single Master and a single Worker on each machine, and we execute the appropriate jar file for the Machine Learning algorithm (currently NaiveBayes or LinearRegression) we need to compute.
  • Synchronous Execution: once the model is computed, the JSON file is constructed and sent to the main Portal. We then terminate the Spark cluster on that server and proceed with the computation of the ML model on the second machine.

Implementation
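
The synchronous flow described above can be sketched in Python as a plain loop. The three hooks (`start_cluster`, `run_job`, `stop_cluster`) are hypothetical stand-ins for whatever actually starts Spark, submits the jar, and shuts the cluster down; the slides do not show that code.

```python
def run_synchronously(requests, start_cluster, run_job, stop_cluster):
    """Process servers one at a time: start, compute, collect JSON, stop."""
    results = []
    for request in requests:
        start_cluster(request["server"])
        try:
            results.append(run_job(request))  # JSON text for this model
        finally:
            stop_cluster(request["server"])   # always release the cluster
    return results

# Stub hooks that only record the order of operations.
log = []

def fake_run_job(request):
    log.append(("run", request["server"]))
    return '{"algorithm": "%s"}' % request["algorithm"]

results = run_synchronously(
    [{"server": "server2", "algorithm": "NaiveBayes"},
     {"server": "server3", "algorithm": "LinearRegression"}],
    start_cluster=lambda server: log.append(("start", server)),
    run_job=fake_run_job,
    stop_cluster=lambda server: log.append(("stop", server)),
)
```

The second server is not started until the first has been stopped, which matches the one-machine-after-another behaviour described on the slide.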

SLIDE 9

  • One of the main contributions of the Framework is that we can configure the computation of an ML algorithm separately on each server, using different training and testing datasets, and experiment with the algorithm-specific parameters so that we can optimize the requested results without transferring any data between the servers.
  • Furthermore, this implementation gives us the flexibility to combine the same or even different Machine Learning models, produced from dissimilar datasets and domains, into a unified model. This can in turn lead to a more generic ML model with almost the same accuracy as the initial models.

Significance
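
The slides do not state how partial models are unified. As one simple possibility, and only for linear models trained on the same feature set, the Python sketch below averages the weights of each model in proportion to its number of training rows. The dictionary keys (`weights`, `intercept`, `n`) and the numbers are hypothetical, and this is not presented as the Framework's actual method.

```python
def average_linear_models(models):
    """Combine linear models by a sample-weighted average of their weights.

    Each model is a dict with assumed keys 'weights', 'intercept' and 'n'
    (number of training rows). All models must share the same features.
    """
    total = sum(m["n"] for m in models)
    if total <= 0:
        raise ValueError("models must report a positive number of rows")
    dim = len(models[0]["weights"])
    if any(len(m["weights"]) != dim for m in models):
        raise ValueError("all models must have the same number of weights")
    weights = [
        sum(m["weights"][j] * m["n"] for m in models) / total
        for j in range(dim)
    ]
    intercept = sum(m["intercept"] * m["n"] for m in models) / total
    return {"weights": weights, "intercept": intercept, "n": total}

# Made-up partial models from two servers.
unified = average_linear_models([
    {"weights": [1.0, 2.0], "intercept": 0.0, "n": 100},
    {"weights": [3.0, 4.0], "intercept": 2.0, "n": 300},
])
```

Only model parameters and row counts are exchanged here, so no training data has to leave a server.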

SLIDE 10

Real Execution

Server 2 - NaiveBayes parameters:

  • Type: Bernoulli (the dataset consisted of 0s and 1s)
  • Additive Smoothing: 0.01

Server 3 - LinearRegression parameters:

  • Number of Iterations: 3
  • Step Size of Gradient Descent: 3.0

We call the following script from the Portal (Server 1) for the NaiveBayes and LinearRegression computations and receive the subsequent JSON files:

./ml_cluster_exec.sh --server instance-trans2.cloudapp.net --algorithm NaiveBayes \
    --training-file /u01/health_data/2servers_data/SPECT.train.part1.csv \
    --testing-file /u01/health_data/SPECT.test.csv \
    --parameters type=bernoulli smoothing=0.01 \
    --server instance-trans3.cloudapp.net --algorithm LinearRegression \
    --training-file /u01/health_data/2servers_data/parkinsons_updrs.data.part1.csv \
    --testing-file /u01/health_data/2servers_data/parkinsons_updrs.data.test.csv \
    --parameters iterations=3 stepsize=3
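
To show what the `smoothing=0.01` parameter does for a Bernoulli Naive Bayes model, here is a minimal Python sketch of the standard additive-smoothing estimate for binary features. It is a textbook formula written out by hand, not the Framework's or Spark's code, and the counts are made up.

```python
import math

def bernoulli_log_likelihoods(feature_counts, class_count, smoothing):
    """Smoothed log P(feature = 1 | class) for binary features.

    feature_counts[j] is how many rows of this class have feature j set to 1;
    class_count is the number of rows in the class. The '2 * smoothing' term
    accounts for the two possible values (0 and 1) of each feature.
    """
    denominator = class_count + 2.0 * smoothing
    return [math.log((count + smoothing) / denominator) for count in feature_counts]

# Made-up counts: 10 rows in the class; feature 0 never set, feature 1 always set.
logs = bernoulli_log_likelihoods([0, 10], class_count=10, smoothing=0.01)
```

Without smoothing, the feature that is never set would give a probability of zero and an undefined logarithm; a small positive value keeps every estimate finite.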

SLIDE 11

Development

  • Distribute requests and retrieve results asynchronously
  • Extend the Health Cloud Framework to support the full spectrum of Spark MLlib algorithms

Research Oriented

  • Based on current experimental features, continue exploring novel ML models by combining information derived from intermediate ones

Future Work
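
As a sketch of what asynchronous distribution could look like, the Python example below submits all requests at once with the standard-library `concurrent.futures` module and gathers results as they complete. The `run_job` callable and the request dictionaries are hypothetical; the slides do not describe a design for this planned feature.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_asynchronously(requests, run_job, max_workers=4):
    """Submit every request at once and collect results as they finish.

    run_job is an assumed callable that blocks until one server has
    returned its model; results are keyed by server name.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_job, r): r["server"] for r in requests}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

def fake_job(request):
    # Stand-in for a remote computation.
    return {"algorithm": request["algorithm"]}

results = run_asynchronously(
    [{"server": "server2", "algorithm": "NaiveBayes"},
     {"server": "server3", "algorithm": "LinearRegression"}],
    fake_job,
)
```

Threads suit this case because the Portal mostly waits on remote servers rather than doing computation itself.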