Adventures in HPC and R: Going Parallel
Justin Harrington & Matias Salibian-Barrera
University of British Columbia
The R User Conference 2006

Outline
• What is Parallel Computing?
• Implementation
• Examples
• Closing Remarks


  1. What is Parallel Computing?

     • From Wikipedia: "Parallel computing is the simultaneous execution of
       the same task (split up and specially adapted) on multiple processors
       in order to obtain faster results."
     • Two specific situations:
       • a multiprocessor machine;
       • a cluster of (homogeneous or heterogeneous) computers.
     • R is not inherently concurrent, even on a multiprocessor machine.
     • S-Plus does have one function for multiprocessor machines.

     Goal for today's talk: to demonstrate the potential of incorporating
     parallel processing in tasks for which it is appropriate.

  2. What is Parallel Computing?

     Example - Multiprocessor Machine. Features:
     • Each process has the same home directory.
     • Architecture is identical.
     • R has the same libraries in the same locations.
     • Data is passed through resident memory.

     Example - Heterogeneous Cluster of Machines. Features:
     • Each process may not have the same home directory.
     • Architecture might be different.
     • R may not have the same libraries in the same locations.
     • Data is passed through the network.

     Implementation

     • Tasks have to be appropriate: concurrent, not sequential.
     • It is sometimes possible to take an inherently sequential process and
       approximate it with a concurrent one, e.g. simulated annealing.
     • In order to do parallel computation, two things are required:
       • an interface on the O/S that can receive and distribute tasks; and
       • a means of communicating with that program from within R (a minimal
         session is sketched below).
     • One of these programs needs to be running on the host computer before
       R can send it tasks.

     PVM & MPI

     • There are two common libraries:
       • PVM: Parallel Virtual Machine
       • MPI: Message Passing Interface
     • Both are available as open source for different architectures.
     • Which to use? From Geist, Kohl & Papadopoulos (1996):
       • MPI is expected to be faster within a large multiprocessor.
       • PVM is better when applications will be run over heterogeneous
         networks.
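To make those two requirements concrete, here is a minimal sketch of talking to a message-passing system from R via the Rmpi package. This is a hypothetical session, not from the talk, and it assumes an MPI environment (e.g. LAM/MPI) is already booted on the host:

     library(Rmpi)                    # R-side interface to the MPI program

     # Assumption: an MPI daemon is already running on this machine,
     # e.g. started beforehand with lamboot.
     mpi.spawn.Rslaves(nslaves = 2)   # ask MPI to start two R worker processes

     # Run a command on every worker; this is the example from the Rmpi docs.
     mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))

     mpi.close.Rslaves()              # shut the workers down again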

  3. Implementation

     • In R there are three relevant packages:
       • Rmpi - the interface to MPI;
       • rpvm - the interface to PVM;
       • snow - a "meta-package" with standardized functions.
     • snow is an excellent introduction to parallel computation, and
       appropriate for "embarrassingly parallel" problems.
     • All of these packages are available from CRAN.
     • In an environment where the home directories are not the same, the
       required libraries have to be available on each host.

     Commands in snow

     Administrative Routines
       makeCluster          create a new cluster of nodes
       stopCluster          shut down a cluster
       clusterSetupSPRNG    initialize random number streams

     High Level Routines
       parLapply            parallel lapply
       parSapply            parallel sapply
       parApply             parallel apply

     Basic Routines
       clusterExport        export variables to nodes
       clusterCall          call a function on each node
       clusterApply         apply a function to arguments on nodes
       clusterApplyLB       load-balanced clusterApply
       clusterEvalQ         evaluate an expression on nodes
       clusterSplit         split a vector into pieces for nodes
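A minimal snow session tying these commands together might look as follows. This is a sketch, not from the slides; the cluster size, the exported variable x0 and the toy squaring function are arbitrary choices:

     library(snow)

     # Administrative: start 4 worker nodes; snow also supports
     # type = "MPI" or type = "PVM" if those systems are running.
     cl <- makeCluster(4, type = "SOCK")

     x0 <- 10                           # a variable the workers will need
     clusterExport(cl, "x0")            # basic: copy x0 to every node
     clusterEvalQ(cl, library(stats))   # basic: evaluate an expression on every node

     # High level: parallel lapply over the elements 1..8
     parLapply(cl, 1:8, function(i) i^2 + x0)

     stopCluster(cl)                    # administrative: shut the cluster down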

  4. Example: Bootstrapping MM-regression estimators

     • The function roblm (from the library of the same name) calculates the
       MM-regression estimators.
     • It is also available in the library robustbase (see the talk by Martin
       Mächler and Andreas Ruckstuhl).
     • Bootstrapping can be used to calculate the empirical density of β̂.

     Setup:

     library(roblm)
     X <- data.frame(y = rnorm(500),
                     x = matrix(rnorm(500 * 20), 500, 20))
     samples <- list()
     for (i in 1:200)
       samples[[i]] <- X[sample(1:500, replace = TRUE), ]
     rdctrl <- roblm.control(compute.rd = FALSE)

     Non-parallel - takes 196.53 seconds:

     lapply(samples,
            function(x, z) roblm(y ~ ., data = x, control = z),
            z = rdctrl)

     Parallel - 4 CPUs - takes 54.52 seconds:

     cl <- makeCluster(4)
     clusterEvalQ(cl, library(roblm))   # load roblm on every node
     clusterApplyLB(cl, samples,
                    function(x, z) roblm(y ~ ., data = x, control = z),
                    z = rdctrl)
     stopCluster(cl)
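The load-balanced apply returns one fitted object per bootstrap sample, so the empirical density of β̂ can then be summarised on the master. A hedged sketch, assuming the fitted roblm objects expose their estimates via coef() (if not, the coefficient component would need to be extracted from each fit by name):

     # Run in place of the bare clusterApplyLB call above, before stopCluster(cl).
     boot <- clusterApplyLB(cl, samples,
                            function(x, z) roblm(y ~ ., data = x, control = z),
                            z = rdctrl)

     # Assumption: coef() works on roblm fits.
     betas <- t(sapply(boot, coef))   # one row of coefficient estimates per sample
     plot(density(betas[, 2]))        # empirical density of the first slope estimate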
