

SLIDE 1

Easy Deployment for Jungle Computing

Niels Drost

Computer Systems Group Department of Computer Science VU University, Amsterdam, The Netherlands

SLIDE 2

Requirements

  • Resource independence
  • Transparent / easy deployment
  • Middleware independence & interoperability
  • Jungle-aware middleware
  • Jungle-aware communication
  • Robust connectivity
  • Globally unique naming
  • System-support for malleability and fault-tolerance
  • Transparent parallelism & application-level fault-tolerance
  • Easy integration with external software
  • MPI, OpenCL, CUDA, C, C++, scripts, …

ComplexHPC Spring School 2011

SLIDE 3

Requirements

(Same requirements checklist as the previous slide.)

SLIDE 4

Deployment

  • How to get your application running in the Jungle
  • For each resource used:
  • Find resource
  • Reserve resource
  • Copy input files (and possibly application itself)
  • Configure/Compile application
  • Run application
  • Copy back output files
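Concretely, the per-resource steps above amount to a small script. Below is a minimal dry-run sketch in Java that only builds the commands it would run; the host name, user name, file names, and the `qsub` scheduler command are all placeholder assumptions, not a real setup:

```java
import java.util.List;

// Dry-run sketch of the manual per-resource deployment steps.
// Host, user, paths, and the scheduler command are placeholders.
public class ManualDeploy {
    static List<String> commandsFor(String user, String host) {
        String target = user + "@" + host;
        return List.of(
            "scp app.jar input.dat " + target + ":run/",  // copy application + input files
            "ssh " + target + " 'cd run && ./configure'", // configure/compile on the resource
            "ssh " + target + " 'qsub run/job.sh'",       // reserve nodes and run via the scheduler
            "scp " + target + ":run/output.dat .");       // copy back output files
    }

    public static void main(String[] args) {
        // Print the commands instead of executing them (dry run).
        commandsFor("alice", "headnode.example.org").forEach(System.out::println);
    }
}
```

Note that every step can fail independently, which is exactly why doing this by hand for each resource does not scale.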

SLIDE 5

Middleware

  • Resources invariably use some sort of middleware
  • Provides remote access to resources
  • File copy, running applications, etc.
  • Many different middleware systems available:
  • Globus (de facto standard, in 4 flavors)
  • gLite, NAREGI, UNICORE, Legion
  • SSH (the poor man’s middleware)

SLIDE 6

Problems (1): Too little Middleware

  • All resources need to have some middleware
  • Hard to install
  • Hard to maintain
  • Low fault-tolerance
  • Assumes a very static setup

A full-fledged middleware installation on a resource may require an almost full-time maintainer

SLIDE 7

Problems (2): Too much Middleware

  • Jungle computing applications use multiple different resources
  • With different middleware
  • With wildly different interfaces
  • Which are too low level

Using multiple different resources at the same time is nigh impossible when using middleware directly

SLIDE 8

Problems (3): Too much everything

  • A large number of steps is required to deploy an application
  • The middleware-level interface is too low level for users
  • Deploying an application requires the user to write another application!
  • Users want to simply “press a button” to deploy

Deployment is not very user friendly

SLIDE 9

Ibis Software Stack

SLIDE 10

Zorilla: A P2P Middleware

SLIDE 11

Current middleware

  • Hard to install and maintain
  • Centralized implementation (not very fault-tolerant)
  • Usually no global functionality
  • No global file system
  • No co-allocation (though Koala could also fix this)
  • Not even possible unless exactly the same middleware runs everywhere

SLIDE 12

Zorilla

  • Alternative middleware developed at the VU
  • Based on Peer-to-Peer (P2P) technology
  • Little to no configuration 
  • Highly fault-tolerant 
  • Trust issues 
  • Hardly any requirements (JVM)
  • Easy to install, little to no maintenance
  • Explicitly supports Jungle computing applications
  • Plays nice with existing middleware
  • Prototype

SLIDE 13

Life of a Job (1)

SLIDE 14

Life of a Job (2)

SLIDE 15

Life of a Job (3)

SLIDE 16

Life of a Job (4)

SLIDE 17

Zorilla Overview

SLIDE 18

Zorilla Components (1)

  • Bootstrap
  • Initial set of contact points
  • UDP broadcast or provided by user
  • Gossip overlay network
  • Actualized Robust Random Gossip (ARRG)
  • Withstands firewalls, NATs, and the like
  • Clustering
  • Nearest neighbor list
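The gossip layer can be pictured with a toy peer-sampling sketch. This is not the actual ARRG protocol, just the general random-gossip idea it builds on: each node keeps a bounded cache of known peers and periodically swaps a random subset with another node. Cache sizes, sample sizes, and node ids below are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy random-gossip peer sampling (illustrative, not the real ARRG protocol).
public class GossipNode {
    static final int CACHE_SIZE = 10;  // bound on remembered peers (made up)
    final int id;
    final List<Integer> cache = new ArrayList<>();
    final Random random;

    GossipNode(int id, long seed) { this.id = id; this.random = new Random(seed); }

    // One gossip exchange: both sides merge the other's sample into their cache.
    static void exchange(GossipNode a, GossipNode b) {
        List<Integer> fromA = a.sample();
        List<Integer> fromB = b.sample();
        a.merge(fromB, b.id);
        b.merge(fromA, a.id);
    }

    // A random subset of at most 5 cache entries.
    List<Integer> sample() {
        List<Integer> copy = new ArrayList<>(cache);
        Collections.shuffle(copy, random);
        return copy.subList(0, Math.min(5, copy.size()));
    }

    // Add unseen entries (and the sender itself), dropping oldest when full.
    void merge(List<Integer> entries, int sender) {
        for (int e : entries)
            if (e != id && !cache.contains(e)) cache.add(e);
        if (!cache.contains(sender)) cache.add(sender);
        while (cache.size() > CACHE_SIZE) cache.remove(0);
    }
}
```

Repeated exchanges like this give every node a constantly refreshed random view of the network without any central membership server, which is what makes the overlay robust.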

SLIDE 19

Zorilla Components (2)

  • Flood scheduling
  • Incrementally search for resources at more and more distant nodes
  • Job management
  • Status (scheduling, running, done, etc.)
  • File transfers
  • Malleability / crashes
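The flood-scheduling idea, searching ever further from the submitting node until enough resources turn up, can be sketched as a ring-by-ring graph search. This is an illustrative toy (the neighbor graph and the set of free nodes are invented inputs), not Zorilla's implementation:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy flood scheduling: expand outward from the origin one hop at a time,
// collecting free nodes, and stop once enough have been found.
public class FloodScheduler {
    static Set<Integer> findNodes(Map<Integer, List<Integer>> neighbors,
                                  Set<Integer> free, int origin, int wanted) {
        Set<Integer> visited = new HashSet<>(List.of(origin));
        Set<Integer> found = new HashSet<>();
        List<Integer> ring = List.of(origin);  // nodes at the current distance
        while (!ring.isEmpty() && found.size() < wanted) {
            List<Integer> next = new ArrayList<>();
            for (int node : ring) {
                if (free.contains(node)) found.add(node);
                for (int n : neighbors.getOrDefault(node, List.of()))
                    if (visited.add(n)) next.add(n);  // not seen before: visit next round
            }
            ring = next;  // move one step further from the origin
        }
        return found;
    }
}
```

Because the search widens incrementally, nearby free nodes are preferred and distant parts of the overlay are only flooded when the local neighborhood cannot satisfy the request.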

SLIDE 20

Resource Discovery: ARRG

SLIDE 21

Resource Discovery: Clustering

SLIDE 22

Resource Discovery: Flood scheduling

SLIDE 23

Conclusions

  • Current middleware is hard to install and maintain…
  • …and does not offer the global functionality required by Jungle Computing applications
  • Zorilla is a lightweight P2P alternative, offering zero maintenance, easy installation, and explicit support for parallel applications

SLIDE 24

JavaGAT: Middleware independent API

SLIDE 25

Requirements

  • Resource independence
  • Transparent / easy deployment
  • Middleware independence & interoperability
  • Jungle-aware middleware
  • Jungle-aware communication
  • Robust connectivity
  • Globally unique naming
  • System-support for malleability and fault-tolerance
  • Transparent parallelism & application-level fault-tolerance
  • Easy integration with external software
  • MPI, OpenCL, CUDA, C, C++, scripts, …

SLIDE 26

Requirements

(Same requirements checklist as the previous slide.)

SLIDE 27

Typical Grid/Cloud Application

Application

File.copy(...) submitJob(...)

SLIDE 28

Typical Grid/Cloud Application

Application

File.copy(...) submitJob(...)

[Diagram: file-copy mechanisms (cp, ftp, gridftp, scp, http) and job-submission mechanisms (fork, pbs, condor, unicore, globus) below the API calls]

SLIDE 29

Typical Grid/Cloud Application

[Same diagram as the previous slide, now with question marks: which of these mechanisms should the application use?]

SLIDE 30

Which Middleware do I use?

  • A lot to choose from
  • Some may not work on all sites
  • Most are hard to use
  • Interfaces change often
  • Globus? (Obvious choice 3 years ago)

SLIDE 31

Globus File Copy (Java)

package org.globus.ogsa.gui;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.net.URL;
import java.util.Date;
import java.util.Vector;
import javax.xml.rpc.Stub;
import org.apache.axis.message.MessageElement;
import org.apache.axis.utils.XMLUtils;
import org.globus.*;
import org.gridforum.ogsi.*;
import org.gridforum.ogsi.holders.TerminationTimeTypeHolder;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RFTClient {
    public static void copy(String source_url, String target_url) {
        try {
            File requestFile = new File(source_url);
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new FileReader(requestFile));
            } catch (java.io.FileNotFoundException fnfe) {
            }
            Vector requestData = new Vector();
            requestData.add(target_url);
            TransferType[] transfers1 = new TransferType[transferCount];
            RFTOptionsType multirftOptions = new RFTOptionsType();
            multirftOptions.setBinary(Boolean.valueOf((String) requestData.elementAt(0)).booleanValue());
            multirftOptions.setBlockSize(Integer.valueOf((String) requestData.elementAt(1)).intValue());
            multirftOptions.setTcpBufferSize(Integer.valueOf((String) requestData.elementAt(2)).intValue());
            multirftOptions.setNotpt(Boolean.valueOf((String) requestData.elementAt(3)).booleanValue());
            multirftOptions.setParallelStreams(Integer.valueOf((String) requestData.elementAt(4)).intValue());
            multirftOptions.setDcau(Boolean.valueOf((String) requestData.elementAt(5)).booleanValue());
            int i = 7;
            for (int j = 0; j < transfers1.length; j++) {
                transfers1[j] = new TransferType();
                transfers1[j].setTransferId(j);
                transfers1[j].setSourceUrl((String) requestData.elementAt(i++));
                transfers1[j].setDestinationUrl((String) requestData.elementAt(i++));
                transfers1[j].setRftOptions(multirftOptions);
            }
            TransferRequestType transferRequest = new TransferRequestType();
            transferRequest.setTransferArray(transfers1);
            int concurrency = Integer.valueOf((String) requestData.elementAt(6)).intValue();
            if (concurrency > transfers1.length) {
                System.out.println("Concurrency should be less than the number"
                        + " of transfers in the request");
                System.exit(0);
            }
            transferRequest.setConcurrency(concurrency);
            TransferRequestElement requestElement = new TransferRequestElement();
            requestElement.setTransferRequest(transferRequest);
            ExtensibilityType extension = new ExtensibilityType();
            extension = AnyHelper.getExtensibility(requestElement);
            OGSIServiceGridLocator factoryService = new OGSIServiceGridLocator();
            Factory factory = factoryService.getFactoryPort(new URL(source_url));
            GridServiceFactory gridFactory = new GridServiceFactory(factory);
            LocatorType locator = gridFactory.createService(extension);
            System.out.println("Created an instance of Multi-RFT");
            MultiFileRFTDefinitionServiceGridLocator loc = new MultiFileRFTDefinitionServiceGridLocator();
            RFTPortType rftPort = loc.getMultiFileRFTDefinitionPort(locator);
            ((Stub) rftPort)._setProperty(Constants.AUTHORIZATION, NoAuthorization.getInstance());
            ((Stub) rftPort)._setProperty(GSIConstants.GSI_MODE, GSIConstants.GSI_MODE_FULL_DELEG);
            ((Stub) rftPort)._setProperty(Constants.GSI_SEC_CONV, Constants.SIGNATURE);
            ((Stub) rftPort)._setProperty(Constants.GRIM_POLICY_HANDLER, new IgnoreProxyPolicyHandler());
            int requestid = rftPort.start();
            System.out.println("Request id: " + requestid);
        } catch (Exception e) {
            System.err.println(MessageUtils.toString(e));
        }
    }
}
SLIDE 32

JavaGAT

  • Java Grid Application Toolkit
  • Layer between the application and the middleware
  • Simple API
  • File copy, job submission, monitoring
  • Functionality provided by adaptors
  • All major (and most minor) middleware supported
  • Assumes middleware is buggy, can fail at any time, is not configured properly, etc.
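The adaptor idea described above boils down to trying each backend until one works, so a single middleware failure never surfaces to the application. A minimal sketch of that dispatch loop; the `Adaptor` interface here is a stand-in for illustration, not the real JavaGAT adaptor SPI:

```java
import java.util.List;

// Sketch of adaptor-based dispatch: try each backend in turn, return the
// first result, and fail only if every adaptor fails.
public class AdaptorEngine {
    interface Adaptor {  // stand-in interface, not the real JavaGAT SPI
        String copy(String src, String dst) throws Exception;
    }

    static String copyFile(List<Adaptor> adaptors, String src, String dst) throws Exception {
        Exception last = null;
        for (Adaptor a : adaptors) {
            try {
                return a.copy(src, dst);  // first adaptor that works wins
            } catch (Exception e) {
                last = e;  // middleware may be broken or misconfigured: try the next
            }
        }
        throw new Exception("all adaptors failed", last);
    }
}
```

This "try them all" design is what lets the engine paper over buggy or misconfigured middleware at runtime rather than forcing the user to pick one backend up front.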

SLIDE 33

JavaGAT

Application

File.copy(...) submitJob(...)

[Diagram: the GAT Engine routes Remote Files, Monitoring, Info service, and Resource Management calls to adaptors (GridLab, Globus, Unicore, SSH, P2P, Local); e.g. file copy via gridftp, job submission via globus]

SLIDE 34

JavaGAT Engine

Application

File.copy(...) submitJob(...)

[Diagram: the JavaGAT API areas: Files, Monitoring, Info service, Resource Management]

SLIDE 35

JavaGAT Engine

Application

File.copy(...) submitJob(...)

[Diagram: the API areas (Files, Monitoring, Info service, Resource Management) above the available adaptors: GridLab, Globus, GLite, SSH, Zorilla, Local]

SLIDE 36

JavaGAT Engine

Application

File.copy(...) submitJob(...)

[Diagram: the GAT Engine sits between the API areas (Files, Monitoring, Info service, Resource Management) and the adaptors: GridLab, Globus, GLite, SSH, Zorilla, Local]

SLIDES 37-41

JavaGAT Engine

[The same GAT Engine diagram repeated across these slides as animation steps]
SLIDE 42

Globus File Copy (Java) (Revisited)

[The same Globus RFT client code as on Slide 31, shown again for contrast with JavaGAT]
SLIDE 43

File Copy with JavaGAT

import org.gridlab.gat.*;
import org.gridlab.gat.io.File;

public class RemoteCopy {
    public static void main(String[] args) throws Exception {
        GATContext context = new GATContext();
        File file = GAT.createFile(context, new URI(args[0]));
        file.copy(new URI(args[1]));
    }
}
SLIDE 44

File Copy with JavaGAT

import org.gridlab.gat.*;
import org.gridlab.gat.io.File;

public class RemoteCopy {
    public static void main(String[] args) throws Exception {
        File file = GAT.createFile(new URI(args[0]));
        file.copy(new URI(args[1]));
        GAT.end();
    }
}

  • Note: This is an actual program, not pseudocode!
SLIDE 45

JavaGAT users

  • Downloads are anonymous, so we don’t know all our users; known users include:
  • Max Planck Institute for Astrophysics in Garching
  • D-Grid
  • Astrogrid
  • Louisiana State University
  • University of Texas
  • AMOLF, Institute for Atomic and Molecular Physics
  • The Dutch Virtual Labs for e-Science project (VL-e)
  • The workflow system Triana (University of Cardiff)
  • Georgia State University
  • Vrije Universiteit Amsterdam (Ibis, teaching)
  • The MultimediaN project
  • Zuse Institute Berlin, Germany
  • VU Medical Center Amsterdam
SLIDE 46

Conclusion

  • JavaGAT offers a simple yet powerful interface to a lot of different middleware
  • JavaGAT compensates for the complexity and faultiness of current middleware
  • JavaGAT can run any application, not just Java
  • (Java)GAT was one of the main inspirations for SAGA
  • more on that tomorrow
SLIDE 47

Ibis-Deploy: Deploy Everywhere

SLIDE 48

Requirements

  • Resource independence
  • Transparent / easy deployment
  • Middleware independence & interoperability
  • Jungle-aware middleware
  • Jungle-aware communication
  • Robust connectivity
  • Globally unique naming
  • System-support for malleability and fault-tolerance
  • Transparent parallelism & application-level fault-tolerance
  • Easy integration with external software
  • MPI, OpenCL, CUDA, C, C++, scripts, …

SLIDE 49

Requirements

(Same requirements checklist as the previous slide.)

SLIDE 50

Deployment


  • How to get your application running in the Jungle
  • For each resource used:
  • Find resource
  • Reserve resource
  • Copy input files (and possibly application itself)
  • Configure/Compile application
  • Run application
  • Copy back output files

SLIDE 51

We can do better

  • All these steps are not what the user wants
  • Deploy in a “single step”
  • Idea: make some assumptions
  • Application uses the IPL and SmartSockets
  • Application written in Java (no need to compile)
  • All files initially on the local machine

SLIDE 52

Ibis-Deploy

  • Library for deploying IPL applications
  • Simplify, Simplify, Simplify!
  • Also deploys SmartSockets hubs and IPL registry
  • Uses Java property files for configuration
  • Simplest configuration files yet, I promise!
  • Comes with an (optional) GUI

SLIDE 53

SLIDE 54

Property files

  • Grid file: description of all resources (clusters)
  • Hostname of the frontend, JavaGAT adaptor used, geolocation, etc.
  • Application file: description of applications
  • Main class, jars needed, arguments, etc.
  • Experiment file: description of the experiment
  • “run application A on 32 nodes of cluster B”
  • Alternative: specify in the GUI
  • Workspace: directory containing all 3 files
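As an illustration, the three property files might look roughly like this; the key names and values below are invented for the example and are not the exact Ibis-Deploy schema:

```properties
# grid.properties: all resources (clusters); key names are illustrative
das3.frontend = fs0.das3.cs.vu.nl
das3.javagat.adaptor = globus
das3.geolocation = 52.33,4.87

# application.properties: the applications
genesearch.main.class = example.Main
genesearch.jars = genesearch.jar

# experiment.properties: what to run where
run1.application = genesearch
run1.cluster = das3
run1.cores = 32
```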

SLIDE 55

Putting it all together

  • Small experimental desktop grid setup
  • Student PCs running Zorilla overnight
  • PCs with 1 CPU, 1GB memory, 1Gb/s Ethernet
  • Experiment: gene sequence application
  • 16 cores of DAS-3 with Globus
  • 16 core desktop grid with Zorilla
  • Combination, using Ibis-Deploy
SLIDE 56

Desktop Grid Experiment

[Chart: runtimes of 3574 sec, 1099 sec, and 877 sec for the three setups]

SLIDE 57

Slightly Bigger


401 Cores, 94.4% Efficiency

SLIDE 58

Conclusions (Overall)

  • Deployment is hard
  • Zorilla makes getting resources easier
  • JavaGAT offers a simple, middleware-independent API for Grid/Cloud/? applications
  • Ibis-Deploy enables users to deploy applications easily, and focus on their research
  • Hands-on later this afternoon with Ibis-Deploy
