Calcul Global, Desktop Grids et XtremWeb
(Global Computing, Desktop Grids and XtremWeb)

Franck Cappello
PCRI / INRIA Grand-Large, LRI, Université Paris Sud
fci@lri.fr, www.lri.fr/~fci

Ecole GridUse, 23 June 2004



Outline

  • Introduction: Desktop Grid, foundations for a Large Scale Operating System.

  • Some architecture issues
  • User interfaces (XtremWeb)
  • Fault tolerance (XtremWeb)
  • Security (Trust)
  • Final Remarks (what we have learned so far)

Several types of GRID

Two kinds of large scale distributed systems. Node features:

Computing « GRID » (large sites: computing centers, clusters):
  • <100 nodes
  • Stable
  • Individual credential
  • Confidence

« Desktop GRID » or « Internet Computing » and peer-to-peer systems (PCs, Windows, Linux):
  • ~100 000 nodes
  • Volatile
  • No authentication
  • No confidence


DGrid for Large Scale Distributed Computing

  • Principle
    – Millions of PCs
    – Cycle stealing
  • Examples
    – SETI@HOME (Search for Extra-Terrestrial Intelligence): 33.79 Teraflop/s (versus 12.3 Teraflop/s for the ASCI White!)
    – DECRYPTHON: protein sequence comparison
    – RSA-155: breaking encryption keys


DGrid for Large Scale P2P File Sharing

  • Direct file transfer after index consultation
    – Client and server issue direct connections
    – Consulting the index gives the client the address (@IP) of the server
  • File storage
    – All servers store entire files
    – For fairness, clients work as servers too
  • Data sharing
    – Non-mutable data
    – Several copies, no consistency check
  • Interest of the approach
    – Proven to scale up to millions of users
    – Resilience of file access
  • Drawbacks of the approach
    – Centralized index
    – Privacy violated

[Figure: users A and B (each Client + Server) exchange files directly; a central index stores the file to @IP associations.]

DGrid for Large Scale Data Storage/Access

  • Principle
    – Millions of PCs
    – "Disk space" stealing
  • Storing and accessing files on participant nodes (providing global-scale persistent data, ubiquitous storage): files are stored as segments, and segments are replicated for availability. Examples: Freenet, Intermemory, the US project.
  • Collecting and integrating data coming from numerous devices (distributed data integration).


DGrid for Networking

  • A set of PCs on the Internet (lab networks) coordinated for networking experiments.
  • A set of PCs on the Internet (ADSL) coordinated for measuring communication performance.
  • NETI@home: collects network performance statistics from end-systems.


Computational/Networking DGrids

For which application domains are DGrids used? A central coordinator (with resource discovery) schedules tasks and coordinates actions on a set of PCs: the client application sends parameters over the network and collects results.

  • Computational applications
    – SETI@Home, distributed.net
    – Décrypthon (France)
    – Folding@home, Genome@home
    – ClimatePrediction, etc.
  • Networking applications
    – PlanetLab (protocol design, etc.)
    – "La grenouille" (DSL performance evaluation)
    – Porivo (Web server performance testing)
  • Research projects
    – Javelin, Bayanihan, JET
    – Charlotte (based on Java)
    – Condor, XtremWeb, P3, BOINC
  • Commercial platforms
    – Datasynapse, GridSystems
    – United Devices, Platform (AC)
    – Cosm


Communication/Storage DGrids (P2P)

For which application domains are DGrids used? A resource discovery/lookup engine establishes a relation between a client and server(s), or a communication between 2 participants.

  • Communication applications
    – Jabber, etc. (instant messaging)
    – Napster, Gnutella, Freenet, Kazaa (information sharing)
    – Skype (phone over IP)
  • Storage applications
    – OceanStore, US, etc. (distributed storage)
    – Napster, Gnutella, Freenet, Kazaa (information sharing)
  • Research projects
    – Globe (Tann.), Cx (Javelin), Farsite
    – Pastry, Tapestry/Plaxton, CAN, Chord
    – XtremWeb
  • Other projects
    – Cosm, WebOS, Wos, peer2peer.org
    – JXTA (Sun), PtPTL (Intel)


Historical perspective

[Timeline from 1995 to today:]
  • Meta-computing → GRID: I-WAY; Globus, Ninf, Legion, Netsolve; DataGrid, GGF; OGSA, WSRF.
  • Cluster: NOW, Beowulf; clusters of clusters.
  • Calcul Global (global computing): cycle stealing (Condor); Distributed.net, SETI@Home; Internet computing (Javelin, CX, Atlas, Charlotte); XtremWeb research (1999); BOINC, P3, XW2.
  • Distributed systems → P2P: DNS, mail; Napster, Gnutella, Freenet; COSM; DHTs (Pastry, Tapestry, CAN, Chord).


Outline

  • Introduction: Desktop Grid, foundation for a Large Scale Operating System.
  • Some architecture issues
  • User interfaces
  • Fault tolerance
  • Security (Trust)
  • Final Remarks (what we have learned so far)


Computational Desktop Grids

  • Allows any node to play different roles (client, server, system infrastructure).
  • Clients (PCs) send requests (related to computations or data) and collect results; servers (PCs) accept requests and provide results; a coordination layer performs matchmaking, scheduling and fault tolerance; potential direct communications between servers support parallel applications.
  • A very simple problem statement, but one leading to a lot of research issues: scheduling, security, fairness, race conditions, message passing, data storage. Large scale enlarges the problem: volatility, confidence, etc.

Some facts about the heterogeneity:

A study made by IMAG (credit: O. Richard, G. Da Costa): the performance of participant teams plotted against their rank follows a Zipf law, Performance(rank) = C / rank (the 90% / 10% law), with up to 4 orders of magnitude between the extremes. [Figure: performance versus rank.]

Some facts about the users:

A study made by IMAG (credit: O. Richard, G. Da Costa): user characteristics of French ADSL, one week of ADSL as seen by Lagrenouille. [Figures: number of connected users versus date; accumulated number of users versus class number.]
  • The number of connected users increases during daytime up to a maximum (figure annotations: noon, 2 PM, Sunday, 4373 users).
  • Users behave differently: considering a vector of hours (1 if connected, 0 if not), the 4373 users of Jan 7 belong to 1710 different classes!
  • A classification of the user connection vectors allowing a Hamming distance of 1 between vectors still gives 508 classes.

Some facts about dynamicity:

One week of ADSL as seen by Lagrenouille (a study made by IMAG; credit: O. Richard, G. Da Costa):
  • The number of users is quite stable across week days.
  • In 1 hour, up to half of the users may change.
  • User connection frequency to ADSL is quite disparate among users.

                                          Mean   Standard Dev.
  Number of connected hours per user        34      40
  Number of connected days per user        4.3     2.2
  Number of different users per hour      1402     432
  Number of different users per day       4175     203


Architecture

Fundamental components: Client, Agent (Worker/Engine), Resource Discovery/Coordination.

Fundamental mechanisms and scheduling modes:
  • Interaction mode: PUSH or PULL (non-permanent connection versus connected). XtremWeb pulls jobs; Datasynapse and Condor push jobs. A pull-loop sketch follows below.
  • Resource discovery and coordination: centralized or distributed.
  • Transport layer (firewalls, NAT, proxies): XtremWeb and BOINC use ad hoc mechanisms; P3 uses JXTA. [Figure: PC resources behind firewalls reach the infrastructure through a tunnel over the Internet.]
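To make the PULL interaction mode concrete, here is a minimal Java sketch of a worker pull loop. The Coordinator interface, method names and types are assumptions for illustration, not the actual XtremWeb or BOINC API.

// Hypothetical pull-mode worker: the agent opens the (non-permanent)
// connection, asks the coordinator for a job, runs it and returns the
// result, then backs off when no work is available.
interface Coordinator {
    Job requestJob(String hostId);            // returns null when no job is waiting
    void sendResult(String hostId, Result r);
}

interface Job { Result run(); }
interface Result { }

class PullWorker {
    private final Coordinator coordinator;
    private final String hostId;

    PullWorker(Coordinator coordinator, String hostId) {
        this.coordinator = coordinator;
        this.hostId = hostId;
    }

    void loop() throws InterruptedException {
        while (true) {
            Job job = coordinator.requestJob(hostId);  // worker-initiated: crosses firewalls/NAT
            if (job == null) {
                Thread.sleep(60_000);                  // sleep while the coordinator has no work
                continue;
            }
            Result result = job.run();
            coordinator.sendResult(hostId, result);
        }
    }
}

Because every connection is opened by the worker, this mode works behind firewalls and NAT, which is why pull is the usual choice for Internet-wide Desktop Grids.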


Architecture

Data transfers and resource types:
  • Data transfer mode: P2P, data server, or through the coordinator. Depending on the system (Datasynapse, P2P systems, BOINC, XtremWeb), the client either submits the job and data together to the coordinator, or puts the data on a data server so the worker gets the job and then fetches the data separately.
  • What kind of resources can be harnessed? A PC (1 agent, 1 thread), a dual-processor PC (1 agent, X threads), a PC cluster (an agent compliant with a third-party scheduler).

Resource Discovery/Coordination

Resource discovery and orchestration:
  • Resource discovery: 1st generation, centralized/hierarchical; 2nd generation, fully distributed (search queries are forwarded from peer to peer, then the file is fetched directly); 3rd generation, DHT (e.g. a Chord ring where each peer keeps a finger table of [start, interval) → successor entries to route lookups by peer ID; see the sketch after this list).
  • Action coordination: centralized (clients submit jobs to a coordinator, which issues action orders to the resources) or fully distributed.
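As a toy, assumption-laden illustration of the 3rd-generation (DHT) lookup sketched in the figure, the following Java fragment routes a key along Chord-like finger tables; it is a simplified sketch, not a faithful implementation of Chord or of any system cited above.

// Each node keeps a small finger table and forwards a lookup to the closest
// preceding finger until the key falls between a node and its successor.
class ChordNode {
    final int id;                 // position on the identifier ring [0, 2^m)
    ChordNode successor;
    ChordNode[] fingers;          // fingers[i] roughly = successor(id + 2^i)

    ChordNode(int id) { this.id = id; }

    // true if x lies in the circular interval (a, b]
    static boolean inInterval(int x, int a, int b) {
        if (a < b) return x > a && x <= b;
        return x > a || x <= b;   // interval wraps around 0
    }

    ChordNode findSuccessor(int key) {
        if (inInterval(key, id, successor.id)) {
            return successor;                         // key is owned by our successor
        }
        // Forward to the closest preceding finger (the highest finger before the key).
        for (int i = fingers.length - 1; i >= 0; i--) {
            if (fingers[i] != null && fingers[i].id != key
                    && inInterval(fingers[i].id, id, key)) {
                return fingers[i].findSuccessor(key);
            }
        }
        return successor.findSuccessor(key);          // fallback: walk the ring
    }
}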


The Software Infrastructure for SETI@home II
David P. Anderson, Space Sciences Laboratory, U.C. Berkeley (credit: David Anderson)

Goals of a PRC (public-resource computing) platform
[Figure: research lab X, university Y and public project Z run projects and applications over a shared resource pool.]
  • Participants install one program, select projects, specify constraints; all else is automatic
  • Projects are autonomous
  • Advantages of a shared platform:
    – Better instantaneous resource utilization
    – Better resource utilization over time
    – Faster/cheaper for projects; the software is better
    – Easier for projects to get participants
    – Participants learn more

Distributed computing platforms
  • Academic and open-source: Globus, Cosm, XtremWeb, Jxta
  • Commercial: Entropia, United Devices, Parabon

Goals of BOINC (Berkeley Open Infrastructure for Network Computing)
  • Public-resource computing/storage
  • Multi-project, multi-application: participants can apportion resources
  • Handle fairly diverse applications
  • Work with legacy apps
  • Support many participant platforms
  • Small, simple


Anatomy of a BOINC project (credit: David Anderson)

  • Project side (resource discovery + coordination): scheduling server (C++), BOINC DB (MySQL), project work manager, data servers (HTTP), Web interfaces (PHP).
  • Participant side (worker/agent/engine): core agent (C++) and application agents.


Principle of Personal Power Plant (P3)

[Figure: clients and workers/agents/engines interact directly with each other.]

P3 has no central component: resource discovery and coordination rely on JXTA.


XtremWeb

[Figure: client PCs submit jobs and retrieve results through a coordinator (resource discovery + coordination); worker PCs request jobs and return results.]

  • For research and production
  • Multi-application, multi-user
  • Multiple executable formats (binary, Java)
  • Multi-platform (Linux, Windows, MacOS X)
  • Secured (sandbox + certificates)
  • Fault tolerant


XW: Client

  • A Java API, XWRPC (virtual rexec): job submission, result collection, monitoring/control.
  • Interfaces: Java API, command line (scripts).
  • A client experiment runs in threads: configure the experiment, launch it (launcher), collect the results (result collector).

Pseudocode of a client session:

  clientRegister();
  createSession();
  createGroup();
  for (i = 0; i < nbWork; i++) {
      Ti = createWork();
      submitTask(Ti);
  }
  while (!finished) {
      getGroupResult();
      reduction();
  }
  closeSession();


XW: Worker

  • Protocol (worker to coordinator messages: hostRegister, workRequest, workResult, workAlive): firewall bypass, RMI/XML-RPC, and SSL authentication and encryption.
  • Applications: binary (legacy HPC codes in Fortran or C), Java (recent codes, object codes).
  • OS: Linux, SunOS, Mac OS X, Windows.
  • Auto-monitoring, trace collection.

Worker Architecture

Four parallel threads cooperate around a job pool (producer/consumer); a sketch of that pattern follows below:
  • Communications: duplex communication toward the coordinator, fault tolerance (fetch waiting jobs, return job results).
  • Execution launcher: executes waiting jobs in a sandbox (multi-processor support).
  • Activity monitor: trace logging.
  • Activity trigger: uses/releases the computing resource.
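A minimal sketch of the job-pool producer/consumer pattern between the communication thread and the execution thread. Class and method names are assumptions for illustration, not the actual XtremWeb worker code.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// The communication thread produces jobs into the pool; the execution
// launcher thread consumes them and runs them.
class WorkerThreads {
    private final BlockingQueue<Runnable> jobPool = new ArrayBlockingQueue<>(16);

    // Producer side (communication thread): store a job fetched from the coordinator.
    void enqueueJob(Runnable job) throws InterruptedException {
        jobPool.put(job);                    // blocks when the pool is full
    }

    // Consumer side (execution launcher thread): run jobs as they become available.
    void executionLoop() throws InterruptedException {
        while (true) {
            Runnable job = jobPool.take();   // blocks when the pool is empty
            job.run();                       // would run inside the sandbox in the real worker
        }
    }
}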


XW: coordinator architecture

  • Communication layer: XML-RPC, SSL, TCP, serving worker requests and client requests.
  • Request collector, task selector, priority manager, scheduler, result collector, volatility detection.
  • Database set: applications, tasks, results, statistics.


XtremWeb Software Technologies

Installation prerequisites: database (MySQL), web server (Apache), PHP, Java JDK 1.2.

  • Database: SQL, Perl DBI, Java JDBC
  • Server: Java
  • Communication: XML-RPC, RMI, SSL
  • HTTP server: PHP 3-4
  • Installation: GNU autotools
  • Worker and client: Java


Demo

Distributed image rendering by ray tracing with PovRay:
  • A submission web page lets the user select the parameters and submits 100 jobs: the image rendering is decomposed into 100 tasks (100 image blocks).
  • The coordinator/resource discovery dispatches the tasks to worker PCs over the network.
  • The client composes the final image from the 100 blocks and displays it (see the sketch below).
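As a purely illustrative sketch of the decompose/compose step (not the actual demo scripts; the grid layout, class names and the assumption that workers return one rendered block per task are mine), splitting a frame into a 10 x 10 grid and reassembling the rendered blocks could look like this:

import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class BlockRendering {
    static final int GRID = 10;                      // 10 x 10 = 100 tasks

    // Decompose the frame into 100 rectangular blocks (one task each).
    // Assumes width and height are multiples of GRID for simplicity.
    static Rectangle[] decompose(int width, int height) {
        Rectangle[] blocks = new Rectangle[GRID * GRID];
        int bw = width / GRID, bh = height / GRID;
        for (int y = 0; y < GRID; y++)
            for (int x = 0; x < GRID; x++)
                blocks[y * GRID + x] = new Rectangle(x * bw, y * bh, bw, bh);
        return blocks;
    }

    // Compose the final image from the rendered blocks returned by the workers.
    static BufferedImage compose(int width, int height,
                                 Rectangle[] blocks, BufferedImage[] rendered) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        for (int i = 0; i < blocks.length; i++)
            g.drawImage(rendered[i], blocks[i].x, blocks[i].y, null);
        g.dispose();
        return image;
    }
}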


XtremWeb Application: Pierre Auger Observatory

Understanding the origin of very high energy cosmic rays:
  • Aires (Air Showers Extended Simulation): sequential, Monte Carlo; time for a run: 5 to 10 hours (500 MHz PC).
  • An air shower parameter database (Lyon, France) feeds client PCs, which submit Aires runs through XtremWeb to worker PCs over the Internet and LANs.
  • Estimated PC number: ~5000.
  • Trivial parallelism, master-worker paradigm.

Deployment example

Application: AIRES (Auger). Deployment over the Internet (Icluster Grenoble under PBS, Madison Wisconsin under Condor, the U-psud network, the LRI Condor pool, other labs, lri.fr):
  • XW coordinator and XW client at LRI
  • Madison: 700 workers, Pentium III, Linux (500 MHz and 933 MHz), Condor pool
  • Grenoble Icluster: 146 workers (733 MHz), PBS
  • LRI: 100 workers, Pentium III and Athlon, Linux (500 MHz, 733 MHz, 1.5 GHz), Condor pool


Performance evaluation

[Figure: node utilization during the experiments for the WISC-97, WL-113, G-146, WLG-271 and WLG-451 configurations; up to 500 nodes over runs of 2 to 4 hours.]


Performance evaluation

[Figure: task execution times (sorted in decreasing order), roughly between 20 and 40 minutes, for WISC-97, WL-113, G-146, WLG-271 and WLG-451; a PIII 733 MHz takes about 18 minutes per task; the WISC-97 pool mixes PIII 551 and PIII 900 machines.]


Performance evaluation

[Figure: number of results versus result arrival time from the first one, for WISC-97, WL-113, G-146, WLG-271 and WLG-451 (up to 1024 results, 500 nodes, 2 to 4 hours).]
  • 1024 tasks executed correctly; about 18 minutes to get the first result.
  • Between 1 and 4 hours to obtain the 1024 results.
  • Speed-up for G-146: 126.5.


Performance evaluation

Execution with a massive fault (disconnection of 150 nodes): WLG-300 with 150 faults compared to WLG-270.
[Figure: node utilization and result arrival time (minutes); the loss of the Icluster delays the first results because of the massive fault.]


Outline

  • Introduction: Desktop Grid, foundation for a Large Scale Operating System.

  • Some architecture issues
  • User interfaces
  • Fault tolerance
  • Security (Trust)
  • Final Remarks (what we have learned so far)

User Interfaces for DGrids

There is a demand for applications expressed as:

  • Bag of independent tasks
  • Workflow (imperative program execution)
  • Dataflow Graph (data triggering the execution)

There is also a demand for using existing applications without modification, and for several kinds of user interfaces:

  • Batch-scheduler-like interfaces
  • Programming APIs
    – RPC-like (non-blocking), master-worker, recursive
    – MPI (MPICH-V)


XtremWeb execution model

  • A job is described by: application + command line + file archive + stdin; its result is a file archive + stdout + stderr.
  • Client job submission: there are no shared file systems! Everything is sent as messages to the servers (over SSL, across the Internet).
  • On the worker PC, the input archive is cloned into a working directory; the new files (the diffs of the working directory) are returned to the client PC.


XtremWeb Batch mode

  xw help
  xw format [html | csv | xml]                              (specify output format)
  xw apps                                                   (list installed applications)
  xw addapp <appName> <cpuType> <osType> <appBinaryFile>    (add a new application)
  xw rmapp <appName>                                        (remove an application from the server)
  xw workers                                                (get the workers list)
  xw status [jobUID [...]]                                  (retrieve user job status)
  xw get | --xwresult [--xwnoextract] [--xwrmzip] [--xwerase] [--xwoverride] [jobUID [...]]
                                                            (retrieve user job results)
  xw remove | --xwdelete | --xwrm | --xwdel [jobUID [...]]  (remove user jobs)
  xw submit | --xwjob <appName> [yourParameters] [<inputFile.txt>]
                                                            (create a new job)


XtremWeb low-level client API (<< GridRPC)

  String submitJob(MobileWork job)
  boolean deleteJob(MobileWork job)
  int jobStatus(MobileWork job)
  MobileWork getJob(MobileWork job)
  MobileResult getJobResult(MobileWork job)
  boolean deleteJobs(Vector jobs)
  Vector getAllJobs()
  Vector getAllResults()
  Vector getAllResults(Vector jobs)


XtremWeb low-level client API

Example: a real-life production application of Alcatel (XXX), a tool helping to validate and evaluate commutation networks. It computes the signal loss and the bandwidth for network configurations. It is a three-stage application: 1) generate many solutions from a network configuration and a set of user constraints, 2) apply filters to the solutions, 3) finally run a statistical analysis on the best solutions.


Program example

for (int i = 0; i < nbTasks; i++) {
    // Build the command line
    String cmdLine = "XXX" + i + ".in";
    // Build an archive containing the needed input files
    Zipper zipper = new Zipper();
    String[] zipEntries = new String[4];
    zipEntries[0] = "test_casper.don";
    zipEntries[1] = "test_casper.mac";
    zipEntries[2] = "XXX" + i + ".in";
    zipEntries[3] = "born" + i + ".in";
    zipper.setFileName("XXX_" + i + "_in.zip");
    zipper.zip(zipEntries);
    …


Program example…

    …
    // Create a MobileWork containing the job to submit
    MobileWork job = new MobileWork();
    // Set the zip archive in the MobileWork
    job.setDirin(new File("XXX_" + i + "_in.zip"));
    // Set the XtremWeb server to contact
    job.setServer(config.getCurrentServer());
    // Set the application name
    job.setApplicationName("leabatch");
    // Set the command line
    job.setCmdLine(cmdLine);
    // Submit job number i in the created group
    comm.submitJob(job);
}   // submission loop
…


Program example…

…
// Wait for the end of the jobs
Vector jobs = comm.waitForAllCompleted();
for (Enumeration e = jobs.elements(); e.hasMoreElements();) {
    MobileWork job = (MobileWork) e.nextElement();
    MobileResult result = comm.getResult(job);
    Zipper zip = comm.saveResult(result);
    zip.unzip();
}


Program example: execution on the LRI-LIFL testbed

[Figure: one LRI client, one LIFL coordinator, and the worker pools at LIFL and LRI (x160 and x120 workers).]


Outline

  • Introduction: Desktop Grid, foundation for a Large Scale Operating System.

  • Some architecture issues
  • Programming
  • Fault tolerance
  • Security (Trust)
  • Final Remarks (what we have learned so far)

RPC-V (Volatile)

Goal: execute RPCs on volatile nodes while keeping the programmer's view unchanged (a PC client calls RPC(Foo, params.); a PC server executes Foo(params.)).

Objective summary:
1) Automatic fault tolerance
2) Transparent for the programmer and user
3) Tolerate client and server faults
4) Firewall bypass
5) Avoid global synchronizations (checkpoint/restart)

Problems:
1) Volatile nodes (all nodes may crash)
2) Firewalls (PC Grids)
3) Recursion (recursive RPC)


Fault tolerance and RPC operations in Internet-connected Desktop Grids

What kind of operations can we expect: multi-client stateful or stateless RPC operations? It depends on the system class:
  • synchronous (bound on communication time, trusty fault detectors), or
  • asynchronous (no bound on communication time, …)?

The feasibility of agreement depends on the system features. To ensure fault tolerance, we need a form of server replication; to have multi-client stateful operations, we need the server replicas to agree on the same order of the clients' requests (replication messages flow between the server replicas serving client1 and client2).


Synchronous or Asynchronous?

  • Volatility
  • Any component of the system may fail
  • Intermittent crashes (components may fail abruptly or restart from a checkpoint image)
  • Components are connected by the Internet (long distance, no trusty fault detectors)
  • Connection-less interactions (large number of workers)
  • High size variability

Network:
  • ~10k nodes or larger
  • Wide area network (best-effort network)
  • Standard protocols (TCP/IP)

Nodes:
  • Volatile, Byzantine, crashes may be permanent


Synchronous or Asynchronous?

1) Intermittent crashes + connection-less interactions imply unbounded delays on message transmission.
2) Volatility (crashes may be permanent) + no stable component + 1) imply the consensus impossibility.
3) Unreliable failure detectors: two requirements should hold for a "sufficiently long" period relative to the application execution time, but the dynamicity is too high.
4) Some fault-tolerance techniques rely on a majority; changes of system size and high dynamicity turn the majority notion into a fuzzy one.
5) Agreement algorithms (all-to-all), even on synchronous networks, are very slow.

According to the current knowledge, we should conservatively consider that Internet-connected Desktop Grids are asynchronous!


RPC-V Design

  • Is the network asynchronous (Internet + P2P volatility)? If yes, restriction to stateless or single-user stateful applications; if no, multi-user stateful applications (needs atomic broadcast).
  • Built on the XtremWeb infrastructure: the application uses a client API on the client side and runs on workers; "R." stands for RPC (XW-RPC); the coordinator provides fault tolerance and scheduling, all over TCP/IP.
  • Fault tolerance: message logging on the client and worker sides, passive replication of the coordinator.


RPC-V Architecture

Three tiers + message logging + passive replication, for stateless-only RPC operations.
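A minimal sketch of the message-logging idea: every request is appended to stable storage before being acknowledged, so that after a crash the pending requests can be replayed, for instance toward the surviving coordinator replica. Class names and the storage format are assumptions, not the XtremWeb implementation.

import java.io.*;
import java.util.*;

class MessageLog {
    private final File logFile;

    MessageLog(File logFile) { this.logFile = logFile; }

    // Log every submission before acknowledging it, so a backup
    // coordinator (or a restarted one) can rebuild the pending task set.
    synchronized void append(String taskUid, String request) throws IOException {
        try (FileWriter w = new FileWriter(logFile, true)) {
            w.write(taskUid + "\t" + request + "\n");
        }
    }

    // Replay the log to recover the pending tasks after a fault.
    synchronized List<String[]> replay() throws IOException {
        List<String[]> entries = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new FileReader(logFile))) {
            String line;
            while ((line = r.readLine()) != null) {
                entries.add(line.split("\t", 2));
            }
        }
        return entries;
    }
}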


RPC-V Implementation

XtremWeb virtualizes the RPC calls


RPC-V in Action

[Figure: clients submit tasks to the coordinator; workers get work and put results; after a fault, clients and workers re-synchronize with the coordinator (Sync/Submit task, Sync/Get work, Sync/Put result, Sync/Retrieve result).]

  • Allows client volatility (mobile clients)
  • Worker volatility (server crash or disconnection)
  • Coordinator crash or transient faults (warning: a task may be executed more than once)
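Since a coordinator fault can lead to a task being executed more than once, a client may receive duplicate results for the same job. A minimal, hypothetical sketch of client-side de-duplication by job UID (names are assumptions, not the XtremWeb API):

import java.util.*;

// Keep only the first result seen for each job UID; later duplicates
// (re-executions after a coordinator fault) are ignored.
class ResultDeduplicator {
    private final Map<String, byte[]> resultsByUid = new HashMap<>();

    synchronized boolean accept(String jobUid, byte[] result) {
        if (resultsByUid.containsKey(jobUid)) {
            return false;        // duplicate result, drop it
        }
        resultsByUid.put(jobUid, result);
        return true;             // first result for this job
    }
}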


RPC-V Coordinator faults

1) The client submits to Lille
2) Lille crashes
3) LRI replicates
4) LRI acts as primary
5) LRI reaches Lille
6) Lille restarts
7) Lille replicates
8) LRI crashes
9) Lille acts as primary


RPC-V inconsistent view

The client does not see Lille and submits to LRI; the workers do not see LRI and get work from Lille. Passive replication of the coordinators (replication of submissions and of results) still lets jobs flow from the client to the workers.


Outline

  • Introduction: Desktop Grid, foundation for a Large Scale Operating System.

  • Some architecture issues
  • Fault tolerance
  • Programming
  • Security (Trust)
  • Final Remarks (what we have learned so far)

Security and trust models for DGrids (users, system administrators, others?)

DGrids gather 4 entities:
1) The user(s), who can submit jobs
2) The participants, who give resources (CPU cycles, files, memory, etc.)
3) The infrastructure, which connects users to participants
4) The applications, which run on the participating PCs

[Figure: users, applications, participants and the infrastructure, linked pairwise by authentication (A) relations.]


Security and trust models

Risk examples coming from the other 3 entities:
  • Applications reading/writing the disk: operating system corruption, participant data corruption/spying
  • Applications using the participating PC's network: infrastructure attack, participant PC attack
  • Participants reverse engineering the application code or accessing the application working directory: application data spying, result corruption, user application attack
  • Participants reverse engineering the DGrid middleware: infrastructure attack
  • The infrastructure connecting the user to an aggressive PC: user data spying, result corruption, application abuse
  • The infrastructure running a wrong application: participant PC hijacking
  • Users submitting aggressive/malicious jobs from trusted applications: infrastructure and participating PC attacks (voluntary or not)
  • Users submitting jobs to applications they do not own


Trust/Authentication

First, distinguish the participating entities of a DGrid from the external world: all entities should authenticate the 3 others, which reduces the problem to internal risks.
(When a credential certificate is compromised, the attack is considered as internal.)

[Authentication diagram: users, infrastructure, participants and applications pairwise linked by authentication (A) relations; this is the approach taken in XtremWeb.]


Risks exist even internally

Responsibility limits:
  • The system administrator can't check every application with every parameter set (impossible).
  • The user can't guarantee that a third-party application will not be aggressive.
  • The application programmer can't predict all usage scenarios of his applications.
  • The resource owner can't prevent his machine from participating in a distributed attack.

Limits of authentication:
  • Strong user authentication can't prevent malicious users from launching aggressive applications.
  • Strong participant authentication can't prevent a participant from spying on or corrupting results.
  • Strong infrastructure authentication can't prevent the 2 previous problems.
  • Nor can strong application authentication.


Trust and Security approaches

In case of attack detection, repression:
  • Fast revocation of the entity (of its certificate)

Prevention:
  • Sandboxing the application on the participant side and on the client side
  • Result certification on the user side

Trust and Security approaches

  • CRISIS (Wide Area Security Architecture): certificates (fast revocation, fast access through a cache) + a sandbox (Janus).
  • DEC (Authentication in Distributed Systems, 1992): a theory of authentication and the mechanisms to implement it: node-to-node communication, roles, loading programs, delegation, IPC, ACLs, certification authorities, lifetime, revocation.

In case of attack detection, repression: fast revocation of the entity (of its certificate). Prevention: sandboxing the application on the participant side; result certification on the user side.

Some solutions exist, but we don't know how they: 1) scale with the number of entities, 2) can be simplified according to different deployment scenarios.


Node security

Corrupted code can reach an execution node (PC) through the Internet or a LAN. Without resource authentication, how do we guarantee security? The executed code should not be able to corrupt the resource.

Native code execution: "sandboxing" of native code execution (ptrace based: Janus, Subterfugue). A father process acts as controller; the application is forked and ptraced; each system call is intercepted, its arguments (on the stack) are checked, and the call is authorized or rejected before it reaches the kernel. A race condition is possible between the argument check and the actual system call.


Node security

  • Virtual system with ptrace-based checking: UML (User Mode Linux). The application is forked and ptraced; its system calls are checked by the virtual system instead of being executed natively by the kernel.
  • Linux Security Modules (LSM): hooks placed on the kernel system-call path call a security module whose verification code authorizes or rejects the call.
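The mechanisms above target native code. For the Java execution path that XtremWeb also supports ("sandbox" in the worker), one possible, purely illustrative approach on older JVMs is a custom SecurityManager; this is an assumption-laden sketch, not the actual XtremWeb sandbox, and SecurityManager is deprecated on recent Java versions.

import java.io.FilePermission;
import java.net.SocketPermission;
import java.security.Permission;

// Forbid file writes outside the job's working directory and forbid
// network access from the downloaded code.
class WorkerSandbox extends SecurityManager {
    private final String workDir;

    WorkerSandbox(String workDir) { this.workDir = workDir; }

    @Override
    public void checkPermission(Permission perm) {
        if (perm instanceof FilePermission && perm.getActions().contains("write")) {
            // Only allow writes inside the job's working directory.
            if (!perm.getName().startsWith(workDir)) {
                throw new SecurityException("write denied: " + perm.getName());
            }
        } else if (perm instanceof SocketPermission) {
            // The downloaded code must not attack the LAN or the Internet.
            throw new SecurityException("network access denied: " + perm.getName());
        }
        // Everything else is allowed in this simplified sketch.
    }
}

// Installed before launching the job, e.g.:
//   System.setSecurityManager(new WorkerSandbox("/tmp/xw-job"));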


Result Certification

Results travel from client PCs (possibly a hacker's PC) to the result collector PC over the Internet or a LAN. Example: the FFT computation in SETI@home. Corruption cases:
  • hacked modifications on the client PC
  • a fault on the client PC

How to detect corruptions:
  • Result certification cannot rely on client authentication and communication encryption.
  • The system must be able to detect corruption using only result analysis after reception:

Result Certification

Approaches for corruption detection (also known as sabotage tolerance):

System based:
  • Execution redundancy + majority voting (a sketch follows below).
  • Spot checking (randomly choose a server for a redundant execution and check its result).
  • Combination of spot checking and majority voting: credibility-based fault tolerance, with worker credibility (increases with the number of successful spot checks) and result credibility (how many workers have returned the same result).

Application based:
  • Statistical analysis (on a large population) from parameters characterizing the results.
  • Statistical analysis (Monte-Carlo applications).
  • Easily checkable results (e.g. the solution of a linear system).
  • Applications tolerant to poor result quality or punctual faults (image synthesis, movies from image synthesis).
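A minimal sketch of execution redundancy + majority voting: each task is sent to several workers and a result is accepted once a strict majority of the replicas agree. Class names, the replica count and the use of strings as results are assumptions for illustration, not the scheme of any particular system.

import java.util.*;

class MajorityVoter {
    private final int replicas;                       // e.g. 3 redundant executions per task
    private final Map<String, List<String>> resultsByTask = new HashMap<>();

    MajorityVoter(int replicas) { this.replicas = replicas; }

    // Record one worker's result; return the accepted value once a strict
    // majority of the replicas agree, or null while still undecided.
    String addResult(String taskUid, String result) {
        List<String> results =
            resultsByTask.computeIfAbsent(taskUid, k -> new ArrayList<>());
        results.add(result);
        Map<String, Integer> counts = new HashMap<>();
        for (String r : results) {
            int c = counts.merge(r, 1, Integer::sum);
            if (c > replicas / 2) {
                return r;                             // majority reached, accept this result
            }
        }
        return null;                                  // no majority yet
    }
}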


Outline

  • Introduction: Desktop Grid, foundation for a Large Scale Operating System.

  • Some architecture issues
  • Fault tolerance
  • Programming
  • Security (Trust)
  • Final Remarks (what we have learned so far)

XtremWeb: User projects

1. CGP2P ACI GRID (academic research on Desktop Grid systems), France
2. Industry research project (Airbus + Alcatel Space), France
3. Augernome XtremWeb (LAL + IPNO Desktop Grid), in production, France
4. Orsay University Desktop Grid, research, France
5. EADS (airplane + Ariane rocket manufacturer), tested, France
6. Alcatel Space, to be used in production, France
7. IFP (French Petroleum Institute), tested, France
8. University of Geneva (research on Desktop Grid systems), Switzerland
9. University of Wisconsin-Madison, Condor + XW, USA
10. Mathematics lab, University of Paris South (PDE solver research), France
11. University of Lille (control language for Desktop Grids), research, France
12. IRISA (INRIA Rennes), tested, France
13. UCSD (simulation of scheduling policies), production/research, USA
14. PUCRS (protein screening), production, Brazil


Lesson learned 1

Deployment is a complex issue:
  • Human factor (system administrator, PC owner)
  • Installation on a case-by-case basis (the most limiting factor!)
  • Use of network resources (backup during the night)
  • Dispatcher scalability (hierarchical, distributed?): 1 million jobs requires some database and file system optimizations
  • Complex topology (NAT, firewalls, proxies)

Computational resource capacities limit the application range:
  • Limited memory (128 MB, 256 MB)
  • Limited network performance (100baseT)

The lack of programming models limits the applications:
  • Need for RPC
  • Need for MPI


Lesson learned 2

  • Users don't immediately understand the available computational power.
  • When they do, they propose new uses of their applications (similar to the transition from sequential to parallel).
  • They also rapidly ask for more resources!
  • There is a strong need for tools that help users browse the massive amount of results.
  • Strong issues remain about security, fairness, load balancing, etc.

There are many uses of Desktop Grids (as separated platforms). Could we assemble them to build a large-scale OS?

