Investigating the impact of the Large Scale on distributed systems

SLIDE 1

French / UK workshop on GRID Computing
Grand Large (γλ), ACI Grid CGP2P
Investigating the impact of the Large Scale on distributed systems

F. Cappello

INRIA

Grand-Large Project, INRIA/PCRI

LRI, Université Paris Sud

fci@lri.fr, www.lri.fr/~fci

SLIDE 2

Several types of GRID

Two kinds of Grids:

  • « Computing GRID »: large sites, computing centers, clusters. Node features: <100 nodes, stable, individual credential, confidence.
  • « Desktop GRID » or « Internet Computing » (SETI@home, Décrypthon, Climate-Prediction) and Peer-to-Peer systems (Napster, Kazaa, etc.): large scale distributed systems of PCs (Windows, Linux). Node features: ~100 000 nodes, volatile, no authentication, no confidence.

SLIDE 3

Fusion of Desktop Grid and P2P: General Purpose Large Scale Distributed Systems

  • Large computing infrastructures (~10 000 nodes or more)
  • Geographically distributed / different administration domains
  • With almost no control of the participating nodes
  • Where any node can play different roles (client, server, system infrastructure)

Requests may be related to computations or data.

[Diagram: clients (PCs) send a request and receive a result from a coordination system; the coordination system exchanges accept/provide messages with service provider PCs; potential direct communications between PCs for parallel applications. An "accept" concerns computation or data.]
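The request/accept/provide cycle described above can be sketched as a tiny matchmaking loop. This is an illustration only; the `Coordinator` class and its method names are ours, not part of any system named in this talk:

```python
import queue

# Hypothetical sketch of the coordination pattern: clients post requests
# for computation or data, provider PCs post "accept" offers, and the
# coordination system matches the two and routes the result back.

class Coordinator:
    def __init__(self):
        self.requests = queue.Queue()   # pending client requests
        self.offers = queue.Queue()     # providers willing to serve

    def request(self, client, task):
        self.requests.put((client, task))

    def accept(self, provider):
        self.offers.put(provider)

    def match(self):
        """Pair one pending request with one willing provider ("provide")."""
        client, task = self.requests.get()
        provider = self.offers.get()
        return client, provider(task)   # result goes back to the client
```

For example, `c.request("pc1", 21)` followed by `c.accept(lambda t: t * 2)` makes `c.match()` return `("pc1", 42)`.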

SLIDE 4

Distributed Systems: a renewed problematic

A very simple problem statement, but one leading to many research issues (classical OS): scheduling, load balancing, security, fairness, coordination, message passing, data storage, programming, deployment, etc. BUT the « Large Scale » feature has severe implications:

  • Node volatility, network failures, asynchrony
  • Lack of trust (very low control of participating nodes)
  • No consistent global view of the system

Conventional techniques/approaches may not fit. Example: fault tolerance:

  • Classical fault tolerance (consensus impossible)
  • Self-Stabilization (the system is always changing)

New approaches (intrinsically scalable/fault tolerant) are needed

  • Autonomous decisions, Self-organization, etc.
SLIDE 5

26 people, 7 labs (started in 2001; ends in July 2004)

Research topics and sub-projects:
  • Global architecture (F. C. and O. R.)
  • User interface, control language (SPI, S. Petiton)
  • Security, sandboxing (SPII, O. Richard)
  • Large scale storage (SPIII, Gil Utard)
  • Inter-node communications: MPICH-V (SPIV, F. Cappello)
  • Scheduling (large scale, multi-user) (SPIV, C. G. and F. C.)
  • Theoretical proof of the protocols (SPV, J. Beauquier)
  • GRID/P2P interoperability (SPV, A. Cordier)
  • Validation on real applications (G. Alléon, etc.)

SLIDE 6

Combining research tools

According to current knowledge, we need: 1) new tools (models, simulators, emulators, experimental platforms); 2) strong interaction between research tools.

[Chart: tools for Large Scale Distributed Systems placed on a log(cost) vs. log(realism) plane, from math (Model for LSDS, Protocol proof) through simulation (SimLargeGrid) and emulation (Grid eXplorer) to live systems (XtremWeb, MPICH-V, SMLSM US, ADSL-Stats, Grid'5000).]

SLIDE 7

ACI Grid CGP2P Contribution

[The same log(cost) vs. log(realism) tool chart, annotated with the CGP2P results.]

SLIDE 8

Combining research tools

[Repeat of the log(cost) vs. log(realism) tool chart, here highlighting the INRIA Grand-Large contributions.]

SLIDE 9

Design of a theoretical model capturing LSDS characteristics

Network:
  • ~10 k nodes or larger,
  • Wide area network (network failures: rare, but to be considered)
  • Standard protocols (TCP/IP)

Nodes:
  • Volatile, Byzantine; crashes may be permanent

TCP/IP + very large scale + volatility:
  • Higher-level protocols must be "connectionless" (<500 open connections with select)
  • If a connection fails, what does it mean? Either the target is down, OR it cannot accept new connections because all slots are full, OR it does not see the incoming SYN message due to high network traffic.
  • When a connection is broken, what does it mean? Etc.
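The connection-failure ambiguity above can be made concrete with a small client-side probe. This is a hedged sketch (the `probe` function is ours): note that a timeout on its own cannot distinguish the three causes listed above, and even "refused" only tells us about this instant:

```python
import errno
import socket

def probe(host, port, timeout=2.0):
    """Try to open a TCP connection and report an (ambiguous) diagnosis."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # Ambiguous: target down? SYN dropped under load? backlog full?
        return "timeout (target down? SYN dropped? backlog full?)"
    except OSError as e:
        if e.errno == errno.ECONNREFUSED:
            # Only proves no process accepts on that port right now.
            return "refused"
        return f"error: {e}"
```

A probe of a port that was listening a moment ago may flip from "reachable" to "refused" or to a timeout, which is exactly the interpretation problem the model must capture.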

SLIDE 10

Design of a theoretical model capturing LSDS characteristics (cont.)

Current issues:

  • LSDS systems seem to fall into the category of asynchronous systems! (consensus impossibility)
  • Can fundamental mechanisms of LSDS systems be designed without requiring consensus? An interesting strategy would be to consider, for each node, a "horizon": consensus would be guaranteed only inside this horizon.

These questions are not trivial!

Workshop: Hugues Fauconnier, Carole Delporte (Paris 7), Joffroy Beauquier, Franck Cappello, Colette Johnen, Sébastien Tixeuil, Thomas Herault (Paris 11)
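One way to picture the "horizon" strategy (our simplified reading, not the workshop's formal model): each node applies a deterministic agreement rule only over the nodes within k hops, so agreement is guaranteed among nodes that share the same horizon, but not globally:

```python
from collections import deque

def horizon(graph, node, k):
    """Nodes within k hops of `node` (its 'horizon'), including itself."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == k:
            continue
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, d + 1))
    return seen

def decide(graph, proposals, k):
    """Each node deterministically adopts the smallest proposal it can see
    inside its horizon: local agreement without any global consensus."""
    return {n: min(proposals[m] for m in horizon(graph, n, k))
            for n in graph}
```

When k covers the whole network this degenerates into global agreement; for small k, two distant nodes may legitimately decide differently.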

SLIDE 11

Combining research tools

[Repeat of the log(cost) vs. log(realism) tool chart, here highlighting SimLargeGrid.]

SLIDE 12

SimLargeGrid: Large Scale Nearest Neighbor Scheduling Simulator

Global coordination seems very difficult at large scale (hierarchical solutions exist and may fit). More speculative approaches based on autonomous decisions and self-organization are also good candidates. We investigate this last idea with a concrete mechanism: a scheduler/load balancer (SimGrid, Bricks, GridSim don't scale). Current status: a simulation tool covering topology, volatility, asynchrony, latency/bandwidth and heterogeneity + nearest neighbor scheduling algorithms + use of the tool to compare them.
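As an illustration of the autonomous, nearest neighbor idea (this is a toy diffusion rule of ours, not the actual SimLargeGrid algorithm): each node looks only at its direct neighbors' loads and sends one task toward the first neighbor that is noticeably lighter.

```python
def step(loads, neighbors):
    """One round of nearest neighbor balancing: every node compares its
    load with its neighbors and sends one task toward the first neighbor
    that is at least 2 tasks lighter. Decisions use only local
    information, with no global coordinator."""
    moves = []
    for u, nbrs in neighbors.items():
        for v in nbrs:
            if loads[u] - loads[v] >= 2:
                moves.append((u, v))
                break                      # at most one task sent per round
    for u, v in moves:                     # apply all moves simultaneously
        loads[u] -= 1
        loads[v] += 1
    return loads
```

On an 8-node ring with all tasks initially on one node, repeating `step` spreads the load until no two neighbors differ by more than one task.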

SLIDE 13

SimLargeGrid: Large Scale Nearest Neighbor Scheduling Simulator

Based on the Swarm multi-agent simulator

SLIDE 14

Combining research tools

[Repeat of the log(cost) vs. log(realism) tool chart, here highlighting Grid eXplorer.]

SLIDE 15

Grid eXplorer

A "GRIDinLAB" instrument for CS researchers, funded by the French ministry of research through the ACI "Data Mass" incentive + INRIA. For:

  • the Grid/P2P researcher community
  • the Network researcher community

Addressing specific issues of each domain, enabling research studies combining the 2 domains, and easing and developing collaborations between the two communities.

Statistics: 13 laboratories, 80 researchers, 24 research experiments, >1M€ (not counting salaries). Installed at IDRIS (Orsay).

SLIDE 16

Grid eXplorer: the big picture

Close to Emulab and WaniLab. 13 laboratories, 80 researchers.

[Diagram: a set of sensors; an experimental-conditions database; an emulator core (hardware + software for emulation and simulation); a set of tools for analysis; validation on a real-life testbed.]

SLIDE 17

Grid eXplorer (GdX) current status:

  • First stage: building the instrument
    – First GdX meeting was on September 16, 2003.
    – Hardware design meeting planned for October 15.
    – Hardware selection meeting on November 8:
      – Choosing the nodes (single or dual?)
      – Choosing the CPU (Intel IA-32, IA-64, Athlon 64, etc.)
      – Choosing the experimental network (Myrinet, Ethernet, Infiniband, etc.)
      – Choosing the general experiment production architecture (parallel OS architecture, user access, batch scheduler, result repository)
      – Choosing the experimental database hardware
      – Etc.

SLIDE 18

Combining research tools

[Repeat of the log(cost) vs. log(realism) tool chart, here highlighting the live systems.]

SLIDE 19

[Diagram: clients (PCs) submit jobs to a coordinator; the coordinator dispatches job requests to worker PCs and routes results back to the clients.]

For research on Desktop Grids:
  • Scalability
  • Fault tolerance
  • Programming models
  • GridRPC
  • Security (sandbox)
  • Scheduling
  • DGrid services
  • Deployment (firewall/NAT/proxy bypass)

Main international users:
  • UCSD (Chien, Casanova)
  • U. Tsukuba (Sato)
  • U. Geneva (Abdenader)
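The coordinator/worker/client pattern in the diagram above can be sketched as a minimal master-worker loop. Names (`worker`, `run`) are ours, for illustration; real coordinators must also handle volatile workers and result certification:

```python
import queue
import threading

# Minimal master-worker sketch: clients submit jobs to the coordinator's
# queue; volunteer workers pull a job, compute, and push the result back.

def worker(todo, done, fn):
    while True:
        try:
            job_id, payload = todo.get_nowait()
        except queue.Empty:
            return                        # no work left: volunteer leaves
        done.put((job_id, fn(payload)))   # "result" message back

def run(jobs, fn, n_workers=4):
    todo, done = queue.Queue(), queue.Queue()
    for job in jobs:                      # "job" messages from clients
        todo.put(job)
    threads = [threading.Thread(target=worker, args=(todo, done, fn))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(done.queue)               # job_id -> result
```

For example, `run(list(enumerate([1, 2, 3])), lambda x: x * x)` returns `{0: 1, 1: 4, 2: 9}` regardless of which worker computed which job.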

SLIDE 20

Production Example: XtremWeb-Auger

[Diagram: PC workers run air-shower simulations; over the Internet and LAN, the XtremWeb coordinator connects PC workers, a PC client and the air-shower parameter database (Lyon, France).]

  • Tasks are submitted from the parameter database by users
  • Estimated PC number: ~5000
  • Production should start by the fall (December)
  • Result certification by replication

Understanding the origin of very high energy cosmic rays:
  • Aires (Air Showers Extended Simulation): sequential, Monte Carlo. Time for a run: 5 to 10 hours.
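"Result certification by replication" means the same task is sent to several volunteers and a result is trusted only when enough of them agree. A minimal sketch (the `certify` function and its quorum parameter are ours):

```python
from collections import Counter

def certify(results, quorum=2):
    """Result certification by replication: a result is certified once
    `quorum` independent workers returned the same value; otherwise the
    task must be resubmitted (return None)."""
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None
```

So `certify([42, 42, 17])` certifies 42, while `certify([1, 2, 3])` returns None and the task is replicated again.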

SLIDE 21

XtremWeb-Testbed

[Diagram: the XW coordinator at LRI (lri.fr) federates, over the Internet and an EU-USA network, the Grenoble PC pool (PBS), the Madison, Wisconsin PC pool (Condor), the LRI PC pool and other labs' pools; an XW client submits work to the coordinator.]

[Plot 1 (axes translated from French): processors used vs. time in minutes, comparing a fault-free run (WLG-270) with a run suffering a massive fault of 150 CPUs (WLG-309/Faults): resistance to massive faults.]

[Plot 2: number of executed jobs vs. time in minutes for the configurations WISC-97, WL-113, G-146, WLG-270 and WL-451: scalability up to about 1K CPUs.]

SLIDE 22

Toward an automatic/scalable fault tolerant MPI for Clusters & Grids

MPICH-V is a research effort:
  – with theoretical studies,
  – experimental evaluations,
  – pragmatic implementations,
aiming to provide an MPI implementation based on MPICH, featuring multiple fault tolerant protocols (3 currently), for Desktop Grids, large clusters and Grids.
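A core ingredient of such fault tolerant MPI runtimes is message logging: delivered messages are recorded on stable storage so a restarted process can be replayed deterministically. The sketch below is a deliberate simplification (class names are ours, not MPICH-V's architecture):

```python
# Pessimistic message logging, radically simplified: every message is
# logged before delivery, so after a crash the channel can re-deliver
# the logged messages in their original order.

class LoggedChannel:
    def __init__(self):
        self.log = []          # stands in for stable storage

    def deliver(self, process, msg):
        self.log.append(msg)   # pessimistic: log before delivering
        process.receive(msg)

    def replay(self, process):
        """After a crash, re-deliver logged messages in original order."""
        for msg in self.log:
            process.receive(msg)

class Summer:
    """A trivial deterministic process: its state is the sum received."""
    def __init__(self):
        self.total = 0

    def receive(self, msg):
        self.total += msg
```

Replaying the log into a fresh `Summer` reproduces exactly the state of the crashed one, which is the property the recovery protocol relies on.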

SLIDE 23

Main Results

[Plot 1: MPICH-V vs. MPICH-P4 on NAS BT.A with 9, 16 and 25 nodes; execution time in seconds (computation and communication) for MPICH-V with the condition manager but no logs, with logs, with checkpoint servers + checkpointing, and for MPICH-P4: performance similar to MPICH-P4.]

[Plot 2: execution time with fault injection, BT.A.9; total execution time (sec.) vs. number of faults during execution (1 to 10, up to ~1 fault/110 sec.), compared with the base execution without checkpointing and faults: resistance to very frequent faults.]

SLIDE 24

Other work and Conclusion

Many of the CGP2P participants are also involved in:

  • Grid'5000
  • CoreGrid (NoE) proposals for FP6

Summary: we are involved in different projects related to large scale distributed systems:
  – From theoretical studies to actual Grid deployments
  – About fault tolerance and performance
  – Middleware design and implementation: XtremWeb, MPICH-V
  – Large scale experimental platforms: Grid eXplorer, Grid'5000

Contact: fci@lri.fr

SLIDE 25

Links

  • ACI Grid CGP2P: www.lri.fr/~fci/CGP2P
  • XtremWeb: www.XtremWeb.net
  • MPICH-V: www.lri.fr/~gk/MPICH-V
  • Grid eXplorer: www.lri.fr/~fci/GdX
  • eGrid'5000: www.lri.fr/~fci/AS1

SLIDE 26

Grid eXplorer: 4 Research Topics

The 4 research topics and their leaders:

  • Infrastructure (hardware + system): Olivier Richard (ID-IMAG)
  • Emulation: Pierre Sens (LIP6)
  • Network: Pascale Primet (LIP, Inria RESO)
  • Applications: Christophe Cérin (Laria)

SLIDE 27

[Table: the 24 research experiments cross-referenced (X marks) against the four topics Infrastructure, Emulation, Network and Application. Experiments: I.1 Platform, I.2 Virtual Grid, I.3 Virt. techniques, I.4 Emul.-driven simul., I.5 Network, I.6 Heterogeneity emul., I.7 Communication, I.8 Internet emul., II.1 Engineering tech., II.2 Mobile objects, II.3 Fault tolerance, II.4 DHT, II.5 Data base, II.6 Scheduling, II.7 Comm. optimizat., II.8 Data sharing, II.9 Uni- and multicast, II.10 Cellul. automaton, II.11 Bioinformatics, II.12 P2P storage, II.13 Reliability, II.14 Security, II.15 NG Internet, II.16 Grid coupled sys.]

SLIDE 28

Desktop Grids

[Diagram: volunteer PCs download and execute the client application; a coordinator distributes parameters and collects results over the Internet.]

A central coordinator schedules tasks on n volunteer computers: master-worker paradigm, cycle stealing.

  • Dedicated applications: SETI@Home, distributed.net, Décrypthon (France)
  • Production applications: Folding@home, Genome@home, Xpulsar@home, Folderol, Exodus, Peer review
  • Research platforms: Javelin, Bayanihan, JET, Charlotte (based on Java)
  • Commercial platforms: Entropia, Parabon, United Devices, Platform (AC)

SLIDE 29

Peer to Peer systems (P2P)

[Diagram: a client issues requests over the Internet; volunteer PCs act as service providers and participate in resource discovery/coordination.]

All system resources:
  • may play the roles of client and server,
  • may communicate directly.
Distributed and self-organizing infrastructure.

  • User applications: instant messaging, managing and sharing information, collaboration, distributed storage
  • Middleware: Napster, Gnutella, Freenet, KaZaA, Music-city, Jabber, Groove
  • Research projects: Globe (Tann.), Cx (Javalin), Farsite, OceanStore (USA), Pastry, Tapestry/Plaxton, CAN, Chord
  • Other projects: Cosm, Wos, peer2peer.org, JXTA (Sun), PtPTL (Intel)
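The self-organizing resource discovery behind systems such as Chord, Pastry and CAN rests on consistent hashing: keys and nodes share one identifier ring, and a key is stored on the first node clockwise from its hash. A deliberately tiny sketch (function names are ours, and real DHTs add finger tables for O(log n) routing):

```python
import hashlib

def node_id(name, bits=32):
    """Hash a node or key name onto a 2**bits identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

def lookup(nodes, key):
    """A key lives on the first node clockwise from its hash. Removing
    any *other* node does not move the key, which is why such systems
    tolerate churn."""
    ring = sorted((node_id(n), n) for n in nodes)
    k = node_id(key)
    for nid, name in ring:
        if nid >= k:
            return name
    return ring[0][1]          # wrap around the ring
```

The key property: if a node other than a key's owner leaves, the lookup for that key returns the same owner, so only the departed node's keys must be reassigned.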

SLIDE 30

Nearest Neighbor Scheduling with a 3D visualization tool

10K tasks on 900 nodes in a mesh:

  • Negotiation (red movie)
  • Distribution (blue movie)
  • Execution (green movie)
  • Observation results:
    — Symmetry for the negotiation phase
    — Asymmetry for the distribution and execution phases
    — Waves

Several hours to get 1 movie: parallel simulation is required!

SLIDE 31

Objectives and constraints

Goal: execute existing or new MPI applications.

[Diagram: one PC client calling MPI_send(), another calling MPI_recv().]

Programmer's view unchanged. Problems:
1) volatile nodes (any number, at any time)
2) non-named receptions (they should be replayed in the same order as in the previous, failed execution)

Objective summary:
1) Automatic fault tolerance
2) Transparency for the programmer & user
3) Tolerate n faults (n being the number of MPI processes)
4) Scalable infrastructure/protocols
5) Avoid global synchronizations (ckpt/restart)
6) Theoretical verification of protocols
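Problem 2 above (non-named, i.e. wildcard receptions) is usually solved by logging the source of each MPI_ANY_SOURCE receive during the first run, then forcing the same delivery order at replay. A simplified sketch, with function names of our own:

```python
# During the first execution, the source of every wildcard receive is
# recorded; at replay, messages are re-delivered in the recorded source
# order so the restarted process follows exactly the same path.

def first_run(incoming, log):
    """Deliver messages in arrival order, recording each source."""
    delivered = []
    for src, msg in incoming:
        log.append(src)        # event log kept on reliable storage
        delivered.append(msg)
    return delivered

def replay(pending, log):
    """Re-deliver pending messages following the logged source order,
    even if the network now presents them in a different order."""
    by_src = {}
    for src, msg in pending:
        by_src.setdefault(src, []).append(msg)
    return [by_src[src].pop(0) for src in log]
```

Even if the re-executed network delivers the messages in a different physical order, `replay` hands them to the application in the original logical order, restoring determinism.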