XtreemOS European Project: Achievements & Perspectives




SLIDE 1


XtreemOS IP project is funded by the European Commission under contract IST-FP6-033576

XtreemOS European Project: Achievements & Perspectives

Christine Morin XtreemOS scientific coordinator Head of Myriads research team INRIA Rennes - Bretagne Atlantique

CCGSC 2010 – Flat Rock, NC

SLIDE 2

XtreemOS in a Nutshell

  • Distributed operating system for large-scale dynamic Grids

  • “Operating system” approach
  • Comprehensive set of cooperating system services
  • Ease of use
  • “Bring the Grid to standard users”
  • Unix system interface
  • SAGA programming interface
  • Scalability
  • Dependable system

SLIDE 3

XtreemOS Flavours

SLIDE 4

XtreemOS Open Source Software

  • Open source development
  • Release 2.1.1 packaged for Mandriva and Asianux Linux distributions
  • Packaging in progress for Debian, Ubuntu, openSUSE
  • Ready-to-use VM images for KVM & VirtualBox
  • Open testbed for the community
  • Test your applications without installing XtreemOS
  • Tool for automatic configuration of the system
  • Deployment on Grid’5000

[Timeline figure: Jun. 06, Dec. 08, Nov. 09; Rel. 1.0, Rel. 2.0]

SLIDE 5

Overview of Applications

19 applications demonstrating and evaluating XtreemOS from the perspective of industrial and academic end-users

Electromagnetics, CAE, Particle Physics, Mobile applications, Virtual Reality, Fluid Dynamics, Enterprise solutions, Cloud Computing, Optimization

SLIDE 6

Some Contributions

  • XtreemOS system services
  • VO & security management
  • XtreemFS Grid file system
  • Job & resource management
  • OSS object sharing system
  • XOSAGA
  • SAGA programming interface
  • Virtual Node approach
  • Highly available applications & system services

SLIDE 7

VO & Security Management

  • Scalable VO management
  • Independent user & resource management
  • On-the-fly mapping of Grid credentials to Linux user accounts

  • Customizable isolation, access control and auditing
  • Secure and reliable application execution
  • Fine-grained control of resource usage
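The on-the-fly mapping of Grid credentials to Linux user accounts can be pictured with a small sketch. This is an illustration under stated assumptions, not the XtreemOS implementation: the class name `AccountMapper`, the reserved UID range and the `(vo, global_user)` key are all hypothetical. It only shows the idea of allocating a local account the first time a Grid identity appears and reusing it afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class AccountMapper:
    """Hypothetical on-the-fly mapper from Grid identities to local UIDs."""

    uid_start: int = 60000  # assumed reserved UID range, not taken from the slides
    uid_end: int = 60999
    _by_identity: dict = field(default_factory=dict)
    _free: list = field(default_factory=list)

    def __post_init__(self):
        # Pool of unused local UIDs, ordered so that pop() hands out the lowest first.
        self._free = list(range(self.uid_end, self.uid_start - 1, -1))

    def map(self, vo: str, global_user: str) -> int:
        """Return the local UID for (vo, global_user), allocating one on first use."""
        key = (vo, global_user)
        if key not in self._by_identity:
            if not self._free:
                raise RuntimeError("no free local UID left in the reserved range")
            self._by_identity[key] = self._free.pop()
        return self._by_identity[key]

    def release(self, vo: str, global_user: str) -> None:
        """Forget a mapping and return its UID to the pool."""
        uid = self._by_identity.pop((vo, global_user), None)
        if uid is not None:
            self._free.append(uid)


mapper = AccountMapper()
uid_in_vo_a = mapper.map("vo-a", "CN=alice")
uid_in_vo_b = mapper.map("vo-b", "CN=alice")  # same person, different VO
```

Keying on the VO as well as the user means the same person gets distinct local accounts in distinct VOs, which is one simple way to keep VOs isolated on a shared node.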

SLIDE 8

VO & Security Management

  • Improved usability
  • Local resource administrator: autonomous management of local resources
  • VO administrator: flexible management of credential and VO policies
  • End user: login as a Grid user into a VO
  • On-line certificate distribution
  • Single sign-on & delegation
  • System services trust each other (“operating system approach”)
  • A trusted credential store service associated with each user session
  • No need for proxy certificates

SLIDE 9

Grid Management

SLIDE 10

XtreemFS Grid File System

Federating storage in different administrative domains

SLIDE 11

XtreemFS Features

  • POSIX-compatible file system (API, behaviour)
  • Provides users with a global view of their files in a Grid
  • Each XtreemOS user has a home volume in XtreemFS
  • Transparent location-independent access to data
  • Consistent data sharing
  • Access control based on VO member credentials
  • Autonomous data management with self-organized replication and distribution

  • Advanced metadata management
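A practical consequence of POSIX compatibility is that ordinary file code should run unchanged against a mounted volume. The sketch below uses only standard POSIX-style calls from Python's `os` module. A temporary directory stands in for the volume, since the real mount point is not given in the slides, and the helper name `write_atomically` is hypothetical.

```python
import os
import tempfile


def write_atomically(directory: str, name: str, data: bytes) -> str:
    """Write data to directory/name with plain POSIX-style calls."""
    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # small payload; a robust version would loop on short writes
        os.fsync(fd)        # flush to storage before publishing the file
    finally:
        os.close(fd)
    os.replace(tmp_path, final_path)  # rename semantics: the file appears in one step
    return final_path


# Assumption: this temporary directory plays the role of a mounted home volume.
with tempfile.TemporaryDirectory() as volume:
    path = write_atomically(volume, "result.dat", b"42\n")
    size = os.stat(path).st_size
```

Nothing in the code refers to the Grid, and that is the point of a POSIX interface. How closely a given file system follows every detail of these calls has to be checked against its own documentation.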

SLIDE 12

Job & Resource Management

  • Job self-scheduling
  • Decentralized resource discovery based on overlays
  • Resource reservation
  • Unix-like job management
  • Support for interactive jobs
  • Accurate & adaptable monitoring
  • Job checkpoint/restart & migration

SLIDE 13

XtreemGCP Service

  • Automatic management of the user-specified fault tolerance strategy

  • Handling checkpoint/restart for Grid applications

[Figure: Job A with job units A1, A2, A3 and A4 spread over Paris, London, Düsseldorf and Barcelona]

SLIDE 14

XtreemGCP Service

  • Generic service
  • Different levels to implement fault tolerance
  • In the application code
  • In a programming environment (MPI …)
  • At system level transparently to the application
  • VM Suspend/restart
  • Different backward error recovery protocols
  • Checkpoint based (coordinated, independent, message induced, …), message logging based (pessimistic, optimistic, causal, …), …

  • Different technologies for process group checkpointing
  • Some do not handle all resources

SLIDE 15

Process Group Checkpointers

CoCheck, Condor, DCR, DMTCP & MTCP, BLCR, LAM/MPI & BLCR, zap, CLIP, libckpt, Dynamite, LinuxSSI, Linux-native, OpenVZ, tmPVM, VMware Player, Ckpt, CHPOX, CRAK, UCLiK, Epckpt, MCR, SCore, TICK, VMADump, KMU CP/R

SLIDE 16

User Perspective

  • User/application commands

    $xjobcheckpoint JobID
    $xjobrestart JobID CPversion

  • JSDL file extensions
  • Extended by checkpointing tags
  • Checkpointer requirements
  • Protocols and parameters
  • ...

SLIDE 17

JSDL File Sample: Checkpointing

<JobCheckpointing>
  <Initiator>System</Initiator>
  <ProtocolManagement>
    <Name>CoordinatedCheckpointing</Name>
    <Parameter>1hour</Parameter>
  </ProtocolManagement>
  <FileManagement>
    <ReplicationLevel>5</ReplicationLevel>
  </FileManagement>
  <JobCheckpointerMatching>
    <MultiThread>Yes</MultiThread>
    <Sockets>Yes</Sockets>
  </JobCheckpointerMatching>
</JobCheckpointing>
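To make the structure of the checkpointing extension concrete, the sketch below reads such a fragment with Python's standard `xml.etree.ElementTree`. The element names come from the sample. The embedded XML assumes properly closed `Parameter` and `ReplicationLevel` elements. The function name, the result keys, and the reading of "Yes" as a boolean requirement are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Checkpointing fragment from the slide, with closing tags assumed to be well-formed.
JSDL_CHECKPOINTING = """
<JobCheckpointing>
  <Initiator>System</Initiator>
  <ProtocolManagement>
    <Name>CoordinatedCheckpointing</Name>
    <Parameter>1hour</Parameter>
  </ProtocolManagement>
  <FileManagement>
    <ReplicationLevel>5</ReplicationLevel>
  </FileManagement>
  <JobCheckpointerMatching>
    <MultiThread>Yes</MultiThread>
    <Sockets>Yes</Sockets>
  </JobCheckpointerMatching>
</JobCheckpointing>
"""


def read_checkpointing(xml_text: str) -> dict:
    """Extract the checkpointing settings into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    matching = root.find("JobCheckpointerMatching")
    requirements = {}
    if matching is not None:
        # Each child names a capability the checkpointer must offer.
        requirements = {child.tag: (child.text or "").strip() == "Yes" for child in matching}
    return {
        "initiator": root.findtext("Initiator"),
        "protocol": root.findtext("ProtocolManagement/Name"),
        "parameter": root.findtext("ProtocolManagement/Parameter"),
        "replication_level": int(root.findtext("FileManagement/ReplicationLevel")),
        "requirements": requirements,
    }


settings = read_checkpointing(JSDL_CHECKPOINTING)
```

The `requirements` entry is the part a scheduler would plausibly match against the capabilities of the checkpointers available on candidate nodes, for example support for multithreaded processes or sockets.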

SLIDE 18

XtreemOS-GCP Architecture

[Architecture figure, recovered labels]

  • Grid level: Job Checkpointer (Job Manager extension)
  • Node level: Job-unit Checkpointer (Execution Manager extension), shown for both the XtreemOS-SSI cluster and the XtreemOS PC
  • Common Checkpointer API
  • XtreemOS-SSI cluster: SSI-Translib, LinuxSSI kernel checkpointer
  • XtreemOS PC: BLCR-Translib, BLCR checkpointer

SLIDE 19

Common Kernel Checkpointer API

  • Provide uniform access to different checkpointers
  • translib library
  • Translate jobs into Linux process groups
  • Translate user credentials into Linux user accounts
  • Provide callbacks to applications
  • Processed during checkpoint and restart operations
  • Allow applications to optimize checkpointing
  • Used to drain communication channels

SLIDE 20

Common Checkpointer API

  • To what extent must existing checkpointers be adapted to support various checkpointing protocols?
  • We need the following sequences
  • Checkpoint: stop, checkpoint, resume_cp
  • Restart: rebuild, resume_rst
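The two sequences can be written down as a small interface sketch. Only the five operation names are taken from the slide. The class names, the arguments (a process group id and a checkpoint path) and the choice to resume even when the checkpoint step fails are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class KernelCheckpointer(ABC):
    """Hypothetical uniform interface; operation names follow the slide."""

    @abstractmethod
    def stop(self, pgid: int) -> None: ...

    @abstractmethod
    def checkpoint(self, pgid: int, path: str) -> None: ...

    @abstractmethod
    def resume_cp(self, pgid: int) -> None: ...

    @abstractmethod
    def rebuild(self, path: str) -> int: ...

    @abstractmethod
    def resume_rst(self, pgid: int) -> None: ...


def checkpoint_sequence(cp: KernelCheckpointer, pgid: int, path: str) -> None:
    """Checkpoint: stop, checkpoint, resume_cp."""
    cp.stop(pgid)
    try:
        cp.checkpoint(pgid, path)
    finally:
        cp.resume_cp(pgid)  # assumption: always let the job continue


def restart_sequence(cp: KernelCheckpointer, path: str) -> int:
    """Restart: rebuild, resume_rst. Returns the rebuilt process group id."""
    pgid = cp.rebuild(path)
    cp.resume_rst(pgid)
    return pgid


class RecordingCheckpointer(KernelCheckpointer):
    """Toy backend that only records which operations were called."""

    def __init__(self):
        self.calls = []

    def stop(self, pgid):
        self.calls.append("stop")

    def checkpoint(self, pgid, path):
        self.calls.append("checkpoint")

    def resume_cp(self, pgid):
        self.calls.append("resume_cp")

    def rebuild(self, path):
        self.calls.append("rebuild")
        return 4321  # made-up process group id

    def resume_rst(self, pgid):
        self.calls.append("resume_rst")
```

Keeping the steps separate, rather than offering a single "checkpoint" call, is what lets a protocol place its own actions between them. A coordinated protocol, for instance, could stop every job unit before any of them is checkpointed.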

SLIDE 21

Callback Management

  • Implemented in the generic part of translib
  • Called before and after a checkpoint and after restart
  • Common API for application callback registration
  • Usage
  • Application optimizations
  • Compensate for checkpointer limitations
  • Checkpointing communication channels
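A callback registration API of this kind can be sketched in a few lines. The three hook points (before a checkpoint, after a checkpoint, after a restart) follow the slide. The class `CallbackRegistry`, the event names and the example callbacks are hypothetical and do not reproduce the translib API.

```python
from collections import defaultdict
from typing import Callable

# Hook points named on the slide; the string constants themselves are made up.
PRE_CHECKPOINT = "pre_checkpoint"
POST_CHECKPOINT = "post_checkpoint"
POST_RESTART = "post_restart"
_EVENTS = (PRE_CHECKPOINT, POST_CHECKPOINT, POST_RESTART)


class CallbackRegistry:
    """Hypothetical registry for application callbacks around checkpoint and restart."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event: str, fn: Callable[[], None]) -> None:
        if event not in _EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._callbacks[event].append(fn)

    def fire(self, event: str) -> None:
        # Run callbacks in registration order.
        for fn in self._callbacks[event]:
            fn()


# Example: an application drains its channels before a checkpoint
# and reopens them afterwards or after a restart.
events = []
registry = CallbackRegistry()
registry.register(PRE_CHECKPOINT, lambda: events.append("drain channels"))
registry.register(POST_CHECKPOINT, lambda: events.append("reopen channels"))
registry.register(POST_RESTART, lambda: events.append("reconnect channels"))

registry.fire(PRE_CHECKPOINT)
registry.fire(POST_CHECKPOINT)
```

This is how an application can cover resources that a given kernel checkpointer does not save by itself, such as open communication channels.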
SLIDE 22

Other Issues

  • Fault tolerance information stored in XtreemFS Grid file system
  • Checkpoint replication
  • Checkpoints can be accessed from any Grid node
  • Resource conflict avoidance at restart
  • Management of security issues regarding the use of fault tolerance information

SLIDE 23

Current Status

  • XtreemGCP fully integrated in XtreemOS
  • PC and cluster nodes
  • Sequential, parallel and distributed applications
  • System level checkpointing
  • Kernel checkpointers supported
  • BLCR, OpenVZ-based checkpointer, native Linux checkpointer, Kerrighed checkpointer
  • Callback mechanisms
  • Protocols supported
  • Coordinated checkpointing (for job migration)
  • Independent checkpointing

SLIDE 24

What’s coming next?

SLIDE 25

What’s coming next?

  • Sustainability of the XtreemOS Grid technology
  • Cloud computing - Contrail EC funded R&D project

SLIDE 26

XtreemOS & Cloud Computing

  • Feasibility studies (2008 - …)
  • Extending an XtreemOS Grid with resources gathered from Clouds
  • HBase on top of XtreemFS
  • Picture sharing application over XtreemOS in a cloud
  • XtreemOS as a system to manage IaaS Clouds

[Figure: "XOS over Clouds" and "XOS for IaaS" stacks, each built from bare hardware, virtualization and XtreemOS layers]

SLIDE 27

Contrail European Project

  • Objectives
  • Design, implement, evaluate and promote an open source system to federate computing resources from different providers in a single cloud that is easy for users to access

  • Approach
  • Vertical integration of
  • Infrastructure-as-a-Service services
  • Runtimes and high level services providing the foundations for Platform-as-a-Service services

SLIDE 28

Contrail in a Nutshell

SLIDE 29

Contrail European Integrated Project

  • Coordinator
  • INRIA, France
  • Academic partners
  • CNR, Italy
  • STFC, UK
  • Vrije Universiteit Amsterdam, The Netherlands

  • ZIB, Germany
  • Industrial partners
  • CONSTELLATION, UK
  • GENIAS, The Netherlands
  • HP, Italy
  • TISCALI, Italy
  • XLAB, Slovenia
  • Starting date: October 2010
  • Duration: 3 years
  • Budget: 11,4 M€
  • EC funding: 8,3 M€

SLIDE 30

Acknowledgements

SLIDE 31

More Information

  • XtreemOS
  • Web site: http://www.xtreemos.eu
  • Software: http://gforge.inria.fr/projects/xtreemos/
  • GPL/BSD licence
  • INRIA/XtreemOS booths at SC 2010
  • Contrail
  • http://www.contrail-project.eu
