SLIDE 1

XCPU: A Process Management System

Latchesar Ionkov, Los Alamos National Laboratory

SLIDE 2

HPC Cluster

[Diagram: desktops connect to the head node; compute nodes CN1-CNn in groups, each group served by an I/O node (IO1-IOp) backed by a parallel filesystem (FS)]

SLIDE 3

Hardware

Nodes

  • 4-8 CPU cores
  • 2GB RAM per core
  • diskless

Network

  • high bandwidth (2-20 GB/s)
  • low latency
  • mostly Infiniband
SLIDE 4

Software

Linux
Cluster management: Perceus, Warewulf, XCAT
Job scheduling: Torque, Moab, SLURM

Compute Nodes:

  • boot over the network
  • minimal root image in the RAM
  • more software on parallel filesystem
  • same software on all nodes
SLIDE 5

Running a Job

Make sure all libraries are included in the cluster software stack
Collect all binaries, configuration and data files
Write a job script that:

  • transfers all files to the assigned nodes (if they are not on a mounted filesystem already)
  • runs the binary
  • waits for the result
  • collects the results and sends them back

Schedule the job and wait until it is finished
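The job-script steps above can be sketched in Python; this is only an illustration, with plain local directories standing in for the assigned compute nodes and a local copy standing in for the file transfer:

```python
import shutil, subprocess, tempfile
from pathlib import Path

def run_job(job_files, nodes, command):
    """Stage job files to each node's scratch dir, run the command
    there, and collect each node's stdout back."""
    results = {}
    for node in nodes:
        scratch = Path(node) / "scratch"
        scratch.mkdir(parents=True, exist_ok=True)
        for f in job_files:                      # "transfer" files (here: local copy)
            shutil.copy(f, scratch)
        out = subprocess.run(command, cwd=scratch,
                             capture_output=True, text=True)
        results[node] = out.stdout               # collect the result
    return results

# demo: two fake "nodes" as local directories
base = Path(tempfile.mkdtemp())
data = base / "input.txt"
data.write_text("hello world\n")
res = run_job([data], [base / "n1", base / "n2"], ["cat", "input.txt"])
print(res[base / "n1"])    # prints: hello world
```

In a real cluster the copy would be scp or a mount of a parallel filesystem, and the command would run via the scheduler rather than a local subprocess.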

SLIDE 6

Unix: Resources As Files

Most devices are accessible as files, but they also rely on ioctls

  • normal files don’t do ioctls
  • ioctls are opaque
  • impossible to support over the network

Not everything is a file

SLIDE 7

Unix: Sharing Resources

Files -- NFS, CIFS, AFS, FTP
Printers -- CUPS, LPD
Sound -- PulseAudio, aRts, NAS
Display -- X11, VNC, NX
Ad-hoc protocols for each device

SLIDE 8

Sharing I Would Like

Access local files on a remote server
Remote program to use the local sound card
Program running remotely to print on the local printer
Program running remotely to use the locally established VPN

SLIDE 9

Resources As (Regular) Files

If device files don’t use ioctl operations, sharing over the network is easy
Devices:

  • /dev/sound
  • /dev/printer
  • /dev/display
  • /dev/net
SLIDE 10

Unix Is Too Uni-

Single file namespace

  • all users see the same files
  • the root decides what filesystems can be mounted

The root decides what printers the users can print to
The root decides what networks are available

SLIDE 11

Linux Private Namespaces

Linux allows processes to have private namespaces
Security issues -- legacy applications and libraries expect a single namespace
Solution -- only root can create private namespaces
Result -- nobody uses private namespaces
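The restriction is easy to observe from an unprivileged process. A minimal sketch on Linux, calling unshare(2) through ctypes (CLONE_NEWNS is the mount-namespace flag from <sched.h>); without root privileges (or a user namespace) the kernel refuses the call:

```python
import ctypes, errno

CLONE_NEWNS = 0x00020000             # new mount namespace (from <sched.h>)
libc = ctypes.CDLL(None, use_errno=True)

def try_private_namespace():
    """Attempt to move this process into a private mount namespace.
    Returns 'private' on success, or the errno name on failure."""
    if libc.unshare(CLONE_NEWNS) == 0:
        return "private"
    return errno.errorcode.get(ctypes.get_errno(), "unknown")

print(try_private_namespace())       # typically EPERM when not root
```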

SLIDE 12

Sharing Solutions

Virtualization
User-level workarounds
Files -- GNOME GIO/GVFS, KDE KIO
Printers -- none
Network -- none

SLIDE 13

How To Fix It

Fix legacy code, loosen private namespace restrictions, loosen mount restrictions
Represent more resources as files -- FUSE and 9P make it easy
Get rid of ioctls for the kernel devices
Effect:

  • resources can be shared easily over the network
  • users can set up their favorite resources on a remote server without involving the sysadmin

SLIDE 14

How Will It Work

When a user logs in on a remote server, a new private namespace is created
Print on local printer -- mount it at /dev/printer
Sound on local speakers -- mount it at /dev/sound
Use local VPN -- mount it at /dev/net
The resources are invisible to other users and don’t affect their work

SLIDE 15

XCPU: Remote Execution

Distribute job-related files (binary, data, configuration) to all nodes
Set up the job environment and arguments
Start, monitor and control job execution
Clean up when the job is done
Can survive a head node crash

SLIDE 16

XCPU Interface

Interface implemented as a file tree

Global files

  • arch
  • ctl
  • clone
  • env

Job session files

  • ctl
  • argv
  • env
  • std{in,out,err}
  • wait
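The clone/ctl semantics can be modeled with a toy in-memory file tree. This is not the real xcpufs, just an illustration: reading clone creates a new session directory and returns its id, and writing the session's files sets up the job:

```python
class XcpuMock:
    """Toy model of the xcpufs file tree: a read of 'clone'
    creates a new session with its own ctl/argv/env/std* files."""
    def __init__(self):
        self.next_id = 0
        self.sessions = {}

    def read_clone(self):
        sid = str(self.next_id)
        self.next_id += 1
        self.sessions[sid] = {f: "" for f in
                              ("ctl", "argv", "env", "stdin",
                               "stdout", "stderr", "wait")}
        return sid                       # like `cat clone` returning "2"

    def write(self, sid, name, data):
        self.sessions[sid][name] = data  # like `echo foo > 2/argv`

fs = XcpuMock()
sid = fs.read_clone()
fs.write(sid, "argv", "foo")
fs.write(sid, "ctl", "exec cat")
print(sid, fs.sessions[sid]["argv"])
```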
SLIDE 17

XCPU: Example

XCPU file interface mounted on /mnt/xcpu

$ cd /mnt/xcpu
$ ls
arch  clone  ctl  env
$ tail -f clone &
2
$ cd 2
$ ls
argv  ctl  fs/  stdin  stdout  stderr  wait
$ echo foo > argv
$ cp /bin/cat fs/cat
$ echo hello world > fs/foo
$ echo exec cat > ctl
$ cat stdout
hello world

SLIDE 18

XCPU: How To Scale

Copy files to many nodes
Linear (or even parallel) distribution from the head node doesn’t scale
Solution: set up a few sessions from the head node and instruct the compute nodes to clone them further
Runs recursively, as many levels as necessary

[Diagram: head node sets up n1 and n2; their sessions clone on to n3-n10]

SLIDE 19

XCPU: Tree Spawn

1. Head node creates sessions on all nodes (open clone)
2. Head node sets up a few sessions (argv, env, executable and input files)
3. Head node instructs sessions to clone themselves to other sessions:

   echo clone n3,n4,n7,n8 > ctl

4. Head node starts execution

[Diagram: head node prepares n1 and n2; their sessions clone on to n3-n10]
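The payoff of the tree spawn is logarithmic rather than linear growth in distribution rounds. A rough model (the fanout and round structure are simplified compared to the real protocol):

```python
def linear_rounds(n):
    """Head node sets up every session itself, one per round."""
    return n

def tree_rounds(n, fanout=2):
    """Every node that already holds the session clones it to
    `fanout` new nodes per round, so coverage grows geometrically."""
    covered, rounds = 1, 0          # the head node starts alone
    while covered < n + 1:          # +1: the head node is not a compute node
        covered += covered * fanout
        rounds += 1
    return rounds

print(linear_rounds(1024), tree_rounds(1024))   # 1024 vs 7 rounds
```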

SLIDE 20

XCPU Implementation

9P2000 resource sharing protocol
The server (xcpufs) runs on every compute node
The synthetic file interface exported by xcpufs is mountable on Linux, or accessible on any OS using a user-space 9P client library
Total size, server, tools and libraries, is 20K lines of code
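For a sense of how simple the wire protocol is, here is a sketch of the first message a 9P2000 client sends (version negotiation). Per the 9P2000 spec, every message is size[4] type[1] tag[2] followed by type-specific fields, all little-endian, with strings prefixed by a 2-byte length:

```python
import struct

TVERSION, NOTAG = 100, 0xFFFF

def tversion(msize=8192, version=b"9P2000"):
    """Encode a 9P2000 Tversion message:
    size[4] type[1] tag[2] msize[4] version[s]."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize)
    body += struct.pack("<H", len(version)) + version
    return struct.pack("<I", 4 + len(body)) + body   # size includes itself

msg = tversion()
print(len(msg), msg.hex())   # 19-byte message
```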

SLIDE 21

XCPU Client Tools

Job execution:

  xrx n[1-128],n250 /bin/date
  xrx -s n[1-128] /bin/date
  xrx -n 2 n[1-128] /bin/date
  xrx -a /bin/date
  xrx -J foo

List processes -- xps
Kill a process -- xk
Library -- libxcpu
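Node lists like n[1-128],n250 can be expanded with a small parser. This is an illustrative reimplementation, not the actual xrx code:

```python
import re

def expand_nodes(spec):
    """Expand node specs like 'n[1-4],n250' into explicit names.
    Illustrative only -- not the real xrx parser."""
    nodes = []
    for part in spec.split(","):
        m = re.fullmatch(r"([a-z]+)\[(\d+)-(\d+)\]", part)
        if m:
            prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
            nodes += [f"{prefix}{i}" for i in range(lo, hi + 1)]
        else:
            nodes.append(part)        # plain name, no range
    return nodes

print(expand_nodes("n[1-4],n250"))    # ['n1', 'n2', 'n3', 'n4', 'n250']
```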

SLIDE 22

XCPU Security

Ownership and permissions on files define who can do what
The program runs as the user that mounted the file interface
Authentication:

  • similar to ssh authentication
  • no authentication server, public keys distributed in advance
  • the superuser (xcpu-admin) can run jobs as any user

XCPU users are distinct from Unix users

SLIDE 23

XCPU2: Next Level Of Sharing

The desktop exports its filesystem
All nodes for a job mount it and see the same files as the user’s desktop
If an application works on the user’s desktop, it will (most likely) work on the cluster
No library mismatches, no missing files, no wrong pathnames
Similar to Plan9’s cpu command

SLIDE 24

XCPU2 Cluster

New node type -- job control node
Responsible for controlling the nodes assigned to a job
Job nodes “see” the filesystem on the job control node
Jobs on the same node can use different distributions

[Diagram: a control node above several job control nodes, each managing its own group of compute nodes]

SLIDE 25

XCPU2 Example

XCPU file interface mounted on /mnt/xcpu

$ pwd
/home/lucho
$ ls
foo  bar
$ xrx remote pwd
/home/lucho
$ xrx remote ls
foo  bar
$

SLIDE 26

XCPU2 Namespace

Common case -- import the root filesystem from the job control node
The ns file allows custom namespaces
Example:

unshare
import $XCPUTSADDR /mnt/term
bind /dev /mnt/term/dev
bind /proc /mnt/term/proc
bind /sys /mnt/term/sys
chroot /mnt/term

Operations: unshare, mount, bind, import, cd, chroot, cache

SLIDE 27

XCPU2 Scalability

All nodes for a job are likely to use the same system files
Cooperative caching between the nodes in a job would achieve a high hit rate
Currently read-only, non-cooperative caching

[Diagram: head node serves /bin/cat and /etc/hosts; nodes n1-n6 each hold their own cached copy of /bin/cat]
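With non-cooperative per-node caching, every node fetches each file from the server exactly once, so the server sees one fetch per file per node; a toy model of that behavior:

```python
class Server:
    """File server that counts how many fetches reach it."""
    def __init__(self, files):
        self.files, self.fetches = files, 0

class Node:
    """Per-node read-only cache: first read of a path goes to the
    server, later reads hit the local cache."""
    def __init__(self, server):
        self.server, self.cache = server, {}
    def read(self, path):
        if path not in self.cache:
            self.server.fetches += 1     # miss -> fetch from the server
            self.cache[path] = self.server.files[path]
        return self.cache[path]

srv = Server({"/bin/cat": b"ELF...", "/etc/hosts": b"127.0.0.1 localhost"})
nodes = [Node(srv) for _ in range(6)]
for n in nodes:
    n.read("/bin/cat")   # every node misses once: 6 server fetches
    n.read("/bin/cat")   # second read is a local hit
print(srv.fetches)       # 6 -- cooperative caching could share one fetch
```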

SLIDE 28

Conclusions

XCPU2 transparently imports the user’s desktop environment to all cluster nodes
Makes it very easy to use different distributions and configurations
If more devices and services operated as normal files, the integration would be even better (Plan9’s cpu command)
Experiment with user- and kernel-level services that look like regular files
Don’t be afraid of private namespaces -- use them and ask your distribution for support!

SLIDE 29

Links

Plan9 -- http://plan9.bell-labs.com
Glendix -- http://www.glendix.org
9P libraries -- http://9p.cat-v.org/implementations
XCPU -- http://xcpu.org