Cluster Computing with OpenHPC


  1. http://openhpc.community
  Cluster Computing with OpenHPC
  Karl W. Schulz, Ph.D., Technical Project Lead, OpenHPC Community; Scalable Datacenter Solutions Group, Intel
  HPCSYSPROS16 Workshop, SC'16, November 14, Salt Lake City, Utah

  2. Acknowledgements
  • Co-Authors: Reese Baird, David Brayford, Yiannis Georgiou, Gregory Kurtzer, Derek Simmel, Thomas Sterling, Nirmala Sundararajan, Eric Van Hensbergen
  • OpenHPC Technical Steering Committee (TSC)
  • Linux Foundation and all of the project members
  • Intel, Cavium, and Dell for hardware donations to support community testing efforts
  • Texas Advanced Computing Center for hosting support

  3. Outline
  • Community project overview: mission/vision, members, governance
  • Stack overview
  • Infrastructure: build/test
  • Summary

  4. OpenHPC: Mission and Vision
  • Mission: to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools.
  • Vision: OpenHPC components and best practices will enable and accelerate innovation and discoveries by broadening access to state-of-the-art, open-source HPC methods and tools in a consistent environment, supported by a collaborative, worldwide community of HPC users, developers, researchers, administrators, and vendors.

  5. OpenHPC: Project Members
  • Mixture of academics, labs, OEMs, and ISVs/OSVs (member logos shown on slide, e.g. Argonne National Laboratory, CEA)
  • Project member participation interest? Please contact Jeff ErnstFriedman, jernstfriedman@linuxfoundation.org

  6. OpenHPC Technical Steering Committee (TSC) Role Overview
  • TSC roles: Project Leader, Component Maintainers, Integration Testing Coordinator(s), Upstream Component Development Representative(s), End-User/Site Representative(s)
  • https://github.com/openhpc/ohpc/wiki/Governance-Overview

  7. Stack Overview
  • Packaging efforts have HPC in mind and include compatible modules (for use with Lmod) with development libraries/tools
  • Endeavoring to provide a hierarchical development environment that is cognizant of different compiler and MPI families
  • Intent is to manage package dependencies so they can be used as building blocks (e.g. deployable with multiple provisioning systems)
  • Include common conventions for environment variables
  • Development library install example:
      # yum install petsc-gnu-mvapich2-ohpc
  • End-user interaction example with the above install (assume we are a user wanting to build a PETSc hello world in C):
      $ module load petsc
      $ mpicc -I$PETSC_INC petsc_hello.c -L$PETSC_LIB -lpetsc
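  For context, a minimal sketch of a complete end-user session around the example above is shown below; module names follow the ohpc conventions on this slide, and the launch step (prun vs. mpirun/srun) depends on the site's resource manager setup:

      # Sketch of a full end-user build-and-run session (bash); the launcher
      # choice depends on the local resource manager configuration.
      $ module load gnu mvapich2            # select compiler and MPI family
      $ module load petsc                   # defines $PETSC_INC and $PETSC_LIB
      $ mpicc -I$PETSC_INC petsc_hello.c -L$PETSC_LIB -lpetsc -o petsc_hello
      $ prun ./petsc_hello                  # or mpirun/srun, as appropriate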

  8. Typical Cluster Architecture
  • Install guides walk through a bare-metal install on a typical architecture: a master/system management server (SMS) connected to compute nodes over an internal Ethernet provisioning network and a BMC management network, with an optional high-speed network and an optional external Lustre* storage system (Figure 1: Overview of physical cluster architecture)
  • Leverages an image-based compute provisioner (Warewulf) with PXE boot (stateless); optionally connect an external Lustre file system
  • Obviously need hardware-specific information to support (remote) bare-metal provisioning:
      ${sms_name}             # Hostname for SMS server
      ${sms_ip}               # Internal IP address on SMS server
      ${sms_eth_internal}     # Internal Ethernet interface on SMS
      ${eth_provision}        # Provisioning interface for computes
      ${internal_netmask}     # Subnet netmask for internal network
      ${ntp_server}           # Local ntp server for time synchronization
      ${bmc_username}         # BMC username for use by IPMI
      ${bmc_password}         # BMC password for use by IPMI
      ${c_ip[0]}, ${c_ip[1]}, ...     # Desired compute node addresses
      ${c_bmc[0]}, ${c_bmc[1]}, ...   # BMC addresses for computes
      ${c_mac[0]}, ${c_mac[1]}, ...   # MAC addresses for computes
      ${compute_regex}        # Regex for matching compute node names (e.g. c*)
  • Optional:
      ${mgs_fs_name}          # Lustre MGS mount name
      ${sms_ipoib}            # IPoIB address for SMS server
      ${ipoib_netmask}        # Subnet netmask for internal IPoIB
      ${c_ipoib[0]}, ${c_ipoib[1]}, ...  # IPoIB addresses for computes
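  In practice these inputs are typically collected in a small shell fragment that the recipe commands can source; a hypothetical sketch follows, where the variable names mirror the list above and every value is a placeholder invented for illustration:

      # Hypothetical bash input file with site-specific values (placeholders
      # only; adapt to local hardware, addressing, and credentials).
      sms_name=sms001
      sms_ip=172.16.0.1
      sms_eth_internal=eth1
      eth_provision=eth0
      internal_netmask=255.255.0.0
      ntp_server=ntp1.local
      bmc_username=admin
      bmc_password=changeme
      c_ip[0]=172.16.1.1;  c_ip[1]=172.16.1.2
      c_bmc[0]=172.16.2.1; c_bmc[1]=172.16.2.2
      c_mac[0]=00:1a:2b:3c:4d:01; c_mac[1]=00:1a:2b:3c:4d:02
      compute_regex="c*"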

  9. OpenHPC v1.2: Current S/W Components
  • Base OS: CentOS 7.2, SLES12 SP1
  • Architecture: x86_64, aarch64 (Tech Preview, new with v1.2)
  • Administrative Tools: Conman, Ganglia, Lmod, LosF, Nagios, pdsh, prun, EasyBuild, ClusterShell, mrsh, Genders, Shine, Spack
  • Provisioning: Warewulf
  • Resource Mgmt.: SLURM, Munge, PBS Professional
  • Runtimes: OpenMP, OCR
  • I/O Services: Lustre client (community version)
  • Numerical/Scientific Libraries: Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, SuperLU_Dist, Mumps, OpenBLAS, ScaLAPACK
  • I/O Libraries: HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios
  • Compiler Families: GNU (gcc, g++, gfortran)
  • MPI Families: MVAPICH2, OpenMPI, MPICH
  • Development Tools: Autotools (autoconf, automake, libtool), Valgrind, R, SciPy/NumPy
  • Performance Tools: PAPI, IMB, mpiP, pdtoolkit, TAU, Scalasca, Score-P, SIONLib
  Notes:
  • Additional dependencies that are not provided by the base OS or community repos (e.g. EPEL) are also included
  • 3rd-party libraries are built for each compiler/MPI family (typically 8 combinations)
  • Resulting repositories currently comprise ~300 RPMs

  10. Hierarchical Overlay for OpenHPC Software
  • The distro repo (e.g. CentOS 7) provides the base OS; the OHPC repo layers general tools and system services on top (lmod, slurm, munge, losf, warewulf, lustre client, prun, pdsh)
  • The development environment is hierarchical: compiler families (gcc, Intel Composer), then serial apps/libs built per compiler (e.g. hdf5-gnu, hdf5-intel), then MPI toolchains (MVAPICH2, IMPI, OpenMPI), then standalone 3rd-party parallel apps/libs built per compiler/MPI pair (e.g. boost-gnu-openmpi, boost-gnu-impi, boost-gnu-mvapich2, phdf5-gnu-openmpi, boost-intel-openmpi, phdf5-intel-mvapich2, ...)
  • A single input drives all permutations (see the sketch below)
  • Packaging conventions are highlighted further in the paper
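  A brief sketch of how these permutations surface in practice (package and module names follow the name-compiler-MPI convention above; exact names can vary by release):

      # Administrator installs specific builds from the OHPC repo.
      # yum install boost-gnu-openmpi-ohpc boost-gnu-mvapich2-ohpc

      # End user selects one toolchain permutation; swapping the MPI family
      # causes Lmod to re-resolve dependent modules against the new hierarchy.
      $ module load gnu openmpi boost
      $ module swap openmpi mvapich2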

  11. Infrastructure

  12. Community Build System: OBS (https://build.openhpc.community)
  • Using the Open Build Service (OBS) to manage the build process
  • OBS can drive builds for multiple repositories
  • Repeatable builds carried out in a chroot environment
  • Generates binary and source RPMs
  • Publishes corresponding package repositories
  • Client/server architecture supports distributed build slaves and multiple architectures
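  For readers unfamiliar with OBS, builds are usually driven through the standard osc client; the commands below are a rough sketch only, and the API URL, project, and package names are assumptions rather than values taken from the slide:

      # Rough sketch of an OBS workflow with the osc client (API URL and
      # project/package names are illustrative assumptions).
      $ osc -A https://build.openhpc.community checkout OpenHPC:1.2 petsc-gnu-mvapich2
      $ cd OpenHPC:1.2/petsc-gnu-mvapich2
      $ osc build CentOS_7.2 x86_64      # repeatable local build in a chroot
      $ osc commit -m "update to new upstream release"   # triggers server-side rebuild/publish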

  13. Integration/Test/Validation
  • Testing is a key element for us; the intent is to build upon existing validation efforts and augment component-level validation with targeted cluster-validation and scaling initiatives, including: install recipes, the development environment, cross-package interaction, and mimicking use cases common in HPC deployments
  • [Diagram: integrated cluster testing layered on top of individual component validation, spanning hardware, OS distribution, provisioner, resource manager, user environment, compilers, MPI, dev and performance tools, serial and parallel libraries, I/O libraries, and mini apps]

  14. Post Install Integration Tests: Overview
  • Global testing harness includes a number of embedded subcomponents; major components have configuration options to enable/disable them
  • End-user tests need to touch all of the supported compiler and MPI families; we abstract this to repeat the tests with different compiler/MPI environments: gcc and Intel compiler toolchains; MPICH, OpenMPI, MVAPICH2, and Intel MPI families
  • Example ./configure output (non-root):
      Package version............... : test-suite-1.0.0
      Build user.................... : jilluser
      Build host.................... : master4-centos71.localdomain
      Configure date................ : 2015-10-26 09:23
      Build architecture............ : x86_64-unknown-linux-gnu
      Test suite configuration...... : long
  • Submodule configuration (all enabled in this example): user environment (RMS test harness, Munge, Apps, Compilers, MPI, HSN, Modules, OOM), libraries (Adios, Boost, Boost MPI, FFTW, GSL, HDF5, HYPRE, IMB, Metis, MUMPS, NetCDF, NumPy, OpenBLAS, PETSc, pHDF5, ScaLAPACK, SciPy, SuperLU, SuperLU_Dist, Trilinos), dev tools (Valgrind, R base package, TBB, CILK), performance tools (mpiP profiler, PAPI, TAU), and apps (MiniFE, MiniDFT, HPCG)
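  The summary above comes from an autotools configure step, so exercising the suite for one toolchain permutation might look roughly like the following; the options and make target are assumptions here, only the configure summary itself is from the slide:

      # Rough sketch of running the test suite for one compiler/MPI permutation
      # (make target and option choices are assumptions, not from the slide).
      $ module load gnu mvapich2
      $ ./configure                 # non-root, user-level configuration as above
      $ make check                  # run the enabled component tests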
