www.chameleoncloud.org

CHAMELEON: A LARGE SCALE, RECONFIGURABLE EXPERIMENTAL INSTRUMENT FOR COMPUTER SCIENCE
Kate Keahey, Computation Institute, University of Chicago / Argonne National Laboratory, keahey@anl.gov
April 23, 2018

INTRODUCING CHAMELEON
Deeply Reconfigurable
An instrument for Computer Science research: support for isolation, bare-metal reconfiguration, and custom kernel reboot
Large-scale Experimental Infrastructure
Total of ~650 nodes (~14,500 cores) and 5 PB of storage distributed over 2 sites
Large-scale homogeneous partition plus heterogeneous hardware: InfiniBand, FPGAs, GPUs, ARMs, Atoms, etc.; support for large scale in both capabilities and policies
Developed Primarily on Top of a Commodity Open Source System
Leverages community investment in the project and contributes to its development: revival of Blazar, contributions to Ironic, Nova, and others
Interacts with the community via the scientific working group
HARDWARE (Chicago and Austin sites)
- Standard Cloud Units (SCUs), each with 42 compute and 4 storage nodes: x10 at one site, x2 at the other; SCUs connect to the core and are fully connected to each other
- Heterogeneous Cloud Units with alternate processors and networks
- Core services: 3.6 PB central file systems, front-end and data mover nodes
- Totals: 504 x86 compute servers, 48 distributed storage servers, 102 heterogeneous servers, 16 management and storage nodes
- 100 Gbps uplink to the public network at each site; connectivity to UTSA, GENI, and future partners
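As a sanity check on the counts above, the twelve Standard Cloud Units (ten at one site, two at the other) account exactly for the x86 compute and distributed storage server totals:

```python
# Standard Cloud Unit (SCU) composition, as given on the slide.
COMPUTE_PER_SCU = 42
STORAGE_PER_SCU = 4

scus = 10 + 2  # x10 at one site, x2 at the other

compute_total = scus * COMPUTE_PER_SCU  # x86 compute servers
storage_total = scus * STORAGE_PER_SCU  # distributed storage servers

print(compute_total, storage_total)  # 504 48
```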
SOFTWARE
A Chameleon instance builds on OpenStack services (Nova, Neutron, Swift, Glance, Heat, Keystone, Horizon, Ironic, Blazar, Ceilometer) plus Chameleon-specific components: Appliance Catalog, Allocation Management (TAS, from TACC), Resource Discovery (from Grid'5000), the web portal, and Request Tracker.
Together these provide monitoring services, configuration services, resource management services, discovery services, the appliance catalog, and user services, along with supporting utilities, agents, and clients.
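Since bare-metal reservations go through Blazar, an experiment starts with a lease request. Below is a minimal sketch of the JSON body such a request might carry, assuming Blazar's physical:host reservation type; the exact field names are an assumption here and should be checked against the Blazar API documentation:

```python
import json
from datetime import datetime, timedelta

# Hypothetical lease body for Blazar's physical:host reservation type.
# Field names are assumptions -- verify against the Blazar API reference.
start = datetime(2018, 4, 23, 12, 0)
lease = {
    "name": "scale-experiment",  # hypothetical lease name
    "start_date": start.strftime("%Y-%m-%d %H:%M"),
    "end_date": (start + timedelta(days=1)).strftime("%Y-%m-%d %H:%M"),
    "reservations": [{
        "resource_type": "physical:host",
        "min": 2,  # reserve at least 2 bare-metal nodes
        "max": 4,  # and at most 4
        "hypervisor_properties": "",
        "resource_properties": "",
    }],
    "events": [],
}
print(json.dumps(lease, indent=2))
```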
Building on top of a commodity open source project
Significant advantages in terms of direct and indirect community investment, as well as for long-term maintenance
We need more than a testbed to support CS research
Traces and workloads, research data, and tools for repeatability
New concept: myths and misperceptions
Not true: "only available to users with NSF allocation"
Not true: "they use OpenStack so it is VMs"
Not true: "can't get an experiment with 100s of nodes"
Managing incentives
Balancing individual versus community needs under resource scarcity: allocations and lease limits
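The allocation-and-lease-limit balance can be pictured as a simple admission check on incoming lease requests. The numbers below (a 7-day lease cap and a per-user node cap) are purely illustrative, not Chameleon's actual policy:

```python
from dataclasses import dataclass

# Illustrative policy knobs -- NOT Chameleon's real limits.
MAX_LEASE_DAYS = 7
MAX_NODES_PER_USER = 128

@dataclass
class LeaseRequest:
    user: str
    nodes: int
    days: int

def admit(req: LeaseRequest, nodes_free: int) -> bool:
    """Reject leases that would starve the community of scarce nodes."""
    if req.days > MAX_LEASE_DAYS:
        return False  # cap individual holding time to keep turnover high
    if req.nodes > MAX_NODES_PER_USER:
        return False  # no single user can drain the testbed
    return req.nodes <= nodes_free

print(admit(LeaseRequest("alice", 100, 3), nodes_free=200))  # True
print(admit(LeaseRequest("bob", 100, 30), nodes_free=200))   # False: lease too long
```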