CANFAR platform for data-intensive research
David Schade
Canadian Advanced Network for Astronomical Research (CANFAR)
Canadian Astronomy Data Centre
National Research Council Canada & University of Victoria
The Canadian organizations
Canadian Astronomy Data Centre
- Curation of Canada’s national astronomy data collections
- 29 years supporting Canadian university researchers
- National Research Council Canada / GoC
Compute Canada
- Canada’s national Advanced Research Computing organization
CANFAR
- National consortium of university astronomers
- Directs CANFAR development and operations
CADC-CANFAR
CADC is part of the National Research Council Canada (government)
- 29 years of experience in data management
CANFAR is the Canadian Advanced Network for Astronomical Research
- Consortium of university astronomers
Compute Canada is the national organization that provides Advanced Research Computing (HPC)
- Now moving toward support for data-intensive research
Canadian Astronomy Data Centre
We began as a Data Centre
- Data curation
- Long-term preservation
- Distribution
- Telescope collections:
- Multiple missions, facilities and wavelengths
- 12 telescopes
- 22 staff
- 6 Scientists
- 5 Operations staff
- 10 Developers
- Admin
Canadian Advanced Network for Astronomical Research
A cloud ecosystem for data-intensive astronomy
- User services
- Store and share data
- Create and configure VMs
- Run interactive VMs
- Run persistent VMs
- Batch processing with VMs
- Support for visualization & analytics
- Using research cloud resources
- Compute Canada
- Integrated authentication and authorization
Big Data
The era of the “siloed” data centre is dead.
The fundamental problem now is to develop a range of architectures that couple data to processing, networking, and services in ways that support researchers.
CANFAR serves a global research community
CANFAR/CADC 2014
- Size:
- 932M files
- 2.3 PiB
- Users
- Authenticated access: 762
- Anonymous access: 7,544
- Registered: 7,018
- Data moved in the last year
- TiB: 1,106
- Files: 91M
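The holdings and transfer figures above imply some useful derived numbers, such as the mean file size. The following is a minimal sketch of that arithmetic, assuming "M" means 10^6 files and that PiB and TiB are binary units as written; the variable and function names are illustrative only.

```python
# Figures as quoted on the slide (CANFAR/CADC 2014).
# Assumption: "M" = 1e6 files; PiB/TiB are binary units (2**50, 2**40 bytes).
PIB = 2**50
TIB = 2**40
MIB = 2**20

holdings_bytes = 2.3 * PIB   # total collection size
holdings_files = 932e6       # total number of files
moved_bytes = 1106 * TIB     # data moved in the last year
moved_files = 91e6           # files moved in the last year


def mean_file_size_mib(total_bytes, n_files):
    """Mean file size in MiB for a given volume and file count."""
    return total_bytes / n_files / MIB


mean_held_mib = mean_file_size_mib(holdings_bytes, holdings_files)
mean_moved_mib = mean_file_size_mib(moved_bytes, moved_files)

# Yearly transfer volume relative to the size of the holdings.
moved_fraction = moved_bytes / holdings_bytes

print(f"Mean size of a stored file: {mean_held_mib:.2f} MiB")
print(f"Mean size of a moved file:  {mean_moved_mib:.2f} MiB")
print(f"Volume moved / holdings:    {moved_fraction:.0%}")
```

Under these assumptions, the files that were moved are on average several times larger than the typical stored file, and the yearly transfer volume is a bit under half the size of the holdings.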
Leap in data transfers 2010
Context: Compute Canada
- Large national computing infrastructure
- Agencies pushing researchers to use it
- Limited success in data-intensive astronomy
- Users must adapt to local OS, software and policies
- Conflicting demands
- Limited mobility
CANFAR as a platform
CANFAR develops, integrates, and operates the services
- Distributed storage
- VOSpace: user-managed storage
- Batch cloud processing
- Interactive and persistent VMs
- Authentication and Authorization
CANFAR supports users of the services
Compute Canada provides hardware
There is a Compute Canada-CANFAR Operations Committee
COMPUTE CANADA: things are changing
Compute Canada (CC) has new funding and is committed to serving all Advanced Research Computing needs
New funding program for CC emphasizes data-intensive research
The future of CC lies in providing services rather than hardware
CADC is contracting with CC to develop services
- Project kick-off meeting November 5-6
CADC is pushing for generic research services
CANFAR: generic research platform
Why Federate International e-Infrastructures?
CANFAR: Observatory Partners / Primary Data Producers in astronomy
- Canada
- France
- United States
- United Kingdom
- Netherlands
- Argentina
- Brazil
- Chile
- Australia
- Korea
- China
- Taiwan
- Japan
- + ESA members
Where are the consumers of CANFAR data & services?
Science, Facilities, Data
- All Canadian astronomy is collaborative, global, reciprocal
- Many other sciences are the same
- All Canadian observing facilities are multi-national
- All Canadian science teams are multi-national
- Shared e-infrastructure needs to be multi-national
European Grid Initiative: CANFAR/INAF/EGI
studies will be launched at PM15.
Canadian Advanced Network for Astronomical Research (Lead: INFN) (M6 – M30)
The Canadian Advanced Network for Astronomical Research (CANFAR) is a computing infrastructure for astronomers in Canada. International collaboration in the Astronomy discipline will be supported both by the Canadian Astronomy Data Centre (CADC) and EGI. CANFAR and EGI will work together to integrate both e-Infrastructures towards a seamless and uniform platform for international astronomy research collaboration. Community services will be provided on top of the federated cloud of EGI using open source solutions and re-using the CANFAR experience.
Integration for gCube and the D4Science infrastructure (M1 - M12)
PROPOSAL – Technical Annex
Sections 1-3: Excellence, Impact & Implementation
Proposal full title: Engaging the Research Community towards an Open Science Commons
Proposal acronym: EGI-Engage
Call: EINFRA-1-2014
CADC CANFAR
Key Data Activities
- Data engineering
- Operations and user support
- Software development
- Software integration
- Data processing
- Data management
- User web services
- User web interfaces
CANFAR/CADC
STORAGE MANAGEMENT
2.6 PETABYTES
PROCESSING MANAGEMENT
954 COMPUTE CORES
META DATA MANAGEMENT
8.5 TERABYTES
Compute Canada Victoria Compute Canada Saskatoon NRC Herzberg Victoria Meta Data Compute Canada Victoria
Archive Data
Compute Canada Calgary NRC Herzberg Victoria
UNIVERSITY RESEARCHER CLIENT TELESCOPE CLIENT
User Data Meta Data Queries VM Images VM Control Interactive use of VMs
Processing Control
University researchers and telescope staff have privileges to upload data, create VMs and install science applications, run interactive VM sessions, submit batch processing jobs to VMs, share their VMs, control the life-cycle of their VMs, and offer software-as-a-service applications in their VMs.
Definition: VM – Virtual Machine
Archive Data Telescope Data Algorithms and Software Meta Data Queries Processing Control VM Images Interactive use of VMs
VM Service Creation and Deployment

               Data In                    Data Out
               # of files   Terabytes     # of files   Terabytes
Peak per day   2,169,190    8.0           648,093      16.8
Avg per day    130,952      0.4           99,253       2.6
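The daily transfer figures above can be compared directly to see how bursty the load is. The sketch below computes peak-to-average ratios and mean file sizes, assuming decimal terabytes (1 TB = 10^6 MB) and noting that the peak file count and peak volume need not fall on the same day; the dictionary layout and function names are illustrative only.

```python
# Daily transfer figures as quoted on the slide: (number of files, terabytes).
# Assumption: the peak file count and peak volume may come from different days.
transfers = {
    "in": {"peak": (2_169_190, 8.0), "avg": (130_952, 0.4)},
    "out": {"peak": (648_093, 16.8), "avg": (99_253, 2.6)},
}


def peak_to_avg(direction):
    """Return (file-count ratio, volume ratio) of the peak day to the average day."""
    peak_files, peak_tb = transfers[direction]["peak"]
    avg_files, avg_tb = transfers[direction]["avg"]
    return peak_files / avg_files, peak_tb / avg_tb


def mean_file_mb(direction, row="avg"):
    """Mean file size in MB, assuming decimal terabytes (1 TB = 1e6 MB)."""
    files, tb = transfers[direction][row]
    return tb * 1e6 / files


ratios = {d: peak_to_avg(d) for d in transfers}
for d, (files_ratio, tb_ratio) in ratios.items():
    print(
        f"Data {d}: peak/avg files = {files_ratio:.1f}x, "
        f"peak/avg volume = {tb_ratio:.1f}x, "
        f"mean file on an average day = {mean_file_mb(d):.1f} MB"
    )
```

On these numbers, inbound traffic is the burstier of the two directions, while outbound files are considerably larger on average.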
SERVICES