CANFAR platform for data-intensive research David Schade Canadian - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CANFAR platform for data-intensive research

David Schade

Canadian Advanced Network for Astronomy Research (CANFAR) Canadian Astronomy Data Centre National Research Council Canada & University of Victoria

slide-2
SLIDE 2

The Canadian organizations

Canadian Astronomy Data Centre

  • Curation of Canada’s national astronomy data collections
  • 29 years supporting Canadian university researchers
  • National Research Council Canada / GoC

Compute Canada

  • Canada’s national Advanced Research Computing organization

CANFAR

  • National consortium of university astronomers
  • Directs CANFAR development and operations
slide-3
SLIDE 3

CADC-CANFAR

CADC is part of the National Research Council Canada (government)

  • 29 years of experience in data management

CANFAR is the Canadian Advanced Network for Astronomy Research

  • Consortium of university astronomers

Compute Canada is the national organization that provides Advanced Research Computing HPC

  • Now moving toward support for data-intensive research
slide-4
SLIDE 4

Canadian Astronomy Data Centre

We began as a Data Centre

  • Data curation
  • Long-term preservation
  • Distribution
  • Telescope collections:
  • Multiple missions, facilities and wavelengths

  • 12 telescopes
  • 22 staff
  • 6 Scientists
  • 5 Operations staff
  • 10 Developers
  • Admin
slide-5
SLIDE 5

Canadian Advanced Network for Astronomical Research

A cloud ecosystem for data-intensive astronomy

  • User services
  • Store and share data
  • Create and configure VMs
  • Run interactive VMs
  • Run persistent VMs
  • Batch processing with VMs
  • Support for visualization & analytics
  • Using research cloud resources
  • Compute Canada
  • Integrated authentication and authorization
slide-6
SLIDE 6

Big Data

The era of the “siloed” data centre is dead.

The fundamental problem now is to develop a range of architectures that couple data to processing, networking, and services in ways that support researchers.

slide-7
SLIDE 7

CANFAR serves a global research community

slide-8
SLIDE 8

CANFAR/CADC 2014

  • Size:
  • 932M files
  • 2.3 PiB
  • Users
  • Authenticated access: 762
  • Anonymous access: 7,544
  • Registered: 7,018
  • Data moved in the last year
  • TiB: 1,106
  • Files: 91M
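
As a rough reading of the figures above, the mean file size can be worked out for the whole collection and for the data moved in the last year. This is a minimal sketch: it assumes the slide's PiB and TiB are binary units and that "M" means millions; the function name `mean_size_mib` is illustrative only.

```python
# Figures copied from the slide; PiB/TiB are assumed to be binary prefixes.
PIB = 2**50
TIB = 2**40
MIB = 2**20


def mean_size_mib(total_bytes: float, n_files: float) -> float:
    """Mean file size in MiB for n_files files totalling total_bytes."""
    return total_bytes / n_files / MIB


# Whole collection: 2.3 PiB in 932M files.
archive_mean = mean_size_mib(2.3 * PIB, 932e6)
# Data moved in the last year: 1,106 TiB in 91M files.
transfer_mean = mean_size_mib(1106 * TIB, 91e6)

print(f"mean archived file:    {archive_mean:.1f} MiB")
print(f"mean transferred file: {transfer_mean:.1f} MiB")
```

On these numbers the files moved in a year are, on average, several times larger than the typical file in the collection.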
slide-9
SLIDE 9

Leap in data transfers 2010

slide-10
SLIDE 10

Context: Compute Canada

  • Large national computing infrastructure
  • Agencies pushing researchers to use it
  • Limited success in data-intensive astronomy
  • Users must adapt to local OS, software and policies
  • Conflicting demands
  • Limited mobility

slide-11
SLIDE 11

CANFAR as a platform

CANFAR develops, integrates, and operates the services

  • Distributed storage
  • VOSpace: user-managed storage
  • Batch cloud processing
  • Interactive and persistent VMs
  • Authentication and Authorization

CANFAR supports users of the services

Compute Canada provides hardware

There is a Compute Canada-CANFAR Operations Committee

slide-12
SLIDE 12

COMPUTE CANADA: things are changing

Compute Canada (CC) has new funding and is committed to serving all Advanced Research Computing needs

New funding program for CC emphasizes data-intensive research

The future of CC lies in providing services rather than hardware

CADC is contracting with CC to develop services

  • Project kick-off meeting November 5-6

CADC is pushing for generic research services

slide-13
SLIDE 13

CANFAR: generic research platform

slide-14
SLIDE 14

Why Federate International e-Infrastructures?

slide-15
SLIDE 15

CANFAR: Observatory Partners / Primary Data Producers in astronomy

  • Canada
  • France
  • United States
  • United Kingdom
  • Netherlands
  • Argentina
  • Brazil
  • Chile
  • Australia
  • Korea
  • China
  • Taiwan
  • Japan
  • + ESA members

slide-16
SLIDE 16

Where are the consumers of CANFAR data & services?

slide-17
SLIDE 17

Science, Facilities, Data

  • All Canadian astronomy is collaborative, global, reciprocal

  • Many other sciences are the same
  • All Canadian observing facilities are multi-national
  • All Canadian science teams are multi-national
  • Shared e-infrastructure needs to be multi-national
slide-18
SLIDE 18

European Grid Initiative: CANFAR/INAF/EGI

studies will be launched at PM15.

Canadian Advanced Network for Astronomical Research (Lead: INFN) (M6 – M30)

The Canadian Advanced Network for Astronomical Research (CANFAR) is a computing infrastructure for astronomers in Canada. International collaboration in the Astronomy discipline will be supported both by the Canadian Astronomy Data Centre (CADC) and EGI. CANFAR and EGI will work together to integrate both e-Infrastructures towards a seamless and uniform platform for international astronomy research collaboration. Community services will be provided on top of the federated cloud of EGI using open source solutions and re-using the CANFAR experience.

Integration for gCube and the D4Science infrastructure (M1 - M12)

PROPOSAL – Technical Annex

Sections 1-3: Excellence, Impact & Implementation

Proposal full title: Engaging the Research Community towards an Open Science Commons
Proposal acronym: EGI-Engage
Call: EINFRA-1-2014

slide-19
SLIDE 19

CADC CANFAR

slide-20
SLIDE 20

Key Data Activities

  • Data engineering
  • Operations and user support
  • Software development
  • Software integration
  • Data processing
  • Data management
  • User web services
  • User web interfaces

CANFAR/CADC

  • Storage management: 2.6 petabytes
  • Processing management: 954 compute cores
  • Metadata management: 8.5 terabytes

[Diagram labels: Compute Canada Victoria, Compute Canada Saskatoon, Compute Canada Calgary, NRC Herzberg Victoria; Meta Data; Archive Data]

UNIVERSITY RESEARCHER CLIENT / TELESCOPE CLIENT

[Diagram labels: User Data, Meta Data Queries, VM Images, VM Control, Interactive use of VMs, Processing Control]

University researchers and telescope staff have privileges to upload data, create VMs and install science applications, run interactive VM sessions, submit batch processing jobs to VMs, share their VMs, control the life-cycle of their VMs, and offer software-as-a-service applications in their VMs.

Definition: VM – Virtual Machine

[Diagram labels: Archive Data, Telescope Data, Algorithms and Software, Meta Data Queries, Processing Control, VM Images, Interactive use of VMs]

VM Service Creation and Deployment

                 Data In                   Data Out
                 # of files   Terabytes    # of files   Terabytes
Peak per day     2,169,190    8.0          648,093      16.8
Avg per day      130,952      0.4          99,253       2.6
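
The per-day transfer figures above can be compared as peak-to-average ratios, which give a feel for how bursty the load is in each direction. This is a minimal sketch using only the numbers from the slide; the `DailyTransfer` class and `peak_to_average` function are illustrative names, not part of any CANFAR software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyTransfer:
    """One cell pair from the slide: file count and volume for one day."""
    files: int
    terabytes: float


# Values copied from the slide, keyed by transfer direction.
peak = {"in": DailyTransfer(2_169_190, 8.0), "out": DailyTransfer(648_093, 16.8)}
avg = {"in": DailyTransfer(130_952, 0.4), "out": DailyTransfer(99_253, 2.6)}


def peak_to_average(direction: str) -> tuple[float, float]:
    """Return (file-count ratio, volume ratio) of peak day to average day."""
    p, a = peak[direction], avg[direction]
    return p.files / a.files, p.terabytes / a.terabytes


for direction in ("in", "out"):
    files_ratio, tb_ratio = peak_to_average(direction)
    print(f"data {direction}: peak/avg files = {files_ratio:.1f}, "
          f"peak/avg terabytes = {tb_ratio:.1f}")
```

On these figures the peak day for incoming data is much further above the average than the peak day for outgoing data, both in file count and in volume.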
