SLIDE 1

simpleArchive as a Service

Marius Politze RWTH Aachen University IT Center

SLIDE 2

simpleArchive as a service Marius Politze Kolloquium der Betreiber von Forschungsdateninfrastrukturen in NRW, 15.12.2017 2

Content

  • Challenge: How to get researchers to archive their data?
  • Our solution: make it simple
      ◦ simpleArchive concept
      ◦ Demo

  • Scaling simpleArchive as a service
  • Conclusion and future challenges
SLIDE 3

Publications, Data, Metadata – A Research Data Infrastructure

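[Figure: diagram of the research data infrastructure. Labels: RWTH Publications (record, full text, reference); Archive; (text) publications; (research) data; metadata of research data; Metadata Store; PID; a "publish?" decision (yes/no) on two paths; "link"; a visibility scale from plus to minus.]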

SLIDE 4

Archiving Until Now

  • https://doc.itc.rwth-aachen.de/display/ARC/Archiv+Knoten+anlegen
  • https://doc.itc.rwth-aachen.de/display/ARC/TSM+Installation
  • https://doc.itc.rwth-aachen.de/display/ARC/TSM+Konfiguration+-+Archiv
  • https://doc.itc.rwth-aachen.de/display/ARC/Benutzung+des+TSM-Clients
SLIDE 5

Requirements

  • Allow researchers to archive “small” files
      ◦ Up to 2 GB
      ◦ Make it a free service so researchers will use it
      ◦ Reduce costs by storing on tape
  • Reuse existing concepts and applications
      ◦ Allow use in federated context
      ◦ Reduce development and maintenance costs by using available systems
  • Make sharing of archived data as easy as archiving
      ◦ Archived data is not necessarily open access
      ◦ Let researchers restore their data
      ◦ … and let them share it using a simple URL
  • Make archived data globally identifiable using PIDs
      ◦ So researchers can reference it elsewhere
      ◦ … and can retrieve it using the PID
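
Two of these requirements are concrete enough to sketch in code: the size limit for "small" files and retrieval through a PID. The Python below is only an illustration. It assumes the PIDs are Handle-style identifiers of the form prefix/suffix that can be resolved through the public Handle proxy at hdl.handle.net; the function names and the binary reading of "2 GB" are assumptions, not part of the slides.

```python
from urllib.parse import quote

# "Up to 2 GB": the slides do not say whether this means 2 * 1000**3 or
# 2 * 1024**3 bytes; the binary interpretation used here is an assumption.
MAX_UPLOAD_BYTES = 2 * 1024**3


def accepts_upload(size_bytes: int) -> bool:
    """Return True if a non-empty file of this size is within the limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def resolver_url(pid: str, resolver: str = "https://hdl.handle.net") -> str:
    """Turn a Handle-style PID ("prefix/suffix") into a shareable URL."""
    prefix, sep, suffix = pid.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"expected 'prefix/suffix', got {pid!r}")
    # Percent-encode anything that is not URL-safe; keep "/" inside the suffix.
    return f"{resolver}/{quote(prefix)}/{quote(suffix, safe='/')}"
```

A researcher-facing client could call `accepts_upload` before starting a transfer and hand out `resolver_url(pid)` as the reference to cite elsewhere.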

SLIDE 6

Archiving with simpleArchive

SLIDE 7

Archive and Restore Process (simplified)

Participants: User, Temp. File System, ePIC, Tape Archive, Timestamp

Archive:

  • upload file
  • create PID
  • save file
  • schedule archival
  • archive file
  • notify user

Restore:

  • request file
  • retrieve file
  • schedule restore
  • create temporary download
  • notify user

Timestamp:

  • sign file hash
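
The archive and restore steps on this slide can be sketched as a small in-memory model. This is not the simpleArchive implementation: the class and method names, the state names, the PID prefix, and the exact ordering of the steps (for example, that the hash is recorded at upload time) are assumptions made for illustration. Dictionaries stand in for the temporary file system and the tape archive, a UUID stands in for a PID issued by ePIC, and a UTC timestamp stored next to a SHA-256 digest stands in for a signed timestamp.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    owner: str
    sha256: str        # file hash recorded at upload
    stamped_at: str    # stand-in for a signed timestamp over the hash
    state: str = "archive-scheduled"


@dataclass
class ArchiveSketch:
    pid_prefix: str = "example-prefix"              # hypothetical, not a real prefix
    temp_files: dict = field(default_factory=dict)  # stands in for the temp. file system
    tape: dict = field(default_factory=dict)        # stands in for the tape archive
    records: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def upload(self, owner: str, data: bytes) -> str:
        """upload file, create PID, save file, record hash, schedule archival"""
        pid = f"{self.pid_prefix}/{uuid.uuid4()}"
        self.temp_files[pid] = data
        self.records[pid] = Record(
            owner=owner,
            sha256=hashlib.sha256(data).hexdigest(),
            stamped_at=datetime.now(timezone.utc).isoformat(),
        )
        return pid

    def run_scheduled_archivals(self) -> None:
        """archive file, notify user"""
        for pid, rec in self.records.items():
            if rec.state == "archive-scheduled":
                self.tape[pid] = self.temp_files.pop(pid)
                rec.state = "archived"
                self.notifications.append((rec.owner, f"{pid} archived"))

    def request_restore(self, pid: str) -> None:
        """request file, schedule restore"""
        if self.records[pid].state != "archived":
            raise ValueError(f"{pid} is not archived")
        self.records[pid].state = "restore-scheduled"

    def run_scheduled_restores(self) -> None:
        """retrieve file, create temporary download, notify user"""
        for pid, rec in self.records.items():
            if rec.state == "restore-scheduled":
                data = self.tape[pid]
                if hashlib.sha256(data).hexdigest() != rec.sha256:
                    raise RuntimeError(f"hash mismatch for {pid}")
                self.temp_files[pid] = data  # temporary download copy
                rec.state = "archived"       # the tape copy is kept
                self.notifications.append((rec.owner, f"{pid} ready for download"))
```

Calling `upload`, `run_scheduled_archivals`, `request_restore` and `run_scheduled_restores` in that order moves a file from the temporary store to the tape stand-in and back, with one notification per completed step. Separating the request from the scheduled run mirrors the asynchronous "schedule" steps in the diagram.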

SLIDE 8

simpleArchive is an implementation of a process, not an application!

SLIDE 9

Concept: Software Layers

[Figure: software layer diagram. Layers: Infrastructure; Base applications and services; Common Processes; Common User Interfaces. Component labels: Zenodo/Invenio, Sciebo with FDM (research data management) extensions, metadata tool, PID, data management plans, Virtualized Compute, Object Store, ISP, Rosetta. Domain labels: private domain (researcher); group domain (working group, collaborative work); persistent domain (archive); access and reuse (data portals, data publications). Further labels: IdM / Roles / Rights / DFN-AAI; PID.]

SLIDE 10

Load balancer and internet connection to DFN Network

  • DNS load balancing
  • Redundant sites in Aachen (SW23 and WW10)
  • Redundant connection to DFN Network

User Interface: app.rwth-aachen.de

  • Shared hosts with process layer
  • Accesses process layer via load balancers

Processes: moped.ecampus.rwth-aachen.de

  • 4 VMs at redundant sites in Aachen (SW23 and WW10)
  • Each site retains capacity to keep services available in case of site failure
  • Homogeneous access to base applications and services
  • Automated deployment
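
The statement that each site retains enough capacity to cover a site failure can be checked with simple arithmetic. The sketch below assumes the four VMs are split evenly across SW23 and WW10 and that capacity can be expressed as a single number per VM; the split, the function name and the units are illustrative assumptions, not figures from the slides.

```python
def survives_site_failure(vms_per_site: dict[str, int],
                          capacity_per_vm: float,
                          peak_load: float) -> bool:
    """True if the remaining sites can carry peak_load after losing any one site."""
    total_vms = sum(vms_per_site.values())
    return all(
        (total_vms - lost) * capacity_per_vm >= peak_load
        for lost in vms_per_site.values()
    )


# Assumed split: two VMs at each site (four in total).
SITES = {"SW23": 2, "WW10": 2}
```

With this split, `survives_site_failure(SITES, capacity_per_vm, peak_load)` holds only while the peak load stays at or below the capacity of two VMs, which is the practical meaning of "each site retains capacity".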

Base applications and services

  • Based on specific OLAs with the service providers
  • Partially redundant, cold standby or disaster recovery
  • Failures in these systems impact only dependent processes

Infrastructure Since 2016

[Figure: infrastructure diagram. Labels: sites SW23 and WW10; four REST Application Proxy instances; two Loadbalancers; two UI Servers; Internet; ePIC, GigaMove, ISP.]

SLIDE 11

Scaling Out: Vision 2018
Providing FDM (research data management) processes and infrastructure as a service

  • Pro
      ◦ Simple for customers and providers
      ◦ A single instance reduces maintenance costs
      ◦ Reuses already available federated infrastructures like DFN-AAI
  • Con
      ◦ Failure in the instance impacts all customers
      ◦ Does not scale for data- or compute-intensive services
      ◦ Researchers and service providers often want to keep services local

[Figure: architecture diagram. Labels: IT Center RWTH Aachen; UI Server; Loadbalancer; REST Application Proxy; ePIC; GigaMove; ISP; Access; Internet; RWTH Aachen, FZ Jülich, Partner A; DFN-AAI / eduGAIN.]

SLIDE 12

Scaling Out: Vision 2018
Scaling by adding new sites

  • Pro
      ◦ Mirroring infrastructure components increases redundancy
      ◦ Local services remain for local users
      ◦ Services can be used cross-site
  • Con
      ◦ Maintaining multiple infrastructures becomes expensive
      ◦ Sites may degenerate into supporting only local services instead of core scientific processes

[Figure: architecture diagram. Labels: IT Center RWTH Aachen and Partner A, each with Access, UI Server, Loadbalancer, REST Application Proxy and TSM; ePIC; GigaMove; Internet.]

SLIDE 13

Scaling Out: Vision 2018
Scaling by adding base applications and services from other sites

  • Pro
      ◦ Compute and data capacity provided locally
      ◦ Easy cross-site reuse of services
      ◦ Uses available federated infrastructures
      ◦ Standardized processes allow interoperability
  • Con
      ◦ Failure in process layer impacts all users
      ◦ OLAs required to control users and processes

[Figure: architecture diagram. Labels: IT Center RWTH Aachen; UI Server; Loadbalancer; REST Application Proxy; ePIC; GigaMove; ISP; Access; Internet; RWTH Aachen, Partner A, "You?"; DFN-AAI / eduGAIN; a further ISP and a further ePIC label next to "You?" and Partner A.]
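
One way to picture this third option is a shared process layer that selects site-local base services for each user. The sketch below is purely illustrative: the registry, the service labels, the scopes and the idea of deriving the site from a scoped user identifier (in the style of a federation attribute such as eduPersonPrincipalName) are assumptions, not details given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseServices:
    pid_service: str   # label of the PID service used for this site
    archive: str       # label of the archive backend used for this site


# Hypothetical registry: which base services serve which home organisation.
REGISTRY = {
    "rwth-aachen.de": BaseServices("ePIC at RWTH Aachen", "ISP at RWTH Aachen"),
    "partner-a.example": BaseServices("ePIC at Partner A", "ISP at RWTH Aachen"),
}
DEFAULT_SCOPE = "rwth-aachen.de"


def services_for(user_id: str) -> BaseServices:
    """Pick base services from the scope part of a 'user@scope' identifier."""
    scope = user_id.rpartition("@")[2].lower()
    # Unknown scopes fall back to the default site instead of failing.
    return REGISTRY.get(scope, REGISTRY[DEFAULT_SCOPE])
```

Because every request passes through this one lookup, the sketch also shows the drawback named on the slide: if the shared process layer fails, users of all sites are affected.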

SLIDE 14

Conclusion & Future Challenges

  • simpleArchive has been available to all researchers at RWTH Aachen since Q3 2016
  • Implementation of the process reuses existing systems and APIs
  • Focusing on the process rather than the technology reduces vendor lock-in
  • Process needs to be backed by local policies
      ◦ How long is the data actually stored?
      ◦ Who can restore the data?
      ◦ Can archives be transferred?
      ◦ Can archives be deleted?
  • Combine scaling methods to build a process-oriented, cloud-like ecosystem

[Figure: component diagram. Labels: PID Service, Archive, Cache, OAuth Authorization, User Information, Application Proxy, Simple Archive UI.]

SLIDE 15

Thank you for your attention