Discovery Environment: Extensible Data Science workbench and data-centric collaboration platform powered by iRODS




slide-1
SLIDE 1

Discovery Environment

Extensible Data Science workbench and data-centric collaboration platform powered by iRODS

CyVerse Discovery Environment Development Team University of Arizona

slide-2
SLIDE 2
  • Discovery Environment (DE) Overview
      • Overview
      • Features
      • Technology Choices
  • Terrain API
      • Overview
      • Available Documentation
      • Brief Demonstration
  • Visual Interactive Computing Environment (VICE)
      • Overview
      • Architecture
      • Demonstration

Agenda

slide-3
SLIDE 3
  • Motivation
slide-4
SLIDE 4
  • Scalable
  • Extensible
  • Low barrier to entry
  • High productivity

Requirements

slide-5
SLIDE 5

Overview - Usage Statistics

slide-6
SLIDE 6
  • CyVerse Data Store
  • Share data sets
  • Search all data that is accessible
  • Automatically detect format and type of data in files
  • Third-party and built-in data visualization tools
  • Genome browsers (Ensembl, UCSC, JBrowse, etc.) with byte-range service

  • Tabular data view
  • Metadata management, tags and comments

Data Management
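The byte-range service mentioned above lets a genome browser fetch only the slice of a large file it needs. The slides do not describe how the DE implements this, so the following is only a minimal sketch of the standard HTTP `Range` header mechanics. The function names `range_header` and `slice_for_range` are hypothetical, and the "server" is simulated in memory.

```python
def range_header(start: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header asking for `length` bytes from offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={start}-{start + length - 1}"}


def slice_for_range(data: bytes, header_value: str) -> bytes:
    """Simulate a server answering a single 'bytes=first-last' range (hypothetical helper)."""
    unit, _, spec = header_value.partition("=")
    if unit != "bytes" or "," in spec:
        raise ValueError("only a single byte range is supported in this sketch")
    first, _, last = spec.partition("-")
    start = int(first)
    # An open-ended range such as 'bytes=100-' runs to the end of the file.
    end = int(last) if last else len(data) - 1
    end = min(end, len(data) - 1)
    if start > end:
        raise ValueError("range not satisfiable")
    return data[start:end + 1]


if __name__ == "__main__":
    sequence = b"ACGTACGTAC"
    header = range_header(2, 4)
    print(header["Range"], slice_for_range(sequence, header["Range"]))
```

A real client would send this header to a server that supports range requests and expect a `206 Partial Content` response.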

slide-7
SLIDE 7
  • Graphical user interface to apps
  • Apps can target several different platforms
  • Add your own tools and apps
  • Apps can be chained together in a pipeline
  • GPU support for VICE
  • More than 500 apps with documentation and example data sets

  • Almost 300 distinct Docker images

Tools and Apps
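The idea of chaining apps into a pipeline can be sketched as follows. This is not the DE's implementation: `Step`, `run_pipeline`, and the example steps are hypothetical names, and each "app" is reduced to a function that maps input file names to output file names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One app in a pipeline (hypothetical model): a name plus a files-to-files function."""
    name: str
    run: Callable[[list[str]], list[str]]


def run_pipeline(steps: list[Step], inputs: list[str]) -> list[str]:
    """Feed the outputs of each step into the next one, in order."""
    files = list(inputs)
    for step in steps:
        files = step.run(files)
    return files


# Illustrative steps only; real apps would run tools inside containers.
trim = Step("trim", lambda files: [f.replace(".fastq", ".trimmed.fastq") for f in files])
align = Step("align", lambda files: [f.replace(".fastq", ".bam") for f in files])

if __name__ == "__main__":
    print(run_pipeline([trim, align], ["sample.fastq"]))
```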

slide-8
SLIDE 8
  • What’s an analysis?
  • Control resource allocation (CPU, RAM, Disk)
  • Output files are uploaded to the CyVerse Data Store
  • Parameters are recorded
  • You’ll be notified when an analysis completes
  • Batch processing

Analyses
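As a rough illustration of the points above (recorded parameters, resource requests, and batch processing), an analysis could be modeled like this. All field names and the `expand_batch` helper are assumptions made for this sketch and do not reflect the actual DE data model or API.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass
class AnalysisRequest:
    """Hypothetical record of one analysis: app, input, parameters, resources."""
    app: str
    input_file: str
    params: dict[str, str] = field(default_factory=dict)
    cpu_cores: int = 1
    ram_gib: int = 2
    disk_gib: int = 10


def expand_batch(template: AnalysisRequest, input_files: list[str]) -> list[AnalysisRequest]:
    """Create one request per input file, copying parameters so runs stay independent."""
    return [
        replace(template, input_file=path, params=dict(template.params))
        for path in input_files
    ]


if __name__ == "__main__":
    template = AnalysisRequest("example-app", "", {"threshold": "0.05"}, cpu_cores=4)
    for request in expand_batch(template, ["a.fq", "b.fq"]):
        # Serializing the request keeps a record of the parameters that were used.
        print(json.dumps(asdict(request)))
```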

slide-9
SLIDE 9
slide-10
SLIDE 10
  • Avoid the monolith trap
  • Scalability
  • Extensibility
  • Customization

Terrain - Goals

slide-11
SLIDE 11
  • Avoid the monolith trap
  • https://de.cyverse.org/terrain/docs
      • Interactive console
      • Dynamic documentation
      • Work in progress
  • https://cyverse-de.github.io/api
      • Ignore the authentication instructions
      • Mostly complete
      • Documentation for some endpoints is outdated

Terrain - Documentation

slide-12
SLIDE 12

Terrain Demonstration

Documentation, Console, Command Line

slide-13
SLIDE 13
  • Work with software and data interactively

  • Visualize
  • Experiment
  • Discover
slide-14
SLIDE 14
slide-15
SLIDE 15

Demonstration

Discovery Environment Inception

slide-16
SLIDE 16
  • Integrate 3rd party storage providers
  • BYOC
  • Improved UX
  • Singularity support
  • 3rd party install

Future