data commons and data ecosystems



  1. Introduction to the Gen3 Platform for Data Commons and Data Ecosystems. Phillis Tang, Center for Translational Data Science, University of Chicago & Open Commons Consortium.

  2. Data commons organize the data for a scientific discipline, community, or field and are enabled by large scale cloud computing. Data warehouses organize the data for an organization (and are enabled by enterprise computing). Databases organize the data around a project.

  3. Timeline of platform types, at scales labeled Project, (Virtual) Organization, Discipline, and Multi-Discipline:
     • Databases (1982 - present): data repository; data catalogs; download data.
     • Data Clouds (2010 - 2020): support large data and data intensive computing with cloud computing; researchers can analyze data with collaborative tools (workspaces), so data does not have to be downloaded.
     • Data Commons (2014 - 2024): support large data; workspaces; common data models; core data services; data & commons governance; harmonized data; data sharing; reproducible research.
     • Data Ecosystems (2018 - 2028): interoperate multiple data commons, databases, knowledge bases, and other resources; support an ecosystem of commons, portals, notebooks, applications & simulations across multiple disciplines.

  4. Genomic Data Commons - data exploration

  5. Gen3 architecture diagram. The Gen3 Secure Environment contains the Gen3 Stack, an authorization database, a graph data database, and a log of security events. Data sits in an AWS S3 bucket, a Google bucket, and an on-prem bucket. Ingress from outside is controlled: users authenticate via Single Sign On (SSO) and use presigned URLs to directly access the buckets for raw data.

  6. Data Access Control • Bucket policy prevents access by unauthorized users • Data access is logged for auditing and compliance • Gen3 Auth (Fence) provides authentication, authorization, and data access • Gen3 Auth works with multiple identity providers (IdPs), including Google, and is easily adaptable to any supported OIDC provider • This enables Single Sign On (SSO) compatibility with most systems • Authorization for data access is via an internal Access Control List specified by the stakeholders. (Diagram labels: Cloud Bucket With Data; Gen3 Auth.)
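Two of the ideas on slide 6, authorization against an internal access control list and logging every access for auditing, can be sketched in a few lines. This is only an illustration and not Gen3 or Fence code: the `ACL` contents, the user addresses, the object names, and the `request_access` helper are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical access control list: user -> objects the stakeholders allow.
ACL: Dict[str, Set[str]] = {
    "alice@example.org": {"bucket-a/file1"},
}

# Hypothetical in-memory audit trail; a real system would persist this.
AUDIT_LOG: List[dict] = []


def request_access(user: str, obj: str) -> bool:
    """Check the ACL and record the attempt, whether or not it succeeds."""
    allowed = obj in ACL.get(user, set())
    AUDIT_LOG.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "object": obj,
            "allowed": allowed,
        }
    )
    return allowed
```

Denied requests are logged as well as granted ones, since an audit trail that only records successes cannot show attempted unauthorized access.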

  7. Data Access Control • Gen3 Auth has a Role Based Access Control (RBAC) engine. The RBAC engine understands the hierarchical nature of a user's permissions and can be used to determine whether the user has access to a specific piece of data. In the example, Program Alpha contains Projects Adam, Baker, and Charlie; the diagram also shows Case Zulu and Case Mike, each with a Sample 1. Authorization for a user would then be stored as:

         rgrossman1@uchicago.edu:
           resources:
           - resource: /programs/alpha/projects/baker
             privilege: [create, read, read-storage, write-storage]
           - resource: /programs/alpha/projects/adam/cases/zulu
             privilege: [read, read-storage]

     This gives write (submission) access to the Baker project and all nodes underneath it, but read access to only the Zulu case in the Adam project.
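The hierarchical check described on slide 7 can be sketched as a path-prefix test: a grant on a resource also covers every node underneath it. This is a minimal illustration under that assumption, not the actual Gen3 RBAC engine. The `POLICY` structure copies the slide's example, while the `covers` and `is_allowed` helpers are hypothetical names.

```python
from typing import Dict, List

# Policy copied from the slide's example; the in-memory layout is an assumption.
POLICY: Dict[str, List[dict]] = {
    "rgrossman1@uchicago.edu": [
        {
            "resource": "/programs/alpha/projects/baker",
            "privilege": ["create", "read", "read-storage", "write-storage"],
        },
        {
            "resource": "/programs/alpha/projects/adam/cases/zulu",
            "privilege": ["read", "read-storage"],
        },
    ]
}


def covers(granted: str, requested: str) -> bool:
    """True if `requested` is the granted resource or a node underneath it."""
    granted = granted.rstrip("/")
    # Require a "/" boundary so ".../baker" does not match ".../bakery".
    return requested == granted or requested.startswith(granted + "/")


def is_allowed(user: str, privilege: str, resource: str) -> bool:
    """Check whether any of the user's grants gives `privilege` on `resource`."""
    return any(
        privilege in grant["privilege"] and covers(grant["resource"], resource)
        for grant in POLICY.get(user, [])
    )
```

With this policy, `is_allowed("rgrossman1@uchicago.edu", "create", "/programs/alpha/projects/baker/cases/x")` is true, while a `read` request on `/programs/alpha/projects/adam/cases/mike` is refused because only the Zulu case is granted.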

  8. Data Access Control: Query Gateway ● The query gateway can limit the queries that users can perform and control when results are returned. Example query: Query1: StandardDeviation(variable) where STUDENTS_GENDER is MALE. (On the slide, blue marks the parts the querying user can specify.) Results are returned only when the number of students represented in the query exceeds a threshold, e.g. standard deviations are returned only when the query computes them for at least 10 students.
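The threshold rule on slide 8 can be sketched as a function that computes the statistic only when enough records match the filter. This is an illustration, not the Gen3 query gateway: the `gated_stdev` name, the row layout, and the use of the sample standard deviation are assumptions. The default of 10 follows the slide's "at least 10 students" example.

```python
import statistics
from typing import Iterable, Mapping, Optional

MIN_COHORT = 10  # assumed default, from the slide's "at least 10 students"


def gated_stdev(
    rows: Iterable[Mapping],
    variable: str,
    filter_field: str,
    filter_value: object,
    min_cohort: int = MIN_COHORT,
) -> Optional[float]:
    """Return the sample standard deviation of `variable` over matching rows,
    or None when fewer than `min_cohort` rows match."""
    if min_cohort < 2:
        # statistics.stdev needs at least two data points.
        raise ValueError("min_cohort must be at least 2")
    values = [row[variable] for row in rows if row.get(filter_field) == filter_value]
    if len(values) < min_cohort:
        return None  # suppress the result rather than expose a small group
    return statistics.stdev(values)
```

Returning `None` instead of raising keeps the refusal indistinguishable from other empty results at the call site. Whether a real gateway should also hide the fact that a threshold was hit is a policy choice the slide does not address.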

  9. Jupyter Notebooks • Jupyter Notebooks are powerful tools for creating custom analyses over datasets • Gen3 runs Jupyter Notebooks in a secure cloud environment, reducing the need to download data to laptops, etc.

  10. Data Ontologies: Dictionary viewer • The Gen3 dictionary viewer allows browsing the data vocabularies within a particular data commons

  11. Data Ontologies • Ontologies contain controlled vocabulary developed by a standards body. • Data dictionaries contain references to the ontology terms, allowing harmonization of differing data dictionaries. (Diagram label: PFB.)
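The harmonization idea on slide 11 can be sketched as follows: if two data dictionaries tag their fields with the same ontology terms, fields can be matched through the shared term even when their local names differ. Everything in this example is hypothetical, including the field names and the placeholder term IDs (`ONT:0001` and so on), which do not refer to any real ontology.

```python
from typing import Dict

# Hypothetical data dictionaries: local field name -> ontology term ID.
DICT_A: Dict[str, str] = {"sex": "ONT:0001", "age_at_dx": "ONT:0002"}
DICT_B: Dict[str, str] = {
    "gender": "ONT:0001",
    "diagnosis_age": "ONT:0002",
    "site": "ONT:0003",
}


def field_mapping(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    """Map source field names to target field names that share an ontology term."""
    by_term = {term: field for field, term in target.items()}
    return {field: by_term[term] for field, term in source.items() if term in by_term}


def harmonize(record: dict, source: Dict[str, str], target: Dict[str, str]) -> dict:
    """Rename a record's fields into the target dictionary's vocabulary,
    dropping fields with no shared ontology term."""
    mapping = field_mapping(source, target)
    return {mapping[key]: value for key, value in record.items() if key in mapping}
```

For instance, `harmonize({"gender": "F", "diagnosis_age": 42, "site": "X"}, DICT_B, DICT_A)` yields a record keyed by `sex` and `age_at_dx`. The `site` field is dropped because `DICT_A` has no field tagged with its term.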

  12. Data Aggregation

  13. Data & User Flow with Gen3
