Using Property Graphs for Rich Metadata Management in HPC Systems

Dong Dai, Robert B. Ross, Philip Carns, Dries Kimpe, and Yong Chen

Rich Metadata in HPC
• Metadata: the data used to describe other data
• Simple Metadata
  - A. Leung, et al. Magellan: A Searchable Metadata Architecture for Large-Scale File Systems
• Rich Metadata
• HPC systems rely heavily on these metadata
  • inode attributes for file management
    - S. A. Weil, et al. Ceph: A Scalable, High-Performance Distributed File System
  • location information for directories and files stored across metadata servers
  • provenance information, partially collected and stored
    - Wf4Ever Research Object Model 1.0, http://wf4ever.github.io/ro/

Rich Metadata in HPC
• Entities in an HPC system and their metadata
  • Users: name, id, group, permission, …
  • Machines: machine name, ip_addr, dc, rack, …
  • Processes: process id, job, machine, reads, writes, start_ts, finish_ts, …
  • Programs (jobs): job id, params, config, inputs, outputs, start_ts, finish_ts, …
  • Threads
  • Data Files: file name, location, size, permission, parent, children, …
• Relationships (Provenance) connect these entities
1. Diverse metadata need to be managed
2. Relationships need to be captured

Rich Metadata Challenges
• Metadata Integration
  • diverse metadata should be collected from different components
  • diverse metadata should be managed in a unified way
• Storage System Pressure
  • large volume of metadata generated from different components
  • high concurrent insert rates from parallel applications
• Efficient Processing and Querying
  • some operations exist in the critical execution path of applications
  • some operations require complex querying and searching

Graph-based Solution
• Based on the Property Graph Model: vertices and edges, each carrying properties/attributes
• Example property graph
  • Vertices: (name: Alice, location: EU), (name: Bob, location: US), (name: Cowboy, type: football)
  • Edges: (type: fan-of-team, time: 5-years), (type: friends, time: 3-years), (type: player-of-team, time: 2-years)
• Motivation
  • Metadata Integration
  • Storage Pressure
  • Graph-based Traversal
• Example metadata graph
  • User Entity vertices: (name: sam, group: cgroup), (name: john, group: admin)
  • Execution Entity vertex: (name: job201405, params: -n 1024, ...)
  • File Entity vertices: (name: app-01, size: 256KB, ...), (name: dset-1, size: 1020M, ...)
  • Edges: run, exe, read, write, with properties such as (ts: 20140501, writeSize: 7M, ...)
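To make the property graph model concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not the storage design from the talk: the class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `neighbors`) are hypothetical, and the edge endpoints (who is a fan, friend, or player of whom) are assumed, since the flattened figure does not preserve them.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: int
    dst: int
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Directed graph whose vertices and edges both carry key-value properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> Vertex
        self.out_edges = {}  # vertex id -> list of outgoing Edge objects

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)
        self.out_edges.setdefault(vid, [])
        return vid

    def add_edge(self, src, dst, **props):
        edge = Edge(src, dst, props)
        self.out_edges[src].append(edge)
        return edge

    def neighbors(self, vid, edge_type):
        """Vertices reachable from vid over edges whose 'type' property matches."""
        return [self.vertices[e.dst]
                for e in self.out_edges[vid]
                if e.props.get("type") == edge_type]


g = PropertyGraph()
alice = g.add_vertex(1, name="Alice", location="EU")
bob = g.add_vertex(2, name="Bob", location="US")
team = g.add_vertex(3, name="Cowboy", type="football")

# Endpoints below are assumptions made for illustration.
g.add_edge(alice, team, type="fan-of-team", time="5-years")
g.add_edge(alice, bob, type="friends", time="3-years")
g.add_edge(bob, team, type="player-of-team", time="2-years")

friend_names = [v.props["name"] for v in g.neighbors(alice, "friends")]
```

Keeping properties as plain dictionaries mirrors the key-value view of attributes, so a new attribute needs no schema change.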

Map HPC Metadata to Graph
• Entity => Vertex
  • Data Object: represents the basic data unit in storage
  • Execution: represents applications, including Jobs, Processes, and Threads
  • User: represents a real end user of the system
  • Users are allowed to define their own entities
• Relationship => Edge
  • Relationships between different entities are mapped to edges
  • Example: a User runs an Execution, so an edge of type ‘Run’ is created between them
  • Reversed relationships are also defined
  • Users are allowed to define their own relationships
• Attributes => Property
  • Apply to both Entities and Relationships
  • Stored as key-value pairs attached to vertices and edges
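The mapping above could be sketched as follows. This is a hedged illustration: the helper names (`add_entity`, `add_relationship`) and the reversed relationship names in `REVERSE` are hypothetical, and the choice of which user runs which job is assumed. Only the entity kinds, the run/exe/read/write edge types, and the sample attribute values come from the slides.

```python
from collections import defaultdict

# Hypothetical names for the reversed relationships; the slides only say
# that reversed relationships are defined, not what they are called.
REVERSE = {
    "run": "wasRunBy",
    "exe": "exeOf",
    "read": "wasReadBy",
    "write": "wasWrittenBy",
}

vertices = {}              # entity id -> properties (including its kind)
edges = defaultdict(list)  # (source id, relationship type) -> [(target id, properties)]


def add_entity(vid, kind, **attrs):
    """Entity => Vertex, with attributes stored as key-value properties."""
    vertices[vid] = {"kind": kind, **attrs}


def add_relationship(src, rel, dst, **attrs):
    """Relationship => Edge; the reversed edge is stored alongside it."""
    edges[(src, rel)].append((dst, attrs))
    edges[(dst, REVERSE[rel])].append((src, attrs))


add_entity("sam", "User", group="cgroup")
add_entity("job201405", "Execution", params="-n 1024")
add_entity("app-01", "DataObject", size="256KB")
add_entity("dset-1", "DataObject", size="1020M")

add_relationship("sam", "run", "job201405")
add_relationship("job201405", "exe", "app-01")
add_relationship("job201405", "write", "dset-1", ts=20140501, writeSize="7M")

# The reversed edge answers "which executions wrote this file?" directly.
writers = [src for src, _ in edges[("dset-1", "wasWrittenBy")]]
```

Storing the reversed edge at insert time trades extra writes for cheap backward traversal, which is what a query starting from a file needs.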

Create an Example Graph
• Data source: complete set of logs from Intrepid in 2013
  • 42% of all core-hours consumed in 2013
• Each log file => one Job
  • jobid, start_time, end_time, exe
• Each uid => one User
• All Ranks => Processes
  • nprocs, file_access
• File and exe => Data Object
• Synthetically create directory structure
  • data files visited by the same execution are placed under the same directory
  • directories accessed by the same user are placed under one directory
[Figure: example graph with User Entity, Execution Entity, and File Entity vertices joined by run, exe, read, and write edges. Labels shown: name: sam, id: 430823375, name: John, id: 330862395, id: 2726768805, params: -n 2048, ts: 20130101..., name: 2111648390, writeSize: 7M, name: 203863..., fs-type: gpfs]
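The construction rules above might look like this in code. It is a sketch under stated assumptions: the log record layout (a dict with `jobid`, `uid`, `start_time`, `end_time`, `exe`, `nprocs`, and a `file_access` list of `(rank, path, mode)` tuples) is hypothetical and does not reflect the real log format, the `hasProcess` edge type is invented for illustration, and the sample records are made up.

```python
def build_graph(logs):
    """Turn per-job log records into vertices and typed edges."""
    vertices = {}   # vertex key -> properties
    edges = set()   # (source key, edge type, target key)

    for log in logs:
        user = ("user", log["uid"])        # each uid => one User
        job = ("job", log["jobid"])        # each log file => one Job
        exe = ("data", log["exe"])         # exe => Data Object

        vertices.setdefault(user, {})
        vertices[job] = {
            "start_time": log["start_time"],
            "end_time": log["end_time"],
            "nprocs": log["nprocs"],
        }
        vertices.setdefault(exe, {})
        edges.add((user, "run", job))
        edges.add((job, "exe", exe))

        for rank, path, mode in log["file_access"]:
            proc = ("process", log["jobid"], rank)  # ranks => Processes
            data = ("data", path)                   # file => Data Object
            vertices.setdefault(proc, {})
            vertices.setdefault(data, {})
            edges.add((job, "hasProcess", proc))    # hypothetical edge type
            edges.add((proc, mode, data))           # mode is "read" or "write"

    return vertices, edges


# Made-up sample records in the assumed layout.
sample_logs = [
    {"jobid": 1, "uid": 100, "start_time": 10, "end_time": 20, "exe": "/bin/app",
     "nprocs": 2, "file_access": [(0, "/data/a", "read"), (1, "/data/b", "write")]},
    {"jobid": 2, "uid": 100, "start_time": 30, "end_time": 40, "exe": "/bin/app",
     "nprocs": 1, "file_access": [(0, "/data/a", "read")]},
]

vertices, edges = build_graph(sample_logs)
```

Because vertices are keyed by uid, path, or (jobid, rank), repeated mentions across log files collapse onto the same vertex, which is what lets shared files and users connect otherwise separate jobs.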

Sample Graph: Size
[Figure: graph size at the detailed level, by entity type: User, Applications, Processes (I/O Ranks), Processes (All Ranks), Files]

Sample Graph: Structure
• Common Attribute
  • most entities have a small degree
  • a small number of entities have a very high degree
• Skewed power-law distribution
  • many natural graphs belong to this category
  • their vertex degrees obey a power law
• Further investigation also confirms that they fit the power-law distribution
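A power-law degree distribution is usually written as P(k) ∝ k^(-α), where k is the vertex degree. One rough way to check a graph against it is to build the degree histogram and fit a straight line on log-log axes. The sketch below does that with a plain least-squares fit. This is a crude estimator chosen for brevity, not the method used in the talk, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_histogram(edges):
    """Map each degree k to the number of vertices that have degree k."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return Counter(degree.values())


def fit_power_law_exponent(hist):
    """Least-squares slope of log(count) against log(degree), negated.

    For counts proportional to k^(-alpha) this returns alpha.
    The histogram needs at least two distinct degrees.
    """
    xs = [math.log(k) for k in hist]
    ys = [math.log(c) for c in hist.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# A star graph: one hub with high degree, many leaves with degree 1.
star_edges = [("hub", f"file-{i}") for i in range(5)]
star_hist = degree_histogram(star_edges)

# An idealized histogram whose counts fall off exactly as k^(-2).
ideal_hist = {1: 10000, 10: 100, 100: 1}
alpha = fit_power_law_exponent(ideal_hist)
```

On real data the tail of the histogram is noisy, so a log-log line fit should be read as a sanity check rather than a precise estimate of the exponent.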

Operations on the Graph: Namespace Traversal
• Hierarchical Namespace Traversal
  • Presents the logical layout of data sets to users
  • traditional POSIX-style, tree-structured directories
• The metadata graph already contains
  • belongs/contains relationships between Data Object vertices
  • a directory can be considered a Data Object entity too
• Locate files by a given path
  1. locate the root directory in the graph
  2. repeatedly travel through contains edges from directory vertices to directory or file vertices
Locate -> Traversal -> Filter -> Traversal
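The two lookup steps above can be sketched as a walk over contains edges. The data layout (a `vertices` dict of properties and a `contains` adjacency dict) and the vertex ids are assumptions made for this example.

```python
def resolve_path(vertices, contains, path, root):
    """Follow 'contains' edges from the root, one path component at a time.

    Returns the id of the vertex the path names, or None if a component
    is missing.
    """
    current = root                              # step 1: locate the root
    for name in (part for part in path.split("/") if part):
        for child in contains.get(current, []):  # step 2: traverse contains edges
            if vertices[child]["name"] == name:  # filter children by name
                current = child
                break
        else:
            return None                          # no child with that name
    return current


# Hypothetical sample namespace: /proj/dset-1 and /proj/app-01.
vertices = {
    "d0": {"name": "/"},
    "d1": {"name": "proj"},
    "f1": {"name": "dset-1"},
    "f2": {"name": "app-01"},
}
contains = {"d0": ["d1"], "d1": ["f1", "f2"]}

found = resolve_path(vertices, contains, "/proj/dset-1", root="d0")
missing = resolve_path(vertices, contains, "/proj/nope", root="d0")
```

Each path component costs one traversal step plus a filter over the children, which matches the Locate -> Traversal -> Filter -> Traversal pattern.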

Operations on the Graph: Data Audit
• Data Audit
• The metadata graph already contains
  • run relationships between Users and Executions
  • read/write relationships between Executions and Data Objects
  • additional attributes recorded with these relationships
• Locate files accessed by a specific user in a given time frame
  1. locate the given user in the graph
  2. travel through run edges from the User to Executions
  3. filter Executions based on the time frame
  4. travel through read edges from Executions to Data Objects
Locate -> Traversal -> Filter -> Traversal
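The four audit steps above map onto a short traversal. In this sketch the graph layout, the function name, and the sample data are hypothetical, and "in the time frame" is interpreted as "the execution interval overlaps the query interval", which is an assumption.

```python
def files_read_by_user(vertices, out_edges, user, t_start, t_end):
    """Return the Data Objects read by executions the user ran in [t_start, t_end]."""
    files = set()
    # Steps 1-2: locate the user, then follow its 'run' edges to Executions.
    for execution in out_edges.get((user, "run"), []):
        props = vertices[execution]
        # Step 3: keep executions whose interval overlaps the time frame.
        if props["start_ts"] <= t_end and props["finish_ts"] >= t_start:
            # Step 4: follow 'read' edges to Data Objects.
            files.update(out_edges.get((execution, "read"), []))
    return files


# Hypothetical sample data; timestamps use the YYYYMMDD integer style
# seen in the example graph.
vertices = {
    "sam": {"kind": "User"},
    "job-a": {"kind": "Execution", "start_ts": 20140501, "finish_ts": 20140502},
    "job-b": {"kind": "Execution", "start_ts": 20140601, "finish_ts": 20140602},
    "dset-1": {"kind": "DataObject"},
    "dset-2": {"kind": "DataObject"},
}
out_edges = {
    ("sam", "run"): ["job-a", "job-b"],
    ("job-a", "read"): ["dset-1"],
    ("job-b", "read"): ["dset-2"],
}

may_files = files_read_by_user(vertices, out_edges, "sam", 20140501, 20140531)
```

Filtering on the Execution before following read edges keeps the second traversal small, since executions outside the time frame are never expanded.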

Operations on the Graph: Provenance Search
• Provenance Support
  • Wide range of use cases
  • data sharing, reproducibility, workflow
• The metadata graph already contains
  • Relationships between different entities
  • User-defined attributes and relationships
• Problem #8 in the first Provenance Challenge
  • Given an fMRI workflow with multiple processing stages, find the Executions whose model is ‘AlignWarp’ and whose inputs have the annotation [‘center’: ‘UChicago’]
  1. Use the graph to abstract the workflow executions
  2. Search all Executions with model ‘AlignWarp’
  3. Travel through read edges to Data Object entities
  4. Filter based on the property ‘center’ (‘UChicago’)
Search Attributes -> Traversal -> Filter -> Traversal
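The search steps for Problem #8 can be sketched in the same style. The graph layout, the function name, and the sample vertices are hypothetical. The sketch also assumes that an Execution matches when at least one of its inputs carries the annotation, which the slide does not specify.

```python
def find_executions(vertices, out_edges, model, key, value):
    """Executions of the given model with at least one input annotated key=value."""
    matches = []
    for vid, props in vertices.items():
        # Search attributes: keep only Executions with the requested model.
        if props.get("kind") != "Execution" or props.get("model") != model:
            continue
        # Traversal: follow 'read' edges to the input Data Objects.
        inputs = out_edges.get((vid, "read"), [])
        # Filter: check the user-defined annotation on the inputs.
        if any(vertices[i].get(key) == value for i in inputs):
            matches.append(vid)
    return matches


# Made-up sample data for illustration.
vertices = {
    "exec-1": {"kind": "Execution", "model": "AlignWarp"},
    "exec-2": {"kind": "Execution", "model": "AlignWarp"},
    "exec-3": {"kind": "Execution", "model": "Reslice"},
    "img-1": {"kind": "DataObject", "center": "UChicago"},
    "img-2": {"kind": "DataObject", "center": "Elsewhere"},
}
out_edges = {
    ("exec-1", "read"): ["img-1"],
    ("exec-2", "read"): ["img-2"],
    ("exec-3", "read"): ["img-1"],
}

hits = find_executions(vertices, out_edges, "AlignWarp", "center", "UChicago")
```

The linear scan over all vertices stands in for an attribute index; a real system would look up Executions by model directly before traversing.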
