MANAGING SCIENTIFIC DATA WITH NDN
Chengyu Fan, Susmit Shannigrahi, Steve DiBenedetto, Catherine Olschanowsky, Christos Papadopoulos
NDNcomm 2015, Sept 28, 2015, Los Angeles, CA
Supported by NSF #13410999 and NSF #1345236
Introduction
- Scientific data is often very large and complex
  - Climate: CMIP5: 3.5 PB, CMIP6: 350 PB to 3 EB
  - Physics: ATLAS: 4 PB/year
  - Astronomy, bioinformatics, others…
- Science infrastructure
  - Cutting-edge hardware but often incompatible domain software (ESGF, xrootd, etc.)
  - Complexity, replication, redundancy
Our Project
- Build and deploy software to evaluate NDN in scientific applications over a dedicated hardware infrastructure
- Evaluate NDN in the context of:
  - Application services: publishing, discovery, retrieval, access control, load balancing, failover, caching, etc.
  - Network integration (OSCARS, SDN, etc.)
- Metrics
  - Performance, reduced complexity, ease of deployment, interoperability, reuse, efficiency, routing, security/trust, etc.
NDN Layer Structure
[Figure, built up over several slides: protocol stacks for two hosts and two routers. Labels: APP on each host; NDN on each host and each router; LINK, ETH, UDP/IP, and Other as the layers NDN runs over.]
Methodology
- Investigate the use of NDN as a common platform for scientific data applications by:
  - Understanding data management challenges of various scientific domains
  - Developing and evaluating prototype applications that leverage NDN's features
  - Using prototypes to further drive NDN research
First Step – Build a Catalog
- Create a shared resource: a distributed, synchronized catalog of names over NDN
- Provide common operations such as publishing, discovery, access control
- Catalog only deals with name management, not dataset retrieval
- Platform for further research and experimentation
- Research questions:
  - Namespace construction, distributed publishing, key management, UI design, failover, etc.
  - Functional services such as subsetting
  - Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)
Overview of Catalog Workflow
[Figure, built up over several slides: a Publisher, a Consumer, two Data storage nodes, and Catalog nodes 1, 2, and 3, all connected through NDN.]
Steps highlighted in the build:
(1) Publish dataset names
(2) Sync changes
(3) Query for dataset names
(4) Retrieve data
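The four workflow steps can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the ndn-atmos implementation: the class `CatalogNode`, the helper `connect`, the `storage` dictionary, and the dataset names are all hypothetical, and the "sync" simply copies new names to peers instead of running a real synchronization protocol over NDN.

```python
class CatalogNode:
    """A catalog node that stores dataset names only, never the data itself."""

    def __init__(self, label):
        self.label = label
        self.names = set()
        self.peers = []

    def publish(self, names):
        """Step 1: record published names. Step 2: push the changes to peers."""
        added = set(names) - self.names
        self.names |= added
        for peer in self.peers:
            # Stand-in for a real sync protocol: copy the new names directly.
            peer.names |= added

    def query(self, prefix):
        """Step 3: return all known names under a name prefix, sorted."""
        base = prefix.rstrip("/")
        return sorted(n for n in self.names if n == base or n.startswith(base + "/"))


def connect(nodes):
    """Make every node a sync peer of every other node."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]


# Hypothetical dataset names and payloads, for illustration only.
storage = {
    "/cmip5/example/tas.nc": b"temperature",
    "/cmip5/example/pr.nc": b"precipitation",
}

nodes = [CatalogNode("node 1"), CatalogNode("node 2"), CatalogNode("node 3")]
connect(nodes)

nodes[0].publish(storage.keys())           # (1) publish, (2) sync
found = nodes[1].query("/cmip5/example")   # (3) query a different catalog node
data = [storage[name] for name in found]   # (4) retrieve from data storage
```

Note that the catalog never touches `storage`: it only answers name queries, which matches the slide's point that the catalog handles name management and not dataset retrieval.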
NDN-Science Testbed
- NSF CC-NIE campus infrastructure award
- 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
- Currently ~50 TB of CMIP5, ~70 TB of HEP data
Demos
- Search
- Publication and Sync
- Access control
- Retrieval and failover
Conclusions
- IP encourages common host access, not common data access methods
  - Does not encourage interoperability at the application level
- NDN has the potential to unify the service interface required by scientific applications
- Science testbed and prototypes to test hypotheses and drive research and experimentation
- Ready-to-try catalog; we invite you to try it with your data
  - Catalog is general, supports a variety of applications
  - Currently CMIP5 and HEP applications
  - UI for data search and retrieval
Our sponsors: NSF and ESnet
Join us @ http://www.netsec.colostate.edu/mailman/listinfo/ndn-sci
Backup Slides
Current Example: xrootd
[Figure, built up over several slides: a Client, a Manager (xrootd + cmsd, a.k.a. Redirector), and Data Servers A, B, and C, each running xrootd + cmsd. /my/file is labeled at two of the servers. The only step label that survives is "4: Try open() at A".]
- Fragile, fairly complex middleware
xrootd under NDN
[Figure, built up over several slides: a Client sends the request "? /my/file" into NDN, which connects it to Data Servers A, B, and C, each running xrootd + cmsd. /my/file is labeled at two of the servers.]
- Significantly reduced system complexity
- Better service abstraction
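The effect of asking for /my/file by name, rather than being redirected to a specific host, can be illustrated with a short Python sketch. It is only a model of the outcome: `DataServer` and `request` are hypothetical names, the loop stands in for NDN forwarding (it does not model the FIB, PIT, or forwarding strategies), and placing /my/file on servers A and C is an assumption read off the slide layout.

```python
class DataServer:
    """A data server that holds some named files and may be up or down."""

    def __init__(self, label, files, up=True):
        self.label = label
        self.files = dict(files)
        self.up = up

    def serve(self, name):
        """Return the content for a name, or None if down or not stored."""
        if not self.up:
            return None
        return self.files.get(name)


def request(name, servers):
    """Stand-in for the NDN network: ask by name until some server answers."""
    for server in servers:
        content = server.serve(name)
        if content is not None:
            return server.label, content
    return None


# Assumed placement: /my/file on A and C, nothing on B.
a = DataServer("A", {"/my/file": b"payload"})
b = DataServer("B", {})
c = DataServer("C", {"/my/file": b"payload"})
servers = [a, b, c]

first = request("/my/file", servers)    # answered by A
a.up = False                            # simulate a failure of server A
second = request("/my/file", servers)   # same request, now answered by C
```

The client code is identical before and after the failure; only the name is part of the request, which is the service abstraction the slide refers to.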
Data Publication
[Figure, built up over several slides: message exchange between a Catalog and a Publisher.]
1) Listening on /<catalog-prefix>/publish
2) Generate NDN names for datasets/services
3) Request publish
4) Fetch published name list
5) Authenticate the Data and validate data name against trust model
6) Share names with other catalogs
Keys for ndn-atmos
- Self-signed root key: /cmip5/KEY
- Site's keys (signed by the root key): /cmip5/lbl/KEY, /cmip5/nwsc/KEY, …
- Application's keys:
  - Dataset names publishing: /cmip5/lbl/<DataPublisher>/KEY
  - NLSR: /cmip5/nwsc/<operator>/KEY, /cmip5/nwsc/<router>/KEY
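One way to read this key hierarchy is that a key may only be signed by a key whose namespace is a proper prefix of its own, with the chain ending at the self-signed root. The Python sketch below checks that rule on key names alone. It is an assumption about the hierarchy, not the validator used by ndn-atmos or NLSR; it performs no cryptographic verification, and the function names and the publisher name `alice` (standing in for `<DataPublisher>`) are hypothetical.

```python
def key_namespace(key_name):
    """Split '/cmip5/lbl/KEY' into its namespace components ('cmip5', 'lbl')."""
    parts = [c for c in key_name.split("/") if c]
    if len(parts) < 2 or parts[-1] != "KEY":
        raise ValueError(f"not a key name: {key_name}")
    return tuple(parts[:-1])


def may_sign(signer_key, signed_key):
    """Assumed rule: the signer's namespace is a proper prefix of the signed key's."""
    signer = key_namespace(signer_key)
    signed = key_namespace(signed_key)
    return len(signer) < len(signed) and signed[:len(signer)] == signer


def chain_is_valid(chain, root_key):
    """Check a chain of key names listed from the leaf key up to the root key."""
    if not chain or chain[-1] != root_key:
        return False
    return all(may_sign(signer, signed) for signed, signer in zip(chain, chain[1:]))


ROOT = "/cmip5/KEY"

# 'alice' is a hypothetical data publisher at the lbl site.
good_chain = ["/cmip5/lbl/alice/KEY", "/cmip5/lbl/KEY", ROOT]
# A key from the lbl namespace claiming to be signed by the nwsc site key.
bad_chain = ["/cmip5/lbl/alice/KEY", "/cmip5/nwsc/KEY", ROOT]

good = chain_is_valid(good_chain, ROOT)
bad = chain_is_valid(bad_chain, ROOT)
```

A real deployment would also verify each signature and could require exact parent-child levels rather than any proper prefix.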
Trust Model
- Only namespace owners are allowed to publish data
- Data provenance built into the data packet
[Figure: two publish messages, each a Data packet with Name /PublisherA/publish, Content (the data payload), and Signature (Publisher A's signature).]
Valid publish message, payload entries as marked on the slide:
    - /PublisherA/publish/file/1
    - /PublisherA/publish/file/2
    + /PublisherA/publish/file/3
    + /PublisherA/publish/file/4
Invalid publish message, payload includes:
    - /PublisherB/publish/file
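The namespace rule on this slide can be sketched as a check that every name in a publish message falls under the name of the message itself. The Python below is an illustration under stated assumptions: the signature is taken to be already verified, the `+` and `-` markers are carried along as opaque operations, and `in_namespace`, `validate_publish`, and the exact contents of the invalid example are hypothetical.

```python
def in_namespace(name, prefix):
    """True if name lies strictly under prefix, compared component by component."""
    n = [c for c in name.split("/") if c]
    p = [c for c in prefix.split("/") if c]
    return len(n) > len(p) and n[:len(p)] == p


def validate_publish(message_name, entries):
    """Accept a publish message only if every entry is in the publisher's namespace.

    entries is a list of (op, name) pairs, where op is '+' or '-'.
    Signature verification is assumed to have happened before this call.
    """
    return all(
        op in ("+", "-") and in_namespace(name, message_name)
        for op, name in entries
    )


valid_entries = [
    ("-", "/PublisherA/publish/file/1"),
    ("-", "/PublisherA/publish/file/2"),
    ("+", "/PublisherA/publish/file/3"),
    ("+", "/PublisherA/publish/file/4"),
]

# Illustrative invalid message: same publisher, plus one name owned by Publisher B.
invalid_entries = valid_entries + [("-", "/PublisherB/publish/file")]

ok = validate_publish("/PublisherA/publish", valid_entries)
rejected = not validate_publish("/PublisherA/publish", invalid_entries)
```

Comparing whole name components rather than raw strings keeps a name such as /PublisherA/publisher/x from passing as a member of /PublisherA/publish.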