MANAGING SCIENTIFIC DATA WITH NDN
Chengyu Fan, Susmit Shannigrahi, Steve DiBenedetto, Catherine Olschanowsky, Christos Papadopoulos
NDNcomm 2015, Sept 28, 2015, Los Angeles, CA
Supported by NSF #13410999 and NSF #1345236
Introduction
Scientific data is often very large and complex
- Climate: CMIP5 3.5 PB, CMIP6 350 PB to 3 EB
- Physics: ATLAS 4 PB/year
- Astronomy, bioinformatics, others…
Science infrastructure
- Cutting-edge hardware, but often incompatible domain software (ESGF, xrootd, etc.)
- Complexity, replication, redundancy
Our Project
Build and deploy software to evaluate NDN in scientific applications over a dedicated hardware infrastructure
Evaluate NDN in the context of:
- Application services: publishing, discovery, retrieval, access control, load balancing, failover, caching, etc.
- Network integration (OSCARS, SDN, etc.)
Metrics: performance, reduced complexity, ease of deployment, interoperability, reuse, efficiency, routing, security/trust, etc.
NDN Layer Structure
[Figure, built up over several slides: two hosts connected through a router. Starting from the UDP/IP host-to-host stack, each host runs APP over NDN, with NDN carried over UDP/IP, Ethernet (ETH), or other link layers; the router runs NDN over its link layer.]
Methodology
Investigate the use of NDN as a common platform for scientific data applications by:
- Understanding data management challenges of various scientific domains
- Developing and evaluating prototype applications that leverage NDN's features
- Using prototypes to further drive NDN research
First Step – Build a Catalog
Create a shared resource: a distributed, synchronized catalog of names over NDN
- Provide common operations such as publishing, discovery, and access control
- The catalog only deals with name management, not dataset retrieval
- Platform for further research and experimentation
Research questions:
- Namespace construction, distributed publishing, key management, UI design, failover, etc.
- Functional services such as subsetting
- Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)
Overview of Catalog Workflow
[Figure: a publisher, a consumer, two data storage nodes, and catalog nodes 1, 2, and 3, all connected over NDN.]
(1) Publish dataset names: the publisher sends its dataset names to a catalog node
(2) Sync changes: the catalog nodes synchronize the changes among themselves
(3) Query for dataset names: the consumer queries a catalog node
(4) Retrieve data: the consumer fetches the data from data storage over NDN
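The four workflow steps can be mimicked in a few lines of Python. This is a toy, single-process sketch: `CatalogNode`, `publish`, and `query` are hypothetical names, and the sync step is modeled by pushing each change directly to every peer rather than by whatever NDN sync protocol the catalog actually uses. Step (4) is left out because retrieval goes to data storage, not to the catalog.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogNode:
    """Hypothetical in-memory stand-in for one catalog instance."""

    node_id: str
    names: set = field(default_factory=set)
    # Peers are excluded from repr/eq so they never recurse through the peer graph.
    peers: list = field(default_factory=list, repr=False, compare=False)

    def apply(self, added, removed):
        # Update this node's local name table.
        self.names.update(added)
        self.names.difference_update(removed)

    def publish(self, added, removed=()):
        # Step (1): a publisher adds/removes dataset names at this node.
        self.apply(added, removed)
        # Step (2): push the same change to every peer (stand-in for real sync).
        for peer in self.peers:
            peer.apply(added, removed)

    def query(self, prefix):
        # Step (3): return known names under a prefix, matching whole components.
        base = prefix.rstrip("/")
        return sorted(n for n in self.names if n == base or n.startswith(base + "/"))


# Three catalog nodes that all peer with each other.
nodes = [CatalogNode(f"catalog-{i}") for i in (1, 2, 3)]
for node in nodes:
    node.peers = [p for p in nodes if p is not node]

# Publish at catalog node 1, then query at catalog node 3.
nodes[0].publish(["/CMIP5/output1/VA/6hr/2016", "/CMIP5/output/BCC/6hr/1998"])
print(nodes[2].query("/CMIP5/output1"))  # ['/CMIP5/output1/VA/6hr/2016']
```

Matching on whole components (the trailing `/` check) keeps `/CMIP5/output` from matching `/CMIP5/output1/...`, mirroring NDN's component-wise prefixes.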
NDN-Science Testbed
- NSF CC-NIE campus infrastructure award
- 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
- Currently ~50 TB of CMIP5 data and ~70 TB of HEP data
Demos
- Search
- Publication and sync
- Access control
- Retrieval and failover
Conclusions
- IP encourages common host access, not common data access methods
- IP does not encourage interoperability at the application level
- NDN has the potential to unify the service interface required by scientific applications
- Science testbed and prototypes to test hypotheses and drive research and experimentation
- Ready-to-try catalog: we invite you to try it with your data
  - The catalog is general and supports a variety of applications
  - Currently CMIP5 and HEP applications
  - UI for data search and retrieval
Our sponsors: NSF and ESnet
Join us at http://www.netsec.colostate.edu/mailman/listinfo/ndn-sci
Backup Slides
Current Example: xrootd
[Figure: a client, a manager (a.k.a. redirector), and data servers A, B, and C; the manager and each data server run cmsd and xrootd, and /my/file is stored on two of the data servers. The last step shown is "4: Try open() at A".]
Fragile, fairly complex middleware
xrootd under NDN
- Significantly reduced system complexity
- Better service abstraction
[Figure: the same client and data servers A, B, and C (each running cmsd and xrootd), but no manager; the client asks the NDN network for /my/file ("? /my/file") instead of contacting a redirector.]
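One way to see why the redirector can disappear: the client only names the data, and name-based forwarding picks a server. Below is a toy longest-prefix-match forwarder with failover. The FIB contents, the assumption that the two copies of /my/file live on A and B, and the helper names are all hypothetical; real NDN forwarding strategies are considerably richer.

```python
# Hypothetical FIB: name prefix -> next hops (data servers) in preference order.
FIB = {
    "/my": ["C"],
    "/my/file": ["A", "B"],  # assumed: the two replicas of /my/file live on A and B
}


def longest_prefix_match(fib, name):
    """Return the next hops for the longest FIB prefix covering `name`."""
    components = [c for c in name.split("/") if c]
    for n in range(len(components), 0, -1):
        prefix = "/" + "/".join(components[:n])
        if prefix in fib:
            return fib[prefix]
    return []


def fetch(name, fib, servers):
    """Try each next hop in order; skip servers that are down or lack the data."""
    for hop in longest_prefix_match(fib, name):
        store = servers.get(hop)  # None models a failed server
        if store is not None and name in store:
            return hop, store[name]
    raise TimeoutError(f"no answer for {name}")


servers = {"A": {"/my/file/0": b"chunk-0"}, "B": {"/my/file/0": b"chunk-0"}, "C": {}}
print(fetch("/my/file/0", FIB, servers))  # ('A', b'chunk-0')
servers["A"] = None                       # server A fails
print(fetch("/my/file/0", FIB, servers))  # ('B', b'chunk-0')
```

The client code never changes when A fails; only the forwarding decision does, which is the "better service abstraction" the slide refers to.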
Data Publication
1) The catalog listens on /<catalog-prefix>/publish
2) The publisher generates NDN names for its datasets/services
3) The publisher requests publication
4) The catalog fetches the published name list
5) The catalog authenticates the Data and validates the data names against the trust model
6) The catalog shares the names with other catalogs
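Step 2 leaves the naming scheme to the publisher. As a minimal sketch, assuming a hypothetical fixed component order (`NAME_SCHEMA` below is illustrative, not the ndn-atmos schema), a publisher could turn dataset metadata into names like the /CMIP5/output/BCC/6hr/1998 example used elsewhere in these slides.

```python
# Hypothetical component order; the real ndn-atmos naming schema may differ.
NAME_SCHEMA = ("activity", "product", "institute", "frequency", "year")


def dataset_name(metadata):
    """Build an NDN-style name by laying out metadata values in schema order."""
    components = []
    for key in NAME_SCHEMA:
        value = str(metadata[key])  # raises KeyError if a field is missing
        if not value or "/" in value:
            # Components may not be empty or contain the name separator.
            raise ValueError(f"invalid value for {key!r}: {value!r}")
        components.append(value)
    return "/" + "/".join(components)


meta = {"activity": "CMIP5", "product": "output", "institute": "BCC",
        "frequency": "6hr", "year": 1998}
print(dataset_name(meta))  # /CMIP5/output/BCC/6hr/1998
```

A fixed order means a consumer who knows every field can build the name directly; the catalog matters when only some fields are known.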
Keys for ndn-atmos
Self-signed root key: /cmip5/KEY
Site keys: /cmip5/lbl/KEY, /cmip5/nwsc/KEY, …
Application keys:
- Dataset names publishing: /cmip5/lbl/<DataPublisher>/KEY
- NLSR: /cmip5/nwsc/<operator>/KEY and, below it, /cmip5/nwsc/<router>/KEY
Each key is signed by the key one level above it in this hierarchy.
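The hierarchy suggests a simple name check: follow a key's signer upward until the self-signed root, accepting only signers whose namespace encloses the signed key's namespace. The sketch assumes signatures were already verified cryptographically and checks names only; the `signed_by` table and key names such as /cmip5/lbl/publisher1/KEY are hypothetical, and NLSR's own operator/router rules are not modeled.

```python
TRUST_ANCHOR = "/cmip5/KEY"  # self-signed root key from the slide


def key_namespace(key_name):
    """'/cmip5/lbl/KEY' -> '/cmip5/lbl'."""
    if not key_name.endswith("/KEY"):
        raise ValueError(f"not a key name: {key_name}")
    return key_name[: -len("/KEY")]


def encloses(outer, inner):
    """True if `inner` lies strictly below `outer`, comparing whole components."""
    return inner.startswith(outer + "/")


def chain_is_valid(key_name, signed_by, anchor=TRUST_ANCHOR, max_depth=8):
    """Walk signer links up to the anchor, enforcing the namespace rule at each step."""
    current = key_name
    for _ in range(max_depth):
        if current == anchor:
            return True
        signer = signed_by.get(current)
        if signer is None or not encloses(key_namespace(signer), key_namespace(current)):
            return False
        current = signer
    return False  # chain too long or cyclic


signed_by = {
    "/cmip5/lbl/KEY": "/cmip5/KEY",
    "/cmip5/nwsc/KEY": "/cmip5/KEY",
    "/cmip5/lbl/publisher1/KEY": "/cmip5/lbl/KEY",
    "/cmip5/lbl/rogue/KEY": "/cmip5/nwsc/KEY",  # signed by the wrong site
}
print(chain_is_valid("/cmip5/lbl/publisher1/KEY", signed_by))  # True
print(chain_is_valid("/cmip5/lbl/rogue/KEY", signed_by))       # False
```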
Trust Model
- Only namespace owners are allowed to publish data
- Data provenance is built into the data packet

Each publish message is a Data packet with a content name, a signature, and a payload listing names to add (+) or remove (-).

Valid publish message:
  Content Name: /PublisherA/publish
  Signature: Publisher A’s signature
  Data payload:
    - /PublisherA/publish/file/1
    - /PublisherA/publish/file/2
    + /PublisherA/publish/file/3
    + /PublisherA/publish/file/4

Invalid publish message (Publisher A’s message lists a name under Publisher B’s namespace):
  Content Name: /PublisherA/publish
  Signature: Publisher A’s signature
  Data payload:
    - /PublisherB/publish/file
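The catalog-side rule can be sketched as a namespace check over the payload. This assumes the packet's signature was already verified and mapped to the signer's namespace (the `signer` field, here /PublisherA); `PublishMessage` and `validate` are hypothetical names, and the "+ name" / "- name" line format is an assumed encoding of the add/remove entries on the slide.

```python
from dataclasses import dataclass


@dataclass
class PublishMessage:
    content_name: str  # e.g. "/PublisherA/publish"
    signer: str        # namespace of the (already verified) signing key
    payload: list      # entries such as "+ /PublisherA/publish/file/3"


def under(name, prefix):
    """True if `name` equals `prefix` or lies below it, component-wise."""
    return name == prefix or name.startswith(prefix.rstrip("/") + "/")


def validate(msg):
    """Return (added, removed) if the signer owns every name; raise otherwise."""
    if not under(msg.content_name, msg.signer):
        raise PermissionError(f"{msg.signer} cannot publish as {msg.content_name}")
    added, removed = [], []
    for entry in msg.payload:
        op, _, name = entry.partition(" ")
        if op not in ("+", "-") or not name:
            raise ValueError(f"malformed entry: {entry!r}")
        if not under(name, msg.signer):
            # e.g. Publisher A trying to remove a name in Publisher B's namespace
            raise PermissionError(f"{name} is outside {msg.signer}")
        (added if op == "+" else removed).append(name)
    return added, removed


valid = PublishMessage("/PublisherA/publish", "/PublisherA", [
    "- /PublisherA/publish/file/1", "- /PublisherA/publish/file/2",
    "+ /PublisherA/publish/file/3", "+ /PublisherA/publish/file/4"])
invalid = PublishMessage("/PublisherA/publish", "/PublisherA",
                         ["- /PublisherB/publish/file"])
print(validate(valid))  # validate(invalid) raises PermissionError
```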
Name Discovery
1) The catalog listens on /<catalog-prefix>/query
2) The consumer queries with parameters (model=cmip5 AND frequency=6hr)
3) The catalog queries its local DB, packetizes the results under /<catalog-prefix>/query-results/<params>, and sends an ACK
4) The consumer fetches the query results (name list)
5) The consumer fetches the desired dataset(s) or re-queries
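Step 3 amounts to filtering the local name table and splitting the hits into named packets. The sketch assumes each catalog record is a dict of metadata plus a `name`, a hypothetical catalog prefix /catalog, a tiny `SEGMENT_SIZE`, JSON payloads, and a made-up canonical encoding of <params> (sorted `key=value` pairs joined by `&`); the sample records are invented, and none of this is taken from the ndn-atmos code.

```python
import json

CATALOG_PREFIX = "/catalog"  # hypothetical <catalog-prefix>
SEGMENT_SIZE = 2             # names per result packet, tiny for illustration


def parse_query(text):
    """'model=cmip5 AND frequency=6hr' -> {'model': 'cmip5', 'frequency': '6hr'}."""
    terms = {}
    for term in text.split(" AND "):
        key, sep, value = term.strip().partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {term!r}")
        terms[key] = value
    return terms


def run_query(records, terms):
    """Return the sorted names of records whose metadata matches every term."""
    return sorted(r["name"] for r in records
                  if all(r.get(k) == v for k, v in terms.items()))


def packetize(names, terms):
    """Split names into JSON segments under <prefix>/query-results/<params>/<seg>."""
    params = "&".join(f"{k}={v}" for k, v in sorted(terms.items()))
    base = f"{CATALOG_PREFIX}/query-results/{params}"
    return {f"{base}/{i}": json.dumps(names[start:start + SEGMENT_SIZE])
            for i, start in enumerate(range(0, len(names), SEGMENT_SIZE))}


RECORDS = [  # invented sample records
    {"name": "/CMIP5/output1/VA/6hr/2016", "model": "cmip5", "frequency": "6hr"},
    {"name": "/CMIP5/output1/VA/6hr/2015", "model": "cmip5", "frequency": "6hr"},
    {"name": "/CMIP5/output/BCC/6hr/1998", "model": "cmip5", "frequency": "6hr"},
    {"name": "/CMIP5/output/BCC/day/1998", "model": "cmip5", "frequency": "day"},
]

terms = parse_query("model=cmip5 AND frequency=6hr")
for name, payload in packetize(run_query(RECORDS, terms), terms).items():
    print(name, payload)
```

Sorting the keys makes the result name independent of the order in which the user wrote the query terms.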
Data Publication
Catalog:
- Accepts publish requests on /<catalog-prefix>/publish
- Authenticates and retrieves data names from the publisher
- Syncs names with other catalogs
Publisher:
- Generates NDN names for datasets/services
- Informs the catalog of names to add/remove
[Figure: publisher/catalog exchange: request publish, fetch published name list, validate data names against the trust model, share names with other catalogs.]
Name Discovery
Catalog:
- Accepts queries on /<catalog-prefix>/query
- Queries its local DB
- Packetizes the returned names under /<catalog-prefix>/query-results/<params>
User:
- Queries the catalog for names with specified components, e.g., model=cmip5 AND frequency=6hr
- Fetches the generated name list
- Fetches the desired dataset(s) or re-queries
[Figure: consumer/catalog exchange: query with parameters; the catalog queries its local DB, packetizes the results, and ACKs; the consumer fetches the query results, with data fetched using standard NDN.]
Name Discovery Optimization
Catalog:
- Accepts queries on /<catalog-prefix>/queryParams
- Queries its local DB
- Packetizes the returned names under /<catalog-prefix>/queryParams/seg#
- In case of failure, queries get redirected to another catalog
Consumers:
- Can query any catalog instance
- Can transparently fail over to another catalog
This design:
- Avoids maintaining state between user and catalog
- Enables graceful failover
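The optimization works because the result name depends only on the query, not on which catalog answers or on any session. A minimal sketch, assuming a hypothetical shared prefix /catalog, `key=value` parameters joined by `&` as the <queryParams> component, identical synced records on every instance, and an empty segment marking the end of the results (NDN Data packets can instead carry a FinalBlockId). `express_interest` is a stand-in for the network's failover, not an NDN library call.

```python
import json

PREFIX = "/catalog"  # hypothetical prefix shared by all catalog instances
SEGMENT_SIZE = 2     # names per segment, tiny for illustration


class CatalogInstance:
    """One catalog replica holding the same synced name records as its peers."""

    def __init__(self, records):
        self.records = records
        self.online = True

    def answer(self, interest_name):
        """Answer PREFIX/<queryParams>/<seg#> without any per-consumer state."""
        if not self.online:
            return None  # a failed instance simply never answers
        params, seg = interest_name[len(PREFIX) + 1:].rsplit("/", 1)
        terms = dict(term.split("=", 1) for term in params.split("&"))
        hits = sorted(r["name"] for r in self.records
                      if all(r.get(k) == v for k, v in terms.items()))
        start = int(seg) * SEGMENT_SIZE
        return json.dumps(hits[start:start + SEGMENT_SIZE])


def express_interest(name, instances):
    """Stand-in for the network: try instances in order, failing over on silence."""
    for instance in instances:
        data = instance.answer(name)
        if data is not None:
            return data
    raise TimeoutError(name)


def fetch_all(params, instances):
    """Fetch segments 0, 1, 2, ... until an empty segment marks the end."""
    names, seg = [], 0
    while True:
        chunk = json.loads(express_interest(f"{PREFIX}/{params}/{seg}", instances))
        if not chunk:
            return names
        names.extend(chunk)
        seg += 1


records = [
    {"name": "/CMIP5/output1/VA/6hr/2016", "model": "cmip5", "frequency": "6hr"},
    {"name": "/CMIP5/output1/VA/6hr/2015", "model": "cmip5", "frequency": "6hr"},
    {"name": "/CMIP5/output/BCC/6hr/1998", "model": "cmip5", "frequency": "6hr"},
    {"name": "/CMIP5/output/BCC/day/1998", "model": "cmip5", "frequency": "day"},
]
a, b = CatalogInstance(records), CatalogInstance(records)
params = "model=cmip5&frequency=6hr"
first = json.loads(express_interest(f"{PREFIX}/{params}/0", [a, b]))   # answered by a
a.online = False                                                       # a fails mid-query
second = json.loads(express_interest(f"{PREFIX}/{params}/1", [a, b]))  # b answers seg 1
print(first + second)
```

Because instance b recomputes segment 1 from the same synced records, the consumer never notices that segment 0 came from a different catalog.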
Simplified xrootd Under NDN
- NDN integrates discovery, failover, retrieval …
- Provides a better abstraction to the applications
[Figure: a client asks the NDN network for /my/file ("? /my/file"), which reaches data servers A, B, and C (each running cmsd and xrootd) with no separate manager.]
Name Discovery Challenges
- Users may need to discover content/services without knowing the full NDN name prefix structure
- NDN names are contiguous prefixes
  - Users may only know a few disjoint name components (e.g., frequency=6hr)
  - But they cannot use wildcards for name discovery
Example: the user wants /CMIP5/output1/VA/6hr/2016
- The consumer sends an Interest for /CMIP5 and gets back /CMIP5/output/BCC/6hr/1998
- It then asks for /CMIP5/output/BCC/6hr (excluding 1998), and so on . . .
- This may take too many requests to find the desired data or service
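To make "too many requests" concrete, here is a toy model of exclude-based discovery: a producer answers an Interest for a prefix with the first name (in sorted order) whose next component is not excluded, and a consumer that only knows frequency=6hr must enumerate children level by level. The answer-selection rule, the four sample names, and the assumption that frequency is the fourth component are all made up for illustration; the Interest count applies only to this toy tree.

```python
def components(name):
    """'/CMIP5/output/BCC' -> ('CMIP5', 'output', 'BCC')."""
    return tuple(c for c in name.split("/") if c)


def interest(names, prefix, exclude):
    """Toy producer: first name under `prefix` whose next component is not excluded."""
    n = len(prefix)
    for name in sorted(names):
        if name[:n] == prefix and len(name) > n and name[n] not in exclude:
            return name
    return None  # no matching Data: the Interest goes unanswered


def list_children(names, prefix):
    """Learn every child of `prefix`: one Interest per child plus a final miss."""
    children, sent = [], 0
    while True:
        sent += 1
        data = interest(names, prefix, set(children))
        if data is None:
            return children, sent
        children.append(data[len(prefix)])


def find_by_component(name_strings, position, value):
    """Walk the tree, pruning at `position`, and count the Interests sent."""
    names = [components(s) for s in name_strings]
    depth = len(names[0])  # assumes every name has the same number of components
    frontier, matches, sent = [()], [], 0
    while frontier:
        prefix = frontier.pop()
        children, cost = list_children(names, prefix)
        sent += cost
        for child in children:
            if len(prefix) == position and child != value:
                continue  # wrong value at the known position: skip this branch
            child_prefix = prefix + (child,)
            if len(child_prefix) == depth:
                matches.append("/" + "/".join(child_prefix))
            else:
                frontier.append(child_prefix)
    return sorted(matches), sent


NAMES = ["/CMIP5/output/BCC/6hr/1998", "/CMIP5/output/BCC/day/1998",
         "/CMIP5/output1/VA/6hr/2016", "/CMIP5/output1/VA/mon/2016"]
print(find_by_component(NAMES, 3, "6hr"))
# (['/CMIP5/output/BCC/6hr/1998', '/CMIP5/output1/VA/6hr/2016'], 19)
```

Even for four names this walk costs 19 Interests in the model, and the count grows with every branch the consumer cannot prune; a catalog query replaces the walk with a single query followed by fetching the result segments.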
NDN Support for Big Science
NDN names separate data from hosts
- Discovery: names directly translate to network queries
- Failover: the network can get verifiable data from anywhere
- Retrieval: data can be fetched from optimal source(s)
Investigate the use of NDN as a platform for scientific data applications:
- Understand data management challenges of various scientific domains
- Develop prototype applications to leverage NDN's built-in features
- Use these applications as case studies to drive NDN research
Summary
- NDN improves scientific data management at scale
  - Apps benefit from transparent multipath, automatic failover, etc.
  - Built-in security provides publisher provenance
- Names are the common building block for content and services
  - Names are flexible: they can refer to static content or dynamic services
- The catalog supports efficient publication and non-contiguous name discovery
  - Users can discover content and services with minimal a priori knowledge
  - The catalog validates publication requests for authorization
Managing Scientific Data with NDN
Science testbed
- 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
- Nodes strategically located near scientific data (climate + HEP)
- CC-NIE NSF award
Distributed, synchronized catalog of names and services
- Common functionality: publishing, discovery, access control, etc.
- Search and retrieval UI
- Platform for further research and experimentation
Research questions:
- Namespace construction, distributed publishing, key management, UI design, failover, etc.
- Functional services such as subsetting
- Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)
Managing Scientific Data with NDN
Science testbed
- 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
- CMIP5 and HEP data
- CC-NIE NSF award
Name-based Internet architecture
- Name the data, not the host
- All data digitally signed
- Unifies and pushes common functionality to the network: publishing, discovery, access control, etc.
Data-intensive applications
- Automatic pervasive in-network caching, parallel retrieval, automatic failover, and more
- Simpler alternative middleware implementation, e.g., ESGF, xrootd