MANAGING SCIENTIFIC DATA WITH NDN Chengyu Fan, Susmit Shannigrahi, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

MANAGING SCIENTIFIC DATA WITH NDN

Chengyu Fan, Susmit Shannigrahi, Steve DiBenedetto, Catherine Olschanowsky, Christos Papadopoulos

NDNcomm 2015, Sept 28, 2015, Los Angeles, CA
Supported by NSF #13410999 and NSF #1345236

slide-2
SLIDE 2

Introduction

• Scientific data is often very large and complex
  • Climate - CMIP5: 3.5 PB, CMIP6: 350 PB to 3 EB
  • Physics - ATLAS: 4 PB/year
  • Astronomy, bioinformatics, others…
• Science infrastructure
  • Cutting-edge hardware but often incompatible domain software (ESGF, xrootd, etc.)
  • Complexity, replication, redundancy

slide-3
SLIDE 3

Our Project

• Build and deploy software to evaluate NDN in scientific applications over a dedicated hardware infrastructure
• Evaluate NDN in the context of:
  • Application services: publishing, discovery, retrieval, access control, load balancing, failover, caching, etc.
  • Network integration (OSCARS, SDN, etc.)
• Metrics
  • Performance, reduced complexity, ease of deployment, interoperability, reuse, efficiency, routing, security/trust, etc.

slide-4
SLIDE 4

NDN Layer Structure

[Figure, built up across slides 4-10: NDN layer structure for hosts and routers. Labels: host, router, APP, NDN, UDP/IP, ETH, Other, LINK.]

slide-11
SLIDE 11

Methodology

• Investigate the use of NDN as a common platform for scientific data applications by:
  • Understanding data management challenges of various scientific domains
  • Developing and evaluating prototype applications that leverage NDN's features
  • Using prototypes to further drive NDN research

slide-12
SLIDE 12

First Step – Build a Catalog

• Create a shared resource – a distributed, synchronized catalog of names over NDN
  • Provide common operations such as publishing, discovery, access control
  • Catalog only deals with name management, not dataset retrieval
• Platform for further research and experimentation
• Research questions:
  • Namespace construction, distributed publishing, key management, UI design, failover, etc.
  • Functional services such as subsetting
  • Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)

slide-13
SLIDE 13

Overview of Catalog Workflow

[Figure, built up across slides 13-22: a publisher, a consumer, two data storage sites, and catalog nodes 1-3, all attached to the NDN network. The steps below are highlighted one at a time.]

(1) Publish dataset names
(2) Sync changes
(3) Query for dataset names
(4) Retrieve data
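
The four workflow steps can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `CatalogNode`, `make_cluster`, and the `storage` dictionary are hypothetical stand-ins, sync is modeled as a direct push to peers, and no real NDN library or network is involved.

```python
class CatalogNode:
    """Hypothetical in-memory stand-in for one catalog node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.names = set()   # dataset names known to this node
        self.peers = []      # other catalog nodes to sync with

    def publish(self, dataset_names):
        # (1) A publisher hands dataset names to one catalog node.
        new_names = set(dataset_names) - self.names
        self.names |= new_names
        # (2) The node pushes the change to its peers (simplified sync).
        for peer in self.peers:
            peer.names |= new_names

    def query(self, component):
        # (3) Any node can answer a query for names containing a component.
        return sorted(
            n for n in self.names if component in n.strip("/").split("/")
        )


def make_cluster(count):
    """Create `count` catalog nodes that all sync with each other."""
    nodes = [CatalogNode(i) for i in range(1, count + 1)]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    return nodes


# Data storage holds the content; the catalog only holds names.
storage = {
    "/CMIP5/output/BCC/6hr/1998": b"6hr-data",
    "/CMIP5/output/BCC/3hr/1998": b"3hr-data",
}

nodes = make_cluster(3)
nodes[0].publish(storage)        # publish via catalog node 1
hits = nodes[2].query("6hr")     # query via catalog node 3
data = storage[hits[0]]          # (4) retrieve data by name from storage
```

The point of the sketch is the separation of roles: the catalog answers name queries from any node, while retrieval goes to data storage by name.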

slide-23
SLIDE 23

NDN-Science Testbed

• NSF CC-NIE campus infrastructure award
• 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
• Currently ~50 TB of CMIP5 data and ~70 TB of HEP data

slide-24
SLIDE 24

Demos

• Search
• Publication and Sync
• Access control
• Retrieval and failover

slide-25
SLIDE 25

Conclusions

• IP encourages common host access methods, not common data access methods
  • Does not encourage interoperability at the application level
• NDN has the potential to unify the service interface required by scientific applications
• Science testbed and prototypes to test hypotheses and drive research and experimentation
• Ready-to-try catalog; we invite you to try it with your data
  • Catalog is general, supports a variety of applications
  • Currently CMIP5 and HEP applications
  • UI for data search and retrieval

slide-26
SLIDE 26

Our sponsors: NSF and ESnet

Join us @ http://www.netsec.colostate.edu/mailman/listinfo/ndn-sci

slide-27
SLIDE 27

Backup Slides


slide-28
SLIDE 28

Current Example: xrootd

[Figure, built up across slides 28-31: a client, a manager (a.k.a. redirector), and data servers A, B, and C, each node running cmsd and xrootd. Two copies of /my/file are shown. Visible step label: "4: Try open() at A".]

• Fragile, fairly complex middleware

slide-32
SLIDE 32

xrootd under NDN

• Significantly reduced system complexity
• Better service abstraction

[Figure, built up across slides 32-36: a client and data servers A, B, and C (each running cmsd and xrootd) attached to an NDN network. Two copies of /my/file are shown. Request label: "? /my/file".]

slide-37
SLIDE 37

Data Publication

[Figure, built up across slides 37-42: message exchange between Publisher and Catalog, following the numbered steps below.]

1) Catalog: listening on /<catalog-prefix>/publish
2) Publisher: generate NDN names for datasets/services
3) Publisher: request publish
4) Catalog: fetch published name list
5) Catalog: authenticate the Data and validate data name against trust model
6) Catalog: share names with other catalogs

slide-43
SLIDE 43

Keys for ndn-atmos

[Figure, built up across slides 43-44: key hierarchy, with an arrow labeled "signs" between levels.]

• Self-signed root key: /cmip5/KEY
• Site's keys: /cmip5/lbl/KEY, /cmip5/nwsc/KEY, …
• Application's keys:
  • /cmip5/lbl/<DataPublisher>/KEY (dataset names publishing)
  • /cmip5/nwsc/<operator>/KEY and /cmip5/nwsc/<router>/KEY (NLSR)
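
A key hierarchy like this is typically checked by walking the signing chain from a key back to the self-signed root. The sketch below only illustrates that walk. The `SIGNED_BY` table is an assumption: the slide shows the key names and a "signs" arrow, but not every signing relation. The concrete names `publisher1`, `operator1`, and `router1` are hypothetical, and actual signature verification is left out.

```python
# Assumed mapping: key name -> name of the key that signed it.
# The operator-signs-router relation is an assumption, not taken from the slide.
SIGNED_BY = {
    "/cmip5/KEY": "/cmip5/KEY",                        # self-signed root
    "/cmip5/lbl/KEY": "/cmip5/KEY",                    # site key
    "/cmip5/nwsc/KEY": "/cmip5/KEY",                   # site key
    "/cmip5/lbl/publisher1/KEY": "/cmip5/lbl/KEY",     # dataset name publishing
    "/cmip5/nwsc/operator1/KEY": "/cmip5/nwsc/KEY",    # NLSR operator
    "/cmip5/nwsc/router1/KEY": "/cmip5/nwsc/operator1/KEY",  # NLSR router
}


def chain_to_root(key_name, signed_by=SIGNED_BY, max_depth=8):
    """Return the list of key names from key_name up to the self-signed root.

    Raises KeyError for an unknown key and ValueError for a loop or a
    chain longer than max_depth.
    """
    chain = [key_name]
    while True:
        signer = signed_by[chain[-1]]   # KeyError if the key is unknown
        if signer == chain[-1]:
            return chain                # reached the self-signed root
        if signer in chain or len(chain) >= max_depth:
            raise ValueError("signing chain loops or is too long")
        chain.append(signer)


router_chain = chain_to_root("/cmip5/nwsc/router1/KEY")
```

A real validator would also verify each signature along the chain; here only the naming structure is modeled.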

slide-45
SLIDE 45

Trust Model

• Only namespace owners are allowed to publish data
• Data provenance built into the data packet

[Figure, built up across slides 45-48: two publish messages, each shown as a Data packet with Name, Content (data payload), and Signature fields.]

Valid publish message
  Name: /PublisherA/publish
  Content:
    • /PublisherA/publish/file/1
    • /PublisherA/publish/file/2
    + /PublisherA/publish/file/3
    + /PublisherA/publish/file/4
  Signature: Publisher A's signature

Invalid publish message
  Name: /PublisherA/publish
  Content:
    • /PublisherB/publish/file
  Signature: Publisher A's signature
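
The namespace-ownership rule can be sketched as a component-wise prefix check. This is a minimal illustration, assuming the signature has already been verified and that `signer_prefix` (a hypothetical parameter) is the namespace the signing key is authorized for; it is not the catalog's actual validator.

```python
def components(name):
    """Split an NDN-style name such as '/a/b/c' into its components."""
    return [c for c in name.split("/") if c]


def is_under(prefix, name):
    """True if `name` falls under `prefix`, compared component by component."""
    p, n = components(prefix), components(name)
    return n[:len(p)] == p


def validate_publish(message_name, signer_prefix, listed_names):
    """Accept a publish message only if the message name and every listed
    name belong to the signer's namespace."""
    if not is_under(signer_prefix, message_name):
        return False
    return all(is_under(signer_prefix, n) for n in listed_names)


valid = validate_publish(
    "/PublisherA/publish",
    "/PublisherA",
    [
        "/PublisherA/publish/file/1",
        "/PublisherA/publish/file/2",
        "/PublisherA/publish/file/3",
        "/PublisherA/publish/file/4",
    ],
)
invalid = validate_publish(
    "/PublisherA/publish",
    "/PublisherA",
    ["/PublisherB/publish/file"],
)
```

Comparing whole components rather than raw strings matters: a plain string prefix test would wrongly accept `/PublisherAB/...` as part of `/PublisherA`.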

slide-49
SLIDE 49

Name Discovery

[Figure, built up across slides 49-54: message exchange between Consumer and Catalog, following the numbered steps below.]

1) Catalog: listening on /<catalog-prefix>/query
2) Consumer: query with parameters (model=cmip5 AND frequency=6hr)
3) Catalog: query local DB; packetize results under /<catalog-prefix>/query-results/<params>
3) Catalog: ACK
4) Consumer: fetch query results (name list)
5) Consumer: fetch desired dataset(s) or re-query
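
The catalog-side lookup in step 3 can be sketched as matching names on whichever components the consumer knows. The field layout in `FIELDS` is an assumption made for illustration (the slides do not define the name schema), and a Python list stands in for the catalog's local DB.

```python
# Hypothetical positional schema for names like /CMIP5/output/BCC/6hr/1998.
FIELDS = ("model", "product", "institute", "frequency", "year")


def to_record(name):
    """Map a name's components onto the assumed field names."""
    parts = [c for c in name.split("/") if c]
    return dict(zip(FIELDS, parts))


def query(names, **params):
    """Return names whose fields match every given parameter (AND semantics,
    case-insensitive)."""
    wanted = {k: v.lower() for k, v in params.items()}
    return [
        n for n in names
        if all(to_record(n).get(k, "").lower() == v for k, v in wanted.items())
    ]


catalog_names = [
    "/CMIP5/output/BCC/6hr/1998",
    "/CMIP5/output/BCC/3hr/1998",
    "/CMIP5/output1/VA/6hr/2016",
]
matches = query(catalog_names, model="cmip5", frequency="6hr")
```

The query names two non-adjacent components (the first and the fourth), which a plain prefix match could not express.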

slide-55
SLIDE 55

Data Publication

• Catalog
  • Accept publish requests: /<catalog-prefix>/publish
  • Authenticate and retrieve data names from publisher
  • Sync names with other catalogs
• Publisher
  • Generate NDN names for datasets/services
  • Inform catalog of names to add/remove

[Figure, built up across slides 55-59: exchange between Publisher and Catalog. Labels: Request publish; Fetch published name list; Validate data name against trust model; Share names with other catalogs.]

slide-60
SLIDE 60

Name Discovery

• Catalog
  • Accept queries on /<catalog-prefix>/query
  • Query local DB
  • Packetize the returned names under /<catalog-prefix>/query-results/<params>
• User
  • Query catalog for names with specified components
    • e.g., model=cmip5 AND frequency=6hr
  • Fetch generated name list
  • Fetch desired dataset(s) or re-query

[Figure, built up across slides 60-65: exchange between Consumer and Catalog. Labels: Query with parameters; Query local DB; Packetize results; ACK; Fetch query results; Fetch data with standard NDN.]

slide-66
SLIDE 66

Name Discovery Optimization

• Catalog
  • Accept queries on /<catalog-prefix>/queryParams
  • Query local DB
  • Packetize the returned names under /<catalog-prefix>/queryParams/seg#
  • In case of failure, queries get redirected to another catalog
• Consumers
  • Can query any catalog instance
  • Can transparently fail over to another catalog
• Avoids maintaining state between user and catalog
• Enables graceful failover

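
Because the query parameters and the segment number are carried in the name, any catalog instance can answer any segment. The sketch below illustrates that idea under several assumptions: the parameter encoding in `query_name` is invented for illustration, each catalog is modeled as a plain callable that returns `(names, is_last)` or raises `ConnectionError`, and failover is done in the application rather than by the network.

```python
from urllib.parse import quote


def query_name(catalog_prefix, params, segment):
    """Build /<catalog-prefix>/queryParams/seg# with a canonical (sorted)
    parameter encoding, so every catalog derives the same name."""
    encoded = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return f"{catalog_prefix}/{quote(encoded, safe='')}/seg{segment}"


def fetch_all_segments(catalogs, catalog_prefix, params):
    """Fetch every result segment, trying each catalog in turn per segment."""
    results, segment = [], 0
    while True:
        name = query_name(catalog_prefix, params, segment)
        for fetch in catalogs:
            try:
                payload, is_last = fetch(name)
                break
            except ConnectionError:
                continue  # this catalog is down; try the next one
        else:
            raise ConnectionError("no catalog answered " + name)
        results.extend(payload)
        if is_last:
            return results
        segment += 1


# Two fake catalogs holding the same segmented result set.
SEGMENTS = [["/a/1", "/a/2"], ["/a/3"]]


def healthy(name):
    seg = int(name.rsplit("/seg", 1)[1])
    return SEGMENTS[seg], seg == len(SEGMENTS) - 1


def flaky(name):
    # Fails partway through the transfer, after serving segment 0.
    if name.endswith("/seg1"):
        raise ConnectionError("catalog down")
    return healthy(name)


params = {"model": "cmip5", "frequency": "6hr"}
names = fetch_all_segments([flaky, healthy], "/catalog", params)
```

Since no per-consumer session exists on the catalog, the second catalog can pick up at segment 1 without any handover.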
slide-67
SLIDE 67

Simplified xrootd Under NDN

• NDN integrates discovery, failover, retrieval …
• Provides a better abstraction to the applications

[Figure, built up across slides 67-71: a client and data servers A, B, and C (each running cmsd and xrootd) attached to an NDN network. Two copies of /my/file are shown. Request label: "? /my/file".]

slide-72
SLIDE 72

Name Discovery Challenges

• Users may need to discover content/services without knowing the full NDN name prefix structure
  • NDN names are contiguous prefixes
  • Users may only know a few disjoint name components (e.g., frequency=6hr)
  • But cannot use wildcards for name discovery

[Figure, built up across slides 72-76: a consumer probing the NDN network. The user wants /CMIP5/output1/VA/6hr/2016. A request for /CMIP5 returns /CMIP5/output/BCC/6hr/1998; the next request is /CMIP5/output/BCC/6hr (exclude 1998), and so on.]

• May take too many requests to find desired data or service

slide-77
SLIDE 77

NDN Support for Big Science

• NDN names separate data from hosts
  • Discovery: names directly translate to network queries
  • Failover: network can get verifiable data from anywhere
  • Retrieval: data can be fetched from optimal source(s)
• Investigate the use of NDN as a platform for scientific data applications
  • Understand data management challenges of various scientific domains
  • Develop prototype applications to leverage NDN's built-in features
  • Use these applications as case studies to drive NDN research

slide-78
SLIDE 78

Summary

• NDN improves scientific data management at scale
  • Apps benefit from transparent multipath, automatic failover, etc.
  • Built-in security provides publisher provenance
• Names are the common building block for content and services
  • Names are flexible: can refer to static content or dynamic services
• Catalog supports efficient publication, non-contiguous name discovery
  • Users can discover content and services with minimal a priori knowledge
  • Catalog validates publication requests for authorization

slide-79
SLIDE 79

Managing Scientific Data with NDN

• Science testbed
  • 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
  • Nodes strategically located near scientific data (climate + HEP)
  • CC-NIE NSF award
• Distributed, synchronized catalog of names and services
  • Common functionality: publishing, discovery, access control, etc.
  • Search and retrieval UI
  • Platform for further research and experimentation
• Research questions:
  • Namespace construction, distributed publishing, key management, UI design, failover, etc.
  • Functional services such as subsetting
  • Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)

slide-80
SLIDE 80

Managing Scientific Data with NDN

• Science testbed
  • 10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
  • CMIP5 and HEP data
  • CC-NIE NSF award
• Name-based Internet architecture
  • Name the data, not the host
  • All data digitally signed
  • Unifies and pushes common functionality to the network: publishing, discovery, access control, etc.
• Data-intensive applications
  • Automatic pervasive in-network caching, parallel retrieval, automatic failover, and more
  • Simpler alternative middleware implementation, e.g., ESGF, xrootd