

SLIDE 1

Data on OSG

Frank Würthwein
 OSG Executive Director
 Professor of Physics
UCSD/SDSC

October 3rd, 2017


SLIDE 2

The purpose of this presentation is to give the Council a summary of where we stand in supporting data on OSG.

SLIDE 3

High Level Messages

  • First line of defense is HTCondor file transfer (see the sketch below).
  • If that’s not sufficient for the needed scale:
    • Pull/put data to/from the job via GridFTP and/or xrdcp.
    • We offer data hosting in some cases.
    • Use caching if the same input data is reused often.
  • We can support “reasonable” privacy of data, but not HIPAA or FISMA.
  • If data movement needs to be managed across multiple global locations, independent of jobs:
    • We helped Xenon1T adopt Rucio, the ATLAS data management solution.
    • We expect to help others evaluate this as a possible solution in the future.
    • First potential customers for an evaluation are LSST and LIGO.
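A minimal sketch of that first line of defense, using the htcondor Python bindings. The executable and file names are hypothetical, and the submit call assumes recent bindings:

    import htcondor

    # Describe a job whose input HTCondor ships to the worker node and
    # whose output HTCondor ships back to the submit host on exit.
    sub = htcondor.Submit({
        "executable": "run_analysis.sh",        # hypothetical payload
        "transfer_input_files": "input.dat",    # pushed to the worker node
        "transfer_output_files": "result.dat",  # pulled back on job exit
        "should_transfer_files": "YES",
        "when_to_transfer_output": "ON_EXIT",
        "output": "job.out",
        "error": "job.err",
        "log": "job.log",
    })

    # Submit to the local schedd; older bindings used Submit.queue()
    # inside a schedd transaction instead.
    htcondor.Schedd().submit(sub, count=1)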

The services provided by OSG in support of data on OSG vary, depending on the scale of the needs of the communities we serve.


SLIDE 4

Benchmarking HTCondor File Transfer

Initiated by GlueX and Jefferson Lab, who wanted to know whether a single submit host at JLab could support GlueX operations needs. The concern was primarily the I/O in and out of the system. OSG first ran the test on our own system, then provided instructions for deployment at JLab, then repeated the test on their system and helped debug until the expected performance was achieved.

SLIDE 5

GlueX Requirements


    Parameter      GlueX Spec   OSG Test
    Running jobs   20,000       4,000
    Output size    10-100 MB    250 MB
    Input size     1-10 MB      1-10 MB
    Job runtime    8 h - 9 h    0.5 h

rate(nJobs, size, length) = nJobs × size / length = 20,000 × 90 MB / (9 × 3600 s) ≈ 55.5 MB/s

The GlueX specs translate into 55.5 MB/s and a ~1 Hz transaction rate.

We tested 10x larger I/O and 3x more transactions per second.
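A quick arithmetic check of both numbers, with values taken from the table above:

    # Back-of-the-envelope check of the GlueX requirement.
    n_jobs   = 20_000       # concurrently running jobs
    size_mb  = 90           # ~90 MB moved per job (upper end of the spec)
    length_s = 9 * 3600     # ~9 h job runtime, in seconds

    rate_mb_s = n_jobs * size_mb / length_s   # sustained data rate
    txn_hz    = n_jobs / length_s             # one transfer per job completion

    print(f"{rate_mb_s:.1f} MB/s, {txn_hz:.2f} Hz")
    # -> 55.6 MB/s, 0.62 Hz  (the ~55.5 MB/s and ~1 Hz quoted above)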

SLIDE 6

Benchmarking Result

  • Smooth operations at the scale tested.
  • Lessons learned:
    • Avoid significantly exceeding half of the 10 Gbps network bandwidth on the submit host.
    • Be careful with TCP/IP settings to avoid latencies in schedd communication with far-away worker nodes.

[Plot with the 10 Gbit interface limit marked.]

SLIDE 7

Put and Get at 100 Gbps

OSG offers installation instructions for deploying a cluster of GridFTP or xrdcp hosts, each of which is 10 Gbps connected, and seen by the clients as a single service, using Linux Virtual Server. This is the OSG strategy pursued for replacing SRM. It’s also what LIGO used for its first gravitational-wave detection work on OSG.
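From the client side, such a cluster looks like one endpoint. A minimal sketch, assuming a hypothetical LVS virtual hostname and made-up paths:

    import subprocess

    # The LVS virtual address; connections are spread across the
    # 10 Gbps-connected transfer hosts behind it. Hostname is made up.
    ENDPOINT = "root://transfer.example.opensciencegrid.org"

    def put(local_path, remote_path):
        """Stage a file out from the job to the load-balanced service."""
        subprocess.run(["xrdcp", local_path, ENDPOINT + "/" + remote_path], check=True)

    def get(remote_path, local_path):
        """Pull an input file from the service to the worker node."""
        subprocess.run(["xrdcp", ENDPOINT + "/" + remote_path, local_path], check=True)

    put("result.dat", "/store/user/alice/result.dat")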

SLIDE 8

Aside on Reducing Complexity

  • We are working to reduce the complexity we support for the LHC in order to sustain it with less effort in the future.
  • E.g. SRM:
    • In OSG 3.2 there were 4 SRM clients.
    • In OSG 3.4 there are none.
  • E.g. X509:
    • We are working on eliminating the need for X509 from OSG.
    • More on that later.


SLIDE 9

Caching via StashCache

  • OSG now operates its own Data Federation.
  • We support federations inside ours that have privacy from each other.
  • We support people building their own.
  • The advantage of living inside OSG is that you have access to the deployed StashCache infrastructure.
  • If you roll your own, you are on your own.


SLIDE 10

OSG Data Federation

[Diagram: the OSG Data Federation. Groups of XRootD data servers sit behind Xrootd Origin A, Xrootd Origin B, and several Xrootd regional caches, all tied together by the Xrootd OSG Redirector.]

One data origin per community; multiple caches across the US.

Applications connect to a regional cache transparently. The regional cache asks the redirector for the location of the file, the redirector redirects it to the relevant origin, and the file gets cached in the regional cache.


This is a technology transfer from LHC with some OSG value added.


Caches at: BNL, FZU, UNL, Syracuse, UChicago, UCSD/SDSC, UIUC.
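From a job’s point of view, all of that redirection is hidden behind a single copy command. A minimal sketch using stashcp, the client whose dashboards appear on a later slide; the path is hypothetical:

    import subprocess

    # stashcp resolves the file via the redirector and pulls it through
    # the nearest StashCache instance; the first access populates the
    # cache, and later reads by nearby jobs are served from the cache.
    subprocess.run(
        ["stashcp", "/osgconnect/public/alice/dataset/part-000.root", "part-000.root"],
        check=True,
    )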

SLIDE 11

Communities using StashCache

  • OSG-Connect
    • See next slide for examples.
  • LIGO
  • NOvA
  • And some expressions of interest:
    • Xenon1T expects future use in front of Comet@SDSC, and potentially elsewhere.
    • GlueX: initial interest, not yet concrete.


SLIDE 12

Big Data beyond Big Science

OSG caching infrastructure used at up to ~10 TB/hour for meta- or exo-genetics.


SLIDE 13

StashCP Dashboard Info Last 3 Months


Dashboards are hosted at a Kibana instance at MWT2.

SLIDE 14

StashCache Instances View 10/1 0:00 to 10/2 19:00


Details on data in/out, connections, errors, timeouts, retries, … for each cache are monitored.

SLIDE 15

Rucio and its use in Xenon1T

  • Xenon1T needed something to manage its transfers between the experiment DAQ in Italy and various disk locations in the EU, Israel, and the US.

[Diagram: Xenon1T data flow (now). Raw data from the Xenon1T DAQ lands on the xe1t-datamanager buffer at LNGS (/data/xenon/raw/, size 5 TB), which serves as the DAQ upload buffer and the staging point for simultaneous Rucio transfers and tape upload. A Rucio server in Chicago manages replicas across the Rucio Storage Elements: NIKHEF, Amsterdam (2 TB); IN2P3, Lyon (2 TB); Stash/Login, Chicago (3 TB); Midway/RCC, Chicago (92 TB); tape backup, Stockholm (5.6 PB); and Weizmann, Israel (8 TB, not yet used). Processed data is produced by cax and analyzed by hax; ruciax handles upload and download.]

  • Xenon1T adopted Rucio for this after a joint evaluation with OSG (a sketch of the usage pattern follows below).
  • Since then, LSST and LIGO have expressed an interest in a similar evaluation.
  • Next steps:
    • A two-pager to define metrics for the evaluation project with LSST.
    • An OSG Blueprint to better understand the technical concept underlying Rucio.
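The usage pattern, sketched with the Rucio command-line client; the scope, file, and storage-element names are made up for illustration:

    import subprocess

    # Register a raw-data file with Rucio at one storage element...
    subprocess.run(
        ["rucio", "upload", "--rse", "UC_OSG_STASH", "--scope", "xe1t_raw",
         "run_012345.zip"],
        check=True,
    )
    # ...then declare "keep one replica matching this RSE expression".
    # Rucio schedules and verifies the transfer itself, independent of jobs.
    subprocess.run(
        ["rucio", "add-rule", "xe1t_raw:run_012345.zip", "1", "NIKHEF_USERDISK"],
        check=True,
    )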

SLIDE 16

A Future without X509

We have already eliminated X509 for user job submission in OSG. The two remaining use cases are:

  • Pilots being authenticated at CEs.
  • Users staging out data to Storage Endpoints from jobs.

SLIDE 17

Problem Statement

  • At present, Storage Endpoints authenticate users:
    • An X509 certificate is delegated to the job so that the job can stage out data to a Storage Endpoint from the worker node.
  • In the future, we want Storage Tokens that define capability rather than personhood (see the sketch below):
    • “You are allowed to store data in your directory at OSG-VO’s storage endpoint(s).”
  • We are working with the NSF-funded SciTokens project to accomplish this:
    • https://scitokens.org
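What “capability rather than personhood” might look like in practice: a sketch of the JWT claims such a token could carry. The values are illustrative only; the actual claim language is defined by the SciTokens project.

    # Hypothetical claims of a SciToken-style capability token. The token
    # asserts what its bearer may do, not who the bearer is.
    claims = {
        "iss": "https://scitokens.org/osg-connect",    # trusted issuer (made-up URL)
        "scope": "read:/user/alice write:/user/alice", # capabilities granted
        "exp": 1507000000,                             # short-lived expiry (unix time)
    }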


SLIDE 18

Initial “Demo”

  • The initial demo showed:
    • The OSG-Connect HTCondor submit host transparently generates a SciToken.
    • Users are oblivious to the existence of such tokens.
    • User jobs put files from the worker node into a user-owned directory at the Stash endpoint using the HTTPS protocol (see the sketch below).
    • Stash is the Origin of StashCache, implemented as an XRootD server => this implies that data staged out can be used for subsequent processing via StashCache.
  • Based on the OAuth2 framework:
    • The same as when you authorize a 3rd-party website to use your Facebook/Google/DropBox login.
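A minimal sketch of that stage-out step, assuming a hypothetical endpoint URL and a token file placed on the worker node by the submit host:

    import requests

    # The SciToken travels with the job; the submit host generated it,
    # and the user never saw it. File name and URL are made up.
    token = open("scitoken.jwt").read().strip()

    with open("result.dat", "rb") as f:
        resp = requests.put(
            "https://stash.osgconnect.net/user/alice/result.dat",
            data=f,
            headers={"Authorization": "Bearer " + token},
        )
    resp.raise_for_status()  # the endpoint authorizes the write from the token's scope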


SLIDE 19

Status of SciTokens

  • The initial hackathon led to the initial demo, and thus an understanding of the viability of the basic concept.
  • A draft design write-up for technical-director evaluation and broader discussion exists.
    • It needs a bit more work before it’s ready for sharing.


SLIDE 20

Summary & Conclusion

  • OSG made a lot of progress in supporting data on OSG broadly, for anybody.
  • We build on technologies that have broad community support and/or are NSF-funded projects.
  • We reduced, and will continue to reduce, the complexity of the software stack required to use data on OSG.
  • There are some i’s to dot and t’s to cross, but decent functionality now exists, and the geek gap between Big Science and the rest of scientific endeavors has shrunk, and continues shrinking.
