dCache NFSv4.1, Tigran Mkrtchyan, Zeuthen, 13.04.12


SLIDE 1

dCache NFSv4.1 | Tigran Mkrtchyan | 4/13/12 | Page 1

dCache NFSv4.1

Tigran Mkrtchyan Zeuthen, 13.04.12

SLIDE 2

Outline

  • NFSv4.1 basics
  • NFSv4.1 concepts
  • PNFS
  • Id mapping
  • Industry standard
  • dCache implementation
SLIDE 3

Classic NFS

  • Big Data never fits into a single server.
  • Big administrative overhead to keep data on multiple servers
  • Single NFS server becomes a bottleneck
SLIDE 4

NFSv4.1

  • Provides access to files and directories
  • Stateful
  • Keeps track of OPEN/CLOSE ( LOCK/UNLOCK )
  • Detects client/server reboot
  • Client-controlled reply cache and EOS (exactly-once semantics)
  • Aware of multihomed servers
  • Detects retransmits
  • Recovery from network disconnect
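
The reply cache and retransmit detection listed above can be illustrated with a small sketch. It assumes a simplified model of NFSv4.1 session slots: each slot remembers the last sequence id and the reply sent for it, so a retransmitted request is answered from the cache instead of being executed twice. The class and method names (`Slot`, `Session`, `call`) are hypothetical and not taken from dCache or any NFS library.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One session slot: last sequence id seen and the reply sent for it."""
    seqid: int = 0
    cached_reply: object = None


@dataclass
class Session:
    """Simplified session with a fixed number of slots (assumption: 4)."""
    slots: list = field(default_factory=lambda: [Slot() for _ in range(4)])
    executed: int = 0  # number of operations really executed, for illustration

    def call(self, slot_id, seqid, operation):
        slot = self.slots[slot_id]
        if seqid == slot.seqid and slot.seqid != 0:
            # Retransmit of the last request on this slot: replay the reply.
            return slot.cached_reply
        if seqid == slot.seqid + 1:
            # New request: execute once, remember the reply.
            self.executed += 1
            slot.cached_reply = operation()
            slot.seqid = seqid
            return slot.cached_reply
        raise ValueError("misordered sequence id")
```

A client that loses a reply and resends the same `(slot_id, seqid)` pair gets the cached answer back, which is the "exactly once" behaviour the slide refers to.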
SLIDES 5-7

Classic NFS

[Diagram: several disks (DISK) behind a single NFS server]

CPU or network becomes a bottleneck with a growing number of clients.

SLIDE 8

Parallel NFS

[Diagram: one NFS MDS (metadata server) and four NFS DS (data servers)]

SLIDE 9

Parallel + Striping

[Diagram: the NFS client issues read(block1-block4); blocks b1 to b4 are stored on four different DSs, so the request becomes Read block1, Read block2, Read block3 and Read block4, one per DS.]
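
As a rough illustration of the striping idea, the sketch below splits one read into per-data-server chunks using plain round-robin placement, as in RAID 0. This is an assumption made for the example, not the exact pNFS file layout algorithm and not dCache behaviour; the function names and the data server labels are hypothetical.

```python
def locate(offset, stripe_unit, data_servers):
    """Map a file byte offset to (data server, stripe index), round-robin."""
    stripe_index = offset // stripe_unit
    return data_servers[stripe_index % len(data_servers)], stripe_index


def split_read(offset, length, stripe_unit, data_servers):
    """Split a read of [offset, offset + length) into (server, offset, size) chunks."""
    chunks = []
    end = offset + length
    while offset < end:
        server, _ = locate(offset, stripe_unit, data_servers)
        # Bytes remaining in the current stripe unit, capped by the request end.
        size = min(stripe_unit - offset % stripe_unit, end - offset)
        chunks.append((server, offset, size))
        offset += size
    return chunks
```

With four data servers and a read covering four stripe units, each server receives exactly one chunk, and the chunks can be fetched in parallel.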

SLIDE 10

Parallel NFS

  • Single Namespace, distributed data
  • Client talks to Meta Data Server for metadata only
  • Bandwidth and performance grow with number of Data Server nodes
  • File striping (like RAID 0)
  • Enforces the same security on DS (pools) as on MDS
SLIDE 11

Security

  • To verify client credentials RPCSEC_GSS is used
  • Krb5 implementation supported by all clients/servers
  • Three types of Quality of Protection (QOP):
  • NONE – auth only. Checksum protection of RPC header
  • INTEGRITY – checksum protection of RPC messages
  • PRIVACY – full encryption of RPC messages
  • Security flavor used on mount is enforced for IO traffic as well

SLIDE 12

Security

[Diagram: an NFS client mounted with 'sec=krb5i', connected to an NFS server (MDS) and an NFS server (DS)]

The client will always use the same security flavor and Quality of Protection for all RPC traffic to MDS and DS.
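
The relation between the mount option and the Quality of Protection can be sketched as a lookup table. The mapping below (krb5 to NONE, krb5i to INTEGRITY, krb5p to PRIVACY) follows the usual meaning of these flavors; the helper names are hypothetical and the parsing is a simplification of real mount option handling.

```python
from enum import Enum


class QOP(Enum):
    NONE = "authentication only, checksum on RPC header"
    INTEGRITY = "checksum protection of RPC messages"
    PRIVACY = "full encryption of RPC messages"


SEC_FLAVOR_TO_QOP = {
    "krb5": QOP.NONE,
    "krb5i": QOP.INTEGRITY,
    "krb5p": QOP.PRIVACY,
}


def qop_for_mount(options):
    """Return the QOP implied by sec= in a comma-separated option string."""
    for opt in options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "sec":
            return SEC_FLAVOR_TO_QOP[value]
    return None  # no Kerberos flavor requested


def qop_for_targets(options, targets):
    """The same QOP applies to every server the client talks to (MDS and DS)."""
    qop = qop_for_mount(options)
    return {target: qop for target in targets}
```

For example, `qop_for_targets("rw,sec=krb5i", ["MDS", "DS"])` assigns INTEGRITY to both the door and the pools, matching the statement that the flavor chosen at mount time covers IO traffic as well.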

SLIDE 13

ID mapping

  • All principals are UTF-8 strings
  • No UID, no GID
  • To resolve conflicts, NFSDOMAIN is used:
  • “tigran@desy.de” vs. “tigran@cern.ch”
  • E.g. talk to corresponding mapping service
  • Mapping is delegated to client and server
  • Client and server may use different mapping services/sources
  • Windows client and server do not need numeric ID
  • Best with LDAP or NIS
  • Linux client can use numeric strings
  • “124” => 124
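
A minimal sketch of the mapping rules above, assuming a plain dictionary stands in for the mapping service (LDAP, NIS or similar). The function name, its parameters and the lookup table are hypothetical; a real idmapper has more cases (groups, nobody fallback).

```python
def principal_to_uid(principal, local_domain, name_to_uid):
    """Resolve an NFSv4 owner string to a local numeric uid, or None if unknown."""
    if principal.isascii() and principal.isdigit():
        # Numeric string form, e.g. "124" => 124
        return int(principal)
    name, sep, domain = principal.partition("@")
    if not sep or domain != local_domain:
        # Missing or foreign domain: cannot be resolved in this NFS domain.
        return None
    return name_to_uid.get(name)
```

With `name_to_uid = {"tigran": 3750}` and `local_domain = "desy.de"`, the string "tigran@desy.de" resolves to 3750 while "tigran@cern.ch" does not, which is the conflict the NFS domain is meant to resolve.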
SLIDE 14

idmapping

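[Diagram: an application with euid 3750 on the NFS client (idmapd), the NFS server, dCache gPlazma, Chimera and LDAP/NIS. The numeric ID 3750 is mapped to the name "tigran", sent as 'owner : tigran', and mapped back to 3750 on the other side.]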

SLIDE 24

Why so complicated?

  • Your favorite OS may not use numeric id
  • MS Windows uses principals
  • Your numeric ID on client and server may be different
  • My ID on laptop 500, NIS 3750
SLIDE 25

Existing servers

  • NETAPP (ONTAP 8.1 cluster mode, running at DESY)
  • dCache
  • Production ready 1.9.12
  • In production since XXX
  • SONAS (IBM), Panasas, EMC, LinuxBox will be ready by 2Q 2013

SLIDE 26

Existing clients

  • Linux
  • 2.6.39 first usable kernel
  • Part of RHEL 6 (SL6), Fedora >= 15, Debian unstable
  • Oracle UEK2 for RHEL5 (sl5) + updates!
  • No AFS and some other kernel modules
  • Windows 7 64 bit
  • Opensource client from CITI
  • VMware ESX (pNFS client in VMware hypervisor)
SLIDE 27

dCache implementation

SLIDE 28

dCache NFS in one slide

[Architecture diagram: Door (MDS), Pool Manager, PnfsManager and Pools (DSs, Data Servers) run in JVMs connected by a message passing layer; a DBMS is also shown.]

SLIDE 29

Implementation Status

  • No file striping yet
  • Supports RPCSEC_GSS (Kerberos 5 only)
  • krb5, krb5i and krb5p
  • Even with Windows AD as KDC
  • Supports IPv6
  • Can be integrated with NIS and/or local passwd file
  • Implemented as gPlazma plugins.
SLIDE 30

Still Missing

  • Extended attribute support
  • Will provide native access (setfattr/getfattr) to checksum, AL (access latency) and RP (retention policy)
  • Only unix permission bits are supported
  • ACL expected by dcache-2.4, set/getfacl works today.
  • No striping
  • “Striping on read” will come soon.
  • No IO through Meta Data Server (door)
  • A problem if a pool crashes or restarts
SLIDE 31

Limitations

  • dCache files are immutable
  • No support of byte range locks
  • No support of multiple writers
SLIDE 32

ACLs with NFSv4.1 in dCache

  • Will be available in dcache 2.4 (or 2.3)
  • Main focus on predictable semantics
  • Unix mode and ACL coexistence
  • NFSv3, FTP and NFS4 coexistence
SLIDE 33

Prerequisites for NFSv4.1

1. Configured gPlazma
2. Kerberos5 keytab with nfs principals for NFS door and all pools
3. Open TCP port 2049 on NFS door
4. Open TCP range 'net.lan.port.min:net.lan.port.max' on pools
  • Default is 33115:33145 (shared with dcap and xrootd)
5. Client with pNFS-aware kernel (SL6.2 and FC16 recommended)
6. Configured rpc.idmapd to match NFS DOMAIN with server
7. Kerberos5 keytab with host principal
8. Start dCache, mount on client and access the data.
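
The port prerequisites (steps 3 and 4) can be checked from a client with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the host names in the usage note are placeholders rather than real servers.

```python
import socket


def parse_port_range(spec):
    """Parse a 'min:max' string into an inclusive range of TCP ports."""
    low, high = (int(part) for part in spec.split(":"))
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {spec}")
    return range(low, high + 1)


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check could then look like `port_is_open("nfs-door.example.org", 2049)` for the door and `any(port_is_open("pool.example.org", p) for p in parse_port_range("33115:33145"))` for a pool. Since a pool uses only a single port from the range, closed ports inside the range are not an error by themselves.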

SLIDE 34

Terminology

Term   Meaning                 dCache equivalent
MDS    Meta Data Server        dCache NFSv4.1 door
DS     Data Server             dCache pool
QOP    Quality Of Protection

SLIDE 35

Typical IO scenario

  • Open
  • Read/write
  • Close
SLIDE 36

OPEN+READ/WRITE+CLOSE

  • Open
  • Open/Create a file
  • Find appropriate pool
  • Start a mover
  • Read/write
  • READ/WRITE
  • Close
  • Release mover
  • Close the file
SLIDE 37

OPEN+READ/WRITE+CLOSE

[Sequence diagram. Participants: App, NFS Door, PnfsManager, PoolManager, Pool. Messages: OPEN, GET STORAGE INFO, SELECT POOL, START MOVER, READ/WRITE, CLOSE, STOP MOVER. Further labels: STORAGE INFO, OPEN ID, GET POOL, POOL, MOVER ID, POOL/MOVER ID.]
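
The message order of the OPEN, READ/WRITE, CLOSE scenario can be mimicked with a toy model. Everything here is a stand-in written for illustration: the `Pool` and `Door` classes, the round-robin pool selection and the log entries are assumptions, not dCache code. In dCache the corresponding steps are messages exchanged between the door, PnfsManager, PoolManager and the pool.

```python
class Pool:
    """Stand-in for a dCache pool (data server)."""

    def __init__(self, name):
        self.name = name
        self.movers = {}       # mover id -> file path
        self._next_mover = 1

    def start_mover(self, path):
        mover_id = self._next_mover
        self._next_mover += 1
        self.movers[mover_id] = path
        return mover_id

    def stop_mover(self, mover_id):
        del self.movers[mover_id]


class Door:
    """Stand-in for the NFS door (MDS); the log records the message order."""

    def __init__(self, pools):
        self.pools = pools
        self.open_files = {}   # open id -> (pool, mover id)
        self._next_open = 1
        self.log = []

    def open(self, path):
        self.log.append("GET STORAGE INFO")   # would be sent to PnfsManager
        self.log.append("SELECT POOL")        # would be sent to PoolManager
        pool = self.pools[(self._next_open - 1) % len(self.pools)]
        self.log.append("START MOVER")        # would be sent to the pool
        mover_id = pool.start_mover(path)
        open_id = self._next_open
        self._next_open += 1
        self.open_files[open_id] = (pool, mover_id)
        return open_id, pool, mover_id

    def close(self, open_id):
        pool, mover_id = self.open_files.pop(open_id)
        self.log.append("STOP MOVER")
        pool.stop_mover(mover_id)
```

Between `open` and `close` the client performs READ/WRITE directly against the returned pool, so the door is involved only at the beginning and the end of the transfer.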

SLIDE 38

Setup: enable door

# layout file
# nfs-4.1 section
[nfsdoorDomain/nfsv41]
nfs.domain=desy.de
nfsIoQueue=nfs
# NFS uses direct access to Chimera database
chimera.db.name = chimera
chimera.db.host = localhost
chimera.db.user = chimera
chimera.db.password =
#

SLIDE 39

Setup: tweak rpcbind

# /etc/sysconfig/rpcbind
# enable insecure mode
RPCBIND_ARGS="-i"
#

SLIDE 40

Setup: start door/pool

# layout file
[poolDomain/pool]
name=pool-xxx
...
poolIoQueue=nfs
#
# use one of the ports from range for movers.
# only single port used per pool
net.lan.port.min=33115
net.lan.port.max=33145
#

SLIDE 41

Setup: tweak pool

# layout file
[poolDomain/pool]
name=pool-xxx
...
poolIoQueue=nfs
#
# use one of the ports from range for movers.
# only single port used per pool
net.lan.port.min=33115
net.lan.port.max=33145
#

SLIDE 42

Setup: gplazma

  • Identity plugin required
  • Nsswitch – uses local accounts, based on /etc/nsswitch.conf
  • Uses accounts from gplazma host!
  • Nis – uses site NIS server
  • nfs.domain on door MUST match 'Domain' on client
  • Linux conf at /etc/idmapd.conf
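
The requirement that nfs.domain on the door matches 'Domain' on the client can be verified with a short script. The sketch assumes the usual idmapd.conf layout with a [General] section containing a Domain entry and compares the values literally; the function names are hypothetical.

```python
import configparser


def idmapd_domain(idmapd_conf_text):
    """Return the Domain value from the [General] section, or None if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(idmapd_conf_text)
    return parser.get("General", "Domain", fallback=None)


def domains_match(door_nfs_domain, idmapd_conf_text):
    """True if the client's idmapd Domain equals the door's nfs.domain."""
    client_domain = idmapd_domain(idmapd_conf_text)
    return client_domain is not None and client_domain.strip() == door_nfs_domain.strip()
```

On a client this could be called as `domains_match("desy.de", open("/etc/idmapd.conf").read())`, where "desy.de" is the value configured as nfs.domain on the door.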
SLIDE 43

Setup: gplazma

# layout file
[gplazmaDomain/gplazma]
gplazma.nis.server = nisserv6.desy.de
gplazma.nis.domain = desy.de
#

# gplazma.conf
identity requisite nis

SLIDE 44

Setup: tweak idmap

# dcache.conf
# maximal number of entries in the cache
nfs.idmap.cache.size = 512
# cache entry maximal lifetime
nfs.idmap.cache.timeout = 30
# Time unit used for timeout.
# (SECONDS|MINUTES|HOURS|DAYS)
nfs.idmap.cache.timeout.unit = SECONDS
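
What the two cache settings control can be shown with a small size-bounded, time-limited cache. This is a sketch of the general technique only: the class name, the eviction policy (oldest entry first) and the injectable clock are assumptions and say nothing about how dCache implements its cache.

```python
import time
from collections import OrderedDict


class IdMapCache:
    """Cache with a maximum number of entries and a per-entry lifetime."""

    def __init__(self, max_size=512, timeout_seconds=30, clock=time.monotonic):
        self.max_size = max_size
        self.timeout = timeout_seconds
        self.clock = clock
        self.entries = OrderedDict()   # key -> (value, expiry time)

    def get(self, key, loader):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]                       # fresh entry: no lookup needed
        value = loader(key)                     # e.g. ask the mapping service
        self.entries[key] = (value, now + self.timeout)
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_size:
            self.entries.popitem(last=False)    # evict the oldest entry
        return value
```

A larger timeout reduces the load on the mapping service but delays the moment at which account changes become visible through NFS.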

SLIDE 45

Setup: ready-to-go

  • dcache start
SLIDE 46

Anatomy of NFS package

NFSv4.1 (RFC 5661)
RPC (RFC 1831)
RPCSEC_GSS (RFC 2203)
XDR (RFC 1832)
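
To give a feeling for the lowest layer of this stack, the sketch below encodes two basic XDR types: an unsigned 32-bit integer (big-endian) and a string (length prefix followed by the bytes, padded with zeros to a multiple of four). The function names are hypothetical; a real NFS stack uses a complete XDR library.

```python
import struct


def xdr_uint32(value):
    """Encode an unsigned 32-bit integer in big-endian byte order."""
    return struct.pack(">I", value)


def xdr_string(text):
    """Encode a string: 4-byte length, UTF-8 bytes, zero padding to 4-byte boundary."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint32(len(data)) + data + b"\x00" * padding
```

An owner string such as "tigran@desy.de" would travel on the wire in this form, wrapped in RPC and, when Kerberos is used, protected by RPCSEC_GSS.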