AFS Performance Simon Wilkinson Your File System Ltd - - PowerPoint PPT Presentation

SLIDE 1

AFS Performance

Simon Wilkinson Your File System Ltd sxw@your-file-system.com

Wednesday, 17 October 12

SLIDE 2

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 3

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 4

Application performance

  • Constantly changing tokens is really expensive

SLIDE 5

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 6

vlserver performance

  • The client accesses the vlserver the first time a new volume is encountered

  • Caches results for up to 2 hours
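
A minimal sketch of the caching behaviour described on this slide, assuming a simple per-volume expiry. The class name, the `lookup` callable and the volume name used in the example are all hypothetical; the real cache manager is kernel code and is not structured like this.

```python
import time


class VolumeLocationCache:
    """Toy cache: ask the lookup service once per volume, reuse until expiry."""

    def __init__(self, lookup, ttl_seconds=2 * 60 * 60, clock=time.monotonic):
        self._lookup = lookup      # stands in for an RPC to the vlserver
        self._ttl = ttl_seconds    # "up to 2 hours" from the slide
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # volume name -> (location, time stored)
        self.lookups = 0           # how many times the lookup service was hit

    def get(self, volume):
        now = self._clock()
        hit = self._entries.get(volume)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]          # still fresh: no network traffic
        self.lookups += 1
        location = self._lookup(volume)
        self._entries[volume] = (location, now)
        return location
```

The point of the sketch is that a slow or unreachable vlserver is only felt on the first access to a volume and again when the cached entry expires.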

SLIDE 7

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 8

The Virtual File System

  • Most modern OSes have a virtual filesystem
  • Accepts POSIX system calls
  • Implements common functionality
  • Provides API which filesystems must implement

SLIDE 9

Cache Manager in more detail

VFS page cache cache manager disk filesystem

SLIDE 10

Virtual Memory Filesystems (read)

  • All data operations are mediated by the page cache
  • VFS checks first for up-to-date data in the cache, and returns this to the application
  • Otherwise, the filesystem is requested to fetch data into the page cache
  • Either way, the caller’s data comes from the cache

VFS page cache filesystem
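
The read path above can be sketched as a read-through cache. This is an illustration only: `PageCache`, `fetch_page` and the 4096-byte page size are assumptions made for the example, not the kernel's actual interfaces.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class PageCache:
    """Toy read-through cache: every read is served from cached pages."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable(page_index) -> bytes; plays the filesystem
        self._pages = {}               # page index -> page contents
        self.fetches = 0               # times the filesystem was asked for a page

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, PAGE_SIZE)
            if index not in self._pages:
                # Miss: the filesystem fills the page cache first.
                self.fetches += 1
                self._pages[index] = self._fetch_page(index)
            # Hit or miss, the caller's bytes are copied out of the cache.
            take = min(PAGE_SIZE - start, end - offset)
            out += self._pages[index][start:start + take]
            offset += take
        return bytes(out)
```

A second read of the same range causes no further fetches, which is the behaviour the slide describes.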

SLIDE 11

Virtual Memory Filesystems (write)

  • Writes go first to the memory cache
  • Written out to the filesystem in the background, or when requested
  • Page cache is shared between all filesystems, and managed by the kernel
  • Use of virtual memory essential for mmap() support

VFS page cache filesystem
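
A companion sketch for the write path, assuming a simple write-back policy. `WriteBackCache` and `store_page` are hypothetical names; `flush()` stands in for both the background writer and an explicit request such as fsync.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in memory, the store happens later."""

    def __init__(self, store_page):
        self._store_page = store_page  # callable(page_index, data); plays the filesystem
        self._pages = {}               # page index -> latest contents
        self._dirty = set()            # pages not yet written out

    def write(self, index, data):
        # The write completes as soon as memory is updated.
        self._pages[index] = data
        self._dirty.add(index)

    def flush(self):
        # Push every dirty page to the filesystem and report how many were written.
        for index in sorted(self._dirty):
            self._store_page(index, self._pages[index])
        flushed = len(self._dirty)
        self._dirty.clear()
        return flushed
```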

SLIDE 12

Cache Manager: Reads

VFS page cache cache manager disk filesystem

SLIDE 13

Cache Manager: Writes

VFS page cache cache manager disk filesystem

SLIDE 14

Cache Manager: Memory cache

VFS page cache cache manager AFS memcache

SLIDE 15

Memory cache: pros and cons

  • Memory cache is faster than disk cache
  • Cache size is limited to the memory on your machine
  • Memory cache space cannot be used for other purposes
  • Memory cache cannot partially use chunks

SLIDE 16

Chunksize

  • The cache is split into a set of chunks
  • Chunksize is 8k on memory cache, and autotuned on disk cache, according to the cache size
  • Chunksize also determines the amount of data fetched with each read from the fileserver

dcache x chunksize = blocks
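
The chunk arithmetic can be sketched as follows. The function names are made up for the example, and the last function simply restates the slide's "dcache x chunksize" relation in whatever unit the chunksize is given in.

```python
def chunk_for_offset(offset, chunksize):
    """Index of the chunk that holds a given byte offset."""
    return offset // chunksize


def chunk_range(offset, chunksize):
    """Byte range [start, end) fetched to satisfy a read at this offset."""
    start = (offset // chunksize) * chunksize
    return start, start + chunksize


def cache_capacity(dcache_entries, chunksize):
    """Space covered by the cache: number of dcache entries times chunksize."""
    return dcache_entries * chunksize
```

With an 8k chunksize, a read at offset 20000 falls in chunk 2 and pulls in bytes 16384 to 24575.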

SLIDE 17

Chunksize pros and cons

  • Chunksize has a big impact on performance
  • Reading 1Mbyte when the application only wants one byte
  • Reading a byte a thousand times when the application reads 1Mbyte byte by byte
  • If your application has no locality of access and a working set much larger than your cache, large chunk sizes really hurt performance
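
A rough way to put a number on the first case is the ratio of bytes fetched to bytes wanted. The sketch below assumes a cold cache and a request that starts on a chunk boundary; `fetch_amplification` is a hypothetical helper, not an OpenAFS tool.

```python
def fetch_amplification(wanted_bytes, chunksize):
    """Bytes pulled from the fileserver per byte the application asked for.

    Assumes nothing is cached yet and the request is chunk-aligned.
    """
    chunks = -(-wanted_bytes // chunksize)  # ceiling division
    return chunks * chunksize / wanted_bytes
```

A one-byte read with a 1Mbyte chunksize costs about a million times the data the application wanted, while a full 1Mbyte read costs nothing extra.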

SLIDE 18

The Global Lock

  • The OpenAFS cache manager was written when kernels only had interrupt and normal contexts
  • SMP conversion was done by means of a global lock
  • Parallel access speeds are poor

SLIDE 19

Linux page copying improvements

[Chart: throughput in kilobytes/sec (axis ticks at 750000, 1500000, 2250000) against size, from 64k to 4G, for OpenAFS 1.4, OpenAFS 1.6 and ext3 (cache fs)]

SLIDE 20

Avoiding the Cache

  • fs bypassthreshold -size <filesize>
  • Any files larger than filesize will not be cached
  • Use file options to allow application to bypass cache
  • open("/afs/mycell/path/to/file", O_RDONLY | O_DIRECT)

SLIDE 21

Using the cache more

  • fs precache -size <filesize>
  • Controls a readahead size

SLIDE 22

Perception and storebehind

  • On Unix, AFS is write on close
  • Users (and applications) don’t expect close to take a long time!
  • fs storebehind allows writes to the fileserver to happen in the background

  • BUT if the write fails, no one will know
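
Because the store happens at close, an application that cares about its data has to check the result of close as well as write. The sketch below shows that pattern with plain POSIX calls. `write_and_confirm` is a hypothetical helper, and whether fsync forces the store to the fileserver on a given client is an assumption worth checking locally. With storebehind enabled, a failure in the background store would not be reported through any of these calls.

```python
import os


def write_and_confirm(path, data):
    """Write data and let any error from write, fsync or close reach the caller."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            view = view[os.write(fd, view):]
        os.fsync(fd)   # ask for the data to be pushed out before closing
    finally:
        os.close(fd)   # raises OSError if the close itself fails
    return len(data)
```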

SLIDE 23

Bulkstat, fakestat and friends

  • Mainly concentrating on data operations, but metadata ops can have performance impact too
  • The afsd option -fakestat avoids looking up the root.cell volume of every cell in /afs
  • The option -fakestat-all blocks stat lookups of all mountpoints
  • Bulkstat (not on Mac OS X) lumps multiple stat operations into a single RPC

SLIDE 24

Tuning the cache manager

  • Apart from the stuff already mentioned, auto tuning works well in 1.6
  • Make sure your startup script doesn’t hard code inappropriate values!

SLIDE 25

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 26

Network performance

RX UDP stack Network UDP stack RX

SLIDE 27

RX: Ooops

[Chart: time in ms (axis ticks from 5000 to 20000) to perform an RPC with 20Mbyte of data, for each OpenAFS version from 1.5.50 to 1.5.78]

SLIDE 28

RX performance work

[Chart: OpenAFS 1.4, OpenAFS 1.6, OpenAFS master and Experimental RX compared; axis ticks from 500 to 2000, axis label not recoverable]

SLIDE 29

UDP Stack

  • Don’t run out of packets
  • 30 simultaneous clients moving 1Mbyte of data each is enough to swamp Linux’s default UDP buffer size

    [magrathea]sxw: netstat -su
    Udp:
        7200071 packets received
        123 packets to unknown port received.
        3283 packet receive errors
        7194192 packets sent
        RcvbufErrors: 3283
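
For monitoring, the counter that matters in the netstat -su output on this slide is RcvbufErrors. A minimal sketch that pulls it out of captured output follows; the function name is hypothetical and the parsing assumes the Linux output format shown here.

```python
import re


def udp_receive_buffer_errors(netstat_su_output):
    """Return the RcvbufErrors count from `netstat -su` text, or None if absent."""
    match = re.search(r"RcvbufErrors:\s*(\d+)", netstat_su_output)
    return int(match.group(1)) if match else None
```

A value that keeps growing between samples suggests packets are being dropped because the receive buffer is full.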

SLIDE 30

Jumbograms

  • Jumbograms send UDP payloads larger than the standard Ethernet MTU
  • Now turned off by default - it broke on too many networks

  • BUT - fragmented packets can actually be faster
  • Also provides a way of exploiting larger MTUs

SLIDE 31

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 32

ptserver performance

  • Fileserver contacts the ptserver with every new connection
  • Until the ptserver responds, fileserver thread is blocked
  • There aren’t many fileserver threads
  • Be very careful about ptserver response time, and shutting down ptservers for maintenance

SLIDE 33

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 34

Fileserver tuning - threads

  • Each simultaneous incoming call requires a thread
  • OpenAFS 1.6 allows a maximum of 16384 threads (of which 16376 are available for calls)

    rxdebug <server>
    Trying 192.168.0.1 (port 7000):
    Free packets: 3279, packet reclaims: 554, calls: 575230, used FDs: 64
    not waiting for packets.
    0 calls waiting for a thread
    122 threads are idle
    0 calls have waited for a thread
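
The three thread lines in the rxdebug output on this slide are the ones to watch. A small sketch that extracts them from captured output is given below; `thread_pressure` and its dictionary keys are made up for the example, and the patterns assume the wording shown here.

```python
import re


def thread_pressure(rxdebug_output):
    """Extract the thread-related counters from rxdebug text (None if a line is missing)."""

    def grab(pattern):
        match = re.search(pattern, rxdebug_output)
        return int(match.group(1)) if match else None

    return {
        "waiting": grab(r"(\d+) calls waiting for a thread"),
        "idle": grab(r"(\d+) threads are idle"),
        "have_waited": grab(r"(\d+) calls have waited for a thread"),
    }
```

Non-zero "waiting" or a rising "have_waited" count means calls are queuing because every thread is busy.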

SLIDE 35

Ensure you have sufficient callbacks

  • Fileserver has a limited amount of space for callbacks, set at run time.
  • Check whether you’re running out!

    ./xstat_fs_test -collID 3 -fsname lammasu.inf.ed.ac.uk
    0 DeleteFiles
    1517 DeleteCallBacks
    0 BreakCallBacks
    382707 AddCallBack
    0 GotSomeSpaces
    23265 DeleteAllCallBacks
    33 nFEs
    192 nCBs
    100000 nblks
    7327 CBsTimedOut
    0 nbreakers
    0 GSS1
    0 GSS2
    0 GSS3
    0 GSS4
    0 GSS5
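
The counters in the xstat_fs_test output on this slide can be turned into a dictionary for checking. The sketch assumes the value-then-name ordering as it appears here; `parse_callback_stats` is a hypothetical helper. Treating GotSomeSpaces and the GSS counters as the signs of callback space being reclaimed is my reading of the names, not something the slide states.

```python
def parse_callback_stats(text):
    """Turn 'value name value name ...' counter text into {name: value}."""
    tokens = text.split()
    stats = {}
    # Walk the tokens in pairs: even positions are values, odd positions are names.
    for value, name in zip(tokens[0::2], tokens[1::2]):
        if value.isdigit():
            stats[name] = int(value)
    return stats
```

Pass only the counter portion of the output, without the command line itself.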

SLIDE 36

UDP buffers

  • Make sure that the -udpsize parameter is big enough!
  • Don’t worry about -rxpck - we autotune this now

SLIDE 37

Abort threshold

  • Fileserver protects itself against misbehaving clients
  • If a client sends more than a configured number of failed requests it is throttled
  • Very easy to be throttled by doing, for example, ls on a directory you don’t have permission for

  • -abortthreshold controls this

SLIDE 38

The life of a request

Application Cache Manager Network Fileserver Disk vlserver ptserver

SLIDE 39

Journalling woes

SLIDE 40

Dodgy RAID

  • Poorly performing RAID arrays can have big effects on fileserver performance

  • In particular, RAID 5 is evil

SLIDE 41

Tuning your OS

  • Normal operating system tuning for high speed I/O applies
  • Memory not used by the fileserver will be used for page cache - the more the merrier!

SLIDE 42

Questions
