AFS at Intel
Travis Broughton
Agenda
Intel’s Engineering Environment
Things AFS Does Well
How Intel uses AFS
How not to use AFS
Management Tools
Intel’s Engineering Environment
Learned about AFS in 1991
First deployed AFS in Intel’s Israel design center in 1992
Grew to a peak of 30 cells in 2001
Briefly considered DCE/DFS migration in 1998 (the first time AFS was scheduled to go away…)
Intel’s Engineering Environment
~95% NFS, ~5% AFS
~20 AFS cells managed by ~10 regional organizations
AFS used for CAD and /usr/local applications, global data sharing for projects, and secure access to data
NFS used for everything else; gives higher performance in most cases
Wide range of client platforms, OSs, etc.
Cell Topology Considerations
Number of sites/campuses/buildings to support
Distance (latency) between sites
Max # of replicas needed for a volume
Trust
…
As a result, Intel has many cells
Things AFS Does Well
Security
- Uses Kerberos, doesn’t have to trust client
- Uses ACLs, better granularity
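As a concrete illustration of the ACL point (not from the original slides; the path and group names here are hypothetical), a directory can be opened to one group and closed to everyone else with standard fs commands:

    # Grant a project group read+lookup, then drop the default anyuser entry
    fs setacl -dir /afs/cell.example.com/proj/secret -acl proj:readers rl
    fs setacl -dir /afs/cell.example.com/proj/secret -acl system:anyuser none

This per-directory, per-group granularity is what classic NFS mode bits cannot express.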
Performance for frequently-used files
- e.g. /usr/local/bin/perl
High availability for RO data
Storage virtualization
Global, delegated namespace
AFS Usage at Intel: Global Data Sharing
Optimal use of compute resources
- Batch jobs launched from site x may land at site y, depending on demand
Optimal use of headcount resources
- A project based at site x may “borrow” idle headcount from site y without relocation
Optimal license sharing
- A project based at site x may borrow idle software licenses (assuming contract allows “WAN” licensing)
Efficient IP reuse
- A project based at site x may require access to the most recent version of another project being developed at site y
Storage virtualization and load balancing
- Many servers – can migrate data to balance load and do maintenance during working hours
AFS Usage at Intel: Other Applications
x-site tool consistency
- Before rsync was widely deployed and SSH-tunneled, used AFS namespace to keep tools in sync
@sys simplifies multiplatform support
- Environment variables and automounter macros are reasonable workarounds
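A minimal sketch of the @sys idiom (paths are hypothetical, not from the slides): the cache manager expands @sys to the client’s system type, so a single symlink serves every platform:

    fs sysname                                       # prints this client's @sys value
    ln -s '/afs/cell.example.com/tools/@sys/bin' /usr/local/bin

Each client then resolves /usr/local/bin to its own platform’s binaries.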
“@cell” link at top-level of AFS simplifies namespace
- In each cell, @cell points to the local cell
- Mirrored data in multiple cells can be accessed through the same path (fs wscell expansion would also work)
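A sketch of how such an @cell link might be created, assuming the conventional /afs/<cellname> layout; the sed expression parses the quoted cell name out of the fs wscell output:

    cell=$(fs wscell | sed -e "s/^.*'\(.*\)'.*$/\1/")
    ln -s "/afs/$cell" /afs/@cell    # create in the RW root.afs, then vos release root.afs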
/usr/local, CAD tool storage
- Cache manager outperforms NFS
- Replication provides many levels of fault-tolerance
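The replication referred to here is the standard vos workflow; a hedged example with made-up server and volume names:

    vos addsite fs1.example.com /vicepa tools.perl    # define RO replica sites
    vos addsite fs2.example.com /vicepa tools.perl
    vos release tools.perl                            # push RW contents to the RO clones

Clients prefer the RO copies, so any single fileserver can fail or be taken down without interrupting reads.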
Things AFS Doesn’t Do Well
Performance on seldom-used files
High availability for RW data
Scalability with SMP systems
Integration with OS
File/volume size limitations
When NOT to Use AFS
CVS repositories
- Remote $CVSROOT using SSH seems to work better
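For reference, the remote-$CVSROOT-over-SSH pattern being recommended looks like this (host and module names are examples):

    export CVS_RSH=ssh
    cvs -d :ext:user@cvshost.example.com:/repos/proj checkout proj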
rsync
Any other tool that would potentially thrash the cache…
Other Usage Notes
Client cache is better than nothing, but shared “edge” cache may be better
- Mirroring w/ rsync accomplishes this for RO data
- Client disk is very cheap, shared (fileserver) disk is fairly cheap, WAN bandwidth is still costly (and latency can rarely be reduced)
OpenAFS at Intel
Initially used contrib’d AFS 3.3 port for Linux
Adopted IBM/Transarc port when it became available
Migrated to OpenAFS when kernel churn became too frequent
Openafs-devel very responsive to bug submissions
- Number of bug submissions (from Intel) tapering off – client has become much more stable
Management Tools
Data age indicators
- Per-volume view only
- 11pm (local) nightly cron job to collect volume access statistics
idle++ if accesses==0, else idle=0
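A rough sketch of what that nightly job could look like, assuming one state file per volume for the idle-day counter (server name and paths are hypothetical):

    #!/bin/sh
    server=fs1.example.com
    statedir=/var/lib/afs-idle
    vos listvol "$server" | awk '$3 == "RW" {print $1}' |
    while read vol; do
        # vos examine reports "N accesses in the past day"
        acc=$(vos examine "$vol" | awk '/accesses in the past day/ {print $1}')
        idle=$(cat "$statedir/$vol" 2>/dev/null || echo 0)
        if [ "${acc:-0}" -eq 0 ]; then
            idle=$((idle + 1))          # idle++ if accesses == 0
        else
            idle=0                      # any access resets the counter
        fi
        echo "$idle" > "$statedir/$vol"
    done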
Mountpoint database
- /usr/afs/bin/salvager -showmounts on all fileservers
- Find root.afs volume, traverse mountpoints to build tree
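A client-side sketch of the traversal step (the salvager-based collection above is server-side); fs lsmount prints a matching line only for directories that are mount points, and the cell name is a hypothetical example:

    cell=example.com
    find "/afs/$cell" -type d 2>/dev/null |
    while read dir; do
        fs lsmount "$dir" 2>/dev/null | grep "is a mount point"
    done > /tmp/mpdb.txt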
MountpointDB audit
- Find any volume names not listed in the MpDB
- Find unused read-only replicas (mounted under RW)
Samba integration
- Smbklog
“Storage on Demand”
- Delegates volume creation (primarily for scratch space) to users, with automated reclaim
Management Tools
Recovery of PTS groups
- Cause – someone confuses “pts del” and “pts rem”
- Initial fix – create a new cell, restore the PTS DB, use pts exa to get a list of users
- Easier fix – wrap pts to log pts del, capture state of group before deleting
- Even better fix – do a nightly text dump of your PTS DB
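A sketch of such a nightly dump (the output path is an example): capture every PTS entry plus each group’s membership, so an accidental pts del can be reconstructed from last night’s text file:

    #!/bin/sh
    out=/var/backups/ptsdb-$(date +%Y%m%d).txt
    pts listentries -users -groups > "$out"
    # record the membership of every group (skip the header line)
    pts listentries -groups | awk 'NR > 1 {print $1}' |
    while read group; do
        echo "== $group"
        pts membership "$group"
    done >> "$out"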
Mass deletion of volumes
- Cause – someone does “rm -rf” equivalent in the wrong place (most recent case was a botched rsync)
- Initial fix – lots of vos dump .backup/.readonly | vos restore (sketched after this list)
Disks fill up, etc.
- Other fixes – watch size of volumes, and alert if some threshold change is exceeded
Throw fileserver into debug mode, capture IP address doing the damage and lock it down
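The recovery pipe and the size watch, sketched with hypothetical volume and server names; vos dump writes to stdout and vos restore reads stdin when -file is omitted:

    # re-create a deleted RW volume from its .backup clone
    vos dump proj.data.backup | vos restore fs1.example.com /vicepa proj.data

    # threshold alert: compare a volume's size against yesterday's sample
    old=$(cat /var/lib/afs-size/proj.data 2>/dev/null || echo 0)
    new=$(vos examine proj.data | awk 'NR == 1 {print $4}')    # used-K field
    [ $((old - new)) -gt 100000 ] &&
        echo "proj.data shrank by $((old - new)) K" | mail -s 'AFS volume alert' afs-admins@example.com
    echo "$new" > /var/lib/afs-size/proj.data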
Management Tools
Watch for ‘calls waiting for a thread’
Routing loops can trigger problems
True load-based meltdowns can be diagnosed:
- Send signal to fileserver to toggle debug mode
- Collect logs for some period of time (minutes)
- Analyze logs to locate most frequently used vnodes
- Convert vnum to inum (vnode number to inode number)
- Use find to locate busiest volume and files/directories being accessed
- Sometimes requires moving the busy volume elsewhere to complete diagnosis
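A sketch of that capture sequence; per the OpenAFS fileserver documentation, SIGTSTP raises the logging level and SIGHUP resets it to 0. The log-mining pattern is only illustrative, since FileLog formats vary by version:

    pid=$(pgrep -x fileserver)
    kill -TSTP "$pid"                    # raise the debug level
    sleep 300                            # let FileLog accumulate a few minutes of calls
    kill -HUP "$pid"                     # reset the debug level to 0
    # rank the most frequently referenced Fids
    grep -o 'Fid = [0-9.]*' /usr/afs/logs/FileLog | sort | uniq -c | sort -rn | head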
Management Tools
Keep fileserver machines identical if possible
- Easier maintenance
Keep a hot spare fileserver around and online
- Configure as a fileserver in local cell to host busy volumes
- Configure as a DB server in its own cell for DB recovery
“Splitting” a volume is somewhat tedious
- Best to plan directory/volume layout ahead of time, but it can be changed if necessary
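Since vos (at least in the OpenAFS of this talk’s era) has no single split command, the tedious manual procedure looks roughly like this; all names and paths are hypothetical, and /afs/.<cell> is the conventional RW mount of the cell:

    vos create fs1.example.com /vicepa proj.sub               # new volume for the subtree
    fs mkmount /afs/.example.com/proj/sub.new proj.sub
    cp -pR /afs/.example.com/proj/sub/. /afs/.example.com/proj/sub.new/
    # note: cp does not carry AFS ACLs; reapply them, e.g. with fs copyacl
    rm -rf /afs/.example.com/proj/sub
    fs rmmount /afs/.example.com/proj/sub.new
    fs mkmount /afs/.example.com/proj/sub proj.sub            # remount under the old name
    vos release proj                                          # if the parent volume is replicated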