SLIDE 1

Filesystems and I/O Balance on the NERSC T3E

Tina Butler, NERSC Systems Group

This work was supported by the Director, Office of Advanced Scientific Computing Research, Division of Mathematical, Information, and Computational Sciences of the U.S. Department of Energy under contract number DE-AC03-76SF00098.

SLIDE 2

What is NERSC?

• National Energy Research Scientific Computing Center
  – Funded by the DOE Office of Science
  – Located at Lawrence Berkeley National Lab
  – Provides computational resources to the following programs:
    • Fusion Energy
    • High Energy and Nuclear Sciences
    • Basic Energy Sciences
    • Biology and Environmental Research
    • Computational and Environmental Research
  – Approximately 2,500 users from major universities and government labs
  – Hardware: 696-PE T3E-900, one J90 SE system (32 CPUs), and three SV1s (64 processors)

SLIDE 3

Mcurie - The NERSC T3E

• T3E-900 with 696 PEs running UNICOS/mk 2.0.4.67
• 644 application (APP) PEs
• 256 MB of memory per PE
• 22 Gigarings
• 12 FCNs (FibreChannel nodes)
• 8 MPNs (multipurpose nodes)
• 2 HPNs (HIPPI nodes)

SLIDE 4

[Diagram: mcurie system configuration]

mcurie.nersc.gov: Cray T3E900 LC696-256, 174 GB (21.75 GW) memory, 2.76 TB disk

FibreChannel nodes (FCNs):
FCN01 25 disks w/5 parity    FCN02 25 disks w/5 parity    FCN03 30 disks w/6 parity
FCN04 30 disks w/6 parity    FCN05 30 disks w/6 parity    FCN06 30 disks w/6 parity
FCN14 30 disks w/6 parity    FCN15 30 disks w/6 parity    FCN16 25 disks w/5 parity
FCN17 25 disks w/5 parity    FCN20 30 disks w/6 parity    FCN21 30 disks w/6 parity

Multipurpose nodes (MPNs):
MPN0 16 disks    MPN10 16 disks    MPN11 24 disks    MPN12 24 disks
MPN22 16 disks   MPN23 16 disks    MPN24 16 disks    MPN25 8 disks

HIPPI nodes (HPNs): HPN07, HPN13

All I/O nodes attach to the system via Gigarings.
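The memory totals in the summary line follow directly from the per-PE figures on slide 3. A quick check (assuming 1 GB = 1024 MB and 64-bit Cray words, i.e. 8 bytes per word):

```python
pes = 696          # total PEs (slide 3)
mb_per_pe = 256    # memory per PE (slide 3)

total_gb = pes * mb_per_pe / 1024
print(total_gb)        # 174.0 -> the "174 GB" in the summary
print(total_gb / 8)    # 21.75 -> the "21.75 GW" (64-bit words)
```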

SLIDE 5

NERSC Job Mix - Application Mix

• Applications from the fields of:
  – Chemistry
  – Materials Science
  – Fusion Energy
  – Geophysics
  – Biology
  – High Energy Nuclear Physics
  – Climate Modeling
  – Astrophysics
  – Computational Fluid Dynamics
• Mostly user-written codes

SLIDE 6

NERSC Job Mix - Diverse and Dynamic

App Size (PEs)    % of all Apps    % of PE Hours
2–16              56               6
17–64             38               56
65–128            5                29
129–512           1                9

App Run Time      % of all Apps    % of PE Hours
0–10 min          56               1
10–30 min         23               10
0.5–3.5 hr        17               49
3.5–12.0 hr       4                40

Mix of Development, Capacity and Capability computing
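A quick tally of the tables above makes that mix concrete: small, short jobs dominate the job count while large, long-running jobs dominate the PE-hours. A minimal sketch (the percentages are taken from the tables; the cutoffs chosen for "large" and "long" are illustrative):

```python
# (% of apps, % of PE-hours) by app size and by run time, from the tables above
size_mix = {"2-16": (56, 6), "17-64": (38, 56), "65-128": (5, 29), "129-512": (1, 9)}
time_mix = {"0-10 min": (56, 1), "10-30 min": (23, 10),
            "0.5-3.5 hr": (17, 49), "3.5-12.0 hr": (4, 40)}

big_jobs = [size_mix[k] for k in ("65-128", "129-512")]
print(sum(a for a, _ in big_jobs), sum(h for _, h in big_jobs))    # 6% of apps, 38% of PE-hours

long_jobs = [time_mix[k] for k in ("0.5-3.5 hr", "3.5-12.0 hr")]
print(sum(a for a, _ in long_jobs), sum(h for _, h in long_jobs))  # 21% of apps, 89% of PE-hours
```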

SLIDE 7

Mcurie Filesystems - performance

• 68 FibreChannel disk arrays
• Striping of swap and checkpoint
• pcache for metadata optimization on root, usr, and opt
• Primary/secondary partitions
• Remote-mount file servers

SLIDE 8

Mcurie Filesystems - resiliency

• Mirroring of primary partitions for homes and usrtmp
• Alternate path for all arrays
• Sized for feasible dump/restore

SLIDE 9

Mcurie Filesystems - swap and checkpoint

• NERSC uses both checkpointing and gang scheduling for system scheduling
• Swap: 383 gigabytes, 2.4 times APP memory
• Checkpoint: 582 gigabytes, 3.6 times APP memory
• Filesystems have 5 logical partitions, each 5- or 6-way striped on FCN disk
• 800 MB/sec observed on checkpoint
• Full-machine checkpoint regularly completes in under 5 minutes (a worked check follows below)
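These ratios and the checkpoint time can be verified from the configuration figures earlier in the deck. A worked check (assuming the 644 APP PEs at 256 MB each from slide 3, and treating the observed 800 MB/sec as a sustained rate):

```python
app_pes = 644                       # APP PEs (slide 3)
app_mem_gb = app_pes * 256 / 1024   # ~161 GB of APP memory

print(round(383 / app_mem_gb, 1))   # 2.4 -> swap is ~2.4x APP memory
print(round(582 / app_mem_gb, 1))   # 3.6 -> checkpoint is ~3.6x APP memory

# Time to dump all APP memory at the observed checkpoint rate:
minutes = app_mem_gb * 1024 / 800 / 60
print(round(minutes, 1))            # ~3.4 min, consistent with "under 5 minutes"
```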

SLIDE 10

Mcurie Filesystems - homes

• Multiple filesystems to distribute user load and risk
• Configured for full mirroring
• Six filesystems of 25 GB each on MPN disks
• Approximately 150 users per filesystem (see the quota arithmetic below)
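Set against the 4 GB per-user quota on slide 13, these filesystems are heavily oversubscribed on paper, which is what makes DMF migration to HPSS so important. A quick check (assuming, as a worst case, that all 150 users fill their quota):

```python
users_per_fs = 150    # approximate users per home filesystem
quota_gb = 4          # per-user hard quota (slide 13)
fs_gb = 25            # size of each home filesystem

print(users_per_fs * quota_gb / fs_gb)   # 24.0 -> 24x nominal oversubscription
```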

SLIDE 11

Mcurie Filesystems - homes

[Figure: File distribution on mcurie homes. Number of files (log scale, 1 to 1,000,000) by file-size bucket (8 KB to 512 MB) for each home filesystem /u1 through /u6, plus the total.]

SLIDE 12

Mcurie Filesystems - /usr/tmp

• Main area for user data files
• 1.5 TB of FCN disk arrays
• Primary/secondary partition configuration to allow mirroring of metadata

SLIDE 13

Mcurie Filesystems - space management

• Hard quotas on all user-writable filesystems
• Home filesystems: 4 GB and 3,500 inodes per user
• /usr/tmp filesystem: 70 GB and 6,000 inodes per user
• Homes are migrated to HPSS under Cray DMF control
• /usr/tmp: files inactive for 14 days are purged (sketched below)
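The purge policy above amounts to sweeping /usr/tmp for files whose access time is more than 14 days old. A minimal sketch of such a sweep (illustrative only, not NERSC's actual purge tool; the dry-run flag is an assumption):

```python
import os
import time

PURGE_ROOT = "/usr/tmp"
MAX_IDLE = 14 * 24 * 3600   # 14 days of inactivity, per the policy above
DRY_RUN = True              # report candidates only; flip to actually unlink

now = time.time()
for dirpath, _, filenames in os.walk(PURGE_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            st = os.lstat(path)   # don't follow symlinks
        except OSError:
            continue              # file vanished mid-scan
        if now - st.st_atime > MAX_IDLE:
            print("purge candidate:", path)
            if not DRY_RUN:
                os.unlink(path)
```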

SLIDE 14

Mcurie Filesystems - homes

[Figure: mcurie home I/O volume, combined. Daily I/O volume in megabytes (up to ~600,000 MB) over roughly 280 days, broken out by home filesystem u1 through u6.]

SLIDE 15

Mcurie Filesystems - homes

[Figure: mcurie home filesystems I/O volume. Daily read and write volume in megabytes (up to ~800,000 MB) over roughly 280 days.]

SLIDE 16

Mcurie Filesystems - homes

[Figure: /u4 average daily transfer rate, 11/98 to 08/99. Average read and write rates in 4K blocks/sec (roughly 50 to 250).]

SLIDE 17

Mcurie Filesystems - /usr/tmp

[Figure: mcurie /usr/tmp I/O volume. Daily read and write volume in megabytes (up to ~4,500,000 MB) over roughly 280 days.]

SLIDE 18

Mcurie Filesystems - DMF traffic

[Figure: mcurie DMF monthly volume, FY99. Put and get volume in megabytes (roughly 5,000 to 25,000 MB per month) for each month from October 1998 (199810) through September 1999 (199909).]

SLIDE 19

Mcurie Filesystems - DMF traffic

[Figure: mcurie DMF monthly puts and gets, FY99. Number of put and get accesses per month (log scale, 1 to 10,000) from October 1998 through September 1999.]

SLIDE 20

Mcurie Filesystems - HPSS traffic

[Figure: HPSS-mcurie data volume. Put and get volume in megabytes (up to ~70,000 MB) at roughly two-week intervals from October 1998 through August 1999.]

SLIDE 21

Mcurie Filesystems - Conclusions

• User home filesystems are well balanced in file distribution and transfer load
• Data migration is a relief valve for homes, but not yet a critical resource
• The /usr/tmp filesystem buffers user intermediate data
• HPSS is being used as a long-term archive for user data
• NERSC's T3E storage resources are successfully supporting the growing utilization of the system