HUF 2017 KEK Site Report: Share Our Experience
Koichi Murakami (KEK/CRC)
Oct/19/2017, HUF 2017, KEK, Tsukuba
KEK
High Energy Accelerator Research Organization
Diversity in accelerator-based sciences

- Basic science: pursuing the fundamental laws of nature
  - T2K neutrino experiment, SuperKEKB and Belle II, COMET, J-PARC Hadron hall
- Material science and its applications: pursuing the origin of function in materials
  - Photon Factory (X-rays as a probe), J-PARC MLF (neutrons and muons as probes)
- Technical development and its applications (technology spillover)
  - Superconducting accelerators, energy-recovery linac, accelerator-based BNCT
SuperKEKB/Belle II is a machine 40 times more powerful than the previous B-factory experiment, KEKB/Belle.
[Chart: integrated-luminosity projection toward the goal; assumptions: 9 months/year, 20 days/month of operation]
Work / Batch Servers (KEKCC 2016)

System resources:
- CPU: 10,024 cores
  - Intel Xeon E5-2697v3 (2.6 GHz, 14 cores) x 2 per node, 358 nodes (Lenovo NextScale), 55 TB memory
  - 4 GB/core (8,000 cores) / 8 GB/core (2,000 cores, for application use)
  - 236 kHS06 / site
- Disk: 10 PB (GPFS, IBM ESS x 8) + 3 PB (HSM cache, DDN SFA14K); 600 TB IBM ESS for the Belle II front-end disk
- Interconnect: IB 4xFDR
- Tape: IBM TS3500 / TS1150, 70 PB (max capacity)
- HSM data: 11 PB, 220 M files, 5,200 tapes
- Total throughput: 100 GB/s (disk, GPFS), 50 GB/s (HSM, GHI)
- Job scheduler: Platform LSF v10.1
- Network: 10 GbE / 40 GbE (Nexus 7018, SX6518), firewall SRX3400
- Grid EMI / Belle II front-end servers: Lenovo x3550 M5 (36 + 5 nodes)
Facility Tour on Friday
HSM hardware: DDN SFA12K, TS3500, HPSS/GHI servers
GPFS (GHI): 3 PB, total throughput > 50 GB/s
Tape Library
- IBM TS3500 (13 racks)
- Max capacity: 70 PB

Tape Drives
- TS1150: 54 drives
- TS1140: 12 drives (for media conversion)

Tape Media
- JD: 10 TB, 360 MB/s
- JC5: 7 TB, 300 MB/s (reformatted)
- JC4: 4 TB, 250 MB/s
- Reformatting was done in the background over 10 months (as expected).
- Users (experiment groups) pay for the tape media they use.
HPSS
- We have used HPSS as our HSM system for the last 15+ years.
- 1st layer: GPFS on DDN, 3 PB; 2nd layer: IBM tape

GHI (GPFS + HPSS Interface)
- GPFS parallel file system as the staging area
- Full coherence with GPFS access (POSIX I/O)
- KEKCC has been a pioneer GHI customer (since 2012).
- Data access with high I/O performance and good usability:
  - same access speed as GPFS once data is staged
  - no HPSS client API, no changes to user code
  - small-file aggregation helps tape performance for small data
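The "no HPSS client API" point can be illustrated with a minimal sketch: user code reads a GHI-managed file with ordinary POSIX I/O, the same call it would use for any GPFS file (the open simply blocks longer while a purged file is recalled from tape). The paths here are illustrative stand-ins, not real KEKCC paths.

```python
def read_experiment_file(path):
    """Read a GHI-managed file with plain POSIX I/O.

    No HPSS client API is involved: if the file is purged, GHI recalls
    it from tape transparently and this call just takes longer.
    """
    with open(path, "rb") as f:   # same call as for any GPFS file
        return f.read()

# For illustration, write and read back a local stand-in file.
with open("/tmp/ghi_demo.dat", "wb") as f:
    f.write(b"detector event block")

data = read_experiment_file("/tmp/ghi_demo.dat")
print(len(data))   # -> 20
```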
Component            Qty.
HPSS Core Server     1
HPSS Disk Mover      4
HPSS Tape Mover      3
Mover Storage        600 TB
                     2 billion
GHI IOM              6
GHI Session Server   3

Software versions
HPSS                 7.4.3 p2 efix1
GHI                  2.5.0 p1
GPFS                 4.2.0.1
OS (HPSS nodes)      RHEL 6.7
OS (GHI nodes)       RHEL 7.1
Raw data
- Experimental data from detectors, transferred to the storage system in real time
- 2 GB/s sustained for the Belle II experiment
- x5 that amount in simulation data
- Migrated to tape, processed into DST, then purged
- "Semi-cold" data (tens to hundreds of PB), sometimes reprocessed

DST (Data Summary Tapes)
- "Hot" data (~tens of PB)
- Data processing to produce physics data
- Shared in various ways (Grid access)

Physics summary data
- Handy data sets for producing physics results (N-tuple data)

Requirements for the storage system
- High availability (considering the electricity cost of operating the accelerator)
- Scalability up to hundreds of PB
- Data-intensive processing with high I/O performance
  - hundreds of MB/s I/O for many concurrent accesses (N x 10k) from jobs
  - local jobs and Grid jobs (distributed analysis)
- Data portability to Grid services (POSIX access)
Separated GPFS clusters
- GPFS disk system (10 PB) and GHI GPFS system (3 PB)
- using GPFS remote cluster mount
- improves stability and system management (maintenance, updates, ...)

COS supports mixed media types
- different types of tape media (JB/JC/JD) can be mixed in a read/write COS

Purge policy changed for small files
- the number of small files is huge, but they have little impact on disk space, so small files are not purged
- threshold raised from < 8 MB to < 40 MB / 100 MB, depending on the file-size distribution in each file system
Improve GHI migration
- Old: list all files to migrate, then migrate them in one go
  - a single migration request for >100k files overflows the HPSS queues and migration stalls
- New: migrate in batches of 10k files with ghi_backup
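The batching idea above can be sketched as follows: split a large file list into chunks of 10,000 and submit each chunk separately, so no single request overflows the HPSS queues. `run_ghi_backup` is a hypothetical stand-in for invoking ghi_backup on one batch.

```python
def chunk(files, size=10_000):
    """Yield successive batches of at most `size` files."""
    for i in range(0, len(files), size):
        yield files[i:i + size]

def migrate_in_batches(files, run_ghi_backup, size=10_000):
    """Submit the migration as many bounded requests instead of one huge one."""
    for batch in chunk(files, size):
        run_ghi_backup(batch)   # one bounded request per batch

# Illustration with a dummy runner that just records each batch.
submitted = []
migrate_in_batches([f"file{n}" for n in range(25_000)], submitted.append)
print([len(b) for b in submitted])   # -> [10000, 10000, 5000]
```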
HSM service on the old system
- 3 days of downtime for the system migration (backup of the current system / restore into the new one)
- GPFS disk kept mounted (read-only) for 2 weeks before the new system
  - only data staged on disk was accessible

System migration
- 8.5 PB of data, 170 M files, 5,000 tapes
- 3 days of work, Aug 15-17, 2016
  - physical tapes moved from the current to the new tape library
  - DB2 migration using QRep
  - GHI backup and restore
- Staging is necessary in the new system
  - admin staging for important data

Checksums taken for tape data
- 6 months of work for higher-priority data
- read directly from tapes (tape-ordered; small files as htar'ed HPSS files)
- 200 MB/s on average, 4,000 volumes
- checksum and timestamp stored in GPFS UDA
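A minimal sketch of the checksum-into-UDA step: compute a streamed checksum of the file and record it as a GPFS user-defined attribute. The attribute names (`user.checksum`, `user.cksum_time`) and the exact `mmchattr` invocation are assumptions for illustration, not the site's actual schema; the command is only built here, not executed.

```python
import hashlib
import time

def file_md5(path, blocksize=1 << 20):
    """MD5 of a file, read in 1 MiB blocks (streaming, tape-friendly)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()

def mmchattr_command(path, checksum, timestamp):
    """Build a (hypothetical) mmchattr call recording checksum + time as UDAs."""
    return ["mmchattr",
            "--set-attr", f"user.checksum={checksum}",
            "--set-attr", f"user.cksum_time={timestamp}",
            path]

# Illustration on a small local file.
with open("/tmp/uda_demo.dat", "wb") as f:
    f.write(b"hello tape")

cksum = file_md5("/tmp/uda_demo.dat")
cmd = mmchattr_command("/tmp/uda_demo.dat", cksum, int(time.time()))
print(cksum)
```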
"Overload request on staging"
- Due to the system migration, all data in the new system starts in the purged state.
- We did not take system downtime for data staging.
  - data staging during operation: both admin and user staging
- GHI staging priority
  - initially: user staging > admin staging (ghi_stage, tape-ordered)
  - admin staging was not processed -> user staging piled up (a bad spiral)
  - the heavy staging load hit bugs; some bad spots were identified and patches applied
- Thoughts on data migration
  - enable D2D migration for staged data, late binding of disk/tape data?
  - runtime conversion between GPFS and HPSS could help (3.0.0)
What is the bottleneck? Staging performance in the long-term view
- Sep - Dec 2016: staged files / min (hourly averaged, GHI)
- We can see:
  - spikes of admin staging (HPSS cache -> GHI, thousands of files/min)
  - continuous staging of important data for three months (Sep-Nov)
  - low staging performance in some periods (< 5/min, see next)

[Chart: "Stage Speed" - staged files/min for fs01, fs02, and their sum, Sep-Dec 2016]
[Chart: "CPU usage" - cores*days per month, Apr 2016 - Mar 2017, by group: Belle, Belle2, Grid, Had, T2K, CMB, ILC, Others]
[Chart: "Mount Speed" - tape mounts/min for TS1140, TS1150, and their sum over ~3 days]

Tape mounts / min
- We have 54 TS1150 drives, but...
- Tape mounts are limited to about 4 tapes/min.
  - TS3500 library accessor spec: 15 s per (un)mount -> 60 / 15 = 4 tapes/min
  - well consistent with the observation
- ~4 files/min staging in the case of continuous requests on different tape media
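The mount-limit arithmetic above is worth making explicit: with a single library accessor taking 15 s per (un)mount, the mount rate, not the 54 drives, bounds the staging rate when every requested file sits on a different tape.

```python
# Back-of-the-envelope check of the mount-limited staging rate,
# using the figure from the TS3500 accessor spec on the slide.
ACCESSOR_SECONDS_PER_MOUNT = 15      # 15 s per (un)mount

mounts_per_min = 60 // ACCESSOR_SECONDS_PER_MOUNT
print(mounts_per_min)                # -> 4 tapes/min

# If every staged file lives on a different tape, the file rate is
# bound by the mount rate, regardless of the number of drives.
files_per_min = mounts_per_min * 1   # one file recalled per mount
print(files_per_min)                 # -> ~4 files/min
```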
[Charts: "#Readout requests vs. #Tapes in read" - number of readout requests and number of tapes being read, over ~3.5 months and zoomed into a 3-day window]
When a large number of staging requests concentrates on a few tapes, we observed performance degradation in data staging.

A matter of the HPSS tape staging queue
- HPSS supports tape-ordered recall: given a list of staging requests, it optimizes by tape order
- but apparently no real-time optimization?
- could TOR/ROA in 7.5.1 improve this?
Optimization of data recall
- Reduce tape mount frequency; TOR / ROA
- tool: ghi_stage (admin staging) with tape-ordered recall
- CR for ghi_stage priority: done
  - ghi_stage > user staging (initially: ghi_stage < user staging)

Plans
- Scenarios for bulk staging
  - Use case 1: users give stage file/directory lists
    - gather user staging requests (polling) -> ghi_stage in the background
  - Use case 2: cooperation with the job scheduler (LSF)
    - data prefetching before job dispatch; LSF provides a scheme for this purpose
    - once data is staged, the job is dispatched
- Quaid could help
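Use case 1 above can be sketched as a small gather-and-submit loop: pending user requests collected over a polling window are deduplicated, optionally grouped by tape, and handed to one bulk recall. `submit_ghi_stage` is a hypothetical stand-in for invoking ghi_stage, and the tape-grouping key is likewise an assumption.

```python
def gather_and_submit(request_log, submit_ghi_stage, tape_of=None):
    """Drain pending stage requests and submit them as one bulk recall.

    request_log: list of requested paths (may contain duplicates)
    tape_of:     optional key mapping a path to its tape, so the bulk
                 request is grouped tape by tape (tape-ordered recall)
    """
    pending = sorted(set(request_log),               # dedupe repeats
                     key=tape_of or (lambda p: p))   # group by tape if known
    request_log.clear()                              # window is drained
    if pending:
        submit_ghi_stage(pending)                    # single bulk request
    return len(pending)

# Illustration: three users request overlapping files in one window.
log = ["/hsm/runA/f1", "/hsm/runB/f7", "/hsm/runA/f1", "/hsm/runA/f2"]
batches = []
n = gather_and_submit(log, batches.append)
print(n, batches[0])   # -> 3 ['/hsm/runA/f1', '/hsm/runA/f2', '/hsm/runB/f7']
```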
LSF RTM
We monitor many parameters and statuses of HPSS / GHI:
- notification mechanism for detecting service down
- server performance (ES/Kibana)
- syslog / ZABBIX

HPSS / GHI log information is not well defined.
- We spent much time analyzing system troubles.
- CRs are continuously raised, ...

HPSS / GHI monitoring with ES / Kibana (planned)
- # staging files/min, # tape mounts/min, # staging requests vs. # tapes in read
- IOM status (time of the longest IOM job; so far not per thread)

[Chart: "IOM Longest Job" - seconds, per GHI node (hgi01-hgi04), Sep-Dec 2016]
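One of the planned metrics above (# staging files per minute) can be derived from staging-completion timestamps before shipping to Elasticsearch; a minimal aggregation sketch, with illustrative timestamps and bucketing:

```python
from collections import Counter

def files_per_minute(epoch_seconds):
    """Bucket staging-completion timestamps into per-minute counts."""
    return Counter(t // 60 for t in epoch_seconds)

# Illustration: five completions spread across two minutes.
events = [100, 110, 119, 165, 170]   # seconds since some epoch
counts = files_per_minute(events)
print(sorted(counts.items()))        # -> [(1, 3), (2, 2)]
```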
[Chart: number of tape failure incidents per month since the start of operation (combined), comparing the previous system and the new system over ~4 years]
Under investigation: we have no final conclusion yet.

Operations for media errors:
- try repack, read with hpss_stage
- data recovery from cache disk: most of the data was recovered, but the operational cost is not negligible

Analysis of drive dump records at IBM labs (Japan, Tucson); media checks by Fujifilm
- Most read errors are not reproduced.
- Drive firmware improves error correction.

A couple of problems were found in HPSS.
Problem 1: the last file cannot be read if a drive error happens during migration.
- fix to FM writing for the last file
- a local_mod will be applied soon
Problem 2: some tapes run a very long distance.

Seek to 39,802 FMs:   Seek time   Distance
  Before fix          23 min      8,400 m
  After fix           44 sec      420 m

HPSS fix:
  Before:  for (i = 1 to 38) space 1024 FMs
  After:   space 38*1024 FMs

- Migration speed could otherwise degrade.
- A local_mod will be applied soon.

End-of-life alerts:
- TS1140: with respect to the amount of read/write
- TS1150: with respect to running distance
- Media alerts increased.
Summary
- The new KEKCC system was launched in September 2016.
  - Computing resources increased based on the experiments' requirements: CPU 10k cores (x2.5), disk 13 PB (x1.8), tape 70 PB (x4.3)
- Storage requirements are very important for the coming experiments at KEK: large capacity, high speed, high scalability.
- Tape is still an important technology for us, from both hardware and software (HSM) points of view.
  - GHI is our HSM solution for large-scale data processing.
- Scalable data management is the challenge for the next 10 years.
  - The Belle II experiment will start in 2018.
  - Scale the system out toward exascale.
  - Coherent data management across the data-processing cycle (data taking, archiving, processing, preservation, ...)
  - Data migration is a potential concern.