Qiulan Huang, Gongxing Sun, Zhanchen Wei, Qiao Yan
Institute of High Energy Physics, CAS
ISGC 2018, Mar 23, 2018

• Overview of the Traditional Computing System
• Problems & Challenges
• New Computing System with Hadoop
• Activities and Evaluation
• Hadoop Status in LHAASO
• Summary

HTCondor: ~15,000 CPU cores, ~11 PB storage (Lustre/EOS)

• The traditional computing architecture has limitations in scalability, fault tolerance, and so on
  - Communication bottleneck: all data transmission passes through the central network switch
  - The failure of one I/O server may make the storage system unavailable
• Network I/O becomes the bottleneck for data-intensive jobs
• More money has to be spent on expensive facilities
• In the big data era, HEP experiments require new and intelligent computing technology

No powerful network, No expensive disk arrays

                Traditional architecture                    New architecture
Model           Data to computation                         Computation to data
Network         10 Gbps                                     1 Gbps
Storage         Disk array                                  Local disk
Data access     Through the network; limited by network     Local disk

Apache Hadoop
An open-source software framework for distributed storage and distributed processing of huge data sets
  - A highly reliable distributed file system (HDFS)
  - A parallel computing framework for large data sets (MapReduce)
  - Some tools: HBase, Hive, Pig, Spark, etc.
  - Widely adopted in the Internet industry
[Diagram: Clients, JobTracker, and three TaskTrackers. Each TaskTracker is linked to the JobTracker by a HeartBeat and runs MapTasks and ReduceTasks]
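The MapReduce model named above can be illustrated with a minimal single-process sketch. This is not Hadoop code: the function names and the sample records are assumptions made for illustration, and on a real cluster the map and reduce tasks would run on separate TaskTrackers.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input: (detector id, deposited energy) records.
hits = [("A", 1.5), ("B", 2.0), ("A", 0.5)]


def mapper(hit):
    detector, energy = hit
    return [(detector, energy)]


def reducer(detector, energies):
    return sum(energies)


totals = reduce_phase(shuffle(map_phase(hits, mapper)), reducer)
print(totals)
```

The three functions mirror the map, shuffle, and reduce stages; only the mapper and reducer are job-specific.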

• High scalability
  - A one-master cluster can reach 4000 nodes
• I/O-intensive jobs achieve higher CPU efficiency
  - Local data read/write
• Lower cost
  - Without powerful network equipment
  - Without expensive disk arrays
• Some HEP experiments have introduced Hadoop in scientific computing
• Widely used in industry, and commercial support is available from a number of companies
  - Three Hadoop software providers: Apache, Cloudera, Hortonworks
  - More than 150 companies are using it

• Hadoop uses streaming data access: it supports only sequential write and append, not random write
• Hadoop is written in Java, and C/C++ support is very limited
• HEP jobs read files via FUSE or other plugins
• The HDFS FUSE interface is weak

• A new data access mechanism designed and implemented
  - Supports random data access
  - Supports file modification in HDFS
• Data migration system
  - Moves data between HDFS and other storage systems
• User-friendly interface
  - Hides the underlying details, so users do not need to learn MapReduce programming
  - Only one job option file is needed, written according to the template

• One file one block
  - A file in ROOT format cannot be divided
• Local data read (job executes completely locally)
  - No data transmission
  - No network I/O
  - Low latency
[Diagram: local read path. Components: Client, NameNode, DataNode, DataNode daemon (HDFS service), Linux file system. Calls: getLocatedBlock, getPath, getBlockPath, read]
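Because each file occupies a single block, a whole input file lives on every node that holds a replica of that block, so a job can be placed where its file already is. The sketch below illustrates that idea under stated assumptions: the slides do not describe the actual scheduling policy, and the function name, node names, and file names are hypothetical.

```python
def schedule_local(file_locations, input_files):
    """Assign each input file to a node that already stores its single block.

    file_locations maps a file name to the list of nodes holding a replica.
    Among those nodes the least-loaded one is chosen (ties broken by name),
    so every job reads its input from local disk.
    """
    load = {}
    plan = {}
    for name in input_files:
        replicas = file_locations[name]
        node = min(replicas, key=lambda n: (load.get(n, 0), n))
        plan[name] = node
        load[node] = load.get(node, 0) + 1
    return plan


# Hypothetical replica map: file name -> DataNodes holding its one block.
locations = {
    "run001.root": ["dn1", "dn2"],
    "run002.root": ["dn1"],
    "run003.root": ["dn2", "dn3"],
}

plan = schedule_local(locations, ["run001.root", "run002.root", "run003.root"])
for name, node in plan.items():
    print(name, "->", node)
```

With a multi-block file this placement would not be enough, since some blocks could sit on other nodes and would have to be read over the network.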

• Distributed ROOT API writes to HDFS directly
• Local write if there is only one replica
  - No data transmission
  - No network I/O
  - Low latency
• Supports random write
[Diagram: write path in seven numbered steps (① to ⑦). Components: Client, NameNode, DataNode daemon (HDFS service), Linux file system, additional DataNodes. Step labels: getBlockPath, writeBlock, createFile, addBlock, Complete, Copy, blockReceivedReport]

• HDFS
  - 1 NameNode, 5 DataNodes (6 × 6 TB disks, RAID 5)
  - 1 Gigabit Ethernet
• Lustre
  - 1 metadata server, 5 OSS servers with 2 disk arrays (24 × 3 TB, RAID 6)
  - 10 Gigabit Ethernet

• ROOT tool
  - ROOT write: $ROOTSYS/test/Event EventNumber 0 1 1
  - ROOT read: $ROOTSYS/test/Event EventNumber 0 1 20
[Figures: "ROOT Write" and "ROOT Sequence Read". x-axis: Event Number (1000 to 50000); y-axis: Time/s; series: HDFS, Lustre]
• Compared with Lustre, HDFS write performance is 10% better and read performance is 2 to 3 times higher

• Real jobs (HDFS vs. Lustre)
  - Cosmic ray simulation job (corsika)
  - Detector simulation job (Geant4)
  - ARGO reconstruction job (medea++)
• Results and analysis
  - The CPU efficiency of the CPU-intensive jobs (corsika and Geant4) is up to 100%; the performance of HDFS and Lustre is comparable
  - The CPU efficiency of the I/O-intensive job (medea++) is 100% with HDFS but 67% with Lustre
  - I/O-intensive jobs need heavy I/O over the network, and the Lustre client service adds system overhead, which slows job execution

• Job execution time
  - Measured the execution time of the medea++ job
  - The running time on HDFS is one third of that on Lustre
[Figure: "Running Time". x-axis: Job (1 to 24); y-axis: Time/s (0 to 2000); series: HDFS, Lustre]

A good supplement to the Hadoop computing cluster in HEP
• Provides import mode and output mode
• Moves data between HDFS and other storage systems
• MapReduce & GridFTP
• High parallelism
  - Data transfer performance is up to 115 MB/s per DataNode
[Diagram: migration system architecture. System front ends: CLI, web pages. Task management layer: regular migration, extemporary migration, task pretreatment, task dispatch. Task monitoring layer. Migration service layer. Data migration layer: import mode and output mode, with Map tasks, DataSet split, GridFTP servers, path-map servers, file creation, and file reading between HDFS and the other system]
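The per-DataNode figure above allows a rough estimate of how long a migration would take. The sketch below assumes that throughput scales linearly with the number of DataNodes and that the 115 MB/s peak is sustained, neither of which the slides state; the function name and the example sizes are hypothetical. For comparison, a 1 Gbit/s link carries at most 125 MB/s.

```python
PER_NODE_MB_PER_S = 115.0  # peak per-DataNode transfer rate quoted on the slide


def estimate_seconds(size_tb, datanodes, per_node_mb_per_s=PER_NODE_MB_PER_S):
    """Estimate the transfer time in seconds for size_tb terabytes.

    Assumes decimal units (1 TB = 1,000,000 MB) and linear scaling
    across DataNodes, which is an idealization.
    """
    if datanodes < 1:
        raise ValueError("at least one DataNode is required")
    aggregate_mb_per_s = per_node_mb_per_s * datanodes
    return size_tb * 1_000_000 / aggregate_mb_per_s


# Example: 1 TB moved by 5 DataNodes working in parallel.
seconds = estimate_seconds(1.0, 5)
print(f"{seconds / 60:.1f} minutes")
```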

• Submit jobs
  hsub + queue + jobType + jobOptionFile + jobname
• Descriptions
  - queue: queue name (ybj, default)
  - jobType: MC (simulation job), REC (reconstruction job), DA (analysis job)
  - jobOptionFile: job option file
  - jobname: job name
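The submission syntax above can be sketched as a small helper that validates the job type and assembles the command. It assumes the four fields are passed as space-separated positional arguments in the order shown on the slide, which the slide does not confirm; the helper name, option file name, and job name are hypothetical.

```python
import shlex

# Job types listed on the slide: simulation, reconstruction, analysis.
VALID_JOB_TYPES = {"MC", "REC", "DA"}


def build_hsub_command(queue, job_type, job_option_file, job_name):
    """Return the hsub argument list in the order given on the slide.

    The positional layout is an assumption; only the field order
    (queue, jobType, jobOptionFile, jobname) comes from the source.
    """
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    return ["hsub", queue, job_type, job_option_file, job_name]


# Hypothetical example: a simulation job in the ybj queue.
cmd = build_hsub_command("ybj", "MC", "corsika_job.opt", "corsika_test")
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps file names with spaces intact if it is later handed to `subprocess.run`.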


• LHAASO (Large High Altitude Air Shower Observatory)
• Studies problems in Galactic cosmic ray physics
• ~2 PB raw data per year
• Started to take data in 2018

• Hadoop cluster
  - 5 login nodes, 1 master node, and 5 computing nodes; link: 1 Gigabit
  - 120 CPU cores, 140 TB storage
  - Cosmic ray simulation (corsika), ARGO detector simulation (Geant4) and KM2A
• Node hardware
  - 2U HP ProLiant DL380 Gen9: 2 Intel Xeon E5-2630 CPUs (2.4 GHz, 8 cores), 64 GB RAM, 1 Gigabit
  - 2U HP ProLiant DL380 Gen9: 2 Intel Xeon E5-2680 CPUs (2.5 GHz, 12 cores), 64 GB RAM, 6 × 6 TB disks, 1 Gigabit

• Capacity
  - 119 TB used (88%)
• Job statistics (2017)
  - 20,225 jobs (502,341 tasks)
  - ~212,730 CPU hours
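A few derived averages help put these totals in perspective. The sketch below only divides the numbers quoted above; the variable names are illustrative, and the implied capacity assumes the 88% figure is relative to the usable HDFS capacity.

```python
# Figures quoted on the slide.
used_tb = 119.0
used_fraction = 0.88
jobs = 20_225
tasks = 502_341
cpu_hours = 212_730.0  # approximate, as stated on the slide

# Derived values (simple ratios, not measurements).
implied_capacity_tb = used_tb / used_fraction
tasks_per_job = tasks / jobs
cpu_hours_per_job = cpu_hours / jobs
cpu_minutes_per_task = cpu_hours * 60 / tasks

print(f"implied capacity: {implied_capacity_tb:.0f} TB")
print(f"tasks per job: {tasks_per_job:.1f}")
print(f"CPU hours per job: {cpu_hours_per_job:.1f}")
print(f"CPU minutes per task: {cpu_minutes_per_task:.1f}")
```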


• Starting to introduce Spark into partial wave analysis
• Studying an in-memory data-sharing mechanism based on Alluxio

• Successfully applied in the LHAASO experiment
• Reduces the cost of facilities
• Greatly improves the CPU efficiency of I/O-intensive jobs
• Data migration tool integrated into the existing Hadoop cluster for IHEP users
• A user-friendly interface is provided
• Plan to extend the solution to the Ali experiment
• Plan to introduce Spark to HEP data analysis

Thank you! Any questions?
