Qiulan Huang, Gongxing Sun, Zhanchen Wei, Qiao Yan Institute of High - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Qiulan Huang, Gongxing Sun, Zhanchen Wei, Qiao Yan

Institute of High Energy Physics, CAS ISGC 2018 Mar 23, 2018

slide-2
SLIDE 2

  • Overview of the Traditional Computing System
  • Problems & Challenges
  • New Computing System with Hadoop
  • Activities and Evaluation
  • Hadoop Status in LHAASO
  • Summary

slide-3
SLIDE 3

~15,000 CPU cores

~11 PB of storage (Lustre/EOS)

HTCondor

slide-4
SLIDE 4

  • Traditional computing architecture has limitations in scalability, fault tolerance, and related areas
      • Communication bottleneck: all data transmission passes through the central network switch
      • The failure of a single I/O server may make the storage system unavailable
  • Network I/O becomes the bottleneck for data-intensive jobs
  • More money has to be spent on expensive facilities
  • In the big data era, HEP experiments require new and intelligent computing technology

slide-5
SLIDE 5

No powerful network, No expensive disk arrays

slide-6
SLIDE 6

               Traditional architecture                      New architecture
               (data to computation)                         (computation to data)
Network        10 Gbps                                       1 Gbps
Storage        Disk array                                    Local disk
Data access    Through the network, limited by the network   Local disk
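The comparison above can be made concrete with a little arithmetic. The sketch below is only an illustration: the job count and the 150 MB/s local-disk throughput are assumed values, not measurements from the talk, and the function names are hypothetical.

```python
def per_job_network_mb_s(link_gbps: float, jobs: int) -> float:
    """Bandwidth each job gets when `jobs` jobs share one network link evenly."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    link_mb_s = link_gbps * 1000 / 8  # Gbps -> MB/s, decimal units
    return link_mb_s / jobs


def per_job_local_mb_s(disk_mb_s: float, jobs_per_disk: int = 1) -> float:
    """Bandwidth each job gets when it reads from a disk on its own node."""
    if jobs_per_disk <= 0:
        raise ValueError("jobs_per_disk must be positive")
    return disk_mb_s / jobs_per_disk


if __name__ == "__main__":
    jobs = 100             # assumed number of concurrent I/O-intensive jobs
    disk_mb_s = 150.0      # assumed throughput of one local disk
    print("shared 10 Gbps link:", per_job_network_mb_s(10, jobs), "MB/s per job")
    print("local disk:         ", per_job_local_mb_s(disk_mb_s), "MB/s per job")
```

Under these assumptions the shared link becomes the limit as the job count grows, while the local-disk figure stays constant per node.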

slide-7
SLIDE 7

Apache Hadoop

An open-source software framework for the distributed storage and distributed processing of very large data sets

  • A highly reliable distributed file system (HDFS)
  • A parallel computing framework for large data sets (MapReduce)
  • Related tools: HBase, Hive, Pig, Spark, etc.
  • Widely adopted in the Internet industry

[Diagram: Clients submit jobs to the JobTracker; each TaskTracker runs MapTasks and ReduceTasks and sends heartbeats to the JobTracker]
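The MapTask and ReduceTask roles can be illustrated with a minimal in-process sketch of the map, shuffle, and reduce phases. This is plain Python, not the Hadoop API, and the word-count workload and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def run_job(records: List[str]) -> Dict[str, int]:
    """Run map over every record, shuffle, then reduce every key."""
    mapped = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks; here they simply run in a loop.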

slide-8
SLIDE 8

  • High scalability
      • A cluster with a single master can reach 4,000 nodes
  • I/O-intensive jobs achieve higher CPU efficiency
      • Local data read/write
  • Lower cost
      • No powerful network equipment needed
      • No expensive disk arrays needed
  • Some HEP experiments have introduced Hadoop into scientific computing
  • Widely used in industry, with commercial support available from a number of companies
      • Three Hadoop software providers: Apache, Cloudera, Hortonworks
      • More than 150 companies use it

slide-9
SLIDE 9

  • Hadoop uses streaming data access: it supports only sequential write and append, not random write
  • Hadoop is written in Java, and its C/C++ support is very limited
  • HEP jobs read files via FUSE or other plugins
  • The HDFS FUSE interface is weak
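The write restriction in the first point can be pictured with a toy in-memory model. This is not HDFS code: the class and method names are hypothetical, and the model only mimics the "append yes, overwrite no" behaviour.

```python
class AppendOnlyFile:
    """Toy model of a file that accepts appends but rejects overwrites."""

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, chunk: bytes) -> int:
        """Add bytes at the end of the file and return the new length."""
        self._data.extend(chunk)
        return len(self._data)

    def write_at(self, offset: int, chunk: bytes) -> None:
        """Allow a write only when it lands exactly at the end of the file."""
        if offset != len(self._data):
            raise OSError("random write not supported; only append is allowed")
        self.append(chunk)

    def read(self, offset: int, length: int) -> bytes:
        """Reads may start at any offset."""
        return bytes(self._data[offset:offset + length])


if __name__ == "__main__":
    f = AppendOnlyFile()
    f.append(b"abc")
    f.write_at(3, b"de")      # accepted: offset equals current length
    print(f.read(0, 5))
    try:
        f.write_at(1, b"x")   # rejected: would overwrite existing bytes
    except OSError as err:
        print("rejected:", err)
```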

slide-10
SLIDE 10
  • A new data access method was designed and implemented
      • Supports random data access
      • Supports file modification in HDFS
  • Data migration system
      • Moves data between HDFS and other storage systems
  • User-friendly interface
      • Hides the underlying details, so users do not need to learn MapReduce programming
      • Only one job option file is needed, written from a template
slide-11
SLIDE 11
  • One file, one block
  • A file in ROOT format cannot be divided
  • Local data read (job execution is completely localized)
  • No data transmission
  • No network I/O
  • Low latency

[Diagram: read path. Components: Client, HDFS Service, NameNode, DataNode, DataNode Daemon, Linux File System. Calls: getPath, getLocatedBlock, getBlockPath, read]
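The call names on this slide (getPath, getLocatedBlock, getBlockPath, read) suggest a lookup followed by a plain local file read. The toy model below is one possible reading of that sequence, not the authors' implementation and not the HDFS API: the classes, the dictionary layout, and the example path are all hypothetical, and getPath is represented simply by the path argument.

```python
import os
import tempfile
from typing import Dict


class ToyNameNode:
    """Maps a file path to the single block that holds the whole file."""

    def __init__(self) -> None:
        self._blocks: Dict[str, str] = {}

    def add_file(self, path: str, block_id: str) -> None:
        self._blocks[path] = block_id

    def get_located_block(self, path: str) -> str:
        return self._blocks[path]


class ToyDataNode:
    """Maps a block id to the block file on the local Linux file system."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def store_block(self, block_id: str, local_path: str) -> None:
        self._paths[block_id] = local_path

    def get_block_path(self, block_id: str) -> str:
        return self._paths[block_id]


def local_read(namenode: ToyNameNode, datanode: ToyDataNode,
               path: str, offset: int, length: int) -> bytes:
    """Resolve the block, then read it as an ordinary local file."""
    block_id = namenode.get_located_block(path)      # getLocatedBlock
    local_path = datanode.get_block_path(block_id)   # getBlockPath
    with open(local_path, "rb") as f:                # read, no network involved
        f.seek(offset)
        return f.read(length)


def demo() -> bytes:
    """Store one block in a temporary file and read 7 bytes from offset 8."""
    fd, local_path = tempfile.mkstemp(suffix=".blk")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"event-0;event-1;event-2;")
        namenode, datanode = ToyNameNode(), ToyDataNode()
        namenode.add_file("/demo/run1.root", "blk_0001")
        datanode.store_block("blk_0001", local_path)
        return local_read(namenode, datanode, "/demo/run1.root", 8, 7)
    finally:
        os.remove(local_path)


if __name__ == "__main__":
    print(demo())
```

Because the file occupies exactly one block, a single lookup is enough and the seek can target any offset inside the file.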

slide-12
SLIDE 12
  • The ROOT API writes to HDFS directly
  • Local write when there is only one replica
  • No data transmission
  • No network I/O
  • Low latency
  • Supports random write

[Diagram: write path in seven numbered steps (① to ⑦). Components: Client, Distributed FileSystem, HDFS Service, NameNode, DataNode, DataNode Daemon, Linux File System. Calls: createFile, addBlock, getBlockPath, writeBlock, Complete, Copy, blockReceivedReport]
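One plausible reading of the getBlockPath and writeBlock steps is that the client writes into the block file on the local Linux file system, where overwriting at an arbitrary offset is an ordinary operation. The sketch below shows only that local-file part; it is an assumption about the mechanism, and the function names and temporary file are hypothetical.

```python
import os
import tempfile


def write_at(block_path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at `offset` in an existing local block file."""
    with open(block_path, "r+b") as f:  # r+b: update in place, do not truncate
        f.seek(offset)
        f.write(data)


def demo() -> bytes:
    """Create a 10-byte block file, overwrite 3 bytes in the middle, read it back."""
    fd, block_path = tempfile.mkstemp(suffix=".blk")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"0123456789")
        write_at(block_path, 3, b"XYZ")
        with open(block_path, "rb") as f:
            return f.read()
    finally:
        os.remove(block_path)


if __name__ == "__main__":
    print(demo())
```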

slide-13
SLIDE 13

  • HDFS
      • 1 NameNode, 5 DataNodes (6 × 6 TB disks, RAID 5)
      • 1 Gigabit Ethernet
  • Lustre
      • 1 metadata server, 5 OSS servers with 2 disk arrays (24 × 3 TB, RAID 6)
      • 10 Gigabit Ethernet

slide-14
SLIDE 14

  • ROOT tool
      • ROOT write: $ROOTSYS/test/Event EventNumber 0 1 1
      • ROOT read: $ROOTSYS/test/Event EventNumber 0 1 20

[Figure: two plots, "ROOT Write" and "ROOT Sequence Read". X axis: event number (1000 to 50000); Y axis: time (s); series: HDFS, Lustre]

  • Compared with Lustre, HDFS write performance is 10% better and read performance is 2 to 3 times higher

slide-15
SLIDE 15

  • Real jobs
      • Cosmic ray simulation job (corsika)
      • Detector simulation job (Geant4)
      • ARGO reconstruction job (medea++)
  • Results and analysis
      • The CPU efficiency of the CPU-intensive jobs (corsika and Geant4) reaches up to 100%; HDFS and Lustre perform comparably
      • The CPU efficiency of the I/O-intensive job (medea++) is 100% with HDFS but 67% with Lustre
      • The I/O-intensive job needs heavy I/O over the network, and the Lustre client service adds system overhead; both affect job execution

[Figure: two panels, HDFS and Lustre]
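The slide does not define CPU efficiency. The sketch below assumes the usual batch-system definition, CPU time divided by wall-clock time; the function name and the sample timings are hypothetical, chosen only so that the second case lands near the 67% quoted for Lustre.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of the wall-clock time that the job spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


if __name__ == "__main__":
    # Hypothetical timings: the same 1000 s of CPU work, with and without I/O wait.
    print(f"no I/O wait:    {cpu_efficiency(1000, 1000):.0%}")
    print(f"500 s I/O wait: {cpu_efficiency(1000, 1500):.0%}")
```

Under this definition, any time a job spends waiting for data over the network lowers its efficiency even though the CPU work is unchanged.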

slide-16
SLIDE 16

  • Job execution time
      • Measured the execution time of medea++ jobs
      • Execution time on HDFS is one third of that on Lustre

[Figure: "Running Time". X axis: job (1 to 24); Y axis: time (s), 500 to 2000; series: HDFS, Lustre]

slide-17
SLIDE 17

  • A good supplement to the Hadoop computing cluster in HEP
  • Provides import mode and output mode
  • Moves data between HDFS and other storage systems
  • MapReduce & GridFTP
      • High parallelism
      • Data transfer performance of up to 115 MB/s per DataNode

[Diagram: data migration system architecture. System front-ends: CLI, web pages. Task monitoring layer. Task management layer: task pretreatment, task dispatch. Data migration layer: Map tasks on servers transferring data between HDFS and other systems over GridFTP (115 MB/s). Migration service layer: import mode, output mode, split, DataSet file creation, path-map, file reading, regular migration, extemporary migration]
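The 115 MB/s per-DataNode figure allows a rough estimate of migration time. The sketch below assumes that throughput scales linearly with the number of DataNodes and that every node sustains the peak rate, neither of which the slide claims; the function name and the 10 TB example are hypothetical.

```python
def migration_hours(data_tb: float, datanodes: int,
                    mb_s_per_node: float = 115.0) -> float:
    """Estimated hours to move `data_tb` terabytes using all DataNodes in parallel."""
    if datanodes <= 0 or mb_s_per_node <= 0:
        raise ValueError("datanodes and mb_s_per_node must be positive")
    total_mb = data_tb * 1_000_000            # TB -> MB, decimal units
    aggregate_mb_s = datanodes * mb_s_per_node
    return total_mb / aggregate_mb_s / 3600


if __name__ == "__main__":
    # 10 TB over 5 DataNodes at 115 MB/s each: about 4.8 hours.
    print(f"{migration_hours(10, 5):.1f} hours")
```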

slide-18
SLIDE 18

  • Submit jobs

hsub + queue + jobType + jobOptionFile + jobname

  • Descriptions
      • queue: queue name (ybj, default)
      • jobType: MC (simulation job), REC (reconstruction job), DA (analysis job)
      • jobOptionFile: job option file
      • jobname: job name
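A small helper can assemble and check an hsub command line before submission. This sketch assumes that the "+" signs on the slide stand for space-separated positional arguments in the order shown; the helper name, the option file name, and the job name are hypothetical, and the command is only built, not executed.

```python
import shlex
from typing import List

# Job types listed on the slide.
JOB_TYPES = {"MC", "REC", "DA"}


def build_hsub_command(queue: str, job_type: str,
                       job_option_file: str, job_name: str) -> List[str]:
    """Return the argument list for an hsub call, validating the job type."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type {job_type!r}; expected one of {sorted(JOB_TYPES)}")
    return ["hsub", queue, job_type, job_option_file, job_name]


if __name__ == "__main__":
    argv = build_hsub_command("ybj", "MC", "corsika_job.opt", "sim_test")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting problems.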

slide-19
SLIDE 19

18-3-22

slide-20
SLIDE 20

  • LHAASO (Large High Altitude Air Shower Observatory)
      • Studies problems in Galactic cosmic ray physics
      • ~2 PB of raw data per year
      • Started taking data in 2018

slide-21
SLIDE 21

  • Hadoop cluster
      • 5 login nodes, 1 master node, and 5 computing nodes; link: 1 Gigabit
      • 120 CPU cores, 140 TB storage
      • Cosmic ray simulation (corsika), ARGO detector simulation (Geant4), and KM2A
  • Hardware
      • 2U HP ProLiant DL380 Gen9: 2 Intel Xeon E5-2630 CPUs (2.4 GHz, 8 cores), 64 GB RAM, 1 Gigabit
      • 2U HP ProLiant DL380 Gen9: 2 Intel Xeon E5-2680 CPUs (2.5 GHz, 12 cores), 64 GB RAM, 6 × 6 TB disks, 1 Gigabit

slide-22
SLIDE 22

  • Capacity
      • 119 TB used (88%)
  • Job statistics (2017)
      • 20,225 jobs (502,341 tasks)
      • ~212,730 CPU hours
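A few averages follow directly from these totals. The sketch below only divides the numbers on the slide; the function name and the derived field names are hypothetical, and the capacity figure assumes that 88% refers to the fraction of total capacity in use.

```python
from typing import Dict


def summarize(jobs: int, tasks: int, cpu_hours: float,
              used_tb: float, used_fraction: float) -> Dict[str, float]:
    """Derive per-job and per-task averages and the implied total capacity."""
    return {
        "tasks_per_job": tasks / jobs,
        "cpu_hours_per_job": cpu_hours / jobs,
        "cpu_hours_per_task": cpu_hours / tasks,
        "total_capacity_tb": used_tb / used_fraction,
    }


if __name__ == "__main__":
    stats = summarize(jobs=20_225, tasks=502_341, cpu_hours=212_730,
                      used_tb=119, used_fraction=0.88)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```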


slide-23
SLIDE 23


slide-24
SLIDE 24

  • Starting to introduce Spark into partial wave analysis
  • Studying an in-memory data-sharing mechanism based on Alluxio

slide-25
SLIDE 25

  • Successfully applied in the LHAASO experiment
  • Reduces the cost of facilities
  • Greatly improves the CPU efficiency of I/O-intensive jobs
  • Data migration tool integrated into the existing Hadoop cluster for IHEP users
  • A friendly interface is provided
  • Plan to extend the solution to the Ali experiment
  • Plan to introduce Spark into HEP data analysis

slide-26
SLIDE 26

Thank you! Any questions?