
Computing at and Grid Application for Belle Experiment

S. Nishida (KEK)

ISGC2006 @ Academia Sinica, Taipei May 3, 2006


Contents

  • Introduction
  • New B Factory Computer System in KEK
  • Grid Application at Nagoya University
  • Conclusion

Introduction

Belle Experiment: a “B factory” experiment at KEK (Japan).

  • Asymmetric e+e− collider (3.5 GeV e+ on 8 GeV e−)
  • e+e− → ϒ(4S) → BB (1.1 nb)
  • Circumference 3 km
  • World's highest luminosity
  • Maximum beam current: LER (e+) 2.0 A, HER (e−) 1.36 A

[Aerial photo: the KEKB collider ring (3 km), the linac, and the Belle detector, near Mt. Tsukuba]

Belle Detector

A general-purpose detector for various B/charm/τ physics.


Belle Collaboration

13 countries, 57 institutes, ~400 collaborators:

Aomori U., BINP, Chiba U., Chonnam Nat'l U., U. of Cincinnati, Ewha Womans U., Frankfurt U., Gyeongsang Nat'l U., U. of Hawaii, Hiroshima Tech., IHEP Beijing, IHEP Moscow, IHEP Vienna, ITEP, Kanagawa U., KEK, Korea U., Krakow Inst. of Nucl. Phys., Kyoto U., Kyungpook Nat'l U., EPF Lausanne, Jozef Stefan Inst. / U. of Ljubljana / U. of Maribor, U. of Melbourne, Nagoya U., Nara Women's U., National Central U., Nat'l Kaohsiung Normal U., National Taiwan U., National United U., Nihon Dental College, Niigata U., Osaka U., Osaka City U., Panjab U., Peking U., U. of Pittsburgh, Princeton U., Riken, Saga U., USTC, Seoul National U., Shinshu U., Sungkyunkwan U., U. of Sydney, Tata Institute, Toho U., Tohoku U., Tohoku Gakuin U., U. of Tokyo, Tokyo Inst. of Tech., Tokyo Metropolitan U., Tokyo U. of Agri. and Tech., Toyama Nat'l College, U. of Tsukuba, Utkal U., VPI, Yonsei U.

Lots of contributions from Taiwan; more contributions in the computing area, please!


Luminosity

KEKB produces a large amount of B mesons!! Peak luminosity 1.63 × 10^34 cm^-2 s^-1, up to ~1 fb^-1 per day; 570 fb^-1 integrated so far (1 fb^-1 ≈ 10^6 BB pairs).

  • We will install crab cavities this summer, which (hopefully) will double the luminosity.

[Plot: integrated luminosity (fb^-1) vs. time, June 1999 to June 2006, rising to ~570 fb^-1]


Results from Belle

The Unitarity Triangle has been precisely determined!! (φ1 = β, φ2 = α, φ3 = γ)

  • sin 2φ1 from b → ccs
  • φ2 from B → ππ, ρπ, ρρ
  • φ3 from B → DK Dalitz analysis (B+/B−)
  • |Vub| from b → ulν

  • Success of the B factory experiments!
  • Various B decay modes are studied (also charm, τ)

Results from Belle

  • Precise measurements of the elements of the CKM triangle.
  • Observation of / search for rare decays (e.g. B → ργ, τ → µγ).

Higher luminosity (a larger amount of data) opens the possibility of various interesting measurements and studies of New Physics.

Example: the full reconstruction technique

  • Reconstruct one of the two B mesons.
  • Useful for B decays with neutrinos; led to the observation of B → τν.
  • Needs an enormous number of B meson pairs, and hence CPU power, disks...


Luminosity Scenario

Toward Super KEKB: the integrated luminosity will reach 1~3 /ab in the coming several years (we are here, at 570 /fb).

  • The luminosity almost doubles every year, and the necessary computing resources double with it.

[Plot: projected integrated luminosity for the present KEKB and Super KEKB]


New B Factory Computing System

The new B Factory Computer System has just started operation on March 23, 2006!! In order to deal with the increasing data, we have moved from expensive, most-reliable components to less expensive, reasonably reliable ones:

  • Solaris to Linux
  • direct-access tape storage to HSM (hierarchical storage management)
  • fixed to extensible
  • closed to reasonably open

Comparison with old systems

Performance \ Year                       1997- (4 years)   2001- (5 years)   2006- (6 years)
Work Group Server [# of hosts]           3+(9)             11                80 + 16 FS
User Workstation [# of hosts]            25 WS + 68 X      23 WS + 100 PC    128 PC
Computing Server [SPECint2000 rate]      ~100 (WS)         ~1,250 (WS+PC)    ~42,500 (PC)
Disk Capacity [TB]                       ~4                ~9                1,000 (1 PB)
Tape Library Capacity [TB]               160               620               3,500 (3.5 PB)

  • Great improvement! (cf. Moore's Law: ×2 per 1.5 years, i.e. ×~6.3 in 4 years, ×~10 in 5 years; see the sketch below)
  • An upgrade is planned in 2009, though the contract runs 6 years.
  • Belle has additional computing resources (next page).
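
As a quick cross-check of the Moore's-Law factors quoted above (an illustration added here, not from the original slides), a short Python snippet comparing them with the improvement ratios taken from the table:

    # Moore's Law: performance doubles roughly every 1.5 years.
    def moore_factor(years, doubling_time=1.5):
        return 2 ** (years / doubling_time)

    print(f"4 years: x{moore_factor(4):.1f}")    # ~6.3
    print(f"5 years: x{moore_factor(5):.1f}")    # ~10.1

    # Achieved computing-server improvement (SPECint2000 rate)
    # from the table: ~100 (1997), ~1,250 (2001), ~42,500 (2006).
    print(f"1997->2001: x{1250 / 100:.1f}")      # x12.5, ahead of Moore's Law
    print(f"2001->2006: x{42500 / 1250:.1f}")    # x34.0, far ahead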

Additional (Belle's) Resources

We now have a high-performance computer system, but we didn't suddenly switch to the “less expensive” system; we have been testing such components for several years:

  • Linux-based PC clusters: 934 CPUs
  • S-ATA disk based RAID drives: 350 TB (20 units / 20 TB)
  • S-AIT tape drives: 1.5 PB

(For comparison, the new B computer: 2280 CPUs, 1000 TB disks, 3.5 PB tapes.)

These resources have been essential for Belle (production/analysis).


Overview of the New B Computer

[Diagram: system overview showing storage, computing servers, workgroup servers (some reserved for Grid), and the on-line reconstruction farm]


Computing Servers

  • DELL PowerEdge 1855: Xeon 3.6 GHz × 2, 1 GB memory; made in Taiwan [Quanta]
  • WG: 80 servers (for login), Linux (RHEL)
  • CS: 1128 servers, Linux (CentOS)
  • Total: 45662 SPECint2000 rate, equivalent to 8.7 THz. CPU capacity will be increased by ×2.5 (i.e. to ~110000 SPECint2000 rate) in 2009.
  • Packaging: 1 enclosure = 10 nodes in 7U; 1 rack = 50 nodes


Storage System (Disk)

  • Total 1 PB with 42 file servers (1.5 PB in 2009)
  • SATA-II 500 GB disks × ~2000 (~1.8 failures/day?)
  • 3 types of RAID (to avoid problems):
    • ADTX ArrayMasStor LP: 15 drives / 3U / 7.5 TB
    • Nexsan SATABeast: 42 drives / 4U / 21 TB
    • SystemWorks MASTER RAID B1230: 16 drives / 3U / 8 TB (made in Taiwan)
  • HSM = 370 TB, non-HSM = 630 TB


Storage System (Tape)

  • Backup:
    • 90 TB + 12 drives + 3 servers
    • LTO3, 400 GB/volume
    • NetVault
  • HSM: PetaSite (SONY)
    • 3.5 PB + 60 drives + 13 servers
    • SAIT, 500 GB/volume
    • 30 MB/s per drive
    • PetaServe

Usage of the B Computer

  • On-line reconstruction farm → raw data + “DST” data, stored on HSM.
  • Production → “MDST” data (four-vectors, PID info etc.), stored on non-HSM; used for users' analyses.
  • MC production.

All the numbers are for 500 fb^-1: hadron data 120 TB + others ~1 PB; 2.5 THz to finish production in 6 months; 2 THz to finish MC production in 2 months.

The new system has sufficient CPU and storage resources (at least for now). The location of the data files is managed by a postgres database; a sketch of such a lookup follows.
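
The slides do not describe the database schema; purely as an illustration, a minimal Python sketch of how such a file-location catalogue might be queried, assuming a hypothetical table datafiles(exp, run, file_path, storage) and the psycopg2 client:

    import psycopg2  # PostgreSQL client library

    # Hypothetical schema: datafiles(exp, run, file_path, storage),
    # where storage is 'HSM' or 'non-HSM'. All names are illustrative.
    conn = psycopg2.connect(dbname="belle_files", host="dbserver.example")
    cur = conn.cursor()
    cur.execute(
        "SELECT file_path, storage FROM datafiles "
        "WHERE exp = %s AND run BETWEEN %s AND %s",
        (49, 1000, 1100),
    )
    for file_path, storage in cur.fetchall():
        print(f"{storage}: {file_path}")
    conn.close()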


Usage of the B Computers

[Diagram: layout of workgroup servers, workfile servers, LSF clusters, and storage]

  • Workgroup servers: 80 servers (user login), ~5 persons/server.
  • Workfile servers: 16 servers, 80 TB (for users' home directories).
  • Computing servers (CS): ~1200 servers, organized in 3 LSF (batch system) clusters.
  • Storage, non-HSM: 1 PB (e.g. hadron data).
  • Storage, HSM: 3.5 PB (e.g. raw data).

Data are transferred from the storage servers to the CS using a Belle home-grown simple TCP/socket application; a sketch of the idea follows.
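
The slides give no details of that home-grown application; as a rough illustration only, a minimal Python sketch of the client side of such a transfer, with a hypothetical host, port, and one-line request protocol:

    import socket

    def fetch_file(host, port, remote_path, local_path, chunk=1 << 20):
        """Request remote_path from a (hypothetical) file server and
        stream the reply to local_path, 1 MB at a time."""
        with socket.create_connection((host, port)) as sock:
            # Illustrative one-line request protocol: "GET <path>\n".
            sock.sendall(f"GET {remote_path}\n".encode())
            with open(local_path, "wb") as out:
                while True:
                    data = sock.recv(chunk)
                    if not data:      # server closed: transfer done
                        break
                    out.write(data)

    # Example (hypothetical host and path):
    # fetch_file("storage01.example", 9000, "/hsm/raw/e049/r1000.dat", "r1000.dat")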


Status and Plan

Now still at the stage of setting up the environment:

  • Transferring existing data from the old system.
  • User environment not prepared yet.
  • Full operation around summer.

Grid activity at Belle:

  • With help from the KEK Computing Research Center, we have applied for a Belle VO.
  • On the KEK pre-production LCG site, we have successfully run the Belle simulation.
  • Several servers in the new B computer are reserved for Grid study, but are not used yet.
  • Remote institutes will gain the most advantage; some institutes (Australia, Taiwan, Nagoya...) are already involved.


Computing at Nagoya Univ.

Nagoya is the institute with the largest computing resources for Belle (except KEK).

  • 270 GHz equivalent Linux PC clusters.
  • 400 TB Virtual Disk systems (Fujitsu VD800 + LT270).
  • LTO tapes with cache disk (4.5 TB).
  • 1 Gbps + a few Gbps networks.

Newly introduced (Jan 2006):

  • 900 GHz equivalent Linux PC clusters.
  • >130 TB RAID disks.
  • 1 Gbps networks.
  • Direct connection to the KEK B computer (1 Gbps).
  • Batch queue system using Sun Grid Engine (a submission sketch follows).
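
The slides do not show how jobs are submitted to that queue; purely as an illustration, a minimal Sun Grid Engine submission driven from Python, assuming the standard qsub command and a hypothetical analysis executable:

    import subprocess, textwrap

    # A minimal SGE job script; the analysis command is hypothetical.
    job_script = textwrap.dedent("""\
        #!/bin/sh
        # SGE directives: job name, run in cwd, merge stderr into stdout.
        #$ -N belle_ana
        #$ -cwd
        #$ -j y
        #$ -o belle_ana.log
        ./run_belle_analysis --input /data/mdst/r1000.mdst
    """)

    with open("belle_ana.sh", "w") as f:
        f.write(job_script)

    # Hand the script to the SGE scheduler.
    subprocess.run(["qsub", "belle_ana.sh"], check=True)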

Computing at Nagoya Univ.

  • Analysis of Belle data.
  • Monte Carlo production for Belle.
  • Development for a new detector.

Target: an efficient data management system.

  • User- and manager-friendly.
  • Data can be copied without any change for the user.
  • Fast recovery from disk faults.

Typical usage in Belle:

  • Read data on the file servers from many PCs.
  • Write output data to the file servers.

SRB (Storage Resource Broker)

  • Storage management system.
  • Advantages:
    • File locations are recorded in a DB automatically: no special operation (e.g. editing a web page) is needed, and users can search with “ls”-like commands (see the sketch after this list).
    • Real data can be moved by replication: the manager can copy (or move) data without any change for the user, so it is easy to change the location.
    • In case of data loss, the system switches to a replica: fast recovery from disk faults.
    • Users can read data at KEK easily, with no need to copy the data; KEK users can likewise read data at Nagoya.
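
As a flavor of those “ls”-like commands (an illustration added here, assuming the standard SRB Scommands are installed and a session has been opened with Sinit; the paths are invented):

    import subprocess

    def srb(*args):
        """Run an SRB Scommand and return its stdout."""
        return subprocess.run(args, check=True, capture_output=True,
                              text=True).stdout

    # "ls"-like listing of the logical namespace (path invented).
    print(srb("Sls", "/home/belle.nagoya/mdst"))

    # Upload a local file into SRB-managed storage (paths invented).
    srb("Sput", "r1000.mdst", "/home/belle.nagoya/mdst/r1000.mdst")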

SRB Test

Setup:

  • Inside the firewall.
  • Direct connection to the KEK B computers via SINET.
  • Federation only with KEK (no route to other sites).
  • SRB 3.3.1 on CentOS 4.1.
  • MRTG is used to monitor network traffic.

Test 1: write from the clients to srbsrv. Test 2: read from srbsrv. (A sketch of a timing harness for such tests follows.)
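
The slides show only the measured curves; purely as an illustration of how such a test might be driven, a small Python harness that launches N parallel Sput transfers, optionally with the -m (parallel transfer) option discussed on the next slide, and times them (file names and SRB paths are invented):

    import subprocess, time
    from concurrent.futures import ThreadPoolExecutor

    def sput(i, parallel=False):
        """One upload; parallel=True adds Sput's -m option."""
        cmd = ["Sput"] + (["-m"] if parallel else []) \
              + [f"test_{i}.dat", f"/home/belle.nagoya/test/test_{i}.dat"]
        return subprocess.run(cmd).returncode

    for n_jobs in (1, 5, 10, 20, 30, 40, 50):
        t0 = time.time()
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            codes = list(pool.map(sput, range(n_jobs)))
        failures = sum(1 for c in codes if c != 0)
        print(f"N={n_jobs}: {time.time() - t0:.0f}s, {failures} failures")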


SRB Test

Sput without the -m option (through srbmcat):

  • No failures.
  • The transfer speed changes at N = 10; an effect of the disk cache?

Sput with the -m option (parallel transfer):

  • Slow...
  • Job failures (1.1% at N = 50).

[Plots: time (sec) vs. number of parallel jobs, for random vs. same files without -m, and with vs. without -m]

SRB Test

Read from srbsrv1 to many RH7.3 clients using “basf” (the Belle analysis program).

  • Data transferred through srbmcat.
  • No errors.
  • But slow (~12 MB/s); due to the network switch?

[Plot: time (sec) vs. number of simultaneous executions (10-50)]

Summary and Plan

  • SRB has been tested in the Belle analysis style.
  • It basically works, but needs tuning for better performance.
  • More tests at larger scale and with the Belle application are planned.
  • Nagoya is now trying to participate in LCG. They can submit jobs, but data sharing, the Belle program, etc. are not tested yet.


Summary

  • Great performance of KEKB; Belle has accumulated >500 fb^-1.
  • High luminosity enables various physics, and computing sustains it.
  • Computing resources will be even more critical at Super KEKB.
  • New B Factory Computer System:
    • High performance with a “less expensive, reasonably reliable” system.
    • Still setting up (more effort on Grid will be made after stable operation of the system).
  • SRB is being tested at Nagoya Univ.: it works, but needs more tuning.

Backup



Certificate Authority

  • The fundamental mechanism of Grid security for resource sharing over the network.
  • A CA issues digital certificates (PKI based: a pair of keys, public and private) for users, host machines, and services.
  • Certificates issued by CAs under the IGTF (International Grid Trust Federation) are valid for the Grid world-wide.
  • The KEK CA has been operational since Feb. 2006.
  • The KEK CA shares the Grid CA service with the AIST CA and the NAREGI CA in Japan.

[Diagram: mutual trust relations between CAs under the IGTF]


Virtual Organization (VO)

  • A mechanism to authorize a user to use the resources.
  • The resource site manager grants authorization to members of the VO (virtual organization).
  • Members of a VO can access and use resources, distributed world-wide, at any site that accepts the VO.
  • VOs in LCG (LHC Computing Grid)/EGEE are composed of various scientific research groups, such as Atlas, CMS, Babar, Zeus, Biomed, Earth Science, etc.
  • The user must register with the VOMS server, which defines the resources available in the VO (a proxy sketch follows).
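
For flavor of the user side (an illustration added here, not from the slides): once registered in the VOMS, a user typically obtains a VO-scoped proxy before using the Grid. A minimal sketch assuming the standard LCG/EGEE user-interface commands are installed:

    import subprocess

    # Create a short-lived proxy certificate carrying the Belle VO
    # attributes (prompts for the grid certificate pass phrase).
    subprocess.run(["voms-proxy-init", "-voms", "belle"], check=True)

    # Inspect the proxy that was just created.
    subprocess.run(["voms-proxy-info", "-all"], check=True)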


Belle Data and its Sharing with SRB

  • Data accumulated so far:
    • 1.5 PB including simulation data.
    • Recent data acquisition rate ~1.0 TB/day.
  • SRB servers for the real data storage system were implemented in Aug. 2005.
  • Currently active data sharing:
    • Among KEK, U. of Melbourne (Australia), and Nagoya Univ.
    • Target storage space: 120 TB.
    • Files registered to MCAT: ~423 files as of Sep. 6.

Belle SRB System

[Diagram: configuration of the Belle SRB system]


Pre-production LCG site for Belle

  • The pre-production site (JP-KEK-CRC-01) was built with LCG 2.7 in March 2006.
  • Certification by APROC for registration in GOCDB was completed at the end of March.
  • The new VO “Belle” has been registered in LCG/EGEE as a global VO.
  • Initial collaboration sites expected: Melbourne, ASGC, Krakow, Jozef Stefan Institute (Slovenia), IHEP Vienna, Nagoya U.


LCG 2.7 Node

  • Site: JP-KEK-CRC-01
  • VO: Belle, dteam