SciDAC Software Infrastructure for Lattice Gauge Theory Richard C. - - PowerPoint PPT Presentation




slide-1
SLIDE 1

SciDAC Software Infrastructure for Lattice Gauge Theory

Richard C. Brower

All Hands Meeting, BNL, March 22-23, 2007

Code distribution: see http://www.usqcd.org/software.html

SciDAC-2 kickoff workshop, Oct 27-28, 2006: http://super.bu.edu/~brower/workshop

Progress report, Sept 15, 2006 to Feb 1, 2007: http://super.bu.edu/~brower/scc.html

slide-2
SLIDE 2

QUIZ

THIS IS THE 50th ANNIVERSARY OF WHAT?

slide-3
SLIDE 3

FORTRAN IS 50 YEARS OLD!

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8

Major Participants in SciDAC Project

Ted Bapty Vanderbilt Amitoj Singh Ludmila Levkova Eric Neilsen Carleton DeTar * Utah Jim Simone Subhasish Basak Don Holmgren * FNAL Steve Gottlieb Indiana Massimo DiPierro DePaul Xian-He Sun IIT Bob Mawhinney * Columbia Balint Joo Efstratios Efstathiadis Jie Chen Enno Scholz Robert Edwards * Chulwoo Jung BNL Chip Watson * JLab Mike Clark Ying Zhang * James Osborn Rob Fowler North Carolina Rich Brower * BU Joy Khoriaty Dru Renner Andrew Pochinsky MIT Doug Toussaint Arizona

* Software Committee: Participants funded in part by SciDAC grant

slide-9
SLIDE 9

Institutions Oversight

  • BNL/Columbia: Mawhinney / Chulwoo Jung
  • JLab: Edwards / Watson
  • FNAL/IIT/Vanderbilt: Holmgren / Simone
  • BU/MIT: Brower / Pochinsky
  • DePaul/North Carolina: DiPierro / Zhang
  • Arizona/Indiana/Utah: DeTar / Gottlieb / Toussaint

slide-10
SLIDE 10

SciDAC-1 QCD API

  • Level 3: Optimized Dirac Operators, Inverters
  • Level 2: QDP (QCD Data Parallel): Lattice Wide Operations, Data shifts
  • Level 1: QMP (QCD Message Passing), QLA (QCD Linear Algebra)
  • QIO: Binary / XML Metadata Files

Diagram annotations: C/C++, implemented over MPI, native QCDOC, M-via GigE mesh; optimised for P4 and QCDOC; exists in C/C++; ILDG collab.
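The "lattice wide operations" and "data shifts" attributed to QDP can be illustrated with a small sketch. This is not the QDP API: the function names `site_index` and `shift`, the lexicographic site ordering, and the periodic boundary conditions are all assumptions made for illustration.

```python
from itertools import product


def site_index(coords, dims):
    """Lexicographic site index; the first coordinate runs fastest (assumed layout)."""
    index = 0
    for x, n in zip(reversed(coords), reversed(dims)):
        index = index * n + x
    return index


def shift(field, dims, direction, offset):
    """Return a field with shifted[x] = field[x + offset along direction], periodic."""
    shifted = [None] * len(field)
    for coords in product(*(range(n) for n in dims)):
        neighbor = list(coords)
        # Wrap around the lattice edge (periodic boundary conditions).
        neighbor[direction] = (coords[direction] + offset) % dims[direction]
        shifted[site_index(coords, dims)] = field[site_index(neighbor, dims)]
    return shifted


def add(a, b):
    """A lattice-wide operation: the same arithmetic applied at every site."""
    return [x + y for x, y in zip(a, b)]


dims = (4, 2)                    # a tiny 4 x 2 lattice
field = list(range(8))           # one number per site, equal to its index
forward = shift(field, dims, 0, 1)
total = add(field, forward)      # field(x) + field(x + 1) at every site
```

In a data-parallel library the user writes only the last two lines; the loop over sites, and any communication needed when the neighbor lives on another node, is hidden inside the shift.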

slide-11
SLIDE 11

SciDAC-2 QCD API

(Layer diagram: SciDAC-1 components in gold, SciDAC-2 components in blue. Layers are labelled Level 4, Level 3, Level 2, Level 1; PERI and TOPS appear as external labels.)

  • Level 4: Workflow and Data Analysis tools
  • QCD Physics Toolbox: Shared Algorithms, Building Blocks, Visualization, Performance Tools
  • Uniform User Env: Runtime, accounting, grid
  • Application Codes: MILC / CPS / Chroma / RollYourOwn
  • QOP (Optimized in asm): Dirac Operator, Inverters, Force etc.
  • QDP (QCD Data Parallel): Lattice Wide Operations, Data shifts
  • QIO: Binary / XML files & ILDG
  • QMP (QCD Message Passing)
  • QLA (QCD Linear Algebra)
  • QMC (QCD Multi-core interface)

slide-12
SLIDE 12

Some current activities & Priorities

  • Common Runtime Env. “Practical Meta-facility”: File transfer, Batch scripts, Compile targets
  • Fuller use of API in application code. Round table: Software vs software
  • Porting API to new Machines
    • BG/L & BG/P: QMP and QLA using XLC & Perl script
    • Cray XT3 & XT4: Opteron, 32 bit SSE, etc.

slide-13
SLIDE 13

Workflow and Data Analysis

Automate campaign to combine lattices and propagators to extract physical parameters. (FNAL Jim Simone & IIT)

Tool Box (shared algorithms / building blocks)

RHMC, eigenvector solvers, etc.

Visualization and Performance Analysis

Exploitation of Multi-core

Plans for a QMC API (JLab Jie Chen / Edwards)

slide-14
SLIDE 14

Status of QMP on BG/L

  • based on QMP/MPI code base
  • added --with-qmp-comms-type=BGL option
  • native BG/L point-to-point (send/receive)
  • uses MPI for everything else (collectives)
  • requires barriers (MPI_Barrier) around some collectives (broadcast, binary_reduction)
  • mostly done -- still needs cleanup & testing & (more) optimization

James Osborn

slide-15
SLIDE 15

Performance of QMP on BG/L (contiguous quad-aligned buffers)

[Figure: ping pong test. Horizontal axis: bytes; vertical axis: round trip time / 2 (microseconds); both logarithmic. Curves: 2 nodes-MPI, 2 nodes-native, 8 nodes-MPI, 8 nodes-native, 64 nodes-MPI, 64 nodes-native. Values annotated on the plot: 2.5, 1.07, 5.49, 1.85, 7.68, 2.3.]
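The quantity plotted for the ping pong test, round trip time divided by two, can be measured with a sketch like the one below. It is a local stand-in only: it bounces messages between two threads over a `socket.socketpair()` rather than between BG/L nodes over QMP or MPI, so its timings are not comparable to the figure. The function names and the `repeats` parameter are assumptions for illustration.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes; a single recv call may return fewer bytes."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo(sock, nbytes, repeats):
    """The 'pong' side: send every received message straight back."""
    for _ in range(repeats):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong(nbytes, repeats=200):
    """Mean half round-trip time, in microseconds, for messages of nbytes bytes."""
    ping, pong = socket.socketpair()
    worker = threading.Thread(target=echo, args=(pong, nbytes, repeats))
    worker.start()
    payload = bytes(nbytes)
    start = time.perf_counter()
    for _ in range(repeats):
        ping.sendall(payload)        # ping
        recv_exact(ping, nbytes)     # wait for the pong
    elapsed = time.perf_counter() - start
    worker.join()
    ping.close()
    pong.close()
    return elapsed / repeats / 2 * 1e6


if __name__ == "__main__":
    for size in (16, 256, 4096, 65536):
        print(size, "bytes:", round(ping_pong(size), 2), "microseconds")
```

Sweeping the message size, as the loop at the end does, gives one curve of the kind shown on the slide: small messages expose the latency, large ones the bandwidth.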

slide-16
SLIDE 16

Status of QLA on BG/L

  • previous version had a single 440 asm routine
  • now has a 440d asm version of same routine
  • development version now uses XLC v8 and C99 complex types (along with necessary alignment and disjoint hints) to make use of 440d

  • has passed full testsuite running on BG/L
  • BAGEL routines may still be useful

James Osborn, Joy Khoriaty & Andrew Pochinsky

slide-17
SLIDE 17

Performance of QLA on BG/L (QOPQDP – asqtad inverter)

[Figure: 1 node. Horizontal axis: lattice sizes 4^4, 6^4, 8^4; vertical axis ticks from 100 to 1000. Curves: old - float, new - float, old - double, new - double.]

slide-18
SLIDE 18

Performance of QLA on BG/L (QOPQDP – Wilson inverter)

[Figure: 1 node. Horizontal axis: lattice sizes 4^4, 6^4, 8^4; vertical axis ticks from 100 to 1000. Curves: old - float, new - float, old - double, new - double.]

slide-19
SLIDE 19

Performance of QMP+QLA on BG/L (QOPQDP – asqtad inverter)

[Figure: 64 nodes. Horizontal axis: lattice sizes 4^4, 6^4, 8^4; vertical axis ticks from 50 to 600. Curves: old - float, new QLA - float, new QMP+QLA - float, old - double, new QLA - double, new QMP+QLA - double.]

slide-20
SLIDE 20

Performance of QMP+QLA on BG/L (QOPQDP – Wilson inverter)

[Figure: 64 nodes. Horizontal axis: lattice sizes 4^4, 6^4, 8^4; vertical axis ticks from 50 to 650. Curves: old - float, new QLA - float, new QMP+QLA - float, old - double, new QLA - double, new QMP+QLA - double.]

slide-21
SLIDE 21

Software Committee

  • Rich Brower (chair) brower@bu.edu
  • Carleton DeTar detar@physics.utah.edu
  • Robert Edwards edwards@jlab.org
  • Don Holmgren djholm@fnal.gov
  • Bob Mawhinney rdm@phys.columbia.edu
  • Chip Watson watson@jlab.org
  • Ying Zhang zhang@cs.uiuc.edu
slide-22
SLIDE 22

QLA on Opterons (kaon)

[Figure: staggered matrix-vector product. Horizontal axis ticks from 10 to 100000 (logarithmic); vertical axis ticks from 750 to 4000. Curves: pion - C, pion - SSE, kaon - C, kaon - SSE.]
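For readers unfamiliar with the kernel being benchmarked, a staggered matrix-vector product can be sketched as a 3x3 complex (color) matrix applied to a 3-component complex vector at every lattice site. The sketch below is plain Python, not QLA; the function names, the list-of-rows data layout, and the `mflops` helper are assumptions for illustration. The count of 66 floating-point operations per site is the usual one for this kernel.

```python
# 9 complex multiplies (6 flops each) + 6 complex adds (2 flops each) per site.
FLOPS_PER_SITE = 9 * 6 + 6 * 2


def su3_mat_vec(u, v):
    """Multiply a 3x3 complex matrix u (a list of rows) by a 3-component vector v."""
    return [sum(u[i][j] * v[j] for j in range(3)) for i in range(3)]


def lattice_mat_vec(links, vectors):
    """Apply one matrix-vector product per site; links[s] acts on vectors[s]."""
    return [su3_mat_vec(u, v) for u, v in zip(links, vectors)]


def mflops(sites, seconds):
    """Hypothetical helper: floating-point rate in Mflops for one sweep."""
    return FLOPS_PER_SITE * sites / seconds / 1e6


# A cyclic permutation of the three colors is a simple unit-determinant test matrix.
cyclic = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]
vector = [1 + 2j, 3, -1j]
result = su3_mat_vec(cyclic, vector)   # [3, -1j, 1+2j]
```

Optimized versions such as the SSE curves in the figure compute the same result; they differ in how the loads, multiplies, and adds are scheduled and vectorized.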